4

My assignment is to fix Larry Wall's original patch code so that it compiles in ANSI C and the debug it. However, I don't really understand what the code in the savestr function does well enough to fix it.

char *
savestr(s)
register char *s;
{
    register char  *rv,
                   *t;

    t = s;
    while (*t++)
    rv = malloc((MEM) (t - s));
    if (rv == NULL)
        fatal ("patch: out of memory (savestr)\n");
    t = rv;
    while (*t++ = *s++);
    return rv;
}

My understanding is that savestr takes a string argument and returns a string. The function creates two strings, rv and t. and then assigns t to be s. After that line, I'm completely lost. Can someone please explain to me what the code is doing? I especially don't understand how the while loops are moving along, and I have no idea what (t-s) does.

Thank you for any guidance you may have for me.

  • 4
    Accept that the indentation is correct. Imagine a `;` on the end of the `while()`... Think it through. (Or, is the missing `;` just a typo in your question? If so, please correct that.) – Fe2O3 Feb 26 '23 at 05:52
  • The first one searches for a null, the second one makes a copy. The whole function is essentially `strcpy`. – Ouroborus Feb 26 '23 at 05:57
  • 2
    @Ouroborus More like `strdup`. – Retired Ninja Feb 26 '23 at 06:02
  • "The function creates two strings, rv and t." This is a fundamental misunderstanding. The function declares two variables of type `char *` named `rv` and `t`. It creates one string. `rv` holds the address of the first character of that string, and `t` is used to iterate through the string. But the word "string" is being abused here, since `t` is being used to write into memory, which does not contain a string until a `'\0'` char is written to it through `t`. – William Pursell Feb 26 '23 at 14:05

3 Answers3

4
char *
savestr(s)
register char *s;

This is an old K&R style declaration/definition.


The register keyword too, is completely obsolete these days. It was intended as an optimization hint to the compiler. But it is no longer required, except in some rare cases where compiler optimizations are not possible.


t = s;
while (*t++);
rv = malloc((MEM) (t - s));

t is assigned the value of s, and is incremented until it reaches the '\0' byte. t - s then gives the difference between the two pointers, which is equivalent to strlen (s) + 1, and then allocates memory for rv, much like strdup().


t = rv;

This assigns the pointer value of rv to t, such that they both now point to the same memory location. At the end of the function, only rv will be pointing to this location.

while (*t++ = *s++);

gets parsed as:

while ((*(t++) = *(s++)) != '\0');

In an expression, the post increment operator returns the old value of its operand. So this first copies the character pointed to by s to the character pointed to by t, and then increment both the pointers to point to the their next respective characters. In a loop, this copies the contents of the memory region pointed to by s to the memory region pointed to by t.


The mistake is pretty obvious, and has already been pointed out. So I will leave it for OP to figure out.


The modern version of savestr looks like this:

char *
savebuf (char const *s, size_t size)
{
  char *rv;

  if (! size)
    return NULL;

  rv = malloc (size);

  if (! rv)
    {
      if (! using_plan_a)
    xalloc_die ();
    }
  else
    memcpy (rv, s, size);

  return rv;
}

char *
savestr (char const *s)
{
  return savebuf (s, strlen (s) + 1);
}

void
xalloc_die (void)
{
  fatal ("out of memory");
}

Notice the ANSI style function declaration/definition and the lack of register keyword. The length is passed by the caller, and the second while loop has been replaced with memcpy(). The macro MEM, which originally evaluated to (unsigned), and the pointer t have been eliminated.

Harith
  • 4,663
  • 1
  • 5
  • 20
  • `t - s` is not the length of the string, it is 1 more than that, so it is indeed the correct size for a copy of the null terminated string. The only mistakes are the missing empty statement and the dubious cast – chqrlie Feb 26 '23 at 14:05
  • @chqrlie Indeed. I reformulated that part. – Harith Feb 26 '23 at 14:13
  • 1
    @Harris: I would also rewrite the `while` loop as `while ((*(t++) = *(s++)) != '\0') { }` for clarity. – chqrlie Feb 26 '23 at 14:15
2

t - s is size (difference between two pointers) and with the statements t = s and while(*t++) implement strlen(s) + 1.

while (*t++ = *s++) copies the value * pointed to by s to the string pointed to by s, and increment both pointers. Strings are terminated by a \0, which is false, so loop will terminate when the \0 has been copied.

All in this implements strdup().

The register keyword probably doesn't do anything on a current compiler. Also, I don't think the (MEM) cast from ptrdiff_t, which is a "signed integer type" (6.5.5), to size_t that malloc() expects is required.

Allan Wind
  • 23,068
  • 5
  • 28
  • 38
  • The cast quiets select warnings about conversion for a _signed_ to _unsigned_ and losing sign information. – chux - Reinstate Monica Feb 26 '23 at 07:24
  • `ptrdiff_t` being a *signed* type, might not be able to represent all the values of a `size_t`, which is an *unsigned* type. Hence the cast. Although `MEM` in the original code, evaluated to `unsigned`. Perhaps that's what `malloc()` used to take in the old ages. – Harith Feb 26 '23 at 14:18
  • Indeed if the string length is greater than `SIZE_MAX/2`, does `t-s` have defined behavior? Is it negative? Is the conversion to size_t guaranteed to produce the expected value? Such memory blocks are unlikely to ever occur on 64-bit systems, but they are possible on 32-bit and were common place on segmented 16-bit systems. – chqrlie Feb 26 '23 at 17:31
1

t and s are locations in memory. t-s computes how many bytes are between the pointers t and s themselves. The while loop walks through whatever string is pointed to by t, one character at a time (slowly moving t one byte down until t is finally pointing at the null character - which is the end of a cstring).

The second while loop walks through s and t and assigns the value pointed to by s to memory pointed to by t until they are null. When the characters are the null character (\0 == 0), the loop stops because 0 is false.

This isn't a complete walkthrough, but feel free to comment with more questions as you process this and I'll respond.

Shaun Ramsey
  • 562
  • 4
  • 14
  • So, the first while loop walks through t one character at a time and allocates memory to rv. After it reaches the null character, it assigns rv to t if rv is not NULL (i'm assuming this makes sure that t has enough memory). Then it copies s to t one character at a time with the second while statement. Then it returns rv? I don't see how rv contains anything in the original string s? – eternallymisty Feb 26 '23 at 06:17
  • So does this while loop have a semi colon missing? while (*t++); If it does, then that loop would just find the end of the t string. rv is then allocated as the same size as the string s, because t-s will be the number of elements of s. This line: t = rv; That's the one that says let t point to the same place as rv. So when you muck with t after that point, you're mucking with the same memory that rv points to. But, you can change t (next loop) without losing the start of the string, because its in rv. So t does all the copying and rv remembers the beginning. – Shaun Ramsey Feb 26 '23 at 06:24
  • So I looked at the master branch the code came from, and the semicolon is not in the master branch, but the semicolon is indeed in Larry Wall's original code that is published on the internet. Thank you for explaining it so well. – eternallymisty Feb 26 '23 at 06:30
  • Ahh, well if you were looking for an error, perhaps you have found the culprit! Good luck. – Shaun Ramsey Feb 26 '23 at 06:35