5

gcc 4.4.4 c89

My program does a lot of string coping. I don't want to use the strncpy as it doesn't nul terminate. And I can't use strlcpy as its not portable.

Just a few questions. How can I put my function through its paces to ensure that it is completely safe and stable. Unit testing?

Is this good enough for production?

size_t s_strlcpy(char *dest, const char *src, const size_t len)
{
    size_t i = 0;

    /* Always copy 1 less then the destination to make room for the nul */
    for(i = 0; i < len - 1; i++)
    {
        /* only copy up to the first nul is reached */
        if(*src != '\0') {
            *dest++ = *src++;
        }
        else {
            break;
        }
    }

    /* nul terminate the string */
    *dest = '\0';

    /* Return the number of bytes copied */
    return i;
}

Many thanks for any suggestions,

Bryan Oakley
  • 370,779
  • 53
  • 539
  • 685
ant2009
  • 27,094
  • 154
  • 411
  • 609

10 Answers10

13

Although you could simply use another strlcpy function as another post recommends, or use snprintf(dest, len, "%s", src) (which always terminates the buffer), here are the things I noticed looking at your code:

size_t s_strlcpy(char *dest, const char *src, const size_t len)
{
    size_t i = 0;

No need to make len const here, but it can be helpful since it checks to make sure you didn't modify it.

    /* Always copy 1 less then the destination to make room for the nul */
    for(i = 0; i < len - 1; i++)
    {

Oops. What if len is 0? size_t is usually unsigned, so (size_t)0 - 1 will end up becoming something like 4294967295, causing your routine to careen through your program's memory and crash into an unmapped page.

        /* only copy up to the first nul is reached */
        if(*src != '\0') {
            *dest++ = *src++;
        }
        else {
            break;
        }
    }

    /* nul terminate the string */
    *dest = '\0';

The above code looks fine to me.

    /* Return the number of bytes copied */
    return i;
}

According to Wikipedia, strlcpy returns strlen(src) (the actual length of the string), not the number of bytes copied. Hence, you need to keep counting the characters in src until you hit '\0', even if it exceeds len.

Also, if your for loop terminates on the len - 1 condition, your function will return len-1, not len like you'd expect it to.


When I write functions like this, I usually prefer to use a start pointer (call it S) and end pointer (call it E). S points to the first character, while E points to one character after the last character (which makes it so E - S is the length of the string). Although this technique may seem ugly and obscure, I've found it to be fairly robust.

Here's an over-commented version of how I would write strlcpy:

size_t s_strlcpy(char *dest, const char *src, size_t len)
{
    char *d = dest;
    char *e = dest + len; /* end of destination buffer */
    const char *s = src;

    /* Insert characters into the destination buffer
       until we reach the end of the source string
       or the end of the destination buffer, whichever
       comes first. */
    while (*s != '\0' && d < e)
        *d++ = *s++;

    /* Terminate the destination buffer, being wary of the fact
       that len might be zero. */
    if (d < e)        // If the destination buffer still has room.
        *d = 0;
    else if (len > 0) // We ran out of room, so zero out the last char
                      // (if the destination buffer has any items at all).
        d[-1] = 0;

    /* Advance to the end of the source string. */
    while (*s != '\0')
        s++;

    /* Return the number of characters
       between *src and *s,
       including *src but not including *s . 
       This is the length of the source string. */
    return s - src;
}
Joey Adams
  • 41,996
  • 18
  • 86
  • 115
  • 1
    Note that the begin, and one-past-the-end, is the same technique used in the C++ STL to great effect. – Puppy May 29 '10 at 10:38
  • `s_strlcpy(char *dest, const char *src, size_t len)` better as `s_strlcpy(char * restrict dest, const char * restrict src, size_t len)` as this code has trouble when `dest, src` overlap. – chux - Reinstate Monica Feb 17 '20 at 21:46
  • @chux-ReinstateMonica: Thanks for the suggestion. It's a nice optimization, but I'm leaving it out to keep the code simple. – Joey Adams Feb 27 '20 at 17:50
4

IMHO, just barrow the original strlcpy, which Ignacio Vazquez-Abram tersely stated. OpenBSDs code is battletested and the license terms rock ;).

As to your code, something I would add to what has already been said by others, is just a matter of personal taste:

/* only copy up to the first nul is reached */
if(*src != '\0') {
    *dest++ = *src++;
}
else {
    break;
}

I would have written this like this:

if(*src == '\0') {
    break;
}
*dest++ = *src++;

Both because it reduces on the amount of unnecessary code people need to read, and because it is my 'style' to consistently write that way, instead of if (ok) { do } else { handle error }. The comment above the if is also redundant (see comment above the for loop).

TerryP
  • 1,072
  • 1
  • 8
  • 13
  • 1
    I totally agree with regards to style (and many others do as well, http://stackoverflow.com/questions/114342/what-are-code-smells-what-is-the-best-way-to-correct-them/223881#223881) ; "normal" code should (when possible) be written without extra indentation, it should be exceptions that are tested for. – hlovdal Jun 09 '10 at 00:22
  • The link you provided is broken. – Zakk Aug 03 '22 at 10:06
3

Why don't you use something like memccpy() instead of rolling your own? You just have to terminate with a null byte, but it's easier and generally faster to augment a standard function than to start from scratch.

Some architectures will heavily optimize or even use assembly for string functions to squeeze good performance out of them.

Without building or debugging:

str = memccpy (dest, src, '\0', len);
if(str)
    *str = '\0';
WhirlWind
  • 13,974
  • 3
  • 42
  • 42
  • 1
    I am a junior programmer, so just looking to see what mistakes I make. Well, not to re-invent the wheel. But it more fun to develop your own. – ant2009 May 29 '10 at 04:16
  • 1
    @robUK there's nothing wrong with reinventing the wheel as a learning exercise, but I'd never do that in production code, as you asked. Also, you disparaged the standard library functions, but you can basically make them work if you try, and it's a much better way to go, again, for production code. – WhirlWind May 29 '10 at 04:19
  • 1
    @robUK One basic reason I would prefer to augment the standard library is if I run into a bug. If I see a standard library call, I probably won't look into it as a bug, but if it's a home grown function, the probability of bugs is higher, particularly if it's not called a lot, so I will spend more debugging time looking into it. – WhirlWind May 29 '10 at 04:27
  • 1
    The OP cannot use `memccpy` for the very same reason they can't use `strlcpy`. It is not portable. There's no such function is C standard library as `memccpy`. – AnT stands with Russia May 29 '10 at 21:39
2

Yes, unit testing. Check with lots of randomly generated strings.

Looks fine to me, though.

Tyler Menezes
  • 633
  • 1
  • 4
  • 17
  • random test input doesn't make much sense -- if the test finds a failure, how will you duplicate it? Each test run would potentially test different parts of the algorithm and could miss some edge cases altogether. – Bryan Oakley May 29 '10 at 05:40
  • 1
    @Bryan Oakley "if the test finds a failure, how will you duplicate it?" Use pseudo-random inputs, so you can always duplicate it. – Joey Adams May 29 '10 at 05:48
2

I would suggest that White-box testing is useful for a function like this (a form of unit testing).

caf
  • 233,326
  • 40
  • 323
  • 462
2

There is the DRY principle "don't repeat yourself". In other words do not create new code to do something that is already a done deal - check in the standard C library as the example above (WhilrWind) shows.

One reason is the testing mentioned. The standard C library has been tested for years, so it is a safe bet it works as advertised.

Learning by playing around with code is a great idea, keep on trying.

jim mcnamara
  • 16,005
  • 2
  • 34
  • 51
  • The whole point of the question is that the standard library does not provide immediate means to do what the OP wants to do. The answer you are referring to (WhirlWind) is based on non-standard means. – AnT stands with Russia May 29 '10 at 21:41
2

Unit testing? Is this good enough for production?

Probably for a "simple" function like this it may be sufficient, although the only real way to test a function is to try and break it.

Pass to it NULL pointers, 10k character long strings, negative values of len, somehow corrupted data, and so on. In general think: if you were a malicious user trying to break it, how would you do that?

See the link in my response here

Community
  • 1
  • 1
nico
  • 50,859
  • 17
  • 87
  • 112
1

There's always static code analysis. Edit: List of tools for static code analysis

Rooke
  • 2,013
  • 3
  • 22
  • 34
  • 2
    In my opinion, that link should point to a static code analyzer for C that you've found useful, not to a page defining it in abstract. – Joey Adams May 29 '10 at 04:59
1

I think it's a mistake to rely so much on the length, and doing arithmetic on it.

The size_t type is unsigned. Consider how your function will behave if called with a 0-sized destination.

unwind
  • 391,730
  • 64
  • 469
  • 606
1

Hmmm, did not realize this is an old post.

Is this good enough for production?
completely safe and stable (?)

Weaknesses:
Does not handle len == 0 correctly - easy to fix.
Return value questionable when source is long - easy to fix.
(Not discussed yet) Does not consider overlapping dest, src.

It is easy enough to incur an unexpected result with if(*src != '\0') { *dest++ = *src++; } overwriting the null chracter before it is read, thus iteration ventures beyond the original '\0'.

// pathological example                 
char buf[16] = "abc";
const char *src = buf;       // "abc"
const char *dest = buf + 2;  // "c"
size_t dest_sz = sizeof buf - 2;
s_strlcpy(dest, src, dest_sz);
puts(dest); // "ababababababa", usual expectation "abc"

Two solutions forward:

restrict
Since C99, C has restrict which indicates to the compiler that it can assume the data read via src and written via dest will not overlap. This allows the compiler to use certain optimizations it otherwise cannot use. restrict also informs the user should not provide over-lapping buffers.

  • Code can still fail as above, but then that is the caller breaking the contract, not s_strlcpy().

Notes: const in const size_t len is a distraction in the function declaration. Also clearer to use size_t size, than size_t len.

size_t s_strlcpy(char * restrict dest, const char * restrict src, size_t size);

This restrict usage is like the standard library strcpy() and others.

char *strcpy(char * restrict s1, const char * restrict s2);

Handle overlap
The other is to make s_strlcpy() tolerant of overlapping memory as below. That pretty much implies code needs to use memmove().

size_t s_strlcpy(char *dest, const char *src, const size_t dest_size) {
  size_t src_len = strlen(src);
  if (src_len < dest_size) {
    memmove(dest, src, src_len + 1);  // handles overlap without UB
  } else if (dest_size > 0) {
    // Not enough room
    memmove(dest, src, dest_size - 1);  // handles overlap without UB
    dest[dest_size - 1] = '\0';
  }
  return src_len;  // I do not think OP's return value is correct. S/B src length.
}

Hopefully I coded all the functionality of strlcpy() correctly. The edge cases take time to sort out.

chux - Reinstate Monica
  • 143,097
  • 13
  • 135
  • 256