When zeroing a struct such as sockaddr_in, sockaddr_in6 and addrinfo before use, which is correct: memset, an initializer or either?

Question

Whenever I look at real code or example socket code in books, man pages and websites, I almost always see something like:

struct sockaddr_in foo;
memset(&foo, 0, sizeof foo); 
/* or bzero(), which POSIX marks as LEGACY, and is not in standard C */
foo.sin_port = htons(42);

instead of:

struct sockaddr_in foo = { 0 }; 
/* if at least one member is initialized, all others are set to
   zero (as though they had static storage duration) as per 
   ISO/IEC 9899:1999 6.7.8 Initialization */ 
foo.sin_port = htons(42);

or:

struct sockaddr_in foo = { .sin_port = htons(42) }; /* New in C99 */

or:

static struct sockaddr_in foo; 
/* static storage duration will also behave as if 
   all members are explicitly assigned 0 */
foo.sin_port = htons(42);

The same can also be found for setting struct addrinfo hints to zero before passing it to getaddrinfo, for example.

Why is this? As far as I understand, the examples that do not use memset are likely to be equivalent to the one that does, if not better. I realize that there are differences:

memset will set all bits to zero, which is not necessarily the correct bit representation for setting each member to 0.
memset will also set padding bits to zero.
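As a rough illustration of the two differences listed above, here is a minimal sketch. The struct and function names are hypothetical and do not come from any socket header. It assumes a platform where all-bits-zero represents 0, 0.0 and a null pointer, which holds on common systems but is not guaranteed by ISO C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical struct: mixed member types, and most ABIs insert
   padding after `tag`. */
struct sample {
    char   tag;
    double ratio;
    void  *ptr;
};

/* Returns 1 if every member of a memset-zeroed struct compares equal to
   the same member of an initializer-zeroed struct, 0 otherwise. */
int zeroing_methods_agree(void)
{
    struct sample by_init = { 0 };   /* member-wise zero: 0, 0.0, NULL */
    struct sample by_memset;

    /* All bytes zero, padding included. */
    memset(&by_memset, 0, sizeof by_memset);

    return by_init.tag == by_memset.tag
        && by_init.ratio == by_memset.ratio
        && by_init.ptr == by_memset.ptr;
}

/* Example call site: aborts if the two methods disagree on this platform. */
void require_zeroing_methods_agree(void)
{
    assert(zeroing_methods_agree());
}
```

The comparison is deliberately member by member. Comparing the two objects with memcmp could report a difference even when all members agree, because the initializer is not required to zero the padding.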

Is either of these differences relevant, or required behavior, when setting these structs to zero, such that using an initializer instead is wrong? If so, why, and which standard or other source verifies this?

If both are correct, why does memset/bzero tend to appear instead of an initializer? Is it just a matter of style? If so, that's fine, I don't think we need a subjective answer on which is better style.

The usual practice is to use an initializer in preference to memset precisely because all bits zero is not usually desired and instead we want the correct representation of zero for the type(s). Is the opposite true for these socket related structs?

In my research I found that POSIX only seems to require sockaddr_in6 (and not sockaddr_in) to be zeroed, at http://www.opengroup.org/onlinepubs/000095399/basedefs/netinet/in.h.html, but it makes no mention of how it should be zeroed (memset or initializer?). I realise BSD sockets predate POSIX and it is not the only standard, so are there compatibility considerations for legacy systems or modern non-POSIX systems?

Personally, I prefer from a style (and perhaps good practice) point of view to use an initializer and avoid memset entirely, but I am reluctant because:

Other source code and semi-canonical texts like UNIX Network Programming use bzero (eg. page 101 on 2nd ed. and page 124 in 3rd ed. (I own both)).
I am well aware that they are not identical, for reasons stated above.

score 15 · Accepted Answer · edited May 23 '17 at 12:34

One problem with the partial initializers approach (that is '{ 0 }') is that GCC will warn you that the initializer is incomplete (if the warning level is high enough; I usually use '-Wall' and often '-Wextra'). With the designated initializer approach, that warning should not be given, but C99 is still not widely used - though these parts are fairly widely available, except, perhaps, in the world of Microsoft.

I ~~tend~~ used to favour an approach:

static const struct sockaddr_in zero_sockaddr_in;

Followed by:

struct sockaddr_in foo = zero_sockaddr_in;

The omission of the initializer in the static constant means everything is zero - but the compiler won't witter (shouldn't witter). The assignment uses the compiler's innate memory copy which won't be slower than a function call unless the compiler is seriously deficient.

GCC has changed over time

GCC versions 4.4.2 to 4.6.0 generate different warnings from GCC 4.7.1. Specifically, GCC 4.7.1 recognizes the = { 0 } initializer as a 'special case' and doesn't complain, whereas GCC 4.6.0 etc did complain.

Consider file init.c:

struct xyz
{
    int x;
    int y;
    int z;
};

struct xyz xyz0;                // No explicit initializer; no warning
struct xyz xyz1 = { 0 };        // Shorthand, recognized by 4.7.1 but not 4.6.0
struct xyz xyz2 = { 0, 0 };     // Missing an initializer; always a warning
struct xyz xyz3 = { 0, 0, 0 };  // Fully initialized; no warning

When compiled with GCC 4.4.2 (on Mac OS X), the warnings are:

$ /usr/gcc/v4.4.2/bin/gcc -O3 -g -std=c99 -Wall -Wextra -c init.c
init.c:9: warning: missing initializer
init.c:9: warning: (near initialization for ‘xyz1.y’)
init.c:10: warning: missing initializer
init.c:10: warning: (near initialization for ‘xyz2.z’)
$

When compiled with GCC 4.5.1, the warnings are:

$ /usr/gcc/v4.5.1/bin/gcc -O3 -g -std=c99 -Wall -Wextra -c init.c
init.c:9:8: warning: missing initializer
init.c:9:8: warning: (near initialization for ‘xyz1.y’)
init.c:10:8: warning: missing initializer
init.c:10:8: warning: (near initialization for ‘xyz2.z’)
$

When compiled with GCC 4.6.0, the warnings are:

$ /usr/gcc/v4.6.0/bin/gcc -O3 -g -std=c99 -Wall -Wextra -c init.c
init.c:9:8: warning: missing initializer [-Wmissing-field-initializers]
init.c:9:8: warning: (near initialization for ‘xyz1.y’) [-Wmissing-field-initializers]
init.c:10:8: warning: missing initializer [-Wmissing-field-initializers]
init.c:10:8: warning: (near initialization for ‘xyz2.z’) [-Wmissing-field-initializers]
$

When compiled with GCC 4.7.1, the warnings are:

$ /usr/gcc/v4.7.1/bin/gcc -O3 -g -std=c99 -Wall -Wextra  -c init.c
init.c:10:8: warning: missing initializer [-Wmissing-field-initializers]
init.c:10:8: warning: (near initialization for ‘xyz2.z’) [-Wmissing-field-initializers]
$

The compilers above were compiled by me. The Apple-provided compilers are nominally GCC 4.2.1 and Clang:

$ /usr/bin/clang -O3 -g -std=c99 -Wall -Wextra -c init.c
init.c:9:23: warning: missing field 'y' initializer [-Wmissing-field-initializers]
struct xyz xyz1 = { 0 };
                      ^
init.c:10:26: warning: missing field 'z' initializer [-Wmissing-field-initializers]
struct xyz xyz2 = { 0, 0 };
                         ^
2 warnings generated.
$ clang --version
Apple clang version 4.1 (tags/Apple/clang-421.11.65) (based on LLVM 3.1svn)
Target: x86_64-apple-darwin11.4.2
Thread model: posix
$ /usr/bin/gcc -O3 -g -std=c99 -Wall -Wextra -c init.c
init.c:9: warning: missing initializer
init.c:9: warning: (near initialization for ‘xyz1.y’)
init.c:10: warning: missing initializer
init.c:10: warning: (near initialization for ‘xyz2.z’)
$ /usr/bin/gcc --version
i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$

As noted by SecurityMatt in a comment below, the advantage of memset() over copying a structure from memory is that the copy from memory is more expensive, requiring access to two memory locations (source and destination) instead of just one. By comparison, setting the values to zeroes doesn't have to access the memory for source, and on modern systems, the memory is a bottleneck. So, memset() coding should be faster than copy for simple initializers (where the same value, normally all zero bytes, is being placed in the target memory). If the initializers are a complex mix of values (not all zero bytes), then the balance may be changed in favour of an initializer, for notational compactness and reliability if nothing else.

There isn't a single cut and dried answer...there probably never was, and there isn't now. I still tend to use initializers, but memset() is often a valid alternative.

Thanks, with which flags can I reproduce this gcc warning? And why does it warn, given that it's common practice to initialize aggregates completely to zero with a single { 0 }? – Chris Young, May 21 '09 at 18:27
Ah, I see -Wextra does the job. You do indeed provide a good case against { 0 }. Is this likely the reason why Stevens et al didn't use it? – Chris Young, May 21 '09 at 18:32
I think it unlikely that Stevens et al were worried about this level of warning. I think it is a more recent addition to GCC anyway; I have code that I've not changed in some years that now gets warnings that I'd've fixed had it been a warning 'back then'. – Jonathan Leffler, May 21 '09 at 19:42
The downside of this is that it turns what would have been a memset of the variable to zero into a memcpy from the bss section, incurring a memory overhead (to store the zero_sockaddr_in in the bss) as well as extra memory reads to copy the memory in. A better way is just to do memset(&var, 0, sizeof(var)). – SecurityMatt, Feb 05 '13 at 06:35
@JonathanLeffler I sometimes see something like this: `memset((char *)&foo, 0, sizeof foo)`. What is the reason for casting to `char *` in the first argument? – Peter Chaula, Jan 20 '17 at 21:33
@peter: confusion or antiquity? There's no need for the cast when the code was designed to work with Standard C (C89/90, C99, C11) because the argument to `memset()` is a `void *` and `&foo` will convert automatically to `void *`. If the code is old enough to have been written in the days before C89/90, then `memset()` took a `char *` argument rather than a `void *` argument (there was no `void *` back then). Programs like `lint` (seldom used these days because modern compilers can diagnose pretty much everything if the code is written properly) would complain about the absence of a cast. – Jonathan Leffler, Jan 20 '17 at 21:39
I see. I also looked at an example in the man pages (`man 2 bind`) and the cast isn't there. Thanks @JonathanLeffler – Peter Chaula, Jan 20 '17 at 22:03

score 4 · Answer 2 · answered Jun 07 '11 at 02:52

I would say that neither is correct because you should never create objects of type sockaddr_anything yourself. Instead always use getaddrinfo (or sometimes getsockname or getpeername) to obtain addresses.
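A minimal sketch of what this answer recommends, assuming a POSIX system: the hints struct is zeroed with a C99 designated initializer, and the library builds the sockaddr_in. The helper name resolved_port is hypothetical. The numeric-only flags are used here so that the example performs no DNS lookup.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* expose getaddrinfo under strict -std=c99 */
#endif

#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

/* Hypothetical helper: resolve a numeric IPv4 address and port with
   getaddrinfo and return the port in host byte order, or -1 on failure. */
int resolved_port(const char *numeric_host, const char *numeric_port)
{
    /* Designated initializer (C99): members not named are zero or NULL. */
    struct addrinfo hints = {
        .ai_family   = AF_INET,
        .ai_socktype = SOCK_STREAM,
        .ai_flags    = AI_NUMERICHOST | AI_NUMERICSERV
    };
    struct addrinfo *res = NULL;
    int port;

    if (getaddrinfo(numeric_host, numeric_port, &hints, &res) != 0)
        return -1;

    assert(res != NULL && res->ai_family == AF_INET);

    /* The library filled in the sockaddr_in; this code never zeroed one. */
    const struct sockaddr_in *sin = (const struct sockaddr_in *)res->ai_addr;
    port = ntohs(sin->sin_port);

    freeaddrinfo(res);
    return port;
}
```

Calling resolved_port("127.0.0.1", "42") should give back 42. Note that the hints struct itself still has to be zeroed somehow, so the memset versus initializer question does not disappear entirely.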

R.. GitHub STOP HELPING ICE

Agreed and up-voted. Unfortunately, some old systems do not have getaddrinfo and you must create the struct yourself. – Chris Young Dec 14 '11 at 03:19
Well rather than polluting your main codebase with stuff that creates the struct directly, you could instead throw in a drop-in replacement for the missing `getaddrinfo` functionality... – R.. GitHub STOP HELPING ICE Dec 14 '11 at 20:18
Unfortunately some libraries (libuv in this case) provide APIs that require a pointer-to-sockaddr_storage (sockaddr_storage*) to store their results in. (In libuv's case, for example uv_tcp_getpeername requires this). So, for this use case, I wouldn't want to replace missing functionality, I want to use an API that requires me to provide an empty sockaddr_*. – Dave May 04 '16 at 22:06

score 3 · Answer 3 · answered May 21 '09 at 18:22

"struct sockaddr_in foo = { 0 };" is only valid for the first time, whereas "memset(&foo, 0, sizeof foo);" will clear it each time the function is run.

Robert Deml

If foo is a local variable, it is valid each time. If foo is global or static, then you are correct. – Jonathan Leffler May 21 '09 at 18:24
An object with automatic storage duration plus an initializer will be initialized every time it's declared. This is correct for a static one, but these structs are typically not declared static. – Chris Young May 21 '09 at 18:25
This is only a problem if the struct is declared static, which it shouldn't be without good reason. – Dave May 21 '09 at 18:25
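The distinction the comments draw can be sketched as follows. This is a minimal example with hypothetical function names, using a small anonymous struct as a stand-in for sockaddr_in.

```c
#include <assert.h>

/* Automatic storage: the initializer is applied on every call. */
int next_auto(void)
{
    struct { int count; } s = { 0 };
    return ++s.count;
}

/* Static storage: the initializer is applied once, before program
   startup, so the value persists across calls. */
int next_static(void)
{
    static struct { int count; } s = { 0 };
    return ++s.count;
}

/* Example use showing the difference between the two. */
void demo_storage_duration(void)
{
    assert(next_auto() == 1);
    assert(next_auto() == 1);    /* re-initialized, so still 1 */
    assert(next_static() == 1);
    assert(next_static() == 2);  /* not re-initialized */
}
```

So for the usual case of a local, non-static sockaddr_in, the initializer clears the struct on each entry to the function, just as a memset at the top of the function would.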

score 1 · Answer 4 · answered May 21 '09 at 18:22 · edited May 21 '09 at 19:22

Either one is correct, as many have pointed out. Additionally, you can allocate these structures with calloc, which already returns a zeroed memory block.

lothar

This doesn't answer any part of my question, but yes I'm aware that calloc would do the same. I doubt it's used much, anyway. – Chris Young May 21 '09 at 18:29
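For completeness, the calloc variant mentioned in this answer might look like the sketch below. The helper names are hypothetical. Like memset, calloc gives all-bytes-zero, so the same caveat about representation applies.

```c
#include <assert.h>
#include <stdlib.h>
#include <netinet/in.h>

/* Hypothetical helper: heap-allocate a sockaddr_in with every byte zero.
   Returns NULL if allocation fails; the caller must free() the result. */
struct sockaddr_in *new_zeroed_sockaddr_in(void)
{
    /* calloc zeroes every byte, padding included, as memset would. */
    return calloc(1, sizeof(struct sockaddr_in));
}

/* Example use: the members read back as zero before any assignment. */
void demo_calloc_zeroing(void)
{
    struct sockaddr_in *sa = new_zeroed_sockaddr_in();

    if (sa == NULL)
        return;                  /* allocation failed; nothing to check */
    assert(sa->sin_port == 0);
    assert(sa->sin_addr.s_addr == 0);
    free(sa);
}
```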

score 1 · Answer 5 · answered May 21 '09 at 18:31

There shouldn't be a problem with either approach -- the values of the padding bytes shouldn't matter. I suspect that the use of memset() stems from earlier use of the Berkeley-ism bzero(), which may have predated the introduction of struct initializers or been more efficient.

Dave

I suspect you're right - this use of bzero() is so old that many programmers just treat it as idiomatic. – Alnitak May 21 '09 at 18:36
