
Python says I need 4 bytes for a format code of "BH":

struct.error: unpack requires a string argument of length 4

Here is the code. I am reading 3 bytes, which I think is what's needed:

major, minor = struct.unpack("BH", self.fp.read(3))

"B" Unsigned char (1 byte) + "H" Unsigned short (2 bytes) = 3 bytes (!?)

struct.calcsize("BH") says 4 bytes.

EDIT: The file is ~800 MB and this is in the first few bytes of the file so I'm fairly certain there's data left to be read.

Thomas O

2 Answers


The struct module mimics C structures. It takes more CPU cycles for a processor to read a 16-bit word on an odd address or a 32-bit dword on an address not divisible by 4, so structures add "pad bytes" to make structure members fall on natural boundaries. Consider:

struct {                   11
    char a;      012345678901
    short b;     ------------
    char c;      axbbcxxxdddd
    int d;
};

This structure will occupy 12 bytes of memory (x being pad bytes).
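If you want to check that layout without a C compiler, here is a minimal sketch using ctypes, which follows the platform's native alignment rules by default. The class name Example is my own, and the exact numbers depend on the platform (offsets 0, 2, 4, 8 and a size of 12 are typical where short is 2 bytes and int is 4):

```python
import ctypes

class Example(ctypes.Structure):
    # Same member order as the C struct above; ctypes applies the
    # platform's native alignment rules by default.
    _fields_ = [
        ("a", ctypes.c_char),
        ("b", ctypes.c_short),
        ("c", ctypes.c_char),
        ("d", ctypes.c_int),
    ]

# Offset of each member, then the total size including pad bytes.
offsets = {name: getattr(Example, name).offset for name, _ in Example._fields_}
total_size = ctypes.sizeof(Example)
print(offsets, total_size)
```

Any gap between one member's end and the next member's offset is padding.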

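Python works similarly (see the struct documentation). The output below is from Python 2 on a platform where a native long is 4 bytes, such as Windows; where a native long is 8 bytes, the size is 16 instead: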

>>> import struct
>>> struct.pack('BHBL',1,2,3,4)
'\x01\x00\x02\x00\x03\x00\x00\x00\x04\x00\x00\x00'
>>> struct.calcsize('BHBL')
12

Compilers usually have a way of eliminating padding. In Python, any of =<>! will eliminate padding:

>>> struct.calcsize('=BHBL')
8
>>> struct.pack('=BHBL',1,2,3,4)
'\x01\x02\x00\x03\x04\x00\x00\x00'
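To compare the prefixes side by side, here is a small sketch written for Python 3 (so pack returns bytes rather than str). The variable names are just for illustration:

```python
import struct

# Size of "BH" under each prefix. "@" (the default) uses native sizes and
# alignment; the other four use standard sizes and no alignment.
sizes = {prefix: struct.calcsize(prefix + "BH") for prefix in "@=<>!"}
print(sizes)

# "<" and ">" also fix the byte order, so the short is laid out differently.
little = struct.pack("<BH", 1, 2)
big = struct.pack(">BH", 1, 2)
print(little, big)
```

With any of the four non-native prefixes, "BH" is exactly 3 bytes; only the "@" entry can include a pad byte.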

Beware of letting struct handle padding. In C, these structures:

struct A {       struct B {
    short a;         int a;
    char b;          char b;
};               };

are typically 4 and 8 bytes, respectively. The padding occurs at the end of the structure in case the structures are used in an array. This keeps the 'a' members aligned on correct boundaries for structures later in the array. Python's struct module does not pad at the end:

>>> struct.pack('LB',1,2)
'\x01\x00\x00\x00\x02'
>>> struct.pack('LBLB',1,2,3,4)
'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x04'
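If you do need the trailing padding that a C compiler would add, the struct documentation describes ending the format with a zero-repeat code of the type whose alignment you want. A rough sketch in native mode follows; I use 'I' rather than 'L' here because the native size of 'L' varies between platforms, and 5 and 8 are the typical results where int is 4 bytes:

```python
import struct

# No pad bytes are added after the final char.
unpadded = struct.calcsize("@IB")

# "0I" packs nothing, but aligns the end to an unsigned int boundary,
# which is what a C compiler does for struct B above.
padded = struct.calcsize("@IB0I")

print(unpadded, padded)
```

This only has an effect with native alignment; the =, <, > and ! prefixes never align.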
Mark Tolonen
  • What I'm wondering is why Python didn't pack the data in such a format in the first place. "01 01 00" it packed byte 0x01, short 0x01, but it's trying to unpack it like "01 00 01 00". Anyway, I solved my problem: I'm always adding '<' before all my format codes, to make them unpadded little endian. Thanks for the explanation. :) – Thomas O Apr 10 '10 at 17:47
  • I had a similar problem; neither '=' nor '@' solved it. I was running code on Windows that I had written on a Mac. – jokoon Sep 28 '11 at 23:57
  • @ThomasO Why do you say it packs it as "01 01 00"? I'm seeing struct.pack('BH', 1, 2) == '\x01\x00\x02\x00'. – aij Jul 14 '16 at 20:19
  • @aij padding is added unless you use one of the endian options `=<>!`. – Mark Tolonen Jul 14 '16 at 22:22
  • @MarkTolonen Yes, that's what I'm seeing too. I was just wondering why ThomasO is saying `pack` didn't add the padding. – aij Jul 17 '16 at 00:18

By default, on many platforms the short will be aligned to an offset at a multiple of 2, so there will be a padding byte added after the char.

To disable this, use: struct.unpack("=BH", data). This uses standard sizes with no alignment, so no padding is added:

>>> struct.calcsize('=BH')
3

The = character will use native byte ordering. You can also use < or > instead of = to force little-endian or big-endian byte ordering, respectively.
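Applied to the code in the question, that could look something like the sketch below. It assumes the file was written unpadded and little-endian (the same format must be used for packing and unpacking); read_version and HEADER are names I made up, and io.BytesIO stands in for the real file:

```python
import io
import struct

HEADER = struct.Struct("<BH")  # unpadded, little-endian: 1 + 2 = 3 bytes

def read_version(fp):
    """Read a (major, minor) pair from the current position of fp."""
    data = fp.read(HEADER.size)
    if len(data) != HEADER.size:
        raise EOFError("file ended before the version header was complete")
    return HEADER.unpack(data)

# Stand-in for the real file: the three bytes 01 01 00.
print(read_version(io.BytesIO(b"\x01\x01\x00")))
```

Reading HEADER.size bytes instead of a hard-coded 3 keeps the read length and the format string from drifting apart.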

interjay
  • Oddly, I look at my file in hex, and I have the data 01 01 00, which shows three bytes for the version: a single 'major' byte and a single 'minor' short. So is the statement unpack("BH", pack("BH", 3, 6)) == (3, 6) false? Thanks for your help. – Thomas O Apr 10 '10 at 01:23
  • @Thomas: I'm not sure what exactly you're asking. The expression you posted will evaluate to True. – interjay Apr 10 '10 at 01:26
  • That's what I thought, and it's pretty much what I'm doing. I'm packing, using Python, a simple database, with pack("BH", major_ver, minor_ver), then unpacking using unpack("BH"). On the same computer which is an Intel C2D x86-64. Where does the extra byte come in? I'll use =BH, but with some suspicion that a byte is getting lost or gained somewhere. – Thomas O Apr 10 '10 at 01:30