2

How can I XOR two data segments pointed to by pointers?

I tried the code below, based on a similar solution on Stack Overflow, but the output is not what I expected.

Here is the code:

void printXor(){    
    int j;
    char* c = strdup("hey");
    for(j = 0; j < strlen(c); j++){
        c[j] ^= c[j];
    }

    printf("%d\n", *(int*)(c));
}

But the output is 7955712. Shouldn't the output be 0? I'm XORing "hey" with itself, so its value as an int should be 0, right?

Marcos Dimitrio
  • @EugeneSh. `c` points to the 1st of four bytes. – alk Jul 20 '18 at 14:25
  • You are breaking strict aliasing with `*(int*)(c)` and therefore suffering from undefined behavior. – Christian Gibbons Jul 20 '18 at 14:26
  • @ChristianGibbons `char` pointer is allowed to alias anything, so this is not the case. –  Jul 20 '18 at 14:27
  • @Ivan Anything can be accessed as a char array, but not the other way around. – Christian Gibbons Jul 20 '18 at 14:27
  • @Ivan But not vice versa... – Eugene Sh. Jul 20 '18 at 14:28
  • @EugeneSh. The pointer returned by `strdup` is not a `char` array, but a memory block allocated by `malloc` which is then filled through `char *`. As long as the content is accessed via `int *` and `char *` everything will be fine. –  Jul 20 '18 at 14:29
  • Given `char *c`, `*(int*)(c)` is a strict aliasing violation and undefined behavior. You can't treat something that didn't start as an `int` as an `int`. – Andrew Henle Jul 20 '18 at 14:29
  • @Ivan Think of the origins of the problem - for instance the alignment. What if `c` is not aligned to the `int` boundary? – Eugene Sh. Jul 20 '18 at 14:30
  • @EugeneSh. it is allocated by `malloc`, it can contain any data type. –  Jul 20 '18 at 14:30
  • @Ivan How about `c+1`? – Eugene Sh. Jul 20 '18 at 14:31
  • @Ivan *Pointer returned by strdup is not a char array, but a memory block allocated by malloc* Not relevant. Just because the memory is "suitable for use for any purpose", that doesn't mean that the use can be changed once it's used. Google "effective type". – Andrew Henle Jul 20 '18 at 14:32
  • @EugeneSh. He is not accessing `c+1` through `int *`. Both `char` and `unsigned char` are allowed to be used to change internal representation of any value. And since `malloc`'ed blocks are not really types, there are no issues with aliasing. –  Jul 20 '18 at 14:32
  • @Ivan No. https://stackoverflow.com/questions/30970251/what-is-the-effective-type-of-an-object-written-by-memset – Andrew Henle Jul 20 '18 at 14:33
  • @Ivan According to your logic `*(int*)(c+1)` should be valid as well (let's assume `c` is longer than 4 bytes). – Eugene Sh. Jul 20 '18 at 14:33
  • @Ivan It's legal to treat an `int` as an array of `char`. It is **not** legal to treat an array of `char` as an `int`. – Andrew Henle Jul 20 '18 at 14:34
  • Yeah, the problem with effective type is that `malloc`ed memory was accessed by the allocator itself already so it is strictly speaking not a `char` or `int` but something else. Standard doesn't seem to be very helpful here. –  Jul 20 '18 at 14:48

2 Answers

9

Take a look at this loop:

for(j = 0; j < strlen(c); j++) {
    c[j] ^= c[j];
}

You are modifying c while also using strlen(c) as the loop bound. The first iteration sets c[0] to 0, so the next call to strlen returns 0 and the loop stops after a single pass.

In hex, 7955712 is 0x796500: 0x79 is the code for 'y', 0x65 is the code for 'e', and the least significant byte is 0x00. On a little-endian machine the least significant byte is stored first, so that 0x00 is c[0] (the zeroed 'h'), which is also why c now reads as an empty string.

1

Strictly speaking, the behaviour of your code is undefined due to an aliasing violation in reading a char array as an int.

You can recast the crux of the question to the well-defined

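#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    int32_t n;
    char *c = (char *)&n;   /* viewing an int32_t through char* is allowed */
    c[0] = 'h';
    c[1] = 'e';
    c[2] = 'y';
    c[3] = 0;

    /* strlen(c) is re-evaluated on every pass; this is the bug in question */
    for (size_t j = 0; j < strlen(c); j++) {
        c[j] ^= c[j];
    }
    printf("%" PRId32 "\n", n);
}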

The output is not zero because only one iteration of the for loop runs: that iteration sets c[0] to the NUL terminator, so the next evaluation of strlen(c) returns 0 and the loop condition fails.

Bathsheba
  • No, that's not the case. Char array allocated was 4 bytes long as int is. So when data is xored on itself, in 4 byte memory there should be zeros everywhere. After that I'm reading this 4 byte memory as an int. What's the problem? – Data Chanturia Jul 20 '18 at 14:37
  • @DataChanturia You are not XORing everything, only the first element. When the conditions of the loop are re-evaluated after the first iteration, it sees `strlen(c)` returning `0` and thus the loop stops after one iteration. – Christian Gibbons Jul 20 '18 at 14:39
  • @DataChanturia: No. No. No. The `for` loop terminates after the first iteration. – Bathsheba Jul 20 '18 at 14:39
  • @DataChanturia Huh? You xor `char`s, not `int`s. – Eugene Sh. Jul 20 '18 at 14:39
  • That's right! strlen(c) is the problem (at first misunderstood the point). – Data Chanturia Jul 20 '18 at 14:42
  • @DataChanturia: Absolutely! The other answer explains it better; this one is more to keep the UB away. – Bathsheba Jul 20 '18 at 14:43
  • @EugeneSh. Right I'm trying to XOR char-s but they are just chunks of memory. char, int does not matter right? When using XOR bytes are XORed. – Data Chanturia Jul 20 '18 at 14:44
  • When you store a result into `char` with the `=` operator, only the value of this `char` is affected. – Eugene Sh. Jul 20 '18 at 14:45
  • @EugeneSh. Yep, its value turns to 0 (in this case). 0 0 0 0 (4 byte zero). Representing pointer to this 4 byte array as pointer to int and dereferencing == 0. – Data Chanturia Jul 20 '18 at 14:52
  • @DataChanturia: There are at least two problems reading 4-byte memory as an `int` in `*(int*)(c)`. The first is that, in the general case, the C standard only defines (partially) the result of converting one object-pointer type to another object-pointer type if the result is correctly aligned (in C 2011 [N1570] 6.3.2.3 7). Since `c` was allocated by `strdup`, it may not be correctly aligned for an `int`. The second is that accessing four `char` objects as an `int` violates 6.5 7, which constrains the expression types that are allowed to access stored values. – Eric Postpischil Jul 20 '18 at 17:28