Most of this has been said in the comments already, but let me summarise. There are three problems raised by your code/question:
1) MOVDQA requires the addresses it deals with ([rdx] in your case) to be aligned to a 16-byte boundary, and will trigger an access violation otherwise. This is what you are seeing. Alignment to a 16-byte (DQWORD) boundary means that, using your example, you should read from e.g. 0xFFFFFFFFFFFFFFF0 rather than 0xFFFFFFFFFFFFFFFF, because the latter number is not divisible by 16.
2) The address you use, 0xFFFFFFFFFFFFFFFF, is almost certainly invalid.
3) Provided you use MOVDQA to read from a valid, 16-byte-aligned memory location, the results (in xmm1 in your case) will be IDENTICAL to what you get with MOVDQU. The only relevant difference between the two here is that movdqU allows you to read from Unaligned (hence the U) memory, whereas movdqA requires a (16-byte) Aligned memory location. (Aligned loads can be faster, particularly on older CPUs, but I don't think you need to worry about that at this stage.)
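To make points 1 and 3 concrete, here is a minimal C sketch. It assumes an x86/x86-64 compiler with SSE2 intrinsics (emmintrin.h). _mm_load_si128 and _mm_loadu_si128 normally compile to MOVDQA and MOVDQU respectively (or their VEX forms when AVX is enabled), although the compiler is free to choose other instructions. The helper names is_aligned16, align_down16 and loads_match are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <emmintrin.h> /* SSE2 intrinsics; x86/x86-64 only */

/* Point 1: an address is 16-byte (DQWORD) aligned iff it is divisible by 16. */
static int is_aligned16(uint64_t addr) {
    return (addr % 16) == 0;
}

/* Round an address down to the previous 16-byte boundary by clearing its low 4 bits. */
static uint64_t align_down16(uint64_t addr) {
    return addr & ~(uint64_t)0xF;
}

/* Point 3: load the same aligned 16 bytes with both loads and compare the results. */
static int loads_match(void) {
    _Alignas(16) uint8_t buf[32];
    for (int i = 0; i < 32; i++) {
        buf[i] = (uint8_t)i;
    }
    assert(is_aligned16((uintptr_t)buf)); /* precondition for the aligned load */

    __m128i a = _mm_load_si128((const __m128i *)buf);  /* typically MOVDQA */
    __m128i u = _mm_loadu_si128((const __m128i *)buf); /* typically MOVDQU */

    /* buf + 1 is misaligned: the unaligned load handles it fine. Using
       _mm_load_si128 here instead is undefined behaviour in C and, if it is
       compiled to MOVDQA, faults just like the code in the question. */
    __m128i s = _mm_loadu_si128((const __m128i *)(buf + 1));

    uint8_t out_a[16], out_u[16], out_s[16];
    _mm_storeu_si128((__m128i *)out_a, a);
    _mm_storeu_si128((__m128i *)out_u, u);
    _mm_storeu_si128((__m128i *)out_s, s);

    /* Same bytes from both loads; the misaligned load starts at buf[1]. */
    return memcmp(out_a, out_u, sizeof out_a) == 0 && out_s[0] == 1;
}
```

With the value from your question, is_aligned16(0xFFFFFFFFFFFFFFFF) is false and align_down16 gives 0xFFFFFFFFFFFFFFF0. As point 2 says, though, that rounded address is almost certainly still not one your program can read from.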