ROT13 Cipher in Assembly

Question

So I've got my ROT13 cipher pretty much doing what I want, however at the end the command prompt shows up on the same line as the last line of output. This is my first project in Assembly so I'm pretty unsure of what I'm doing wrong.

So make sure your output ends with a newline. (ASCII code = 10). You already have that in one of your strings. Instead of a separate `write()` system call, probably just store a 10 to the end of the buffer that holds the user's string. — Peter Cordes, Sep 24 '16 at 04:07
BTW, nice job formatting and commenting your asm (and using symbolic names so you need fewer comments). This is a lot more readable than a lot of code dumps in beginner questions. — Peter Cordes, Sep 24 '16 at 04:10
You don't need a `nop` at _start. The `start` and `_start` symbols can have the same address. Or you can just type `b _start` to set a breakpoint there. (Fun fact: the `ld -e` option lets you set the entry point to whatever symbol name you want. But don't do that, it's probably just confusing. Also, in a stripped binary, you can use `readelf` to find the numeric address of the entry point so you can set a breakpoint there. (`b *0x04000...`).) — Peter Cordes, Sep 24 '16 at 04:15
I can't take credit for that...all of the work I did on this was modifying the loop structure. I'm not exactly sure how to implement your suggestion though. — TyManning, Sep 24 '16 at 07:27

score 0 · Accepted Answer · edited May 23 '17 at 12:33

Your program doesn't print a newline at the end of its output, so the cursor is at the end of a non-empty line when it exits. The shell doesn't know this, and prints the next prompt there.

echo foo includes a trailing newline, so when the shell prints the next prompt, the cursor was already at the start of a new line. echo -n foo doesn't include a trailing newline, so it leaves the cursor at the end of a line that starts with foo, and your prompt is tacked on to that, like your program does. Pipe those echo commands into hd to see a hexdump of the ASCII characters they print.

So the solution is to make sure your output ends with a newline (ASCII code = 10). You already have that in your msg4: db 10, "Read error", 10 string. (It starts with a newline as well as ending with one.) In C you'd write `"\nRead error\n", but NASM syntax doesn't work that way. It does support C-style escapes inside backquoted strings, but it's typical for people to write newlines with numeric constants.

Your user input (that you get from sys_read) should usually end with a newline, unless the user typed 256 characters on a line, or used ctrl-D to make read return early. (Or similarly piped input that doesn't end with a newline, so read hits EOF).

I started to follow the logic of the compares, but it got tiring pretty quickly. I'm not sure what happens to newlines in your input, but I suspect that your code modifies newlines in the buffer. You should probably avoid that, and leave them unchanged. I guess you'd just add them to your list of compares & branch for characters not to modify.

That's probably more useful behaviour for a rot13 program than adding an extra newline to the end of the buffer, or calling sys_write one extra time to print a newline by itself.

You can test the system calls your program makes using strace. e.g. strace ./a.out will decode the read() and write() system calls you make.

For more debugging tips, see the bottom of the x86 tag wiki. (Which has lots of useful stuff besides that).

BTW, you could do all those cmp al, '?' compares in parallel in an xmm register, with SSE2 (broadcast al to every element of an xmm register, and PCMPEQB with a constant / PMOVMSKB / test/jnz). But don't worry about that until you have a good handle on scalar code.

The other way to avoid the rats nest of CMP/JCC would be to white-list alphabetic characters, leaving input characters unmodified by default.

I'm not sure why you only blacklist '1', but not other numbers, or '+' but not '-', and so on.

Here's how I'd implement your loop, with some "advanced" tricks to collapse multiple similar conditions into single conditions. See my answer on How to access a char array and change lower case letters to upper case, and vice versa for an explanation of the unsigned-compare trick for isalpha().

;; ROT13 alphabetic characters.  Copy others unmodified.
;;  Untested
L1_top:
    movzx   eax, [esi]              ; get a character
    inc     esi                     ; update source pointer

    mov     edx, eax                ; save a copy of the original
    or      al, 0x20                ; make it lower-case if it's a letter (but we can still detect non-letters after this)

    sub     al, 'a'                 ; chars below 'a' will wrap to a high value
    cmp     al, 'z'-'a'
    ja      .non_alpha              ; jump if the sub wrapped, or the char was greater than 'z'

    ; input char was alphabetic
    sub     dl, 13                  ; modify the original character
    sub     al, 13                  ; check if that takes us out of the alphabet.  Can be a CMP, not SUB if we want.
    jnc     .nocarry
    add     dl, 26                  ; add 26 if the subtract wrapped
    ;add     al, 26                 ; we don't care about the value in al anymore
.nocarry:

    ; dl = the ROT13'ed character, with its original case
.non_alpha:
    mov    [edi], dl
    inc    edi

    dec    ecx                      ; I'm not sure what all the cmp ecx,0  in various branches was for.  Just do it earlier if necessary.
    jnz    L1_top

Originally I was going to actually calculate lower-cased ROT13'ed character in AL, and then find the difference between that and the original lower-cased character, and apply that to DL. But then I realized I could conditionally modify DL in the earlier branching.

    ;; after the or al,0x20:    mov   ah, al       ; don't over-do it with upper-half byte registers.  False dependencies on AMD, and partial-reg merging stalls or slowdowns on pre-Haswell Intel if you're not careful.

    add     al, 'a'                 ; 'a' + al is the lower-cased ROT13 of the input character
    sub     ah, al                  ; ah = lcase(orig) - lcase(rot13)
    sub     dl, ah                  ; apply that delta to the original in dl
    ; dl is the original character - 13 (plus 26 if necessary)

Your comment in bold is exactly what was happening. Also, I really hate that I have so many cmps using the al register. This is my first legitimate foray into Assembly so I didn't know of a better way to do that until now but thanks very much! With your insight I understand a lot better and I resolved the original issue. — TyManning, Sep 24 '16 at 13:51
@swingonaspiral: besides SSE tricks, compilers also have other tricks up their sleeve when you write a `switch()` statement with a lot of cases that all take the same action. e.g. check out [how gcc uses an immediate-constant for TEST as a bitmap](https://godbolt.org/g/Htd9Gw). (See [comments on this answer](http://stackoverflow.com/a/129515/224132) for an explanation.) — Peter Cordes, Sep 24 '16 at 20:11
@swingonaspiral: But really, without any fancy tricks, the way to simplify the code is to take a step back and simplify the *logic*. It's ROT13, so probably we only want to modify alphabetic characters, and leave *everything* else unmodified. So **instead of blacklisting everything that you want to preserve, just whitelist the upper and lower case alphabet ranges.** — Peter Cordes, Sep 24 '16 at 20:15
@swingonaspiral: I was curious how efficient I could make the loop. I added an edit that looks pretty good. It's normal that your beginner attempts are inefficient, but I thought you might be interested to see what tricks are possible. — Peter Cordes, Sep 24 '16 at 21:38
I actually thought about your whitelist / blacklist comment a few days after I got it somewhat functional and made that change before coming back here to ask another question about a different project. Thanks again for your help! — TyManning, Oct 02 '16 at 00:50

ROT13 Cipher in Assembly

1 Answers1