How to add two registers together without the "ADD" instruction

Question

First, don't bother with "do you use AT&T or Intel assembly?" or things like that.

I am wondering if it is possible to add the content of two registers, say AX and BX, together without using the ADD or ADC instructions. I want to do that using only this set of instructions:

MOV
CMP
JMP
JE
JNE

I would like to see a simple example, even if it is just the algorithm without any real code, or a good tutorial for this.

If you are wondering why I am asking this, well it is because I'm creating a very basic computer providing only those instructions for the moment, and I would like to know if it is already capable to add two registers together.

If you had decrement/increment instructions you could do it. — Paul R, Apr 10 '13 at 08:57
It's possible if you have a ton of memory to burn on giant lookup tables, assuming the `MOV` can mov from memory. — harold, Apr 10 '13 at 09:04
@harold or it could be a ton of memory for a giant state machine, where the instruction pointer value is the state, assuming CMP can compare with constants. — Alexey Frunze, Apr 10 '13 at 09:22
@PaulR : thank you. I guess "INC" and "DEC" are a must when programming using assembly, aren't they ? And probably (correct me if I am wrong) the "ADD" instruction is using "INC" and "DEC" right ? I see how to use them. I will implement INC and DEC functions :) Many thanks — Guillaume Voiron, Apr 10 '13 at 09:33
@harold Actually 128 KiB look-up table is sufficient, you don't need any **giant** look-up tables. See my edited answer. — nrz, Apr 12 '13 at 11:07
@harold Actually a 258-byte look-up table is sufficient, as each byte of the operands and each byte of the sum can be updated separately. See my edited answer. — nrz, Apr 13 '13 at 15:31
@nrz ok I have been thoroughly defeated now :) I would upvote if I hadn't already.. — harold, Apr 13 '13 at 15:35
In fact, only OR and NOT logical operations are really needed... It's your idea, but with un-not-ed operations x) — whiteiden97, Mar 28 '18 at 22:09
@whiteiden97 OR + NOT are two separate instructions with different gates. A NOR gate isn't harder to build than an OR gate or a NOT gate, so if you're trying to simplify the instruction set to the minimum without getting as ridiculous as https://en.wikipedia.org/wiki/One_instruction_set_computer, you want one boolean operation with https://en.wikipedia.org/wiki/Functional_completeness like NOR or NAND. — Peter Cordes, Mar 29 '18 at 01:44

score 3 · Answer 1 · answered Apr 10 '13 at 09:19

3

Theoretically, yes. The following humongous program can be expressed using your instruction set, and will perform the addition:

if ax = 0 and bx = 0:
  result = 0
else if ax = 0 and bx = 1:
  result = 1
...
else if ax = 42 and bx = 24:
  result = 66
...

In any practical sense, the answer to your question is "no".

answered Apr 10 '13 at 09:19

NPE

486,780
108
951
1,012

Ok so if I understand, I should test... all possible values and give de result... Not practical enough, but thank you for the idea :). Algorithms look so stupid when coming from a high level programming language to assambly... – Guillaume Voiron Apr 10 '13 at 09:28
@GuillaumeVoiron: It's hard to comment on what you *should* do, given that we don't really know what you're trying to achieve here. – NPE Apr 10 '13 at 09:29
This answer is good enough, it gives me an algorithm to add two registers together, which is what I am looking for. Any suggestions on what I should add to my question in order to make it more comprehensible ? – Guillaume Voiron Apr 10 '13 at 09:30
@GuillaumeVoiron: I think the question itself is clear enough. What is less clear is the context (i.e. where you're going with all this). – NPE Apr 10 '13 at 09:32
1

I basically have an electronic circuit that allows me to manipulate this commands with 3 registers and display some results with leds. PaulR answered my question, I'm fine with creating INC and DEC functions :) – Guillaume Voiron Apr 10 '13 at 09:37
3

@GuillaumeVoiron: If I were you, I'd have `ADD` and `SUB` instead of `INC` and `DEC`. Also, if you're aiming for a minimal instruction set, you can get rid of `JNE`. Its effect can be achieved with a `JE` around an unconditional `JMP`. – NPE Apr 10 '13 at 09:41
@NPE : I was thinking the same but now I have an other question. Is it easier to implement the "INC" method or the "ADD" instruction using logical gates ? Probably it's just the same "adder module" that does the job anyway right ? Thanks for the JNE trick :) That would have been my next question and you answered it :D – Guillaume Voiron Apr 10 '13 at 10:03
1

@GuillaumeVoiron: Both are easy. Just google "full adder". The easiest is to have a bunch of those in a ripple-carry configuration. – NPE Apr 10 '13 at 10:14
Yeah that's what I was thinking, INC and ADD are just full adders :) Thank you very much ! – Guillaume Voiron Apr 10 '13 at 10:20
This can be "optimized" to fit the available memory of 80286 by comparing AL,BL && AH,BH parts separately. (there's an extra stage needed to get the carry bit from AL+BL ...) – Aki Suihkonen Apr 10 '13 at 10:28
@AkiSuihkonen : I'm not going to make any optimisation, because this is not the purpose of this project. But thank you for the information, this could be useful in the future :) – Guillaume Voiron Apr 10 '13 at 12:52
2

@GuillaumeVoiron For the absolute minimal instruction set, you only need `mov`, `cmp` and `jne`. All additions and subtractions can be done with a look-up table, and therefore multiplications and divisions too. x86/x86-64 instructions `mov`, `cmp` and `jne` are sufficient for Turing-completeness. When you have `mov`, `cmp` and `jne`, instructions `je` and `jmp` are just syntactic sugar ;) Because you can always replace eg. `cmp al,0; je my_label` with `cmp al,0; jne_no_jump_label; cmp al,1; jne my_label; cmp al,2; jne my_label; no_jump_label:`. See my edited answer for 16-bit addition code. – nrz Apr 12 '13 at 11:18
@NPE In practical sense, yes. See my edited answer. A 258-byte look-up table is sufficient for summing of two integers of any size. – nrz Apr 13 '13 at 15:35

nrz · Answer 2 · 2013-04-13T15:13:36.450

Edit: Any addition or subtraction can be done with a 258-byte look-up table and using only mov, cmp and jne. There is absolutely no need at all for giant look-up tables. Lower 8 bits and upper 8 bits are updated separately using the same look-up table.

Here's the code that sums ax and bx using only one 258-byte look-up table and only mov, cmp and jne:

[bits 64]

; valid instuctions: mov, cmp, jmp, je, jne
; used instuctions: mov, cmp, jne

section .text
global _start

; this code sums ax & bx

_start:

; define the values to be summed (in ax & bx).

        mov     ax,12853        ; example summand 1.
        mov     bx,33276        ; example summand 2.

; summing is easy: just decrement each summand until it becomes zero,
; and for each decrement, increment the sum (in ax).

        cmp     bx,0
        jne     start_summing   ; normally this would be je ready and
                                ; next 2 instructions would be deleted.

                cmp     bx,1    ; this shows that jne is sufficient.
                jne     ready   ; this conditional jump branches always.

start_summing:
        mov     ecx,0

summing_loop:
        mov     cl,bl
        mov     bl,[rcx+(number_line-1)]        ; decrement bl.
        cmp     bl,255
        jne     bl_not_carry

                mov     cl,bh
                mov     bh,[rcx+(number_line-1)]        ; decrement bh.

bl_not_carry:
        mov     cl,al
        mov     al,[rcx+(number_line+1)]        ; increment al.
        cmp     al,0
        jne     al_not_carry

                mov     cl,ah
                mov     ah,[rcx+(number_line+1)]        ; increment ah.

al_not_carry:
        cmp     bx,0
        jne     summing_loop

ready:

; sum is now in eax.

section .data

max_value equ 255
max_value_plus_1 equ (max_value + 1)

db max_value ; 0 - 1 = 255

number_line:

%assign myValue 0

%rep max_value_plus_1
        db myValue
        %assign myValue (myValue + 1)
%endrep

db 0

Edit: The rest of the answer deals with other solutions that need more memory.

Edit: A one-dimensional 128 KiB look-up table is sufficient for any addition or subtraction of 16-bit operands. No need for giant look-up tables. Edit: Fixed bug that causes additions that normally set carry flag produce incorrect result.

Here's the code in x86-64 assembly, assembles with YASM, probably with NASM too. Implements add ax,bx, using only mov, cmp & jne.

[bits 64]

; valid commands: mov, cmp, jmp, je, jne
; used commands: mov, cmp, jne

section .text
global _start

; this code sums ax & bx

_start:

; define the values to be summed (in ax & bx).

        mov     ax,12853  ; example summand 1.
        mov     bx,33276  ; example summand 2.

; summing is easy: just decrement each summand until it becomes zero,
; and for each decrement, increment the sum (in ax).

        mov     edx,0
        mov     dx,ax
        mov     eax,edx   ; eax = ax

        mov     ecx,0
        mov     cx,bx     ; ecx = bx

summing_loop:
        mov     cx,[2*rcx+(number_line-2)]      ; decrement ecx.
        mov     ax,[2*rax+(number_line+2)]      ; increment eax.
        cmp     ecx,0
        jne     summing_loop

; sum is now in eax.

section .data

max_value equ 65535

dw max_value ; 0 - 1 = 65535

number_line:

%assign myValue 0

%rep max_value
        dw myValue
        %assign myValue (myValue + 1)
%endrep

dw 0

Edit: The rest of the answer deals with a more limited solution I first came up with.

It can be done with a two-dimensional look-up table.

For 8-bit registers, for example al & bl, it's easy. For 16-bit registers, it can be done, but the look-up table will be huge (almost 1 tebibyte), see below why. Each cell of the lookup table contains the sum of the corresponding X & Y coordinates (the X & Y coordinates are the summands).

For 8-bit sum the look-up table (a 256 * 256 matrix) is like this:

db   0,   1,   2, ... , 253, 254, 255
db   1,   2,   3, ... , 254, 255,   0
db   2,   3,   4, ... , 255,   0,   1
     .    .    .  .       .    .    .
     .    .    .   .      .    .    .
     .    .    .    .     .    .    .
db 253, 254, 255, ... , 250, 251, 252
db 254, 255,   0, ... , 251, 252, 253
db 255,   0,   1, ... , 252, 253, 254

In x86 and x86-64 mov can be used for multiplication by 256^n, that is: 256, 65536, 16777216, ...

Multiplying by 256 with mov is easy, to compute ax = 256 * bl:

mov ax,0
mov ah,bl

To add eg. al & bl, we need to get the right offset, it's256 * al + bl, or 256 * bl + al (because the look-up table is a symmetric matrix, and it's symmetric because addition is a commutative operation).

Multiplying by 65536 and bigger numbers using only mov in x86/x86-64 requires using memory, because there is no way to address directly upper 16 bits of a 32-bit general register (such as eax) or the upper 32 bits of a 64-bit general register (such as rax).

To compute eax = 65536 * bx using only mov:

mov [temp], dword 0
mov [temp+2], bx
mov eax, [temp]

...

temp dd 0

But the real problem with 16-bit values is that in x86/x86-64 memory is addressed using byte offset, not with word/dword/qword offset, and we can only multiply by 256^n. But let's see first how the look-up table could look like if we didn't have this problem with multiplication and byte offset addressing. The look-up table could then be like this:

dw     0,     1,     2, ... , 65533, 65534, 65535
dw     1,     2,     3, ... , 65534, 65535,     0
dw     2,     3,     4, ... , 65535,     0,     1
       .      .      .  .         .      .      .
       .      .      .   .        .      .      .
       .      .      .    .       .      .      .
dw 65533, 65534, 65535, ... , 65530, 65531, 65532
dw 65534, 65535,     0, ... , 65531, 65532, 65533
dw 65535,     0,     1, ... , 65532, 65533, 65534

Here, each row has 65536 cells, each is dword, so each row takes 2 * 65536 bytes = 131072 bytes. There are 65536 rows, so it's a 65536 * 65536 matrix.

Word-sized cells are not a problem for X (the horizontal index, either of the summands), because x86 assembly allows scale factors of 1, 2, 4 and 8.

Edit: Corrected text on array size, it's actually little bit smaller than 1 TiB.

The problem here is that multiplying Y (the vertical index, the other summand) by 131072 is not possible using only mov. So each row of the look-up table must be repeated 32768 times, or more precisely, there must be 32767 unused filler rows between any actual data rows. Why 32767? Because mov can only be used to multiply by 256, 65536, 16777216 ... so we need to multiply Y (the vertical index, the other summand) by 16777216. As each row takes 131072 bytes, to make new data rows start every 16777216 bytes, there must be 32767 unused filler rows (each takes 131072 bytes) after every data row. After the last data row filler rows are not needed, so in total the array size would be:

65535 * 16777216 + 131072 = 10.99 * 10^12 bytes = almost 1 tebibyte (1 TiB).

Unfortunately I don't have that much memory in my computer, but it's possible in x86-64.

Here's the code for the 8-bit addition using only mov and a 256 * 256 look-up table (tested with YASM, should assemble with NASM too):

[bits 64]

; valid instructions: mov, cmp, jmp, je, jne
; used instructions: mov

section .text
global _start

; al & bl must be preserved

; this code sums al & bl

_start:

; define the values to be summed (in al & bl).

        mov     al,47  ; example first summand
        mov     bl,55  ; example second summand

; the summing code starts here.

        mov     ecx,0
        mov     cl,al  ; ecx = al
        mov     ch,bl  ; ecx = 256 * bl + al
        mov     al,[rcx+sum_look_up_table] ; fetch the sum from look-up table.
                                           ; for 32-bit code, rcx -> ecx
; the sum is now in al.

section .data

y_times equ 256
x_times equ 256

sum_look_up_table:

%assign myY 0

%rep y_times
        %assign myX 0

        %rep x_times
                %assign myValue (myX + myY)
                %rep y_times
                        %if myValue >= 256
                                %assign myValue (myValue - 256)
                        %endif
                %endrep
                db myValue
                %assign myX (myX + 1)
        %endrep

        %assign myY (myY + 1)
%endrep

I love how this answer uses x86 addressing modes which require an add, making a mockery of OP's silly attempt to avoid needing an add circuit. Nice ideas for getting this done with less horrible code that I thought it might take! I like the store/reload with offset idea for dealing with bytes beyond the low 2, which don't have separate partial-register names. — Peter Cordes, Aug 25 '16 at 03:54

phuclv · Answer 3 · 2018-09-11T16:32:51.597

because I'm creating a very basic computer providing only those instructions for the moment, and I would like to know if it is already capable to add two registers together.

In this case you should read What is the minimum instruction set required for any Assembly language to be considered useful?

In short, the minimum possible architecture has only one instruction. However to be more useful a very basic computer should have at least one bit manipulating instruction. Just one NAND or NOR is enough for all logic calculations. And with that you can also do arithmetics too, but not as efficient as a separate ADD/SUB. Besides you need another conditional jump. That totals to 3 instructions. But if you can have 5 instructions like that then there are many better instruction choices that you can read on the other question

That said, it's possible to do anything with just MOV in x86 and many other architectures like 8051 because it's proved to be Turing-complete. There's even a compiler to compile valid C code into a program with only MOVs (or only either of XOR, SUB, ADD, XADD, ADC, SBB, AND/OR, PUSH/POP, 1-bit shifts, or CMPXCHG/XCHG) named movfuscator. Details on how the compiler works is explained in Break Me00 The MoVfuscator Turning mov into a soul crushing RE nightmare Christopher Domas, or just read the slide if you don't have time. Basically it uses lookup tables for most purposes. Here is an example on how an 8-bit addition is achieved

static void alu_add8(char* s, char* x, char* y, char* c, int off)
{
    /* requires dword carry initialized */
    print("# alu_add8\n");
    debug_id();
    print("movl $0, %%eax\n");
    print("movl $0, %%ebx\n");
    print("movl $0, %%ecx\n");
    print("movl $0, %%edx\n");
    print("movb (%s+%d), %%al\n", x, off);
    print("movb (%s+%d), %%cl\n", y, off);
    print("movl (%s), %%ebx\n", c);
    print("movb alu_add8l(%%eax,%%ecx), %%dl\n");
    print("movb alu_add8h(%%eax,%%ecx), %%dh\n");
    print("movb alu_add8l(%%edx,%%ebx), %%al\n");
    print("movb %%al, (%s+%d)\n", s, off);
    print("movb alu_add8h(%%edx,%%ebx), %%al\n");
    print("movb %%al, (%s)\n", c);
    print("# end alu_add8\n");
}

How to add two registers together without the "ADD" instruction

3 Answers3