Assembly Tutorial – Some Simple Control Flow

Let’s have another look at the code from our earlier post about writing to files. Suppose that we want to append our input text to an existing file. Here’s the code:

.section .data
filename:
.string "outputfile.txt"

.section .bss
.lcomm buffer_data, 500

.section .text

.globl _start
_start:

movq $0, %rax
movq $0, %rdi
movq $buffer_data, %rsi
movq $500, %rdx
syscall

movq $2, %rax
movq $filename, %rdi
movq $0x401, %rsi
movq $0666, %rdx
syscall

movq %rax, %rdi
movq $1, %rax
movq $buffer_data, %rsi
movq $500, %rdx
syscall

movq $60, %rax
movq $0, %rdi
syscall

Now, if there is no file named “outputfile.txt”, the open system call will fail. Control returns to our code with a negative number in rax (the negated error code, for example -2 for ENOENT, “no such file or directory”) rather than a file descriptor. When we then make the write system call, rdi will not hold a valid file descriptor, so the write fails too. Despite both failures, our program still exits happily with status code 0.
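To see both failures concretely, here is a small C sketch that is not part of the original program. It assumes Linux and mimics the kernel's convention of returning a negative error number where a file descriptor or byte count would normally go. The helper names raw_open_append and raw_write are hypothetical: they wrap the C library's open() and write(), which report failure as -1 plus errno, and turn that back into the -errno value the raw syscall leaves in rax.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: open `path` for appending with no O_CREAT, as the
 * assembly does with flags 0x401 (O_WRONLY | O_APPEND on Linux).
 * Returns a file descriptor, or -errno on failure, like rax after the syscall. */
static long raw_open_append(const char *path)
{
    int fd = open(path, O_WRONLY | O_APPEND);
    return fd >= 0 ? fd : -(long)errno;
}

/* Hypothetical helper: write `len` bytes from `buf` to `fd`.
 * Returns the number of bytes written, or -errno on failure. */
static long raw_write(long fd, const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    ssize_t n = write((int)fd, buf, len);
    return n >= 0 ? (long)n : -(long)errno;
}
```

For a missing file, raw_open_append returns -ENOENT, and passing that value on as if it were a descriptor makes raw_write return -EBADF ("bad file descriptor"), which is exactly the unchecked path the assembly takes.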

Let’s try to improve this a little. We’re going to change the code so that if it fails to open the file, it immediately exits with exit code 1. First, we’ll move the system call that opens the file to the beginning of the code, so we have:

.section .data
filename:
.string "outputfile.txt"

.section .bss
.lcomm buffer_data, 500

.section .text

.globl _start
_start:

movq $2, %rax
movq $filename, %rdi
movq $0x401, %rsi
movq $0666, %rdx
syscall

movq %rax, %rbx

movq $0, %rax
movq $0, %rdi
movq $buffer_data, %rsi
movq $500, %rdx
syscall

movq %rbx, %rdi
movq $1, %rax
movq $buffer_data, %rsi
movq $500, %rdx
syscall

movq $60, %rax
movq $0, %rdi
syscall

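With the system calls in this order, we have to set aside the file descriptor returned by the open system call in the rbx register so that it doesn’t get overwritten by the next system call. This works because the syscall instruction only overwrites rax (with the return value), rcx and r11, so rbx keeps its value. Otherwise the program behaves exactly as before.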

Now we are going to check the return value of the open system call and exit with an error if it is negative. Here is the code:

.section .data
filename:
.string "outputfile.txt"

.section .bss
.lcomm buffer_data, 500

.section .text

.globl _start
_start:

movq $2, %rax
movq $filename, %rdi
movq $0x401, %rsi
movq $0666, %rdx
syscall

cmpq $0, %rax
jl exit_with_error

movq %rax, %rbx

movq $0, %rax
movq $0, %rdi
movq $buffer_data, %rsi
movq $500, %rdx
syscall

movq %rbx, %rdi
movq $1, %rax
movq $buffer_data, %rsi
movq $500, %rdx
syscall

movq $60, %rax
movq $0, %rdi
syscall

exit_with_error:
movq $60, %rax
movq $1, %rdi
syscall

There are two new lines after the first system call:

cmpq $0, %rax
jl exit_with_error

The first line compares the value in the rax register with zero, and the second line jumps to the label ‘exit_with_error’ if the value in rax was less than zero. What’s quite unintuitive here is the operand order: in this (AT&T) syntax, the condition is true when the second operand is less than the first. This is the opposite order to what you might assume. Indeed, when I first wrote this code I spent a significant amount of time debugging in GDB before I realised I had made the mistake of writing:

cmpq %rax, $0
jl exit_with_error

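Read the way cmpq is defined, this asks to jump when 0 is less than rax, that is, when the open succeeded. GAS will in fact reject this particular line, because the second operand of cmpq cannot be a constant value; the reversed order only slips through silently when the second operand is a register or memory location.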

We also have four new lines of code at the end of this file:

exit_with_error:
movq $60, %rax
movq $1, %rdi
syscall

The first line is just a label, like the _start label we use to denote the entry point of our code. A label produces no instruction of its own: when the code is assembled and linked, the reference to it in the jump instruction above is replaced by the location of the instruction directly after the label (for jumps like jl, this is encoded as an offset from the jump itself). The three instructions after this label are an “exit with code 1” system call.

When we assemble, link and run this code, it will happily append to “outputfile.txt” if that file exists; otherwise it will exit with status 1. As usual, we can check the exit code with:

echo $?
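For readers coming from C, a rough equivalent of the final program might look like the following. It is a sketch under the same assumptions as the assembly (Linux, no O_CREAT, a fixed 500-byte buffer); the function name append_stdin and the use of goto to mirror the jl exit_with_error jump are illustrative choices, not taken from the original post. Like the assembly, it writes the whole buffer rather than only the bytes that were read.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical C analogue of the final assembly program.
 * Returns the status the program would pass to the exit system call. */
static int append_stdin(const char *path)
{
    static char buffer_data[500];  /* like .lcomm buffer_data, 500 */
    int fd;

    assert(path != NULL);

    fd = open(path, O_WRONLY | O_APPEND);  /* open system call (rax = 2) */
    if (fd < 0)                            /* cmpq $0, %rax */
        goto exit_with_error;              /* jl exit_with_error */

    /* read system call (rax = 0): up to 500 bytes from stdin (fd 0);
       like the assembly, the number of bytes read is ignored */
    (void)read(0, buffer_data, sizeof buffer_data);

    /* write system call (rax = 1): all 500 bytes, as the assembly does */
    (void)write(fd, buffer_data, sizeof buffer_data);

    close(fd);  /* the assembly relies on exit to close the file */
    return 0;   /* exit with status 0 */

exit_with_error:
    return 1;   /* exit with status 1 */
}
```

The goto plays the role of the conditional jump: the label marks where execution continues, and everything between the test and the label is skipped when the open fails.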

Let’s look at comparisons and jumps a little more closely. The comparison instruction actually subtracts the first operand from the second, discards the result and keeps only its effect on the flags, which is why the order of comparison is the reverse of what you might expect. So the instruction:

cmpq %rax, %rbx

subtracts the value in the rax register from the value in the rbx register and sets the flags in the rflags register accordingly. There are a number of different conditional jump instructions like jl, and they all jump to a given label depending on which flags are set in rflags. A conditional jump does not have to follow a comparison: it simply tests whatever rflags holds at that moment, and many other instructions, such as add and sub, also set those flags. Still, it is clearest to always place the conditional jump directly after the comparison (or other instruction) that sets the flags it depends on.
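The subtraction and flag-setting can be modelled in C. This is a simplified sketch, assuming 64-bit operands and tracking only the four flags the jumps in this post read (ZF, SF, CF, OF); the struct and function names are hypothetical. It also shows why jl tests "SF differs from OF" rather than just the sign of the result: the subtraction can overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The four rflags bits the conditional jumps in this post depend on. */
struct flags {
    bool zf;  /* zero flag: the result was 0 */
    bool sf;  /* sign flag: top bit of the result */
    bool cf;  /* carry flag: unsigned borrow, i.e. y < x as unsigned */
    bool of;  /* overflow flag: the signed result did not fit in 64 bits */
};

/* Model of "cmpq x, y" (AT&T order): compute y - x, discard it, keep the flags. */
static struct flags cmpq(uint64_t x, uint64_t y)
{
    uint64_t r = y - x;  /* unsigned arithmetic wraps around like the hardware */
    struct flags f;
    f.zf = (r == 0);
    f.sf = (r >> 63) & 1;
    f.cf = (y < x);
    /* Signed overflow: x and y had different signs and the result's sign
       differs from y's. */
    f.of = (((y ^ x) & (y ^ r)) >> 63) & 1;
    return f;
}

/* jl jumps when SF != OF. */
static bool jl_taken(struct flags f)
{
    return f.sf != f.of;
}
```

For example, cmpq(0, (uint64_t)-2) models cmpq $0, %rax after a failed open that left -2 in rax, and jl_taken reports that the jump is taken. Comparing INT64_MIN with 1 overflows to a positive result, yet jl is still (correctly) taken because OF is set.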

In the list below, we explain what each of the listed jump instructions would do if they followed a comparison of the form

cmpq X, Y

where X is a memory address, register or constant value, and Y is a memory address or register (X and Y cannot both be memory addresses).

  • je – jump when Y is equal to X
  • jne – jump when Y is not equal to X
  • jl – jump when Y is less than X (signed comparison)
  • jg – jump when Y is greater than X (signed comparison)
  • jle – jump when Y is less than or equal to X (signed comparison)
  • jge – jump when Y is greater than or equal to X (signed comparison)
  • ja – jump when Y is above X, i.e. greater when both are treated as unsigned
  • jb – jump when Y is below X, i.e. less when both are treated as unsigned
  • jae – jump when Y is above or equal to X (unsigned comparison)
  • jbe – jump when Y is below or equal to X (unsigned comparison)

Note that the second operand of cmpq cannot be a constant value. There is also an unconditional jump instruction, jmp, which always jumps to the specified label.

These control flow instructions are quite different from the ones you might be used to in higher-level languages. However, in later posts we will see how to write code equivalent to the loops and if statements of a higher-level language like C++ or Python.
