Assembly Tutorial – The Data Section

So far we have only used the data section of our programs to define data and to read that data at runtime. However, we can also write to the data section! Try running the following code through the debugger:

.section .data
var:
.byte 5

.section .text

.globl _start
_start:

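movb $10, var

movq $60, %rax
movq $0, %rdi
syscall

Using the techniques we learned in a previous post, you will see that the line

movb $10, var

changes the value of the byte of memory named var from 5 to 10. We use movb rather than movq because var is a single byte; movq would write eight bytes and spill over into whatever follows var in memory.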

If you want to define read-only data in your binary, you can do so with the .rodata section. It is declared just like the data section. However, if you try to write to the memory there, you will get a segmentation fault.
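As a rough illustration of the difference, here is a minimal sketch that reads a byte from .rodata and uses it as the exit code. The label name limit is made up for this example, and the write is left commented out so that the program runs cleanly.

```asm
# Sketch: a read-only byte in .rodata (x86-64 Linux, GNU as, AT&T syntax).
# The label name "limit" is an assumption made for this example.
.section .rodata
limit:
.byte 5

.section .text

.globl _start
_start:

# Reading from .rodata is fine: load the byte, zero-extended, into rdi.
movzbq limit, %rdi

# Writing would fault at runtime because the page is mapped read-only.
# Uncomment the next line to see the segmentation fault:
# movb $10, limit

# exit, using the byte we read as the exit code
movq $60, %rax
syscall
```

After assembling and linking as usual, echo $? should report 5. With the movb line uncommented, the program should instead die with a segmentation fault.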

Assembly Tutorial – Advanced Debugging Techniques

We have already seen the basics of debugging assembly code with GDB. We covered assembling our code with debug symbols, setting breakpoints, stepping through the code as it executes and inspecting the contents of registers. Now it is time to learn some more advanced techniques!

Command Line Arguments

Sometimes when we run an executable we pass in command line arguments. We can also do this when debugging with GDB, and there are two ways to do it. We can put the command line arguments after the run command r. So if echo_input was the name of a binary, and we wanted to debug it with the command line arguments “Hello World”, we would load it into GDB as normal, and then start execution with

r Hello World

Alternatively we can load the executable with our command line arguments directly by using the --args option. So, in our example we would execute:

gdb --args ./echo_input Hello World

This is very useful when debugging the code in our previous posts covering command line arguments!

Inspecting Memory

We know how to read the data stored in registers, but when we’re debugging we often want to read values stored in memory. Suppose, for example, we have a register that we are using as a pointer. We can use the info registers command to see what memory address is stored in the register. To see what data is stored in that memory address we can use the x command.

The x command prints out the value at a given memory address. We provide a suffix to specify how much memory to read and how to display it.

Let’s have a look at an example. Suppose we have defined a byte in our data section with value 12, and that we have moved the address of this byte into the register rax. In GDB we use the command

info registers rax

to read this value from rax. Let’s say that the output of this command is

rax            0x40200b            4202507

So the address of our data is 4202507, or 0x40200b in hex. We can read the value stored at this memory address with the command

x/bd 0x40200b

The output of this will be 0x40200b: 12, that is, the address followed by the value 12, as we would expect.

The suffix bd tells GDB to read a byte (b) of memory and display the result as a decimal (d). We can display the value as hexadecimal with x, octal with o, binary with t and unsigned decimal with u. We can specify the size to read as a byte with b, 2 bytes with h, 4 bytes with w and 8 bytes with g.

Let’s say we have defined a short in our data section named numShort that has value 256. This takes up two bytes and, because x86 is little-endian, it appears in memory as the byte 00000000 followed by the byte 00000001. So the command x/bt will read the first byte, 00000000, and the command x/ht will read two bytes giving 0000000100000000.

We can also output more than one value at once, by adding a multiplier to the suffix. So if we apply the command x/2bt to the memory address referenced by the name numShort our output will be:

0x40200c:	00000000	00000001

We can also read and output character values. Suppose we have declared a string in our data section with the value “outputFile”. If we inspect the address of this memory with x/bx, we will get 0x6f. If you look this value up in an ASCII table, you will see this is the hex value of the character ‘o’. This is, of course, the first character of our string.

To output these values as characters directly we use the c suffix like so:

x/bc 4202496

The output will be: 111 'o', that is, the decimal ASCII value of the character ‘o’ followed by the character itself. We can even read multiple values at once. For example the command

x/5bc 4202496

will output:

111 'o'	117 'u'	116 't'	112 'p'	117 'u' 

that is, we have read five characters starting at the memory address 4202496. Of course reading individual characters like this would be a little tedious, and we can just read entire strings out of memory. The command:

x/s 4202496

will output the entire string: "outputFile".

The x command doesn’t just work on the data section; we can use it to inspect any memory address! For example, we can check the values in our buffers defined in the .bss section with x.

Our instructions are also stored in memory, and we can read those with x as well. The register rip is the instruction pointer and contains the address of the next instruction that will be executed. So, we can get that address in GDB with the command:

info registers rip

If we use x/i on the returned address we will see something like:

=> 0x401000 <_start>:	mov    $0x32,%rax

Once we specify a particular size and format, when we execute x without a suffix, it will use the same size and format. The default is to read 4 bytes and display in hexadecimal.

In our example we read the memory addresses from registers. However, we can also use the names of memory addresses directly with the & operator. For example:

x/s &filename

will print the content of the memory address named filename. This works for buffers defined in the .bss section and for data declared in the .data section. We can also use arithmetic expressions when specifying the memory address to inspect. For example, to read the byte that is located 3 bytes past the memory address 0x40200b we would use the command

x/bx 0x40200b + 3

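Note that this arithmetic is always in bytes, whatever unit size we are reading, because GDB simply evaluates the expression to get an address. So if we wanted to read the 2 byte value that starts five 2 byte units (that is, 10 bytes) after the memory address 0x40200b we would use:

x/hx 0x40200b + 10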
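To experiment with the x command, a small program along the following lines could be used. This is only a sketch: the labels byteVal and scratch are invented for the example, and the addresses GDB reports on your machine will probably differ from the ones quoted above.

```asm
# Sketch: some data to inspect with the x command.
# The labels byteVal and scratch are assumptions made for this example.
.section .data
byteVal:
.byte 12
numShort:
.short 256
filename:
.string "outputFile"

.section .bss
.lcomm scratch, 16

.section .text

.globl _start
_start:

# put the address of byteVal in rax, so "info registers rax" shows a pointer
movq $byteVal, %rax

# copy the first character of the string into the .bss buffer
movb filename, %bl
movb %bl, scratch

movq $60, %rax
movq $0, %rdi
syscall
```

After stepping past the first instruction, x/bd $rax should show the byte behind the pointer in rax. Likewise x/2bt &numShort, x/s &filename and x/bc &scratch exercise the other formats described above.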

Conditional Breakpoints and Watches

To help with the monotony of debugging we use conditional breakpoints. These are breakpoints that come with a logical condition on a register. The breakpoint will only stop execution when the condition is satisfied. Suppose our source code is in a file named echo_input.s, and we would like to break at line 12 whenever the register rbx has value 4. We can do that with the following command:

b echo_input.s:12 if $rbx == 4

Note, we put a dollar sign $ before the name of the register. This is a little confusing, as we usually use dollar signs for constant values. We can use any of the typical binary comparison operators, ==, !=, <, >, <= and >= when specifying the condition. One thing that can catch you out here is that the breakpoint condition is evaluated before the instruction on that line executes.

We can also set watches, which are similar to conditional breakpoints but more general. With a watch, we specify an expression, here a condition on a register, and code execution will halt whenever the value of that expression changes. We do not have to specify a particular line of code. To set a watch that will halt whenever the condition that rcx is greater than 5 switches between false and true, we use the command:

watch $rcx > 5

Note, we don’t specify the file name, and we still use the dollar sign.
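If you would like something small to practise on, a counting loop like the sketch below gives both registers a predictable sequence of values. The program and its label names are invented for this example and are not taken from the earlier posts.

```asm
# Sketch: a counting loop to practise conditional breakpoints and watches.
# The label loop_body is an assumption made for this example.
.section .text

.globl _start
_start:

movq $0, %rbx        # loop counter
movq $0, %rcx        # running total

loop_body:
addq %rbx, %rcx      # total += counter
incq %rbx
cmpq $10, %rbx
jl loop_body         # repeat while rbx < 10

movq $60, %rax
movq $0, %rdi
syscall
```

Setting a conditional breakpoint on the line containing the addq instruction, with the condition $rbx == 4 and your own file name and line number in the b command, should stop only on the pass where the counter is 4. A watch on $rcx > 5 should stop once the running total first passes 5.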

I have covered some pretty technical stuff in this post, so I’d recommend you experiment with it all yourself to get a feel for these techniques!

Assembly Language – Command Line Parsing Part 2

So, we’ve already seen how the stack works and how we can read the name of the currently executing binary off the stack. Now it’s time to actually parse command line arguments that come after the name of the binary. Reading the arguments is a little bit more complicated because we do not know in advance how many there will be.

Luckily for us, the kernel gives us a little help. The shell splits the command line on spaces, and before execution of our program begins the kernel copies the resulting arguments into our process’s memory as null-terminated strings. On Linux these strings sit one after another in one contiguous chunk of memory. The kernel then puts a pointer to each string on the stack, with the number of command line arguments on top.

So, the first thing we do is read the number of command line arguments off the stack. Then we skip the pointer to the binary name and read the pointer to the first real argument, which is also where the remaining argument strings start. Then we iterate over this block of memory, printing each string as we see it. We know how many arguments to expect, so when we have seen that many we quit.

OK, it’s time to look at the code!

.equ NULL, 0

.section .data
NEW_LINE: .byte 10
.section .text

.globl _start
_start:

popq %rbx

decq %rbx

cmpq $0, %rbx
je exit_with_error

popq %r13 # The name of the currently executing binary
popq %r13

movq $0, %r8   # number of nulls seen so far
movq $0, %r9   # number of characters since last null 
movq $0, %r10  # number of characters up to last null
movq $0, %r12  # number of characters seen so far

loop_start:
movq (%r13,%r12,1),%rax

incq %r9
incq %r12

cmp $NULL,%al
jne loop_start 

movq %r9, %rdx
movq %r13, %rsi
addq %r10, %rsi
movq $1, %rax
movq $1, %rdi
syscall

movq $1, %rdx
movq $NEW_LINE, %rsi
movq $1, %rax
movq $1, %rdi
syscall

addq %r9, %r10
movq $0,%r9

incq %r8
cmpq %r8,%rbx
je exit

jmp loop_start

exit:

movq $60, %rax
movq $0, %rdi
syscall

exit_with_error:
movq $60, %rax
movq $-1, %rdi
syscall

First we pop the number of command line arguments off the stack into the rbx register. We decrement this value with decq because it will include the name of the binary itself. We check that there are a non-zero number of command line arguments. Then we pop the binary name, which we won’t be using. After that we pop the memory address of the actual arguments into r13.

Next we set up four different registers as counters we will use when looping over the command line arguments. As the strings in memory are null-terminated we can keep track of them via null characters. So, the counters are: the number of nulls we have seen so far, the number of characters we have seen since the last null, the number of characters that came before the last null and the total number of characters we have seen overall.

movq $0, %r8   # number of nulls seen so far
movq $0, %r9   # number of characters since last null 
movq $0, %r10  # number of characters up to last null
movq $0, %r12  # number of characters seen so far

Now the loop itself begins. The first part of this loop indexes into the memory location beginning at r13 until we see a null character, incrementing r9 and r12 as we go.

loop_start:
movq (%r13,%r12,1),%rax

incq %r9
incq %r12

cmp $NULL,%al
jne loop_start 

Whenever we do see a null character we proceed to the next section. This is where we write the current string to the terminal.

movq %r9, %rdx
movq %r13, %rsi
addq %r10, %rsi
movq $1, %rax
movq $1, %rdi
syscall

movq $1, %rdx
movq $NEW_LINE, %rsi
movq $1, %rax
movq $1, %rdi
syscall

The register r9 contains the number of characters since the last null, so that is the length of the current string (including its terminating null). The memory address of the start of this block of memory is in r13 and the number of characters that we saw up to the last null is in r10, so the memory address of the start of this string is the sum of those two values. We also print a newline to make our output a little prettier.

Once we have output the current string, we update our other counters and check to see if we have read all the command line arguments.

addq %r9, %r10
movq $0,%r9

incq %r8
cmpq %r8,%rbx
je exit

jmp loop_start

First we update the value in r10 to contain the number of characters up to the null we have just seen by adding on the value in r9. Then we reset r9 to zero. Register r8 keeps track of the number of nulls we have seen so far, so we increment it and compare against rbx. If they are equal, we jump straight to the exit. Otherwise we jump back to the start of the loop and carry on.

So, we now know how to parse our command lines. The kernel also copies the current environment variables into memory and leaves a pointer to them on the stack. These are a little bit harder to parse, so we will ignore them for now.
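The program above relies on the argument strings being laid out back to back. As a hedged alternative, the sketch below pops one pointer per argument off the stack instead, so it does not depend on that layout. It is a variation for comparison, not the code from this post, and the labels next_arg, find_null and print_arg are invented for the example. Unlike the original, it exits with code 0 when no arguments are given.

```asm
# Sketch: print each argument using its own pointer from the stack.
# Labels next_arg, find_null and print_arg are assumptions for this example.
.section .data
NEW_LINE: .byte 10

.section .text

.globl _start
_start:

popq %rbx            # number of arguments, including the binary name
popq %r13            # pointer to the binary name, unused

next_arg:
decq %rbx            # one fewer string left to handle
jz exit              # none left: all arguments printed

popq %r13            # pointer to the next argument string

# find the length of the string by scanning for the null byte
movq $0, %r12
find_null:
cmpb $0, (%r13,%r12,1)
je print_arg
incq %r12
jmp find_null

print_arg:
movq $1, %rax        # write system call
movq $1, %rdi        # stdout
movq %r13, %rsi      # start of this argument
movq %r12, %rdx      # its length, excluding the null
syscall

movq $1, %rax
movq $1, %rdi
movq $NEW_LINE, %rsi
movq $1, %rdx
syscall

jmp next_arg

exit:
movq $60, %rax
movq $0, %rdi
syscall
```

The counters live in rbx, r12 and r13 because, as discussed elsewhere in this series, rax, rcx and r11 do not survive a system call.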

Assembly Language – Arithmetic Instructions

Before we continue with command line parsing, we will have a brief diversion covering how arithmetic instructions work. We have already seen the increment and decrement instructions, incq and decq. These add one and subtract one from the value in a register. Now we will be covering more general arithmetic operations. In a previous post we saw how to use comparison instructions. Arithmetic instructions are really quite similar.

If we wish to add two quad-word values, we use the addq instruction. The syntax is:

addq X, Y

where X is the name of a register or a constant value and Y is the name of a register. So the instruction addq $17, %rax adds 17 to the value in register rax and stores the result in rax. To subtract we use the subq instruction which uses the exact same syntax.

There are two different multiplication instructions. The first, imulq, works just like the addq and subq instructions. This performs signed multiplication. However, as the result is stored in a single 64 bit register this instruction can quite easily lead to an overflow. Indeed, if we try to use constant values that are too large the assembly step will fail. For example the instruction

imulq $0x80000000, %rax

will cause an error when you try to assemble. This is, roughly, because the constant is stored as a signed 32 bit value and the maximum positive value you can store in that is 0x7FFFFFFF. However, you can still move this value into a register and multiply that way.

There is another multiply syntax that allows us to multiply 64 bit numbers without overflow. This syntax uses two instruction names, mulq and imulq, but it takes a single register operand. The instruction imulq performs signed multiplication and the instruction mulq performs unsigned multiplication. These instructions multiply the value in the supplied register by whatever value is in the rax register and store the result across rdx and rax. The lower 64 bits are stored in rax and the upper 64 bits in rdx.

To perform division we use idivq and divq. As before, idivq is signed division, and divq is unsigned division. The division instructions take a single argument, the name of a register. With these instructions, the CPU takes the values in rax and rdx as a single value, rax is the lower 64 bits and rdx is the upper 64 bits. It divides this value by the value in the register supplied. The result of this division is then stored in rax and the remainder is stored in rdx.
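Here is a small sketch of the single-operand forms in action. The values are arbitrary. The cqto instruction is not covered in this post: it sign-extends rax into rdx, which is one common way to prepare rdx before a signed division.

```asm
# Sketch: widening multiply followed by signed division.
# The values are arbitrary; cqto is an addition not covered in the text.
.section .text

.globl _start
_start:

# unsigned multiply: rdx:rax = rax * rbx
movq $6, %rax
movq $7, %rbx
mulq %rbx            # rax = 42, rdx = 0 (the upper 64 bits)

# signed divide: (rdx:rax) / rcx
movq $-17, %rax
cqto                 # sign-extend rax into rdx, so rdx:rax holds -17
movq $5, %rcx
idivq %rcx           # rax = -3 (quotient), rdx = -2 (remainder)

# exit with quotient + 3, which is 0 here
addq $3, %rax
movq %rax, %rdi
movq $60, %rax
syscall
```

For the unsigned divq, the usual equivalent is to set rdx to zero before dividing, so that the 128 bit value is just the value in rax.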

There is also a unary negation operation negq, that negates the value in a register.

Many of these instructions can overflow. And, unlike in some higher level languages, our program will continue to execute happily with whatever values the registers now contain. To detect this we use a special instruction: jo. This is the jump on overflow instruction. Whenever an arithmetic operation causes a signed overflow, the CPU sets the overflow flag. The jo instruction jumps conditioned on this flag. If the flag is set, execution jumps to the address supplied.
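As a minimal sketch of how jo might be used, the program below adds one to the largest signed 64 bit value and exits with code 1 if the overflow flag was set. The label overflowed is invented for the example, and movabsq is used here only to load the full 64 bit constant.

```asm
# Sketch: detect signed overflow with jo.
# The label "overflowed" is an assumption made for this example.
.section .text

.globl _start
_start:

movabsq $0x7fffffffffffffff, %rax   # largest signed 64 bit value
addq $1, %rax                       # wraps around and sets the overflow flag
jo overflowed

# not reached in this example
movq $60, %rax
movq $0, %rdi
syscall

overflowed:
movq $60, %rax
movq $1, %rdi
syscall
```

Checking echo $? after running it should show 1, since the jump is taken.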

There are also versions of the above arithmetic operations for non-quad words. However we aren’t particularly interested in them right now.

Assembly Tutorial – I/O Bringing it all Together

We’ve seen in previous posts how to handle errors when writing to files and how to read and write arbitrary numbers of bytes to files. It’s time we put this all together! We are going to write a program that will read an arbitrary number of bytes from stdin and write them to a file. If our program encounters any errors it will gracefully exit with code 1. Here’s the code:

.equ BUFFER_SIZE, 20
.equ NEW_LINE, 10

.section .data
filename:
.string "outputfile.txt"

.section .bss
.lcomm buffer_data, BUFFER_SIZE

.section .text

.globl _start
_start:

movq $2, %rax
movq $filename, %rdi
movq $0x441, %rsi
movq $0666, %rdx
syscall

cmpq $0, %rax
jl exit_with_error

movq %rax, %r9

read_from_buffer:

movq $0, %rax
movq $0, %rdi
movq $buffer_data, %rsi
movq $BUFFER_SIZE, %rdx
syscall

cmpq $0, %rax
jl exit_with_error

movq %rax, %rbx

movq $1, %rax
movq %r9, %rdi
movq $buffer_data, %rsi
movq %rbx, %rdx
syscall

cmpq $0, %rax
jl exit_with_error

decq %rbx
cmpb $NEW_LINE, buffer_data(,%rbx,1)
je exit

jmp read_from_buffer

exit:
movq $60, %rax
movq $0, %rdi
syscall

exit_with_error:
movq $60, %rax
movq $1, %rdi
syscall

Nothing we have done here is really new. We start by defining some constants and a 20 byte buffer. Then, in the text section, we open the file named “outputfile.txt”.

movq $2, %rax
movq $filename, %rdi
movq $0x441, %rsi
movq $0666, %rdx
syscall

When we open the file we use the flag value 0x441. This flag tells the kernel three things: that we want to open the file in write mode, that we want to create a file if it doesn’t exist and that we want to append to the end of a file if it does exist.

After the open system call, we check if the return value of this system call is negative:

cmpq $0, %rax
jl exit_with_error

movq %rax, %r9

If so, we jump straight to exit_with_error, otherwise we stash the return value in r9. You have to be careful to stash values you want to keep somewhere they won’t get overwritten during the next system call. We don’t use rsi, rdi or rdx as we are using them to pass values to the kernel. The registers rcx and r11 will, in general, have their values overwritten during a system call and rax will contain the return value. So we choose rbx and r9 as our two stash registers.

Now, once we’ve opened our file, we enter a loop. Our loop starts with the label read_from_buffer. As before, at the end of each loop we check if the last character we have read is a new line and if so, jump to the exit, otherwise we jump back to the loop start:

decq %rbx
cmpb $NEW_LINE, buffer_data(,%rbx,1)
je exit

jmp read_from_buffer

Inside this loop we have our read and write system calls:

movq $0, %rax
movq $0, %rdi
movq $buffer_data, %rsi
movq $BUFFER_SIZE, %rdx
syscall

cmpq $0, %rax
jl exit_with_error

movq %rax, %rbx

movq $1, %rax
movq %r9, %rdi
movq $buffer_data, %rsi
movq %rbx, %rdx
syscall

cmpq $0, %rax
jl exit_with_error

We now have a conditional jump statement after each of these calls. If control returns from either with a negative number in rax we jump straight to exit_with_error.

That’s it, we’ve covered everything we will need to handle input and output in x86 assembly! It’s been quite a journey. In our next few posts we’ll be trying something slightly different.

Assembly Tutorial – Input and Output the Right Way

Up until now, whenever we’ve read from or written to a file, we’ve just put an upper bound on the number of bytes we were reading or writing. For example in our original simple echo program we used a buffer of 500 bytes, and we put the value 500 into the rdx register when making the read and write system calls. If we carry on this way, we’ll always have to put a maximum size on input and output. Let’s learn how to do this properly!

We are going to write another simple echo program. However, this time, we’ll use a loop to read and write the input. We’ll also use a register to store a memory address like a pointer.

Ok, let’s look at some code:

.equ BUFFER_SIZE, 20
.equ NEW_LINE, 10

.section .data
.section .bss
.lcomm buffer_data, BUFFER_SIZE

.section .text

.globl _start
_start:

read_from_buffer:

movq $0, %rax
movq $0, %rdi
movq $buffer_data, %rsi
movq $BUFFER_SIZE, %rdx
syscall

movq %rax, %rbx

movq $1, %rax
movq $1, %rdi
movq $buffer_data, %rsi
movq %rbx, %rdx
syscall

decq %rbx
cmpb $NEW_LINE, buffer_data(,%rbx,1)
je exit

jmp read_from_buffer

exit:
movq $60, %rax
movq $0, %rdi
syscall

The first two lines of this program introduce some new syntax. The .equ directive allows us to define a constant that will be substituted by the assembler. This is just like the #define pre-processor directive in C or C++. Here we define two constants:

.equ BUFFER_SIZE, 20
.equ NEW_LINE, 10

The first, BUFFER_SIZE, is the size of the buffer we will be using, in this case 20 bytes. The second, NEW_LINE, is just the ASCII character code for a newline. Defining constants like this makes our code more readable and maintainable. Next, in the bss section we define a buffer named buffer_data of length BUFFER_SIZE.

Now we have the meat of our program: a loop that starts with the label read_from_buffer. Inside this loop we have a read system call, a write system call, and a conditional jump.

The read system call:

movq $0, %rax
movq $0, %rdi
movq $buffer_data, %rsi
movq $BUFFER_SIZE, %rdx
syscall

reads BUFFER_SIZE worth of data from stdin to our buffer buffer_data. When control returns from the read system call the kernel will leave a return value in the register rax. This value will either be the number of bytes that the kernel read or a negative number indicating an error. For now, we ignore the error case. So, we move the value in rax into rbx to save it. Then we perform a write system call:

movq $1, %rax
movq $1, %rdi
movq $buffer_data, %rsi
movq %rbx, %rdx
syscall

The only new point here is that we move the value in rbx into rdx. This means we only ask the kernel to write the number of bytes that were actually read. Now, we do a conditional jump:

decq %rbx
cmpb $NEW_LINE, buffer_data(,%rbx,1)
je exit

We are checking to see if the final character we have read from stdin is a newline. The register rbx contains the number of bytes we have read. So, to get the index of the last byte that we read, we decrement it. Then we use indexed addressing mode, buffer_data(,%rbx,1), to get the value of the last byte that we have read. This tells the CPU to read the value in rbx, count that many bytes past the start of buffer_data and use the value it finds there. We compare this value with the ASCII value for a newline. If the final character was a newline, we jump to the usual exit with code 0. Otherwise, the next instruction is the unconditional jump jmp read_from_buffer which brings us back to the start of the loop.

When we assemble, link and run this code, once it hits the first read system call, the terminal will wait for the user to type some input. Suppose the user enters some text and hits enter. The kernel stores this text in the stdin file. In our system call we only asked for 20 bytes, so the kernel copies (at most) 20 bytes into our buffer, and discards them from stdin. The rest of the text that was input persists in stdin. Once we’ve written these bytes to stdout we can go back and read the next chunk from stdin. However, the user only gets prompted once, even though our code reads from the stdin file multiple times.
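One case the loop above does not cover is input that ends without a trailing newline, for example when stdin is a pipe or a file. The read system call then returns 0, and the newline check looks at the byte just before the buffer. The following sketch is one possible variation, not the version used in this post. It also leaves the loop when read returns zero or a negative value.

```asm
# Sketch: the same echo loop, but also stopping at end of input.
# This is a variation on the program above, not the original code.
.equ BUFFER_SIZE, 20
.equ NEW_LINE, 10

.section .bss
.lcomm buffer_data, BUFFER_SIZE

.section .text

.globl _start
_start:

read_from_buffer:

movq $0, %rax
movq $0, %rdi
movq $buffer_data, %rsi
movq $BUFFER_SIZE, %rdx
syscall

# 0 means end of input, a negative value means an error: stop in both cases
cmpq $0, %rax
jle exit

movq %rax, %rbx

movq $1, %rax
movq $1, %rdi
movq $buffer_data, %rsi
movq %rbx, %rdx
syscall

decq %rbx
cmpb $NEW_LINE, buffer_data(,%rbx,1)
je exit

jmp read_from_buffer

exit:
movq $60, %rax
movq $0, %rdi
syscall
```

The only change is the cmpq and jle pair after the read, so typing a line at the terminal behaves exactly as before.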

So now we know how to read and write input the proper way!

Assembly Tutorial – Looping

We’re going to write a simple program that demonstrates how to loop in assembly. We won’t be using a direct loop construct like in a higher level language. Instead, we’ll be using the jump and comparison instructions we covered in a previous post.

We can loop infinitely over a block of code in assembly using a label and an unconditional jump:

loop_start:
### code that gets looped over
jmp loop_start

Usually we don’t want an infinite loop in our code. So we put a conditional jump inside the loop that jumps to a label after the loop ends. Let’s have a look at an example. We’re going to write some code that uses a loop to print 10 asterisks to the terminal and a new line and then exits.

.section .data
asterick: .byte 0x2A
newline: .byte 0xA

.section .text

.globl _start
_start:

movq $0, %rbx

loop_start:

movq $1, %rax
movq $1, %rdi
movq $asterick, %rsi
movq $1, %rdx
syscall

incq %rbx

cmpq $10, %rbx
jge exit

jmp loop_start

exit:

movq $1, %rax
movq $1, %rdi
movq $newline, %rsi
movq $1, %rdx
syscall

movq $60, %rax
movq $0, %rdi
syscall

In the data section of this code we declare two separate bytes in memory. The first byte is labelled ‘asterick’ and has hex value 2A (the hex value of an asterisk). The second is labelled ‘newline’ and has hex value A (the hex value for a new line).

Then we have our loop:

movq $0, %rbx

loop_start:

movq $1, %rax
movq $1, %rdi
movq $asterick, %rsi
movq $1, %rdx
syscall

incq %rbx

cmpq $10, %rbx
jge exit

jmp loop_start

In this loop we are using the register rbx as our loop counter, so we begin by setting it to 0. Then we have the usual system call to write to stdout. We give the kernel the memory address of the byte in memory that contains the hex code for an asterisk. There is an important point here. The write system call takes a memory address not a value. If we want to print an asterisk, we cannot just pass it the hex value for an asterisk, we have to pass it the memory address of a byte containing an asterisk.

Once we have performed this system call, we must increment our counter. We do this with the instruction:

incq %rbx

This instruction does a 64 bit increment of the value in the register rbx. incq is one of the special instructions we can use to increment and decrement register values. They come in the usual instruction size variations. The instructions incq, incl, incw and incb increment 8 bytes, 4 bytes, 2 bytes and 1 byte respectively. Similarly the instructions decq, decl, decw and decb decrement 8 bytes, 4 bytes, 2 bytes and 1 byte respectively.

Once we have incremented our counter, we check if the value is greater than or equal to 10. If the value in the register rbx is less than ten we move straight to the next instruction:

jmp loop_start

which jumps back to the start of the loop. Notice that we jump back to the next instruction after we set up our loop counter in rbx. If we had put the loop start label one instruction earlier, our loop would run indefinitely, because the counter would have reset to 0 on every iteration.

If, however, our counter in rbx is greater than or equal to 10 we jump straight to the labelled exit section:

exit:

movq $1, %rax
movq $1, %rdi
movq $newline, %rsi
movq $1, %rdx
syscall

movq $60, %rax
movq $0, %rdi
syscall

This section prints a new line and then exits with exit code 0 as usual. We now know how to do conditional branching and looping in assembly!
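For comparison, here is a sketch of the same program written as a count-down loop. It is an alternative shape rather than the approach taken in this post. It relies on decq setting the zero flag when the counter reaches 0, and on jnz, which is another name for jne.

```asm
# Sketch: the same output using a count-down loop.
# An alternative to the program above, not the version from the post.
.section .data
asterick: .byte 0x2A
newline: .byte 0xA

.section .text

.globl _start
_start:

movq $10, %rbx       # number of asterisks still to print

loop_start:

movq $1, %rax
movq $1, %rdi
movq $asterick, %rsi
movq $1, %rdx
syscall

decq %rbx            # sets the zero flag when rbx reaches 0
jnz loop_start       # keep looping until then

movq $1, %rax
movq $1, %rdi
movq $newline, %rsi
movq $1, %rdx
syscall

movq $60, %rax
movq $0, %rdi
syscall
```

This saves the separate cmpq and the unconditional jump, at the cost of the counter running backwards.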

Assembly Tutorial – Register Access and Pointers

So far we have used registers in two distinct ways. The first way is when we load values into registers and compare them to other values, like in the following code:

movq $4, %rax
movq $3, %rbx
cmpq %rbx, %rax

We’ve also used registers to store memory addresses when we used system calls. For example if we wanted to write the 50 bytes from the buffer named data_buffer to stdout we would use code like the following:

movq $1, %rax
movq $1, %rdi
movq $data_buffer, %rsi
movq $50, %rdx
syscall

In the third line, $data_buffer is the address of the buffer, so when we make the system call, the register rsi contains the address of the data we are interested in rather than the data itself.

We can use registers like this more generally. Indeed, to access the value stored at the memory location contained in a register we wrap the register name in brackets, as below.

cmpq $0, (%rax)

In the above code, if the value stored in rax is an accessible address in memory that contains a value equal to zero, then the above condition is true. If rax contains an accessible address in memory that contains a value other than zero the above condition is false. If the register rax contains the address of a region of memory we cannot access, for example the region before the instructions, then we will get a segmentation fault when our program runs.

We can also offset the address in a register by placing a constant value in front of the brackets like so:

cmpq $0, 8(%rax)

this value can be positive or negative and is specified in bytes.

Often, however, we want to calculate addresses dynamically in our code. We do that with indexed addressing mode. This allows us to provide a constant base address, two registers holding an offset and an index, and a constant multiplier that is applied to the index. Specifically,

data_buffer(%rax, %rbx, 2)

refers to the memory address found when you start at the address of data_buffer, add the value contained in rax and 2 times the value in rbx (with all numeric values specified in bytes). Unfortunately the multiplier can only be 1, 2, 4 or 8, so you cannot use a negative multiple here. This addressing mode is particularly useful when we are iterating through strings or arrays of contiguous memory.
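To make this concrete, the sketch below walks an array of four 2 byte values using the index register and a multiplier of 2, adding them up as it goes. The label values and the numbers in the array are made up for the example.

```asm
# Sketch: sum four 2 byte values with indexed addressing.
# The label "values" and its contents are assumptions for this example.
.section .data
values:
.short 1, 2, 3, 4

.section .text

.globl _start
_start:

movq $0, %rbx        # index of the current element
movq $0, %rdi        # running total

sum_loop:
movzwq values(,%rbx,2), %rax   # load element number rbx, zero-extended
addq %rax, %rdi
incq %rbx
cmpq $4, %rbx
jl sum_loop

# exit with the total as the exit code
movq $60, %rax
syscall
```

Because each element is 2 bytes wide, the multiplier does the byte arithmetic for us and rbx can simply count elements. Running echo $? afterwards should show the total, 10.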

Assembly Tutorial – Some Simple Control Flow

Let’s have another look at the code from our earlier post about writing to files. Suppose that we want to append our input text to an existing file. Here’s the code:

.section .data
filename:
.string "outputfile.txt"

.section .bss
.lcomm buffer_data, 500

.section .text

.globl _start
_start:

movq $0, %rax
movq $0, %rdi
movq $buffer_data, %rsi
movq $500, %rdx
syscall

movq $2, %rax
movq $filename, %rdi
movq $0x401, %rsi
movq $0666, %rdx
syscall

movq %rax, %rdi
movq $1, %rax
movq $buffer_data, %rsi
movq $500, %rdx
syscall

movq $60, %rax
movq $0, %rdi
syscall

Now, if there is no file named “outputfile.txt”, the open system call will fail and control will return to our code with a negative number in rax, rather than a file descriptor. So when we use the write system call we won’t have a proper file descriptor in rdi, and it will fail too. Then our program will exit happily with status code 0.

Let’s try and improve this a little. We’re going to change the code, so that if it fails to open a file it immediately exits with exit code 1. First we’ll move the system call that opens the file to the beginning of the code, so we have:

.section .data
filename:
.string "outputfile.txt"

.section .bss
.lcomm buffer_data, 500

.section .text

.globl _start
_start:

movq $2, %rax
movq $filename, %rdi
movq $0x401, %rsi
movq $0666, %rdx
syscall

movq %rax, %rbx

movq $0, %rax
movq $0, %rdi
movq $buffer_data, %rsi
movq $500, %rdx
syscall

movq %rbx, %rdi
movq $1, %rax
movq $buffer_data, %rsi
movq $500, %rdx
syscall

movq $60, %rax
movq $0, %rdi
syscall

With this order of system calls we have to set aside the file descriptor that is returned from the open system call in the rbx register so that it doesn’t get overwritten in the next system call. The functionality is exactly the same.

Now we are going to check the return value of the open system call and exit on error if the value is negative, here is the code:

.section .data
filename:
.string "outputfile.txt"

.section .bss
.lcomm buffer_data, 500

.section .text

.globl _start
_start:

movq $2, %rax
movq $filename, %rdi
movq $0x401, %rsi
movq $0666, %rdx
syscall

cmpq $0, %rax
jl exit_with_error

movq %rax, %rbx

movq $0, %rax
movq $0, %rdi
movq $buffer_data, %rsi
movq $500, %rdx
syscall

movq %rbx, %rdi
movq $1, %rax
movq $buffer_data, %rsi
movq $500, %rdx
syscall

movq $60, %rax
movq $0, %rdi
syscall

exit_with_error:
movq $60, %rax
movq $1, %rdi
syscall

There are two new lines after the first system call:

cmpq $0, %rax
jl exit_with_error

The first line compares the value zero to the value in the rax register. The second line jumps to the label ‘exit_with_error’ if the value in rax was less than zero. What’s quite unintuitive here is that the condition in the comparison statement is true if the second value is less than the first. This is the opposite order to what you might assume. Indeed when I first wrote this code I spent a significant amount of time debugging in GDB before I realised I had made the mistake of writing:

cmpq %rax, $0
jl exit_with_error

This would jump if 0 were less than the value in rax.

We also have four new lines of code at the end of this file:

exit_with_error:
movq $60, %rax
movq $1, %rdi
syscall

The first line is just a label like the _start label we use to denote the entry point of our code. When the assembler runs this label will be removed, and the reference to it in the jump statement above will be replaced by the actual memory address of the instruction directly after the label. The three instructions after this label are an “exit with code 1” system call.

When we assemble, link and run this code, if an “outputfile.txt” file exists it will happily append to it, otherwise it will exit with code 1. We can, as usual, check the exit code with:

echo $?

Let’s look at comparisons and jumps a little more closely. The comparison instruction actually subtracts the first operand from the second, which is why the order of comparison is the reverse of what you might expect. So the instruction:

cmpq %rax, %rbx

subtracts the value in the rax register from the value in the rbx register, throws the result away and sets the flags in the rflags register accordingly. Neither register is changed. There are a number of different conditional jump instructions like jl. They all jump to a given label depending on which flags are set in rflags. We don’t have to pair a jump instruction with a comparison instruction. We could use a conditional jump at any time to jump based on the contents of the rflags register, but it is best practice to use the two together.
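To make the operand order concrete, here is a small sketch of a complete program. The values 3 and 7 and the label name are made up for illustration. It is assumed to be assembled and linked in the same way as the other programs in this series.

```asm
.section .text

.globl _start
_start:

# put 3 in rax and 7 in rbx
movq $3, %rax
movq $7, %rbx

# computes rbx - rax (7 - 3) and sets the flags, rax and rbx keep their values
cmpq %rax, %rbx

# jg jumps because the second operand (rbx) is greater than the first (rax)
jg second_is_greater

# only reached if the jump is not taken: exit with code 0
movq $60, %rax
movq $0, %rdi
syscall

second_is_greater:
# exit with code 1 so that we can see the jump was taken
movq $60, %rax
movq $1, %rdi
syscall
```

Running this and then checking the exit code with echo $? should show 1. If we swap the two operands of cmpq the jump is no longer taken and the exit code is 0.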

In the list below, we explain what each of the listed jump instructions would do if they followed a comparison of the form

cmpq X, Y

where X is a memory address, register or constant value, and Y is a memory address or register. X and Y cannot both be memory addresses.

  • je – jump when Y is equal to X
  • jne – jump when Y is not equal to X
  • jl – jump when Y is less than X
  • jg – jump when Y is greater than X
  • jle – jump when Y is less than or equal to X
  • jge – jump when Y is greater than or equal to X
  • ja – jump when Y is above X, treating both as unsigned values
  • jb – jump when Y is below X, treating both as unsigned values
  • jae – jump when Y is above or equal to X, treating both as unsigned values
  • jbe – jump when Y is below or equal to X, treating both as unsigned values

Note, the second operand of cmpq cannot be a constant value. The “less” and “greater” jumps (jl, jg, jle and jge) treat X and Y as signed numbers, while the “above” and “below” jumps (ja, jb, jae and jbe) treat them as unsigned numbers. There is also an unconditional jump instruction jmp, which always jumps to the specified label.
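The difference between the signed and unsigned jumps only shows up when a value has its top bit set. The sketch below, with made-up label names, compares -1 with 1 twice. As a signed number -1 is less than 1, but the same bits read as an unsigned number are the largest possible 64 bit value.

```asm
.section .text

.globl _start
_start:

# rax holds -1, which is the bit pattern 0xFFFFFFFFFFFFFFFF
movq $-1, %rax

# compare rax with 1
cmpq $1, %rax

# unsigned view: 0xFFFFFFFFFFFFFFFF is far above 1, so ja jumps
ja unsigned_above

# only reached if ja does not jump: exit with code 0
movq $60, %rax
movq $0, %rdi
syscall

unsigned_above:
# make the same comparison again, this time testing the signed condition
cmpq $1, %rax

# signed view: -1 is less than 1, so jl jumps
jl signed_less

# only reached if jl does not jump: exit with code 1
movq $60, %rax
movq $1, %rdi
syscall

signed_less:
# both jumps were taken: exit with code 2
movq $60, %rax
movq $2, %rdi
syscall
```

The exit code should be 2, showing that ja and jl were both taken for the same pair of values.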

These control flow instructions are quite different from the ones you might be used to from higher level languages. However, in later posts we will see how to write code equivalent to the loops and if statements of a higher level language like C++ or Python.

Assembly Tutorial – All About Registers

The registers on a CPU are the very fast and very small internal memory that the CPU, ideally, uses to do its calculations. The general purpose registers on a 64 bit CPU are 64 bits wide.

In the 64 bit x86 architecture there are 16 general purpose registers. Eight of them are simply numbered: r8, r9, r10 and so on up to r15. The other eight keep names inherited from earlier x86 processors: rax, rbx, rcx, rdx, rsp, rbp, rsi and rdi. They are sometimes numbered 0 to 7 in descriptions of the instruction encoding, but in our code we always write them with the historic names. Technically we can read and edit these 16 general purpose registers as we wish. However three of them, rcx, rsp and rbp, have certain restrictions.

The rcx register is used implicitly as the counter by the loop instruction (c stands for counter). If you use the loop instruction and also use rcx for something else inside the loop, things will go wrong. The syscall instruction also overwrites rcx (and r11), so a value left in rcx will not survive a system call. Otherwise you are free to use rcx. The rsp register is used to point to the top of the stack in memory (sp stands for stack pointer). We only ever use rsp to access the stack. The rbp register is used to keep track of stack frames and we only ever use it when calling functions (bp stands for base pointer).
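As a rough sketch of how the loop instruction uses rcx, the program below adds 2 to rdi five times. The label name and the numbers are only for illustration.

```asm
.section .text

.globl _start
_start:

# rdi is our running total, rcx is the counter used by loop
movq $0, %rdi
movq $5, %rcx

add_again:
# body of the loop, it must not change rcx
addq $2, %rdi

# subtracts 1 from rcx and jumps back to the label if rcx is not yet zero
loop add_again

# rdi is now 10, use it as the exit code
movq $60, %rax
syscall
```

Checking the exit code with echo $? should give 10. If the body of the loop wrote to rcx, the number of repetitions would no longer be five.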

There is also the rip register, which holds the address in memory of the next instruction that will be executed (ip stands for instruction pointer). The CPU jumps to a different instruction by changing this register. You can read the value of rip, but you cannot write a new value into it with a move instruction. If you want to jump to a different instruction you should always use the built-in jump instructions.

There are also registers that we don’t use as ordinary operands. The most interesting of these is the flags register, rflags, which holds various flags that the CPU sets whenever it performs a computation, for example the zero flag, which indicates whether the result of the last computation was zero. These are the flags that are set by a comparison instruction and checked by a conditional jump. There are various other registers, accessible only to the kernel, that keep track of things like the current protection level or virtual memory.
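Because ordinary arithmetic sets the flags too, a conditional jump can follow something other than a comparison. Here is a minimal sketch, with an illustrative label name, in which a subtraction sets the zero flag and je reacts to it.

```asm
.section .text

.globl _start
_start:

# 5 - 5 leaves zero in rbx, so the CPU sets the zero flag
movq $5, %rbx
subq $5, %rbx

# je jumps whenever the zero flag is set, no cmpq is needed
je result_was_zero

# only reached if the zero flag is clear: exit with code 0
movq $60, %rax
movq $0, %rdi
syscall

result_was_zero:
# exit with code 1 so that we can see the jump was taken
movq $60, %rax
movq $1, %rdi
syscall
```

The exit code should be 1. Unlike cmpq, subq keeps the result of the subtraction, so rbx really is zero afterwards.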

When using the 64 bit general purpose registers we usually read and write 64 bit values. However we can also read and write 32 bit, 16 bit and 8 bit values. In my experience this is mostly useful when we want to check the value of specific bytes, for example when we are searching an ASCII string for a specific character.

We can access these lower 32 bit, 16 bit and 8 bit segments of the registers r8 to r15 by adding a special suffix to the standard register name. To access the lowest byte we add b, to access the lowest two bytes we add w and to access the lowest four bytes we add d. So for example to access the lowest four bytes of r9 we use the name r9d, and to access the lowest byte of r14 we use r14b.

Corresponding to the legacy register names rax, rbx, rcx, rdx, rsp, rbp, rsi and rdi, there are similar legacy names for the lower portions of those registers.

  • We access the bottom four bytes with eax, ebx, ecx, edx, esp, ebp, esi and edi.
  • The bottom two bytes are ax, bx, cx, dx, sp, bp, si and di.
  • We can access the bottom byte with al, bl, cl, dl, spl, bpl, sil, and dil.

We can also use special names to access the second lowest byte of the registers rax, rbx, rcx and rdx. They are ah, bh, ch and dh. As we can see, the pattern in these register names is inconsistent. To access the bottom four bytes we replace the initial ‘r’ with ‘e’, and to access the bottom two bytes we remove the initial ‘r’.

When we want to use these lower portions of the registers we also have to change the instruction mnemonic we use. Normally we use movq to move a value into a register. We use a different instruction mnemonic for each different byte size.

  • To move eight bytes we use movq
  • To move four bytes we use movl
  • To move two bytes we use movw
  • To move one byte we use movb
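As an illustration of the one byte forms, here is a sketch of the string search mentioned earlier. It walks through a string one byte at a time using movb and cmpb with the al register, and exits with the position of the first ‘m’. The string, the labels and the choice of 255 as a “not found” code are all made up for this example.

```asm
.section .data
text:
.string "assembly"

.section .text

.globl _start
_start:

# rsi points at the current character, rdi counts how far we have moved
movq $text, %rsi
movq $0, %rdi

next_char:
# load a single byte from memory into the lowest byte of rax
movb (%rsi), %al

# .string ends the text with a zero byte, stop there
cmpb $0, %al
je not_found

# 109 is the ASCII code for the letter m
cmpb $109, %al
je found

# move on to the next character
addq $1, %rsi
addq $1, %rdi
jmp next_char

found:
# rdi already holds the position, use it as the exit code
movq $60, %rax
syscall

not_found:
movq $60, %rax
movq $255, %rdi
syscall
```

For the string above the exit code should be 4, since counting from zero the letter m is at position 4 in “assembly”.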

There is one strange inconsistency with these lower registers. When we write to the lowest byte or the lowest two bytes of a register the upper portion is left unaffected. However, when we write to the lowest four bytes, the upper four bytes are set to zero.

Let’s have a look at an example. The code below simply prints “Hello world” to the standard output and exits. It does this while only using the lower four bytes of each register.

.section .data
msg:
.string "Hello world\n"

.section .text

.globl _start
_start:

movl $1, %eax
movl $1, %edi
movl $msg, %esi
movl $12, %edx
syscall

movl $60, %eax
movl $0, %edi
syscall
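We can also check what happens to the upper part of a register when we write to a lower portion. The sketch below fills two registers with ones, writes 5 to the lowest four bytes of one and to the lowest byte of the other, and then compares the full 64 bit values. The label name is only illustrative.

```asm
.section .text

.globl _start
_start:

# set all 64 bits of rax, then write to its lowest four bytes
movq $-1, %rax
movl $5, %eax

# set all 64 bits of rbx, then write to its lowest byte only
movq $-1, %rbx
movb $5, %bl

# the upper four bytes of rax were cleared, so all of rax equals 5
cmpq $5, %rax
jne unexpected

# the upper seven bytes of rbx are still set, so rbx does not equal 5
cmpq $5, %rbx
je unexpected

# both checks behaved as described: exit with code 0
movq $60, %rax
movq $0, %rdi
syscall

unexpected:
movq $60, %rax
movq $1, %rdi
syscall
```

The exit code should be 0. Stepping through this in GDB and inspecting rax and rbx after each write shows the same thing in more detail.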

A quick word on notation. Just as 8 bits make a byte, two bytes make a word, four bytes make a double word and eight bytes make a quad word. Confusingly, the term ‘word’ is also often used to refer to the natural size of the values and memory addresses a CPU works with, that is four bytes on a 32 bit machine and eight bytes on a 64 bit machine.