Assembly Tutorial – Some Simple Control Flow

Let’s have another look at the code from our earlier post about writing to files. Suppose that we want to append our input text to an existing file. Here’s the code:

.section .data
filename:
.string "outputfile.txt"

.section .bss
.lcomm buffer_data, 500

.section .text

.globl _start
_start:

movq $0, %rax
movq $0, %rdi
movq $buffer_data, %rsi
movq $500, %rdx
syscall

movq $2, %rax
movq $filename, %rdi
movq $0x401, %rsi
movq $0666, %rdx
syscall

movq %rax, %rdi
movq $1, %rax
movq $buffer_data, %rsi
movq $500, %rdx
syscall

movq $60, %rax
movq $0, %rdi
syscall

Now, if there is no file named “outputfile.txt”, the open system call will fail and control will return to our code with a negative number in rax rather than a file descriptor. When we then make the write system call we won’t have a valid file descriptor in rdi, so that call will fail too. Our program will still exit happily with status code 0.

Let’s try to improve this a little. We’re going to change the code so that if it fails to open the file it immediately exits with exit code 1. First we’ll move the system call that opens the file to the beginning of the code, so we have:

.section .data
filename:
.string "outputfile.txt"

.section .bss
.lcomm buffer_data, 500

.section .text

.globl _start
_start:

movq $2, %rax
movq $filename, %rdi
movq $0x401, %rsi
movq $0666, %rdx
syscall

movq %rax, %rbx

movq $0, %rax
movq $0, %rdi
movq $buffer_data, %rsi
movq $500, %rdx
syscall

movq %rbx, %rdi
movq $1, %rax
movq $buffer_data, %rsi
movq $500, %rdx
syscall

movq $60, %rax
movq $0, %rdi
syscall

With this order of system calls we have to set aside the file descriptor returned by the open system call in the rbx register, so that it doesn’t get overwritten by the next system call. The functionality is exactly the same.

Now we are going to check the return value of the open system call and exit with an error if the value is negative. Here is the code:

.section .data
filename:
.string "outputfile.txt"

.section .bss
.lcomm buffer_data, 500

.section .text

.globl _start
_start:

movq $2, %rax
movq $filename, %rdi
movq $0x401, %rsi
movq $0666, %rdx
syscall

cmpq $0, %rax
jl exit_with_error

movq %rax, %rbx

movq $0, %rax
movq $0, %rdi
movq $buffer_data, %rsi
movq $500, %rdx
syscall

movq %rbx, %rdi
movq $1, %rax
movq $buffer_data, %rsi
movq $500, %rdx
syscall

movq $60, %rax
movq $0, %rdi
syscall

exit_with_error:
movq $60, %rax
movq $1, %rdi
syscall

There are two new lines after the first system call:

cmpq $0, %rax
jl exit_with_error

The first line compares the value in the rax register with zero, and the second line jumps to the label ‘exit_with_error’ if the value in rax was less than zero. What’s quite unintuitive here is that the condition is true when the second operand is less than the first. This is the opposite order to what you might assume. Indeed, when I first wrote this code I spent a significant amount of time debugging in GDB before I realised I had made the mistake of writing:

cmpq %rax, $0
jl exit_with_error

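This reads as “jump if 0 is less than rax”, the reverse of the test we want. As written, the assembler actually rejects it, because the second operand of cmpq cannot be a constant, but the same swap between two registers assembles without complaint and silently inverts the condition.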

We also have four new lines of code at the end of this file:

exit_with_error:
movq $60, %rax
movq $1, %rdi
syscall

The first line is just a label, like the _start label we use to denote the entry point of our code. When the assembler runs, this label is removed and the reference to it in the jump instruction above is replaced by the location of the instruction directly after the label. The three instructions after the label are an “exit with code 1” system call.

When we assemble, link and run this code, if an “outputfile.txt” file exists it will happily append to it; otherwise it will exit with code 1. We can, as usual, check the exit code with:

echo $?
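As a small variation, here is a sketch that exits with the error number itself rather than a fixed 1. A failed system call returns the negated error number in rax, so flipping the sign gives a usable exit code. The label name exit_with_errno is made up for this example, and the program only opens the file; it does not read or write anything.

```asm
.section .data
filename:
.string "outputfile.txt"

.section .text

.globl _start
_start:

# open "outputfile.txt" for appending (write only | append)
movq $2, %rax
movq $filename, %rdi
movq $0x401, %rsi
movq $0666, %rdx
syscall

# a negative return value means the open failed
cmpq $0, %rax
jl exit_with_errno

# success: exit with code 0
movq $60, %rax
movq $0, %rdi
syscall

exit_with_errno:
# rax holds minus the error number, e.g. -2 when the file does not exist
negq %rax
movq %rax, %rdi
movq $60, %rax
syscall
```

With no outputfile.txt in the current directory, echo $? should print 2 after running this, which is the error number for a missing file.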

Let’s look at comparisons and jumps a little more closely. The comparison instruction actually subtracts the first operand from the second, which is why the order of comparison is the reverse of what you might expect. So the instruction:

cmpq %rax, %rbx

subtracts the value in the rax register from the value in the rbx register, throws the result away, and sets the flags in the rflags register accordingly. There are a number of different conditional jump instructions like jl. They all jump to a given label depending on which flags are set in rflags. We don’t have to pair a jump instruction with a comparison instruction; a conditional jump can be used at any time to jump based on the contents of rflags. It is, however, best practice to always use the two together.
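To make the operand order concrete, here is a minimal sketch with two registers. The values 5 and 3 and the label name rbx_is_less are chosen purely for illustration.

```asm
.section .text

.globl _start
_start:

movq $5, %rax
movq $3, %rbx

# computes rbx - rax = 3 - 5 and sets the flags; neither register changes
cmpq %rax, %rbx
# taken, because rbx (the second operand) is less than rax (the first)
jl rbx_is_less

# not reached with the values above
movq $60, %rax
movq $0, %rdi
syscall

rbx_is_less:
movq $60, %rax
movq $1, %rdi
syscall
```

Running this and then echo $? should print 1. Swapping the two operands of cmpq should make it print 0 instead.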

In the list below, we explain what each of the listed jump instructions would do if they followed a comparison of the form

cmpq X, Y

where X is a memory address, register or constant value, and Y is a memory address or register.

  • je – jump when Y is equal to X
  • jne – jump when Y is not equal to X
  • jl – jump when Y is less than X (signed)
  • jg – jump when Y is greater than X (signed)
  • jle – jump when Y is less than or equal to X (signed)
  • jge – jump when Y is greater than or equal to X (signed)
  • ja – jump when Y is above X, treating both as unsigned values
  • jb – jump when Y is below X, treating both as unsigned values
  • jae – jump when Y is above or equal to X, treating both as unsigned values
  • jbe – jump when Y is below or equal to X, treating both as unsigned values

Note that the second operand of cmpq cannot be a constant value. There is also an unconditional jump instruction, jmp, which always jumps to the specified label.

These control flow instructions are quite different from the ones you might be used to from higher level languages. However, in later posts we will see how to write code equivalent to the loops and if statements of a higher level language like C++ or Python.
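The difference between the “less/greater” jumps and the “above/below” jumps only shows up when the top bit of a value is set. The sketch below compares -1 with 1 twice: jl treats -1 as a negative number, while jb sees the same bits as a very large unsigned number. The label names are invented for this example, and the exit code is used as a crude way to report which jumps were taken.

```asm
.section .text

.globl _start
_start:

movq $0, %rdi            # exit code: +1 if jl is taken, +2 if jb is taken
movq $-1, %rax           # all 64 bits set

cmpq $1, %rax            # compare rax with 1
jl signed_less           # taken: as a signed value, -1 is less than 1
jmp check_unsigned

signed_less:
addq $1, %rdi

check_unsigned:
cmpq $1, %rax            # compare again, addq may have changed the flags
jb unsigned_below        # not taken: as an unsigned value rax is huge
jmp done

unsigned_below:
addq $2, %rdi

done:
movq $60, %rax
syscall
```

After running this, echo $? should print 1, showing that only the signed jump was taken.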

Assembly Tutorial – All About Registers

The registers on a CPU are the very fast and very small internal memory that the CPU, ideally, uses to do its calculations. The general purpose registers on a 64 bit CPU are 64 bits wide.

In the 64 bit x86 architecture there are 16 general purpose registers. The first eight have names inherited from earlier x86 processors: rax, rbx, rcx, rdx, rsi, rdi, rbp and rsp. The remaining eight, added with the 64 bit extension, are simply numbered r8 to r15. Some references number the first eight r0 to r7, but the GNU assembler only accepts the historic names. Technically we can read and edit these 16 general purpose registers as we wish. However three of them, rcx, rsp and rbp, have certain restrictions.

The rcx register is used implicitly as the counter by the loop instruction (c stands for counter). If you use the loop instruction and use rcx for something else at the same time, things will go wrong. It is also overwritten, along with r11, every time we make a system call. As long as you keep those two cases in mind, you are free to use rcx. The rsp register is used to point to the top of the stack in memory (sp stands for stack pointer). We only ever use rsp to access the stack. The rbp register is used to keep track of stack frames and we only ever use it when calling functions (bp stands for base pointer).
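Here is a minimal sketch of the loop instruction using rcx as its counter. The label add_two and the choice of five iterations are just for illustration.

```asm
.section .text

.globl _start
_start:

movq $5, %rcx        # the loop instruction counts down in rcx
movq $0, %rdi        # running total, later used as the exit code

add_two:
addq $2, %rdi
loop add_two         # subtract 1 from rcx, jump back while rcx is not zero

movq $60, %rax
syscall
```

The body runs five times, so echo $? should print 10. Overwriting rcx inside the body would change how many times the loop runs, which is exactly the clash described above.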

There is also the rip register, which holds the address in memory of the next instruction that will be executed. The CPU jumps to a different instruction by changing this register. You cannot write to rip directly with an instruction like movq; if you want to jump to a different instruction you should always use the built-in jump instructions.
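One place where rip does appear in ordinary code is as a reference point for computing addresses. The sketch below is the hello world program from an earlier post with a single change: instead of moving the absolute address of msg into rsi, it uses leaq to compute the address relative to rip. This is offered as an illustration of the idea rather than something the rest of this series relies on.

```asm
.section .data
msg:
.string "Hello world\n"

.section .text

.globl _start
_start:

movq $1, %rax
movq $1, %rdi
leaq msg(%rip), %rsi     # address of msg, worked out relative to rip
movq $12, %rdx
syscall

movq $60, %rax
movq $0, %rdi
syscall
```

The output should be the same “Hello world” as before.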

There are also registers that we do not read or write directly with movq. The most interesting of these is the flags register, rflags, which holds various flags that are set whenever the CPU performs a computation, for example the zero flag, which indicates whether the result of the last computation was zero. These are the flags that a comparison instruction sets and a conditional jump checks. There are various other registers, accessible only to the kernel, that keep track of things like the current protection level or virtual memory.

When using the 64 bit general purpose registers we usually read and write 64 bit values. However we can also read and write 32 bit, 16 bit and 8 bit values. In my experience this is mostly useful when we want to check the value of specific bytes, for example when we are searching an ASCII string for a specific character.

We can access the lower 32 bit, 16 bit and 8 bit segments of the registers r8 to r15 by adding a special suffix to the standard register names. To access the lowest byte we add b, to access the lowest two bytes we add w and to access the lowest four bytes we add d. So for example to access the lower four bytes of r10 we use the name r10d, and to access the lowest byte of r14 we use r14b.

Corresponding to the legacy register names, rax, rbx, rcx, rdx, rsp, rbp, rsi and rdi, there are similar legacy names for the lower portions of those eight registers.

  • We access the bottom four bytes with eax, ebx, ecx, edx, esp, ebp, esi and edi.
  • The bottom two bytes are ax, bx, cx, dx, sp, bp, si and di.
  • We can access the bottom byte with al, bl, cl, dl, spl, bpl, sil, and dil.

We can also use special names to access the second lowest byte (bits 8 to 15) of the registers rax, rbx, rcx and rdx: they are ah, bh, ch and dh. As we can see, the pattern in these register names is inconsistent. To access the bottom four bytes we replace ‘r’ with ‘e’, and to access the lower two bytes we remove the initial ‘r’.

When we want to use these lower portions of the registers we also have to change the instruction mnemonic we use. Normally we use movq to move a value into a register. We use a different instruction mnemonic for each different byte size.

  • To move eight bytes we use movq
  • To move four bytes we use movl
  • To move two bytes we use movw
  • To move one byte we use movb
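As a quick sketch of how the mnemonics and the sub-register names fit together, the program below writes two bytes into ax and then picks out the second lowest byte through ah. The value 0x1234 is arbitrary, and the exit code is only used as a convenient way to see the result.

```asm
.section .text

.globl _start
_start:

movq $0, %rax
movq $0, %rbx

movw $0x1234, %ax    # two bytes: al is now 0x34 and ah is 0x12
movb %ah, %bl        # one byte: copy the second lowest byte of rax into bl
movq %rbx, %rdi      # eight bytes: rdi is now 0x12

movq $60, %rax
syscall
```

Since 0x12 is 18 in decimal, echo $? should print 18.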

There is one strange inconsistency with these lower registers. When we write to the lower portions of these registers the upper portions are left unaffected, except when we write to the lower four bytes, in which case the upper four bytes are set to zero.

Let’s have a look at an example. The code below simply prints “Hello World” to the standard output and exits. It does this while only using the lower four bytes of each register.

.section .data
msg:
.string "Hello world\n"

.section .text

.globl _start
_start:

movl $1, %eax
movl $1, %edi
movl $msg, %esi
movl $12, %edx
syscall

movl $60, %eax
movl $0, %edi
syscall
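The effect of partial writes on the rest of a register can be checked with a small experiment. The sketch below fills a register with ones, writes to part of it and then compares the full 64 bit value: a four byte write clears the upper four bytes, while a one byte write leaves the upper bytes alone. The label unexpected is invented for this example.

```asm
.section .text

.globl _start
_start:

movq $-1, %rax       # all 64 bits of rax set
movl $1, %eax        # four byte write: the upper four bytes become zero
cmpq $1, %rax
jne unexpected       # not taken, rax is exactly 1

movq $-1, %rbx       # all 64 bits of rbx set
movb $1, %bl         # one byte write: the upper seven bytes are untouched
cmpq $1, %rbx
je unexpected        # not taken, rbx is 0xffffffffffffff01

movq $60, %rax
movq $0, %rdi
syscall

unexpected:
movq $60, %rax
movq $1, %rdi
syscall
```

If the registers behave as described, echo $? prints 0.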

A quick word on notation. Just as 8 bits make a byte, two bytes make a word, four bytes make a double word and eight bytes make a quad word. Confusingly, the term ‘word’ is also often used to refer to the natural data size of a CPU, that is four bytes on a 32 bit machine and eight bytes on a 64 bit machine.

Assembly Tutorial – Writing to a File

We’re going to cover one more quick file based example. To do this we will write a utility that reads text typed at the terminal and writes it to a file. We will also try a couple of different file modes.

First let’s look at the code:

.section .data
filename:
.string "output.txt"

.section .bss
.lcomm buffer_data, 500

.section .text

.globl _start
_start:

movq $0, %rax
movq $0, %rdi
movq $buffer_data, %rsi
movq $500, %rdx
syscall

movq $2, %rax
movq $filename, %rdi
movq $0x41, %rsi
movq $0666, %rdx
syscall

movq %rax, %rdi
movq $1, %rax
movq $buffer_data, %rsi
movq $500, %rdx
syscall

movq $60, %rax
movq $0, %rdi
syscall

This should all be pretty familiar by now. First we define a string named “filename” with the value “output.txt” and we define a buffer named “buffer_data” of size 500 bytes.

The first few lines of instructions:

movq $0, %rax
movq $0, %rdi
movq $buffer_data, %rsi
movq $500, %rdx
syscall

read from stdin into our buffer. As usual, we set rax to 0 to indicate we are reading, rdi to 0 as that is the file descriptor of stdin, rsi to the label of the memory buffer we wish to read into and rdx to the number of bytes we want to read. We then have another system call that creates the file we wish to write to:

movq $2, %rax
movq $filename, %rdi
movq $0x41, %rsi
movq $0666, %rdx
syscall

We set rax to 2 as this is the Linux open file system call number, and the filename goes in rdi. We set rdx to 0666 to indicate that every user will have read and write permissions for this file. We set rsi to the mode we would like to use when opening this file. There are quite a few different flags you can use when opening files. They include

  • Create 0x40
  • Append 0x400
  • Truncate 0x200
  • Read Only 0x0
  • Write Only 0x1
  • Read and Write 0x2

You can combine these flags with bitwise or, |. Note, however, that you can’t combine read only and write only in this way. Also, notice that we prefixed the numeric value here with 0x to indicate it is a hexadecimal value. We didn’t bother using this prefix previously as we were using the read only flag ‘0’, which is the same in hex and decimal.

Let’s look at some examples. In our program above, we wanted to create a new file and only write to it, so we use the create and write only flags, 0x40 and 0x1 respectively. When we take the bitwise or of these flags we get

0x40 | 0x1 = 0x41

so we set rsi to 0x41.

When we assemble, link and run this file, the terminal should give us an opportunity to enter text, which then gets written to a file named “output.txt”. If a file with that name already exists, the first 500 bytes of that file will be overwritten, since we always write the whole buffer starting at the beginning of the file.

Suppose that, when a file named “output.txt” already exists, we would rather completely overwrite it than just overwrite its beginning. We do this by setting rsi to 0x241 when we perform the open file system call. 0x241 is the bitwise or of the create flag 0x40, the write only flag 0x1 and the truncate flag 0x200.

Now suppose that, instead, we wanted to append to the end of an existing file rather than overwriting it. To do this, when opening the file, we set the rsi register to 0x441, that is the create flag 0x40, the write only flag 0x1 and the append flag 0x400. Now we can repeatedly run our binary and append text to the end of the file.
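Rather than working out the combined value by hand, we can let the assembler do the bitwise or for us. The sketch below gives the flags names with .equ and combines them in the operand. The names O_WRONLY, O_CREAT, O_TRUNC and O_APPEND are borrowed from C for readability and are defined here by us; the assembler does not know them otherwise.

```asm
# flag values from the list above, named after their C equivalents
.equ O_WRONLY, 0x1
.equ O_CREAT, 0x40
.equ O_TRUNC, 0x200
.equ O_APPEND, 0x400

.section .data
filename:
.string "output.txt"

.section .bss
.lcomm buffer_data, 500

.section .text

.globl _start
_start:

# read up to 500 bytes from stdin
movq $0, %rax
movq $0, %rdi
movq $buffer_data, %rsi
movq $500, %rdx
syscall

# open for appending, creating the file if needed: the operand is 0x441
movq $2, %rax
movq $filename, %rdi
movq $(O_CREAT | O_WRONLY | O_APPEND), %rsi
movq $0666, %rdx
syscall

# write the buffer to the file
movq %rax, %rdi
movq $1, %rax
movq $buffer_data, %rsi
movq $500, %rdx
syscall

movq $60, %rax
movq $0, %rdi
syscall
```

Replacing O_APPEND with O_TRUNC in the operand should give the 0x241 behaviour instead.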

Assembly Tutorial – Debugging Assembly Code

Coding in assembly can be quite tricky. The syntax is unintuitive. For example, when we want to open a file we don’t call a simple one line function with a name like “open”; instead we have to set multiple registers and use a system call. It is also extremely unforgiving: we can easily set the wrong register, or set a register to the wrong value, without noticing, and then our code will just fall over with no helpful error message.

How do we diagnose these errors? Well thankfully we can debug assembly with the standard GNU debugger gdb. The process of debugging assembly in gdb is very similar to debugging C or C++.

Suppose we have an assembly program that is supposed to write to stdout but unfortunately does not:

 .section .data
msg:
.string "Hello world\n"

 .section .text

 .globl _start
_start:

 movq $1, %rax
 movq $10, %rdi
 movq $msg, %rsi
 movq $12, %rdx
 syscall


 movq $60, %rax
 movq $0, %rdi
 syscall

We have carefully combed through this code but have not found the error yet, so we decide to debug it. To debug, we must first assemble the code with debug symbols. To do this we assemble with the extra command line option --gstabs+:

as --gstabs+ write.s -o write.o

We then link the file as we normally do:

ld write.o -o write

Now instead of running the binary file that has been created we pass it as an argument to gdb:

gdb write

This will load the write binary into gdb. After gdb spits out some general information, you should have a command line that looks like

Reading symbols from write...
(gdb)

To set breakpoints in gdb we use the command:

b <filename>:<linenumber>

In our case we will set a breakpoint at the start label:

b write.s:10

We then tell gdb to start the execution of our program with the run command, ‘r’. If this runs successfully our command line will look like:

Breakpoint 1, _start () at write.s:10
10	 movq $1, %rax
(gdb) 

While debugging we can step to the next line of code executed with the ‘s’ command. Let’s say we step all the way to line 14 like so:

(gdb) s
11	 movq $10, %rdi
(gdb) s
12	 movq $msg, %rsi
(gdb) s
13	 movq $12, %rdx
(gdb) s
14	 syscall
(gdb) 

Let’s have a look at what’s in the registers, to do this we use the ‘info registers’ command:

(gdb) info registers
rax            0x1                 1
rbx            0x0                 0
rcx            0x0                 0
rdx            0xc                 12
rsi            0x402000            4202496
rdi            0xa                 10
rbp            0x0                 0x0
rsp            0x7fffffffd870      0x7fffffffd870
r8             0x0                 0
r9             0x0                 0
r10            0x0                 0
r11            0x0                 0
r12            0x0                 0
r13            0x0                 0
r14            0x0                 0
r15            0x0                 0
rip            0x401014            0x401014 <_start+20>
eflags         0x202               [ IF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
(gdb) 

Each of the individual registers is listed along with the value it contains, first in hex and then in a more human readable form. If we just want to see the value in a single register we use ‘info registers <register name>’, for example,

info registers rdi

gives,

rdi            0xa                 10
(gdb) 

Now, at this point in the code we are attempting to write to stdout, so the rdi register should be set to 1. However, we can see it is set to 10. If we look back at the code we have stepped through, we see that on line 11 we set the rdi register to 10. Whoops!

There are a few other useful commands we have left out.

  • info breakpoints – prints a numbered list of breakpoints
  • delete <breakpoint number> – deletes a breakpoint
  • c – continues execution to the next breakpoint
  • r – can be called at any point to restart execution from the beginning.

We’ll learn more about debugging in a later post, but this should be all you need to get started!

Assembly Tutorial – Reading From a File

In previous posts we saw how to read from stdin and write to stdout. Now, we know that stdin and stdout are just special files, so we should be able to read from and write to normal files without much difficulty.

To show this, we’re going to write a simple program that writes the contents of a text file to the terminal (a bit like the cat utility). The following code opens a file called “inputfile.txt”, reads the first 500 bytes and writes them to stdout.


.section .data
name:
.string "inputfile.txt"
 
.section .bss
.lcomm buffer_data, 500

.section .text

.globl _start
_start:
 
movq $2, %rax
movq $name, %rdi
movq $0, %rsi
movq $0666, %rdx
syscall

movq %rax, %rdi
movq $0, %rax
movq $buffer_data, %rsi
movq $500, %rdx
syscall

movq $1, %rdi
movq $buffer_data, %rsi
movq $500, %rdx
movq $1, %rax
syscall

movq $60, %rax
movq $0, %rdi
syscall

Just like with our echo utility, we use a buffer defined in the .bss section to temporarily store the data we read. We also define the name of the file we will read in the data section.

The first thing we have to do is open the file we are interested in. We do this with the following code:

movq $2, %rax
movq $name, %rdi
movq $0, %rsi
movq $0666, %rdx
syscall

First we set rax to 2, the system call number for opening files. We set rdi to the label of the memory location containing the name of the file. When opening files we need to set the rsi register to indicate whether we are opening to read or write. In this case we are reading, so we set rsi to 0. We also set rdx to the permissions we would like the file to have. These are just the normal Linux file permissions, and 0666 indicates that all users have read and write permission; the kernel only uses this value when the call creates a new file, so it has no effect here. Finally we invoke a system call and the Linux kernel should open the file. If the kernel manages to open the file successfully, the file descriptor will be in rax once control returns to our code.

Once the file is opened, we have to read it into our buffer. We do this with the following piece of code:

movq %rax, %rdi
movq $0, %rax
movq $buffer_data, %rsi
movq $500, %rdx
syscall

We are going to read from this file just like we read from stdin. The difference is that instead of putting the file descriptor for stdin (0) in rdi, now we put the file descriptor for the file we opened in there. Now, assuming everything went according to plan, the file descriptor we want will be in rax, so the first thing we do is move the value from rax to rdi. The rest of the code is the same, we will read the first 500 bytes of the file with the file descriptor in rdi into our buffer.

Our final two pieces of code just output the contents of the buffer to stdout and exit with a success code.

As usual, if this code is in a file named display.s, we assemble and link it with:

as display.s -o display.o
ld display.o -o display 

This creates a binary file that will display the contents of a file named “inputfile.txt”. If the file does not exist, the open system call will return with the error code -2 in rax. Right now we don’t check for this, but once we know a little more about control flow we will.
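One thing the program above leaves out is closing the file once we are done with it. The kernel cleans up open files when a program exits, so nothing goes wrong here, but a tidier version can be sketched as below. It assumes the close system call, number 3 on 64 bit Linux, which takes the file descriptor in rdi, and it keeps a copy of the descriptor in rbx so it is still available after the read and write calls. Like the original, it does not check for errors.

```asm
.section .data
name:
.string "inputfile.txt"

.section .bss
.lcomm buffer_data, 500

.section .text

.globl _start
_start:

# open the file read only
movq $2, %rax
movq $name, %rdi
movq $0, %rsi
movq $0666, %rdx
syscall

movq %rax, %rbx          # keep the file descriptor for later

# read up to 500 bytes from the file
movq %rbx, %rdi
movq $0, %rax
movq $buffer_data, %rsi
movq $500, %rdx
syscall

# write the buffer to stdout
movq $1, %rdi
movq $buffer_data, %rsi
movq $500, %rdx
movq $1, %rax
syscall

# close the file
movq $3, %rax
movq %rbx, %rdi
syscall

movq $60, %rax
movq $0, %rdi
syscall
```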

Assembly Tutorial – A Simple Echo Program

We are going to write a simple program that reads from the terminal and writes the data it has read back out to the terminal. As we know from the previous post, reading and writing to the terminal is just a matter of reading and writing to two special files, stdin and stdout. These have file descriptors 0 and 1 respectively. We also need to set aside some space in memory to store the characters we read in from the terminal.

Let’s look at the code!

.section .data 
.section .bss
.lcomm buffer_data, 500

.section .text

.globl _start
_start:

movq $0, %rax
movq $0, %rdi
movq $buffer_data, %rsi
movq $500, %rdx
syscall

movq $1, %rax
movq $1, %rdi
syscall

movq $60, %rax
movq $0, %rdi
syscall

We have some new syntax here. The second line:

.section .bss

indicates to the assembler that this is the section in which we will define buffers. A buffer is a chunk of contiguous memory that we use to perform input and output operations. The next line:

.lcomm buffer_data, 500

declares a buffer named buffer_data that is 500 bytes long. Now we read from the terminal with the usual file I/O system call. We set rax to 0 to indicate we are reading from this file and we set rdi to 0 as that is the file descriptor of stdin. We set rsi to the address of our buffer with

movq $buffer_data, %rsi

and we set rdx to 500 to tell the kernel we would like to read 500 bytes. Then we invoke the system call to transfer control to the kernel.

The next three lines set up the system call to write the contents of the buffer to the terminal. The registers rsi and rdx will not have been altered by the kernel, so they still contain the address of the buffer and the number of bytes, 500. We set rax to 1 to indicate that we want to write and rdi to 1 to tell the kernel that it is stdout that we want to write to.

The final three lines are the usual exit with 0. Now if you write this code in a file named echo_input.s and run

as echo_input.s -o echo_input.o
ld echo_input.o -o echo_input

you will have a new binary file named echo_input. If you execute this binary, it will wait for you to enter some input and hit the return key. When you do this, it will print what you wrote out again and exit with exit code 0.

Now, in our code, we specified that our buffer is 500 bytes long, and that we should read and write 500 bytes. The read and write operations will handle data smaller than 500 bytes perfectly well. However if you try to pass in a string larger than this, only the first 500 characters will be passed to our echo utility, the shell will try, and probably fail, to execute whatever comes after that.
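The program above always asks the kernel to write all 500 bytes of the buffer, even when less was typed. The read system call returns the number of bytes it actually read in rax, so a slightly more careful version can pass that number on as the length to write. This is a sketch of that idea; it assumes the read succeeded and does not handle a negative return value.

```asm
.section .data
.section .bss
.lcomm buffer_data, 500

.section .text

.globl _start
_start:

# read up to 500 bytes from stdin
movq $0, %rax
movq $0, %rdi
movq $buffer_data, %rsi
movq $500, %rdx
syscall

# rax now holds the number of bytes read, use it as the length to write
movq %rax, %rdx
movq $1, %rax
movq $1, %rdi
syscall                  # rsi still holds the address of the buffer

movq $60, %rax
movq $0, %rdi
syscall
```

The visible behaviour in the terminal is the same, but redirecting the output to a file should now produce a file containing only the text that was entered.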

Assembly Tutorial – Everything is a File

At some point you may have heard:

In linux, everything is a file.

-common observation of unknown provenance

What does this actually mean though? Well in this post, we are going to find out.

In our previous post when we wanted to write to the terminal we used the following lines of code:

movq $1, %rax
movq $1, %rdi
movq $msg, %rsi
movq $12, %rdx
syscall

Now, in Linux, when we write to the terminal, we are actually writing to a special file called stdout. Rather than saving this to disk, the kernel writes the contents of the file to the terminal. The same is true for reading from the terminal, reading and writing to sockets and many other I/O operations. We can perform all of these operations in the same way, because the kernel allows us to treat them all as if we are reading from or writing to a file.

So let’s describe how we read and write to files more generally. File I/O requires a system call and we need to set four registers to give the kernel the information it needs to perform the I/O for us.

We set the rax register to 0 if we want to read from a file and 1 if we want to write to a file. We have to tell the kernel what file we would like it to read from or write to. To do this we set the register rdi to the file’s file descriptor. File descriptors are the unique numeric identifiers associated with the files that the kernel knows about. For example, stdout’s file descriptor is 1. We set rsi to be the address of the data we would like to write to the file, or the address of the memory we would like to read into. Finally, we set rdx to the size in bytes that we would like to read or write.
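To see that only the file descriptor decides where the data goes, here is a sketch that reuses the write call from above unchanged except for rdi. It writes to file descriptor 2, which on Linux is the standard error stream, stderr.

```asm
.section .data
msg:
.string "Hello world\n"

.section .text

.globl _start
_start:

movq $1, %rax            # 1 means write
movq $2, %rdi            # file descriptor 2 is stderr
movq $msg, %rsi          # address of the data to write
movq $12, %rdx           # number of bytes to write
syscall

movq $60, %rax
movq $0, %rdi
syscall
```

Run normally, this prints “Hello world” to the terminal just like before. Run with 2>/dev/null added to the command line, it should print nothing, because the shell has pointed stderr somewhere else.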

Assembly Tutorial – Hello World in Assembly

The first code you ever wrote was probably something like:

public class MyFirstClass {
    public static void main(String[] args) {
        System.out.println("Hello World!");
    }
}

Assembly is a little bit more complicated than Java, so it is necessary to cover some basic concepts before you are ready to write to the terminal. The good news is that we are ready to write to the terminal!

Writing to the terminal works via a system call, just like when we exited with exit code 0. Let’s have a look at the code.

.section .data
msg:
.string "Hello world\n"
 
.section .text
.globl _start
_start:
  
movq $1, %rax
movq $1, %rdi
movq $msg, %rsi
movq $12, %rdx
syscall

movq $60, %rax
movq $0, %rdi
syscall

We start off with our data section. Our data section is no longer empty. Now, we define a string with the value “Hello world\n”. We give this string the label msg.

In the text section we define our entry point as before, and we exit with code 0 as before. However we also have the five lines that output our message to the terminal:

movq $1, %rax
movq $1, %rdi
movq $msg, %rsi
movq $12, %rdx
syscall

Before transferring control to the Linux kernel we have to move four values into registers. First we have to put 1 in the rax register and 1 in the rdi register. Don’t worry, the significance of these two values will be explained later! We put the address of the data we would like to write in the rsi register. In this case, we can reference the address of the data we would like to write with the label msg. Finally, in the rdx register we put the number of bytes we would like to write. This is ASCII, so each character is one byte, and our string is 12 bytes long, including the newline character.

If you put this code in a file named helloworld.s and execute

as helloworld.s -o helloworld.o
ld helloworld.o -o helloworld

This will create a binary in the same directory named helloworld. When you run this binary you should see “Hello world” printed to the terminal!
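Counting the bytes in a string by hand is easy to get wrong, so it can be handy to let the assembler do it. The sketch below defines a constant, here called msg_len (a name made up for this example), as the distance between the current position and the msg label. It uses .ascii rather than .string because .string adds a terminating zero byte after the text, which would make the computed length 13 instead of 12.

```asm
.section .data
msg:
.ascii "Hello world\n"
.equ msg_len, . - msg    # current position minus start of msg = 12

.section .text
.globl _start
_start:

movq $1, %rax
movq $1, %rdi
movq $msg, %rsi
movq $msg_len, %rdx
syscall

movq $60, %rax
movq $0, %rdi
syscall
```

If the message is changed later, the length follows along without any further edits.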

Assembly Tutorial – How Does our Simple Program Work?

In the last post we wrote a simple assembly program, all it did was exit with status code zero. Our code was

.section .data 
.section .text 
.globl _start 
_start: 
movq $60, %rax
movq $0, %rdi 
syscall

How does this work? The first line can be ignored for now, it just denotes where we would store any user defined data, if we had any. The second line

.section .text

denotes where our actual code begins. The third and fourth lines

.globl _start
_start:

define a special label called _start. A label is a convenient human readable name for a particular memory address. Labels allow us, the programmers, to reference memory addresses without having to use actual numeric addresses. Labels are obviously less error prone, but they also mean that when we change the memory layout of our code we don’t have to recalculate a lot of addresses. The CPU does not understand labels; by the time the program runs, the assembler and linker have replaced all uses of labels with the actual addresses they refer to.

As we said, _start is a special label that defines the entry point of our program, like the main method in a C program. The line

.globl _start

just makes this label visible outside of this file, so that the linker can find it. If we had left this line out, our program would still assemble, link and run successfully, as the linker would print a warning and fall back to a default entry point. In general we don’t want to do this, as in more complex programs the default entry point might not be the entry point we want.

To understand this program there are two things we need to understand, the first of which is the system call. The kernel is the core part of the operating system. It handles all I/O, looks after memory at a low level, writes and reads files and plenty of other things. When we want to perform any of these tasks we transfer control to the kernel; this is called a system call. We perform a system call with the instruction

syscall

This will immediately transfer control over to the kernel. However, we also need to tell the kernel what we would like it to do for us. To do this we move certain special values into specific registers. (Remember, the registers are small, very fast memory inside the CPU.) When the kernel takes over, it reads these registers to find out what we are asking of it.

In the above program we use two registers, rax and rdi. These are 64 bit general purpose registers. With the command

movq $60, %rax

we move the 64 bit value 60 into the register rax. With the command

movq $0, %rdi

we move the 64 bit value 0 into the register rdi. When we perform a system call the value in the rax register tells the kernel what operation we would like it to perform. In this case, the value 60 tells it that we would like to exit. When exiting, the value in the rdi register will be the exit code; in this case, we are exiting with code 0.
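A simple way to convince yourself that rdi really carries the exit code is to change it and look at the result in the shell. The sketch below is the same program with an arbitrary exit code of 42.

```asm
.section .data
.section .text
.globl _start
_start:
movq $60, %rax       # 60 selects the exit system call
movq $42, %rdi       # the exit code we want the shell to see
syscall
```

After assembling, linking and running this, echo $? should print 42.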

Assembly Tutorial – Writing a Simple Assembly Program

There are two different syntaxes for x86 assembly language, AT&T and Intel. The difference is really only aesthetic; both assemble to the same underlying machine code. For no particular reason I have been using AT&T syntax. All my code has been built and run on 64 bit Ubuntu Linux, with the GNU assembler, gas, and the GNU linker, ld. My code is available on GitHub.

My first assembly program was a little something like this:

.section .data
.section .text
.globl _start
_start:
movq $60, %rax
movq $0, %rdi
syscall

Put this in a file named return.s and run

as return.s -o return.o

This calls the GNU assembler, which assembles the text you have written into object code in a file named return.o.

Then run

ld return.o -o return

This calls the GNU linker and links the object code, creating an executable file named return. Now if, in the same directory, you run

./return

your code (should) run successfully.

All this code does is tell the kernel to exit with exit code 0, so when it runs successfully, you shouldn’t really see anything. You can inspect the exit code of the last program you ran with

echo $?

which, in this case should be 0.