Assembly Tutorial – Input and Output the Right Way

Up until now, whenever we’ve read from or written to a file, we’ve just put an upper bound on the number of bytes we were reading or writing. For example, in our original simple echo program we used a buffer of 500 bytes, and we put the value 500 into the rdx register when making the read and write system calls. If we carry on this way, we’ll always have to put a maximum size on input and output, and anything longer than that gets cut off. Let’s learn how to do this properly!

We are going to write another simple echo program. However, this time, we’ll use a loop to read and write the input in chunks. We’ll also use a register as an index into the buffer, much like a pointer offset.

Ok, let’s look at some code:

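.equ BUFFER_SIZE, 20                     # size of the read buffer in bytes
.equ NEW_LINE, 10                        # ASCII code for a newline

.section .data
.section .bss
.lcomm buffer_data, BUFFER_SIZE          # reserve BUFFER_SIZE bytes in .bss

.section .text

.globl _start
_start:

read_from_buffer:

movq $0, %rax                            # syscall 0: read
movq $0, %rdi                            # file descriptor 0: stdin
movq $buffer_data, %rsi                  # destination buffer
movq $BUFFER_SIZE, %rdx                  # read at most BUFFER_SIZE bytes
syscall

movq %rax, %rbx                          # save the number of bytes read

movq $1, %rax                            # syscall 1: write
movq $1, %rdi                            # file descriptor 1: stdout
movq $buffer_data, %rsi                  # source buffer
movq %rbx, %rdx                          # write only the bytes we read
syscall

decq %rbx                                # index of the last byte read
cmpb $NEW_LINE, buffer_data(,%rbx,1)     # is the last byte a newline?
je exit                                  # yes: we are done

jmp read_from_buffer                     # no: go and read the next chunk

exit:
movq $60, %rax                           # syscall 60: exit
movq $0, %rdi                            # exit status 0
syscall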

The first two lines of this program introduce some new syntax. The .equ directive allows us to define a named constant whose value the assembler substitutes wherever the name appears. This is much like using the #define pre-processor directive for a numeric constant in C or C++. Here we define two constants:

.equ BUFFER_SIZE, 20
.equ NEW_LINE, 10

The first, BUFFER_SIZE, is the size of the buffer we will be using, in this case 20 bytes. The second, NEW_LINE, is just the ASCII character code for a newline. Defining constants like this makes our code more readable and maintainable. Next, in the bss section, we define a buffer named buffer_data of length BUFFER_SIZE.

Now we have the meat of our program: a loop that starts with the label read_from_buffer. Inside this loop we have a read system call, a write system call, and a conditional jump.

The read system call:

movq $0, %rax
movq $0, %rdi
movq $buffer_data, %rsi
movq $BUFFER_SIZE, %rdx
syscall

reads up to BUFFER_SIZE bytes of data from stdin into our buffer buffer_data. When control returns from the read system call, the kernel will leave a return value in the register rax. This value will be the number of bytes that the kernel read, 0 if the end of the input was reached, or a negative number indicating an error. For now, we ignore the error and end-of-input cases. So, we move the value in rax into rbx to save it. Then we perform a write system call:

movq $1, %rax
movq $1, %rdi
movq $buffer_data, %rsi
movq %rbx, %rdx
syscall

The only new point here is that we move the value in rbx into rdx. This means we only ask the kernel to write the number of bytes that were actually read. Now, we do a conditional jump:

decq %rbx
cmpb $NEW_LINE, buffer_data(,%rbx,1)
je exit

We are checking whether the final character we read from stdin is a newline. The register rbx contains the number of bytes we have read, so to get the index of the last byte, we decrement it. Then we use indexed addressing mode, buffer_data(,%rbx,1), to get the value of that byte. This tells the CPU to take the value in rbx, multiply it by the scale factor 1, add the result to the address of buffer_data, and use the byte stored at that address. We compare this byte with the ASCII value for a newline. If the final character was a newline, we jump to the usual exit with code 0. Otherwise, the next instruction is the unconditional jump jmp read_from_buffer, which brings us back to the start of the loop.
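To make the address calculation concrete, here is a rough sketch of the same test with the arithmetic done in separate steps. It is a drop-in fragment for the loop body rather than a full program. It assumes rbx has already been decremented, and the choice of r12 as a scratch register is arbitrary, not something the original program uses.

```asm
# Sketch: same test as cmpb $NEW_LINE, buffer_data(,%rbx,1), one step at a time.
# Assumes rbx already holds the index of the last byte read.
movq $buffer_data, %r12      # r12 = address of the first byte of the buffer
addq %rbx, %r12              # r12 = address of the last byte read
cmpb $NEW_LINE, (%r12)       # compare the byte r12 points at with a newline
je exit                      # last byte was a newline: leave the loop
```

The indexed form does all of this in a single instruction and does not need the extra register, which is why the program uses it.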

When we assemble, link and run this code, the first read system call blocks until input is available, so the terminal waits for the user to type something. Suppose the user enters some text and hits enter. The kernel queues this text as pending input on stdin. In our system call we only asked for 20 bytes, so the kernel copies (at most) 20 bytes into our buffer and removes them from the pending input. The rest of the text stays queued on stdin. Once we’ve written these bytes to stdout, we can go back and read the next chunk from stdin. So the user only has to type once, even though our code reads from stdin multiple times.
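One case the walkthrough leaves out is a read that returns 0 (end of input, for example when text is piped in without a trailing newline) or a negative value (an error). Below is a minimal sketch of the same program with a guard added after the read. Exiting with status 0 in both cases is a simplifying assumption, not necessarily the right policy for a real program.

```asm
# Sketch: the echo loop with a guard for end of input and read errors.
.equ BUFFER_SIZE, 20
.equ NEW_LINE, 10

.section .bss
.lcomm buffer_data, BUFFER_SIZE

.section .text

.globl _start
_start:

read_from_buffer:

movq $0, %rax                            # syscall 0: read
movq $0, %rdi                            # file descriptor 0: stdin
movq $buffer_data, %rsi
movq $BUFFER_SIZE, %rdx
syscall

cmpq $0, %rax                            # rax <= 0: end of input or an error
jle exit                                 # nothing was read, so stop looping

movq %rax, %rbx                          # otherwise save the byte count

movq $1, %rax                            # syscall 1: write
movq $1, %rdi                            # file descriptor 1: stdout
movq $buffer_data, %rsi
movq %rbx, %rdx
syscall

decq %rbx                                # index of the last byte read
cmpb $NEW_LINE, buffer_data(,%rbx,1)
je exit

jmp read_from_buffer

exit:
movq $60, %rax                           # syscall 60: exit
movq $0, %rdi                            # status 0 (assumed, even after an error)
syscall
```

With the guard in place, input that ends without a newline makes the program exit once the input is drained. Without it, rbx would be decremented to -1 and the comparison would test the byte just before the buffer.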

So now we know how to read input and write output the proper way!
