Assembly Language – Command Line Parsing part 1

We know from a previous post that when our program starts, the Linux kernel will have stashed some helpful values for us on the stack. At the top of the stack is the number of command line arguments. This value includes the name of the binary being executed, so the command line “binaryName arg1 arg2” would give 3. Next on the stack is a pointer to the name of the binary. After that come pointers to the command line arguments themselves, one per argument, followed by a null pointer that marks the end of the list.
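To make the layout concrete, here is a minimal sketch (not part of the program developed in this post) that reads the argument count from the top of the stack without popping it and hands it back as the exit code. It assumes the same setup as the rest of this post: x86-64 Linux, GNU as, and a _start entry point linked with ld.

```asm
.section .text

.globl _start
_start:

movq (%rsp), %rdi  # argument count sits at the top of the stack
                   # 8(%rsp) would be the pointer to the binary name
movq $60, %rax     # exit syscall, with the count as the exit code
syscall
```

If you run the resulting binary as “binaryName arg1 arg2” and then run “echo $?”, the shell should print 3.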

We’re going to see how to parse these arguments. First we’re going to read and display the name of the binary that is currently executing. To do this, we will need to read a pointer off the stack, find the length of the string it points to by searching for the null character, and then print it.

Let’s look at some code:

.equ NULL, 0

.section .data
NEW_LINE: .byte 10
.section .text

.globl _start
_start:

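popq %r13  # argument count (includes the binary name)

cmpq $1, %r13  # expect the binary name and nothing else
jne exit_with_error

popq %r13  # pointer to the name of the binary

movq $0, %r12  # number of characters seen so far

loop_start:

movq (%r13,%r12,1),%rax  # current character ends up in al

incq %r12

cmp $NULL,%al  # was it the null character?
jne loop_start

movq %r12, %rdx  # number of bytes to write
movq %r13, %rsi  # address of the string
movq $1, %rax  # write syscall
movq $1, %rdi  # file descriptor 1 is stdout
syscall

movq $1, %rdx  # one byte
movq $NEW_LINE, %rsi  # address of the new line byte
movq $1, %rax
movq $1, %rdi
syscall

exit:

movq $60, %rax  # exit syscall
movq $0, %rdi  # exit code 0
syscall

exit_with_error:

movq $60, %rax
movq $-1, %rdi  # non-zero exit code signals an error
syscall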

Firstly, I should mention that I’ve decided to use the higher numbered registers in this example, as it will make it easier to modify this code for the next post.

The very first thing we do is define a constant with .equ. The constant is named NULL and has the value 0. Next we define a byte in our data section named NEW_LINE with the value 10. It will not surprise you to learn that 0 is the ASCII code for the null character and 10 is the ASCII code for a new line.

An important subtlety here is the difference between an .equ constant and a value defined in the .data section. The constants defined with .equ are filled in when the assembler runs. The data section becomes part of the binary and is copied into memory when our program is run. So we can reference a value in the .data section by its memory address, whereas an .equ constant is really just a special name for a value.
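The difference is easiest to see side by side. The following is a small sketch written for illustration, not part of the program above, and it assumes the same GNU as and ld setup: $NULL is replaced by the number 0, $NEW_LINE is the address of the byte, and reading through that address gives the stored value 10.

```asm
.equ NULL, 0

.section .data
NEW_LINE: .byte 10

.section .text

.globl _start
_start:

movq $NULL, %rdi      # assembles exactly like movq $0, %rdi
movq $NEW_LINE, %rsi  # rsi holds the address of the byte, not 10
movzbq (%rsi), %rdi   # read the byte at that address: rdi is now 10

movq $60, %rax        # exit syscall, with the byte as the exit code
syscall
```

Running this and then “echo $?” should print 10, the value stored at NEW_LINE.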

In our text section, our first instruction is:

popq %r13

This pops the top value off the stack into the r13 register. We know that this value is the number of command line arguments. So, we check if it is equal to 1, meaning the program was run with nothing after the name of the binary, and if not, exit with an error:

cmpq $1, %r13
jne exit_with_error

Now we pop the next value off the stack into the r13 register. This will be a pointer to the name of the currently executing binary. To get the length of this string we loop over it looking for the null character like so:

movq $0, %r12  # number of characters seen so far

loop_start:

movq (%r13,%r12,1),%rax

incq %r12

cmp $NULL,%al
jne loop_start 

To index into the string that contains the name of the binary we use indexed addressing mode: (%r13,%r12,1). This reads the value stored at the memory address equal to the value in r13 plus 1 times the value in r12. The r12 register is our counter that keeps track of the current character we are looking at. So as we loop we are iterating through the string.

We check to see if the current character is null with: cmp $NULL,%al. The important point here is that we are reading characters, which are represented as bytes. Our movq actually loads eight bytes into rax, with the current character in the lowest byte. So if we just read the null character, we would expect the lowest byte of rax to be zero. The higher bytes of rax hold whatever follows in memory, which is junk that we don’t care about. We know already that the lowest byte of rax is named al, so we compare this to 0. Also, we put the register second in our comparison. In AT&T syntax an immediate value cannot be the second operand, so writing the operands the other way round gives an assembler error. Because cmp has no size suffix here, the assembler takes the size of the comparison from the register operand, in this case a byte.
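As an aside, the same search can be written so that it loads exactly one byte at a time and tests it before incrementing the counter, which leaves r12 holding the length of the string without the null character. This is an alternative sketch rather than the version used in this post; the labels count_loop and count_done are made up for illustration, and the pointer is read with 8(%rsp) instead of two pops.

```asm
.section .data
NEW_LINE: .byte 10

.section .text

.globl _start
_start:

movq 8(%rsp), %r13  # pointer to the name of the binary
movq $0, %r12       # length of the string so far

count_loop:

movb (%r13,%r12,1), %al  # load a single byte into al
cmpb $0, %al             # stop at the null character
je count_done
incq %r12                # only count non-null characters
jmp count_loop

count_done:

movq %r12, %rdx  # length, not including the null character
movq %r13, %rsi
movq $1, %rax    # write syscall
movq $1, %rdi    # stdout
syscall

movq $1, %rdx
movq $NEW_LINE, %rsi
movq $1, %rax
movq $1, %rdi
syscall

movq $60, %rax   # exit syscall
movq $0, %rdi
syscall
```

The trade-off is an extra jump per character in exchange for a counter that matches the visible length of the string.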

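Once we find the null character we output the string to the command line as normal. Because r12 is incremented before the comparison, it also counts the null character, so the write below emits that null byte too (most terminals show nothing for it):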

movq %r12, %rdx
movq %r13, %rsi
movq $1, %rax
movq $1, %rdi
syscall

Then we use the NEW_LINE data value we defined earlier to output a new line:

movq $1, %rdx
movq $NEW_LINE, %rsi
movq $1, %rax
movq $1, %rdi
syscall

And finally we exit with exit code 0. If you assemble, link and run this code, you should see the name of your binary file printed to the command line. In our next post we will see how to parse the command line arguments that come after the name of the binary.
