Learning How to Write in Assembly for Linux 1

Well, it's been almost a year since my last post and a lot has transpired since then. The good news is that I am no longer under a boss who stifles creativity and learning. The bad news is that my work is now further removed from hacking. However, I have had some time lately to teach myself some assembly language, so I'd like to share what I've done so far.

The reason I want to learn assembly is simply that it interests me. There's no requirement at work to use it (and nothing prohibiting me from using it either), just a simple desire to understand it better. That, and maybe someday I'll create something cool with it.

Now that we're in the right mindset, you'll need to install an assembler (I'm using nasm) and a linker (ld works great and is generally preinstalled on any given Linux distro; you can check by typing ld at a console).

The structure of an assembly file can vary from assembler to assembler, or even just depending on its purpose. At the very least there must be a .text section, like so:

     section .text


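See, that wasn't so difficult. Under this section, though, we need to give our program the equivalent of a 'main()' so that it knows where to start from. In assembly that is a label named _start, declared global so that the linker can see it (ld uses _start as the entry point by default):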

     section .text
     global _start
     _start:

Now we have to decide what we want our program to do. For now let's just do a basic "Hello World!" program. We'll get fancier later on.

To begin, we should first store the string "Hello World!" in our program. (Technically there aren't any variables in assembly, only labels that name addresses.) Note that the .text section is meant for the program's instructions, not for data storage. For that we need a .data section. Inside this section we will give the data a label, msg, and use the db directive to place the string's bytes there.

     section .text
     global _start
     _start:


     section .data
     msg db 'Hello World!',0xa     ;0xa (or 10d) is the code for a line feed, i.e. a newline

Now that we have our data in the file we can begin setting up the registers for a system call. For this program we will only print our message to the console, and all that requires is the sys_write system call. Each system call is identified by a number, which you can look up in your system's unistd_32.h file (its location varies by distro, but it is usually somewhere under /usr/include). Open the file with less, search for write, and you'll see that it is defined as 4. A 64-bit Linux also has a unistd_64.h, but its numbers are different and belong to the 64-bit system call interface, so stick to the 32-bit numbers here. Since write is the call we will be executing, we load its number into the first register, eax. We do this with the mov instruction:

     section .text
     global _start
     _start:
          mov eax, 4      ;may also be written as 0x4, 4h (hex) or 4d (decimal)


     section .data
     msg db 'Hello World!',0xa
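
One way to avoid bare numbers like 4 in the source is to give them names. The following is only a sketch and not part of the program in this post: SYS_EXIT and SYS_WRITE are names I made up for illustration, defined with nasm's equ directive (which shows up again further down for the string length). The values are the 32-bit numbers that unistd_32.h lists as __NR_exit and __NR_write.

```nasm
; Hypothetical names for the 32-bit system call numbers from unistd_32.h.
; equ defines assemble-time constants; they take up no space in the program.
SYS_EXIT   equ 1        ; __NR_exit
SYS_WRITE  equ 4        ; __NR_write

section .text
global _start
_start:
     mov eax, SYS_WRITE ; assembles to the same bytes as "mov eax, 4"
```

Like the listing above, this is still a fragment rather than a complete program; it just shows that the assembler substitutes the number for the name.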

Just like commands we run on the command line, system calls take arguments (how else would you pass in the string to be printed?). These arguments go into the registers ebx, ecx, edx, esi, edi, and ebp, in that order. One way to look up the arguments for a particular call is to use the man pages. The command:

     $ man 2 write

     NAME
            write - write to a file descriptor

     SYNOPSIS
            #include <unistd.h>

            ssize_t write(int fd, const void *buf, size_t count);
     ....

will give us the argument layout for write. As you can see, the first argument, int fd, is an integer file descriptor. This one is simple because we only want to write to the console, so we will use 1, the file descriptor for standard output, which is normally the console. Next is a pointer to the character buffer itself, which we can refer to by the label 'msg'.

     section .text
     global _start
     _start:
          mov eax, 4
          mov ebx, 1      ;file descriptor 1: standard output (the console)
          mov ecx, msg    ;address of the string to be printed


     section .data
     msg db 'Hello World!',0xa

The last argument for write is the size of the buffer, or of 'msg' in this case. One way to figure this out is to count the characters in the string by hand. Not only is this tedious, it is also error-prone, especially if you have lots of strings in your program. A simpler and more elegant way is to have the assembler do the calculating for you with a definition placed immediately after the string. In nasm, $ stands for the current address, so right after the string $-msg is the number of bytes in it. This is done with the equ directive like this:

     section .text
     global _start
     _start:
          mov eax, 4
          mov ebx, 1
          mov ecx, msg


     section .data
     msg db 'Hello World!',0xa
     len equ $-msg
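
To make the $ trick a bit more concrete, here is a small sketch of a data section holding two strings. The second string (bye) and its length (bye_len) are made up for illustration and are not used anywhere else in this post. Because $ is the assembler's current position, each equ has to come right after the string it measures.

```nasm
section .data
msg      db 'Hello World!', 0xa
len      equ $ - msg         ; 12 characters + 1 newline = 13

bye      db 'Bye!', 0xa      ; hypothetical second string
bye_len  equ $ - bye         ; 4 characters + 1 newline = 5

; If len were defined down here instead, $ - msg would also span the
; bytes of bye and come out as 18, not 13.
```

The equ lines themselves do not emit any bytes, so bye starts directly after the newline that ends msg.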

Now all you have to do is move len (which is actually a value, and not a location in memory like msg is) into edx and then trigger interrupt 0x80 (notice this is hex) with the int instruction, which hands control to the kernel to execute the call you have loaded into the registers. Then, to exit cleanly, load 1 (sys_exit) into eax and an exit status of 0 into ebx, and trigger interrupt 0x80 one last time. Here is the final assembly file:

     section .text
     global _start
     _start:
          mov eax, 4      ;sys_write
          mov ebx, 1      ;file descriptor 1: standard output
          mov ecx, msg    ;address of the string
          mov edx, len    ;number of bytes to write
          int 0x80

          mov eax, 1      ;sys_exit
          mov ebx, 0      ;exit status 0 (success)
          int 0x80

     section .data
     msg db 'Hello World!',0xa
     len equ $-msg

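Now all you need to do is assemble the .asm file like so:

     $ nasm -f elf hello.asm

The -f elf option produces a 32-bit object file, which matches the 32-bit system call numbers and registers used here. (The -f elf64 format is for 64-bit code, which uses a different system call interface.)

And link it:

     $ ld -s -o hello hello.o

On a 64-bit system, ld produces 64-bit output by default and will reject the 32-bit object file, so tell it to produce a 32-bit executable with -m elf_i386:

     $ ld -m elf_i386 -s -o hello hello.o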

And now you can call it like any other program.

     $ ./hello
     Hello World!
     $
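
For comparison, here is a rough sketch of what the same program might look like as native 64-bit code, since unistd_64.h came up earlier. This is not the program from this post. It assumes an x86-64 Linux system, where the syscall instruction takes the place of int 0x80, the call number goes in rax, the arguments go in rdi, rsi, and rdx, and write and exit are numbered 1 and 60. The file name hello64.asm is made up.

```nasm
; hello64.asm (hypothetical file name), assuming x86-64 Linux
; Build:  nasm -f elf64 hello64.asm
;         ld -o hello64 hello64.o

section .text
global _start
_start:
     mov rax, 1          ; write is 1 in unistd_64.h
     mov rdi, 1          ; first argument: file descriptor 1 (stdout)
     lea rsi, [rel msg]  ; second argument: address of the string
     mov rdx, len        ; third argument: number of bytes
     syscall

     mov rax, 60         ; exit is 60 in unistd_64.h
     xor rdi, rdi        ; first argument: exit status 0
     syscall

section .data
msg db 'Hello World!', 0xa
len equ $ - msg
```

Built this way it should print the same line as the 32-bit version. The point is mainly that both the numbers and the registers change between the two interfaces, so the two styles should not be mixed.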

I hope to dive a little deeper into assembly programming as soon as I can. Hopefully it won't be another year before I get around to this and have to relearn it again.

          geno

Comments

  1. Great tutorial, easy to follow and well explained.
    This complements SecurityTube's Linux x86 Assembly course primer very nicely.

  2. Awesome, thanks for the compliments!
    I assume this is the course you were referring to:
    http://www.securitytube.net/video/208
    These videos are definitely what got me started on coding in assembly for fun, and I'd highly recommend them to anyone interested in getting started in assembly.

