This is long overdue, but I wanted to do a write up of this challenge, hopefully someone will find it helpful if they find themselves on a similar situation and need a close example. So here it is.
For 2017’s EkoParty, Bluefrost Security issued a challenge, a spoof of the very controversial topic of the electronic vote and the “Voting Machine”, which was discussed in Argentina’s congress a few months before EkoParty. Winners would get extra whisky shots on the convention and invited to a dinner party on one of the best restaurants in Buenos Aires.
If you just want to see the exploit, you can find it in my Github:alex91ar/bluefrostchallenge
bluefrostchallenge – Exploit for the 2017 Ekoparty Stack Overflow challenge. “The unbreakable voting machine”.github.com
I never could resist a nice dinner party with drinks, Argentinian meat and friends, so I decided to give it my best shot. The challenge was to find the vulnerability, to exploit it and gain remote code execution with process continuation (ie. without crashing the application). The exploit had to work on Windows 7 through Windows 10 on a 64 bit platform. Before we begin, the tools I used to analyze and exploit this challenge were:
- x64dbg debugger: A very good ollydbg fork that supports both 32 and 64 bits. Recommended.
- Cheat Engine: It has a very powerful and lightweight debugger, paired with its unmatched ability to search through a program’s memory it’s excellent to find raw gadgets.
So, first thing I did was to open and to see the possible options:
The application after I chose an option started listening on port 55555. Naturally the first thing I did was to send a very large payload, with a debugger attached.
The application seemed to be asking for a handshake, and at first only accepted 4096. So, I put a breakpoint on ws2_32.dll:recv() and sent a random payload. Once the breakpoint was triggered I followed it until recv() returned and started analyzing what it did:
After the call to recv() returned, the application checked if the function returned successfully (First conditional jump), printed out the data received and called a function, passing the size of the buffer (4096 bytes in this case) and the buffer itself as parameters. The function then checks if the handshake is valid and otherwise the error “Invalid handshake” is prompted and the socket is closed.
Now, the calling convention for Microsoft on x64 is quite complicated and can be challenging when it comes to exploit writing, since ROP chains must include gadgets that zero out unused parameters on function calls with many parameters (such as CreateProcessA). According to wikipedia:
It uses registers RCX, RDX, R8, R9 for the first four integer or pointer arguments (in that order), and XMM0, XMM1, XMM2, XMM3 are used for floating point arguments. Additional arguments are pushed onto the stack (right to left). Integer return values (similar to x86) are returned in RAX if 64 bits or less. Floating point return values are returned in XMM0. Parameters less than 64 bits long are not zero extended; the high bits are not zeroed.
As we can see on the code above, it performs a comparison between the buffer received and the word “Hello”. So, naturally we have to send “Hello” as our first packet, and we’ll be able to bypass this quick check. So far, so good. Let’s send a bunch of “AAAA” after sending the initial handshake.
So, let’s analyze what caused the crash and how we can use this to our advantage…
As we can see, it seems that the application is trying to copy our buffer (Currently residing in dynamic memory) to the stack, and has taken the size (which can be seen on register R8, which would be the third parameter) from our payload, which currently is 0x6161 (“aa” in ASCII). Not only this results in the application overwriting the stack, but also the incremental counter on RAX tries to go beyond the ending of the stack, which causes the access violation error. So, now we know that the first two bytes of the second payload defines the size of the amount of data that is going to be copied from the heap to the stack.
We have a classic Stack Overflow, which means that we can overwrite the function’s return address, and when the current function finishes, it is going to return to an address that we control. However, since it’s not 2003, we cannot just jump into the stack and execute a payload, there are a few protections we’ll have to bypass in order to successfully exploit this:
- ASLR (Address Space Layout Randomization): This protection randomizes the base address of every module, including the original exe, so that they are not predictable and cannot be hardcoded.
- DEP (Data execution prevention): This protection prevents instructions on the stack from being executed. This means that the memory page that holds the stack, will not have the executable bit enabled.
- Stack canaries: These are values that reside before the return address in the stack, and get checked before each function returns. If a stack overflow occurs, the canary will be overwritten before the return address is, and when the check is performed, at the end of each function, it will fail, prompting an exception and exiting the process, preventing arbitrary code execution.
ASLR and Stack canaries are bypassed in a similar fashion, by leaking memory addresses. For ASLR we just need an address against which we can calculate offsets and use it every time we need to address a memory region. For Stack canaries, we just need to leak two values: The canary itself, and the stack pointer against which it was calculated. Once we have both, we can reverse the original canary by XORing them together. So, enough theory, let’s find our leak.
But first, there is an extra problem to bypass, it seems that one byte of our request is being used to call a vtable, which means that one byte that we control, is being used to call to a list of pointers, none of which we can control, so we can’t use it as an execution primitive:
The vtable that we have looks like this:
So, we have to make sure that our payload has the correct byte in place, so that when the call is made, we don’t get a crash due to the application trying to execute something on “0x4141414141414141”. The correct byte turns out to be “0x66”, so we add it to our payload. After tinkering around a bit I created a function that correctly interacts with the server:
Let’s see then what happens when we run it:
It seems that our application is not only overwriting the stack, but it’s returning in each response, one more byte than it should, and if we look at the first request we sent, it appears that “0x19” is part of the stack content. Let’s then look at the stack, to see what we have:
Now, the first thing we need to do, is leak the current Stack Canary. If we have it, and we have a stack address, we can deduce the base canary value, and we can calculate the next canary by calculating offsets of our leaked stack address. I created a function for this purpose, which will leak the stack byte by byte, and used it to leak the stack canary, the code base address (from a return address) and a stack pointer present on the stack itself. I’ll explain why in a second.
And when we run it we get the following:
So, why did we leak all these values? This is why:
- We’ll need the base address to create our ROP chain.
- We’ll need the stack canary to safely overwrite a return address, without triggering the stack overflow protection.
- We’ll need the stack pointer to calculate the base canary, so we may generate any stack canary we want, for our final payload.
Now, being brutally honest, creating the ROP chains that allowed me to finally pop a calc, was both incredibly hard and unbelievably fun, hunting those little bastards that let you step by step execute your code… It feels like Dr. Frankenstein looking through body parts to create my monster.
As everything here in this write-up, this is the way I personally did it, I’m very sure that there are easier and more efficient ways of doing it, but this is the way I found most convenient.
First thing I did was to create a ROP chain that allowed me to write any value into any address I wanted, sort of like a Write-What-Where primitive. Then, I zeroed-out the base canary (the one against which every stack canary is created). This is for convenience, and because I didn’t want to calculate every stack canary every time. But it’s completely unnecessary, as you may include the correct canary in the payload.
Of course, if we want the application to continue running after we execute our payload, we need to restore the stack pointer to its original value, restore the return address to its original value, and in this case, restore the SOCKET descriptor, if it was overwritten. For this, I created a second ROP chain:
Now, in order to get the calc, I had to call an API that was on the executable’s Import Address Table. There are several options to chose, all of which would get an RCE, but I went for CreateProcessA. The main problem with this API is that it uses A LOT of stack, and overwrites some of the application’s own stack, which won’t really matter for many of the local variables; but for one, the SOCKET descriptor, it has to be restored, otherwise the application will get stuck on an infinite error loop. For this, I created another sort of primitive, which lets me move the data from one address to another.
And finally, last but not least, the ROP chain to achieve remote code execution and pop the calc. For convenience I stored the cmdline on the main ROP chain, which made it easier to reference in the API call.
So, without further ado, here’s a video of the exploit in action: