Welcome to the New Order: A DEF CON 2018 Retrospective
On August 12th, 2018, the Plaid Parliament of Pwning earned second place in DEF CON CTF, one of the most competitive hacking competitions in the world. Placing ahead of us this year were our colleagues on DEFKOR00T, marking their second such victory over the past four years. Although this year I cannot provide an account of how the winning team played, we still have many great stories to tell, and we learned a lot from DEF CON 2018.
A Brief Introduction
For each of the past three DEF CONs, I’ve provided a retrospective (2016 and 2017) on what the experience was like for our team, the PPP. I enjoy writing these and sharing my perspective on the competition, and I hope that you will join me for an in-depth look at one of the most exciting security competitions in the world.
Imposing Order
2018 marked the Order of the Overflow’s first year as DEF CON CTF organizers. LegitBS (the group that ran the previous five competitions) was a hard act to follow, but the OOO had a rockstar team. Led by the ever amusing Zardus, the OOO contains players from Shellphish, organizers of iCTF and Boston Key Party, Professors, and long-time CTFers. Furthermore, they had accepted this new position on a promise of shaking things up a bit and trying something new. We had a few ideas about what this might be, but even our vivid imaginations were not prepared for this new Order.
Upon arriving on the competition floor at 9:00 AM, the OOO directed us to oooverflow.io/obey, which housed the rules for this CTF. While these rules begin on the same general premise as traditional DEF CON rules, they very quickly begin to diverge.
For those unaware, DEF CON CTF is the prototypical Attack-Defense CTF. Each team has a collection of services (usually binary applications) that have a host of bugs in them. Teams are tasked with finding these bugs and exploiting them to collect “flags” embedded in the challenge. Once these flags are collected, they can be submitted for points. Furthermore, the flags change every few minutes, and so you can keep re-submitting every few minutes for as long as you are still able to exploit the challenge.
However, a team’s exploits might not continue forever. The flip-side of attack-defense is that teams may also upload new versions of the application, called “patches.” These patches must implement the same basic functionality as the original application, but may remove some subset of the bugs. This might then prevent other teams from scoring points.
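To make the attack-defense loop described above concrete, here is a minimal toy model. It is only a sketch: the `Service` class, the `play` function, and the idea of counting one capture per round are hypothetical simplifications and are not taken from the organizers' infrastructure.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass
class Service:
    """Toy stand-in for one team's instance of a challenge (hypothetical)."""
    patched: bool = False
    flag: str = ""

    def rotate_flag(self) -> None:
        # The organizers plant a fresh flag at the start of every round.
        self.flag = secrets.token_hex(16)

    def exploit(self) -> Optional[str]:
        # The attacker's exploit only works while the bug is still present.
        return None if self.patched else self.flag


def play(rounds: int, patch_round: Optional[int]) -> int:
    """Return how many distinct flags an attacker captures over `rounds` rounds."""
    service = Service()
    captured = set()
    for current_round in range(rounds):
        if patch_round is not None and current_round == patch_round:
            service.patched = True  # the defender deploys a patch
        service.rotate_flag()
        flag = service.exploit()
        if flag is not None:
            captured.add(flag)  # each round's flag can be submitted once
    return len(captured)
```

In this model, `play(10, 4)` captures a flag in rounds 0 through 3 and nothing afterwards, while `play(10, None)` keeps scoring in every round because the bug is never patched.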
Changing the Game
Having played in a large number of DEF CON CTFs, the members of the OOO decided that they wanted to change a host of these rules. Here is a quick summary of what they did differently.
1. Far Fewer Packet Captures
In the case of both attack and defense, no team has direct access to the machine running the problem. Instead, there are tools provided by the organizers to allow you to interact with their services. The OOO did not change this basic structure, although they did severely limit the effectiveness of one such tool. Network captures (pcaps) are an invaluable reference for defense: they provide an insight into how you are being attacked, allowing you to remove the bugs that are being exploited. Previously they were released immediately (or at a 1-2 round delay), but the OOO opted to not provide network traffic at all until about two thirds of the way through the competition. This was done in an attempt to focus teams on exploitation, but had extremely far-reaching impacts that we will discuss further on.
2. Limited Patching Capacity and Pre-Verified Patches
While in previous years patches had been allowed to contain any number of changes, this year patches could only contain a fixed number of changes. For instance, in one problem, we were allowed only 100 bytes of difference between the new binary and the original. This meant that tools which completely rewrite a binary to obfuscate the change were no longer allowed. In addition, instead of having a service-level agreement (SLA) checker that would consistently verify that your service was behaving properly, all patches were given functionality testing before they could be deployed. This meant that you could not lose points for uploading an invalid patch, because if it failed the check, the patch would not be deployed.
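As a rough illustration of what a byte budget means in practice, the sketch below counts how many bytes differ between an original binary and a patched one. How the organizers actually measured "difference" is not specified here, so the position-by-position comparison, the function names, and the default budget of 100 are assumptions for illustration only.

```python
def count_changed_bytes(original: bytes, patched: bytes) -> int:
    """Count positions where the two binaries differ.

    Assumption: bytes are compared position by position, and any
    difference in length counts as that many extra changed bytes.
    """
    differing = sum(a != b for a, b in zip(original, patched))
    return differing + abs(len(original) - len(patched))


def within_budget(original: bytes, patched: bytes, budget: int = 100) -> bool:
    """Return True if the patch fits inside the (hypothetical) byte budget."""
    return count_changed_bytes(original, patched) <= budget
```

Under this kind of metric, a tool that rewrites or re-lays-out the whole binary blows through the budget immediately, which is why patches had to be written as small, targeted in-place edits.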
3. No Consensus Evaluation
One major change that was introduced during DEF CON 2016 was consensus evaluation. Consensus evaluation is a catch-all term for rules that provide other teams with more information about the game state. While many such rules were common in previous CTFs, the changes included the ability for rival teams to see your modified application. DEF CON 2018 removed this, and as such patches were once again completely private. This meant that opponents could no longer craft patch-specific exploits.
4. New Game Mode: King-of-the-Hill
Although the aforementioned differences altered the gameplay significantly, none did as much as the addition of a new game-mode. Accounting for 20% of the points, the introduction of King-of-the-Hill (KotH) brought a fun Japanese twist to the game. For those unfamiliar with KotH, it is a game mode wherein some objective (such as highest score or fewest changes) is fixed at the beginning and all teams compete to perform the best by that metric. The top 5 teams then earn points relative to their ranking. Unlike strict KotH problems, OOO’s had an additional level of challenge wherein the problems themselves could be hacked to earn the scorer more points than their non-“cheating” competitors.
Although there is a lot to be said about how these new rules affected the gameplay, before we consider that we should first step back a few weeks and look at how the team prepared for the competition.
The Plaid Parliament of Planning
If you’ve read any of our previous DEF CON writeups, you might remember that we try to do a lot of preparation for the CTF every year. Oftentimes this focuses on tooling and general infrastructure to ensure that we can handle whatever is thrown at us. This year, however, our preparation was comparatively minimal. We spent some amount of time writing better patching infrastructure, although this was mostly to put us on the same level as many of the other teams. As of last year we did not have any tools that allowed us to rewrite entire binaries, though we observed several other teams using them at that time.
Unfortunately, due to the aforementioned changes to the patching process, these tools were not particularly helpful. Despite that, the preparation helped get a few of us in the right mindset needed for designing the types of hyper-optimized patches required by the rule change.
In a similar manner, what a few other members and I prepared proved no more useful. We continued fleshing out the defensive networking infrastructure we began building last year. Having already built the basic backbone, most of this year’s preparations were focused around optimizations and improved usability. In fact, I’m proud to announce that 2018 is the year that the Plaid Parliament of Pwning became 👏 web-scale 👏.
Although the core feature set didn’t change significantly, as most of last year’s infrastructure was written during the competition itself (and for CLEMENCY, no less), rewriting everything was a fairly onerous burden. Fortunately, all that hard work really paid off when we booted it up for the first time and… realized we weren’t getting PCAPs until they were no longer useful. Yep, we didn’t use this work either.
So with two unhelpful projects behind us, one might wonder if any of our prep was actually useful. Fortunately, a third project we began was to build plug-and-play Docker images for a wide variety of architectures. Although using Docker to run other architectures should be fairly easy, in practice it can be difficult to find an appropriate disk image, operating system, and qemu build so that everything plays nicely together. As such, having images that already contain all of the necessary components installed made working with less-convenient architectures much easier.
From Pointless to Propaganda
Another slight change to the competition that I neglected to mention earlier is that the days became a little bit longer. In years past, the competition ran from 10:00 AM to 6:00 PM on Friday and Saturday, and 10:00 AM to 2:00 PM on Sunday. As part of their new regime, the OOO put us to task and extended each of the first two days to 8:00 PM. Over the course of those 24 hours of competition, the OOO released 7 Attack-Defense and 3 King-of-the-Hill challenges. They were (in release order):
Attack-Defense
- pointless: A MIPS-based challenge.
- twoplustwo: A JavaScript calculator that was run using Duktape.
- oooeditor: An ed-reminiscent editor with binary file support.
- poool: A crypto-currency pool management binary.
- vchat: A Jabber client.
- bew: A web-based dissident-reporting site.
- reeducation: A subeq interpreter.
King of the Hill
- reverse: An assembly fill-in-the-blank game.
- doublethink: Who can write the most polymorphic shellcode?
- propaganda: Smallest patch for the win!
Although each problem was interesting in its own right and warrants its own discussion, I am not familiar enough with all of them to do that, so we will instead only talk about a few of the more unusual ones.
reverse
Reverse was not only the first KotH challenge released, but was also the first problem released overall. It consisted of several levels, each providing some small section of assembly with pieces blanked out. Here’s an example of what that looked like:
0x8163bb0: c744240804000000 mov dword [esp+0x8], 0x4
0x8163bb8: c744240402000000 mov dword [esp+0x4], 0x2
0x8163bc0: 891424 ??????
0x8163bc3: 8844242c mov byte [esp+0x2c], al
0x8163bc7: e88492f001 call sub_5497
0x8163bcc: 0fb644242c movzx eax, byte [esp+0x2c]
0: cmp rax, rdx
1: mov dword [ebp-0x38], eax
2: cmp edx, eax
3: mov dword [esp], edx
4: mov rbp, qword [rax+0x10]
As one might guess, answering questions like these can be very easily automated. As such, the vast majority of teams, ourselves included, devoted a tremendous amount of time to simply answering the questions. Unbeknownst to us (although at least one other team found it) there were also a number of bugs in the program. Using these, one could obtain near-infinite points. However, only one bug was actually used by any team to get an advantage, and that one amusingly gave only 20 points. For our part, we saw the assembly being leaked as part of the challenge, and assumed that we were supposed to use it to reconstruct the server’s code. Unfortunately, this turned out to be a red herring, and despite having reconstructed nearly 50% of the binary (as far as we can tell) we still have no idea what we were actually reconstructing.
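As a rough illustration of how a question like the example above could be automated, the sketch below hand-decodes just enough 32-bit x86 to answer it. It only understands opcode 0x89 (`mov r/m32, r32`) with a few simple addressing forms, and the function names are hypothetical. A real solver would lean on a full disassembler, and nothing here is meant to reflect how any team actually did it.

```python
from typing import List, Optional

REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_store(code: bytes) -> str:
    """Decode `mov r/m32, r32` (opcode 0x89) in 32-bit mode; toy coverage only."""
    if len(code) < 2 or code[0] != 0x89:
        raise ValueError("this sketch only handles opcode 0x89")
    mod, reg, rm = code[1] >> 6, (code[1] >> 3) & 7, code[1] & 7
    pos = 2
    if mod == 3:  # register-to-register form, no memory operand
        if len(code) != 2:
            raise ValueError("unexpected instruction length")
        return f"mov {REG32[rm]}, {REG32[reg]}"
    if mod == 0 and rm == 5:
        raise ValueError("disp32-only addressing is not handled")
    base = rm
    if rm == 4:  # a SIB byte follows the ModRM byte
        if len(code) <= pos:
            raise ValueError("truncated instruction")
        sib = code[pos]
        pos += 1
        index, base = (sib >> 3) & 7, sib & 7
        if index != 4 or (mod == 0 and base == 5):
            raise ValueError("scaled-index and disp32 SIB forms are not handled")
    disp_size = {0: 0, 1: 1, 2: 4}[mod]
    if len(code) != pos + disp_size:
        raise ValueError("unexpected instruction length")
    disp_text = ""
    if disp_size:
        disp = int.from_bytes(code[pos:], "little", signed=True)
        disp_text = f"{'-' if disp < 0 else '+'}{hex(abs(disp))}"
    return f"mov dword [{REG32[base]}{disp_text}], {REG32[reg]}"


def normalize(text: str) -> str:
    """Lowercase and strip all whitespace so formatting differences don't matter."""
    return "".join(text.lower().split())


def pick_answer(code: bytes, candidates: List[str]) -> Optional[int]:
    """Return the index of the candidate matching the decoded bytes, if any."""
    target = normalize(decode_mov_store(code))
    for index, candidate in enumerate(candidates):
        if normalize(candidate) == target:
            return index
    return None
```

Fed the blanked bytes `891424` and the five candidates from the example, `pick_answer` selects candidate 3, `mov dword [esp], edx`.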
poool
Another interesting challenge was poool. This was a binary released later on in the competition (near the end of day 2), and allowed players to join a mining pool where they could submit proofs of work to buy a flag. Although actually mining the flag was an expensive proposition (it would take roughly 750 cores per round for a single team), there were a number of bugs that made it easier. For instance, one such bug did not properly check the case in the proof of work, and so it could be used to speed up the work by a factor of 150 (reasonable enough for a player’s laptop). Another bug allowed an attacker to leak the flag directly, although this one was patched out by about two thirds of the teams very quickly.
What makes this problem so interesting — and worth discussing — is the fact that in this day and age 750 cores is not that much. For teams with even mild financial backing, acquiring sufficient compute to just buy the flag is entirely reasonable (even on a fully patched binary). For our part, we had actually considered using this technique to give ourselves an extra edge.
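For readers unfamiliar with proof of work, here is a generic sketch of the idea. The hash function (SHA-256), the leading-zero-bits rule, and the function names are all assumptions made for illustration; the challenge's real scheme is not described here.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Check that the nonce yields a digest with enough leading zero bits."""
    digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge: bytes, difficulty: int) -> int:
    """Brute-force the smallest nonce that satisfies `verify`."""
    for nonce in count():
        if verify(challenge, nonce, difficulty):
            return nonce
```

With a scheme like this, the expected cost is about 2**difficulty hash evaluations, so it scales directly with how much compute you can throw at it. By the same token, any verifier bug that accepts a larger set of digests than intended cuts the required work proportionally.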
doublethink
On a bit more of a fun note, the KotH challenge doublethink challenged teams to write a single 4096-byte shellcode that could run on as many of 12 different architectures as possible. For reference, these architectures were:
- lgp-30
- pdp-1
- pdp-8
- pdp-10
- mix
- ibm-1401
- nova
- risc-v
- hexagon
- mmix
- clemency
- One of: amd64, arm64, or mipsel
By the time the competition had retired, our team was in third place with shellcode that ran on 8 different architectures (amd64, lgp30, mix, pdp1, pdp8, pdp10, clemency, and nova). We were blown away by Dragon Sector with 9, and HITCON who had achieved a whopping 11. However, as we later found out, both of those teams had found a bug in the problem that let them claim success for far more architectures than they actually supported. On the plus side, it was a ton of fun writing each of those shellcodes!
bew
The final problem we will discuss in depth is bew. Bew is the first “web” challenge that I have ever seen at DEF CON finals. It initially presented itself as a WebAssembly (WASM) reversing problem, although after about 15 minutes of reading through it, it became apparent that the WASM was there primarily to mask the fact that all input to the problem was being evalled. Since it was released only about an hour before the end of the day, most teams were content with using that eval to copy the flag onto a publicly accessible page where it could be read directly. In fact, even the teams who did not find the bug realized that they could troll this page and submit any flags they found.
The real issue with bew’s design became evident the next morning. Having had all night to play around with it, teams quickly realized that they could use this entry point to establish a permanent backdoor on other teams’ servers that also removed the main entry point. Unfortunately, since everyone realized this, the first round of the day was simply a race to see who could get their backdoor installed on as many people’s systems as possible. PPP unfortunately lost this race, but got lucky insofar as whoever backdoored us did so in a way that logged everyone else’s backdoors to a public place. This meant that we had the source code for several teams’ (insecure!) backdoors and were able to use those to get flags from other teams.
Notes, Issues, and Requests
While DEF CON finals were a lot of fun, they were not without issue either. The OOO took on a huge task this year, and with twice as many teams there are twice as many points of failure. Before I begin discussing some of these problems, I want to commend the OOO for their transparency throughout the whole process, and their eager willingness to make things right. An attitude like that is far more valuable than perfect infrastructure, because it resonates in every aspect of the competition. With this in mind, let’s briefly discuss a few of the issues with the competition, and how we hope to see them changed.
Infrastructure Problems
These are the easiest to discuss, because everyone knows that this is an impossibly hard problem, and no team ever gets it perfect. The only specific worth mentioning is that when one of these problems affected our ability to score for a substantial portion of time, there was not really any good recourse. This is not to suggest that the OOO had an option they chose not to take, but rather that as a result of this error there was no fair way to restore those points to us. This can be a frustrating situation for all involved, and hopefully in the future a better recovery method can be implemented.
Gameplay Errata
With regard to some of the decisions made for the competition itself, these provide for a lot more discussion. First, I would like to express how much I enjoyed the new King-of-the-Hill mode. It added some much needed variety to the competition that really helped it to feel fresh and fun. In a similar vein, several members of our patching team mentioned that the limited-byte patching method made their job harder, but a lot more fun. It meant that they had to do a lot more by hand, but it was an exciting challenge.
In contrast, some of the changes made our job a little less fun. For my part especially, the omission of packet captures removed a significant strategic element from the game. We use the network captures for a number of different things. One use is as an indicator of how we are being attacked. By seeing what transpires over the wire, we can determine bugs in our application, and figure out how to fix them and use them against other people. The organizers referred to this as “ripping exploits off the wire,” but I think that this is a little unfair, because it requires one to understand the contents of the transaction and build off of what other teams already have. It is also worth saying that as a result of removing this, the OOO exposed an API that directly told you if you were being attacked. This helped fill in the gap somewhat, but still left out a lot of information.
Similarly, the knowledge that other teams will have the full contents of what you send them in the traditional attack-defense format encourages teams to be clever about how they deploy their exploits. It no longer becomes a matter of find-exploit-pwn, but instead other questions come into play such as: “who do we exploit?”, “which exploit do we use?”, and “how can we hide our exploit?”. These “metagame” elements are one of the most interesting aspects of attack-defense style CTFs. Without these, the contest becomes more similar to a standard Jeopardy competition, and loses some of the “real-world” feel.
Finally, the loss of consensus evaluation removed another fun, inter-team component to the game. Instead of having an exploit that either works or does not work, consensus evaluation gives teams a way to seek out errors in the patching process, and exploit those for points. Not to mention, everybody loves to show off their fancy backdoor!
These are, of course, the thoughts of an individual player on an individual team. I would be interested to hear from both other teams and organizers as to their opinions on the new decisions.
A Challenge Coin for your Thoughts
Although it was disappointing to not win this year, losses like these always provide rich insight into ways that we as individuals and as a team can improve.
Avoid Mental Lock-In
I personally fell into the trap of preparing for defense and defense alone. When we arrived and found out that nearly all of our defensive systems were made irrelevant, I never mentally recovered. Instead of responding to that with “ok, let me spend most of my time on XYZ instead,” I began bouncing around other projects and problems working on them in short vignettes. This is not to say that I was totally useless, but had I been more focused in what I was working on, I could have contributed more to the team. That would have been possible had I prepared to work on aspects of the competition beyond just defense.
Organization is Always Useful
This may not be true for everyone, but oftentimes I think of organization as a luxury only for when I have a lot of time on my hands. In practice, this is a terrible mindset to be in, and there were a number of places where better organization could have helped us a lot. A prime example of this is that in working on two different problems, a team member found a critical bug but did not realize it. They made a mental note to come back to it, but it eventually slipped their mind and the bug was never investigated later. Having a good way to organize these thoughts and notes could have helped us significantly.
With that said, there were a number of places where we organized ourselves well. Our team captain did an excellent job of making sure that everyone was involved in some aspect of the competition, and that everyone had goals to work toward. This meant that we had to waste far fewer cycles coordinating among ourselves and syncing up with different teams.
You Cannot Predict the Future
Having spent so many hours preparing a useless tool for finals this year, I was quick to beat myself up over our loss. In some sense, it felt as though my own shortcoming in predicting what we would need cost us the game. Yet, when I revisit what I knew ahead of time, I realize that knowing what I did then, I still made the best decision I could. It was unfortunate, but I came to Vegas and lost the gamble. As long as I take the time to discuss what went wrong and what we can do better next time, it was all still worth it.
T-364 Days
While this may not be as exciting as a write-up from the winning team, I think that it is nonetheless valuable to see the competition from a slightly different perspective. I want to once again thank the Order of the Overflow for all of the hard work and sleepless nights they suffered through to bring us this competition. Furthermore, DEFKOR00T, HITCON, and all of the other teams played an awesome game and really pushed us. Congratulations again to DEFKOR00T, and I’m looking forward to seeing everyone in Vegas again next year!