Kernel Panic: A DEF CON 2020 Retrospective

August 17th, 2020

In many regards, 2020 has felt like a collective Blue Screen of Death. However, a raging virus was not enough to keep DEF CON from happening. This year it booted into Safemode, an online-only version of the popular security conference. Along with it, the titular DEF CON CTF reimagined itself for a virtual future.

1 Minute to Midnight

DEF CON is an annual conference traditionally held in Las Vegas during the summer. It is a bizarre mish-mash of hardcore hacking and excessive partying, and most members of the infosec community will find themselves visiting it at least once. This year, 16 teams of hackers out of more than 1500 qualified for its most well-known competition, DEF CON CTF.

As in previous years, I competed in DEF CON CTF with Carnegie Mellon’s hacking team, the Plaid Parliament of Pwning (PPP). Similarly, the Order of the Overflow returned to host. Although we did not win, it was an exciting competition with difficulties and challenges unique to these times.

Table of Contents

Overview

Postpandemic Cyberwarfare
A Solid Foundation
Infrastructure

Preparation

Pandemic Preppers
“Organization”

Problems

An Artisanal Selection of Pwn

Conclusion

Panic! At the Discord
Sharpening the Blade
Until Next Year

Postpandemic Cyberwarfare

A Solid Foundation

In what was a huge relief, the game this year was very similar to previous years. I’ve covered this before so I’ll keep this section short, but the game consists of roughly 300 rounds wherein teams can score points by:

Attacking other teams: Teams will score 1 attack point (or 0.5 points as I’ll explain momentarily) for each flag of their opponents that they steal.
Defending their services: Teams will score 1 defense point for each of their services that is not successfully attacked in that round (while there is at least one opponent who is under active attack).
Scoring well in King-of-the-Hill challenges: For each KotH challenge, the current best scorer will earn 10 points, second best 6, then 3, 2, and 1. Teams below the top five do not score.
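As a rough illustration of the scoring rules above, the per-round arithmetic might look like the following Python sketch. The function names and the tie-breaking rule are assumptions made for the example, not part of the official scoring code.

```python
# Points awarded to the top five King-of-the-Hill scorers, best first.
KOTH_POINTS = [10, 6, 3, 2, 1]


def attack_points(flags_stolen: int, stealth: bool = False) -> float:
    """1 point per stolen flag, or 0.5 when attacking via the stealth variant."""
    return flags_stolen * (0.5 if stealth else 1.0)


def koth_points(scores: dict[str, float]) -> dict[str, int]:
    """Map each team to its KotH points for one round.

    Assumption: a higher score is better and ties are broken by team name;
    the real tie-breaking rule is not described in this post.
    """
    ranked = sorted(scores, key=lambda team: (-scores[team], team))
    return {
        team: (KOTH_POINTS[i] if i < len(KOTH_POINTS) else 0)
        for i, team in enumerate(ranked)
    }
```

With six hypothetical teams, only the top five receive points and the sixth gets zero, which is why dropping out of the top five on a KotH problem hurts so much.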

Gameplay proceeds with teams exploiting, patching, and competing with each other every round to score as many points as they can before the next round. In this regard, the game was almost identical to previous years. The only substantial change was related to network captures.

In order to improve the fairness and predictability of the game, the OOO decided to always release packet captures as soon as they were ready (on a three-minute cadence). However, if teams did not want their traffic released, they could opt to attack on “stealth” variants which would earn them half points, but whose traffic would never be released to their target.

Additionally, in order to accommodate the wide variety of timezones that players would be in, the OOO bent the rules of time and space to create the fabled 17-hour day. More specifically, they restructured the competition so that the game would take place over four “days”, where each day represented 1 hour of prep, 8 hours of competition, and 8 hours of “rest.” This schedule had the unique property of being equally hellish for everyone, so that no team would be unfairly favored due to proximity to the organizers.
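To see why this schedule treats every timezone equally badly, here is a small sketch that steps through the four “days.” The start time is hypothetical and chosen only to show the drift; the phase lengths are the ones described above.

```python
from datetime import datetime, timedelta

PREP, PLAY, REST = 1, 8, 8  # hours per phase, as described above
DAY_LENGTH = timedelta(hours=PREP + PLAY + REST)  # the 17-hour "day"


def shift_starts(first_start: datetime, days: int = 4) -> list[datetime]:
    """Return when each competition "day" begins (the start of its prep hour)."""
    return [first_start + i * DAY_LENGTH for i in range(days)]


# Hypothetical local start time, used only to illustrate the drift.
for start in shift_starts(datetime(2020, 8, 7, 12, 0)):
    print(start.strftime("%m-%d %H:%M"))
```

Because 17 hours is 7 hours short of a full day, each shift starts 7 hours earlier on the clock than the one before, so no timezone keeps a comfortable slot for long.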

Despite these changes, from a gameplay and competitive standpoint, the rules remained roughly what competitors had grown used to.

Infrastructure

Unlike the rules of the game, it was not possible for the infrastructure to stay as it had been. Despite this, the OOO’s adaptation of their traditional on-site attack/defense infrastructure to a fully virtualized version of the game is an impressive tale of reliability and performance.

In order to ensure that teams were playing fairly, the OOO once again set up an API-based game network wherein, instead of directly running the vulnerable services, teams were allowed to interface with them over a standardized API. Specifically, the game network’s API allowed teams to do the following:

Connect to (and attack) a team’s active problem
Connect to their own instance of a KotH problem
Upload a patch for a problem
Download PCAPs for a problem
Communicate with the organizers

Although this sounds straightforward, there is actually a lot of hidden complexity. For instance, two different teams might connect to 10.13.37.1 to attack one of A*0*E's problems. Consider the following situation: team 1 finds an exploit that crashes the service. At first glance, it seems like team 2 is just out of luck: until the problem is restored, they cannot exploit it.

In previous years, this was a real concern.

However, in this architecture each team had a dedicated machine to attack for each of their opponents. Likewise, each team had a dedicated instance they had to defend for each of their opponents. Through the use of the API, this interaction was fully transparent to the teams.
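One way to picture this arrangement is as a lookup table keyed by attacker, defender, and service. The sketch below is only a model of the idea: the class names, the lazy creation of instances, and the service name are hypothetical, and say nothing about how the OOO actually built it.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """One copy of a defender's service, reserved for a single attacker."""
    alive: bool = True


@dataclass
class GameNetwork:
    # (attacker, defender, service) -> that attacker's private instance
    instances: dict[tuple[str, str, str], Instance] = field(default_factory=dict)

    def connect(self, attacker: str, defender: str, service: str) -> Instance:
        """Route a connection to the instance dedicated to this attacker."""
        key = (attacker, defender, service)
        return self.instances.setdefault(key, Instance())


net = GameNetwork()
# Team 1 crashes its own copy of the defender's service...
net.connect("team1", "A*0*E", "some-service").alive = False
# ...but team 2 talks to a different copy, which is still up.
print(net.connect("team2", "A*0*E", "some-service").alive)
```

In this model both teams use the same address from their point of view, and the routing layer decides which copy they reach, so one team's crash cannot lock another team out.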

In order to allow competitors to connect to the network, the OOO established a WireGuard-based VPN with a node on the network for each of the teams. Designed to be a jump box for access to the network, this device simulated teams connecting their computers to the traditional physical network. For teams (like ours) that installed an additional VPN on the jump box, networking was actually more stable than at most on-site events.

Pandemic Preppers

Since DEF CON CTF is such an important competition for us, we traditionally spend the month leading up to it in preparation and organization. This year, we did not.

The truth is, it’s hard to be fully enraptured by competitions like these when the world is in disarray. With that said, CTFs for me are ultimately about solving challenging puzzles with the kind of people who make it a joy instead of a chore. Even if my head wasn’t fully in the game, without any organization at all we were going to be ill equipped to handle the challenges of fully remote play. To that end, several of us prepared a strategy for managing DEF CON 28.

“Organization”

There were a number of problems we faced in our preparation for the competition. The first of these was organizational — who was going to play. Although the lack of flights meant that we could have an unbounded number of players, in practice managing a large group of rotating people went far beyond what we were prepared for.

To find a medium between allowing anyone to play and having a group that we could reasonably keep track of, we requested that everyone commit to playing for the full competition. This might seem trivial, but I mention it because it was absolutely crucial for us. Although we lost a few helping hands, it meant that we could reliably assign people to work on problems and that there was never any loss of knowledge.

The latter of these is most important, and brings me to our second concern — knowledge transfer. Traditionally, knowledge is transferred via huddled clusters of people in the corner of a Las Vegas hotel room. In lieu of that, we set up a Discord server for the team with the ability to create voice and text channels for every problem.

Again, there’s nothing groundbreaking about using Slack or Discord to manage discussion about a focused topic. However, by having everyone in one of a few problem channels, it was substantially easier for a newcomer to quickly get caught up on a problem and begin contributing. It also made it easier for us to self-balance, as at any point you could glance at a problem and see how many people were working on it.

Preparing the team in this manner was important, but fairly easy to implement. In contrast, we spent a significant amount of time on our final concern — animated emoji. Discord only enables animated emoji for paying customers, which many of our team members were not. Unfortunately, this broke our standard workflow of constantly spamming each other with:

:robertoclap: .

To this end, we built a custom Discord bot that would allow anyone to send an animated emoji. We also established bots for less useful tasks such as monitoring announcements, sending alerts from internal services, and sharing information. Altogether, these bots helped smooth over some of the roughness of the online format, although none so much as Emoji Bot.

An Artisanal Selection of Pwn

With our scant preparation complete, it was nearly time for the first shift of the competition.

Day 1

The first “day” of the competition saw three problems to start teams off.

Our first King of the Hill problem was Casinooo, a Blackjack simulator written for a computer that had been built in Conway’s Game of Life (using the Quest for Tetris architecture). Your task was to upload your own Game of Life configuration that would play Blackjack against your competitors.

Meanwhile, the first Attack/Defense challenge was Parallel-AF, an operating system and application designed for the Manchester Dataflow Architecture. As the name suggests, the application made heavy use of concurrency.

Finally, the third problem was an oddball artificial intelligence problem, Rorschach. For most of the first day, and into the second, this was the problem that I spent the most time on. At first, it seemed like a fairly standard AI reverse-engineering problem, the likes of which have been used in both Plaid and previous DEF CONs. Each team was hosting a model with 68 classes that had been trained on grayscale noise. Each round, you were given an output class for every team, and had to produce an input that would classify to that result.

Initially we thought this was a trivial gradient descent problem, until we realized that the only model we actually received was our own. As a result, we could easily perform gradient descent on our personal model, but not any of our opponents’. However, we could run an input against their model and get the full output vector. As a result, our only option was to do blackbox gradient descent.

I wish I had some brilliant insight that I could share about how we managed to blackbox our opponents’ models, but our strategy was roughly:

1. Start with a simple input image
2. While we haven't found the correct class
  a. Randomize a random subset of the pixels
  b. If this input gets us closer to the correct output, keep it

And then we had 6 or 7 machines running this against all of the teams every turn. Over the course of the game, we tweaked our strategy for “Randomize a random subset of the pixels”, such as binarizing the image or changing the distribution we were pulling from, but these only helped a small amount.
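For the curious, the loop above can be sketched in a few lines of Python. This is a simplified stand-in rather than our actual tooling: toy_model is a made-up four-class opponent, and the margin function is one assumed way of measuring “closer to the correct output.”

```python
import random


def argmax(scores):
    """Index of the highest score (the predicted class)."""
    return max(range(len(scores)), key=scores.__getitem__)


def margin(scores, target):
    """How far the target class is ahead of (or behind) its best rival."""
    rival = max(s for i, s in enumerate(scores) if i != target)
    return scores[target] - rival


def random_search(query, target, n_pixels, max_queries=20_000, fraction=0.05, rng=None):
    """Black-box search: mutate pixels at random and keep only improvements."""
    rng = rng or random.Random()
    image = [0.0] * n_pixels              # 1. start with a simple input image
    best = query(image)
    for _ in range(max_queries):          # 2. loop until the class is correct
        if argmax(best) == target:
            return image
        candidate = image[:]
        k = max(1, int(fraction * n_pixels))
        for i in rng.sample(range(n_pixels), k):   # a. randomize a random subset
            candidate[i] = rng.random()
        scores = query(candidate)
        if margin(scores, target) > margin(best, target):  # b. keep if closer
            image, best = candidate, scores
    return image if argmax(best) == target else None


def toy_model(image):
    """Stand-in opponent: class c scores the sum of its own block of 4 pixels."""
    return [sum(image[c * 4:(c + 1) * 4]) for c in range(4)]


found = random_search(toy_model, target=2, n_pixels=16, rng=random.Random(0))
```

The only thing the search needs from the opponent is the full output vector for each query, which is exactly what we had access to. Swapping rng.random() for a different distribution, or rounding the candidate to 0 and 1, corresponds to the kind of tweaks described above.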

On the patching side, teams were allowed to run an arbitrary Python verifier against candidate images. If the image passed muster, then it was allowed to be fed to the model. Otherwise, teams would not be able to test their input. Unfortunately, this meant that there was no “correct” patch: the original inputs to the model would always need to pass the check, so any team able to reconstruct those original inputs would always get the flag.

Our best effort patch, then, was to take a number of statistical measurements of the sample inputs we received and check competitors’ solutions against those. Although some solutions were able to still land, this worked well for most of the competition.
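Here is a minimal sketch of that kind of verifier, assuming images arrive as flat lists of pixel values in [0, 1]. The particular statistics (mean and standard deviation), the slack value, and the function names are illustrative assumptions rather than the checks we actually shipped.

```python
import statistics


def fit_profile(samples):
    """Summarise known-good inputs by the range of their per-image statistics."""
    means = [statistics.fmean(img) for img in samples]
    stdevs = [statistics.pstdev(img) for img in samples]
    return {
        "mean": (min(means), max(means)),
        "stdev": (min(stdevs), max(stdevs)),
    }


def verify(image, profile, slack=0.05):
    """Return True if the image's statistics fall inside the sampled ranges."""
    observed = {
        "mean": statistics.fmean(image),
        "stdev": statistics.pstdev(image),
    }
    return all(
        lo - slack <= observed[name] <= hi + slack
        for name, (lo, hi) in profile.items()
    )
```

A check like this rejects inputs that look nothing like the samples (an all-black image, for instance) while still accepting anything statistically similar to them, which is also why it can never be airtight.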

In a twist that I had not expected, after the end of the first day the OOO switched up the problem to be whitebox (meaning teams had each other’s models). Once this switch happened, our attack reverted to the traditional gradient descent technique.
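With the weights in hand, the attack becomes ordinary gradient ascent on the input. The sketch below assumes a toy linear softmax model so that it runs with only the standard library; the structure of the real models is not described here, and in practice an autodiff framework would compute the gradient.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, image):
    """Class probabilities for a linear model: softmax(W @ image)."""
    return softmax([sum(w * x for w, x in zip(row, image)) for row in weights])


def attack(weights, target, n_pixels, steps=200, lr=0.5):
    """Gradient ascent on log p(target) with respect to the input pixels."""
    image = [0.5] * n_pixels
    for _ in range(steps):
        probs = predict(weights, image)
        if probs.index(max(probs)) == target:
            break  # the model already picks the target class
        for j in range(n_pixels):
            # d/dx_j log p(target) = W[target][j] - sum_c p_c * W[c][j]
            expected = sum(p * row[j] for p, row in zip(probs, weights))
            grad = weights[target][j] - expected
            image[j] = min(1.0, max(0.0, image[j] + lr * grad))  # stay in [0, 1]
    return image


# Hypothetical 3-class, 4-pixel model used only for demonstration.
WEIGHTS = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1]]
adversarial = attack(WEIGHTS, target=1, n_pixels=4)
```

Compared with the blackbox search, every step here moves in a direction known to help, which is why the whitebox attack needs far fewer model evaluations.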

I really enjoyed the unique blackbox gradient descent, but there were a few things about this problem that made it frustrating. The biggest of these was that teams’ models were unique, but static. Since the models could never change, some teams lucked into having models that were much harder to attack than others. For instance, I remember team 7 having a model for which determining an input would take roughly twice as long as for most other teams. Since the model never changed, they were more likely to be defended just by virtue of their model’s structure.

Additionally, since there were no actual bugs in the problem, patching felt very much like guesswork, wherein we had to assume what we could about the SLA checker to try and find a patch that would effectively distinguish between true inputs and generated ones.

Day 2

Since many of the first day’s problems carried over to day 2, it saw only two new problems released.

One of these was Real Hacking Game (RHG), a game problem that saw teams sending commands to corrupt the game state.

The second was a new King of the Hill problem, Pinboool. In this challenge, players submitted an input of size 1024 bytes to the application, which would then use it to “play” a game of pinball. Effectively, this problem was similar to a really complex crackme, where the code could take a number of different branching paths depending on the contents of the input. In this manner, the control flow of the executable was designed to mimic a game of pinball.

I have a confession to make. I was utterly useless on this problem.

In order to help us navigate the many different codepaths, I meticulously reimplemented the game in Python. I spent basically my whole night working on this, and then it ended up being entirely unhelpful because of a memory corruption that allowed you to bypass some of the restrictions that I had assumed in my code. Worse still, the memory corruption itself did not help substantially because all of the top teams ended up just hitting one particular codepath in a tight loop.

This was more than a little frustrating for me, and also for the many other people who worked on the memory corruption variant. But there’s no one at fault for this. That’s just how the game goes sometimes ¯\_(ツ)_/¯

Day 3

As a result of the bizarre competition schedule, Day 3 began on Saturday evening, after a refreshing afternoon of rest and preparation. Although the overnight shift would be grueling, it brought with it a welcome blast from the past.

Last year I wrote about one of our favorite problems, ROPship. In it, players were tasked with automatically generating a ROP payload that would pilot a ship to attack opponents. One of our biggest laments, however, was that it was taken down before we could implement any of our cool strategies.

The OOO heard our cry, and brought ROPship back, with an AI twist. This year, instead of uploading ROP payloads, teams uploaded small neural networks that used a number of signals about the game (such as nearest bullet, direction of closest player, distance to walls, etc.). Unfortunately, since we only had one hidden layer of size at most two, we were highly limited in the strategies that we could implement.
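To give a sense of how small these networks were, here is a sketch of a forward pass with a two-unit hidden layer. The tanh activation, the choice of three input signals, and the meaning of the outputs are assumptions for illustration; only the size limit comes from the game.

```python
import math


def tiny_policy(signals, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: inputs -> one hidden layer (at most 2 units) -> outputs."""
    if len(w_hidden) > 2:
        raise ValueError("hidden layer is limited to two units")
    hidden = [
        math.tanh(sum(w * s for w, s in zip(row, signals)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(w_out, b_out)
    ]


# Hypothetical signals: nearest bullet distance, angle to closest player, wall distance.
signals = [0.2, -0.7, 0.9]
outputs = tiny_policy(
    signals,
    w_hidden=[[1.0, 0.0, -1.0], [0.0, 2.0, 0.0]],
    b_hidden=[0.0, 0.1],
    w_out=[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],  # e.g. turn, thrust, fire
    b_out=[0.0, 0.0, -0.2],
)
```

However many outputs the network has, each one is a weighted sum of the same two hidden values, so every decision the ship makes is squeezed through a two-number summary of the whole game state.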

However, as the name suggests, this was ultimately a Return-Oriented Programming problem and a bug in the executable allowed players to ROP, thus enabling them to overcome the network size limitations. Once the exploit had stabilized, we could upload any network we wanted, provided the total number of nodes was less than 16k.

My teammates are producing a separate writeup of how we solved this, so I don’t want to spoil anything. However, I will leave this video of one of our last rounds to tide you over until they have finished.

Other problems released on Day 3 included Bdooos, a hardware firewall (that ended up being emulated due to an issue with acquiring the hardware), Gameboooy, a distributed Nintendo Gameboy emulator, and Slooot, a whitebox cryptography challenge.

Day 4

By the start of the final day, all of us were pretty much at our limits. Fortunately, the OOO maintained a tradition of releasing silly web problems near the end of the competition, and this year was no exception.

This year’s problem was fairly standard, and had a number of glaring bugs that we quickly patched out. Even the more “subtle” bug, a prototype pollution issue in a validation library they were using, was made incredibly obvious by the cleanup function that reset the prototype after every connection.

Part of why this problem was so confusing for me was that I never really felt like I had found, patched, or exploited all of the potential bugs. Yet, at the same time, I could neither find new ones nor see anyone actively attacking us. Even one bug that let me reconfigure the web server of my target was not powerful enough to convert into a file disclosure.

Although I might sound frustrated by this, I think it’s actually an example of really good A/D problem design. Even though the application was straightforward, it presented a variety of options for exploitation.

With that said, this problem also clearly isolated what I find difficult about opaque validation. The service-level agreement (SLA) checks are performed on your patches when they’re submitted, so the “correct” functionality of the service is never really specified anywhere. Instead, players are mostly left to guess what the SLA check is going to do, and then plan around it. We submitted a large number of patches that we thought to be sound (as in, patching a real bug in the same way that I would for a production application), but that caused us to fail SLA in a non-descriptive way. This left us vulnerable for far longer than I would have liked, given that we knew what the bugs were and how to remove them.

Panic! At the Discord

When the curtain closed on the fourth and final day, I had a strong suspicion that we would not make first. Not only did I personally waste a lot of time on the various problems I worked on, but our lead opponent, A*0*E, had been playing an incredibly strong game.

As it turned out, my hunch was correct, but not by the margin I might have guessed. As is demonstrated here, the game was neck and neck to the very end.

This game was crazy, check out this #DC28CTF game!@defcon CTF

/cc @thedarktangent pic.twitter.com/X4hPFfRw4o
— Overflow (@oooverflow) August 10, 2020

With everything going on, this was an incredibly stressful game for me, even more so than when we are all in Las Vegas evaporating together. If I have a single takeaway from DEF CON this year, it is this: breathe, and remember to look at the big picture. At one point my teammate and I were at each other’s throats over some tiny details that did not matter. Once we stepped away, regained our composure, and came back together, we were infinitely more productive than we had been before.

Additionally, we as a team dealt with a lot of chaos when people would jump from one challenge to another. As a result, we would occasionally leave active problems completely stalled out. For future competitions like this, I think it will be valuable to appoint a single lead for every problem who can ensure that forward progress is always being made, even as other members bounce from problem to problem.

Sharpening the Blade

From making problem lifetimes more predictable, to using stealth ports as a balance between offense and defense, the Order of the Overflow has once again proven their commitment to making DEF CON CTF the best it can be. There were a few more issues this year, many of which were a direct consequence of the global situation, but their team has documented them well in their postmortem. There’s not much value for me to go into them, but I do think that they brought to light an area in which the whole CTF community is lacking.

Communication during a highly competitive CTF is a balancing act between ensuring that players have properly understood the boundaries you’ve provided, and not giving any team an unfair advantage. Coupling this with a dozen different languages, communication can often feel like a minefield.

At the same time, problems often have components that are unrelated to the actual challenge itself. For instance, whether my XSS victim is named “admin” or “root” is probably irrelevant to the actual exploit. I think it could ease a lot of frustration to have this distinction codified in a public FAQ where non-critical questions are answered in full view of the competition, whereas exploit-adjacent questions are given a friendly “Hack Harder.”

By collecting this information publicly, and by being slightly more generous with the information we provide, we as organizers might be able to make our problems more enjoyable and educational for those playing. I say “we,” because I have an opportunity to try this out as one of the organizers of PlaidCTF. I might start with a single problem, but this could be a valuable method for improving the communication between players and organizers, and in relieving many irrelevant frustrations.

Until Next Year

As always, DEF CON CTF really shows its merits as one of the biggest CTFs of the year. Especially given the circumstances of the competition, I want to thank the Order of the Overflow for the hard work they put in to making this happen.

Additionally, I want to thank all of my teammates and friends who played an incredible game, and who make even stressful CTFs fun. (Sorry I was so useless this year, oops.)

Finally, I want to offer all of our competition a hearty congratulations on a job well done. We saw a lot of creativity and ingenuity in the attacks and patches everyone levied. To A*0*E especially, congratulations on your hard-earned win. It has been many years coming.

And to everyone, I look forward to seeing you during PlaidCTF and future DEF CONs. Thanks!