The Lies We Tell Ourselves

December 27th, 2019

With 30,000,000 weekly downloads, it’s reasonable to expect that qs has been written from the ground up to be efficient, secure, and robust. Unfortunately, as is often the case with small projects that become unexpectedly big, it is plagued by legacy options and sibylline code. Without understanding how it really works, it is incredibly difficult to use safely.

Modern engineering is built heavily on trust. We trust hardware manufactures to build predictable devices. We trust operating system developers to implement secure protections. Most commonly, we trust libraries to chose safe paradigms. This trust is essential for nearly all development, however, it can sometimes be misplaced.

When we naïvely use someone else’s work, we allow ourselves to be blindsided by the mistakes that they have made. As a result, it is critical to understand not just what we are using, but how it actually works. From there, we can develop ways to protect ourselves and our users.

A Case Study

Constantly trying to understand every library and service you use may seem sisyphean. However, even the most basic tools can have nasty side effects. To illustrate the dangers of blind trust, we are going to take a deep dive into express.js, and more specifically, the popular library qs. When reading this, keep in mind that more than 5 million packages use this library.

The Good

For most use cases, express is the best server framework for writing web applications in Node. Even when you use a different web framework, there is still a high probability that it is using express under the hood. There’s nothing inherently wrong with this; by-and-large express is reliable and well designed. However, in an attempt to make user’s lives easier, it includes additional somewhat hidden functionality.

Consider this very basic server example:

typescript

content_copy

import * as express from "express"

const app = express();

app.get("/", (req, res) => {
    // Access the query string from the URL
    let obj = req.query.obj;
    res.send(typeof obj);
});

app.listen(3030);

What are the possible values that you can receive back from the server?

Now, if you’re unfamiliar with express or qs it would be natural to assume that the only possible values for typeof obj are "undefined" and "string". After all, query strings are simply key-value pairs and so every key should correspond to either a string value, or undefined to indicate that it was not set.

However, this neglects the fact that an additional parsing layer is used. Specifically, express uses qs to parse the query string, which allows it to extend the query string semantics to include nested objects and arrays. So in reality, we have that typeof obj has an effective type of "undefined" | "string" | "object".

Already it is possible for errors to occur if the developer is always expecting values to be strings.

typescript

content_copy

app.get("/", (req, res) => {
    let { name } = req.query;
    let serializedName = `s:${name.length}:${name}`
    sendSomewhere(serializedName);
    res.send("Done!");
});

In this case, we’re encoding the name using a PHP style serialization scheme. Not that I would recommend using PHP serialization for… well anything, but it’s a great way to show how bad assumptions can go wrong.

Let’s consider this query string: ?name[]=first&name[]=second. This query string appends [] after the key name to indicate that name should be an array. Now, when we go to access query.name, we get back the value ["first", "second"].

Now consider the more complicated query string: ?name[]=_&name[]=s:11:hello%20world. Since name is now an array, not a string, length is referring to the number of elements in the array — 2. However, when we insert name itself into a format string, javascript automatically stringifies the array by joining its elements with ,. As such, the value we ultimately serialize is s:2:_,s:11:hello world. For those unfamiliar with the PHP serialization scheme, this string represents the serialization of two strings, "_," and "hello world". Depending on what the receiving end is doing with this, it may cause crashes or, in an extreme case, remote code execution.

This behaviour on its own is fully intended. Although our hypothetical programmer did not understand how it worked, express was working correctly. However, it is a dangerous pratice to add opaque features into your service and assume that the user will be aware of them. Although qs is mainstream these days, it is still easy to introduce fatal bugs into one’s code by forgetting that express uses qs.

The Bad

Forgetting about intended behaviour is bad enough, but what happens when unintended behaviour enters the mix?

typescript

content_copy

app.get("/", (req, res) => {
    let { options } = req.query;
    delete options.authorized;
	
    if (checkCredentials(options.username, options.password)) {
        options.authorized = true;
    }

    // ...

    if (options.authorized) {
        res.send("Authorized");
    } else {
        res.send("Unauthorized");
    }
});

Despite the fact that I’ve told you nothing about what checkCredentials does, it is still possible to get the server to respond with authorized. Huh?

Let’s take a look at how qs works under the hood. On line 98 of parse.js, we see that when it parses a key and a value, it assigns them to obj via obj[key] = value. Earlier in the file we see that unless plainObjects is set, obj will just be instantiated as {}. Sadly, this means that qs is making the age-old mistake of treating objects as maps for user defined data. This is a problem.

(Proto)Typical JavaScript

It’s not worth going into the precise details of how JavaScript’s inheritence works. Its kludgey, hard to work with, and frankly not used much anymore. However, there are some artifacts of it that are crucial to understand. Consider this line

javascript

content_copy

let x = new Thing();
console.log(x instanceof Thing);

This snippet, as one would expect, will print true. Yet, it’s unclear how the engine knows that. One might expect that something internal to the runtime keeps track of inheritence. This is partially true, although misleading. In fact, each object keeps track of its prototype, a reference to the underlying class or object from which the current instance inherits methods and fields. This is not simply an internal construct however, but rather a mutable object available via the __proto__ key of any object.

This means that, while not officially supported by the JavaScript specification, it is possible to modify at runtime what class an object is an instance of.

I.e, we can do this:

javascript

content_copy

let x = new Thing();
x.__proto__ = { foo: () => 42 };
console.log(x.foo()); // 42
console.log(x instanceof Thing); // false

When done maliciously (or unintentially), this is called prototype pollution.

Don’t Pollute

Given that qs is allowing us to arbitrarily choose both the key and the value for this assignment, what happens if we change the prototype? Specifically, if we assign to a sub-key of the prototype, it will automatically create a new object for __proto__ with whatever key-value pair we’ve provided it.

In our above example, we can use the query string ?options[__proto__][authorized]=yes. When parsed, the following will happen:

qs creates an object that we’ll call object1. It sets object1[authorized] to "yes".
qs then creates a new object called object2. It then sets object2[__proto__]=object1. object2 is now an instance of object1
On the parent query object, qs sets query[options]=object2.

Now, we have an options object that is inheriting all of the values of { authorized: "yes" }, without having them defined directly on it. Thus, the statement delete options.authorized has no effect, because authorized is not defined on options itself.

In our above example, even though the checkPassword check fails, options.authorized returns "yes", so we still appear to be authorized for the action.

Array, or not Array

Here’s another example of where prototype pollution can cause problems. Let’s play “spot the error”:

typescript

content_copy

app.get("/", (req, res) => {
    let { input } = req.query;

    let sanitize = (obj) => {
        if (obj instanceof Array) {
            for (let element of obj) {
                sanitize(obj);
            }
        } else if (obj instanceof Object) {
            for (let key in Object.keys(obj)) {
                sanitize(key);
                sanitize(obj[key]);
            }
        } else if (typeof obj === "string") {
            if (obj.indexOf("$") >= 0) {
                throw new Error("No-SQL Injection!");
            }
        }
    }
    
    sanitize(input);
    
    res.send(doSomethingWith(input));
});

This code implements a very basic filter that checks for any no-sql injections.

Bypassing this check is, in some sense, even easier than the previous example. Here, we only need to produce an array whose keys are non-numeric. Although at first blush this sounds difficult or even impossible, prototype pollution makes this almost trivial.

In order to get the arbitrary key that we need, we can create an object via qs. Then, we define __proto__ on that object to be an array with some arbitrary content. Because the prototype of our object is now an instance of Array, the object itself becomes an array. However, qs still thinks that it is an object, and so we can set arbitrary keys on it. If we insert this into a no-SQL query, those keys will look like standard object keys, and allow us to execute arbitrary no-SQL commands.

In practice, this looks like obj[$exec]=malicious string&obj[$exec][__proto__]=&obj[$exec][__proto__]=.

The Ugly

The goal of this writeup is to demonstrate the dangers of faulty assumptions. Good documentation and anecdotal evidence can lure us into think we understand how something works, when in reality the code itself is doing something wildly different. We often think about our dependencies as these neat little packages that we can trust because 5.1 million other people use them. Yet, many of these packages have dirty little secrets, hacks, and poor design decisions that render blind trust fatal.

The final example I have for you is as follows:

typescript

content_copy

app.get("/", (req, res) => {
    // Check for any non-string terminal in our input
    let challenge = (obj) => {
        if (typeof obj === "string") {
            return false;
        }
        
        if (typeof obj === "object") {
            for (let key of Object.keys(obj)) {
                if (challenge(obj[key])) {
                    return true
                }
            }
            return false;
        }
        
        return true; // found something other than an object or a string 
    }
    
    if (challenge(req.query)) {
        res.send("Success!");
    } else {
        res.send("Failed");
    }
});

At the beginning of this blog post, I mentioned that the only types qs could produce are strings, objects, and arrays (which are really just objects). However, this test checks for something other than those. A read-through of qs's documentation would suggest that this isn’t possible.

The docs are wrong.

I don’t want to spoil the fun, but if you find it, send me a tweet at @zwad3. Although I only have one solution, it would not be surprised if qs had more than one method for bypassing this check.

The Solution

Now that we really understand how qs works, we can begin to protect ourselves from these bad assumptions. Specifically, we have two primary concerns. The first is simple: “Is the type of our input the type we are expecting?”. The second, however, is more subtle: “Has any prototype pollution occurred?”.

Ironically, the easiest problem to solve is the latter. Taking a deeper look at qs's source code reveals that it builds it’s objects bottom-up. This means that when you have a[b][c]=d, it first creates {c: 'd'}, then {b: {c: 'd'}}, and finally {a: {b: {c: 'd'}}}. As such, the prototype is only every overwritten, and never modified. Therefore, to check whether any malicious user has modified the query object prototype, one can simply check whether obj.__proto__ === Object.prototoype (or obj.__proto__ === Array.prototype, in the case of arrays). If these are equivalent, then the prototype has not been modified.

The first problem, however, remains trickier. Often times, the naive solution is to simply check the type of every object that is passed through the query parameters. This is fine when only strings are anticipated, but anytime a program expects a more complicated object, this manual checking becomes tedious and error-prone.

To better address this problem, it can be helpful to employ a marshaller — a parser of sorts that takes an object and a description of what it should look like, and compares the two. Ideally, this check would not only provide runtime guarantees about the type of the incoming object, but also provide static checks for compile time as well.

With these two checks in place, using qs becomes safe and reliable. Even when qs decides to produce a type that was previously unexpected, with a reliable marshaller, this will be caught immeddiately.

The Lies We Tell Ourselves

Regardless of how you address these problems, it should be clear at this point that the web of trust we weave is fragile, and can fall apart anywhere. No one will ever completely understand every single thing that makes a computer work, but when we drop our presumptions of safety, we become more equipped to prevent problems before they happen.

I have tremendous respect for everyone who has worked on qs over the years. Maintaining an open source package is challenging in the best of times, but trying to ensure that your changes don’t break half the web must certainly be daunting.

In writing this article, my hope is not to publicly shame the project, but help illustrate how messy modern development can be. I would encourage everyone to critically evaluate everything that you trust, and once you understand it, share it with the rest of us.

Zach Wade is an alumnus of computer science at Carnegie Mellon University. He is also a member of the PPP, CMU’s competitive hacking team. You can find him at @zwad3