Path traversal is another fairly common type of injection vulnerability. It tends to occur when code that constructs a URI (whether a URL, a file path, or something else) doesn’t ensure that the fully resolved path stays inside the root of the intended path.
It’s worth calling out that path traversal can also be seen as a path *injection* vulnerability.
The impact of a path traversal vulnerability heavily depends on the context of where the traversal occurs, and the overall hardening that’s been done. But before we get into that, let’s go through a quick practical example of this vulnerability to see what we’re talking about:
Consider an endpoint in your application that serves up documents, like templates for contracts or job offers. These could all be files, like PDFs, that are static in your application.
In this situation, you might have a piece of code like this to fetch the files on request:
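A minimal sketch of what such code could look like, written in Python. The function names and the relative ‘documents/’ folder are assumptions made for illustration, not code from a real application:

```python
# Hypothetical handler code: the folder name and function names are
# illustrative assumptions.
DOCUMENTS_DIR = "documents/"  # relative to the application's root


def build_document_path(filename: str) -> str:
    # VULNERABLE: the user-supplied value is concatenated as-is, and nothing
    # checks where the resulting path actually points.
    return DOCUMENTS_DIR + filename


def read_document(filename: str) -> bytes:
    # Opens whatever path the concatenation produced.
    with open(build_document_path(filename), "rb") as handle:
        return handle.read()
```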
To see how the vulnerability plays out, we also need to know where the root of our application is. For this example, assume the application’s root is at ‘/var/www/api/’.
We know the application takes a ‘filename’ parameter. Let’s look at a few example inputs and what each one resolves to:
Notice how we’re able to traverse the file system using ‘../’. We’re able to move out of the ‘documents’ folder where the PDFs usually live and into the ‘/etc/’ folder containing the ‘shadow’ file, which, on Linux, contains password hashes. As you can imagine, that’s really not ideal.
Another variant of path traversal can occur when constructing URLs that are intended to interact with an API. Assume we have an API with the following methods:
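The resolution can be reproduced with a short Python sketch. The input ‘contract.pdf’ is a hypothetical file name, and `posixpath.normpath` only collapses ‘..’ segments lexically, so it approximates what the file system does rather than replicating it (symbolic links, for instance, are not followed):

```python
import posixpath

APP_ROOT = "/var/www/api/"  # application root assumed in the example


def effective_path(filename: str) -> str:
    # Join the way the vulnerable code would, then collapse '..' segments
    # to see which file the request really refers to.
    requested = posixpath.join(APP_ROOT, "documents", filename)
    return posixpath.normpath(requested)


for name in ("contract.pdf", "../../../../etc/shadow"):
    print(f"{name} -> {effective_path(name)}")
# contract.pdf -> /var/www/api/documents/contract.pdf
# ../../../../etc/shadow -> /etc/shadow
```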
The API is interacted with by another application that might call it, say, when trying to get information about an order:
What happens now, depending on the order ID provided by the user? Below is the effective URL invoked for each input.
The canonicalization is usually not done on the client side (though it can be), but web servers will canonicalize the request into the format seen below.
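As a rough illustration, the Python sketch below builds the URL by concatenation and then canonicalizes it the way a web server typically would. The host ‘api.example.test’ and the ‘get’ and ‘delete’ routes are hypothetical stand-ins, chosen only to make the effect visible:

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit

# Hypothetical API layout, assumed for this sketch:
#   /api/orders/get/{id}     returns an order
#   /api/orders/delete/{id}  deletes an order
API_BASE = "https://api.example.test/api/orders"


def build_order_url(order_id: str) -> str:
    # VULNERABLE: the order ID is placed into the path without validation.
    return f"{API_BASE}/get/{order_id}"


def canonicalize(url: str) -> str:
    # Approximates server-side canonicalization by collapsing '..' segments.
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=posixpath.normpath(parts.path)))


for order_id in ("1", "../delete/1"):
    print(f"{order_id} -> {canonicalize(build_order_url(order_id))}")
# 1 -> https://api.example.test/api/orders/get/1
# ../delete/1 -> https://api.example.test/api/orders/delete/1
```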
With the input of the second example, rather than fetching the order with ID number ‘1’, we’ve actually invoked the delete method instead, which of course results in deleting the order.
When discussing path traversal, there are direct mitigations as well as indirect, defense-in-depth techniques that can, and should, be applied as often as possible. First, let’s look at how to handle paths.
When it comes to handling a path, we have to understand the process of path resolution, or path canonicalization, and its importance.
A path like ‘/var/www/api/documents/../../../../etc/shadow’ is non-canonical. If you request this path from your file system, it will canonicalize it to ‘/etc/shadow’. It’s critical that you don’t try to open non-canonical paths. Rather, you should canonicalize the path first, verify that it points only to the intended file or folder, and only then read it.
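The canonicalize-then-verify sequence can be sketched in Python as follows. The function name `resolve_inside` is an assumption made for this example, and the sketch covers only the path check itself:

```python
import os


def resolve_inside(base_dir: str, filename: str) -> str:
    """Return the canonical path for filename, or raise if it leaves base_dir."""
    # Canonicalize both sides: realpath collapses '..' and follows symlinks.
    base = os.path.realpath(base_dir)
    candidate = os.path.realpath(os.path.join(base, filename))
    # commonpath compares whole path components, so a sibling folder such as
    # '/var/www/api/documents-old' is not mistaken for the base folder.
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError("path escapes the base folder")
    return candidate
```

Calling `resolve_inside("/var/www/api/documents", "../../../../etc/shadow")` raises `ValueError` here, while a plain file name returns its canonical path inside the base folder. Only the returned value should then be passed to `open`.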
It may be tempting to do something like this:
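One hypothetical form of that temptation, sketched in Python, is to strip every ‘../’ sequence from the input. The function name `naive_sanitize` is illustrative; the second assertion shows an input that survives the filter:

```python
def naive_sanitize(filename: str) -> str:
    # Tempting but unsafe: remove every '../' sequence in a single pass.
    return filename.replace("../", "")


# The obvious attack is neutralized...
assert naive_sanitize("../../etc/shadow") == "etc/shadow"

# ...but removing the inner '../' from '....//' leaves a fresh '../' behind.
assert naive_sanitize("....//....//etc/shadow") == "../../etc/shadow"
```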
However, this approach should not be used. The key in handling paths is to always look at the canonical path.
As long as the canonical path isn’t breaking any rules, how the path is ultimately constructed doesn’t really make any difference. Trying to sanitize a path like this is very error-prone and rarely secure, if ever.
In our previous examples, we've used the reading of the ‘/etc/shadow’ file, which is the file with password hashes on Linux. But there's really no reason an application should be able to read that file, or other files, outside of its root.
If you employ containers, you’re likely already mitigating a lot of risks. Taking steps to harden the container (not running as root, for example) is vital. It is also strongly recommended to drop all privileges from your web process and to limit its read permissions on the file system to only those files it strictly needs.
Now we’ll look at a few examples to show these ideas in action.
Code that neither resolves the full path nor restricts itself to the file name part of a path is vulnerable to path traversal.
In this example, we protect against path traversal by resolving the full (absolute) path and ensuring that the resolved file path is within our base folder.
In this example, we protect against path traversal by taking only the file name part of the path, so that the request cannot traverse out of the specified folder.
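A minimal Python sketch of the file-name-only approach follows. The function name `document_path` is an assumption for illustration. Note that in Python `os.path.basename("..")` returns ‘..’ itself, so the sketch rejects that value explicitly, along with ‘.’ and an empty name:

```python
import os


def document_path(base_dir: str, filename: str) -> str:
    # Treat backslashes as separators too, then keep only the last component.
    name = os.path.basename(filename.replace("\\", "/"))
    # basename leaves '.' and '..' untouched, so refuse them explicitly.
    if name in ("", ".", ".."):
        raise ValueError("not a usable file name")
    return os.path.join(base_dir, name)
```

With this sketch, a request for ‘../../../../etc/shadow’ is reduced to a file named ‘shadow’ inside the base folder, which simply does not exist there.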