
Web Cache Security Issues: Web Cache Deception and Web Cache Poisoning

How does a web cache work?

To reduce HTTP request latency and relieve the performance stress on application servers, a web application often has some of its web data (for example, images, JS files, CSS files, HTML content, JSON templates, and URLs) copied and stored in a different place, such as your browser, a proxy server, or a CDN, for a certain amount of time; we call this a cache. Once these data are stored in the cache, the cached web data can be served to users directly, rather than asking the application servers to extract the same data over and over again whenever users request it.

In general, web caches can be categorized as client-side caches (browser caches) or remote server-side caches (proxies, CDNs). Data stored in the browser only serves the local user of that browser, whereas cached data on the server side is distributed and served to many users.
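Whether a response may be stored in a shared server-side cache or only in the user's own browser is typically controlled by the Cache-Control response header. The Express sketch below illustrates the idea; the routes and max-age values are illustrative assumptions, not taken from a specific application.

var express = require("express");
var app = express();

// Static asset: safe to cache anywhere (browser, proxy, CDN) for a day.
app.get("/logo.png", function (req, res) {
  res.set("Cache-Control", "public, max-age=86400");
  res.sendFile(__dirname + "/logo.png");
});

// Per-user page: only the user's own browser may cache it, and only briefly.
app.get("/profile", function (req, res) {
  res.set("Cache-Control", "private, max-age=60");
  res.send("YOUR_PROFILE_PAGE");
});

app.listen(3000);

Here public, max-age=86400 allows any cache along the path to store the asset for a day, while private restricts caching to the end user's browser.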

The diagram below depicts how the client-side cache and the server-side cache work in a web request, and how the app server can reduce the requests it handles by utilizing the cache.

Diagram 1: How web cache works

Security Issues in Web Cache

Web caching improves performance and convenience, but it comes with a drawback: new security risks.

Security Issues in the Client-Side Cache (Browser Cache)

The risk with browser-side caches is that sensitive information may be left in the browser cache, where other users with access to the same browser could steal it. The risk is highest on public terminals, such as those found in libraries and Internet cafes. In this article, we will focus on the security issues in the server-side cache.

Security Issues in the Server-Side Cache

The most common security issues discovered in web caches are web cache deception and web cache poisoning. Web cache deception occurs when an attacker tricks a caching server into incorrectly storing the victim's private information, and then gains access to the cached data by requesting it from the cache server. In contrast, web cache poisoning is an attack in which a malicious user stores malicious data in the web cache server, and that malicious data is then distributed to many victims by the cache server.

Difference between Web Cache Deception and Web Cache Poisoning

Engineers are frequently perplexed by the terms web cache deception and web cache poisoning. Let's use the comparison below to tell them apart.

Web Cache Deception

Which data are cached? The victim's private data, cached without the victim's knowledge.

How does an exploit happen?

1. An attacker crafts a path within the application, such as http://example.org/profile.php/noexistening.js, and lures the victim into clicking on the link.

2. Assuming the victim has logged in and the profile.php page contains sensitive data, the victim clicks on the link http://example.org/profile.php/noexistening.js. Due to loose configuration or misconfiguration, the app server accepts the request and returns the data for the page http://example.org/profile.php.

3. Because the content of this page has not yet been cached (step 6 in Diagram 1), the cache server stores the response: the extension noexistening.js makes the cache server treat it as a static file.

4. The victim's sensitive data is now cached in the web cache server under http://example.org/profile.php/noexistening.js.

5. The attacker requests http://example.org/profile.php/noexistening.js and pulls the data directly from the web cache server, as most web cache servers have no authentication implemented.

Is interaction required? Yes. The attacker has to trick the victim into visiting the crafted link, and only victims who access the crafted link are affected.

Web Cache Poisoning

Which data are cached? Malicious data crafted by an attacker.

How does an exploit happen?

1. An attacker identifies and evaluates unkeyed inputs in the HTTP request, mostly headers.

2. The attacker injects a piece of malicious code into the unkeyed inputs and sends the request to the app server.

3. The app server builds the response for the malicious request by consuming the malicious code.

4. The response containing the malicious code is returned to the attacker, and the response content is stored in the web cache server.

5. A victim requests the same page as the attacker and receives the cached data from the web cache server. Because the cached data contains malicious code, the code is executed on the victim's end.

Is interaction required? No. Any user who gets data from the poisoned cache server is affected.

As the comparison above shows, certain prerequisites must be met for a successful web cache deception or web cache poisoning attack.

Prerequisites for web cache deception
  • The cache stores responses based on file extension, disregarding the caching headers.
  • The victim must be authenticated when the attacker tricks them into accessing the crafted link.
  • The application's route handling is loose or misconfigured, so that the web server returns the content of https://yourapplication/profile when a user requests https://yourapplication/profile/test.js. The following snippet is a simple Node.js application with this kind of misconfiguration.
var express = require("express"),
    app = express(); // express.createServer() was removed in Express 3+

// Wildcard route: /profile, /profile/test.js, /profile/anything.css all
// return the same profile content, which is what cache deception needs.
function fooRoute(req, res, next) {
  res.send("YOUR_PROFILE_PAGE");
}
app.get("/profile*", fooRoute);
app.listen(3000);
Prerequisites for web cache poisoning
  • The attacker needs to figure out some unkeyed headers and be able to trigger the backend server to return content containing the malicious payload injected through these unkeyed headers (see the sketch after this list).
  • The content with the malicious payload is cached by the cache server and will be distributed to the victims.
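As a hypothetical illustration of the first prerequisite, the snippet below reflects the X-Forwarded-Host header, a header that caches commonly leave out of the cache key, into the response without any validation. The route and script path are assumptions for demonstration, not taken from a real application.

var express = require("express");
var app = express();

app.get("/", function (req, res) {
  // X-Forwarded-Host is typically NOT part of the cache key, yet it is
  // reflected into a response that a shared cache may store for 5 minutes.
  var host = req.headers["x-forwarded-host"] || req.headers.host;
  res.set("Cache-Control", "public, max-age=300");
  res.send('<script src="//' + host + '/static/app.js"></script>');
});

app.listen(3000);

An attacker who sends X-Forwarded-Host: evil.example receives a response that imports a script from evil.example, and until the cache entry expires, every visitor is served that poisoned response.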

How to Prevent Web Cache Deception and Web Cache Poisoning

It is unlikely that you could ask your engineering team to disable caching altogether. Here are some common mitigation methods that can prevent these kinds of cache issues.

Only Cache Static Files

Caching should be strictly applied to truly static files, whose content does not change based on user input.
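A minimal sketch of this policy in Express (the directory name and max-age are illustrative assumptions): cache only the files served from the static assets directory, and explicitly mark every dynamic response as non-cacheable.

var express = require("express");
var app = express();

// Cache only files under /static, which holds build-time assets.
app.use("/static", express.static("public", { maxAge: "1d" }));

// Every dynamic route explicitly opts out of caching.
app.use(function (req, res, next) {
  res.set("Cache-Control", "no-store");
  next();
});

app.get("/profile", function (req, res) {
  res.send("YOUR_PROFILE_PAGE");
});

app.listen(3000);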

Don’t Accept GET Requests with Suspicious Headers

Some web developers do not implement strict validation of HTTP request headers, since it is really hard for an attacker to modify the headers of a request originating from a victim. However, if such a weakness is combined with a web cache, the damage can be devastating. When the web server processes a GET request, it should run a validation function over the HTTP headers that can influence the response.
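One way to express this as Express middleware is sketched below; the header list is an assumption and should be tailored to the headers your application and cache actually honor.

var express = require("express");
var app = express();

// Headers this application never expects from legitimate clients.
var SUSPICIOUS_HEADERS = ["x-forwarded-host", "x-original-url", "x-rewrite-url"];

app.use(function (req, res, next) {
  for (var i = 0; i < SUSPICIOUS_HEADERS.length; i++) {
    if (req.headers[SUSPICIOUS_HEADERS[i]] !== undefined) {
      return res.status(400).send("Bad request"); // reject, never cache
    }
  }
  next();
});

app.listen(3000);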

Insecure logging could be a burden to your security team

If you are part of a security team, it is very likely that your team has been feverishly remediating the vulnerabilities caused by Log4j in the past two months. It is frustrating work, as the potential damage of this vulnerability could be catastrophic if exploited. The silver lining, at least, is that your development team is trying to implement logging functions in the product for monitoring or debugging purposes.

However, logging itself can be another security issue that is often overlooked, as many developers treat logging as an internal debugging and monitoring function where security enforcement is often missing. I have observed many cases where improper logging functions turned into security incidents and left the security team with a heavy burden to undo the damage.

Apply logging functions with security controls

Before we dive deep into the details, let us look at the following piece of code from an old internal project that I created a while back. If you are using the Node.js Express framework, you can see that this piece of code acts as a middleware to log every single HTTP request, together with the request body, into a log file.
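A minimal sketch of such a middleware, reconstructed here for illustration (assuming Express with JSON body parsing), would be:

var express = require("express");
var fs = require("fs");
var app = express();

app.use(express.json());

// Log EVERY request, including its full body, to a local file.
app.use(function (req, res, next) {
  var entry = new Date().toISOString() + " " + req.method + " " +
      req.originalUrl + " " + JSON.stringify(req.body) + "\n";
  fs.appendFile("requests.log", entry, function (err) {});
  next();
});

app.listen(3000);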

The above piece of code clearly introduces security issues if you review it from a security perspective. First of all, sensitive information in the HTTP request is written to the log files, which could cause a data leak if the log files are accessed by unauthorized users. The internal project is a web application with login and registration functions; as a consequence of the above logging middleware, the registration verification token, as well as the username and password in the HTTP requests, could be leaked into the log files.

Potential Risks Caused by Logging

Risk 1: Sensitive Data are logged in log files

Logging sensitive data without proper masking or filtering is a common security oversight, from startups to enterprises, for many reasons. A couple of years back, Twitter sent out an announcement urging its users to change their passwords because unmasked/unfiltered passwords had been logged into an internal log file.

Reason 1: Security is not baked into the entire SDLC

Many development organizations involve their security teams only at the test phase of the software/service development cycle. Without consulting the security team at the design and development phases, many developers are not aware of which data should be masked or filtered before implementing the logging functions.

One tricky and representative example that I have experienced: a development team obtained a blacklist of data entries, such as IP, password, and token, from a document the security team had created a while back. They applied the filtering to the logging function without consulting the security team. However, the Referer header, which contained sensitive customer API tokens, was still logged because it was not included in the predefined blacklist. The implementation mistake was discovered after the feature had shipped to production, and it took a while to purge the sensitive data from the log systems.

Reason 2: Lack of standard logging functions in a complex environment

With more and more companies adopting microservice architectures and thereby making the development environment more complex, the lack of standard logging functions is another reason sensitive data gets logged and exposed in log files.

The following diagram shows a typical microservice workflow, where the API gateway is exposed to the public to handle API requests and many microservices are deployed in a private VPC to process them. Some developers are probably aware that sensitive data must be filtered out at the API gateway level before being sent to the S3 log system. However, when the requests are handled by an internal microservice (for example, microservice B), the developers might forget to perform the filtering, believing that because the service resides in the internal VPC there is no need to filter sensitive data before writing to the log files. As a result, sensitive data could be logged by some internal microservices.

Reason 3: Insufficient QA and Security Testing

It is common for QA to perform only black-box testing while the security team only runs automated scanning tools against the application to find potential flaws in the code. It is then very difficult for the QA and security teams to catch security issues caused by logging without manual code review.

Risk 2: Malicious data are processed and logged without validation

Another risk when implementing logging functions and writing data to logs is that malicious data is processed and executed without any validation. You might wonder why you should perform validation when processing log data and storing it in a log file, since the entire purpose of logging is to capture the raw data and use it for analysis.

The reason is that you might be at risk of deserialization exploitation when validation is absent from your logging functions. I have seen many developers dump an entire object into the log file. When this happens, they are likely to use a serialization function to serialize the object and write it to the log file, and later deserialize the logged data for analysis. In that case, you may be exposed to deserialization exploitation.
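As a hypothetical sketch of the risky pattern, the snippet below uses the node-serialize package, which is known to execute code embedded in a crafted payload during unserialize; the file names are assumptions for illustration.

var serialize = require("node-serialize");
var fs = require("fs");

// Risky pattern: dump whole objects into the log...
function logObject(obj) {
  fs.appendFileSync("objects.log", serialize.serialize(obj) + "\n");
}

// ...then deserialize log lines later for analysis. If an attacker can
// influence what ends up in the log (for example via log injection), a
// crafted line can execute code inside unserialize().
function analyzeLine(line) {
  return serialize.unserialize(line); // unsafe on untrusted data
}

Preferring plain JSON.stringify/JSON.parse avoids this class of problem, because JSON parsing never executes code.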

Take Log4j as an example: besides the Log4Shell vulnerability, it has suffered from a couple of deserialization vulnerabilities where untrusted log data could lead to remote code execution.

Some Best Practices

To ensure your logging function does not become a burden to your security team, or even turn against you by leading to a security incident, some best practices should be followed.

Involve security at every phase of the SDLC when implementing logging functions

Security applies at every phase of the software development life cycle (SDLC). If you don't have a security review procedure set up in your organization, it is time to define one now. Designing a secure logging function or log management feature in collaboration with your security team will save your organization much time and effort. For example, if your logging function is going to log all requests, the security team might ask you to prefer POST over GET so that sensitive parameters do not end up in logged URLs. They could also ask you to mask certain sensitive data as soon as the data is processed, before it gets written to the log files.
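A sketch of masking at processing time (the field names are assumptions; adapt them to your data model):

// Mask sensitive fields the moment a request body is processed,
// so anything logged afterwards is already redacted.
var SENSITIVE_FIELDS = ["password", "token", "verification_code"];

function maskSensitive(body) {
  var masked = Object.assign({}, body);
  SENSITIVE_FIELDS.forEach(function (field) {
    if (masked[field] !== undefined) {
      masked[field] = "***REDACTED***";
    }
  });
  return masked;
}

// Usage: log the masked copy, never the raw body, e.g.
// fs.appendFile("requests.log", JSON.stringify(maskSensitive(req.body)) + "\n", function () {});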

Implement a standard and centralized logging function

As the organization gets larger and larger, its platform and services grow more sophisticated. Without a standard or centralized logging function, each team is forced to choose its own way of implementing logging in the services it owns. This adds many security uncertainties, as you cannot foresee how each logging function is implemented.
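One way to reduce that uncertainty is a single shared logger module that every service imports, so that the redaction logic lives in one place. A minimal sketch (the module layout and field list are assumptions):

// logger.js - the one logging module every service imports.
var fs = require("fs");

var SENSITIVE_FIELDS = ["password", "token", "authorization", "referer"];

function redact(fields) {
  var out = {};
  Object.keys(fields).forEach(function (key) {
    out[key] = SENSITIVE_FIELDS.indexOf(key.toLowerCase()) >= 0
      ? "***REDACTED***"
      : fields[key];
  });
  return out;
}

// Every service calls logger.log(...) instead of rolling its own logging.
module.exports.log = function (event, fields) {
  var entry = {
    time: new Date().toISOString(),
    event: event,
    fields: redact(fields || {})
  };
  fs.appendFile("service.log", JSON.stringify(entry) + "\n", function () {});
};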

Consistently monitor and scan your log data

Sometimes, unexpected data can still end up in the log files even though you have set up strict logging functions to mask or filter all sensitive data. For example, your clients might not follow your API usage guidance and might send sensitive data when calling your API endpoints; your logging function would then write that data to the log files, because the API is not being used as it was designed. In this case, you need a monitoring tool that scans your log data and checks whether unexpected sensitive data is being written to the log files.
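A toy sketch of such a scanner is below; the patterns and log file name are illustrative assumptions, and real deployments typically rely on dedicated secret-scanning tooling.

var fs = require("fs");

// Patterns that should never appear in log files.
var PATTERNS = {
  email: /[\w.+-]+@[\w-]+\.[\w.]+/,
  aws_access_key: /AKIA[0-9A-Z]{16}/,
  bearer_token: /Bearer\s+[A-Za-z0-9\-_.]+/
};

fs.readFileSync("service.log", "utf8").split("\n").forEach(function (line, i) {
  Object.keys(PATTERNS).forEach(function (name) {
    if (PATTERNS[name].test(line)) {
      console.log("line " + (i + 1) + ": possible " + name + " leaked into log");
    }
  });
});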

Understand the data you are logging

“If you cannot measure it, you cannot manage it.” This applies to security as well. If you don’t know what kind of data you are logging, you cannot really secure it. For example, if you dump an entire object into your log file by calling a serialization function without validating the object’s data, you are likely to log malicious data, which could lead to an exploit.