CORS, Cache poisoning and the Vary HTTP header

Understanding the Vary header is key to ensuring CORS works with CDNs and caching.

What is CORS

Taken from the MDN web docs

Cross-Origin Resource Sharing (CORS) is a mechanism that uses additional HTTP headers to tell browsers to give a web application running at one origin, access to selected resources from a different origin. A web application executes a cross-origin HTTP request when it requests a resource that has a different origin (domain, protocol, or port) from its own.

In other words, a site hosted on one domain can access certain resources on another domain, provided the remote domain allows it. This most commonly occurs with AJAX requests.

The Origin header and cache poisoning

Out of the box, most CDNs (and other reverse proxies) do not vary their cache by arbitrary HTTP request headers. This can leave you in a situation where your cache can be poisoned.

For example, suppose an attacker requests:

curl -sIXGET -H "Origin: https://evil-site.com" "https://remote.example.com/resource" | grep -E '^Access|^Cache|^Vary' | sort

Cache-Control: max-age=900, public

Notice how:

  • There are no CORS headers in the response
  • There is no Vary header in the response
  • There are caching headers that indicate that CDNs should cache the response

Subsequent requests to the URL with a legitimate Origin HTTP header receive the same cached response, which lacks the CORS headers, so the browser blocks the cross-origin request. This is known as cache poisoning.
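The sequence above can be reproduced with a toy model. This is only a sketch, not how any particular CDN is implemented: `origin_server`, `NaiveCache` and the `ALLOWED_ORIGINS` allow-list are hypothetical names made up for illustration, and the "CDN" is just a dictionary keyed by URL.

```python
from typing import Callable, Dict, Optional

Headers = Dict[str, str]

# Hypothetical allow-list; a real site would define its own.
ALLOWED_ORIGINS = {"https://www.example.com"}


def origin_server(url: str, origin: Optional[str]) -> Headers:
    """Toy remote site: adds CORS headers for allowed origins, never sends Vary."""
    headers = {"Cache-Control": "max-age=900, public"}
    if origin in ALLOWED_ORIGINS:
        headers["Access-Control-Allow-Origin"] = origin
    return headers


class NaiveCache:
    """Toy CDN that keys its cache on the URL alone and ignores Origin."""

    def __init__(self, backend: Callable[[str, Optional[str]], Headers]) -> None:
        self.backend = backend
        self.store: Dict[str, Headers] = {}

    def get(self, url: str, origin: Optional[str] = None) -> Headers:
        if url not in self.store:
            # Cache miss: whatever the backend says now is served to everyone later.
            self.store[url] = self.backend(url, origin)
        return self.store[url]


cdn = NaiveCache(origin_server)
url = "https://remote.example.com/resource"

# The attacker gets in first, so the response without CORS headers is cached.
poisoned = cdn.get(url, origin="https://evil-site.com")

# A legitimate origin now receives the attacker's cached response.
victim = cdn.get(url, origin="https://www.example.com")
print("Access-Control-Allow-Origin" in victim)  # False
```

The second call never reaches the toy origin server, which is exactly why the legitimate request comes back without its CORS headers.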

How to solve

There are two ways to solve this.

Option #1 - change the CDN

The first method is to configure your CDN to include the Origin HTTP header in its cache key. I don't typically recommend this approach, as it pushes application logic into your CDN. It also means every response is varied by Origin, including pages that do not need to be, which reduces your cache hit rate.
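To make the hit rate cost concrete, here is a rough sketch of a cache whose key always includes Origin. The `cache_key` and `fetch` helpers and the example origins are hypothetical; real CDNs expose this through their own configuration rather than code like this.

```python
from typing import Dict, Optional, Tuple

store: Dict[Tuple[str, str], str] = {}
fetches = 0  # number of times the toy cache had to go back to the remote site


def cache_key(url: str, origin: Optional[str]) -> Tuple[str, str]:
    """Cache key that always includes Origin, even when the response ignores it."""
    return (url, origin or "")


def fetch(url: str, origin: Optional[str]) -> str:
    """Return the cached body for (url, origin), fetching it on a miss."""
    global fetches
    key = cache_key(url, origin)
    if key not in store:
        fetches += 1
        store[key] = f"body of {url}"
    return store[key]


# The same static asset requested by different origins (and once with no Origin).
requests = ["https://a.example", "https://b.example", None, "https://a.example"]
for origin in requests:
    fetch("https://remote.example.com/logo.png", origin)

print(fetches)  # 3
```

Three of the four requests miss, even though the body is identical every time. With a URL-only key the same traffic would have produced a single miss.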

Option #2 - change the remote site

The second approach is to have the site serving the remote resource add a Vary HTTP header to its responses. This instructs any CDN to vary its cache key by the value of the request header it names (here, Origin). See the MDN docs for more background on this magical header.

For example, this is what you should be looking for:

curl -sIXGET -H "Origin: https://www.example.com" "https://remote.example.com/resource" | grep -E '^Access|^Cache|^Vary' | sort

Access-Control-Allow-Credentials: true
Access-Control-Allow-Headers: DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type
Access-Control-Allow-Methods: GET, POST, OPTIONS
Access-Control-Allow-Origin: https://www.example.com
Cache-Control: max-age=900, public
Vary: Origin

Note the Vary header.
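As a rough illustration of why this works, the sketch below pairs a toy remote site that sends Vary: Origin with a toy cache that honours it. The names `origin_server`, `VaryAwareCache` and `ALLOWED_ORIGINS` are hypothetical, and a real cache does considerably more (header normalisation, expiry, and so on).

```python
from typing import Callable, Dict, Tuple

Headers = Dict[str, str]

# Hypothetical allow-list; a real site would define its own.
ALLOWED_ORIGINS = {"https://www.example.com"}


def origin_server(url: str, request_headers: Headers) -> Headers:
    """Toy remote site: always sends Vary: Origin, CORS headers only when allowed."""
    origin = request_headers.get("Origin")
    response = {"Cache-Control": "max-age=900, public", "Vary": "Origin"}
    if origin in ALLOWED_ORIGINS:
        response["Access-Control-Allow-Origin"] = origin
    return response


class VaryAwareCache:
    """Toy CDN that extends its cache key with the request headers named in Vary."""

    def __init__(self, backend: Callable[[str, Headers], Headers]) -> None:
        self.backend = backend
        self.vary_names: Dict[str, Tuple[str, ...]] = {}
        self.store: Dict[Tuple[str, Tuple[str, ...]], Headers] = {}

    @staticmethod
    def _key(url: str, request_headers: Headers, names: Tuple[str, ...]):
        return (url, tuple(request_headers.get(name, "") for name in names))

    def get(self, url: str, request_headers: Headers) -> Headers:
        names = self.vary_names.get(url)
        if names is not None:
            key = self._key(url, request_headers, names)
            if key in self.store:
                return self.store[key]
        # Miss: fetch, then remember which request headers this URL varies on.
        response = self.backend(url, request_headers)
        vary = response.get("Vary", "")
        names = tuple(n.strip() for n in vary.split(",") if n.strip())
        self.vary_names[url] = names
        self.store[self._key(url, request_headers, names)] = response
        return response


cdn = VaryAwareCache(origin_server)
url = "https://remote.example.com/resource"

attacker = cdn.get(url, {"Origin": "https://evil-site.com"})
victim = cdn.get(url, {"Origin": "https://www.example.com"})
print(victim.get("Access-Control-Allow-Origin"))  # https://www.example.com
```

The attacker's response is still cached, but only under the attacker's own Origin value, so it can no longer be served to anyone else.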

Once you have the Vary header in place, you will need to drop the entire cache of the site in the CDN to ensure no poisoned entries remain.
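After the purge, it can be handy to check the headers from the curl commands above automatically. The helper below is a crude heuristic of my own, not a standard tool: `parse_headers` and `looks_poisonable` are hypothetical names, and the check only flags responses that look cacheable but do not vary on Origin.

```python
from typing import Dict


def parse_headers(raw: str) -> Dict[str, str]:
    """Parse 'Name: value' lines (e.g. from curl -I) into a lowercase-keyed dict."""
    headers: Dict[str, str] = {}
    for line in raw.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def looks_poisonable(raw: str) -> bool:
    """Heuristic: the response looks cacheable but does not vary on Origin."""
    headers = parse_headers(raw)
    cache_control = headers.get("cache-control", "").lower()
    cacheable = "public" in cache_control or "max-age" in cache_control
    vary = {token.strip().lower() for token in headers.get("vary", "").split(",")}
    return cacheable and "origin" not in vary and "*" not in vary


before = "Cache-Control: max-age=900, public"
after = "\n".join([
    "Access-Control-Allow-Origin: https://www.example.com",
    "Cache-Control: max-age=900, public",
    "Vary: Origin",
])

print(looks_poisonable(before))  # True
print(looks_poisonable(after))   # False
```

A response that sends a fixed Access-Control-Allow-Origin value for every request would be flagged too, even though it is harmless, so treat a positive result as a prompt to look closer rather than as proof of a problem.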