CORS, Cache poisoning and the Vary HTTP header
Understanding the Vary header is key to ensuring CORS works with CDNs and caching.
What is CORS
Taken from the MDN web docs
Cross-Origin Resource Sharing (CORS) is a mechanism that uses additional HTTP headers to tell browsers to give a web application running at one origin, access to selected resources from a different origin. A web application executes a cross-origin HTTP request when it requests a resource that has a different origin (domain, protocol, or port) from its own.
So one site hosted on one domain can access certain resources of another domain, provided the remote domain has allowed this to happen. Most commonly this occurs with AJAX.
The Origin header and cache poisoning
Out of the box, most CDNs (and other reverse proxies) will not vary their cache by arbitrary HTTP headers. So you can be left in a situation where your cache can be poisoned.
E.g. if an attacker requested:
curl -sIXGET -H "Origin: https://evil-site.com" "https://remote.example.com/resource" | grep -E '^Access|^Cache|^Vary' | sort
Cache-Control: max-age=900, public
Notice how:
- There are no CORS headers in the response
- There is no Vary header in the response
- There are caching headers that indicate that CDNs should cache the response
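Subsequent requests to the URL with the correct Origin HTTP header will yield the same response (as it is cached). That cached response has no CORS headers, so the browser will block the legitimate cross-origin request. This is known as cache poisoning.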
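To make the failure concrete, here is a minimal sketch of a cache that keys entries by URL alone. Everything in it is a toy assumption for illustration: `origin_server`, `UrlOnlyCache` and the single allowed origin are hypothetical names, not how any particular CDN is implemented.

```python
from typing import Callable, Dict, Optional

# Assumption: the remote site allows exactly one origin.
ALLOWED_ORIGIN = "https://www.example.com"

Headers = Dict[str, str]


def origin_server(url: str, origin: Optional[str]) -> Headers:
    """Toy backend: adds CORS headers only for the allowed origin, and no Vary."""
    headers = {"Cache-Control": "max-age=900, public"}
    if origin == ALLOWED_ORIGIN:
        headers["Access-Control-Allow-Origin"] = origin
    return headers


class UrlOnlyCache:
    """Toy shared cache that ignores request headers when building its key."""

    def __init__(self, backend: Callable[[str, Optional[str]], Headers]) -> None:
        self.backend = backend
        self.store: Dict[str, Headers] = {}

    def get(self, url: str, origin: Optional[str] = None) -> Headers:
        # The Origin header plays no part in the lookup.
        if url not in self.store:
            self.store[url] = self.backend(url, origin)
        return self.store[url]


poison_cache = UrlOnlyCache(origin_server)
resource = "https://remote.example.com/resource"

# The attacker's request is the first one in, so its response gets cached.
attacker_response = poison_cache.get(resource, origin="https://evil-site.com")

# The legitimate site now receives the attacker's cached response.
legit_response = poison_cache.get(resource, origin=ALLOWED_ORIGIN)
print("Access-Control-Allow-Origin" in legit_response)  # False
```

The legitimate request never reaches the backend, so it never gets the CORS headers it would otherwise have been given.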
How to solve
There are 2 ways to solve this.
Option #1 - change the CDN
The first method is to alter your CDN to add the Origin HTTP header into the cache key definition. I don't typically recommend this approach as it forces your application logic into your CDN. It also means that responses that do not need to be varied will be, which reduces your cache hit rate.
Option #2 - change the remote site
The second approach is to get the site you are serving the remote resource from to add a Vary HTTP header, which instructs any CDNs to vary their cache key by the request header it names (here, Origin). See the MDN docs for more background on this magical header.
E.g. this is what you should be looking for:
curl -sIXGET -H "Origin: https://www.example.com" "https://remote.example.com/resource" | grep -E '^Access|^Cache|^Vary' | sort
Access-Control-Allow-Credentials: true
Access-Control-Allow-Headers: DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type
Access-Control-Allow-Methods: GET, POST, OPTIONS
Access-Control-Allow-Origin: https://www.example.com
Cache-Control: max-age=900, public
Vary: Origin
Note the Vary header.
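The sketch below shows both halves of this option under simplifying assumptions: a toy backend that always sends `Vary: Origin`, and a toy cache that honours it. `vary_origin_server`, `VaryAwareCache` and `ALLOWED_ORIGINS` are hypothetical names, header names are matched case-sensitively for brevity, and real CDNs handle Vary in their own ways.

```python
from typing import Callable, Dict, List, Tuple

Headers = Dict[str, str]

# Assumption: the remote site keeps an allow-list of origins.
ALLOWED_ORIGINS = {"https://www.example.com"}


def vary_origin_server(url: str, request_headers: Headers) -> Headers:
    """Toy backend: always sends Vary: Origin, and CORS headers for allowed origins."""
    headers = {"Cache-Control": "max-age=900, public", "Vary": "Origin"}
    origin = request_headers.get("Origin")
    if origin in ALLOWED_ORIGINS:
        headers["Access-Control-Allow-Origin"] = origin
    return headers


class VaryAwareCache:
    """Toy shared cache that extends its key with the headers named in Vary."""

    def __init__(self, backend: Callable[[str, Headers], Headers]) -> None:
        self.backend = backend
        self.vary_by_url: Dict[str, List[str]] = {}
        self.store: Dict[Tuple[str, Tuple[str, ...]], Headers] = {}

    def _key(self, url: str, request_headers: Headers) -> Tuple[str, Tuple[str, ...]]:
        names = self.vary_by_url.get(url, [])
        return (url, tuple(request_headers.get(name, "") for name in names))

    def get(self, url: str, request_headers: Headers) -> Headers:
        key = self._key(url, request_headers)
        if url in self.vary_by_url and key in self.store:
            return self.store[key]
        response = self.backend(url, request_headers)
        # Remember which request headers this URL varies on.
        vary = response.get("Vary", "")
        self.vary_by_url[url] = [n.strip() for n in vary.split(",") if n.strip()]
        self.store[self._key(url, request_headers)] = response
        return response


vary_cache = VaryAwareCache(vary_origin_server)
target = "https://remote.example.com/resource"

evil = vary_cache.get(target, {"Origin": "https://evil-site.com"})
good = vary_cache.get(target, {"Origin": "https://www.example.com"})
print(good.get("Access-Control-Allow-Origin"))  # https://www.example.com
```

The attacker's response is still cached, but only under the attacker's own Origin value, so it is never served to the legitimate site.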
Once you have the Vary header in place, you will need to drop the entire cache of the site in the CDN to ensure there are no entries left that are poisoned.