Faster Pageloads: Effectively using HTTP Caching, Cache Busting, and a CDN
September 15, 2014 - Best Practices・Product Updates
It seems HTTP caching is one of those things few devs ever really need to think about. We expect webservers to cache assets intelligently, and largely ignore caching except when testing, when we’re sure to use the tricks we’ve used for years: hard refreshing, dev tools cache clearing, maybe an extension to simplify those.
But ultimately, that’s about the extent of it. Caching works great! Or rather, it works well enough, and we have more important things to do.
Then one day…
Then one day, things start to get complicated. For us, it was two realizations: our application servers cost far more than normal due to our PCI compliance requirements; and our logging showed something like 90%+ of requests were for nearly static javascript and CSS includes (loaded on every pageload of every site using FoxyCart).
The solution was clear: add a CDN and some `Cache-Control` headers. BOOM! Problem solved. In our first month, our CDN did something like 50GB worth of traffic, serving entirely tiny CSS and JS files. Big, big win for our servers' load, our hosting costs, and our users (a la faster pageloads).
The only minor issue was cache busting. Occasionally we'd need to push out a bug fix or a change to the javascript or CSS. We could clear the CDN's cache, but browsers that had already cached the old version wouldn't re-request the file until it expired. For years, we used the old query string method, simply appending a `?v=2` and incrementing the integer as needed. (Common wisdom says some proxies don't cache query-stringed URLs correctly, but we've never seen any real data about how prevalent those proxies are, so … yeah.)
With FoxyCart v2.0, however, that approach became untenable. Unlike previous versions, FoxyCart 2.0 has tons of template configuration options that result in much more dynamic CSS and JS includes. Whereas before we could simply say “Copy/paste this bit of HTML into your templates” (and that bit of HTML would have the appropriate querystring), we now needed much better cache busting control.
How Browser Caching Works: A Quick Primer
"Caching" refers to a browser saving a file locally, then reusing it later instead of re-requesting the same asset from the server. This works for HTML, javascript, images… just about anything a webserver can serve up.
For example, if your website has some large images, it doesn’t make sense for the browser to re-download them every pageload if the images haven’t actually changed; the webserver and browser should figure out “Hey, this file hasn’t changed, so don’t bother re-downloading it.” Faster browsing and lower loads on servers. Win all around!
The way the browser and server figure this out is using request and response headers, which is just the term given to the extra pieces of info that automatically get sent back and forth. When the webserver responds to the browser’s request, the response includes the image itself as well as the headers telling the browser how to cache the file (among other things). The headers we care about for today are:
- `Cache-Control` tells the browser whether or not to cache the file at all, and if so, for how long. For example, `max-age=600` tells the browser to use its cache for the next 600 seconds (10 minutes). During this time, the browser won't even bother to talk to the server about this particular image.
- `ETag` contains a unique fingerprint for the file, generated by the server (typically a hash). Once the `max-age` has passed, the browser will send the ETag value with its request (in an `If-None-Match` header), which basically tells the server "Hey, I've got this particular version of the file." If the server still has the same version of the file (as represented by the identical `ETag`), the server will respond with `304 Not Modified`, which is a computer way of saying "It's cool, bro. Just use what you have. That file is still fresh." (If the `ETag` is different, the server will respond with the updated file and the browser will cache that instead.)
- `Last-Modified` is effectively identical to the `ETag`, except it uses a date instead of a hash. You only want to use one or the other.
- `Expires` is an older way to do `Cache-Control`, and isn't really needed anymore.
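To make that dance concrete, here's a rough sketch of the exchange using `fetch` (Node 18+ or a browser console, run in an async context). The URL is a placeholder, and whether you actually get an `ETag` back depends on the server:

```js
// First visit: a full 200 response, with caching instructions in the headers.
const url = "https://example.com/images/homepage.jpg"; // placeholder URL
const first = await fetch(url);
console.log(first.status);                       // 200
console.log(first.headers.get("cache-control")); // e.g. "max-age=600"
console.log(first.headers.get("etag"));          // e.g. '"5d8c72a5edda8"'

// Later, once max-age has passed, the browser revalidates with If-None-Match.
const second = await fetch(url, {
  headers: { "If-None-Match": first.headers.get("etag") },
});

// 304 means "use your cached copy"; 200 means the file changed and a fresh
// body came down the wire. Real browsers do all of this automatically.
console.log(second.status);
```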
Seems pretty straightforward, right? And it is, except when you change a file. Let's say you decide to change your homepage image or contact page HTML, but your returning visitors already have the old versions cached. They won't see the new version, and that can be a problem (seeing the wrong promotion, calling the wrong number, seeing sensitive information that wasn't supposed to go public, etc.).

The easy way around this is to give the image or webpage a new filename. Instead of `homepage.jpg` you'd use `homepage_new.jpg` or `homepage.jpg?v=2`. The browser will say, "Oh, that's something new! Better go get it from the server!" This is called "cache busting", and it's a widely used technique to ensure returning visitors see new content. It works well, but it relies on controlling the assets the browser requests. For something like FoxyCart, which has a copy/paste bit of HTML that should be "set it and forget it", we can't very well tell our users to update their sites every time we make changes (or every time they change their store settings).
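For a site where you do control the markup, the technique can be as small as a helper that stamps a version onto asset URLs. A minimal, hypothetical sketch (nothing FoxyCart-specific here):

```js
// Bump ASSET_VERSION whenever the underlying files change; every cached copy
// of the old URL becomes irrelevant because the browser sees a brand-new URL.
const ASSET_VERSION = 2;

function busted(url) {
  return `${url}?v=${ASSET_VERSION}`;
}

// busted("/images/homepage.jpg") -> "/images/homepage.jpg?v=2"
document.querySelector("#hero").setAttribute("src", busted("/images/homepage.jpg"));
```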
CDNs and Caching
Another dimension of caching is with Content Delivery Networks (CDNs). Briefly, a CDN is a large number of servers all over the world that serve up content. So instead of a browser getting the file from your single server in Texas, the browser gets the file from your CDN. This has two huge benefits:
- CDNs are usually global, so a visitor in Australia will get files from a server in Australia. Much faster for the visitors.
- The requests don't hit your own server, so your servers have to do less work (and therefore cost you less in CPU usage and bandwidth).
There are two basic approaches to getting content on a CDN. The first is to treat the CDN like an FTP site, and just push up what you want. The second (which we use) is for the CDN to act as a proxy between the browser and the webserver. When a browser requests a file, it makes the request to the CDN. The CDN then checks its own cache to see if it has an appropriate file to serve up. If so, it serves it. If not, it makes the request to the webserver. The webserver responds (with all the cache headers it wants), and the CDN then serves that back to the browser (and caches it according to the cache headers it just received).
There are two important differences between a CDN’s caching and a browser’s, however:
- Unlike a browser's cache, we can purge the CDN's cache on demand.
- We can set an `s-maxage` (the `s` is for "shared") for the CDN to respect, which it will use instead of the `max-age`. That is, we can tell the CDN to cache something for longer than the browser does, a trick we use (explained below).
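If it helps to see that proxy behavior spelled out, here's a toy, in-memory version of what a pull CDN's edge does. Real CDNs are vastly more sophisticated; `ORIGIN`, the TTL parsing, and everything else here are assumptions for illustration only:

```js
// Toy pull-CDN edge: serve from cache when fresh, otherwise fetch from the
// origin and cache according to s-maxage (falling back to max-age).
import http from "node:http";

const ORIGIN = "https://origin.example.com"; // hypothetical origin server
const cache = new Map(); // url -> { status, type, body, expires }

http.createServer(async (req, res) => {
  const hit = cache.get(req.url);
  if (hit && hit.expires > Date.now()) {
    res.writeHead(hit.status, { "content-type": hit.type, "x-cache": "HIT" });
    return res.end(hit.body); // the origin never sees this request
  }

  const upstream = await fetch(ORIGIN + req.url);
  const body = Buffer.from(await upstream.arrayBuffer());
  const type = upstream.headers.get("content-type") || "application/octet-stream";

  // A shared cache prefers s-maxage over max-age (checked first on purpose).
  const cc = upstream.headers.get("cache-control") || "";
  const ttl = Number((cc.match(/s-maxage=(\d+)/) || cc.match(/max-age=(\d+)/) || [])[1] || 0);
  if (ttl > 0) {
    cache.set(req.url, { status: upstream.status, type, body, expires: Date.now() + ttl * 1000 });
  }

  res.writeHead(upstream.status, { "content-type": type, "x-cache": "MISS" });
  res.end(body);
}).listen(8080);
```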
Our Goals with FoxyCart v2.0
To understand our approach, let's first run through how FoxyCart 2.0 works. We need the user's website to have some javascript and CSS so we can get our Sidecart going.
To do that, we need:
- Generic JavaScript, not account-specific except for two variables needed to attach handlers and manage sessions; those values change very, very rarely. (~120KB)
- Account-specific JavaScript, with the store's configuration (language strings, cart template customizations, etc.). (~22KB)
- CSS, mostly generic but with account-specific customizations. This could change frequently, especially during development. (~40KB)
What we want seems straightforward:
- Serve up the right files. Especially during development, these files can change frequently; we don't want the CDN or the browser caching and loading an old version of a file.
- Utilize a CDN to reduce pageload times and reduce load on our servers.
- Use local (browser) caching to ensure immediate pageloads.
Our Solution: Script Loader + CDN + Caching + Automatic Cache-Busting + localStorage
The solution isn't exactly obvious, but once you understand the pieces, it's wonderfully straightforward. The first and most important piece of it all is a quickie script loader. This is a change from copy/pasting HTML that references the JS and CSS directly (as in previous FoxyCart versions).
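The include is a single `script` element along these lines (the store-specific path is illustrative, borrowed from the demo URLs that appear later in this post):

```html
<script src="//cdn.foxycart.com/foxycart-demo/loader.js" async defer></script>
```

This single file has quite a bit to it. Let's start with the script tag itself.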
ASYNC and DEFER
You'll notice the `async` and `defer` attributes on the `script` element. More info is available from the reference links at the bottom, but the basic idea is to ensure the JS doesn't slow down the document's load event.
The loader.js Cache-Control Headers
`loader.js` is where most of the magic happens. First, let's look at the response headers:
- `Cache-Control` is `max-age=90, s-maxage=21600, public`.
- `ETag` is computed for the current output.

The browser will cache this file for 90 seconds (the `max-age`), during which time new pageloads won't even make an HTTP request to our CDN. Instead, the browser will happily use what it's already got. Once the response is more than 90 seconds old, subsequent pageloads will make a request to the CDN, and that request includes the `ETag` value.
The CDN will check the `ETag` value it has, and if it matches the one in the request, it'll respond with `304 Not Modified`, prompting the browser to use what it had before. If the `ETag` doesn't match (as would be the case if the file had changed and we'd purged the CDN's cache), the CDN will serve the latest file.
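Putting those headers together on the origin looks roughly like the following Express sketch. It's illustrative only: `buildLoaderJs` is a hypothetical stand-in, and our actual stack differs.

```js
// Illustrative origin handler for loader.js: short browser cache, long CDN
// cache, and ETag-based revalidation in between.
import express from "express";
import { createHash } from "node:crypto";

// Hypothetical stand-in for whatever renders the store-specific loader output.
function buildLoaderJs(store) {
  return `/* loader for ${store} */`;
}

const app = express();

app.get("/:store/loader.js", (req, res) => {
  const body = buildLoaderJs(req.params.store);
  const etag = `"${createHash("md5").update(body).digest("hex")}"`;

  // Browsers: 90 seconds. Shared caches (the CDN): 6 hours.
  res.set("Cache-Control", "max-age=90, s-maxage=21600, public");
  res.set("ETag", etag);

  // If the client (or CDN) already has this exact version, skip the body.
  if (req.get("If-None-Match") === etag) return res.status(304).end();
  res.type("application/javascript").send(body);
});

app.listen(3000);
```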
Why not set the `max-age` to 6 hours directly and skip those requests? Good question. Since the javascript that `loader.js` serves up can change rapidly, we don't want a possible lag between a dev making a change and a customer seeing the change. Otherwise we have the "It's broken!" cry from the customer and the "Oh… sorry, you need to clear your cache or wait 6 hours" response from the dev. Once the browser loads `loader.js`, it'll respect the `max-age`, and there's nothing we can do about that.

What we can do something about, however, is the CDN cache. Whenever a store's settings change, we clear that store's cache on the CDN. And because we can set the CDN's cache separately using the `s-maxage` parameter, the CDN responds to the vast majority of requests without ever hitting our servers.

Why 90 seconds and 6 hours? It's somewhat arbitrary. We use 90 seconds so the browser cache cuts out at least some requests, but it's short enough that most users won't see a stale copy for more than a pageload or two. We could go longer on the CDN cache, since we can purge it at will, but 6 hours is a safe range: if a cache purge request fails, at least we aren't serving a stale file for a week (or month, or year).
The loader.js Response
So `loader.js` is cached, but what's it actually doing? Loading stuff, as you might have guessed. Here's what the response looks like:
```js
var fc_css=document.createElement("link");fc_css.setAttribute("rel","stylesheet");fc_css.setAttribute("media","screen");fc_css.setAttribute("href","//cdn.foxycart.com/foxycart-demo/responsive_styles.1409442579.css");var fc_script=document.createElement("script");window.jQuery&&(1<=window.jQuery.fn.jquery.match(/(\d+).(\d+)/)[1]&&7<window.jQuery.fn.jquery.match(/(\d+).(\d+)/)[2]||2<=window.jQuery.fn.jquery.match(/(\d+).(\d+)/)[1])?fc_script.src="http://cdn.foxycart.com/foxycart-demo/foxycart.jsonp.sidecart.min.1410278735.js":fc_script.src="http://cdn.foxycart.com/foxycart-demo/foxycart.jsonp.sidecart.with-jquery.min.1410278735.js";function fc_loader(){document.getElementsByTagName("body")[0].appendChild(fc_script);document.getElementsByTagName("body")[0].appendChild(fc_css)}window.addEventListener?window.addEventListener("load",fc_loader,!1):window.attachEvent?window.attachEvent("onload",fc_loader):window.onload=fc_loader;
```
Tons of javascript, but it's basically adding a `script` tag (after checking for jQuery's presence) and a `link` tag for the CSS. The key is that both of those files include timestamps in their names (the `1409442579` and `1410278735`). Those timestamps get generated and included in the output automatically, based on the modification dates of the files on the server, and a quick bit of server rewrite rules handles the requests (made by the CDN to our system). The timestamps don't actually reference a specific version of the file; they're simply an effective way of cache busting (like the query string approach).
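Here's the gist of that trick, sketched in Node. Our actual implementation differs; the helper name and the Express route below are illustrative stand-ins for our rewrite rules:

```js
import { statSync } from "node:fs";
import express from "express";

// Generating the busted URL: the file's mtime becomes part of the filename,
// so any change to the file automatically produces a never-before-seen URL.
function timestampedUrl(cdnBase, file) {
  const mtime = Math.floor(statSync(`public/${file}`).mtimeMs / 1000); // e.g. 1409442579
  return `${cdnBase}/${file.replace(/\.(js|css)$/, `.${mtime}.$1`)}`;
}

// timestampedUrl("//cdn.foxycart.com/foxycart-demo", "responsive_styles.css")
// -> e.g. "//cdn.foxycart.com/foxycart-demo/responsive_styles.1409442579.css"

// On the origin, a rewrite strips the timestamp back out, so the timestamped
// URL never needs to exist as a real file on disk.
const app = express();
app.get(/^\/(.+)\.\d{10}\.(js|css)$/, (req, res) => {
  res.sendFile(`${req.params[0]}.${req.params[1]}`, { root: "public" });
});
```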
Other JS and CSS
The CSS and JS that `loader.js` actually loads get their own `Cache-Control` headers setting the `max-age` to 30 days, and they get `ETag`s as well. That's pretty standard, and we can be comfortable caching those for 30 days because we have automatic cache busting.
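In the same illustrative Express setup as above, serving those long-lived assets could be as simple as:

```js
import express from "express";

const app = express();

// Safe to cache aggressively: any change to a file changes its mtime, which
// changes the timestamped URL that loader.js hands out.
app.use(express.static("public", {
  maxAge: "30d", // sends Cache-Control: public, max-age=2592000
  etag: true,    // on by default; shown here for emphasis
}));

app.listen(3000);
```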
localStorage
So we've got the loader and its "loadee" files handled. What's left? For our purposes, there's one more piece. With the new Sidecart approach, we're no longer using an `iframe` like we used to, so every aspect of Sidecart needs to be loaded. This includes a few big pieces of javascript:
- The Twig.js template to render Sidecart.
- The language strings to output text to Sidecart.
- A truly massive JSON object of countries and states/provinces, along with other helper information like patterns to match each country's postal code format. (Used in the shipping estimation functionality new to v2.0.)
Cool stuff, but not something we want to pass around any more than we have to. We could conceivably load all that in other javascript files, or load it in a separate JS file and cache that file, but since this is all data, and since `localStorage` has pretty wide support, we throw it into the browser's `localStorage` along with a hash of the object (to ensure it's refreshed if, and only if, it's stale).
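A stripped-down sketch of that pattern; the storage key and endpoint are hypothetical, and the real implementation has more moving parts:

```js
// Cache a big JSON payload in localStorage, keyed by a server-provided hash.
// `expectedHash` would arrive via the frequently refreshed store config, so
// the payload gets replaced exactly when the underlying data changes.
async function getHelperData(expectedHash) {
  const cached = JSON.parse(localStorage.getItem("fc_helper_data") || "null");
  if (cached && cached.hash === expectedHash) {
    return cached.data; // still fresh: no network request at all
  }
  const response = await fetch("//cdn.example.com/helper-data.json"); // hypothetical endpoint
  const data = await response.json();
  localStorage.setItem("fc_helper_data", JSON.stringify({ hash: expectedHash, data }));
  return data;
}
```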
Pros, Cons, and Possible Improvements
This approach gives us a few huge wins:
- The latest files are always loaded. The maximum delay before seeing the freshest files is the 90 seconds of browser caching.
- Cache busting is automated.
- Async script loading, browser caching, and `localStorage` ensure FoxyCart code doesn't slow down the pageload at all.
- A single `loader.js` include will eventually allow us to accommodate additional functionality, a new cart approach, or even a completely new version of FoxyCart, all without the dev needing to touch the script. Nothing's truly future-proof, but this is a step in a positive direction.
There are two disadvantages:
- (Almost) every pageload needs to make an HTTP request to get the latest `loader.js`. That's a super tiny response (<1KB), but it's an extra request nonetheless. We could conceivably extend its `max-age`, perhaps even allowing stores to enable a "Production / Performance" mode that'd set the `max-age` much longer, but at this point we feel a super tiny request is a reasonable tradeoff for the benefits above.
- Browser preloading of CSS and JS is unavailable with script loaders. Since this will almost always only matter on the initial pageload (subsequent pageloads will have the assets cached), this seems acceptable as well.
Why not use Require.js or something similar?
We did explore Require.js and other script loaders, but ultimately felt they'd just add complexity and overhead without actually helping beyond what we've done. Also, since FoxyCart includes are used in nearly every environment imaginable, we're trying to reduce dependencies and potential conflicts.
Additional Reading
Did you make it this far? If so, leave a comment 🙂 We’d love to know how you’re tackling your own caching dilemmas.