I have the pleasure of managing the technology side of a fairly large-scale Web site for a traditional media company. We do somewhere around 2.7 million pageviews on a normal day, and as many as 7 or 8 million on a busy day.
But just because the bandwidth is cheaper at the CDN doesn’t mean that it’s free. We still want to minimize the amount of data we push through that network.
Obviously, a smart caching policy is the first place to start. We use Apache’s mod_expires to set sensible expiration policies on content. For example, we allow visitors to cache our CSS files for a week before their browser even needs to check to see if it’s changed.
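As a rough sketch, a mod_expires policy along those lines might look like this (the one-week value matches the CSS policy just described; the rest is illustrative, not our actual configuration):

```apache
# Requires mod_expires to be loaded
<IfModule mod_expires.c>
    ExpiresActive On
    # Browsers may reuse cached CSS for a week without revalidating
    ExpiresByType text/css "access plus 1 week"
</IfModule>
```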
Despite such efforts, we have had trouble containing the traffic to the site. When I performed a deep analysis of our CDN traffic and compared it to the number of pageviews and visitors, I found some disturbing results.
We were seeing extremely high numbers of requests for very cacheable content. As an example, /site.css was being requested on average once for every 3 pageviews! Considering the average user consumes somewhere in the neighborhood of 7 pageviews in a day, this didn’t make sense.
We have a pretty dedicated audience: visitors typically come to our site 8 or 9 times each month. With a fully functioning browser cache (and assuming the cache never has to purge objects), such a user should only make about 4 requests for /site.css in a month, since the file is cacheable for a week. That works out to roughly 4 requests per 50-60 pageviews, far lower than the 29% request rate we were seeing.
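The arithmetic behind that expectation can be checked back-of-envelope (the visit and pageview figures are the rough averages quoted above):

```javascript
// Expected /site.css requests for a dedicated visitor, assuming the
// weekly cache policy works and nothing is evicted from the cache.
const visitsPerMonth = 9;      // "8 or 9 times each month"
const pageviewsPerVisit = 7;   // ~7 pageviews per visit
const cssMaxAgeDays = 7;       // CSS cacheable for a week

const pageviews = visitsPerMonth * pageviewsPerVisit; // ~63 pageviews/month
const cssRequests = Math.floor(30 / cssMaxAgeDays);   // ~4 revalidations/month
const rate = cssRequests / pageviews;

console.log(pageviews, cssRequests, (rate * 100).toFixed(1) + '%');
```

With a working cache, the expected request rate is in the single digits, nowhere near the 29% we observed.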
Calling reload(true), with the forceget parameter set to true, will cost you tons of bandwidth: the browser omits the If-Modified-Since header from its HTTP requests, so the server returns the full content of every object on the page.
If you omit the forceget parameter (or pass false), the browser reloads the page and sends If-Modified-Since headers in its requests. This is exactly what happens when a user clicks Firefox's reload button without holding Shift. Note that while the browser will still serve most objects from cache and won't actually re-download them, it revalidates each one. So if your page has 50 objects, you're looking at 50 requests to the server, most of which will return 304 (Not Modified).
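To make the mechanism concrete, here is a hypothetical sketch (not our server code) of the conditional-GET logic behind those 304s: when If-Modified-Since is present and the resource hasn't changed, the server answers with a tiny 304 and no body; when the header is missing, it must send the full 200.

```javascript
// Hypothetical server-side decision for a conditional GET.
function respondToConditionalGet(lastModified, ifModifiedSince, body) {
  const unchanged =
    ifModifiedSince && Date.parse(ifModifiedSince) >= Date.parse(lastModified);
  if (unchanged) {
    return { status: 304, body: '' }; // headers only: cheap on bandwidth
  }
  return { status: 200, body };       // full payload over the wire
}

const lastModified = 'Tue, 01 Jan 2019 00:00:00 GMT';
const css = 'body { margin: 0; }';

// Plain reload / reload(false): the header is sent, so the server answers 304.
console.log(respondToConditionalGet(lastModified, lastModified, css).status); // 304

// Shift-reload / reload(true): no If-Modified-Since, so the full 200 comes back.
console.log(respondToConditionalGet(lastModified, undefined, css).status);    // 200
```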
This can slow down the page reload for the user, and while a 304 response is much smaller than a 200, the bandwidth consumed adds up.
On our site, these requests amounted to about 10 Mbps of billable bandwidth (out of 70 or 80 Mbps). This is substantial.
window.location = window.location
Rather than calling reload(), you can force the browser to "revisit" the current URL by assigning window.location to itself. This is akin to a Firefox user clicking at the end of the URL in the address bar and hitting return, and similar to browsing away from the page and coming back by clicking a link to it.
If the cache policy on your HTML prevents caching (and if you're serving dynamic, personalized HTML, it probably should), the HTML will be reloaded for the user, but the objects on the page will be pulled straight from cache. The browser won't even check for updates on objects that are still within their cache periods.
The bottom line
window.location.reload() is expensive from a bandwidth standpoint no matter what you pass as the forceget parameter. Avoid it if you're sensitive to excessive bandwidth usage.
Keywords: content distribution network, cost, expensive, shift-reload, auto reload, auto refresh, cache