Please stop using CDNs for external Javascript libraries
I want to discuss a (minor) antipattern that I think is (slightly) harmful.
Lots of websites use large Javascript libraries. They often include them by using a 3rd party Content Delivery Network like so:
<script src="https://cdn.example.com/js/library-v1.2.3.js"></script>
There are, supposedly, a couple of advantages to doing things this way.
- Users may already have the JS library in their cache from visiting another site.
- Faster download speeds of large libraries from CDNs.
- Latest software versions automatically when the CDN updates.
I think these advantages are overstated and lead to some significant disadvantages.
Cacheing
I get the superficial appeal of this. But there are dozens of popular Javascript CDNs available. What are the chances that your user has visited a site which uses the exact same CDN as your site?
How much of an advantage does that really give you?
Speed
You probably shouldn't be using multi-megabyte libraries. Have some respect for your users' download limits. But if you are truly worried about speed, surely your whole site should be behind a CDN - not just a few JS libraries?
Versioning
There are some CDNs which let you include the latest version of a library. But then you have to deal with breaking changes with little warning.
So most people only include a specific version of the JS they want. And, of course, if you're using v1.2 and another site is using v1.2.1 the browser can't take advantage of cacheing.
Reliability
Is your CDN reliable? You hope so! But if a user's network blocks a CDN or interrupts the download, you're now serving your site without Javascript. That isn't necessarily a bad thing - you do progressive enhancement, right? But it isn't ideal.
If you serve your JS from the same source as your main site, there is less chance of a user getting a broken experience.
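If you do keep a CDN reference, a common mitigation for this failure mode is to fall back to a copy you host yourself when the CDN copy doesn't load. A minimal sketch in plain JavaScript, reusing the hypothetical URLs from above (this is a middle ground, not a substitute for self-hosting):

// Try the CDN first; if the request is blocked or interrupted,
// load the same library from our own origin instead.
var cdnScript = document.createElement('script');
cdnScript.src = 'https://cdn.example.com/js/library-v1.2.3.js';
cdnScript.onerror = function () {
  var localScript = document.createElement('script');
  localScript.src = '/js/library-v1.2.3.js'; // hypothetical self-hosted path
  document.head.appendChild(localScript);
};
document.head.appendChild(cdnScript);

Note that the CDN still sees the request on the happy path, so the privacy and security concerns below are unchanged.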
Privacy
What's your CDN's privacy policy? Do you need to tell your user that their browsing data are being sent to a shadowy corporation in a different legal jurisdiction?
What is your CDN doing with all that data?
Security
British Airways' payments page was hacked by compromised 3rd party Javascript. A malicious user changed the code on a site which wasn't in BA's control - then BA served it up to its customers.
What happens if someone hacks your CDN?
You gain extra security by using SubResource Integrity. That lets you write code like:
<script src="https://cdn.example.com/js/library-v1.2.3.js"
integrity="sha384-oqVuAfXRKap7fdgcCY5uykM6+R9GqQ8K/uxy9rx7HNQlGYl1kPzQho1wx4JwY8wC"
crossorigin="anonymous"></script>
If even a single byte of that JS file is changed, the hash won't match and the browser should refuse to run the code.
Of course, that means that you could end up with a broken experience on your site. So just serve the JS from your own site.
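For completeness, that integrity value is just a base64-encoded SHA-384 digest of the file, which you can generate yourself. A minimal sketch, assuming Node.js and a hypothetical local filename:

// Equivalent to: openssl dgst -sha384 -binary library-v1.2.3.js | openssl base64 -A
const crypto = require('crypto');
const fs = require('fs');

const file = fs.readFileSync('library-v1.2.3.js'); // hypothetical filename
const digest = crypto.createHash('sha384').update(file).digest('base64');
console.log(`sha384-${digest}`); // paste into the integrity attribute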
So what?
This isn't the biggest issue on the web. And I'm certainly guilty of misusing CDNs like this.
Back when there were only a few CDNs, and their libraries didn't change rapidly, there was an advantage to using them.
Nowadays, in an era of rampant privacy and security violations, I think using 3rd party sources for Javascript should be treated as an anti-pattern.
Bryan Rieger said on twitter.com:
I remember when ANY 3rd party code was considered an unacceptable security risk, but enter ad money, tracking and analytics and 3rd party code quickly became the gold standard.
Šime Vidas says:
What about https://www.google-analytics.com/analytics.js, https://platform.twitter.com/widgets.js, and similar? Google, Twitter, and other major companies still instruct users to load scripts directly from their domains.
@edent says:
I don't think users should be tracked by third-parties. There is a security and privacy risk when you include other people's code on your website.
Personally, I'm looking for a way to move away from Twitter's JS embedding on this blog.
formfeed says:
As a user, I often block these scripts exactly for these reasons. Scripts used across different sites can easily gather information that data aggregators are looking for. (Kind of like the free android app-developer libraries which become part of the new app but still report home.) I also don’t get the other side of the deal. As a developer you get a convenient way to embed some js, but a third party gets your customers’ movement data to combine with data from other sites. It’s like if a clown went to all the bike stores downtown offering to attach a free balloon to every bike, but wants customer data in exchange. -Would look suspicious to me. Especially if he also sold bike carriers 😉
Anonymous says:
Given that the mentioned sites are not simple CDNs and they harvest data from IP addresses/referrer headers, including them in the network request before the user explicitly opts in can result in a very high GDPR penalty.
John Doe says:
You can use an Embetty server for Twitter: https://github.com/heiseonline/embetty
For videos (use youtube-dl to download them), see: https://www.ctrl.blog/entry/stop-embedding.html
You don't need Google Analytics. If you really need analytics, open source self-hostable solutions exist, such as Plausible Analytics and Matomo (formerly Piwik).
🇪🇺Jonathan Matthews said on twitter.com:
I agree with your thesis, but disagree that there’s a letter E in the word “caching” ...
Daniel says:
More and more browsers have origin-isolated their caches. Safari (including on mobile) has been doing it for years! There is no such thing as a shared cache in these browsers.
P.S.: You should swap out Gravatar for Libravatar with a proxied image cache. It's an open alternative that uses an open standard and a federated system. Your blog would query the user's email domain for the face icon (or libravatar.org only as a fallback).
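For context, a Libravatar lookup is just a hash of the email address turned into a URL. A rough sketch, assuming Node.js; it skips the federated DNS lookup of the user's email domain and goes straight to the central fallback host (the host name and parameters are my reading of the Libravatar API, not part of the comment above):

const crypto = require('crypto');

// SHA-256 of the lowercased, trimmed address; a full implementation
// would first look for a federated server via the domain's SRV records.
function libravatarUrl(email, size = 80) {
  const hash = crypto.createHash('sha256')
    .update(email.trim().toLowerCase())
    .digest('hex');
  return `https://seccdn.libravatar.org/avatar/${hash}?s=${size}`;
}

console.log(libravatarUrl('user@example.com'));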
Dewald says:
While I like the idea, Libravatar’s API is exceptionally slow and also error-prone. I scripted a benchmark for making 100 requests for 100 email addresses with the default size of 80x80px. The email addresses were a combination of Gravatar email addresses and Libravatar specific email addresses. The average file size is ±10KB while the average response time is ±6.8s. The 99th percentile of the response time is ±21s. That’s really not acceptable.
If privacy is really a concern here, get rid of Gravatar altogether.
John Doe says:
This, so it isn't slow as a crawl.
Paul Kinlan said on twitter.com:
It’s been kinda useless pattern since the advent of double-keyed caching in browsers.
Jakub G said on twitter.com:
Plus anyway every website uses a different version of a lib than you do:
stevesouders.com/blog/2013/03/1…
Jakub G said on twitter.com:
Re: caching, Chrome 86 just shipped cache partitioning last week. Safari did a long time ago. Hence cache sharing is not possible anymore.
developers.google.com/web/updates/20…
Colin Bendell said on twitter.com:
The TCP+TLS blocking delay for jQuery (and other 3p javascript) is the single greatest drain on the economy.
For the love of #webperf move your critical js and css on domain!
shkspr.mobi/blog/2020/10/p…
Fortnitewise says:
I agree with your thesis
av0ider says:
The BA hack seems like a bit of a strange example, considering they were hosting that JS themselves, not on a CDN or with a third party.
Regardless, it definitely shows the risks of someone being able to modify your assets, and perhaps the importance of monitoring them for unexpected changes.
Bret says:
I noticed the same thing:
It looks like in the British Airways example you cited, they were serving some shared libraries off a CDN managed and operated by BA itself, so in effect, they were "serv[ing] the JS from [their] own site", albeit a different system from the presumably server-rendered portion of the site. Am I mistaken here?
Even if they had bundled, they would likely have hosted it on that same (apparently) hackable/insecure CDN, and the hacker, who clearly went out of their way to target BA, could just as easily have modified the bundle to include the malicious code.
Nick Reilingh says:
Until I learned about cache partitioning/double-keyed caching from the comments today, I wondered if the web platform needed a way to load a subresource by using a content hash and a series of possible locations to try to load it from. In other words, you would give the browser a content hash along with a list of candidate URIs. It would check if any of the URIs were already cached, and if not, would attempt them in order. But then would it be possible to mitigate the cache response timing attack by fuzzing response times somehow?
But even then, you have the overhead of an extra HTTP request when it might just be faster to bundle your libs with your own code into one JavaScript file.
I’m starting to think the best of all worlds solution would be a way to tell the browser that a given subresource was allowed to be stored in a content-addressable cache. Maybe…
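The first half of that idea can already be approximated in userland, though without the cross-site cache benefit. A rough sketch; the function name is hypothetical and it assumes the candidate hosts send permissive CORS headers:

// Try each candidate URL in turn; only run the script if its SHA-384
// digest (base64, as used by SRI) matches the expected value.
async function loadByHash(expectedBase64, urls) {
  for (const url of urls) {
    try {
      const response = await fetch(url);
      if (!response.ok) continue;
      const source = await response.arrayBuffer();
      const digest = await crypto.subtle.digest('SHA-384', source);
      const actual = btoa(String.fromCharCode(...new Uint8Array(digest)));
      if (actual !== expectedBase64) continue; // wrong content, try the next URL
      const script = document.createElement('script');
      script.src = URL.createObjectURL(new Blob([source], { type: 'text/javascript' }));
      document.head.appendChild(script);
      return;
    } catch (err) {
      // Network error or CORS failure: try the next candidate.
    }
  }
  throw new Error('No candidate URL served the expected content');
}

What it can't do is reuse another site's cached copy, which is exactly the part that would need platform support.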
Lisa Penderil says:
That security script has me wondering: does it have a ">" too many or a tag too short?
@edent says:
Good spot! I've removed the surplus ">".
Michael Fasani says:
I use these free CDNs sometimes when making a quick POC or similar, but I never saw them as something that made any sense in production, especially since Browserify and webpack tend to do the bundling in any real-world project. It's even worse when you see a site using 2-3 different CDN domains. IMO the first CDN of this nature was perhaps a good idea, but the person who created the second made it an anti-pattern.
Peter Nikolow says:
There is also another issue. This time legal.
Sometimes, something is prohibited in country X. It can be an online casino, a social network, etc. So a court there prohibits the site from being viewed in that country.
And the fun starts here if you share an IP address with one of these blocked IPs.
Example: https://www.theguardian.com/world/2014/mar/27/google-youtube-ban-turkey-erdogan
Since YouTube shares IPs with GC, some sites were getting blocked in Turkey.
Marcus Downing says:
I'd add fonts and other assets to this as well. The security implications aren't as serious as for JS files, but the reliability, versioning and privacy benefits are just as real.
There's a performance benefit to be had as well. Modern web protocols (HTTP/2 and HTTP/3) are built to anticipate additional files you're going to need and send them alongside the page reply, saving the client browser from having to go back to the server to ask for them. If you're loading a hundred SVG icons and a couple dozen other files, that can make a big difference to how quickly your page appears. But they can only do that for files on the same domain - any files you load from an external source don't get that benefit.
John says:
I would also tend to think that most web developers/webmasters copy and paste URIs from third-party providers such as Bootstrap et al. out of laziness, to see what it does. And then later on they forget to move the script to their own servers...
Emma Stebbins said on twitter.com:
Interesting discussion about using external CDNs. I personally have avoided external CDNs in most projects because of the ease of creating bundles with tools like webpack. shkspr.mobi/blog/2020/10/p…
Stephan Sokolow says:
I agree, but I ran out of time trying to diagnose why webpack was breaking Django Debug Toolbar and I generally found it fragile and flaky when being used to bundle JavaScript and CSS for a "progressive enhancement over all" site design, so I turn to other bundling solutions.
Say goodbye to resource-caching across sites and domains | Stefan Judis Web Development said on stefanjudis.com:
This Article was mentioned on stefanjudis.com
nita daniel said on twitter.com:
Feeling vindicated, but never quite passionate enough to actually, you know, write it all out. Third-party CDNs for anything other than POC or demo apps are meh.
shkspr.mobi/blog/2020/10/p…
Brent says:
I just wanted to give some counterpoints to some of your arguments:
Caching: There's more of a chance of having already downloaded a given CDN/version combo than there is if every site hosted the content themselves: greater than 1:1 for a CDN/version combo vs exactly 1:1 for self-hosted. It doesn't really matter what the ratio is, as long as it is greater than 1.
Versioning: This is an obvious design choice when selecting the current version vs an explicit version. It's not really an argument against the use of CDNs, but rather a suggestion that there are best practices for development and user experience.
Reliability: This is a problem whether you use CDNs or not. If something is self-hosted and parts of your service become inaccessible, then either your entire site is down or the experience is degraded. If a CDN is down, it's likely that the outages are localised to specific regions, otherwise it wouldn't be a very good CDN. This means only a portion of a site's users are affected. Likewise, it is an obvious architectural design choice when selecting CDNs and deciding how to handle and/or mitigate outages. Again, this isn't a reason not to use CDNs. What happens if one of your database servers goes down, or the worker service managing your background job queue goes down? Managing CDN outages is no different from managing any other part of your application. Your argument is just suggesting there are best practices to follow for development and user experience.
Privacy: This is your only valid point for not using a CDN. However, you have overstated the risk. The resource will be downloaded once, and cached locally. Therefore, the CDN can only track when the resource is being downloaded. If the CDN is working as intended, this should only ever happen once for each CDN/version combo. If the user navigates to 10 sites using the same CDN/version, the CDN provider will only see 1 of the 10 sites visited.
Security: As you have stated yourself, security is well supported. Your argument is just suggesting there are best practices to follow for development and user experience.
Brent says:
Even though there might be many more CDNs than there were in the past, most libraries and frameworks will specify which CDN to use. This means content is usually accessible from one or a limited number of CDNs. There is no combinatorial explosion of CDN/version combos, as is implicitly suggested.
Stephan Sokolow says:
Re: Brent:
As one of the pingbacks above your message points out, cross-site caching is no longer something you can expect. Browsers either have implemented (WebKit) or plan to implement (Blink, Gecko) double- or even triple-keyed caching to prevent cross-site user tracking based on things like advertisers storing tracking IDs in ETag headers.
No matter which CDN you choose, the browser either does or soon will download and cache a separate copy for each site you request it from.
https://xsleaks.dev/docs/defenses/secure-defaults/partitioned-cache/