So when I visit https://remysharp.com/404_ it renders the 404.html, except that it’s not being gzipped at all. The “normal” 200 pages, such as remy sharp's b:log (this page actually exists), are gzipped.
The problem for me is that I’ve got a 13-year-old blog with over 500 posts and there are (relatively) a lot of 404s coming in, and they’re contributing to my bandwidth limits. But if they’re not being gzipped, it’s a LOT more bytes for no good reason.
Is this a bug in Netlify or some configuration that I’m missing?
Not sure if we intend to gzip 404 pages. I can see that it would make sense, just not sure what the code says there. If you’re seeing that, it’s likely a case that we forgot to cover and I can file a feature request.
In the meantime, all the assets it loads ARE compressed, so maybe making it a very small page and referring to other assets is a good workaround for you?
I also do not think you can set a status like 401 successfully. Might try not having the status code there and see if it works better, or better yet, not have the redirect set up at all and just put a 404.html in the root?
(Note: I had a typo in my original report when I used 401 - it’s actually 404 in my code - and I’ve updated the original comment).
I’d argue that not gzipping 404 redirects is definitely a bug (or needs a feature added). A good deal of random traffic comes through from Google spiders and the like that causes 404s, and it contributes unnecessary bandwidth for both me and, more importantly, Netlify.
There’s also the question of whether a user might use non-200 codes with Netlify functions. I’d expect (or hope) that anything that’s HTML and sent from Netlify would be gzipped.
Hi @remy, I checked your custom 404 page via a non-existent URL and I do see that the page is served gzipped. Can you provide some more details on how you were able to determine the page was not gzipped?
Above is a regular page being requested with gzip support - specifically I’m looking at the Content-Encoding. Below is a non-existent URL which is redirecting using a Netlify redirect of /* /404.html 404.
You can see the status code is correct, but the Content-Encoding header is missing and the content length is about 30K (it should be 13019 when compressed over gzip).
Again, @Dennis I’m not sure what kind of request you ran to see it being compressed. I originally noticed this from Chrome’s network devtools tab, but I think using curl is a pretty good test too.
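For anyone who wants to repeat the check outside of devtools or curl, here is a minimal Python sketch of the same test: ask for the page while advertising gzip support, then look for gzip in the Content-Encoding response header. The function names `is_gzipped` and `fetch_headers` are made up for this illustration and are not part of any Netlify tooling.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def is_gzipped(headers):
    """Return True if the Content-Encoding header lists gzip."""
    # Header names are case-insensitive, so normalise the keys first.
    lowered = {name.lower(): value for name, value in headers.items()}
    encodings = lowered.get("content-encoding", "")
    return "gzip" in [token.strip().lower() for token in encodings.split(",")]


def fetch_headers(url):
    """Request url while advertising gzip support; return (status, headers)."""
    request = Request(url, headers={"Accept-Encoding": "gzip"})
    try:
        with urlopen(request) as response:
            return response.status, dict(response.headers.items())
    except HTTPError as error:
        # urllib raises on 4xx/5xx, but the error still carries the headers.
        return error.code, dict(error.headers.items())
```

Calling `fetch_headers("https://remysharp.com/404_")` and passing the returned headers to `is_gzipped` should reproduce what the curl test shows: a 404 status, with the answer depending on whether the Content-Encoding header is present.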
Hi @remy. I tested using the chrome dev tools and did see the gzip under content-encoding. That’s why I wanted to confirm how you were testing. Just needed that to file the issue. Now that I have that, I’ll test with a curl command and get it filed as soon as I can.
Just to add a bit more context to this problem. I recently turned on analytics on my site remysharp.com to analyse where the bandwidth was being chewed up.
Obviously this is bad (and the reason I turned on analytics was to catch these unknown 404s) and I’ll fix this going forward.
But the cost to Netlify and myself has been: 34,133 x 35Kb (the size of the uncompressed redirected 404 page): ~1.1Gb of bandwidth per month.
With gzip enabled, this bandwidth would have been ~510Mb. That’s large to me and small to Netlify for a single site, but the cost should be obvious when we think of all the other sites Netlify hosts and the redirects in place that aren’t compressed.
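The uncompressed figure can be checked with a quick calculation. This is only a sketch: it assumes "35Kb" means 35 × 1024 bytes and that every one of the 34,133 requests returned the full page, and `monthly_bytes` is a name invented for the example.

```python
def monthly_bytes(hits, bytes_per_hit):
    """Total bytes served for a number of identical responses."""
    return hits * bytes_per_hit


HITS = 34_133            # 404 responses in the month, from the post above
PAGE_BYTES = 35 * 1024   # assumption: "35Kb" read as 35 * 1024 bytes

total = monthly_bytes(HITS, PAGE_BYTES)
print(f"{total:,} bytes is about {total / 1024**3:.2f} GiB")
```

Under those assumptions the total lands a little over 1.1 GiB, in line with the ~1.1Gb quoted.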
Even if you ignore the fact I have a bloated 404 page (yes!), the default Netlify page is also uncompressed.
# regular curl with compression doesn't get a gzip header back
$ curl --compressed https://ffconf.org/__XXXXX___ -I -X GET
HTTP/1.1 404 Not Found
Cache-Control: public, max-age=0, must-revalidate
Content-Type: text/html; charset=utf-8
Date: Tue, 23 Jul 2019 09:41:14 GMT
Etag: 1555960685-ssl
Age: 36
Content-Length: 2785
Connection: keep-alive
Server: Netlify
X-NF-Request-ID: 0bb582f8-2928-4a42-82d3-4c01df393a0c-15218774
# current site
$ curl --compressed https://ffconf.org/__XXXXX___ | wc -c
2785
# potential size
$ curl --compressed https://ffconf.org/__XXXXX___ | gzip | wc -c
1250
You’re looking at saving 55% of the current bandwidth spent on redirected 404s. I can imagine that amounts to real money.
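The 55% comes straight from the two byte counts above. A small sketch of the arithmetic, with `savings_fraction` and `gzipped_size` as illustrative names; note that Python's gzip module may not produce exactly the same byte count as the gzip command line tool.

```python
import gzip


def savings_fraction(original_bytes, compressed_bytes):
    """Share of bytes saved by sending the compressed body instead."""
    return 1 - compressed_bytes / original_bytes


def gzipped_size(body):
    """Size in bytes of body after gzip compression."""
    return len(gzip.compress(body))


# 2785 and 1250 are the wc -c results from the curl commands above.
print(f"{savings_fraction(2785, 1250):.0%}")
```

`gzipped_size` is there for running the same comparison against any other page body you have to hand, as bytes.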
@remy, I totally agree that gzipping custom 404 pages is important and would definitely amount to savings for both the customer and Netlify. I’ll be sure to add this context to the issue regarding serving un-gzipped custom 404 pages. Thanks for providing that context.
@remy mentioned that the default Netlify 404 page is also not gzipped. Can we gzip that first?
I think there are a lot of crawlers out there not respecting the robots.txt, so the only option left is to redirect them somewhere, and the default 404 page would be an easy target for the users. Do you think this is something worth doing? Could this potentially save bandwidth for the users and ultimately for Netlify itself?
It certainly could! Not certain how easy this is though - that page is generated from our API not in the usual “there’s a static page that can be easily cached” way. Still, great idea and very worth filing a feature request and I’ve done so for us.
Hiya folks! Just wanted to follow up that at some point in the last 11 months we changed this - we will now automatically gzip this content assuming the page HTML is >1kb. Since our 404 page is a bit above that, it is compressed. Below that size, it isn’t much of a reliable win, so we don’t do it.
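To illustrate why a size cutoff makes sense, here is a rough sketch of that kind of rule. It is not Netlify's actual implementation: the 1024-byte reading of ">1kb" and the names `should_compress` and `encode_body` are assumptions made for the example.

```python
import gzip

MIN_COMPRESS_BYTES = 1024  # assumption: ">1kb" read as more than 1024 bytes


def should_compress(body, min_bytes=MIN_COMPRESS_BYTES):
    """Only bodies larger than the threshold are worth gzipping."""
    return len(body) > min_bytes


def encode_body(body):
    """Return (payload, content_encoding) for a response body in bytes."""
    if should_compress(body):
        return gzip.compress(body), "gzip"
    # Small bodies are sent as-is, with no Content-Encoding header.
    return body, None
```

The reasoning behind skipping small bodies is that gzip adds a fixed header and trailer, so a very short body such as `b"Not found"` actually comes out larger after compression.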