Netlify Function with query strings ignores custom Cache-Control header

Scott · February 9, 2021, 11:30am

Sorry all, think I may have been crossing my wires with this topic when I replied last! This relates to redirects, functions and query string params.

Today, our cache key uses the raw path without query params (as this is optimised for static assets). The cache is only invalidated on a new deploy and not for subsequent requests with unique params. But, it’s a talking point internally so your insight, feedback and ideas are greatly appreciated.

You can set the cache-control header like this; however, given the stateless nature of Lambdas, the cached response could be dropped earlier than the advertised time due to a cold start.
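A minimal sketch of setting the header from a function, assuming the Lambda-style handler shape (an object with statusCode, headers and body). The helper name buildResponse, the brand parameter and the TTL values are illustrative assumptions and do not come from the thread.

```javascript
// Sketch only: buildResponse, the "brand" parameter and the TTLs are made up
// for illustration.
function buildResponse(params) {
  const brand = (params && params.brand) || "all";
  return {
    statusCode: 200,
    headers: {
      "Content-Type": "application/json",
      // max-age targets browsers, s-maxage targets shared caches such as a CDN.
      "Cache-Control": "public, max-age=60, s-maxage=300",
    },
    body: JSON.stringify({ brand }),
  };
}

// Lambda-style handler: query params arrive on event.queryStringParameters.
const handler = async (event) => buildResponse(event.queryStringParameters);

// Export for the function runtime (guarded so the file also runs standalone).
if (typeof module !== "undefined") module.exports = { handler };
```

As described above, the cache key at the time used only the path, so two requests that differ only in brand would still share one cached response.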

IgnusG · February 9, 2021, 11:57am

I guess if the cache is optimized for static assets without query strings, one possibility to allow caching of lambda query param responses would be to make it opt-in using a custom header or something like that instead. That way the static asset caching would not be influenced, while the developers would have the choice to opt-in to this query param caching behaviour.

I think even with a shorter-than-advertised amount of caching time, this might be very beneficial. If the request can be cached and a lot of traffic hits the function unexpectedly at once, the cache could respond instead of running the function. Fewer lambda calls for Netlify, less execution time for developers. Win win ^^

As it is right now, if one uses query params, they cannot use caching, as it might return an incorrect response - which in a lot of cases is more destructive than returning nothing at all.

snorkypie · February 9, 2021, 1:39pm

I don’t think this is a good default since it’ll trip up more people than it would help. Caching based on pathname should be opt-in imho. If you’re worried about breaking backwards compatibility one option might be to add a cacheKey option to the return value of the function and cache based on that (and default to current behavior).

I would also like to see an option to not invalidate cache on deploy (for obvious reasons).

– my two cents

Scott · February 15, 2021, 5:00pm

@snorkypie,

Appreciate the feedback and I’ve added it to our internal feature tracker!

drzax · March 15, 2021, 10:51am

For what it’s worth Vercel does this flawlessly. I have a fairly expensive function which takes a screenshot with some inputs specified as query params. The code below does exactly what you’d expect.

const file = await getScreenshot(config);

res.statusCode = 200;
res.setHeader("Content-Type", `image/${config.fileType}`);
res.setHeader(
  "Cache-Control",
  `public, s-maxage=${+config.ttl || ttl}, max-age=${+config.ttl || ttl}`
);
res.end(file);

I’m pleased it’s being discussed internally. I hope a solution can be found. Caching of function results is a fundamentally different thing to caching static assets. It makes some sense to exclude query params from the cache key on static assets because it probably shouldn’t be possible to bust the cache by requesting https://example.com/huge-image.png?bust=<new guid>. But functions are supposed to return something different depending on inputs.

perry · March 23, 2021, 6:20pm

hey there @drzax - just letting you know we haven’t forgotten about you, and we are still thinking on this. More soon! thanks for your patience.

buzinas · March 26, 2021, 4:17pm

Wow, 10 months since this nasty bug (yes, it’s a bug!) was reported and Netlify still hasn’t fixed it.

Netlify’s UX is nicer than Vercel’s and other competitors, the speed is great, and developing is a breeze. That said, if a simple, primary error like that isn’t fixed in 10 months (and it seems that Netlify support engineers don’t even treat it as a major bug as they should), how can we trust Netlify for scale?

fool · March 30, 2021, 9:02pm

Hi @buzinas and thanks for that feedback!

It’s a complicated situation for us to fix because of how we cache things and our desire not to break workflows already in place for our millions of customers. It turns out that fixing it means building entirely new CDN components. That is not a quick process, and not one we can accelerate and still get good, reliable results.

I understand that from your point of view the error is “simple”. The fix is not, and we are focusing on a better fix than we could have done in any shorter timeframe. This problem affects relatively few of our customers as well, but we are still working on it with tremendous focus and engineering effort.

In the end, that’s how we believe you can trust us: we are building better solutions that will prevent related future trouble, and not putting half-assed “fixes” out there in cases where it wouldn’t serve you, or us, well.

You can make the decisions you’d like, for your business, based on that transparency, which we provide for exactly that reason: to let you make reasoned judgments about how to implement your service and build your business. If how we work is not up to your standards, we don’t want you to waste time or effort to use our systems.

Thanks for participating in the transparency process

platform-kit · April 20, 2021, 2:29am

Can you provide any transparency on how quickly this is being addressed? Seems like a major issue, to me.

Scott · April 29, 2021, 3:01pm

Hey @platform-kit,

Full disclosure – the team are blocked on this based on a larger set of works targeting our edge routing. We’re firing on all cylinders to get this work over the line but it’s a tall order and involves a lot of background services. Heck, even this change which may seem unrelated is a prerequisite; one which the team are working to implement soon.

Although I cannot promise a ‘date of completion’, given constraints like the one above, I do know that this work will not be completed prior to Q3 this year.

There’s always this strategy to implement cache-control headers in the interim but we’ll be sure to update y’all when we can.

spiralstack · December 21, 2021, 9:16pm

Are there any updates on this issue? Being able to cache differently based on query params is close to a “must have” when using Netlify for newer frameworks like Remix. They use query params for getting data when doing client side routing for a SPA. Without being able to cache on query params, frameworks like Remix have to essentially not cache any HTML pages which isn’t as great of an experience for the site/app’s users.

Thanks!

hrishikesh · December 22, 2021, 5:46pm

Hi @spiralstack,

Netlify rolled out On-Demand Builders a few months ago now and that supports running the function once and caching it across our CDN. Mind giving that a try?

spiralstack · December 30, 2021, 8:17pm

Hi @hrishikesh,

According to On-Demand Builders documentation:

They don’t provide access to HTTP headers or query parameters from incoming requests.

So, they probably won’t solve for this use case. What I’m looking for is the ability to cache different html pages at the same “route”, but have different query parameters. Here is a simple example:

Imagine a page at https://www.example.com/shoes which shows a list of all types of shoes. Then there is another page at the same route but with a query parameter: https://www.example.com/shoes?brand=nike (notice the ?brand=nike) which only shows shoes made by Nike. Currently, Netlify’s CDN will cache both of those urls the same. So if a user goes to https://www.example.com/shoes first, then another user goes to https://www.example.com/shoes?brand=nike, then the second user will see all brands instead of just Nike. And the reverse is also true, if the first user goes to the https://www.example.com/shoes?brand=nike page and the second user goes to https://www.example.com/shoes, the second user won’t see all shoes, but only Nike shoes. That’s the issue.
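The /shoes scenario above can be reproduced with a toy in-memory cache. This is only a sketch of the behaviour described in this thread, not Netlify's CDN code; makeCachedFetch and render are hypothetical names.

```javascript
// Toy cache: keyFor decides which requests share a cached entry.
function makeCachedFetch(keyFor, render) {
  const cache = new Map();
  return (requestUrl) => {
    const url = new URL(requestUrl);
    const key = keyFor(url);
    if (!cache.has(key)) cache.set(key, render(url)); // miss: run the "function"
    return cache.get(key); // hit: reuse whatever was stored first
  };
}

// Stand-in for the page: the output depends on the brand query param.
const render = (url) => `shoes: ${url.searchParams.get("brand") || "all brands"}`;

// Cache keyed on the path only, as described earlier in the thread.
const pathOnly = makeCachedFetch((url) => url.pathname, render);
pathOnly("https://www.example.com/shoes");
const stale = pathOnly("https://www.example.com/shoes?brand=nike");

// Cache keyed on path plus query string.
const pathAndQuery = makeCachedFetch((url) => url.pathname + url.search, render);
pathAndQuery("https://www.example.com/shoes");
const fresh = pathAndQuery("https://www.example.com/shoes?brand=nike");

console.log(stale); // "shoes: all brands" (wrong page for the Nike URL)
console.log(fresh); // "shoes: nike"
```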

luke · December 31, 2021, 6:40am

Hi, @spiralstack. I think I’m missing something here and so have some questions. Please know I’m only trying to understand this solution, not to criticize it.

First, some background: Netlify was designed with the Jamstack design philosophy in mind:

One of the design choices there is not to do server-side rendering of HTML at browse time. Instead, the HTML is built once during a site build process. After the site is deployed, the pre-built HTML is served as static content. When the HTTP request occurs, there is no server-side rendering - just a static file server sending the pre-built HTML.

However, if you have a site with many hundreds of thousands of pages, many of which are never visited between deploys, there can be a build time savings by not pre-building those pages. Why build 200K pages every deploy when typically only 1K high traffic URLs are viewed between deploys?

There is still an edge case of when one of the 199K rarely used pages is requested: there must be some browse time way to render the HTML. For those instances, the on-demand builder (ODB) function would render the HTML.

The URLs you are asking about don’t seem to be low traffic, but that could be because they are just examples and you are keeping things simple. For high traffic URLs, the Jamstack approach would be to turn the GET data into paths and then pre-build those pages to static HTML during the build itself.

It also becomes possible to use the ODB caching if the GET data is modified to be part of the path. For example, if you made a call to the ODB function for these three URLs, only the first request made would be cached:

https://www.example.com/shoes
https://www.example.com/shoes?brand=nike
https://www.example.com/shoes?brand=adidas

However, if you change those URLs to this:

https://www.example.com/shoes
https://www.example.com/shoes/brand/nike
https://www.example.com/shoes/brand/adidas

then the ODB function would cache each one independently.
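One way to picture that rewrite is a small helper that folds query params into path segments. This is a sketch under assumptions: queryToPath is a hypothetical name, and sorting the params by key is an added choice so that the same params in a different order map to the same path.

```javascript
// Hypothetical helper: turn ?key=value pairs into /key/value path segments.
function queryToPath(requestUrl) {
  const url = new URL(requestUrl);
  // Sort by key so ?a=1&b=2 and ?b=2&a=1 end up at the same path.
  const entries = [...url.searchParams.entries()].sort(([a], [b]) =>
    a.localeCompare(b)
  );
  const segments = [];
  for (const [key, value] of entries) {
    segments.push(encodeURIComponent(key), encodeURIComponent(value));
  }
  url.search = ""; // drop the query string
  if (segments.length > 0) {
    const base = url.pathname.replace(/\/$/, ""); // avoid a double slash
    url.pathname = `${base}/${segments.join("/")}`;
  }
  return url.toString();
}

console.log(queryToPath("https://www.example.com/shoes?brand=nike"));
// https://www.example.com/shoes/brand/nike
```

The function behind the rewritten URL would then need to parse the brand back out of the path, since the query string is gone.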

I don’t know if the Remix framework is dogmatic about how query data is parsed. Is changing the format of the URL as described above possible?

spiralstack · December 31, 2021, 9:47pm

Thanks for the detailed response @luke!

To answer your question about Remix, it does not appear as though there is an option to change the URL format. It uses React Router (created by the same people as Remix) under the hood – essentially making the site/app into a server side rendered SPA. When a link is clicked the client gets the data for the URL from something like https://www.example.com/articles?__data=routes/articles/index and then renders the data client side without a page refresh. And that’s where the issue stems from when URLs are served from the CDN instead of the function. Instead of returning a JSON response from https://www.example.com/articles?__data=routes/articles/index, the CDN returns the HTML document from https://www.example.com/articles, which then causes the error.

It could be that Netlify’s core philosophy and Remix’s are simply incompatible. Although with Netlify’s On Demand Builders, it’s so close to being able to work. The only issue is that the CDN doesn’t take query params into account when caching (probably for good reasons). It would be amazing if there was an option to create something like a query param “allow list” in the netlify.toml file so one could specify “__data” somehow.
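The allow-list idea could look roughly like the following cache-key function. To be clear, this sketches the proposal only and is not an existing Netlify feature or netlify.toml option; cacheKey and its allowedParams argument are hypothetical.

```javascript
// Hypothetical: build a cache key from the path plus only allow-listed params.
function cacheKey(requestUrl, allowedParams) {
  const url = new URL(requestUrl);
  const kept = new URLSearchParams();
  // Iterate the allow list in sorted order so the key is stable.
  for (const name of [...allowedParams].sort()) {
    for (const value of url.searchParams.getAll(name)) {
      kept.append(name, value);
    }
  }
  const query = kept.toString();
  return query ? `${url.pathname}?${query}` : url.pathname;
}

const allow = ["__data"];
const htmlKey = cacheKey("https://www.example.com/articles", allow);
const dataKey = cacheKey(
  "https://www.example.com/articles?__data=routes/articles/index",
  allow
);
const trackedKey = cacheKey(
  "https://www.example.com/articles?utm_source=mail&__data=routes/articles/index",
  allow
);

console.log(htmlKey === dataKey);    // false: HTML and JSON get separate entries
console.log(dataKey === trackedKey); // true: params outside the list are ignored
```

Keeping the list explicit would preserve the existing behaviour for static assets, where arbitrary params such as ?bust=... should not create new cache entries.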

I know there’s been community discussion around this from Netlify’s developer advocates and I hope it raises internal discussion with y’all as well. There also seems to be some excitement around Remix from at least some Netlify folk (Let's Learn Remix! - YouTube).

Anyway, thanks again for the response. Netlify and Remix both have amazing developer experience and fingers-crossed that this kink can possibly be ironed out.

hrishikesh · January 1, 2022, 10:44am

They don’t allow you to access that information inside the function logic, but I believe they do work for the mentioned use case (in the sense, ODBs with different query strings are cached differently).

This change was made after the launch of ODBs, specifically to support Remix.

So in short, you can’t use the query parameter to do something differently inside the ODB, but you can rely on caching based on query params.

spiralstack · January 1, 2022, 4:15pm

Hey, thanks for this. I did give this a try and it doesn’t work as expected. The requests are getting cached, but the URLs at something like https://www.example.com/articles?__data=routes/articles/index (which should return JSON) return the HTML document that is cached from https://www.example.com/articles

hrishikesh · January 6, 2022, 6:19pm

Hey @spiralstack,

Sorry for the delay here. We’re trying to get more up to date information on this. Meanwhile, would it be possible for you to share the actual site URL so we can try to gather more info?

hrishikesh · January 6, 2022, 7:42pm

Update, I checked and turns out I was wrong. The documentation that I was referring to was a result of a discussion and was pretty outdated. As per the latest documentation, yes, this is not supported. Sorry to have made you spend extra time on this, but we don’t know if/when the support for this will be added.

ryanflorence · February 4, 2022, 10:38pm

URLSearchParams are not the same as parsed URL path segments. Paths are hierarchy, URLSearchParams are cross-cutting concerns on any path.

On document requests in Remix you get HTML, on client side transitions, Remix calls the same route path (so cookie “path” works as expected) with a search param to know to return JSON instead of HTML (cross cutting concern at any path).
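As a rough illustration of that pattern (not Remix's actual implementation), a single route can answer with HTML or JSON depending on whether a search param is present. The respond function and the article data are made up; the __data param name is taken from the URLs quoted earlier in this thread.

```javascript
// Hypothetical route handler: same path, two representations.
function respond(requestUrl) {
  const url = new URL(requestUrl);
  const article = { title: "Hello" }; // placeholder data

  if (url.searchParams.has("__data")) {
    // Client-side transition: return the data only.
    return { contentType: "application/json", body: JSON.stringify(article) };
  }
  // Document request: return the rendered page.
  return { contentType: "text/html", body: `<h1>${article.title}</h1>` };
}

const page = respond("https://www.example.com/articles");
const data = respond(
  "https://www.example.com/articles?__data=routes/articles/index"
);
console.log(page.contentType); // text/html
console.log(data.contentType); // application/json
```

A cache that keys on /articles alone cannot tell these two responses apart, which matches the failure reported above.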

Remix will always expect a hosting platform to handle URLSearchParams in a standard way. I wouldn’t call this dogmatic. Every CDN–except Netlify–does this correctly. In fact, even every browser handles this correctly. It’s standard HTTP caching that Netlify is currently doing wrong.

From RFC 7234:

The primary cache key consists of the request method and target URI.

URIs include URLSearchParams. From RFC 3986:

URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

I’m happy to hear you’re working hard on this