Netlify Function with query strings ignores custom Cache-Control header

I’m switching from Firebase functions to Netlify and this is a bit of an issue. I would have thought it very common to want a single function to return different output depending on query-string-based input (and to have that response cached, at least a little). What is the intent behind stripping the query string? Could you provide more detail about what is currently part of the cache key, so maybe we could work around it somehow?

Hey @drzax,

The problem is that we can’t determine what should and shouldn’t be cached. If we do cache function responses with query string params, some customers may be using this to hold sensitive/customer PII which shouldn’t be relayed to other customers, for example.

Did you wanna delve in a bit on your use case? Maybe there’s something we can do.

Hi @Scott ,

I’m not following your use-case.

Why would a customer using a function with a query string containing sensitive information be caching the response in the first place?

If a user did include sensitive information in the query string and enabled caching, wouldn’t subsequent users see the exact same response as the first user regardless of query string? And the sensitive data passed in the query string wouldn’t even hit the function, because the response is cached?

If I am missing a use case where sensitive information is passed in the query string and cache is enabled, please kindly reply below.

P.S. A good example cache key from Cloudflare is:
${header:origin}::${scheme}://${host_header}${uri_iqs}

Thanks!

I don’t think anyone expects it to work the way it does with Netlify functions when caching is turned on. With cache-control, you effectively turn off the ability to use query string parameters for functions because the response cannot be relied upon.

Would you expect, for example, a function that returns product details to return the same thing, regardless of the item you’re requesting while also relying on cache?

/.netlify/functions/product-details?pid=1
/.netlify/functions/product-details?pid=2

I would expect to be able to cache these responses individually, not simply as “product-details”.

I might be wrong since I have not tried using the function caching yet (due to this thread I found, I figured I will postpone it until it is resolved), but in the case @spencewood highlighted above, applied to the example of @Scott, a request of …/sensitive?id={UUID} if cached could return sensitive information for other users if a response for a previous UUID is returned for a new UUID:

.../sensitive?id=ACTA4 returns { birthday: "1.1.1970", name: "Jones", id: "ACTA4" }
.../sensitive?id=TIRK8 returns { birthday: "1.1.1970", name: "Jones", id: "ACTA4" }

Or would caching in this case not work? Or is it up to the developer to “not do this”?

Including the parameters in the cache key would tie each cached response to its UUID, and so prevent the very leak that leaving parameters out of the key is supposed to prevent, no?

Sorry all, think I may have been crossing my wires with this topic when I replied last! This related to redirects, functions and query string params.

Today, our cache key uses the raw path without query params (as this is optimised for static assets). The cache is only invalidated on a new deploy and not for subsequent requests with unique params. But, it’s a talking point internally so your insight, feedback and ideas are greatly appreciated.
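To make the effect concrete, here is a minimal sketch of a cache keyed on the raw path alone. The names `pathOnlyKey`, `fullKey` and `cachedFetch` are illustrative assumptions, not Netlify internals; the sketch only models why two requests that differ only in their query string end up sharing one cached response.

```javascript
// Hypothetical model of a CDN cache; not Netlify's actual implementation.

// Key that drops the query string, as described above.
function pathOnlyKey(rawUrl) {
  return new URL(rawUrl, "https://example.com").pathname;
}

// Key that keeps the query string.
function fullKey(rawUrl) {
  const url = new URL(rawUrl, "https://example.com");
  return url.pathname + url.search;
}

// Serve from the cache when the key is present, otherwise call the origin.
function cachedFetch(cache, keyFn, rawUrl, origin) {
  const key = keyFn(rawUrl);
  if (!cache.has(key)) {
    cache.set(key, origin(rawUrl));
  }
  return cache.get(key);
}

// Stand-in for a function whose output depends on ?pid=.
const origin = (rawUrl) =>
  "details for product " +
  new URL(rawUrl, "https://example.com").searchParams.get("pid");

const byPath = new Map();
const first = cachedFetch(byPath, pathOnlyKey, "/.netlify/functions/product-details?pid=1", origin);
const second = cachedFetch(byPath, pathOnlyKey, "/.netlify/functions/product-details?pid=2", origin);
console.log(first === second); // both requests share one cache entry
```

Swapping `pathOnlyKey` for `fullKey` gives each `pid` its own entry, which is the behaviour people in this thread are asking for.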

You can set the cache-control header like this; however, given the nature of Lambdas being stateless, the cached response could be dropped earlier than the advertised time due to a cold start.
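As a rough sketch of what setting that header can look like, assuming the usual Lambda-style response object (`statusCode`, `headers`, `body`) that Netlify functions return: the `buildResponse` helper name and the 60-second lifetime are assumptions made for illustration, and which directives the CDN honours is not stated in this thread. Given the cache key described above, this is only safe for responses that do not vary by query string.

```javascript
// Builds the response object a Lambda-style function returns.
// The default 60-second lifetime is an arbitrary value chosen for illustration.
function buildResponse(body, maxAgeSeconds = 60) {
  return {
    statusCode: 200,
    headers: {
      "Content-Type": "application/json",
      "Cache-Control": `public, max-age=${maxAgeSeconds}`,
    },
    body: JSON.stringify(body),
  };
}

// In a real function file this would be exported as `handler`.
const handler = async () => buildResponse({ message: "hello" });
```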

I guess if the cache is optimized for static assets without query strings, one possibility to allow caching of lambda query param responses would be to make it opt-in using a custom header or something like that instead. That way the static asset caching would not be influenced, while the developers would have the choice to opt-in to this query param caching behaviour.

I think even with a shorter-than-advertised amount of caching time, this might be very beneficial. If the request can be cached and a lot of traffic hits the function unexpectedly at once, the cache could respond instead of running the function. Fewer lambda calls for Netlify, less execution time for developers. Win win ^^

As it is right now, if one uses query params, they cannot use caching, as it might return an incorrect response - which in a lot of cases is more destructive than returning nothing at all.

I don’t think this is a good default since it’ll trip up more people than it would help. Caching based on pathname should be opt-in imho. If you’re worried about breaking backwards compatibility one option might be to add a cacheKey option to the return value of the function and cache based on that (and default to current behavior).
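A minimal sketch of how such an opt-in might look from the function's side. To be clear, `cacheKey` is not a field Netlify functions support; it and the helper names `stableQueryKey` and `respond` are hypothetical and only illustrate the proposal.

```javascript
// Hypothetical: `cacheKey` is NOT a field Netlify functions support.
// It only illustrates the opt-in proposal above.

// Sort the parameters so ?a=1&b=2 and ?b=2&a=1 produce the same key.
function stableQueryKey(params) {
  return Object.keys(params)
    .sort()
    .map((name) => `${encodeURIComponent(name)}=${encodeURIComponent(params[name])}`)
    .join("&");
}

function respond(event) {
  const params = event.queryStringParameters || {};
  return {
    statusCode: 200,
    headers: { "Cache-Control": "public, max-age=60" },
    body: JSON.stringify({ pid: params.pid }),
    cacheKey: stableQueryKey(params), // hypothetical opt-in field
  };
}
```

A response without `cacheKey` would keep the current path-only behaviour, so existing sites would not change.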

I would also like to see an option to not invalidate cache on deploy (for obvious reasons).

– my two cents

@snorkypie,

Appreciate the feedback and I’ve added it to our internal feature tracker!

For what it’s worth, Vercel does this flawlessly. I have a fairly expensive function which takes a screenshot with some inputs specified as query params. The code below does exactly what you’d expect.

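// Render the screenshot; `config` carries the inputs taken from the query string.
const file = await getScreenshot(config);

res.statusCode = 200;
res.setHeader("Content-Type", `image/${config.fileType}`);
// Cache at the CDN (s-maxage) and in the browser (max-age), using the
// requested ttl when it is a non-zero number, else the default `ttl`.
res.setHeader(
  "Cache-Control",
  `public, s-maxage=${+config.ttl || ttl}, max-age=${+config.ttl || ttl}`
);
res.end(file);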

I’m pleased it’s being discussed internally. I hope a solution can be found. Caching of function results is a fundamentally different thing to caching static assets. It makes some sense to exclude query params from the cache key on static assets because it probably shouldn’t be possible to bust the cache by requesting https://example.com/huge-image.png?bust=<new guid>. But functions are supposed to return something different depending on inputs.

hey there @drzax - just letting you know we haven’t forgotten about you, and we are still thinking on this. More soon! thanks for your patience.

Wow, 10 months since this nasty bug (yes, it’s a bug!) was reported and Netlify still hasn’t fixed it.

Netlify’s UX is nicer than Vercel’s and other competitors’, the speed is great, and developing is a breeze. That said, if a simple, primary error like that isn’t fixed in 10 months (and it seems that Netlify support engineers don’t even treat it as a major bug as they should), how can we trust Netlify for scale?

Hi @buzinas and thanks for that feedback!

It’s a complicated situation for us to fix, given how we cache things and our desire not to break workflows already in place for our millions of customers. It turns out that the fix requires us to build entirely new CDN components. That is not a quick process, and not one we can accelerate and still get good, reliable results.

I understand that from your point of view the error is “simple”. The fix is not, and we are focusing on a better fix than we could have done in any shorter timeframe. This problem affects relatively few of our customers as well, but we are still working on it with tremendous focus and engineering effort.

In the end, that’s how we believe you can trust us: we are building better solutions that will prevent related future trouble, and not putting half-assed “fixes” out there in cases where it wouldn’t serve you, or us, well.

You can make the decisions you’d like, for your business, based on that transparency, which we provide for exactly that reason: to let you make reasoned judgments about how to implement your service and build your business. If how we work is not up to your standards, we don’t want you to waste time or effort to use our systems.

Thanks for participating in the transparency process 🙂

Can you provide any transparency on how quickly this is being addressed? Seems like a major issue, to me.

Hey @platform-kit,

Full disclosure – the team are blocked on this based on a larger set of works targeting our edge routing. We’re firing on all cylinders to get this work over the line but it’s a tall order and involves a lot of background services. Heck, even this change which may seem unrelated is a prerequisite; one which the team are working to implement soon.

Although I cannot promise a ‘date of completion’, and with the above constraint as an example of why, I do know that this work will not be completed prior to Q3 this year.

There’s always this strategy to implement cache-control headers in the interim but we’ll be sure to update y’all when we can.

Are there any updates on this issue? Being able to cache differently based on query params is close to a “must have” when using Netlify for newer frameworks like Remix. They use query params for getting data when doing client-side routing for a SPA. Without being able to cache on query params, frameworks like Remix essentially cannot cache any HTML pages, which isn’t as great an experience for the site/app’s users.

Thanks!

Hi @spiralstack,

Netlify rolled out On-Demand Builders a few months ago now and that supports running the function once and caching it across our CDN. Mind giving that a try?

Hi @hrishikesh,

According to On-Demand Builders documentation:

They don’t provide access to HTTP headers or query parameters from incoming requests.

So, they probably won’t solve for this use case. What I’m looking for is the ability to cache different HTML pages that share the same “route” but have different query parameters. Here is a simple example:

Imagine a page at https://www.example.com/shoes which shows a list of all types of shoes. Then there is another page at the same route but with a query parameter: https://www.example.com/shoes?brand=nike (notice the ?brand=nike) which only shows shoes made by Nike. Currently, Netlify’s CDN will cache both of those URLs as the same entry. So if a user goes to https://www.example.com/shoes first, and then another user goes to https://www.example.com/shoes?brand=nike, the second user will see all brands instead of just Nike. The reverse is also true: if the first user goes to the https://www.example.com/shoes?brand=nike page and the second user goes to https://www.example.com/shoes, the second user won’t see all shoes, but only Nike shoes. That’s the issue.

Hi, @spiralstack. I think I’m missing something here and so have some questions. Please know I’m only trying to understand this solution, not to criticize it.

First, some background: Netlify was designed with the Jamstack design philosophy in mind.

One of the design choices there is not to do server-side rendering of HTML at browse time. Instead, the HTML is built once during a site build process. After the site is deployed, the pre-built HTML is served as static content. When the HTTP request occurs, there is no server-side rendering - just a static file server sending the pre-built HTML.

However, if you have a site with many hundreds of thousands of pages, many of which are never visited between deploys, you can save build time by not pre-building those pages. Why build 200K pages every deploy when only 1K high-traffic URLs are typically viewed between deploys?

There is still an edge case of when one of the 199K rarely used pages is requested: there must be some browse time way to render the HTML. For those instances, the on-demand builder (ODB) function would render the HTML.

The URLs you are asking about don’t seem to be low traffic, but that could be because they are just examples and you are keeping things simple. For high-traffic URLs, the Jamstack approach would be to turn the GET data into paths and then pre-build those pages to static HTML during the build itself.

It also becomes possible to use the ODB caching if the GET data is modified to be part of the path. For example, if you made a call to the ODB function for these three URLs, only the first request made would be cached:

https://www.example.com/shoes
https://www.example.com/shoes?brand=nike
https://www.example.com/shoes?brand=adidas

However, if you change those URLs to this:

https://www.example.com/shoes
https://www.example.com/shoes/brand/nike
https://www.example.com/shoes/brand/adidas

then the ODB function would cache each one independently.
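As an illustration only, the rewrite described above can be sketched with the standard `URL` API. The helper names `queryToPath` and `pathToParams` are made up for this example and are not part of Netlify or any framework; the first would run wherever links are generated, the second inside the function that receives the request.

```javascript
// Move every query parameter into the path: /shoes?brand=nike -> /shoes/brand/nike
function queryToPath(rawUrl) {
  const url = new URL(rawUrl);
  const segments = [];
  for (const [name, value] of url.searchParams) {
    segments.push(encodeURIComponent(name), encodeURIComponent(value));
  }
  const base = url.pathname.replace(/\/$/, "");
  url.pathname = [base, ...segments].join("/");
  url.search = "";
  return url.toString();
}

// Read the pairs back out of the path: /shoes/brand/nike -> { brand: "nike" }
function pathToParams(pathname, basePath) {
  const parts = pathname.slice(basePath.length).split("/").filter(Boolean);
  const params = {};
  for (let i = 0; i + 1 < parts.length; i += 2) {
    params[decodeURIComponent(parts[i])] = decodeURIComponent(parts[i + 1]);
  }
  return params;
}

console.log(queryToPath("https://www.example.com/shoes?brand=nike"));
console.log(pathToParams("/shoes/brand/nike", "/shoes"));
```

Because each parameter combination now has its own path, a path-only cache key is enough to tell the pages apart.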

I don’t know if the Remix framework is dogmatic about how query data is parsed. Is changing the format of the URL as described above possible?

Thanks for the detailed response @luke!

To answer your question about Remix, it does not appear as though there is an option to change the URL format. It uses React Router (created by the same people as Remix) under the hood, essentially making the site/app into a server-side-rendered SPA. When a link is clicked, the client gets the data for the URL from something like https://www.example.com/articles?__data=routes/articles/index and then renders the data client side without a page refresh. That’s where the issue stems from when URLs are served from the CDN instead of the function: instead of returning a JSON response from https://www.example.com/articles?__data=routes/articles/index, the CDN returns the HTML document cached for https://www.example.com/articles, which then causes the error.

It could be that Netlify’s core philosophy and Remix’s are simply incompatible. Although with Netlify’s On Demand Builders, it’s so close to being able to work. The only issue is that the CDN doesn’t take query params into account when caching (probably for good reasons). It would be amazing if there was an option to create something like a query param “allow list” in the netlify.toml file so one could specify “__data” somehow.
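To sketch what I mean by an allow list: the function below is purely hypothetical (no such setting or function exists in Netlify as far as this thread shows) and `cacheKeyWithAllowList` is a made-up name. It keeps only the listed query params in the cache key and ignores everything else, so cache-busting params would still be dropped. Values are used as parsed, without re-encoding, to keep the sketch short.

```javascript
// Hypothetical: neither this function nor an "allow list" setting exists in Netlify.
function cacheKeyWithAllowList(rawUrl, allowList) {
  const url = new URL(rawUrl);
  const kept = [];
  for (const [name, value] of url.searchParams) {
    if (allowList.includes(name)) {
      kept.push(`${name}=${value}`);
    }
  }
  kept.sort(); // parameter order should not change the key
  return kept.length > 0 ? `${url.pathname}?${kept.join("&")}` : url.pathname;
}

const allow = ["__data"];
console.log(cacheKeyWithAllowList("https://www.example.com/articles", allow));
console.log(cacheKeyWithAllowList("https://www.example.com/articles?__data=routes/articles/index", allow));
```

With a key like this, the HTML document and the `__data` JSON response would be cached as separate entries.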

I know there’s been community discussion around this from Netlify’s developer advocates and I hope it raises internal discussion with y’all as well. There also seems to be some excitement around Remix from at least some Netlify folk (Let's Learn Remix! - YouTube).

Anyway, thanks again for the response. Netlify and Remix both have amazing developer experience and fingers-crossed that this kink can possibly be ironed out.
