Netlify won't return 304 from function?

I have a lambda function that creates a PDF from a URL. (See the source code at the bottom of this post.)

It should be smart enough to use Netlify’s caching system. If a request comes in with an etag in the “if-none-equals” header, that etag is compared with the current etag of the source document. If the etags are the same, a status code of 304 is returned with an empty body. Otherwise, the PDF is generated and a 200 is returned. For example, here are the response headers (locally) for a 304 with curl:

    HTTP/1.1 304 Not Modified
    X-Powered-By: Express
    etag: "69d78a182422efb388bd0d67be027c43-ssl"
    cache-control: public, max-age=0, must-revalidate
    age: 24484
    x-nf-request-id: 01FPXBZZ66SQ7WZ6TV5E8DNTSW
    Date: Tue, 14 Dec 2021 21:01:24 GMT
    Connection: keep-alive
    Keep-Alive: timeout=5

That’s how the code is written and how it runs locally on the netlify dev server. Great.

Once deployed, however, the code always returns 200, regardless of the etags. See for example this request x-nf-request-id: 01FPXBGJB2E78VY1J2T7FBJG4J. So, why won’t 304 work? For reference, here is what the response looks like:

    HTTP/2 200
    cache-control: public, max-age=0, must-revalidate
    content-disposition: filename="PSprints-ets-keathley-gae.pdf"
    content-type: application/pdf
    link: <https://peacefulscience.org/prints/ets-keathley-gae>; rel="canonical"
    server: Netlify
    x-nf-request-id: 01FPXCC8B7337M3BV1VGT5GPBD
    content-length: 1363195
    date: Tue, 14 Dec 2021 21:08:06 GMT
    age: 21
    etag: "a502404ce6b7e8c9ec44d7a66c330b40-ssl"

The curl request was:

    curl https://peacefulscience.org/pdf/prints/ets-keathley-gae/ -H 'if-none-equals: "c613561eba2f7733b4b1deeba3fcb149-ssl"' -I

The 200 response includes a body with the PDF, and the age increases with time, so it seems that Netlify is caching and serving the result rather than responding with a 304. Is this intended behavior? It seems to hurt performance on the client side by forcing re-downloads of files when they are not needed.

Is there a way to fix this?

Any thoughts on this?

Hi @swamidass

I started writing this response a day or two ago but got sidetracked…

As per Can Netlify Functions share cache/memory? - #2 by jonsully

AWS Lambda functions are made to not persist memory across runs.

and Netlify Function response caching - #3 by hrishikesh

Function responses are not cached in the CDN, so the requests that you’re seeing are all uncached.

Therefore a function cannot (as per my understanding) return a 304 as there is nothing for it to compare against if the function itself is generating the returned PDF.

I’m not caching anything between runs. There is no persistent memory.

Instead, I’m checking to see if the etag in the source html used to generate the PDF matches the requested etag. If it does, then the cached data is up to date.

Here is the key code:


    return axios.head(url, { headers })
      .then(res => {    

        if (headers["if-none-match"]?.includes?.(res.headers['etag']) )  {
          return {
            statusCode: 304,
            headers: {
              "etag": res.headers['etag'],
              "cache-control":  res.headers['cache-control'],
              "age":  res.headers['age'],
              "x-nf-request-id": res.headers['x-nf-request-id'],
            }
          }
        }

        
        return prince(url)
          .catch( error => {return {statusCode: 500, body: util.inspect(error)}})
          .then(response => {
            if (response.statusCode === 200)
              response["headers"] = {
                "Content-Disposition": `filename="${name}"`,
                "Cache-Control": res.headers['cache-control'],
                "content-type": "application/pdf",
                "Link": `<${canurl}>; rel="canonical"`,
                "Etag": res.headers["etag"],
                "age":  res.headers['age'],
                "x-nf-request-id": res.headers['x-nf-request-id']
               }
            return response;
           })
      })
      .catch(error => ({ statusCode: error.response?.status ?? 500 }))
      .then(r => {console.info( `${ r.statusCode }:\t${ url }\t${JSON.stringify(event.headers)}`); return r;} )

Note that the “url” variable is the source html url. So I am checking the headers of that URL to get the etag, and comparing with the request header:

    headers["if-none-match"]?.includes?.(res.headers['etag'])

If that’s true, then I don’t need to generate the PDF, so I return a 304.

No cache or persistence at all in my function.
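The `includes` call above is a loose substring test. As a minimal sketch of a stricter check (the helper name `etagMatches` is hypothetical and not part of the original function), the `If-None-Match` value could be split into its comma-separated entity tags and compared one by one, ignoring the weak `W/` prefix the way the weak comparison in RFC 7232 does. It assumes the tags themselves contain no commas:

```javascript
// Hypothetical helper: returns true when the current ETag appears in the
// If-None-Match request header value. Assumes no commas inside the tags.
function etagMatches(ifNoneMatch, currentEtag) {
  if (!ifNoneMatch || !currentEtag) return false;

  // "*" matches any current representation.
  if (ifNoneMatch.trim() === "*") return true;

  // Weak comparison: drop the optional W/ prefix before comparing.
  const strip = (tag) => tag.trim().replace(/^W\//, "");
  const wanted = strip(currentEtag);

  return ifNoneMatch.split(",").some((tag) => strip(tag) === wanted);
}
```

In the handler, the condition could then read something like `etagMatches(headers["if-none-match"], res.headers["etag"])`.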

Greetings @swamidass :wave:

I’ve noted your responses across a few different threads in the last couple of weeks but figured I’d hop in on this one since @coelmay tagged one of my other threads :+1:

When I first read your OP on this thread my idea was essentially “Sure, you can set up 200/304 (HTTP caching via etags) but you’re going to have to do it yourself” along with “if you’re generating the PDF on-the-fly you’ll either need to store it somewhere external or have some form of oracle to validate whether the browser’s copy is still fresh given the transient nature of Functions”. It appears you’ve done both of those things. Neat! :slight_smile:

If I understand your code and workflow correctly, you are indeed receiving the Function request and dispatching a 200 (with a regenerated PDF) or a 304 based on whether the incoming etag is valid against your oracle, which in this case sounds like some kind of HTML document that you presumably populate and collate into a PDF. I think this is a perfectly fine setup, and pushing the caching directly to the browser like this is neat.

I think where you may be running into trouble is that your Function response is indeed being cached at the Netlify level, rather than the Function being run at all (to determine whether a 200/regen or 304 should be returned). I’m not sure if it’s outlined in the docs anywhere, but essentially the idea is that if you use the cache-control header, Netlify will effectively interject in future requests having ‘learned’ the cache pattern for that particular URL (query strings are ignored, FYI). See here (and here).

So the way to ‘fix’ this would be to not use the cache-control header while still implementing etags — granted, I don’t know how proper or correct that is, I haven’t worked with cache headers in a little while and would need to read up on the spec for that… but hopefully this sheds some light on your issue.

Hope that helps!


Jon

Thanks for thinking about this with me!

Turns out that this is user error, i.e. my error. I was testing with a curl command that sent “if-none-EQUALS” (a nonexistent header) instead of “if-none-match”. So sorry about that.

The problem is not totally solved, though. I’m using Netlify’s cache policy:

cache-control: public, max-age=0, must-revalidate

What is puzzling to me is that the browser (Chrome in this case) never sends “if-none-match” headers, even if it has just downloaded the PDF. Why is that? And if that’s happening with Chrome on the functions, how exactly does caching work for Netlify? Perhaps it doesn’t work as advertised?

Okay, some more information to share, in case someone else has a similar problem. Turns out that Netlify modified the etags here, adding a “df” tag because the response was compressed. This might be related to this past topic (Excessive Bandwidth Usage - #17 by luke).

To explain further.

On the initial request, my function returns the PDF and an etag like this:

f1bdb37a0c7bb8e30e745721c9d36d38-ssl

It seems that Netlify decides to compress it, and then changes the etag to:

f1bdb37a0c7bb8e30e745721c9d36d38-ssl-df

Now, the browser/cache sends an IF-NONE-MATCH header with that etag. But my script isn’t smart enough to realize the equivalence (yet). All I need to do is change the logic here

And I suppose I have to figure out what etag the function should give in the response. Probably the same as the one in the response…
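One way to make the comparison tolerate the suffix is to normalize both tags before comparing them. The sketch below assumes the only difference is a trailing `-df` inside the quotes, as in the example above; `normalizeEtag` and `sameEtag` are hypothetical names, and any other suffix Netlify might add is not handled:

```javascript
// Hypothetical helper: drop the weak prefix, the surrounding quotes, and a
// trailing "-df" (assumed here to be the only marker added on compression).
function normalizeEtag(tag) {
  return tag
    .trim()
    .replace(/^W\//, "")    // optional weak-validator prefix
    .replace(/^"|"$/g, "")  // surrounding double quotes
    .replace(/-df$/, "");   // suffix seen on compressed responses
}

// Compare the tag from the request with the tag of the source document.
function sameEtag(requestTag, sourceTag) {
  return normalizeEtag(requestTag) === normalizeEtag(sourceTag);
}
```

With this, a request carrying `"f1bdb37a0c7bb8e30e745721c9d36d38-ssl-df"` would be treated as matching a source etag of `"f1bdb37a0c7bb8e30e745721c9d36d38-ssl"`.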

Hey there @swamidass :wave:

Sorry for the delay here. Are you still experiencing obstacles here? If so, can you please share what you have completed in the last five days since you last wrote in?

Additionally, your site name and function name would be beneficial if you are still encountering obstacles and would like us to look into this further.

Solved the problem and got it working.

Then I discovered it is far easier to implement and far more performant with Netlify On-demand Builders:

That works great in the end, but it also indicates that Netlify isn’t really using the cache headers returned from functions in an optimized way. At least the builders are a workaround.

I linked in the OP to the source code of the function in the repo.