Possible bug in _redirects behaviour

Context

I’m archiving an old PHP website, and turning it into a static site. I want all the URLs (some-page.php?someParam=this&otherParam=2 etc) to remain exactly as they were, so that old links and bookmarks etc all still work.

Attempted solution

I spidered the whole live site with wget --recursive --adjust-extension --restrict-file-names=windows, which gave me the static files I want to serve.

The --adjust-extension part appends .html to filenames which don’t already end in it, so now I have for example index.php.html.

The --restrict-file-names=windows alters filenames in a few ways, the most important of which is that it replaces ? with @. This is needed since Netlify doesn’t let me deploy files with ? in the name.

But I want the old URLs to still work, so I’ve got a _redirects file.

As an example, the site had these legacy URLs:

  • /photos.php
  • /photos.php?group=0
  • /photos.php?group=0&pic=0
  • /photos.php?group=0&pic=1
  • /photos.php?group=1
  • /photos.php?group=1&pic=0
  • /photos.php?group=1&pic=1

These are saved as

  • /photos.php.html
  • /photos.php@group=0.html
  • /photos.php@group=0&pic=0.html

and so on.
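
The mapping from a legacy URL to its saved filename can be sketched in Python. This is only a rough model of the two wget effects described above: the function name saved_filename is made up for illustration, and any other escaping that wget performs is ignored.

```python
def saved_filename(legacy_url):
    """Approximate the filename wget produced for a legacy URL.

    Models only the two effects described in the post (an assumption,
    not a full reimplementation of wget):
      * --restrict-file-names=windows: '?' becomes '@'
      * --adjust-extension: '.html' is appended if not already present
    """
    name = legacy_url.replace("?", "@", 1)
    if not name.endswith(".html"):
        name += ".html"
    return name


for url in ["/photos.php", "/photos.php?group=0", "/photos.php?group=0&pic=1"]:
    print(url, "->", saved_filename(url))
```

Each rewrite rule in _redirects then has to undo this mapping, sending the legacy URL on the left to the saved file on the right.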

So having read the documentation, I have things like this in my _redirects:

/photos.php group=:g pic=:p /photos.php@group=:g&pic=:p.html 200
/photos.php group=:g /photos.php@group=:g.html 200
/photos.php /photos.php.html 200

Unfortunately this is not behaving as I expect. For all of the legacy URLs listed above I’m getting /photos.php.html served up to me.

Test case

I noticed that other similar rewrites were working just fine, so I boiled the case down until I found the one difference: if I just remove the /photos.php.html file, all the rewrites to the other /photos.php*.html files suddenly work.

I ended up with this test case: https://chipper-platypus-8de0b0.netlify.app/

Source code here: tremby/netlify-redirect-test on GitHub

In this test case, the links menu is the same on every page. There are two sets of URLs: test1 and test2. There are identical redirects written for both sets:

/test1.php x=:x y=:y /test1.php@x=:x&y=:y.html 200
/test1.php x=:x /test1.php@x=:x.html 200
/test1.php /test1.php.html 200

/test2.php x=:x y=:y /test2.php@x=:x&y=:y.html 200
/test2.php x=:x /test2.php@x=:x.html 200
/test2.php /test2.php.html 200

(The last line of the test2 set is there just to show that it’s not the redirect line itself which is the problem; read on…)

Then these files exist:

test1.php.html
test1.php@x=0.html
test1.php@x=0&y=1.html
test1.php@x=0&y=2.html
test2.php@x=0.html
test2.php@x=0&y=1.html
test2.php@x=0&y=2.html

(There’s also an index.html, just to serve as an entry point.)

Note that both sets have a similar set of files, except there’s no test2.php.html.

When clicking through the links, you’ll find that all the test2 links work just fine: each one loads its own corresponding file. But the test1 links do not work as expected: each one loads test1.php.html and the other test1.php*.html files are never served.

This strikes me as a bug. I think there is possibly some (undocumented?) “magic” happening to do with the presence of a file named as the requested URL with the query string stripped, which is affecting how the rewriting logic works.

Workaround

Working from that assumption I experimented and found that if I rename my “query-string-free” file test1.php.html to test1.php@.html, and adjust the rewrite rule accordingly, things work fine. This is what I’ll do for now.

Did you try forcing the redirects like: 200! instead of 200?

I didn’t know about the exclamation mark. I now see it’s documented on this other documentation page.

I tried adding it to the rule lines which have query parameters and yes, that does the job.

Example which works the way I want it to:

/test1.php x=:x y=:y /test1.php@x=:x&y=:y.html 200!
/test1.php x=:x /test1.php@x=:x.html 200!
/test1.php /test1.php.html 200

/test2.php x=:x y=:y /test2.php@x=:x&y=:y.html 200!
/test2.php x=:x /test2.php@x=:x.html 200!
/test2.php /test2.php.html 200

The docs say

By default, you can’t shadow a URL that actually exists within the site. This applies to rewrites using a splat or dynamic path segment as well as rewrites for individual routes or files. This means that even if you’ve setup the following rewrite rule:

/*   /index.html   200

The path /partials/chat.html would still render the contents of that file, if that file actually exists. This tends to be the preferred behavior when setting up rewrite rules for single page apps, etc.

I don’t immediately see how it actually applies to my example, however, since there is no /test1.php or /test2.php file. The only things I can see pointing to those files are the rewrite rules, and earlier rules are supposed to have higher priority than later rules.

There must be other magic going on. When /uri is requested is it additionally looking for a file /uri.html and lumping that into the “does this URL exist as a file” logic too? That’s my guess, though I can’t find it documented.

It seems that, because of the same magic, the final rule of each set isn’t necessary either, so the file can be just

/test1.php x=:x y=:y /test1.php@x=:x&y=:y.html 200!
/test1.php x=:x /test1.php@x=:x.html 200!

/test2.php x=:x y=:y /test2.php@x=:x&y=:y.html 200!
/test2.php x=:x /test2.php@x=:x.html 200!
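
For a site with many pages, rule lines of this shape could be generated rather than written by hand. The following is a minimal sketch: forced_rules is a hypothetical helper, and it assumes the query parameters always nest in a fixed order (x alone, then x and y), as they do in the test case.

```python
def forced_rules(page, params):
    """Build forced rewrite lines for `page`, most specific first.

    page:   legacy file name, e.g. "test1.php"
    params: query parameter names, in the order they appear in the
            saved filenames (assumed to nest as prefixes)
    """
    lines = []
    for count in range(len(params), 0, -1):
        used = params[:count]
        match = " ".join(f"{p}=:{p}" for p in used)    # e.g. "x=:x y=:y"
        query = "&".join(f"{p}=:{p}" for p in used)    # e.g. "x=:x&y=:y"
        lines.append(f"/{page} {match} /{page}@{query}.html 200!")
    return lines


print("\n".join(forced_rules("test1.php", ["x", "y"])))
```

For test1.php with parameters x and y this prints the same two test1 lines as above. The most specific rule comes first because earlier rules take priority over later ones.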

Yeah, that’s weird. We’ve asked the devs to confirm if that’s expected.

If you don’t specify an extension, then yes, we look for .html too.

Just confirmed with the devs: we always check for .html and /index.html versions of the string. So, in your case of test1.php, we were checking for test1.php.html and test1.php/index.html.
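
The behaviour described in this thread can be summarised with a toy model in Python. It is only a sketch inferred from the test case and the reply above, not Netlify's actual implementation: the names candidate_files and resolve are invented, and the model assumes the existence check uses the request path with the query string stripped.

```python
def candidate_files(path):
    # Per the reply above: the path itself, plus its .html and
    # /index.html versions.
    return [path, path + ".html", path.rstrip("/") + "/index.html"]


def resolve(path, rule_target, forced, files):
    """Return the file served for `path` in this toy model.

    path:        request path without the query string
    rule_target: destination of the first matching rewrite rule, or None
    forced:      True if that rule ends in 200! rather than 200
    files:       set of deployed file paths
    """
    if rule_target is not None and forced:
        return rule_target  # 200! skips the shadowing check
    for candidate in candidate_files(path):
        if candidate in files:
            return candidate  # an existing file shadows a non-forced rule
    return rule_target


files = {"/test1.php.html", "/test1.php@x=0.html", "/test2.php@x=0.html"}

# /test1.php?x=0 with a plain 200 rule: shadowed by test1.php.html
print(resolve("/test1.php", "/test1.php@x=0.html", False, files))
# The same request with 200!: the rule wins
print(resolve("/test1.php", "/test1.php@x=0.html", True, files))
# /test2.php?x=0 with a plain 200 rule: nothing shadows it
print(resolve("/test2.php", "/test2.php@x=0.html", False, files))
```

In this model the three calls serve /test1.php.html, /test1.php@x=0.html and /test2.php@x=0.html respectively, matching what the test site showed.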