Context
I’m archiving an old PHP website by turning it into a static site. I want all the URLs (some-page.php?someParam=this&otherParam=2, etc.) to remain exactly as they were, so that old links and bookmarks still work.
Attempted solution
I spidered the whole live site with wget --recursive --adjust-extension --restrict-file-names=windows, which gave me the static files I want to serve.
The --adjust-extension option appends .html to filenames that didn’t already end with it, so I now have, for example, index.php.html.
The --restrict-file-names=windows option alters filenames in a few ways, the most important of which is that it replaces ? with @. This is necessary because Netlify doesn’t let me deploy files with ? in the name.
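The two filename transformations described above can be modelled in a few lines of Python, which makes it easy to predict which file a given legacy URL ends up in. This is a minimal sketch, not wget’s actual logic: it assumes ? is the only character that needs replacing and that .html is appended whenever the name doesn’t already end with it (the real windows mode also escapes other characters). The helper name saved_filename is hypothetical.

```python
def saved_filename(url_path: str) -> str:
    """Approximate the filename wget saves for a legacy URL (hypothetical helper).

    Models only two transformations:
    --restrict-file-names=windows replacing "?" with "@", and
    --adjust-extension appending ".html".
    """
    # Replace the query-string separator, which Netlify won't accept in filenames.
    name = url_path.replace("?", "@")
    # Append .html unless the name already ends with it.
    if not name.lower().endswith(".html"):
        name += ".html"
    return name


for url in ["/photos.php", "/photos.php?group=0", "/photos.php?group=0&pic=0"]:
    print(url, "->", saved_filename(url))
```

For the photo URLs below, this yields /photos.php.html, /photos.php@group=0.html and /photos.php@group=0&pic=0.html.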
But I want the old URLs to still work, so I’ve got a _redirects file.
As an example, the legacy URLs included:
/photos.php
/photos.php?group=0
/photos.php?group=0&pic=0
/photos.php?group=0&pic=1
/photos.php?group=1
/photos.php?group=1&pic=0
/photos.php?group=1&pic=1
These are saved as:
/photos.php.html
/photos.php@group=0.html
/photos.php@group=0&pic=0.html
and so on.
So, having read the documentation, I have rules like this in my _redirects file:
/photos.php group=:g pic=:p /photos.php@group=:g&pic=:p.html 200
/photos.php group=:g /photos.php@group=:g.html 200
/photos.php /photos.php.html 200
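To make the intended behavior of these three rules concrete, here is a toy Python model of how I expect them to be evaluated: rules are tried top to bottom, a rule matches when the path is equal and every query parameter it names is present, and the :name placeholders in the target are filled from those parameters. This is my reading of the documentation, not Netlify’s implementation; in particular, how extra, unlisted parameters are handled is an assumption, and parse_rule and rewrite are hypothetical names.

```python
from urllib.parse import parse_qs, urlsplit

RULES = """\
/photos.php group=:g pic=:p /photos.php@group=:g&pic=:p.html 200
/photos.php group=:g /photos.php@group=:g.html 200
/photos.php /photos.php.html 200
"""


def parse_rule(line):
    """Split a _redirects line into (path, {param: placeholder}, target, status)."""
    path, *middle, target, status = line.split()
    params = dict(field.split("=", 1) for field in middle)
    return path, params, target, status


def rewrite(url, rules):
    """Return (target, status) for the first matching rule, or None."""
    parts = urlsplit(url)
    query = {k: v[0] for k, v in parse_qs(parts.query).items()}
    for path, params, target, status in rules:
        if path != parts.path:
            continue
        # Assumption: every parameter named in the rule must be present.
        if not all(name in query for name in params):
            continue
        # Fill each :placeholder in the target with the parameter's value.
        for name, placeholder in params.items():
            target = target.replace(placeholder, query[name])
        return target, status
    return None


rules = [parse_rule(line) for line in RULES.splitlines() if line.strip()]
for url in ["/photos.php", "/photos.php?group=1", "/photos.php?group=1&pic=0"]:
    print(url, "->", rewrite(url, rules))
```

Ordering matters in this model: the rule without query parameters matches every /photos.php request, so it has to come last, just as it does in my file.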
Unfortunately, this doesn’t behave as I expect: for every legacy URL listed above, I’m served /photos.php.html.
Test case
I noticed that other, similar rewrites were working fine, so I boiled the problem down until I found the one difference: in the case above, if I simply remove the /photos.php.html file, all the rewrites to the other /photos.php*.html files suddenly work.
I ended up with this test case: https://chipper-platypus-8de0b0.netlify.app/
Source code: tremby/netlify-redirect-test on GitHub
In this test case, the links menu is the same on every page. There are two sets of URLs, test1 and test2, with identical redirects written for both sets:
/test1.php x=:x y=:y /test1.php@x=:x&y=:y.html 200
/test1.php x=:x /test1.php@x=:x.html 200
/test1.php /test1.php.html 200
/test2.php x=:x y=:y /test2.php@x=:x&y=:y.html 200
/test2.php x=:x /test2.php@x=:x.html 200
/test2.php /test2.php.html 200
(That last line exists only to show that the redirect line itself is not the problem; read on…)
Then these files exist:
test1.php.html
test1.php@x=0.html
test1.php@x=0&y=1.html
test1.php@x=0&y=2.html
test2.php@x=0.html
test2.php@x=0&y=1.html
test2.php@x=0&y=2.html
(There’s also an index.html, just to serve as an entry point.)
Note that both sets have matching files, except that there’s no test2.php.html.
When clicking through the links, you’ll find that all the test2 links work fine: each one loads its own corresponding file. The test1 links, however, do not work as expected: each one loads test1.php.html, and the other test1.php*.html files are never served.
This strikes me as a bug. I suspect some (undocumented?) “magic” is triggered by the presence of a file named after the requested URL with the query string stripped, and that it changes how the rewriting logic works.
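One documented Netlify behavior that may be relevant here is “shadowing”: by default a rewrite rule is not applied if a file already exists at the requested path, unless the status code is forced with ! (for example 200!). The toy Python model below assumes, as an untested hypothesis, that this existence check ignores the query string and also treats /test1.php as matching test1.php.html. Under that assumption it reproduces the test case. The names resolve and rules_for are hypothetical, and rule matching is simplified to an exact path plus the required query parameters.

```python
from urllib.parse import parse_qs, urlsplit


def resolve(url, files, rules):
    """Toy model of which file gets served for `url`.

    files: set of deployed file paths, e.g. {"/test1.php.html"}.
    rules: list of (path, {param: placeholder}, target, status) tuples.
    Assumption: a file at the query-stripped path, or at that path plus
    ".html", shadows every rule whose status is not forced with "!".
    """
    parts = urlsplit(url)
    query = {k: v[0] for k, v in parse_qs(parts.query).items()}
    shadow = next((f for f in (parts.path, parts.path + ".html") if f in files), None)
    for path, params, target, status in rules:
        if path != parts.path or not all(p in query for p in params):
            continue
        if shadow is not None and not status.endswith("!"):
            continue  # an existing file shadows this non-forced rule
        for name, placeholder in params.items():
            target = target.replace(placeholder, query[name])
        return target
    return shadow


def rules_for(name, status="200"):
    """The three test-case rules for /<name>.php (hypothetical helper)."""
    base = f"/{name}.php"
    return [
        (base, {"x": ":x", "y": ":y"}, f"{base}@x=:x&y=:y.html", status),
        (base, {"x": ":x"}, f"{base}@x=:x.html", status),
        (base, {}, f"{base}.html", status),
    ]


files = {
    "/test1.php.html", "/test1.php@x=0.html", "/test1.php@x=0&y=1.html",
    "/test2.php@x=0.html", "/test2.php@x=0&y=1.html",
}
print(resolve("/test1.php?x=0&y=1", files, rules_for("test1")))
print(resolve("/test2.php?x=0&y=1", files, rules_for("test2")))
print(resolve("/test1.php?x=0&y=1", files, rules_for("test1", "200!")))
```

In this model the first call returns /test1.php.html, the second /test2.php@x=0&y=1.html, and the forced variant /test1.php@x=0&y=1.html. Whether 200! actually behaves that way on Netlify for these rules is an assumption I haven’t verified.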
Workaround
Working from that assumption, I experimented and found that if I rename my “query-string-free” file test1.php.html to test1.php@.html and adjust the rewrite rule accordingly, things work fine. This is what I’ll do for now.
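Applying this workaround across a whole export can be scripted. The Python sketch below assumes the wget output lives in one directory (the default name site is hypothetical) and that every *.php.html file should get the @ suffix; it renames those files and prints the corresponding replacement rules, e.g. /test1.php /test1.php@.html 200. The helper apply_workaround is hypothetical.

```python
import sys
from pathlib import Path


def apply_workaround(root: Path) -> list[str]:
    """Rename X.php.html to X.php@.html under root; return new _redirects lines.

    Assumes every query-string-free page should be renamed, so that no
    deployed file is named after a bare legacy URL.
    """
    rules = []
    # Materialize the list first so renaming doesn't disturb the iteration.
    for page in sorted(root.rglob("*.php.html")):
        stem = page.name[: -len(".html")]  # e.g. "test1.php"
        page.rename(page.with_name(stem + "@.html"))
        # Legacy URL path of the page, e.g. "/test1.php".
        url = "/" + page.relative_to(root).as_posix()[: -len(".html")]
        rules.append(f"{url} {url}@.html 200")
    return rules


if __name__ == "__main__":
    # Hypothetical default: the directory wget wrote the mirror into.
    root = Path(sys.argv[1] if len(sys.argv) > 1 else "site")
    if root.is_dir():
        for line in apply_workaround(root):
            print(line)
    else:
        print(f"no such directory: {root}")
```

The printed lines are meant to replace the existing query-string-free rules (such as /test1.php /test1.php.html 200) and, presumably, still need to sit below the parameterized rules for the same path.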