Disabled Pretty URLs, but .html-less links continue to work and be indexed by search engines

I have disabled Pretty URLs for unruffled-sinoussi-fbd7d8.netlify.app (primary domain is tinyapps.org), but .html-less links continue to work and be indexed by search engines. This causes duplicates to appear in search results; e.g., this Kagi Search returns two results for the same page, one with the .html extension and one without. Is it possible to return a 404 or 301 for such pretty links? If not, what is the best way to handle this while maintaining normal .html extensions? Thank you.

You can add a canonical tag; see “rel=canonical: the ultimate guide to canonical URLs” (Yoast).

Thank you for your kind reply, hrishikesh. Is there not a way to prevent the pretty URLs from being generated? I’ve disabled the feature, and yet the links continue to exist. Or could a general 301 be set up in _redirects that adds an .html extension to any file without an extension?

You can use Edge Functions to handle this, but not with simple _redirects.
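As a rough illustration of the Edge Functions route, here is a minimal TypeScript sketch that 301-redirects extensionless paths to their .html counterparts. The helper name htmlRedirectTarget and the file location (for example netlify/edge-functions/add-html.ts) are hypothetical. The sketch also assumes that every extensionless path not ending in a slash corresponds to a published .html file; the check would need adjusting if the site serves extensionless files, or directories requested without a trailing slash.

```typescript
// Hypothetical helper: returns the ".html" path to redirect to,
// or null when the request should be served unchanged.
export function htmlRedirectTarget(pathname: string): string | null {
  // Leave "/" and directory-style paths ("/docs/") alone.
  if (pathname.endsWith("/")) return null;

  // Leave anything whose last segment already has an extension alone
  // ("/file.html", "/style.css", "/favicon.ico").
  const lastSegment = pathname.slice(pathname.lastIndexOf("/") + 1);
  if (lastSegment.includes(".")) return null;

  // Assumption: "/file" was published as "/file.html".
  return pathname + ".html";
}

// Edge function entry point. Returning undefined lets the request
// continue to normal static file serving.
export default async (request: Request): Promise<Response | undefined> => {
  const url = new URL(request.url);
  const target = htmlRedirectTarget(url.pathname);
  if (target === null) return undefined;

  url.pathname = target; // query string is preserved
  return Response.redirect(url.toString(), 301);
};

// Run the function for every path on the site.
export const config = { path: "/*" };
```

With this in place, a request for /file should receive a 301 to /file.html, while /file.html, / and asset URLs pass through untouched.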

Thanks, @hrishikesh - I’ll look into those. Can you please tell me why there is even an option to enable or disable Pretty URLs (Site Configuration > Build & deploy > Post processing > Pretty URLs) when disabling them apparently does nothing?

I think you may be confused about what that option does.

I think that turning it off does not prevent us from serving https://site.com/file when you publish https://site.com/file.html. Leaving it off just means we do not redirect https://site.com/file.html to https://site.com/file.

So: turn it off, but don’t publish both URLs. Indexers then won’t find them both, and you won’t have any SEO penalty, which is most folks’ concern about having both work. (Or use the rel=canonical trick.)

Thanks for taking the time to reply here, @fool. Sadly, the previously-unknown-to-me Pretty URLs feature must have been on for some time, as search engines have apparently indexed the .html-less links as well. It would be swell if those links did not exist at all (or at least could be redirected via regex to their .html originals); I try to conserve every byte, and even the extra ~100 bytes per page for a tag that should not be necessary seems wasteful. But I certainly understand that my position is somewhat of an edge case (to put it mildly ;-).

To add to the offense, adding an explicit redirect to _redirects from /file to /file.html does not work; /file.html will still be served with HTTP 200 OK.

I wonder if that is a bug or a feature. Does not seem to be documented anywhere.

Hi, @salomvary. How is something working exactly as designed and documented an “offense”? (I’m genuinely curious why you feel this way.)

It is also not true that such redirects do not work. It definitely is possible to redirect from a URL ending in .html to one without it using redirect rules.

Please note, as a file for the path will exist, the rule must be “forced”. This is done using force = true if using the netlify.toml format or with a ! character after the status code using the _redirects file format (example below):

/path.html  /path   301!

If you make the rule without the ! after 301 as shown below, it will never trigger:

/path.html  /path   301

Similarly, omitting the status code defaults to an unforced 301 (which, again, will fail to work because the redirect must be forced when a file exists for the path in question).

This “must be forced” behavior is covered in the file shadowing section of the redirects documentation here:

https://docs.netlify.com/routing/redirects/rewrites-proxies/#shadowing
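For reference, the netlify.toml form of the forced rule mentioned above would look roughly like this. It is a sketch that reuses the same placeholder /path from the _redirects example:

```toml
# Forced 301 from /path.html to /path.
# force = true is the netlify.toml counterpart of the "!" in _redirects.
[[redirects]]
  from = "/path.html"
  to = "/path"
  status = 301
  force = true
```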

@luke The original “offense” was enabling the Pretty URLs feature for existing websites, where it potentially created a large amount of duplicate content (from the SEO perspective) unless the website was already using canonical URLs. Opting everyone in to this feature was a heavy-handed move, and a mistake on Netlify’s side in my opinion, hence the term “offense”.

I was not aware of the “forced” rule option, and the exact nature of Pretty URLs does not seem to be documented. It was far from obvious to me that it creates a “file”; it could just as well have created some sort of rewrite rule on the server without an actual file existing.

Anyway, thanks for clarifying how to make these redirects work. My bad for not reading through the redirect docs carefully enough.

Pretty URLs does not create any file. It is indeed a server-side rule.