Not understanding Netlify's removal of .htm

I am trying to sort out how Netlify determines removal of .htm from URLs.

From what I have read, Netlify will take a request for page “myfirstpage.htm” and return it as “myfirstpage”. But how far does this go?

I looked at one of our pages served from Netlify today, and it seems that the .htm was removed from all URLs in <a href=" "> links in the HTML that was actually served. BUT the .htm was not removed from the “canonical” link in the HEAD, nor from the JSON-LD schema section. The sitemap.xml file was also not altered, so it still shows all URLs with .htm. Is this the way it is supposed to work?

I have gotten messages from Google Search Console listing as many “Page with redirect” and “Alternate page with proper canonical tag” entries as I have pages on our website, and I am wondering if this is the cause of those.

I could rebuild all pages without the .htm and go from there, but how would that affect the pages already indexed by Google or Bing? Would I need to add 301 redirects from the .htm version to the non-.htm version?

Our “build” is done by zip file upload; the actual pages are created with MODX.

Call me confused.

@mrcycling Here’s the Support Guide for ‘Pretty URLs’, give it a read and see if it answers your questions:

https://answers.netlify.com/t/support-guide-how-can-i-alter-trailing-slash-behaviour-in-my-urls-will-enabling-pretty-urls-help/31191

That really didn’t answer my query, but thanks.

All of my pages are named using .htm, e.g. myfirstpage.htm, and the majority are at root level, e.g. mydomain.com/myfirstpage.htm. Only a few pages reside in a folder, e.g. mydomain.com/learn/myfirstblogpost.htm.

But when I look at the problematic URLs listed on Google Search Console, I see URLs such as:
mysecondpage.htm/
mythirdpage.htm.htm
myfourthpage

As mentioned, I am uploading a zip folder of static webpage files, so my “build” has no effect, and I don’t see an option for “pretty URLs” on my Netlify admin pages.

This one in particular is odd:

I don’t actually host with Netlify, so I haven’t been in their UI in some time and don’t know if they still have the “Pretty URL” system, but from what I remember it would remove .html and .htm from URLs (as you mentioned in your first post).

So if you had a link to /mypage.htm or /mypage.html the post-processing (after build or upload) would rewrite the file so the link would literally change to /mypage.

It never adds .htm or .html.

I presume you’ve searched through your static output files to see if there are any weird links or double extensions in them?
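If it helps, here is a rough Python sketch of that kind of search. It walks a folder of static output and reports any URL-like string ending in ‘.htm.htm’ or ‘.htm/’. The function name, the regex, and the list of file extensions are just assumptions for illustration; adjust them to whatever your export actually contains.

```python
import re
from pathlib import Path

# Flags a doubled extension (".htm.htm") or a trailing slash after ".htm",
# i.e. the two odd shapes mentioned from Search Console.
SUSPICIOUS = re.compile(r"""[^\s"'<>]*\.htm(?:\.htm|/)(?=["'\s<>?#]|$)""")

# Assumed set of text file types worth scanning.
TEXT_SUFFIXES = {".htm", ".html", ".xml", ".json", ".js"}


def find_suspicious_urls(root):
    """Return (relative file, line number, matched text) for each hit under root."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in TEXT_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in SUSPICIOUS.finditer(line):
                hits.append((str(path.relative_to(root)), lineno, match.group(0)))
    return hits
```

Calling `find_suspicious_urls("path/to/unzipped/site")` returns an empty list if nothing odd is in the files, which would suggest the strange URLs are coming from somewhere other than the uploaded output.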

I looked at both the static files created for uploading to Netlify and the files saved on Netlify for all the pages that Google lists with either ‘.htm/’ or ‘.htm.htm’, and they all just have ‘.htm’.

It is perplexing. We have been using the MODX->static->netlify.zip method for quite a few years without issue.

@mrcycling I do know from past experience that the ‘Pretty URLs’ behaviour wouldn’t change these…

It only changes the actual links in HTML files, not links or link fragments in other document types or JS.

So if you had…
A file called myfirstpage.htm
A link to /myfirstpage.htm
A canonical reference to /myfirstpage.htm
A JSON-LD reference to /myfirstpage.htm
A sitemap.xml reference to /myfirstpage.htm

You would end up with…

A file called myfirstpage.htm
A link to /myfirstpage
A canonical reference to /myfirstpage.htm
A JSON-LD reference to /myfirstpage.htm
A sitemap.xml reference to /myfirstpage.htm

Netlify would serve the myfirstpage.htm file contents with a 200 on both URLs.
So requesting either /myfirstpage or /myfirstpage.htm would show the page.
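To make the before/after above concrete, here is a rough Python model of the behaviour as I described it. It is only an illustration, not Netlify’s actual post-processing; the regex and the function name `prettify_links` are made up for the example.

```python
import re

# Matches the href of an <a> tag whose target ends in .htm or .html.
# Other tags (<link rel="canonical">, JSON-LD <script>) are never matched.
A_HREF = re.compile(r'(<a\b[^>]*?\bhref=")([^"#?]+?)\.html?(["#?])', re.IGNORECASE)


def prettify_links(html):
    """Drop the .htm/.html extension from <a href> targets only."""
    return A_HREF.sub(r"\1\2\3", html)


page = (
    '<link rel="canonical" href="/myfirstpage.htm">\n'
    '<a href="/myfirstpage.htm">First page</a>'
)
print(prettify_links(page))
# <link rel="canonical" href="/myfirstpage.htm">
# <a href="/myfirstpage">First page</a>
```

The anchor link loses its extension while the canonical reference keeps it, which is the mismatch between links and canonical that you are seeing.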

Ideally (to have clean URLs) you would output all references to the page without .htm.
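If changing the generator itself is awkward, one option might be to post-process the exported files before zipping them. As a minimal sketch for the sitemap only, assuming a standard sitemap.xml using the sitemaps.org namespace (the function name `strip_htm_from_sitemap` is hypothetical):

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def strip_htm_from_sitemap(xml_text):
    """Return the sitemap XML with a trailing .htm removed from every <loc>."""
    # Keep the sitemap namespace as the default one when serialising.
    ET.register_namespace("", SITEMAP_NS)
    root = ET.fromstring(xml_text)
    for loc in root.iter(f"{{{SITEMAP_NS}}}loc"):
        url = (loc.text or "").strip()
        if url.lower().endswith(".htm"):
            loc.text = url[: -len(".htm")]
    return ET.tostring(root, encoding="unicode")
```

The canonical and JSON-LD references would need the same treatment so that every reference to a page agrees on one URL. This sketch does not touch them, and it does not special-case index pages.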

If you do want it to have .htm, and for that to be your canonical, I believe you need to use Edge Functions to do the redirect (Netlify’s built-in redirect system can’t do it), as per discussion/links here:

Thanks. I was thinking about going the clean route and changing everything over to no .htm.

But from the SEO side of the equation, would that mandate 301 redirects from every former .htm version of a page to the new non-.htm version? Or would ‘.htm’ vs ‘no .htm’ have no effect on the ‘Google juice’ from previously indexed pages and backlinks pointing to the .htm versions?

@mrcycling I’m not the right person to ask SEO questions of, I’d suggest googling around, asking an AI, or asking somewhere frequented by “SEO Experts”.

You could also try reading through Google’s related documentation here:
https://developers.google.com/search/docs/crawling-indexing/canonicalization

If you find you do need to implement a redirect and cannot do it with Netlify’s built-in _redirects system, then, as previously mentioned, you could use Edge Functions to have precise control.