Html files without a file name suffix are served as content-type: text/plain

Netlify is serving most of my html files as content-type: text/plain. [example] The only ones getting a correct content-type are those named with the suffix “.html”. This seems rather silly, since the files are all easily identified by their <!DOCTYPE html> header.

I assume I could avoid the problem if I wanted to adopt a Microsoft-style file naming convention and expose it in my URLs, but I have no interest in that.

I figured out how to set custom headers by fiddling about with toml files and wildcard path matching whenever my site’s directory structure changes, but that leads to a life of hassles that seem like they shouldn’t be necessary.

Is there some way to make Netlify be smarter about the content-type it serves by default, at least for the extremely common case of html5 files? If not today, would the dev team consider it as a new feature?

Why is it you don’t want to add .html to your files?

Have you looked at the Post processing page in the Netlify Docs, which covers Asset Optimisation / Pretty URLs (e.g. about.html -> /about)?

Because I dislike needlessly noisy file naming conventions left over from the 1970s, I dislike exposing implementation details, and I dislike ugly URLs.

It looks like the pretty_urls option could make requests for /filename get the content of /filename.html, but that doesn’t solve the problem, because it would still require me to clutter my file names. (It would merely hide the clutter from site visitors.)

…which brings me back to my goal as stated: Getting Netlify to recognize html files without depending on a file name suffix.

It occurs to me that this could also be useful to people migrating legacy sites (whose html page names might end in .asp or .php) to static site generation, while preserving their old URLs.

@inx Just out of curiosity, how would your proposed arrangement handle the following situation?

You have an important speech you want to publish on your website, so you create an HTML file called filename. However, you feel it is really critical that visitors have access in as many different ways as possible, so you also have a MarkDown version called filename, and a text version called filename, and a PDF version called filename.

For proper attribution, you have images of the presenter of this speech. You have a JPG image called filename, a PNG image called filename, a GIF image called filename, a WEBP image called filename, and even a SVG image called filename.

Aside from the fact that I don’t think there has ever been an OS that would allow files with duplicate names like this, how would you handle linking to these different assets, given that only one of them has a DOCTYPE declaration?

This issue is about correctly reporting a file’s content type, automatically, without requiring its name to embed metadata. It is not about anyone adopting new naming conventions. Let’s stay on topic, please.

(Edited for clarity. I encourage anyone puzzled by this to get some experience with different operating systems and desktop environments that do and have done this over the decades, and for programmers, to get familiar with libmagic.)

@inx I must confess that after 30 years of building web pages using the HTML extension – and loving it – I’m not certain what the topic is here. File extensions such as HTML are a simple and effective way of defining file types and distinguishing among various versions of the same file that are formatted differently.

I believe @gregraven is very much on topic @inx.

If I understand it correctly, this doctype header is there to help the browser figure out the best way to display the content of the .html page, not to tell the server (or in Netlify’s case, the CDN) what kind of document it is. Furthermore, I’d be shocked if each of the many other file types had a similar XML-style declaration in the first bytes of the file. Therefore, even if you could get the server / CDN to observe the doctype declaration in your HTML files, there almost certainly isn’t something similar in other files, so you’d have the same situation but without the consistency of applying the desired file type extension to each file. Finally, this approach would require the reading of each file before sending it to the requester, which would introduce a delay (and possibly create security issues). You might be able to program a server to do this, but the whole point of a CDN is that it serves the files requested as quickly as possible – hence, there is neither time nor the apparatus for reading the file to determine what else should be done to make it appear on the other end.

P.S. You might be able to set up Apache to do what you want, but of course Netlify doesn’t run a server to provide files to visitors, so there’s no Apache configuration possible.

“I’d be shocked if each of the many other file types had a similar XML-style declaration in the first bytes of the file”

Most common file types are easily identified with similar means. It’s not always XML, of course, but it’s there. That’s why I can double-click files on my desktop and they open with the right program, regardless of whether their names have special suffixes.

“there almost certainly isn’t something similar in other files”

There is. See libmagic or the unix file --mime command if you want to learn about how this works.
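For readers who have not seen content sniffing before, here is a minimal Python sketch of the idea. It is not libmagic and not anything Netlify is known to run; the `sniff` function and its short signature table are hypothetical, and the real libmagic database covers far more formats. It only shows that a handful of leading bytes is often enough to pick a media type, and that plain-text formats with no signature all fall through to `text/plain`.

```python
import codecs

# A tiny, hand-picked signature table (an assumption for illustration only).
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
]


def sniff(head: bytes) -> str:
    """Guess a media type from the leading bytes of a file."""
    for magic, mime in SIGNATURES:
        if head.startswith(magic):
            return mime

    # WebP is a RIFF container with "WEBP" at byte offset 8.
    if head[:4] == b"RIFF" and head[8:12] == b"WEBP":
        return "image/webp"

    # HTML5: skip an optional UTF-8 BOM and whitespace, then look for the doctype.
    text = head.removeprefix(b"\xef\xbb\xbf").lstrip().lower()
    if text.startswith(b"<!doctype html"):
        return "text/html"

    # Anything that decodes as UTF-8 is treated as plain text. The incremental
    # decoder tolerates a multi-byte character cut off at the end of `head`.
    try:
        codecs.getincrementaldecoder("utf-8")().decode(head, final=False)
    except UnicodeDecodeError:
        return "application/octet-stream"
    return "text/plain"
```

Calling `sniff` on the first few hundred bytes of a file is enough for the formats listed here. JavaScript, Markdown, and plain text carry no signature, so this sketch cannot tell them apart.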

“this approach would require the reading of each file before sending it to the requester”

The files already have to be read in order to send them.

“which would introduce a delay”

It wouldn’t. The files are already being read. There’s no need to read them twice for each request. Alternatively, the file types could be detected at build/deploy time, rather than serve time. (In fact, it looks like they already are, as we can see from Netlify’s [[headers]] and _headers support.)

Even the worst case of a lazy implementation serving a HEAD request wouldn’t be particularly slow, as I’ve just seen by timing a stand-alone program that does it without caching on a low-power ARM machine. The operation is quite fast, and as I’ve already pointed out, avoidable at serve time.

“there is neither time nor the apparatus”

There absolutely is. If there wasn’t, Netlify’s manually-created .toml and _headers files wouldn’t work. I’m just talking about automating it.
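As a rough illustration of what that automation could look like today, here is a Python sketch of a build step that scans the publish directory and generates the `_headers` rules instead of writing them by hand. The function names, the 512-byte read limit, and the `charset=UTF-8` parameter are assumptions of this sketch, not anything Netlify provides; it only relies on the `_headers` format of a path line followed by indented header lines.

```python
from pathlib import Path


def looks_like_html(path: Path) -> bool:
    """Return True if the file starts with an HTML5 doctype."""
    with path.open("rb") as f:
        head = f.read(512)  # the doctype sits at the very top of the file
    head = head.removeprefix(b"\xef\xbb\xbf").lstrip().lower()
    return head.startswith(b"<!doctype html")


def build_headers(publish_dir: Path) -> str:
    """Build _headers rules for every extension-less HTML file."""
    rules = []
    for path in sorted(publish_dir.rglob("*")):
        # Files that already have a suffix are left to the host's defaults.
        if path.is_file() and path.suffix == "" and looks_like_html(path):
            url = "/" + path.relative_to(publish_dir).as_posix()
            rules.append(f"{url}\n  Content-Type: text/html; charset=UTF-8\n")
    return "\n".join(rules)


def write_headers(publish_dir: Path) -> None:
    """Write the generated rules to <publish_dir>/_headers (overwrites it)."""
    (publish_dir / "_headers").write_text(build_headers(publish_dir), encoding="utf-8")
```

Running `write_headers(Path("public"))` at the end of a build (with `public` standing in for whatever the real publish directory is) would regenerate the rules whenever the directory structure changes. Note that it overwrites any existing `_headers` file, so hand-written rules would need to be merged in separately.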

Like this?

$ file --mime filename.*
filename.js:  text/plain; charset=us-ascii
filename.md:  text/plain; charset=us-ascii
filename.txt: text/plain; charset=us-ascii

So how would you differentiate between .js, .md, and .txt files?

I mentioned that command to help you understand the subject matter better than you do, on the off chance that you were genuinely curious and not just trolling. It was not an implementation proposal.

(Also, your example is meaningless, since you didn’t include the files’ contents, and just as irrelevant as your earlier scenario, since my use case is obviously not the same as yours. I don’t present javascript or markdown URLs to users.)

Now, since you have made it abundantly clear that you have nothing helpful to contribute here, please take your obstructionism elsewhere.

The forum won’t let me edit my first post, and I’m about to manually apply a Content-Type header to the example link there, so it will no longer work as an example. For the record, here’s a new example link.

@inx Your local computer is evaluating files during the opening process. Even a low-powered ARM machine is doing more computation than the CDN. My understanding is that the Netlify CDN just sends the file. What it does during the build seems irrelevant for the purposes of this discussion because once the site is built, the server turns everything over to the CDN, which has little to no computational input regarding the files being sent.

I also doubt that the files are being read. Files can be sent without being read, but not the file names. The browser reads the files once it receives them, but not the CDN.

If you find applying Content-Type headers to your files less fussy than distinguishing among them with filename extensions, why not just do that?

I answered that question in my first post. I will probably end up doing that, but it would be nice if I didn’t have to.

@inx You seem only sort of to have emulated the normal behavior by setting the Content-type header. For your first two files, the return is content-type: text/plain; charset=UTF-8. For your third file with the Content-type header set, the return is content-type: text/plain.

Are you referring to my new example site? Only one of those pages is served as text/plain. Did you mean to type text/html instead of text/plain?

The first two pages there (page and page.html) are using whatever Netlify chooses for content-type. They exist to show the difference in what Netlify does based on the file name.

The third page (page-with-custom-header) has a custom header of text/html, demonstrating that html files can be served and rendered correctly with no special file naming convention, although Netlify seems to require manual header configuration to accomplish this.

@inx I used the curl -v command. Netlify served both page and page.html with the same content-type for me. The difference in display is probably in the browser, not in what Netlify does or does not send.

Hi, @inx. I do see different content-type headers for your demo site:

$ curl -svo /dev/null https://blue-noodle-8a68ad.netlify.app/page.html  2>&1 | egrep 'content-type'
< content-type: text/html; charset=UTF-8
$ curl -svo /dev/null https://blue-noodle-8a68ad.netlify.app/page  2>&1 | egrep 'content-type'
< content-type: text/plain; charset=UTF-8
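The same check can be scripted. Below is a small Python sketch using only the standard library; the helper names (`media_type`, `fetch_content_type`, `served_as_html`) and the 10-second timeout are made up for this example. It sends a HEAD request and compares the media type while ignoring parameters such as `charset`.

```python
import urllib.request


def media_type(header_value: str) -> str:
    """Return the bare media type, dropping parameters such as charset."""
    return header_value.split(";", 1)[0].strip().lower()


def fetch_content_type(url: str) -> str:
    """Send a HEAD request and return the raw Content-Type header ('' if absent)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.headers.get("Content-Type", "")


def served_as_html(url: str) -> bool:
    """True if the server labels the resource as text/html."""
    return media_type(fetch_content_type(url)) == "text/html"
```

Calling `served_as_html` on each of the two demo URLs above should mirror whatever curl reports for them at the time.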

We have filed a feature request for this (automatically setting the content-type header to text/html for HTML files without file extensions). If this does become possible we will post an update here to let you know about it.

One of the biggest factors in determining whether a requested feature actually gets built is the number of requests for it. As noted by others in this topic, this isn’t a common request, and therefore the likelihood of the feature becoming a reality is quite low.

Please note, I’m not saying the feature request isn’t valid. I’m just trying to set expectations realistically about the chances of this occurring. I don’t want you waiting for a feature that won’t actually become available.

On the other hand, if many other people also make similar requests, it will increase the priority and make it more likely that the feature becomes available.

If there are other questions about this, please let us know.

Thanks very much, @luke.

No need to worry about expectations. I understand completely. Not everyone cares about URL aesthetics, after all. 🙂