XML file served as html

I have a sitemap.xml file at the root of my site. I have added a header rule in netlify.toml specifying the correct content type, but when I view the file in a browser it displays as HTML.

# Sitemap
[[headers]]
  for = "/sitemap.xml"
  [headers.values]
    Content-Type = "application/xml"

Here is the file:

https://h2qz.netlify.app/sitemap.xml

Why isn’t it displaying as xml?

Hi, @mjgs. There is no such file in the deploy, so a default 404 page is being sent instead. The 404 page is HTML, not XML. If you request an XML file that does exist, the content-type is text/xml:

$ curl --compressed -svo /dev/null --stderr -  https://h2qz.netlify.app/feeds/links/rss/feed.xml  | egrep '^< content-type'
< content-type: text/xml

So, the real problem to solve here is that there is no sitemap.xml file in the deploy. You will need to generate or include that file in the deploy, and then the [[headers]] setting will work.
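One way to tell a real file apart from the 404 fallback is to look at the status code and the content-type together. The following is a minimal sketch, not an official Netlify tool: the function name `summarize_headers` is hypothetical, and it assumes the input is a header dump in the form printed by `curl -sI`.

```shell
# Hypothetical helper: reduce an HTTP header dump (as printed by `curl -sI`)
# to "<status> <content-type>". Reads the headers from stdin.
summarize_headers() {
  tr -d '\r' | awk '
    # Status line, e.g. "HTTP/2 404"; the last one wins if there are redirects.
    tolower($1) ~ /^http\// { status = $2 }
    # Header names are case-insensitive; drop parameters such as "; charset=...".
    tolower($1) == "content-type:" { type = $2; sub(/;.*$/, "", type) }
    END { print status, type }
  '
}

# Usage against a live site (needs network access):
#   curl -sI https://h2qz.netlify.app/sitemap.xml | summarize_headers
```

A result such as `404 text/html` would point to the missing-file case described above, while `200` with an XML content-type would mean the file itself is being served.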

Thanks for the reply.

Shoot, in the meantime I must have deleted the file to test something else. I’ll regenerate it and update the thread shortly.

Hi @luke I’ve put the file back. I’ll try to keep it there for a day or so.

Here’s a screenshot of what I’m seeing when I load it in a browser:

The URLs in the sitemap use the live site’s hostname, but this is deployed to a staging server. The point is that I was expecting to see XML, not HTML.

Any ideas why it’s not displaying the XML that’s in the sitemap file?

I have to remove the file from the staging server to get on with other things. I can put it back again later if that’s helpful.

Could you list a few troubleshooting things I could do? Or some additional info I could provide to you to help me figure out the cause?

Hi @luke, I’ve put the sitemap.xml file back on the staging server. (Should be there once the latest build completes in a few mins)

It would be great if you could take a look and let me know whether you were able to see it. Thanks.

What’s the link to your staging server?

Here is the file:

https://h2qz.netlify.app/sitemap.xml

That’s being served as XML, which is why I asked where your file is. If that’s the same link, I don’t see a problem there.

For comparison, here’s another XML file in a different format. It displays as actual XML, which is what I would expect for the sitemap as well:

https://h2qz.netlify.app/feeds.opml

I noticed that the opml file, which is xml, didn’t have a header rule, so I tried both

  • no header rule for both
  • same header rule for both

In neither case did the sitemap display as xml.

On the other hand the feeds.opml did display as xml without the header rule. With the header rule, it displayed as html.

Here’s the opml file with the header rule defined:

What’s going on with these header rules?

Why doesn’t the sitemap.xml file display as xml in the browser no matter what I do?

Updated: correct path to staging server OPML file
Updated: screenshot of staging OPML file

The file is being served as XML:

$ curl --compressed -svo /dev/null --stderr - https://64d042e3be3d0c14820eb0e8--h2qz.netlify.app/sitemap.xml  | egrep '^(<|>)'
> GET /sitemap.xml HTTP/2
> Host: 64d042e3be3d0c14820eb0e8--h2qz.netlify.app
> User-Agent: curl/8.1.2
> Accept: */*
> Accept-Encoding: deflate, gzip
>
< HTTP/2 200
< accept-ranges: bytes
< age: 0
< cache-control: public,max-age=0,must-revalidate
< content-encoding: gzip
< content-type: application/xml
< date: Wed, 09 Aug 2023 23:48:17 GMT
< etag: "2ce0332f1b207d555c965b546df4538b-ssl-df"
< server: Netlify
< strict-transport-security: max-age=31536000; includeSubDomains; preload
< vary: Accept-Encoding
< x-nf-request-id: 01H7EB6YMYEM53EKFWJJV0H85M
< x-robots-tag: noindex
<

It shows content-type: application/xml above. Also, you can see the file itself is XML:

$ curl -s https://64d042e3be3d0c14820eb0e8--h2qz.netlify.app/sitemap.xml
<?xml version="1.0" encoding="UTF-8"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:news="http://www.google.com/schemas/sitemap-news/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1" xmlns:video="http://www.google.com/schemas/sitemap-video/1.1"><url><loc>https://markjgsmith.com/about/index.html</loc></url><url><loc>https://markjgsmith.com/archives/index.html</loc></url><url><loc>https://markjgsmith.com/blog/index.html</loc></url><url><loc>https://markjgsmith.com/contacts/index.html</loc></url><url><loc>https://markjgsmith.com/feeds/index.html</loc></url><url><loc>https://markjgsmith.com/feeds.opml</loc></url><url><loc>https://markjgsmith.com/index.html</loc></url><url><loc>https://markjgsmith.com/job-interview-policy/index.html</loc></url><url><loc>https://markjgsmith.com/latest/index.html</loc></url><url><loc>https://markjgsmith.com/links/index.html</loc></url><url><loc>https://markjgsmith.com/newsletter/index.html</loc></url><url><loc>https://markjgsmith.com/podcast/index.html</loc></url><url><loc>https://markjgsmith.com/portfolio/index.html</loc></url><url><loc>https://markjgsmith.com/pricing/index.html</loc></url><url><loc>https://markjgsmith.com/recommendations/index.html</loc></url><url><loc>https://markjgsmith.com/services/index.html</loc></url><url><loc>https://markjgsmith.com/sponsorships/index.html</loc></url><url><loc>https://markjgsmith.com/tags/index.html</loc></url><url><loc>https://markjgsmith.com/archives/blog/index.html</loc></url><url><loc>https://markjgsmith.com/blog/2021/index.html</loc></url><url><loc>https://markjgsmith.com/blog/2021/06/01/ipsum-dolor-sit-amet/index.html</loc></url><url><loc>https://markjgsmith.com/blog/2021/01/01/ipsum-dolor-sit-amet/index.html</loc></url><url><loc>https://markjgsmith.com/blog/2022/index.html</loc></url><url><loc>https://markjgsmith.com/blog/2022/06/01/ipsum-dolor-sit-amet/index.html</loc></url><url><loc>https://markjgsmith.com/blo
g/2022/02/08/up-with-templating-in-modern-javascript-frameworks/index.html</loc></url><url><loc>https://markjgsmith.com/blog/2022/01/01/ipsum-dolor-sit-amet/index.html</loc></url><url><loc>https://markjgsmith.com/archives/links/index.html</loc></url><url><loc>https://markjgsmith.com/links/2021/index.html</loc></url><url><loc>https://markjgsmith.com/links/2021/08/index.html</loc></url><url><loc>https://markjgsmith.com/links/2021/08/12/index.html</loc></url><url><loc>https://markjgsmith.com/links/2021/08/12/151441-markjgsmith.com/index.html</loc></url><url><loc>https://markjgsmith.com/links/2021/08/12/141441-share.transistor.fm/index.html</loc></url><url><loc>https://markjgsmith.com/links/2021/08/12/052614-ckarchive.com/index.html</loc></url><url><loc>https://markjgsmith.com/links/2022/index.html</loc></url><url><loc>https://markjgsmith.com/links/2022/01/index.html</loc></url><url><loc>https://markjgsmith.com/links/2022/01/01/index.html</loc></url><url><loc>https://markjgsmith.com/links/2022/01/01/163045-markjgsmith.substack.com/index.html</loc></url><url><loc>https://markjgsmith.com/links/2022/01/01/162741-blog.markjgsmith.com/index.html</loc></url><url><loc>https://markjgsmith.com/links/2022/01/01/162156-blog.markjgsmith.com/index.html</loc></url><url><loc>https://markjgsmith.com/archives/newsletter/index.html</loc></url><url><loc>https://markjgsmith.com/newsletter/2020/index.html</loc></url><url><loc>https://markjgsmith.com/newsletter/2020/10/21/third-issue/index.html</loc></url><url><loc>https://markjgsmith.com/newsletter/2020/10/21/second-issue/index.html</loc></url><url><loc>https://markjgsmith.com/newsletter/2020/10/19/first-issue/index.html</loc></url><url><loc>https://markjgsmith.com/newsletter/2021/index.html</loc></url><url><loc>https://markjgsmith.com/newsletter/2021/02/05/fifth-issue/index.html</loc></url><url><loc>https://markjgsmith.com/newsletter/2021/02/04/fourth-issue/index.html</loc></url><url><loc>https://markjgsmith.com/archives/podcast/index.html
</loc></url><url><loc>https://markjgsmith.com/podcast/2020/index.html</loc></url><url><loc>https://markjgsmith.com/podcast/2020/10/21/0003-noisey-cafe-2/index.html</loc></url><url><loc>https://markjgsmith.com/podcast/2020/10/21/0002-noisey-cafe/index.html</loc></url><url><loc>https://markjgsmith.com/podcast/2020/10/19/0001-silly-chant-too-early/index.html</loc></url><url><loc>https://markjgsmith.com/podcast/2021/index.html</loc></url><url><loc>https://markjgsmith.com/podcast/2021/02/05/0017-foot-badminton-in-the-park-at-sunrise/index.html</loc></url><url><loc>https://markjgsmith.com/podcast/2021/02/04/0016-morning-sound-check-in-the-park/index.html</loc></url><url><loc>https://markjgsmith.com/feeds/blog/rss.xml</loc></url><url><loc>https://markjgsmith.com/feeds/links/rss.xml</loc></url><url><loc>https://markjgsmith.com/feeds/newsletter/rss.xml</loc></url><url><loc>https://markjgsmith.com/feeds/podcast/rss.xml</loc></url><url><loc>https://markjgsmith.com/tags/blog/index.html</loc></url><url><loc>https://markjgsmith.com/tags/links/index.html</loc></url><url><loc>https://markjgsmith.com/tags/newsletter/index.html</loc></url><url><loc>https://markjgsmith.com/tags/podcast/index.html</loc></url></urlset>%

What I can confirm is that the file is sent with the correct content-type header and that the correct content is sent. If you are seeing errors in your browser, I cannot troubleshoot those, as I don’t have access to your browser.

To summarize, I cannot see any errors when I test. Can you send us a HAR recording of the incorrect response? (That or the x-nf-request-id HTTP response header for the incorrect response?)

I don’t think there is an incorrect response, though. However, if there is, please let us know.
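If an incorrect response does show up, the x-nf-request-id for it can be pulled out of the response headers. This is a minimal sketch under the same assumptions as the curl output above; the function name `extract_request_id` is hypothetical, and it only parses a header dump passed on stdin.

```shell
# Hypothetical helper: print the value of the x-nf-request-id response header
# from a header dump on stdin (for example, the output of `curl -sI`).
extract_request_id() {
  # Strip carriage returns, then match the header name case-insensitively.
  tr -d '\r' | awk 'tolower($1) == "x-nf-request-id:" { print $2 }'
}

# Usage against a live site (needs network access):
#   curl -sI https://h2qz.netlify.app/sitemap.xml | extract_request_id
```

Note that curl may not reproduce what a browser receives, so for a browser-only problem the HAR recording is likely the more useful of the two.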

Thanks for taking a look @luke

I guess what’s viewable via the browser isn’t that important, as long as those commands you run return the right results. Also I presume Google will report an error when I submit the sitemap if there is an issue.

I’m glad it’s working, though I want to be sure I have it configured in the most stable way.

Currently there are no rules configured for sitemap.xml or feeds.opml; they are commented out:

#[[headers]]
#  for = "/sitemap.xml"
#  [headers.values]
#    Content-Type = "text/xml"

# RSS feeds
[[headers]]
  for = "/feeds/blog/rss.xml"
  [headers.values]
    Content-Type = "text/xml"

[[headers]]
  for = "/feeds/links/rss.xml"
  [headers.values]
    Content-Type = "text/xml"

[[headers]]
  for = "/feeds/newsletter/rss.xml"
  [headers.values]
    Content-Type = "text/xml"

[[headers]]
  for = "/feeds/podcast/rss.xml"
  [headers.values]
    Content-Type = "text/xml"
    
#[[headers]]
#  for = "/feeds.opml"
#  [headers.values]
#    Content-Type = "text/xml" 

Is the application/xml content type that was returned in your command output some sort of default?

Is it better to configure a rule to ensure things don’t change unexpectedly?

Which is better for xml files: text/xml or application/xml ?

Yes, we have default content-types for a lot of formats, and we don’t plan to change this (as we’re following the current standards). You can choose to specify it explicitly, though that’s not required.

Your final question is answered in this Stack Overflow question: “What’s the difference between text/xml vs application/xml for webservice response”.

Thanks for the help on this thread.

I managed to get Google Search Console to parse the sitemap, looks like no errors reported.

See the screenshot attached to this email. I’m replying via email because thread content isn’t currently loading on the website.