My understanding is that Netlify have automatically added a canonical link tag to the response headers of the base .netlify.app domain; checking rubberring.netlify.app there is indeed a canonical link tag.
Unfortunately, this link tag does not appear to be present in the response headers of the staging branch or the deploy preview urls.
I’m wondering if there is a way to prevent Google from indexing any of them at all?
I’ve tried creating a _headers file in my build path with the following code, but I’m not seeing any difference in the response headers:
Thanks for responding. Since the code in my staging branch is more or less the same as the code in my main branch, it sounds like option 1 is going to be the best approach. I have a follow up question…
Do you know what the contents of my robots.txt file needs to be in order to
@davidf Ideally to achieve the indexing of the main branch, but not the staging/preview deploys, you should output different robots.txt files depending on the “Deploy Context”.
I explain the approach here regarding passwords, but it’s much the same for adjusting the output for any purpose:
Effectively you would configure your contexts to run a build command that outputs a permissive robots.txt for your main branch and a restricted one for any other branch.
I noticed that it mentioned a further solution using next-sitemap, which is a package I am actually already using in my project.
It seems that I’ll just have to set the config to generate the robots.txt file based on whether the ‘ENVIRONMENT’ env variable is ‘production’ or not. The one part of this that I’m unsure about is how to make sure the ‘ENVIRONMENT’ variable is ‘production’ in my main branch but ‘DEVELOPMENT’ in the rest of my branches. This article suggests that it’s possible, but I’m not sure whether Netlify allows different env variables for each environment?
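For illustration, a minimal sketch of what such a next-sitemap config might look like, assuming the variable is called `ENVIRONMENT` and is set to `production` only on the main branch. The `buildConfig` helper and the `https://example.com` fallback are placeholders, not taken from the actual project; `URL` is the site URL variable Netlify provides at build time.

```javascript
// next-sitemap.js: a sketch only; adjust to the real project config.

// Build the config for a given environment name.
// Environment variable names are case-sensitive, so the name read here
// must match the one defined for the build exactly.
function buildConfig(environment) {
  const isProduction = environment === 'production';
  return {
    // Placeholder fallback URL; replace with the real site URL.
    siteUrl: process.env.URL || 'https://example.com',
    generateRobotsTxt: true,
    robotsTxtOptions: {
      // Allow crawling on production, block everything elsewhere.
      policies: isProduction
        ? [{ userAgent: '*', allow: '/' }]
        : [{ userAgent: '*', disallow: '/' }],
    },
  };
}

module.exports = buildConfig(process.env.ENVIRONMENT);
```

Anything other than the exact string `production` (including an unset variable) falls through to the restrictive policy, which seems like the safer default for staging and preview builds.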
Edit:
So I did a bit of further digging through the docs and it looks like you can set the environment variables in the netlify.toml file.
My toml file now looks as follows (not sure if all of these are necessary, but making sure I cover all bases)
If it works, it works.
I’d just check the robots.txt by either visiting the /robots.txt url of the generated site, or by downloading the output of a specific build e.g.
I’ve tried this out and, after capitalising ‘process.env.ENVIRONMENT’ in next-sitemap.js, I’m now getting the result I want for the main/production branch and the staging branch.
Just an issue I’m still dealing with:
I’m still having problems with the deploy previews (the versions of the site that are accessed by clicking the ‘Preview Deploy’ link in your screenshot). They seem to have the same robots.txt as the main/production site, despite specifying the ENVIRONMENT variable as ‘dev’ in the netlify.toml file under [context.deploy-preview.environment]. Do you know why this might be the case?
Digging into the docs on deploy contexts a bit further, the deploy-preview is defined as “a deploy generated from a pull request or merge request”. I’m not sure that this matches the scenario I have here as these preview links are not generated from a pull or merge request, they’re simply generated, along with an update to the main/production site, each time I push a change to the main branch. Is there another way I should be targeting these previews?
I’ve never utilised the [context.deploy-preview.environment] context myself, but I believe that link points to the result of the build which in this case would be the result of the main build… which should be indexed (hence having the robots.txt for main/production).
Thanks for clarifying. So it looks like we’re still without a way to prevent these previews from being indexed. I’m hoping someone from Netlify can chip in to say how to either target them with the netlify.toml file or disable them completely.
Netlify may also be able to clarify the “preview” terminology and features too.
Ultimately I’d imagine the previous deployments not being deleted is tied to both the CDN and the fact that Netlify let you instantly re-deploy any previous build.
(the last line in the above screenshot). This header is automatically applied to any permalink-based deploy so that it doesn’t end up in search engines. Almost all popular search engines respect this header and I’ve never seen my preview deploys indexed by Google.
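If you want to check this yourself rather than rely on a screenshot, here is a small sketch. It assumes the header in question is `X-Robots-Tag` with a `noindex` value (the header name isn't spelled out above, so treat that as an assumption), and the function names are made up for the example.

```javascript
// Decide whether an X-Robots-Tag header value asks crawlers not to index.
// Simplified: handles comma-separated directives such as "noindex, nofollow"
// and ignores bot-specific prefixes like "googlebot: noindex".
function blocksIndexing(xRobotsTag) {
  if (!xRobotsTag) return false;
  return xRobotsTag
    .toLowerCase()
    .split(',')
    .map((directive) => directive.trim())
    .some((directive) => directive === 'noindex' || directive === 'none');
}

// Fetch only the headers of a URL and report whether indexing is blocked.
// Relies on the global fetch available in Node 18+ and in browsers.
async function checkUrl(url) {
  const response = await fetch(url, { method: 'HEAD' });
  return blocksIndexing(response.headers.get('x-robots-tag'));
}
```

Calling `checkUrl` with a deploy permalink and then with the production domain should show whether the two are treated differently.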
With that being said, here’s a little clarification about the contexts. Each build will show you what context it’s running with:
All the links of this deploy (the custom domain, the Netlify subdomain and the deploy permalink, that’s the preview deploy button), have the same context. So, you cannot individually target these links. You can use _redirects to redirect all those domains to the same domain, but that’s about it.
The contexts that were being discussed above (branch, deploy-preview) are the other two contexts usually used apart from production. They can be individually targeted with netlify.toml.