Prevent Google from indexing dev/staging subdomains

Hello,
as the title says, I have a custom domain and I need to prevent Google from indexing the dev and staging subdomains.

Thank you in advance

You’d either have to use HTTP headers or the noindex meta tag. If you can share more details about your setup, we can help you find the most suitable option.

I’m using Gridsome to build the website. I have 2 branches, “main” and “dev”, and I don’t want “dev” to be indexed.

If the two branches have different code altogether, I think the noindex meta tag would be easier to configure, since all you need to do is add the line once somewhere in your template.
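For reference, the tag itself is just <meta name="robots" content="noindex"> in the page head. In Gridsome, one way to add it is via the head API in src/main.js, guarded by an environment variable you set only for the dev branch. A minimal sketch (GRIDSOME_NOINDEX is just an example name, not something Gridsome defines):

// src/main.js — minimal sketch; GRIDSOME_NOINDEX is a hypothetical variable
// you would set only on the dev branch (e.g. via netlify.toml, see below)
export default function (Vue, { head }) {
  if (process.env.GRIDSOME_NOINDEX === 'true') {
    head.meta.push({ name: 'robots', content: 'noindex' })
  }
}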

Otherwise, you can use a separate netlify.toml file for each branch, and in the file for the dev branch you can add custom headers that tell search engines not to index your website.
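For example, the dev branch’s netlify.toml could contain just a standard headers block like this (X-Robots-Tag: noindex tells crawlers not to index anything on the site):

[[headers]]
  for = "/*"
  [headers.values]
    X-Robots-Tag = "noindex"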

Lastly, I guess the simplest solution would be to use a robots.txt and add the root of your dev subdomain to the block list.

I think I’ll consider the first and second options, because for the third I’d need to create two separate robots.txt files (one for the main domain and the other for dev.mysite.com).

I have two questions:

  • If I use two separate .toml files, will the .toml file be overwritten once I do a git merge?

  • How can I use multi-environment variables with Netlify? I.e., I have “.env” and “.env.development” in my project, but they are not version controlled. How can I configure environment variables based on the environment?

You can add environment variables per deploy context in netlify.toml, like this:

# Variables for branch deploys (any branch other than the production branch)
[context.branch-deploy.environment]
  KEY = "value"

# Variables for production builds
[context.production.environment]
  KEY = "value"

Also, you don’t need to keep 2 robots.txt files. If you want your production website to be indexed fully, that is, if you don’t need a robots.txt for the production website, you can do something like this:

Create the required robots.txt in your base path, and for your dev branch, change your build command to <your current build command> && cp robots.txt <publish path>/robots.txt. This file would then be copied to the publish folder before Netlify uploads it, and since it will exist at the root of your publish folder, search engines will read it without any issue.
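For example, assuming a build command of gridsome build and a publish folder of dist, the dev branch’s command would be:

gridsome build && cp robots.txt dist/robots.txt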

Hi @hrishikesh,
Please tell me if I’m wrong, but if I create a robots.txt file on the dev branch, won’t it also be copied to the main branch when I merge “dev” into “main”?

Also, how do I configure headers for different branches? Via the .toml file or via multiple _headers files? Can you give me an example, please?

Create a robots.txt in the base path of your dev branch, that is, the folder with package.json. Files in this folder are generally not used by builders (except for configs). So even if both branches end up having the robots.txt in the root (base path), it won’t have any effect on your SEO unless it’s copied to the publish path. This is why you can have the same netlify.toml file with the following config:

[context.production.build]
  command = "<current build command>"
[context.branch-deploy.build]
  command = "<current build command> && cp robots.txt <publish path>/robots.txt"

With the above config in netlify.toml, your production builds (which have the context production by default) will just build the website and then publish it; however, the branch-deploy builds (which have the context branch-deploy by default) will build the website, copy the robots.txt file from the base path to the publish path, and then publish it. Since it’s copied to the root of the publish path, the file will be available at yourdomain.tld/robots.txt (ideal for most search engines).

I’m not personally sure if this can work in the same netlify.toml file or not, but you can give it a go. The file would look like this:

[[context.branch-deploy.headers]]
  for = "/*"
  [context.branch-deploy.headers.values]
    X-Robots-Tag = "noindex"

The above should work, but as I said, I’m not sure. It should not affect your production branch, but do check it once. If this doesn’t work, you’d have to maintain individual _headers or netlify.toml files for the two branches.
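If you go the _headers route, the file itself would be as simple as this, kept only on the dev branch (with Gridsome, placing it in the static folder should get it copied into the publish folder at build time, since static files are copied to the output by default):

/*
  X-Robots-Tag: noindex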

I have the same folder structure on both branches (both have package.json in the root), the build command is gridsome build, and the output folder is /dist.

That’s why I think that once the merge is done, the files (robots.txt, netlify.toml) will be overwritten.

Do you need any rules in robots.txt for your production branch?

This for production:

User-agent: *
Allow: /
Sitemap: http://www.example.com/sitemap.xml

And this for development:

User-agent: *
Disallow: /

Okay, so you can keep 2 separate files, say robots.production.txt and robots.branch.txt. Then your config can look like:

[context.production.build]
  command = "gridsome build && cp robots.production.txt dist/robots.txt"
[context.branch-deploy.build]
  command = "gridsome build && cp robots.branch.txt dist/robots.txt"

So it will copy the right robots.txt file according to the context.
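Once deployed, you can quickly verify that each context serves the right file (substituting your actual domains):

curl https://mysite.com/robots.txt        # production: should show the Allow rules
curl https://dev.mysite.com/robots.txt    # branch deploy: should show Disallow: /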


Fantastic! I’ll try it 🙂

Thanks for your help, very helpful!
