How to remove duplicate URLs from Google Search? (indexed all *.html URLs by accident)

Hey, I need some suggestions because I accidentally got duplicate URLs of my website indexed. Now Google Search shows duplicate results for all of my pages: paths with the .html extension and without.

I’ve read that this may cause some kind of problems with SEO and is generally not good at all.

What I tried so far:

  • redirecting all /.html routes to /, but by default Netlify keeps both versions of the URL available anyway and won’t redirect them

Any suggestions?

You should have a canonical URL on each of your pages. Read here: URL canonicalization and canonical tags | Google Search Central | Documentation | Google Developers. Then, even if Google sees duplicate pages, it won’t be a problem.
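As a rough illustration of what that means for pages reachable both with and without .html, here is a minimal Python sketch that derives the extensionless URL and builds the `<link rel="canonical">` tag for it. The function names, the example.com domain, and the choice of the extensionless form as the preferred one are assumptions for the example, not something Netlify or Google prescribes.

```python
import html
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Return the extensionless form of a page URL (assumed to be the preferred one)."""
    parts = urlsplit(url)
    path = parts.path
    if path.endswith("/index.html"):
        # /blog/index.html -> /blog/
        path = path[: -len("index.html")]
    elif path.endswith(".html"):
        # /about.html -> /about
        path = path[: -len(".html")]
    # Drop query string and fragment so every variant maps to one URL.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def canonical_tag(url: str) -> str:
    """Build the tag to place in the <head> of both variants of the page."""
    href = html.escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


if __name__ == "__main__":
    # Hypothetical URLs, used only to show that both variants get the same tag.
    for page in ("https://example.com/about.html", "https://example.com/about"):
        print(canonical_tag(page))
```

Both the .html and the extensionless variant should carry the same tag, so that Google sees one consistent preferred URL.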

As for what you can do for now, try removing the duplicate pages from Search Console.


Thank you for the answer.
I already added a canonical URL to every duplicated page, so that’s a plus.
I didn’t find a way to remove a page from Google Search Console that actually works in my case. I can only temporarily remove a URL for 6 months. To remove it permanently, Google suggests creating 301 redirects, but I cannot do that since the Netlify _redirects file doesn’t work in this case.

Yes, the only option is to temporarily remove them. You could add a robots.txt to block them (so they don’t get re-indexed after 6 months), but I’m not sure whether it can be applied only to the .html pages.
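On the question of targeting only the .html pages: Google's robots.txt handling documents two wildcards, `*` (any sequence of characters) and `$` (end of the URL), so a rule such as `Disallow: /*.html$` is one way this could be written. The Python sketch below only imitates that matching so the rule can be tried against a few paths; the helper names are hypothetical and it is not Google's actual matcher.

```python
import re

# Assumed robots.txt content this sketch is meant to mirror:
#   User-agent: *
#   Disallow: /*.html$


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern using * and $ into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # '*' matches any sequence of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    # Patterns match from the start of the path; '$' pins the end as well.
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path: str, pattern: str = "/*.html$") -> bool:
    """Return True if the path would be matched by the Disallow pattern."""
    return robots_pattern_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    for path in ("/about.html", "/about", "/blog/post.html", "/about.html?x=1"):
        print(path, is_blocked(path))
```

Note that with the `$` anchor a URL like /about.html?x=1 is not matched, since it does not end in .html.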

EDIT: The page I linked above says not to use the URL removal tool for canonicalization, because it removes all versions of a URL. I’m not completely sure whether this affects you, so you should try it with only a few URLs for now.

Also, Google automatically selects one of the duplicate pages as the canonical, so I don’t see a reason to worry.
