As of last night at approximately 6:50pm PDT May 19th, a small number of our users globally (confirmed US and UK at least) began reporting the following certificate error. We did not make any configuration changes, and we have since started receiving more and more of these support messages from our customers. They attempted many different browsers and devices on their networks but kept getting the same error. Ultimately, it looks like switching to a mobile network/hotspot did cure the issue for those who had access to another network. For other users, the issue went away after a few hours.
Our site is hosted at https://app.codingrooms.com/ (via https://codingrooms.netlify.app). The error has been identical in every report, but it is only affecting a small number of our users. We have not yet been able to reproduce the issue ourselves. Any thoughts on what might be causing this and how we can help resolve it for our customers?
Edit 12:25pm PDT May 20th, 2022: Some users are able to access the site via the Netlify subdomain (https://codingrooms.netlify.app/), but continue to get the error when accessing it via our custom domain (https://app.codingrooms.com/). This isn’t a usable workaround for us; I'm just adding it for troubleshooting.
The next step would be to determine what address customers are contacting when they receive this error. Oftentimes, they are connecting to a proxy at their ISP that is intercepting the SSL connection, or a DNS misconfiguration is sending some of their requests to the wrong destination.
I replied to your tweet and before I did, I walked all CDN nodes to ensure that they all had an up to date and correct certificate (as it is a potential failure mode for one or more nodes to have a wrong/expired cert). The “protocol error”, though, sounds less like “bad certificate” and more like “man in the middle screwing things up in their network path, outside of your or Netlify’s control” as I described above.
Is there any commonality in the network connections of the customers who reported it? For instance, “all using Comcast” or “all using Tor” or anything like that? That’s where I’d start digging: what the trends are among the affected people.
If you can reproduce it yourself, or have a savvy customer who can, please tell us:
What does host app.codingrooms.com return when the failure occurs?
Are there more details available in the browser? They might be able to click on the “no lock” icon in the browser to see the certificate details, and that might show that the certificate is signed by Cisco or something similar (meaning that a network device is interrupting things, since there are no Cisco devices in the network path on Netlify’s side of things).
Perhaps a savvy (affected) user could (potentially after installing curl, an open source program) run curl -v https://app.codingrooms.com and share the output? That will show precise details about the SSL negotiation failure.
…that would lead to some likely-actionable next steps. Without more details, we won’t have any more specific advice.
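If installing curl is a hurdle for an affected user, the checks above could also be gathered with a short Python script. This is only a rough sketch using the standard library; the function names (resolve, issuer_org, probe) are made up for illustration and are not anything Netlify provides. It reports which addresses the local resolver returns for the hostname, and either the certificate issuer and fingerprint or the handshake error.

```python
import hashlib
import socket
import ssl


def resolve(host, port=443):
    """Return the sorted, de-duplicated addresses the local resolver gives for host."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def issuer_org(cert):
    """Extract the issuer organizationName from the dict returned by getpeercert()."""
    for rdn in cert.get("issuer", ()):
        for key, value in rdn:
            if key == "organizationName":
                return value
    return None


def probe(host, port=443, timeout=10):
    """Collect DNS and TLS details for host; on failure, record the error instead."""
    report = {"host": host}
    try:
        report["addresses"] = resolve(host, port)
        ctx = ssl.create_default_context()  # verifies the chain and the hostname
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
                der = tls.getpeercert(binary_form=True)
                report["tls_version"] = tls.version()
                report["issuer"] = issuer_org(cert)
                report["not_after"] = cert.get("notAfter")
                report["sha256"] = hashlib.sha256(der).hexdigest()
    except OSError as exc:  # covers DNS failures, timeouts, and ssl.SSLError
        report["error"] = f"{type(exc).__name__}: {exc}"
    return report
```

An affected user could run print(probe("app.codingrooms.com")) while the error is happening and share the result. An unexpected issuer, a fingerprint that differs from what unaffected users see, or addresses that do not belong to Netlify would each point toward something in the user's network path rather than the site itself.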