Hey @webshaped.biz, I can try to answer some of what you’ve asked about here, although as a community volunteer, please don’t take my word over a response from a Netlify employee.
I had to go check status.netlify.com, since I didn’t recall hearing of any major outages. It looks like prior to October 8th, the only other recent issue with the origin servers was on September 22nd, when there was some latency trouble. During the most recent outages, it looks like customers on the Pro and Enterprise tiers were largely unaffected, so those who have invested in Netlify for a lower risk of outages are getting what they’re paying for. If you depend heavily on your site being online 99.9% of the time (e.g. you run an API that needs to be up at all times, a storefront that could cost your business thousands of dollars or more if it goes down, or infrastructure for clients who have trusted you to keep it up) and are currently on Netlify’s free plan, moving to a paid plan will definitely give you the peace of mind you’re looking for.
Computing infrastructure isn’t perfect, and for a company like Netlify that has a massive free-user base, there are bound to be hiccups in the system and overtaxing of limited resources shared by thousands. Even with fail-safes in place, sometimes they just aren’t enough to withstand a critical issue that brings the whole system to a halt.
In the case of the issue that took place on October 8, the CDN, dashboard and sites appeared to remain online; only the origin servers and build pipeline were down. In a Content Delivery Network setup there are two different types of machines: origin servers and edge servers. Since it’s not practical to deploy a site directly to hundreds of servers all over the world, the site is usually deployed to one “origin server”, and that is in turn mirrored to many “edge servers”. The edge servers serve the site to the users that are geographically closest to them to keep latency low. On a regular basis, the resources cached on the edge servers expire, which forces the edge servers to re-pull them from the origin. Netlify takes this a step further by instantly invalidating the edge server cache so that your updated site can be made available immediately.
When the origin servers go down, though, this creates a problem: the deployment system now has nowhere to deploy files to, meaning that no builds can go out. As a consequence, no sites can be updated, because the edge servers can only serve what they have cached until the origin servers come back online and a new copy can be pulled and cached.
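To make the origin/edge relationship above a bit more concrete, here is a minimal toy sketch in Python. It is not Netlify’s implementation; the `Origin`, `Edge` and `OriginDown` names, the TTL value and the explicit `now` timestamps are all assumptions made up for illustration. It only models the behavior described above: the edge serves from cache, re-pulls when an entry expires or is invalidated, and falls back to its stale copy while the origin is offline.

```python
class OriginDown(Exception):
    """Raised when the (hypothetical) origin server is unreachable."""


class Origin:
    """Toy origin server: the single place a site is deployed to."""

    def __init__(self):
        self.files = {}
        self.online = True

    def deploy(self, path, body):
        if not self.online:
            raise OriginDown("cannot deploy: origin is offline")
        self.files[path] = body

    def fetch(self, path):
        if not self.online:
            raise OriginDown("cannot fetch: origin is offline")
        return self.files[path]


class Edge:
    """Toy edge server: serves cached copies and re-pulls from the origin."""

    def __init__(self, origin, ttl=60.0):
        self.origin = origin
        self.ttl = ttl      # seconds before a cached entry expires (assumed value)
        self.cache = {}     # path -> (body, time it was fetched)

    def invalidate(self, path):
        # Drop the cached copy so the next request re-pulls from the origin.
        self.cache.pop(path, None)

    def get(self, path, now):
        entry = self.cache.get(path)
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]              # fresh cache hit
        try:
            body = self.origin.fetch(path)
        except OriginDown:
            if entry is not None:
                return entry[0]          # origin is down: serve the stale copy
            raise                        # nothing cached, nothing to serve
        self.cache[path] = (body, now)
        return body


origin = Origin()
edge = Edge(origin, ttl=60.0)

origin.deploy("/index.html", "v1")
first = edge.get("/index.html", now=0.0)            # cache miss, pulled from origin

origin.online = False                               # simulate the outage
try:
    origin.deploy("/index.html", "v2")
    deploy_failed = False
except OriginDown:
    deploy_failed = True                            # no builds can go out
during_outage = edge.get("/index.html", now=120.0)  # expired, but origin is down

origin.online = True                                # origin recovers
origin.deploy("/index.html", "v2")
edge.invalidate("/index.html")                      # instant cache invalidation
after_recovery = edge.get("/index.html", now=121.0)

print(first, deploy_failed, during_outage, after_recovery)  # v1 True v1 v2
```

In this sketch the site stays reachable during the outage, but it is stuck on the old version until the origin comes back and a new deploy can be pulled.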
In a perfect world, there wouldn’t be any downtime at all, but as I said above, if avoiding downtime is critical to your business then a Pro or Enterprise plan would definitely give you more peace of mind. Thankfully, redundancies and backups exist, so when servers do go down they can be brought back up in a matter of minutes to hours rather than days.
I’m sure the Netlify team can cite more specifics, but in the meantime hopefully this explanation helps some.
For more info, check out this article from Cloudflare on how CDNs work: https://www.cloudflare.com/learning/cdn/what-is-a-cdn/, or explore Netlify’s product pages at: https://www.netlify.com/products/.