[Incident] Netlify service outages affecting ability to depend on Netlify - please comment here

ALL our sites are down - 30+ sites running on Netlify - all FUBAR.

1 Like

Frustrating to see the status page showing "degradation" when we're all experiencing an outage.

Would be good to at least acknowledge it's actually an outage so we can follow that page & have faith in it.

5 Likes

I'm sorry, but this is the wrong status here: Netlify Status - Increased errors and latency affecting multiple services

It's not a case of "high latency on build and API" - it's a complete outage. I am neither trying to build, nor do I use Netlify APIs.

3 Likes

Same here. 2 weeks, 10 hours down and counting. It's getting really difficult to rely on Netlify.

2 Likes

Hey folks,

Thanks so much for reaching out. We have mitigated the degradation now and are continuing to monitor the situation.

I hear how this has been impactful, and I will be sharing all of your feedback directly with our teams.

1 Like

Sadly, the mitigation isn't working for our website at the moment.

Thanks for reaching out, @niels-nijens.

Our mitigation was not fully effective and we are seeing more latency, timeouts, and errors for serving uncached content, API responses, and builds for all customers. Our team is working hard on a fix.

We will continue to update our status page: https://www.netlifystatus.com/

2 Likes

Let's see how soon the Netlify team will fix this.

1 Like

Honestly, the reason we came to Netlify and Jamstack in general was exactly that we wouldn't have to deal with this so often. And I'm not saying this because I want to troll, but because I am running a business. We will at least go have a look at Vercel now.

1 Like

Hey there, @madsem :wave:

Thank you for taking the time to reach out and share this feedback-- we take customer experience very seriously and I assure you that I will share your feedback with all of the relevant teams as we work to fix this.

Thanks @hillary, I am mostly annoyed by the fact that there was no notification or anything. I found out by looking at our clients' paid ad campaign stats that something wasn't right.

Outages can happen, but I believe it's important to be upfront about this and not downplay it. Notify your customers.

1 Like

@hilary myself and many others in this thread (likely tens of thousands who didn't post) accept that outages do happen, but the refusal to acknowledge the outage is extremely frustrating when our sites go down.

Site is back up, but we need a correction to the "degradation" narrative, as it feels like a shady policy to pretend no outage occurred to protect the uptime stats etc.

Without acknowledgment we don't know whether Netlify engineering is even aware. Or are there gaps in the monitoring which we need to inform somebody about?

I put a more elegant version of this in another thread:

3 Likes

hi everyone,

Support Leadership here - thank you for your patience as we continue to work with the folks who are working to remediate the underlying problems.

I understand your frustrations - hearing you loud and clear - and of course that is never the experience we want you to have. As soon as we have a better idea of what, when, how and why these incidents have been impacting services - we will be happy to share as much information with you as we can.

I promise that we understand the impact and are working as hard as we can to remediate. :pray:

3 Likes

Thank you for the update - but please, correct the categorization to "partial outage" at least. We are not lying or mistaken. It's also a very clear-cut definition.

1 Like

Agreed; if this doesn't rise to the level of at least "minor outage" I'm not quite sure what does. What myself and many others who are in directorial or client-facing positions need today is a cogent and transparent report we can point to that explains what happened and what steps are being taken to address it into the future.

Netlify, truly, I love you, but you're bringing me down. As a smaller agency we've come to rely on your services and you've been critical to our growth, and we've been huge boosters for what your platform provides. I still am. But I struggle with the status page's response because ultimately we have clients of our own that we answer to and "degraded performance" doesn't capture their experience or give me what I need to calm them down. I know this is a rough day for you all, and I appreciate that you all are working as hard as you can. The extent of the problem might not yet be fully understood. But what our clients understand right now is that their sites were fully broken and the status doesn't make them (or many here) feel heard. Thank you, and I hope you all get to take a vacation after this one.

Edit: I see "We will be writing a public root cause analysis describing what led to the issue and how we've resolved it." Thank you! (And also, everything appears to be resolved on our sites)

3 Likes

It's like somebody in Netlify has said "OK let's communicate this but use the word 'degradation' not 'downtime' or 'outage'".

totally hear you mfan, and i will ask someone else who has a higher-level view to weigh in on the process of classifying the incident as soon as we can.

for the time being, we think we have things fixed -

are you still seeing issues? if yes, can you report back here with some information about where and what you are seeing? thank you.

1 Like

google5 - i promise that this was not malicious. I understand the impact, and i hear that you are angry, frustrated, and feel like your trust has been broken. But accusing us of trying to mislead our customers isn't appropriate, and will never be appropriate - we have built our track record not on infallibility, but on transparency and honesty, and this incident is no exception. Please be mindful that complexity isn't always readily apparent, and incidents like this one are among the times where we try to prioritize speed over accuracy.

more details as soon as they become available, and please do share with us if you are still seeing errors.

2 Likes

I'm not still seeing errors. All I want is an acknowledgment that this was a downtime incident for myself & many others here.

It's not purely principle… it's also pragmatism. Many of us checked the status page and saw an update about latency, while the reality was a 500 status for hours, so we spent that time troubleshooting, which we wouldn't have needed to do if there had been clear and correct reporting on the status page.

Troubleshooting info:

If it helps with your troubleshooting, I had a build in-flight when the downtime hit which could have put the cache into an unknown state or something. It's plausible that some timing component caused some of us to have a full outage. Rebuilds and rebuild-with-cache-clear did not resolve it.

Please don't dismiss the people who had 500 status for hours.

Even if this only affected 0.1% who happened to have builds in flight, or be on a certain pod or whatever - it's an outage incident for those users and not calling it that feels very dismissive. If this was only some users, that's a partial-outage status. We shouldn't have to campaign to essentially log a ticket (in this case telling you we experienced outage not degradation).

Can you directly answer the question of why the categorization remains yellow?

hi there google5,

by yellow, do you mean this:

on netlifystatus.com?

it is our policy to leave incidents in monitoring until we feel sure that the issue is resolved - we do this for all incidents. We will move to resolved as soon as we can confidently say that it is resolved. I would check again in 5-10 minutes.

as far as categorizing the incident is concerned, i am seeing that there is currently a meeting underway to discuss and that we will be releasing as much information as we safely can when we can. Unfortunately i don't have a timeline on that, it could be later today, or it could be once we have had time to get the team together to do a retrospective. Once/If that timeline gets clarified, i'll bring that info here asap.

regardless of the label on the status page, i do agree with you and everyone else who feels this was a severe, impactful incident - and i don't think anyone at netlify disagrees. I promise we will treat it as such, that promise is based on seeing department heads etc etc who are jumping into a call to discuss as i am typing this.

again, definitely not the experience we want you or anyone else to have, and i totally get that this was a rough one, i would be upset too if i were in your position.

for now I'm glad we fixed this for y'all, and we will be moving to resolved (if things stay stable), shortly. :pray:

1 Like