Connection Issues
Incident Report for Help Scout
Postmortem

What happened

Yesterday around 5:00pm EDT, our hosting provider had a brief network interruption between availability zones. Because Help Scout is clustered across availability zones, this wasn't a big problem, but it did result in a few minutes of slow page loads for the help desk and Docs products.

The more concerning issue is that the network interruption partitioned our email processing systems. To protect the integrity of the data, we had to take this system offline between 5:30 and 8:00pm EDT. During this time, inbound and outbound messages were held and not processed.

At roughly 8:00pm EDT, we brought email processing back online and pushed all pending messages into Help Scout.
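
For readers curious what "held and not processed" followed by "pushed all pending messages" can look like in practice, here is a minimal sketch of a hold-and-replay buffer. It is only an illustration under stated assumptions: the class name HeldMailQueue, its methods, and the message labels are hypothetical and do not describe Help Scout's actual email processing system.

```python
from collections import deque


class HeldMailQueue:
    """Hypothetical hold-and-replay buffer (illustrative only)."""

    def __init__(self, deliver):
        self._deliver = deliver   # callable that processes a single message
        self._pending = deque()   # messages held while processing is paused
        self._paused = False

    def pause(self):
        """Stop processing; new messages are held instead of delivered."""
        self._paused = True

    def submit(self, message):
        """Deliver immediately, or hold the message if processing is paused."""
        if self._paused:
            self._pending.append(message)
        else:
            self._deliver(message)

    def resume(self):
        """Resume processing and replay held messages in arrival order."""
        self._paused = False
        replayed = 0
        while self._pending:
            self._deliver(self._pending.popleft())
            replayed += 1
        return replayed


# Usage: one message arrives normally, two arrive during the pause.
processed = []
mail_queue = HeldMailQueue(processed.append)
mail_queue.submit("msg-1")
mail_queue.pause()
mail_queue.submit("msg-2")
mail_queue.submit("msg-3")
held_during_pause = list(processed)   # snapshot taken before resuming
replayed = mail_queue.resume()
```

The point of the sketch is that pausing trades latency for safety: nothing is dropped while the system is offline, and held messages are replayed in the order they arrived once processing resumes.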

Today we're looking at all of the backups to make sure all messages received in this time period were properly processed. If you know of any conversations that failed to make it into Help Scout, please send over some specifics and we'll investigate further.

Going forward

Our Ops team will immediately be adding more redundancy to our email processing infrastructure and altering our clustering strategy to mitigate fallout from network partitions in the future. We'll take all steps necessary to prevent this issue from happening again.

Posted Apr 22, 2016 - 09:42 EDT

Resolved
All queued messages have been processed and inbound/outbound delivery is back on track. We're still searching all backups for any data that was not processed and will add anything we find. Once the investigation is completed, we'll publish a postmortem with full details. So sorry for the email processing delay today!
Posted Apr 21, 2016 - 21:07 EDT
Monitoring
We have all email processing running again and are monitoring progress. Everything should be up to date within 20 minutes or so.
Posted Apr 21, 2016 - 19:59 EDT
Update
The network issue with our hosting provider between 5:00 and 5:05pm EDT has caused a problem with our email processing infrastructure. To prevent any data loss, we're being very careful as we resolve the problem and bring inbound/outbound email back online. That's what is holding everything up. We'll post more information as it's available.
Posted Apr 21, 2016 - 19:30 EDT
Identified
We're still working to get email processing back online. Very sorry for the trouble today! We'll post another update soon as we continue to make progress.
Posted Apr 21, 2016 - 18:27 EDT
Update
We're still investigating an issue with inbound/outbound email processing and have those services paused for the time being.
Posted Apr 21, 2016 - 17:32 EDT
Update
A brief network issue with our hosting provider has caused email processing infrastructure to become unresponsive. Help Scout remains up and running, but we currently have inbound and outbound email processing paused for further investigation. Additionally, folder views/counts, as well as workflows, will be slow to update until we're in the clear.
Posted Apr 21, 2016 - 17:18 EDT
Investigating
We're currently investigating slow page load times when accessing the web app and Docs sites. We'll continue to update as we have more information.
Posted Apr 21, 2016 - 17:09 EDT