Connection Issues
Incident Report for Help Scout
Postmortem

What happened

Today from 11:32 to 11:52 EST, Help Scout users experienced elevated page load times – and in some cases page load failures – in the web application. Mailbox APIs 1.0 and 2.0 also had elevated response times and some failures.

The root cause of the issue was unrelated to the few performance issues we faced last week. In this case, a drastic increase in email traffic (did someone say Cyber Monday?) created a chain reaction that caused Help Scout to perform slowly for the 20-minute window. The Ops team increased available resources until our systems could handle the load, which resolved the problem.

What we’re doing about it

We haven’t seen an issue like this one before. Fortunately, there are a couple of software upgrades we can make, and one configuration change we will make to our message queueing service. Together, these changes should make our systems more resilient if this situation happens again.

We understand how critical it is for Help Scout to be on point for your team during the holiday season, and we’re up to the task. Very sorry for the trouble today, but we’re poised to make sure it does not recur.

Posted 13 days ago. Nov 26, 2018 - 17:32 EST

Resolved
Performance is stable across the board, so we're closing this out. Email queues are empty and processing messages in a timely manner, as expected, though you may receive a large batch of email (including email notifications) all at once over the next few minutes. We have a hunch about what caused today's performance issues. We'll publish another update soon with details about what happened. Sorry for the Monday morning trouble.
Posted 13 days ago. Nov 26, 2018 - 12:20 EST
Update
Web app performance is back to normal, and all services are stable at the moment. Folder counts, Workflows, search and email delivery are still catching up. We're keeping an eye on things for a bit longer, and we'll post another update before closing this out.
Posted 13 days ago. Nov 26, 2018 - 11:57 EST
Monitoring
Web app performance is stabilizing, though page loads are still a little sluggish. It's worth mentioning that Workflows, inbound and outbound email, folder counts, search, and reports will be delayed until we're in the clear.
Posted 13 days ago. Nov 26, 2018 - 11:51 EST
Update
We’re still seeing sporadic long page loads accompanied by timeouts when trying to access the web app. If you're using Beacon and live chat, you might notice that messages fail to send. We're working to get everything back to normal as quickly as possible.
Posted 13 days ago. Nov 26, 2018 - 11:43 EST
Investigating
We're looking into reports of slow page loads across the app. Some customers may see loading errors when attempting to open folders or send replies. We'll update here when we have more information.
Posted 13 days ago. Nov 26, 2018 - 11:38 EST
This incident affected: Web App, Beacon, Mailbox API 1.0, Mailbox API 2.0, Email Processing, Reports and Search, and Live Chat.