Delayed inbound email
Incident Report for Help Scout
Postmortem

Yesterday between 10am-12pm EST, we received a few reports of emails being delayed by 10-20 minutes before reaching Help Scout. Upon investigation, we found our incoming email queues to be working normally.

However, we continued to receive a small number of inquiries about email delays. It seemed to get worse between 4:30-5:30pm EST and we ultimately discovered a problem with one of our Mongo databases.

The database in question serves as a backup and is not supposed to impact customer-facing systems. But when messages started to fail while being added to the database, that separation didn't work as designed, and incoming email was affected.

The database problem was resolved and incoming email was back to normal processing by 6:10pm EST, but we believe a small number of emails that silently failed yesterday between 10am-6pm did not make it into Help Scout.

This morning, we are going to re-queue all messages that came in during this window. Duplicates will be discarded, so you won't receive anything twice, but any emails that were not delivered the first time will be processed within a few minutes of being re-queued. That means emails that failed the first time could arrive up to 24 hours after they were originally sent.
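
For the technically curious, the re-queue step described above can be sketched roughly as follows. This is only an illustration, not our actual implementation: it assumes each email carries a unique identifier (here a hypothetical `message_id` field) and that we keep a set of identifiers already delivered. The function name `requeue` and the sample data are made up for the example.

```python
from typing import Dict, Iterable, List, Set


def requeue(messages: Iterable[Dict[str, str]], already_processed: Set[str]) -> List[Dict[str, str]]:
    """Return only the messages whose ID has not been processed yet.

    `already_processed` is updated in place, so a message that appears
    twice in the re-queued window is still delivered only once.
    """
    to_process = []
    for message in messages:
        message_id = message["message_id"]  # hypothetical field name
        if message_id in already_processed:
            continue  # duplicate: it was delivered earlier, so discard it
        already_processed.add(message_id)
        to_process.append(message)
    return to_process


# Hypothetical window of re-queued messages.
window = [
    {"message_id": "a@example.com", "subject": "Delivered yesterday"},
    {"message_id": "b@example.com", "subject": "Silently failed"},
    {"message_id": "b@example.com", "subject": "Silently failed"},
]
delivered = {"a@example.com"}
recovered = requeue(window, delivered)
```

In this sketch only the message that never made it in is handed on for processing; everything that was delivered the first time is skipped.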

There's no way for us to know how many emails failed yesterday, but we believe it to be a small number considering the volume of emails that were processed successfully yesterday.

How we'll prevent this in the future

Our systems didn't respond the way they were designed to in this case. A failure with this database should never have an impact on customers. We've come up with some good ideas for preventing this and will have a fix in production in one week or less so that this never happens again.
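
As a rough illustration of the principle that a backup failure should never block delivery, here is a minimal sketch. It is not the fix we are shipping, and every name in it (`process_inbound`, `deliver`, `backup`, the `message_id` field) is hypothetical. The idea is simply that the backup write is treated as best-effort: if it fails, the error is logged instead of being swallowed silently, and delivery continues.

```python
import logging

logger = logging.getLogger("inbound")


def process_inbound(message, deliver, backup):
    """Deliver a message, treating the backup write as best-effort."""
    try:
        backup(message)
    except Exception:
        # Record the failure with a traceback, but never let the
        # backup store stand between the customer and their email.
        logger.exception("backup write failed for %s", message.get("message_id"))
    return deliver(message)


# Hypothetical usage: the backup store is down, delivery still succeeds.
inbox = []


def failing_backup(message):
    raise ConnectionError("backup database unavailable")


def deliver_to_inbox(message):
    inbox.append(message)
    return True


result = process_inbound({"message_id": "c@example.com"}, deliver_to_inbox, failing_backup)
```

Catching every exception is a deliberate choice here because the backup is non-critical by design. A real system would also want to alert on the logged failures so they are noticed quickly.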

We're very sorry for how this went down. Our systems are designed to deliver your emails at all costs, and although there was no data lost, the delivery time is unacceptable in this case. We're going to fix it.

Posted Sep 10, 2015 - 11:31 EDT

Resolved
Inbound processing is back to normal. Let us know if you run into any additional problems!
Posted Sep 09, 2015 - 19:08 EDT
Monitoring
Inbound processing is back to normal, but older delayed emails may still take up to an hour to funnel in. We'll keep monitoring the situation closely.
Posted Sep 09, 2015 - 18:12 EDT
Investigating
We're investigating reports of inbound email being delayed by 10-20 minutes. The incoming queues are clear, so we're still looking for a root cause.
Posted Sep 09, 2015 - 17:55 EDT