Yesterday between 10am-12pm EST, we received a few reports of emails being delayed by 10-20 minutes before reaching Help Scout. Upon investigation, we found our incoming email queues to be working normally.
However, we continued to receive a small number of inquiries about email delays. The delays seemed to get worse between 4:30-5:30pm EST, and we ultimately discovered a problem with one of our Mongo databases.
The database in question serves as a backup and is not supposed to impact customer-facing systems. But when messages started to fail while being added to the database, our system didn't work as designed.
The database problem was resolved and incoming email was back to normal processing by 6:10pm EST, but we believe a small number of emails that silently failed yesterday between 10am-6pm did not make it into Help Scout.
This morning, we are going to re-queue all messages that came in during this window. Duplicates will be discarded, so you won't receive anything twice, but any emails that were not delivered the first time will be processed within a few minutes of being re-queued. As a result, those emails may reach you up to 24 hours after they were originally sent.
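For readers curious about how a re-queue can avoid duplicates, here is a minimal, hypothetical sketch. It is an illustration only and not the code that runs in production: the `requeue` function, the `message_id` field, and the `process` callback are all assumed names, and the sketch assumes every message carries a unique identifier.

```python
def requeue(messages, already_processed_ids, process):
    """Re-process messages, skipping any whose ID was already handled.

    All names here are hypothetical; `message_id` is assumed to be a
    unique identifier carried by every message.
    """
    seen = set(already_processed_ids)
    newly_delivered = []
    for msg in messages:
        msg_id = msg["message_id"]
        if msg_id in seen:
            continue  # duplicate: already delivered, so discard it
        process(msg)
        seen.add(msg_id)  # also guards against repeats within this batch
        newly_delivered.append(msg_id)
    return newly_delivered
```

Because the check is keyed on the message identifier, running the same re-queue more than once should deliver each missing email at most once.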
There's no way for us to know how many emails failed yesterday, but we believe the number is small given the volume of emails that were processed successfully.
Our systems didn't respond the way they were designed to in this case. A failure with this database should never have an impact on customers. We've come up with some good ideas on how to resolve this moving forward and will have the fix in production within a week so that this never happens again.
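The principle that a backup failure should never affect delivery can be sketched as follows. This is a generic, hypothetical illustration of a best-effort backup write and not a description of the actual fix: `handle_incoming`, `deliver`, and `backup_store` are assumed names.

```python
import logging

logger = logging.getLogger("incoming_mail")  # hypothetical logger name


def handle_incoming(message, deliver, backup_store):
    """Deliver a message even if the backup write fails.

    `deliver` and `backup_store` are hypothetical callables standing in
    for the customer-facing path and the backup database write.
    """
    try:
        backup_store(message)
        backed_up = True
    except Exception:
        # Record the failure loudly instead of letting it pass silently,
        # then carry on with delivery.
        logger.exception("backup write failed; continuing with delivery")
        backed_up = False
    deliver(message)
    return backed_up
```

In this sketch the backup write is isolated in its own `try` block, so an error there is logged but cannot stop the message from being delivered.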
We're very sorry for how this went down. Our systems are designed to deliver your emails at all costs, and although there was no data lost, the delivery time is unacceptable in this case. We're going to fix it.