Yesterday around 5:00pm EST, our hosting provider had a brief network interruption between availability zones. Because Help Scout is clustered across availability zones, this wasn't a big problem, but it did result in a few minutes of slow page loads for the help desk and Docs products.
The more concerning issue is that the network interruption partitioned our email processing systems. To protect the integrity of the data, we had to take email processing offline from 5:30pm to 8:00pm EST. During this time, inbound and outbound messages were held and not processed.
At roughly 8:00pm EST, we brought email processing back online and pushed all pending messages into Help Scout.
Today we're reviewing our backups to confirm that every message received during this period was properly processed. If you know of any conversations that failed to make it into Help Scout, please send over some specifics and we'll investigate further.
Our Ops team is immediately adding more redundancy to our email processing infrastructure and changing our clustering strategy to limit the fallout from future network partitions. We'll take all steps necessary to prevent this issue from happening again.