Degraded performance on BLiP
Incident Report for Take Blip
Postmortem

Hi blipper! On Saturday, September 11, 2021, Blip was unavailable for a period, which affected the operation of many Smart Contacts.

To be transparent with you, Blip user, we are writing to tell you what happened.

What happened

We identified an increased number of database connections to our cache service, which is used to store contact information. This interrupted the exchange of messages for our smart contacts.

How this issue impacted you

Because of this failure, Blip CRM, our customer base management functionality, had problems exchanging messages.

On behalf of Take Blip, I apologize for any problems caused to you, your company, and your customers.

What we did to solve it

As soon as we identified the problem at 05:30 am (UTC-3), our team came together and acted quickly to start working on it. The fix was applied immediately, and the service was back to normal at 07:00 am (UTC-3).

Where we are now

Blip CRM is working again, and our technical team is following up with the cloud provider to identify the main cause of the failure. In addition, we have internal actions underway to prevent events like this from happening again.

We also want to thank you for your patience and remind you that we are always here to help with anything you need. Just open a request with our Support team or create a new topic on Blip Forum, the space dedicated to the entire user community.

Sincerely,

Posted Sep 21, 2021 - 14:41 GMT-03:00

Resolved
Fault identified:
Through our monitoring, we identified that our Cache service lost connection. Because this cache stores the information of contacts that talk to the bots, message traffic was completely unavailable during the period.

Workaround: The counter generation service, which was causing the connection leak on our cache server, was restarted. After this palliative action, the problem did not recur.
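
For readers who want to see what a connection leak of this kind looks like in general, here is a minimal Python sketch. It is not Take Blip's code: `FakeConnection`, `ConnectionPool`, `leaky_increment`, and `safe_increment` are hypothetical names, and the in-memory dictionary only stands in for a real cache server. It assumes a bounded pool of connections and shows how a counter routine that never returns its connection eventually exhausts the pool.

```python
import queue
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in for a connection to a cache server."""

    def incr(self, key, store):
        # Increment a counter in the in-memory store that plays the cache.
        store[key] = store.get(key, 0) + 1
        return store[key]


class ConnectionPool:
    """Bounded pool: at most max_size connections can be checked out."""

    def __init__(self, max_size):
        self._free = queue.Queue(maxsize=max_size)
        for _ in range(max_size):
            self._free.put(FakeConnection())

    def acquire(self, timeout=0.1):
        # Raises queue.Empty when every connection is already checked out.
        return self._free.get(timeout=timeout)

    def release(self, conn):
        self._free.put(conn)

    def available(self):
        return self._free.qsize()

    @contextmanager
    def connection(self):
        conn = self.acquire()
        try:
            yield conn
        finally:
            # Always hand the connection back, even if the caller raises.
            self.release(conn)


def leaky_increment(pool, store, key):
    conn = pool.acquire()
    return conn.incr(key, store)  # bug: the connection is never released


def safe_increment(pool, store, key):
    with pool.connection() as conn:
        return conn.incr(key, store)
```

With a pool of two connections, `safe_increment` can run any number of times and `pool.available()` stays at 2, while two calls to `leaky_increment` leave no connections free and a third raises `queue.Empty`. Restarting the leaking process, as in the workaround above, frees the held connections but does not remove the bug itself.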

Start date/time: Sep 11, 2021, 5:30 am (UTC-3)
End date/time: Sep 11, 2021, 7:00 am (UTC-3)

Ongoing actions: Our technical team continues to work on a definitive fix for the scenario.

Root cause: Cache server connection leak.
Posted Sep 11, 2021 - 07:37 GMT-03:00
Investigating
We are experiencing degraded performance on the BLiP platform. Our technical team is already working on the case.
Posted Sep 11, 2021 - 06:50 GMT-03:00
This incident affected: Desk, Hosting Enterprise (Bot Builder, Bot Router), Take Blip Platform (CRM, Core, Analytics, Artificial Intelligence, Portal, Cloud Infrastructure), and Hosting Business (Bot Builder, Bot Router).