Degraded performance on the BLiP
Incident Report for Blip
Postmortem

Hi blipper! On Wednesday, 15/09/2021, Blip experienced an outage that affected the operation of many Smart Contacts.

To be transparent with you, Blip user, we are writing to tell you what happened.

What happened

We identified an automatic, unscheduled change of the system date to 2022 on one of our Take Blip servers, which interrupted the exchange of messages for our smart contacts.

How this issue impacted you

Because of this failure, the Blip Portal had access problems, and Blip Desk had problems opening and distributing tickets.

On behalf of Take Blip, I want to apologize for any problems caused to you, your company, and your customers.

What we did to solve it

As soon as the problem was identified at 10:50 am (UTC-3), our team acted quickly to keep our community from being affected further. We isolated the divergent server and intervened on the other servers to bring their date information back in line, which normalized the operation of our platform around 12:30 pm (UTC-3).

Later, around 2:20 pm (UTC-3), as a repercussion of the scenario reported above, we observed an impact on ticket distribution. This happened because the agents' statuses were still dated 2022. Once again, our team acted immediately, intervening in the services responsible for distributing tickets and correcting the agent status records dated 2022. This reestablished the service flow and normalized the operation around 3:20 pm (UTC-3).

Where we are now

The Blip Portal and Blip Desk are working again, and our technical team is continuing to investigate the root cause. In addition, we have internal actions under way to prevent events like this from happening again, such as:

  • Investigation to identify how the date was automatically changed on that server;
  • Adjustments to database records that were registered with a 2022 date.
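
As an illustration of the kind of checks these two actions involve, here is a minimal sketch in Python. It is not Blip's actual tooling: the function names, the record shape (a dict with an `updated_at` field), the server names, and the five-minute tolerance are all assumptions made for this example. The first function flags a server whose clock diverges from the median of its peers; the second flags records stamped with a date in the future.

```python
from datetime import datetime, timedelta, timezone
from statistics import median


def find_divergent_servers(clocks, tolerance=timedelta(minutes=5)):
    """Return the names of servers whose clock differs from the
    median clock of the group by more than `tolerance`.

    `clocks` maps a (hypothetical) server name to the timezone-aware
    datetime that server reported.
    """
    timestamps = {name: ts.timestamp() for name, ts in clocks.items()}
    reference = median(timestamps.values())
    limit = tolerance.total_seconds()
    return sorted(
        name for name, ts in timestamps.items() if abs(ts - reference) > limit
    )


def find_future_dated(records, now, tolerance=timedelta(minutes=5)):
    """Return the records whose `updated_at` lies later than `now`
    plus `tolerance`, for example records stamped with the wrong year.
    """
    return [r for r in records if r["updated_at"] > now + tolerance]


if __name__ == "__main__":
    now = datetime(2021, 9, 15, 13, 50, tzinfo=timezone.utc)
    clocks = {
        "server-a": now,
        "server-b": now + timedelta(seconds=2),
        "server-c": now.replace(year=2022),  # the divergent clock
    }
    print(find_divergent_servers(clocks))
```

Comparing against the median rather than the average keeps a single badly wrong clock from shifting the reference point, which matters when one server is off by a full year.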

You can check this history and the status of all other Blip features on our Status Page.

We also want to thank you for your patience and remind you that we are always here to help with whatever you need. Just open a request on our Support or create a new topic on Blip Forum, the exclusive space for the whole user community.

Sincerely,

Posted Sep 15, 2021 - 18:49 GMT-03:00

Resolved
Update status: A failure has been identified in one of our clusters. We have seen an increase in HTTP and TCP connections, causing an overload that may have made our platform unavailable.

Impact to customers: Failure to load Bots in Blip Portal and failure to load the Desk tool. We observed that interactions are taking place successfully in the smart contact, but it is not possible to open tickets in the Desk service.

Points identified: A failure in authentication, which makes it impossible to load the Blip Portal and Blip Desk.

Actions taken: The focus of our infrastructure team was on actions to contain downtime. Our platform environment is now restored.

NOTE: Messages that were sent at the time of the event may have the wrong date and time. The team is already evaluating whether there is a way to correct the messages transmitted during that period.

Root cause: We don't have this information yet, but the technical team continues to investigate the cause of the failure.

Start date/time: 15/09/2021 10:40 am
End date/time: 15/09/2021 12:30 pm
Posted Sep 15, 2021 - 13:00 GMT-03:00
Update
Update status: A failure has been identified in one of our clusters; however, we have not yet determined what is causing it.

Impact to customers: Failure to load Bots in Blip Portal and failure to load the Desk tool. We observed that interactions are taking place successfully in the smart contact, but it is not possible to open tickets in the Desk service.

Points identified: A failure in authentication, which is making it impossible to load the Blip Portal and Blip Desk.

Actions in progress: Despite the efforts of the engineering team in the investigation, we have not been able to restore our platform's environment yet.
Posted Sep 15, 2021 - 11:35 GMT-03:00
Identified
We are experiencing degraded performance on the BLiP platform. Our technical team is already working on the case.
Posted Sep 15, 2021 - 10:48 GMT-03:00
This incident affected: Blip Platform (CRM, Core, Analytics, Artificial Intelligence, Portal, Cloud Infrastructure).