CRM [Degradation]
Incident Report for Blip
Resolved
This incident has been resolved.
Posted Feb 01, 2023 - 17:10 GMT-03:00
Monitoring
Status: The environment is back to normal and the team continues to monitor.

Impact: High CPU usage on the database responsible for all user management.

Customer impact: All smart contact components and the functionality tied to them were degraded or unavailable.

Cause of the failure: The failure was traced to the consumers of the application responsible for updating the data shown on the monitoring page of the Blip portal.

Mitigation action:

Three actions were carried out:

- Switched off the consumers of the application that was causing the high load on the database responsible for all user management;

- Scaled up the database to support the load until the rollback was completed;

- Rolled back the change made yesterday (01/31/2023 at 10:00 PM) to the application that presented the failure.

NOTE: Customers who consume data from Analytics applications may still experience a delay, since a queue built up during the period in which the consumers had to be stopped.

Start time: 11:10 AM
End time: 12:45 PM
Posted Feb 01, 2023 - 15:21 GMT-03:00
Update
- Scaling has finished
- The rollback has been completed; we are now restoring the consumers to the state they were in before the incident

Next update: in 30 minutes or when there is relevant new information.
Posted Feb 01, 2023 - 14:34 GMT-03:00
Identified
- We continue to monitor the completion of the resource scaling;
- Preparation for the rollback is done; we will start it shortly.
Regarding the analytics pages, the impact should last until 8 PM today, as consumers have been stopped and requests are queued.

Next update: in 30 minutes or when there is relevant new information.
Posted Feb 01, 2023 - 14:06 GMT-03:00
Update
- We continue to monitor the completion of the resource scaling
- We will start the rollback of a change made the day before, then bring the services back up

Next update: in 15 minutes or when there is relevant new information.
Posted Feb 01, 2023 - 13:50 GMT-03:00
Update
- We are still waiting for the resource scaling to complete
- Services remain stopped so that the impact does not recur; we are checking what actions are needed to bring them back
Some customers have already reported normalization after these actions, but we are still investigating.

Next update: in 15 minutes or when there is relevant new information.
Posted Feb 01, 2023 - 13:28 GMT-03:00
Update
- We are scaling some resources to temporarily mitigate the issue while we continue to analyze it.
- In addition, some services are being stopped temporarily to reduce the impact. With the actions above, the impact is already being mitigated and the platform has shown improvement. Note that, for now, the platform's analytics pages will not show updated data.

Next update: in 15 minutes or when there is relevant new information.
Posted Feb 01, 2023 - 13:12 GMT-03:00
Investigating
We are experiencing a partial degradation in our CRM application.

Impact:

CRM is responsible for all user management, so all smart contact components and functionality tied to it are currently degraded or unavailable.

Update:

Our technical team is already working on the case.
Posted Feb 01, 2023 - 11:23 GMT-03:00
This incident affected: Blip Platform (CRM, Analytics) and Desk.