Database Degradation
Incident Report for Blip
Resolved
This incident has been resolved.
Posted Sep 21, 2020 - 18:33 GMT-03:00
Update
After failover, the database is operational.

We continue to monitor the environment.

The root cause has not yet been identified and will be reported in the postmortem.
Posted Sep 21, 2020 - 16:34 GMT-03:00
Monitoring
The environment has returned to normal, and we are monitoring it.
Posted Sep 21, 2020 - 15:55 GMT-03:00
Update
Problem: We initially noticed slow contact searches. After investigating together with our infrastructure provider, we found an increase in database requests that was delaying the processing of messages, commands, notifications, and other operations.

Action plan: We opened a top-priority ticket with our infrastructure provider and are working directly with their technical team to analyze actions that can mitigate the failure and identify the root cause.

Actions in progress:

1) The database scale-up was started around 10:45. However, according to our infrastructure provider, given the volume of data in the database and the impact of the process itself, completion is expected to take many hours. For this reason, the scale-up will be aborted.

2) The technical team is working with our infrastructure provider to fail over the database in order to mitigate the instability. The compromised production database will no longer be used; the platform will point to a copy of it instead.

Forecast: Unfortunately, we have not yet received an estimated resolution time from Microsoft.

Problem start date/time: 2020-09-21 at 09:20
End date/time: No estimate yet
Posted Sep 21, 2020 - 13:40 GMT-03:00
Update
We are continuing to work on a fix for this issue.
Posted Sep 21, 2020 - 12:40 GMT-03:00
Update
We are in contact with an engineer from our infrastructure provider to investigate the incident.

We have confirmed customer impact caused by slow message processing.
Posted Sep 21, 2020 - 11:53 GMT-03:00
Update
We have identified that the failure appears to be related to the database infrastructure.

We have opened a ticket with our cloud provider to investigate.
Posted Sep 21, 2020 - 11:21 GMT-03:00
Update
We are scaling up the database to mitigate the degradation. Completion is expected to take a few hours.

The root cause of the degradation remains under analysis by the Take team.
Posted Sep 21, 2020 - 11:12 GMT-03:00
Identified
We are experiencing degradation in the CRM application. Because CRM is responsible for all user management, all smart contact components and features that depend on it are currently degraded or unavailable.

Impact for customers: Messages are not being delivered.
Posted Sep 21, 2020 - 09:20 GMT-03:00
This incident affected: Channels (WhatsApp, Messenger, BlipChat), Blip Platform (CRM, Core, Analytics), and Desk.