All Systems Operational
Desk: Operational, 99.67% uptime over the past 90 days
BLiP Platform: Operational, 99.75% uptime over the past 90 days
CRM: Operational, 99.83% uptime over the past 90 days
Core: Operational, 99.58% uptime over the past 90 days
Analytics: Operational, 99.84% uptime over the past 90 days
Artificial Intelligence: Operational, 99.84% uptime over the past 90 days
Portal: Operational, 99.55% uptime over the past 90 days
Cloud Infrastructure: Operational, 99.84% uptime over the past 90 days
Channels: Operational, 99.94% uptime over the past 90 days
WhatsApp: Operational, 99.79% uptime over the past 90 days
Telegram: Operational, 100.0% uptime over the past 90 days
Messenger: Operational, 99.97% uptime over the past 90 days
BlipChat: Operational, 100.0% uptime over the past 90 days
Workplace Chat: Operational
BusinessChat: Operational
Skype: Operational
E-mail: Operational
Hosting Enterprise: Operational, 99.96% uptime over the past 90 days
Bot Builder: Operational, 99.94% uptime over the past 90 days
Bot Router: Operational, 99.98% uptime over the past 90 days
Hosting Business: Operational, 99.97% uptime over the past 90 days
Bot Builder: Operational, 99.95% uptime over the past 90 days
Bot Router: Operational, 100.0% uptime over the past 90 days
Hosting Standard: Operational
Past Incidents
Jun 6, 2020

No incidents reported today.

Jun 5, 2020
Resolved - All platform features are currently operational.
Jun 5, 22:08 GMT-03:00
Update - We are continuing to monitor for any further issues.
Jun 5, 21:28 GMT-03:00
Monitoring - Our environment is 100% restored after the actions taken by the technical team. We continue monitoring.

Start: Jun 5, 2020, 09:00 AM. End: 03:00 PM.

Customers may have experienced:
Error messages in the Desk portal;
Failed access to the Desk portal;
Failed message exchange on WhatsApp;

Actions implemented:
Scaling / redistribution of the machines;
Jun 5, 15:47 GMT-03:00
Update - The environment was restored after the implementation of the actions mentioned above.
Jun 5, 15:42 GMT-03:00
Update - Problem: The analysis identified network degradation and failures of physical infrastructure elements.

Customers may experience:
Error messages in the Desk portal;
Failed access to the Desk portal;
Failed message exchange on WhatsApp;

Palliative actions: The service team continues to intervene in the environment to keep the platform operational.

Actions in progress:
Scaling / redistribution of the machines;
In parallel, a second team is working on application-level analysis;
Jun 5, 12:53 GMT-03:00
Update - We are continuing to work on a fix for this issue.
Jun 5, 11:34 GMT-03:00
Update - We are taking palliative actions to keep the platform operational while the technical team focuses on investigating the issue and resolving it definitively.
Jun 5, 11:26 GMT-03:00
Update - We have identified instability in the portal and Desk. Our team continues to investigate the issue.

Customers may experience:

Error messages on the Desk;
Portal: failure to access the Desk;
Failed message exchange on WhatsApp;
Jun 5, 10:39 GMT-03:00
Update - We have identified instability in the portal and Desk. Our team continues to investigate the issue.
Jun 5, 10:35 GMT-03:00
Update - We are continuing to work on a fix for this issue.
Jun 5, 10:33 GMT-03:00
Identified - The issue has been identified and a fix is being implemented.
Jun 5, 10:32 GMT-03:00
Update - The environment has stabilized following corrective actions by the technical team.
A multidisciplinary team is focused on identifying the cause and implementing a definitive fix.
Jun 4, 15:05 GMT-03:00
Monitoring - A fix has been implemented and we are monitoring the results.
Jun 4, 12:21 GMT-03:00
Update - We had an instability that caused momentary slowness when accessing the Desk portal. An intervention was performed on one of the servers and, after a peak of connections, access returned to normal and is currently stable. Our team remains mobilized to investigate the problem.
Jun 4, 11:38 GMT-03:00
Update - We have identified instability. Some users may experience slowness when accessing the portal and Desk. Our team continues to investigate the issue.
Jun 4, 11:13 GMT-03:00
Identified - We have identified instability in the portal and Desk. Our team continues to investigate the issue.
Jun 4, 11:04 GMT-03:00
Update - The environment has stabilized following corrective actions by the technical team.
A multidisciplinary team is focused on identifying the cause and implementing a definitive fix.
Jun 3, 16:24 GMT-03:00
Update - The environment has stabilized following corrective actions by the technical team.
A multidisciplinary team is focused on identifying the cause and implementing a definitive fix.
Jun 3, 16:08 GMT-03:00
Monitoring - A fix has been implemented and we are monitoring the results.
Jun 3, 15:56 GMT-03:00
Update - Some users may have lost their connection. If so, refresh the page and try to reconnect.
Jun 3, 14:59 GMT-03:00
Update - We are managing to keep the environment operational while the technical team works on a definitive solution.
Jun 3, 14:43 GMT-03:00
Update - We are continuing to work on a fix for this issue.
Jun 3, 13:55 GMT-03:00
Update - Customers may still experience some effects:

Error messages on the Desk;
Portal: failure to access the Desk;
Failed message exchange on WhatsApp;
Jun 3, 13:14 GMT-03:00
Update - We are working on a hotfix to try to resolve the issue.
Jun 3, 13:01 GMT-03:00
Identified - We continue to work to resolve the problem as soon as possible.

We are working with palliative interventions to minimize impacts.
Jun 3, 12:25 GMT-03:00
Monitoring - A fix has been implemented and we are monitoring the results.
Jun 3, 11:42 GMT-03:00
Update - We carried out some palliative measures to restore access to the platform. The technical team continues to investigate, based on the evidence collected, so that a definitive solution can be implemented.

Period:

Start time: 10:17 AM
End time of palliative solution: 11:15 AM

Impact for customers:

Error messages on the Desk;
Portal: failure to access the Desk;
Failed message exchange on WhatsApp;
Jun 3, 11:39 GMT-03:00
Identified - The issue has been identified and a fix is being implemented.
Jun 3, 11:32 GMT-03:00
Investigating - We are currently investigating this issue.
Jun 3, 10:17 GMT-03:00
Jun 4, 2020
Jun 3, 2020
Resolved - This incident has been resolved.
Jun 3, 10:55 GMT-03:00
Update - The service was restored at 04:21 PM

NOTE: Customers who are having difficulty accessing BLiP, Desk and BLiP Chat should clear their network's DNS cache.
Jun 2, 16:43 GMT-03:00
Monitoring - A fix has been implemented and we are monitoring the results.
Jun 2, 16:21 GMT-03:00
Identified - This issue has been identified and a fix is being implemented.
Jun 2, 16:13 GMT-03:00
Investigating - We identified instability in BLiP. The problem is being verified and we will update the status as soon as there is news.
Jun 2, 15:42 GMT-03:00
Jun 2, 2020
Jun 1, 2020

No incidents reported.

May 31, 2020

No incidents reported.

May 30, 2020

No incidents reported.

May 29, 2020
Resolved - We have monitored the environment and it is back to normal.

We are collecting information and will publish a postmortem.
May 29, 17:34 GMT-03:00
Monitoring - At the moment the scenario is 100% restored. We continue to monitor the platform closely. The technical team continues to investigate what caused the problem.
May 29, 13:59 GMT-03:00
Identified - At the moment we have a more stabilized scenario, with few failure alerts. We continue to work to stabilize the environment and apply the definitive solution.
May 29, 13:00 GMT-03:00
Update - We continue working with all available resources to resolve the issue.

Impacts perceived by customers:
Customer service via the Desk is intermittently slow, which is being perceived as degradation in several cases. Access to the BLiP and Desk portals has also been slow, resulting in access failures for customers.

Some symptoms that may be noticed:

Failure to pull tickets;
Attendants going offline;
Monitoring screen not displaying information;
BLiP Desk running slowly;
Slowness and failures when accessing the BLiP and Desk portals;
May 29, 11:46 GMT-03:00
Update - We continue working with all available resources, with all areas involved, to solve the problem. The analyses carried out so far have identified a high load on our cloud network infrastructure, and we are working on workarounds to mitigate the impact of the cloud network degradation.

Impacts perceived by customers:
Customer service via the Desk is intermittently slow, which is being perceived as degradation in several cases.

Some symptoms that may be noticed:

Failure to pull tickets;
Attendants going offline;
Monitoring screen not displaying information;
BLiP Desk running slowly;
May 29, 10:46 GMT-03:00
Update - We continue to work with the highest priority to restore the environment. Despite efforts, we have not yet identified what is causing the failure.
May 29, 09:41 GMT-03:00
Update - We are continuing to investigate this issue.
May 29, 09:41 GMT-03:00
Investigating - We have identified degradation in BLiP, with several impacts. We are investigating.
May 29, 08:51 GMT-03:00
May 28, 2020

No incidents reported.

May 27, 2020

No incidents reported.

May 26, 2020

No incidents reported.

May 25, 2020
Postmortem - Read details
May 28, 16:00 GMT-03:00
Resolved - In the initial analysis in the morning, a failure in message routing was suspected. Several actions were taken to try to mitigate the problem, without success:

1 - Disabling the BLiP message routing cache;
2 - Rolling back the update of a certain component that had gone into production on May 19;
3 - Clearing the keys of the BLiP non-relational database;
4 - Creating a new non-relational database with the same cloud provider;

This hypothesis was discarded at around 03:00 PM, after some failover errors were identified at the cloud provider, in the non-relational database used by BLiP.
In view of this, the actions performed during the day were rolled back so that the environment would return to its state at the start of the morning, with the exception of the newly created database, which was kept in production. The environment stabilized at around 04:40 PM, but instability returned at 05:15 PM.

After identifying this problem in the other, contingency non-relational database, we opened a maximum-severity ticket with the provider, who informed us that a security patch had been applied and had directly impacted our application. Once this side effect was confirmed, we began executing the business continuity plan, switching traffic to another non-relational database service. From then on, all of the technical team's work went into switching the traffic, which was completed at 08:40 PM.
May 25, 21:49 GMT-03:00
Monitoring - Corrective actions were completed at 08:40 PM.
We continue monitoring.
May 25, 21:00 GMT-03:00
Update - The action remains in progress; 70% of the corrections have already been applied. The remaining 30% is estimated to take around 1 hour to complete.
May 25, 20:09 GMT-03:00
Update - Failover errors were identified in the non-relational database hosted with the cloud provider and used by BLiP.
We opened a ticket with our provider for analysis and were informed that a security patch had been applied, impacting BLiP.
After this response, we started executing a contingency plan, switching traffic to another non-relational database service.
This action is in progress.

Expected to finish: 07:40 PM
May 25, 19:27 GMT-03:00
Identified - We identified that the problem reoccurred. We continue the analysis and actions and will update the status as soon as we have news.
May 25, 17:28 GMT-03:00
Monitoring - After completion of the rollback and new configuration applied, the environment is normalized. We are monitoring the environment.
May 25, 16:17 GMT-03:00
Update - We've made some interventions that have improved the situation (but didn't solve all issues yet). Since many changes were made to the environment at the same time throughout the day, we are now trying to isolate those that fixed problems and undo those that had no impact.
May 25, 15:27 GMT-03:00
Update - Although the cause of the failure has been identified, the corrective actions applied so far have not been sufficient to fully restore the environment. We continue to work on the case.
May 25, 13:55 GMT-03:00
Update - We are taking action to address the case.
May 25, 12:44 GMT-03:00
Identified - The team is working on the case. Some interventions in the environment have already been carried out, but they did not have the expected effect.
May 25, 11:50 GMT-03:00
Update - The team is working on the case. Some interventions in the environment have already been carried out, but they did not have the expected effect.
May 25, 11:43 GMT-03:00
Update - The entire technical team remains focused on identifying what is causing the failure.
May 25, 09:35 GMT-03:00
Investigating - Dear customers, we are experiencing degradation in our BLiP platform. It is already under investigation.
May 25, 08:18 GMT-03:00
May 24, 2020

No incidents reported.

May 23, 2020

No incidents reported.