All Systems Operational

Desk: Operational (99.83% uptime over the past 90 days)
Take Blip Platform: Operational (99.91% uptime over the past 90 days)
CRM: Operational (99.86% uptime over the past 90 days)
Core: Operational (99.81% uptime over the past 90 days)
Analytics: Operational (100.0% uptime over the past 90 days)
Artificial Intelligence: Operational (99.99% uptime over the past 90 days)
Portal: Operational (99.81% uptime over the past 90 days)
Cloud Infrastructure: Operational (100.0% uptime over the past 90 days)
Channels: Operational (100.0% uptime over the past 90 days)
WhatsApp: Operational (100.0% uptime over the past 90 days)
Telegram: Operational (100.0% uptime over the past 90 days)
Messenger: Operational (100.0% uptime over the past 90 days)
BlipChat: Operational (100.0% uptime over the past 90 days)
Workplace Chat: Operational
BusinessChat: Operational
Skype: Operational
E-mail: Operational
Hosting Enterprise: Operational (99.83% uptime over the past 90 days)
  Bot Builder: Operational (99.83% uptime over the past 90 days)
  Bot Router: Operational (99.83% uptime over the past 90 days)
Hosting Business: Operational (99.83% uptime over the past 90 days)
  Bot Builder: Operational (99.83% uptime over the past 90 days)
  Bot Router: Operational (99.83% uptime over the past 90 days)
Scheduled Maintenance
WAF deployment (Sep 29, 21:00 - Sep 30, 00:00 GMT-03:00)
Update - We will be undergoing scheduled maintenance during this time.
Sep 29, 21:00 GMT-03:00
Scheduled - We would like to inform you that Take Blip will implement a WAF as a security layer for the products and services we offer.

All companies that rely on Take Blip services should perform a configuration check beforehand; otherwise, their service may become unavailable.

What is a WAF?

WAF stands for Web Application Firewall. It is a system that acts as a barrier against cyber attacks, bringing security to those who use it. The WAF automatically monitors, filters, and blocks potentially malicious traffic.


Who will be affected?

Some companies use access control based on IP address (Internet Protocol), an identifier associated with a network resource, for example a website such as take.net. Others use access control based on FQDN (Fully Qualified Domain Name).
If your company uses IP-address-based access control for Take Blip services, there will be an impact: the rollout of the WAF security feature will change the IP addresses of certain Take Blip products and services.

Impact:

For companies in the impacted category, if the new IP addresses are not added to their access-control rules, there may be loss of connectivity and unavailability of Take Blip products and services.
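For teams that maintain IP-based rules, one practical check is to compare what a Take Blip endpoint currently resolves to against the configured allowlist. The sketch below is illustrative only: the hostname and allowlist addresses are placeholders, and the authoritative list of endpoints and new IP addresses is in the playbook sent by e-mail.

    import socket

    def resolve_ips(hostname):
        """Return the set of IP addresses a hostname currently resolves to."""
        infos = socket.getaddrinfo(hostname, None)
        return {info[4][0] for info in infos}

    # Placeholder values; take the real endpoints and addresses from the playbook.
    current = resolve_ips("take.net")
    allowlist = {"203.0.113.10", "203.0.113.11"}  # documentation-range examples

    missing = current - allowlist
    if missing:
        print("Allowlist is missing:", sorted(missing))
    else:
        print("All resolved addresses are covered.")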

Questions:

We recently sent out an e-mail containing a playbook with all the information about the change. We have also created a dedicated team to support customers through this process. If you have questions, or need to open a support request related exclusively to this maintenance, contact comite.waf@blip.ai.

Best regards!
Sep 29, 14:05 GMT-03:00
Past Incidents
Sep 21, 2021

No incidents reported today.

Sep 20, 2021

No incidents reported.

Sep 19, 2021

No incidents reported.

Sep 18, 2021

No incidents reported.

Sep 17, 2021

No incidents reported.

Sep 16, 2021
Resolved - This incident has been resolved.
Sep 16, 15:03 GMT-03:00
Investigating - Fault identified:

During our monitoring we identified a sudden increase in the number of requests to the database that stores the message contexts exchanged by all bots.

This environment had recently received an update to a cleaning routine; a processing bottleneck in that routine increased the number of requests, causing slow message delivery in bots.
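As a generic illustration (not Take Blip's actual routine), cleanup jobs of this kind are usually throttled by deleting in small batches with a pause between them, so they cannot flood the database. The db handle and its delete_expired method below are hypothetical:

    import time

    def clean_expired_contexts(db, batch_size=500, pause_s=0.2):
        """Delete expired message contexts in small, throttled batches."""
        total = 0
        while True:
            deleted = db.delete_expired(limit=batch_size)  # at most one batch per pass
            total += deleted
            if deleted < batch_size:   # nothing (or little) left to clean
                return total
            time.sleep(pause_s)        # yield the database to live bot traffic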

Please validate, and if you identify any failure scenario, send us your feedback.

Impact:

Customers noticed bots responding slowly or failing when exchanging messages.

Solution:

The cleaning routine that was running was suspended. Our team acted quickly to work around the scenario, and consumption fully normalized. We continue to monitor the system.

Note: this information is only now being posted to our Status Page; since identification and resolution were both quick, the page is being updated shortly after the event.

Yours sincerely,

Start date/time: 09/16/2021 2:45 PM
End date/time: 09/16/2021 3:03 PM
Sep 16, 14:45 GMT-03:00
Sep 15, 2021
Postmortem - Read details
Sep 21, 20:14 GMT-03:00
Resolved - Fault identified:

During our monitoring, we identified a failure in the distribution of tickets in the Desk.

Impact for customers:

The main impact is on the operation of customer service teams that use the Desk tool, since tickets were not being distributed across the queues.

Solution:

Interventions were carried out in the services responsible for distributing tickets so that the scenario could normalize. We also advised agents to reconnect to the tool so that they can receive tickets again.
We continue to monitor our environment and apologize once again.

If you identify any fault, please let us know.

More information about the root cause will be made available.

Start date/time: 14:20
End date/time: 15:20
Sep 15, 17:08 GMT-03:00
Update - Status update: we have identified a failure in the distribution of tickets in the Desk tool.

Impact for the customer: tickets are not being distributed within the Desk tool, affecting service operations.

Point identified: the failure is caused by an error in the resource responsible for part of the ticket distribution across the service queues.

Action in progress: the technical team is still analyzing the problem in order to apply a fix.
Sep 15, 14:46 GMT-03:00
Investigating - A failure in the distribution of tickets in the service tool has been identified.
Sep 15, 14:40 GMT-03:00
Postmortem - Read details
Sep 15, 18:49 GMT-03:00
Resolved - Status update: a failure was identified in one of our clusters. We saw an increase in HTTP and TCP connections, causing an overload that made our platform unavailable.

Impact for customers: bots failed to load in the Blip Portal, and the Desk tool failed to load. We observed that interactions with the smart contacts continued successfully, but it was not possible to open tickets in the Desk service.

Points identified: a failure in the authentication layer, which prevented the Blip Portal and Blip Desk from loading.

Actions taken: our infrastructure team focused on containing the downtime. Our platform environment is now restored.

NOTE: messages sent during the event may carry the wrong date and time. The team is evaluating whether the messages transmitted in the period can be re-dated.

Root cause: not yet determined; the technical team continues to investigate the cause of the failure during the period.
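As a generic illustration of one containment technique for this class of overload (not necessarily what was done here), a service can cap its in-flight requests so a connection spike is shed early instead of exhausting the cluster. The handler below is a hypothetical sketch:

    import asyncio

    MAX_IN_FLIGHT = 1000                      # illustrative cap
    _slots = asyncio.Semaphore(MAX_IN_FLIGHT)

    async def process(request):
        await asyncio.sleep(0.01)             # stand-in for real work
        return f"ok: {request}"

    async def handle_request(request):
        # Shed excess load with a fast 503 instead of letting connections
        # pile up until authentication and the rest of the cluster tip over.
        if _slots.locked():
            return "503 Service Unavailable"
        async with _slots:
            return await process(request)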

Start date/time: 09/15/2021 10:40 AM
End date/time: 09/15/2021 12:30 PM
Sep 15, 13:00 GMT-03:00
Update - Status update: a failure has been identified in one of our clusters; however, we do not yet have a technical assessment of the main offender causing the failure.

Impact for customers: bots fail to load in the Blip Portal, and the Desk tool fails to load. We observed that interactions with the smart contacts are taking place successfully, but it is not possible to open tickets in the Desk service.

Points identified: a failure in the authentication layer, which is preventing the Blip Portal and Blip Desk from loading.

Actions in progress: despite the engineering team's investigation efforts, we have not yet been able to restore our platform environment.
Sep 15, 11:35 GMT-03:00
Identified - We are experiencing degraded performance on the BLiP platform; our technical team is already working on the case.
Sep 15, 10:48 GMT-03:00
Sep 14, 2021

No incidents reported.

Sep 13, 2021
Postmortem - Read details
Sep 21, 14:34 GMT-03:00
Resolved - Status Update:

Fault identified:
Unfortunately, even after the emergency actions carried out last Saturday (Sep 11, 2021), we again recorded customer impact as a result of disconnections from our cache service.

Palliative correction:
Interventions were carried out in the services so that the scenario normalized, and we are monitoring the environment.

Start date/time: 11:12 AM
End date/time: 11:40 AM

Actions in progress:
Our engineering team remains in the crisis room, drawing up new corrective actions.
Sep 13, 12:20 GMT-03:00
Identified - We are experiencing degraded performance on the BLiP platform; our technical team is already working on the case.
Sep 13, 11:22 GMT-03:00
Sep 12, 2021

No incidents reported.

Sep 11, 2021
Completed - The scheduled maintenance has been completed.
Sep 11, 23:00 GMT-03:00
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Sep 11, 22:00 GMT-03:00
Scheduled - Our engineering team will carry out emergency maintenance this evening, 09/11/2021, starting at 10 PM, in order to contain the impacts we experienced during this week.

What will change: improved performance of the CRM application (which is responsible for providing contact data during message exchanges), with the cache service isolated from it.

Impact during the maintenance window: applications will stop responding; as a result, smart contact message exchanges will be interrupted during this period.

NOTE: message and user counters (Statistics) may update late in the Portal due to the downtime during maintenance.
Sep 11, 10:28 GMT-03:00
Postmortem - Read details
Sep 21, 14:29 GMT-03:00
Resolved - Fault identified:

During monitoring we identified a high volume of connections in our cache service, causing failures in our storage service.

Impact:

Slow message exchange in smart contacts

Solution:

It was necessary to intervene in the services responsible for caching messages; after these actions, the environment normalized. An action plan was also created to implement improvements that prevent the scenario from recurring, and we continue to monitor the environment.

Start time: 11:14 am
End time: 3:16 pm
Sep 11, 16:30 GMT-03:00
Identified - The issue has been identified and a fix is being implemented.
Sep 11, 13:42 GMT-03:00
Update - We are continuing to investigate this issue.
Sep 11, 11:52 GMT-03:00
Investigating - We are experiencing degraded performance on the BLiP platform; our technical team is already working on the case.
Sep 11, 11:50 GMT-03:00
Postmortem - Read details
Sep 21, 14:41 GMT-03:00
Resolved - Fault identified:
Our monitoring identified that our cache service lost its connections. Because this cache stores the information of the contacts that talk to the bots, message traffic was completely unavailable during the period.

Workaround: the counter generation service that was leaking connections to our cache server was restarted. After this palliative action, the scenario did not recur.

Start date/time: 5:30 AM
End date/time: 7:00 AM

Ongoing actions: Our technical team continues to work on a definitive fix for the scenario.

Root cause: Cache server connection leak.
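For context, an illustrative sketch of this failure mode (not Take Blip's code; redis-py and the host name are assumptions): a connection leak typically appears when a service opens a new cache client per operation and never releases it, while borrowing from a bounded, shared pool keeps the connection count flat.

    import redis

    # One bounded pool for the whole process; connections are reused.
    POOL = redis.ConnectionPool(host="cache.internal", port=6379, max_connections=50)

    def get_contact(contact_id):
        # Leaky pattern: redis.Redis(host="cache.internal") per call, never closed.
        # Pooled pattern: the client below borrows from POOL and returns it.
        client = redis.Redis(connection_pool=POOL)
        return client.get(f"contact:{contact_id}")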
Sep 11, 07:37 GMT-03:00
Investigating - We are experiencing degraded performance on the BLiP platform; our technical team is already working on the case.
Sep 11, 06:50 GMT-03:00
Sep 10, 2021
Resolved - This incident has been resolved.
Sep 10, 23:08 GMT-03:00
Update - Fault identified:

During analysis we identified a high volume of connections in our cache service.

Impact:

Slow message exchange in smart contacts

Solution:

It was necessary to intervene in the responsible services; afterwards, the service normalized.

Start time: 9:10 PM
End time: 9:29 PM
Sep 10, 22:05 GMT-03:00
Monitoring - A fix has been implemented and we are monitoring the results.
Sep 10, 21:39 GMT-03:00
Update - Impact:

Smart contacts may not respond or show slowdowns.
Sep 10, 21:28 GMT-03:00
Investigating - We are experiencing degraded performance on the BLiP platform; our technical team is already working on the case.
Sep 10, 21:12 GMT-03:00
Resolved - Fault identified:
Our monitoring identified that our cache service lost its connections. Because this cache stores the information of the contacts that talk to the bots, message traffic was completely unavailable during the period.

Palliative correction: as a workaround, the counter generation service that was leaking connections to our cache server was restarted. After this palliative action, the scenario did not recur.

Start date/time: 12:10 AM
End date/time: 2:10 AM

Actions in progress: Our technical team continues to work on a definitive fix for the scenario.

Root cause: connection leak with the caching server.
Sep 10, 02:10 GMT-03:00
Investigating - We are experiencing degraded performance on the BLiP platform; our technical team is already working on the case.
Sep 10, 00:10 GMT-03:00
Sep 9, 2021
Postmortem - Read details
Sep 21, 16:02 GMT-03:00
Resolved - Fault identified:

Our monitoring raised alarms on the platform's servers. After analysis, we identified a loss of connection to the services on some servers, which produced the slowness.

Impact:

Customers noticed slowness in the exchange of messages from smart contacts and in the use of the platform.

Solution:

To normalize the environment and resolve the failures, our team carried out the necessary interventions; afterwards, all services returned to normal.

More information will be made available via postmortem.

Start date/time: 5:30 PM
End date/time: 8:08 PM
Sep 9, 20:34 GMT-03:00
Monitoring - A fix has been implemented and we are monitoring the results.
Sep 9, 20:25 GMT-03:00
Update - During analysis our platform team identified loss of connection on some servers affecting platform performance. We are still working to solve the problem permanently.
Sep 9, 19:41 GMT-03:00
Update - Impact:

Some smart contacts may experience slow message exchanges.
Sep 9, 18:38 GMT-03:00
Investigating - We are experiencing degraded performance on the BLiP platform; our technical team is already working on the case.
Sep 9, 18:10 GMT-03:00
Postmortem - Read details
Sep 10, 18:33 GMT-03:00
Resolved - Status Update:

Fault identified:

Our monitoring identified that our cache service lost its connections. Because this cache stores the information of the contacts that talk to the bots, message traffic was completely unavailable during the period.

Palliative correction: as a workaround, the counter generation service that was leaking connections to our cache server was restarted. After this palliative action, the smart contacts started responding again.

Start date/time: 11:25 PM
End date/time: 11:52 PM

Actions in progress: deploying a fix to production for the failing service.

Root cause: connection leak with the caching server.
Sep 9, 00:14 GMT-03:00
Identified - The issue has been identified and a fix is being implemented.
Sep 8, 23:42 GMT-03:00
Update - Identified:

The monitoring team identified that the cache service application lost the connections to all of its nodes. Because this cache stores the information of the contacts that talk to the bots, message traffic became completely unavailable.

Impact:

Smart contacts stopped responding.

Solution:

The storage service was restarted and the cache server's default validation time was updated.
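As an illustrative sketch of the second part of that fix, assuming "validation time" refers to entry expiry (TTL) and assuming a Redis-style cache (neither is a confirmed detail of our stack), writing entries with an explicit expiry bounds how long contact data can linger:

    import redis

    client = redis.Redis(host="cache.internal", port=6379)  # placeholder host

    # Store contact data with a TTL so entries expire on their own
    # instead of accumulating until a restart or manual cleanup.
    client.set("contact:12345", b'{"name": "example"}', ex=3600)  # 1-hour expiry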

Start date/time: 09/08/2021 8:20 PM
End date/time: 09/08/2021 10:36 PM
Sep 8, 23:16 GMT-03:00
Monitoring - Our team identified a flaw in our platform's caching service.
An intervention was performed on the service, after which the bots responded again.
More details about the failure will be provided in our postmortem.
Sep 8, 22:40 GMT-03:00
Update - Our platform team continues to investigate the scenario.
Sep 8, 21:33 GMT-03:00
Update - Impact:

Bots may not respond to user interactions. All channels are affected.
Sep 8, 20:51 GMT-03:00
Investigating - We are experiencing degraded performance on the BLiP platform; our technical team is already working on the case.
Sep 8, 20:37 GMT-03:00
Sep 8, 2021
Sep 7, 2021

No incidents reported.