
Power outage affecting worldwide services


UPDATE: Our hosting partner has now restored access to the remaining systems on our platform, and we are auditing those systems so that we can advise on service availability. There are positive signs that activity is beginning to normalise and that backlogged messages and other communications are being processed. We will continue to audit our services throughout today before we are fully satisfied that our platform has no remaining issues, but our initial investigations show that the vast majority of services are working. Due to the backlog, you may experience some delays or performance issues as we stabilise demand, and we will take all available actions to optimise delivery. We sincerely apologise to those clients affected by these major issues. Our hosting partner suffered a major failure of a kind quite unprecedented in the context of modern data centre design, and we are concerned about the failure of their redundant power and connectivity systems. We will be reviewing our relationship with them in light of this and will press ahead with our major MessageCloud innovations, which will extend geographical redundancy and give us greater flexibility to route around larger-scale outages. Although we acknowledge that outages on this scale are rare, the availability of our platform remains important to us and we will learn from this event. Thank you for your patience during this challenging time.

UPDATE: Our hosting partner is dealing with an unprecedented level of downtime across a number of server racks, which they are working to resolve as soon as possible. Our platform is partially restored, but a number of key systems remain affected. We cannot currently advise further on timelines, because the scale of the issues with our partner's infrastructure prevents them from making more specific commitments to us. We continue to monitor the situation closely and our teams are on standby. We apologise for these issues, which are extremely rare in scale but have affected us and a number of other businesses across Europe. We will review our hosting provision in light of these issues.

UPDATE: Although access has been restored to a number of key systems, we are still awaiting full resolution from our hosting partner. They are entirely focussed on identifying the remaining data centre issues that affect several systems on our platform. We have been advised that a further couple of hours will be needed to explore solutions. The provider has confirmed that there were two separate issues: an interconnect bug between their data centres, and a severe on-site power failure that required intervention from their power company. Together these have caused serious availability problems for us and for other businesses in those data centres. Our teams are on hand to monitor and restore services as soon as there is any material change in the status of these issues. We apologise sincerely for these unusual problems and are committed to reviewing our ongoing provisions in light of these failures.

 

Date of Incident: Thursday 09/11/2017
Country: Global
Summary: Due to a severe power outage on our hosting partner's infrastructure, we are currently experiencing problems across our worldwide platform.

From 7am UK time, our monitoring systems notified us of connection problems with our core traffic managers. We then discovered wider outages across our platform.

Our partner subsequently updated us with information regarding the power outages, which affected a number of their key systems, including the interconnects between their data centres. By 9am UK time, we had established that much of our platform remained powered on but could not connect out to the Internet, which is affecting the availability of our billing and messaging services.

Since 9am, our partner has worked successfully with their energy company to restore power to the majority of their infrastructure, but some systems remain out of action. We are currently advised that a further 1 to 2 hours will be required to restore the remaining services. We can then fully evaluate the impact on our platform and ensure delivery of services.

This is a severe and rare issue over which we have little control, and we will review our procedures in light of the situation, working as required with our hosting partner to evaluate the reasons for this issue and how we can extend our redundancy and failover provision. It is not yet clear why our partner's backup power facilities did not operate as required and in line with their ISO certification or SLA.

We apologise sincerely to all clients affected by the current issue and we will keep you updated.

Affected service(s):

Mobile Billing:
[ x ] WEB / WAP Billing / PayForIt
[ x ] Inbound SMS (MO)
[ x ] Outbound Premium SMS (MT)
[ x ] Direct Operator Billing
[ x ] Voice Services / IVR

Mobile Messaging:
[ x ] Inbound Longcode / Non-Premium SMS (MO)
[ x ] Outbound Non-Premium SMS (Bulk MT)
[ x ] Number Lookup / HLR
[ x ] Control Panel
[   ] Other 

Impact:
[   ] Total loss of service
[ x ] Partial loss of service for a period of time
[ x ] Reduced availability with temporary connection losses
[ x ] Delayed response
[ x ] Other


