Testing Our Disaster Recovery Plan
Mar 22, 2021 | Marine Learning Systems | Case Study, Company Blog, LMS, Managed Services
Proactive technology companies prepare an IT Disaster Recovery Plan (DRP) for worst-case scenarios. When an IT disaster strikes, the organization’s DRP policies face an unforgiving and often harsh test, and the ongoing planning, testing, and training behind those policies are put into practice.
On Wednesday, March 10th, 2021, Marine Learning Systems’ (MLS) Disaster Recovery Plan was, indeed, tested.
The Situation
Just after midnight, a fire broke out at a major OVH data center in Strasbourg, France. OVH is Europe’s largest cloud provider. Firefighters were immediately on the scene and luckily no one was injured. However, the damage was catastrophic. One data center was destroyed entirely. A second one was damaged. And the remaining centers were shut down. This impacted services across the world. Roughly 3.5 million websites and applications were brought offline.
MLS uses the Strasbourg data center to serve some of our European customers. The moment the server went offline, MLS’s global server monitoring infrastructure (which checks all production servers at two-minute intervals from over 20 locations) alerted the 24/7 emergency operations team at MLS.
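To illustrate why checking from many locations is useful, here is a minimal sketch of one way an alert decision could be made from multi-location probe results. This is not a description of MLS's actual monitoring system: the `ProbeResult` type, the `should_alert` function, and the majority threshold are all assumptions introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    location: str    # hypothetical name of a monitoring location
    reachable: bool  # True if the server answered the probe from there


def should_alert(results: list[ProbeResult], quorum: float = 0.5) -> bool:
    """Return True when more than `quorum` of the locations report the server down.

    Requiring agreement between locations (an assumed policy) avoids paging
    the on-call team when only a single probe's network path is broken.
    """
    if not results:
        return False  # no data: nothing to decide on
    failures = sum(1 for r in results if not r.reachable)
    return failures / len(results) > quorum


# Example with 20 hypothetical locations, all of which see the server as down.
all_down = [ProbeResult(f"loc-{i}", reachable=False) for i in range(20)]
print(should_alert(all_down))  # True
```

With a single failing location out of 20, the same function returns False, which is the point of probing from several places before raising an emergency.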
In the early moments of an emergency, the scarcest resource is often verifiable information. Yet decisions must be made despite the lack of details. In this case, the emergency team was only able to determine that a fire was in progress. However, given that a fire can easily result in prolonged outages, the team decided to initiate our Disaster Recovery Plan. This involved failing over to a ready-to-go, “hot” backup server.
MLS maintains one hot backup server for every production server, at a different geographical location, for exactly this scenario. MLS hot backup servers are always online and fully configured. Each one has the current LMS version installed and holds a copy of the customer database and learning content. They are refreshed and tested once every 24 hours. Additionally, database backups of the live system are taken every 15 minutes and automatically stored offsite. This backup architecture is designed so that, in a catastrophic failure, data loss is minimized and services can be restored as quickly as possible.
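The effect of the 15-minute backup interval on potential data loss can be worked through with a little arithmetic. The sketch below is illustrative only: the function names and the example timestamps are made up, and it assumes backups complete instantly on a fixed schedule, which a real system would not guarantee.

```python
from datetime import datetime, timedelta

BACKUP_INTERVAL = timedelta(minutes=15)  # interval stated in the article


def latest_backup_before(failure: datetime, first_backup: datetime,
                         interval: timedelta = BACKUP_INTERVAL) -> datetime:
    """Return the time of the last scheduled backup at or before `failure`.

    Assumes `failure` is not earlier than `first_backup`.
    """
    completed = (failure - first_backup) // interval  # whole intervals elapsed
    return first_backup + completed * interval


def data_loss_window(failure: datetime, first_backup: datetime,
                     interval: timedelta = BACKUP_INTERVAL) -> timedelta:
    """Return how much recent data would be missing from the latest backup."""
    return failure - latest_backup_before(failure, first_backup, interval)


# Illustrative timestamps only, not the times of the actual incident.
schedule_start = datetime(2021, 1, 1, 0, 0)
failure_time = datetime(2021, 1, 1, 0, 47)
print(data_loss_window(failure_time, schedule_start))  # 0:02:00
```

Under these assumptions the window can never reach the full interval, so the worst case is just under 15 minutes of database changes.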
The service restoration protocol was put into effect. The required steps are complex in their own right, and MLS’s distributed architecture adds to that complexity: shipboard servers communicate and synchronize with a central instance, which was now offline due to the fire. For this reason, the restoration protocol is driven by a highly structured, customized checklist, which ensures that all the steps are followed correctly in the heat of an event.
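A checklist-driven procedure of this kind can be sketched as an ordered list of steps that halts at the first failure. The example below is a generic illustration, not the MLS checklist: the step names and the `run_checklist` function are hypothetical, and each step is stubbed with a simple function that reports success or failure.

```python
from typing import Callable, Optional

# A step is a (name, action) pair; the action returns True on success.
Step = tuple[str, Callable[[], bool]]


def run_checklist(steps: list[Step]) -> tuple[list[str], Optional[str]]:
    """Run steps strictly in order and stop at the first one that fails.

    Returns the names of the completed steps and the name of the failed
    step, or None if every step succeeded.
    """
    completed: list[str] = []
    for name, action in steps:
        if not action():
            return completed, name  # later steps are never attempted
        completed.append(name)
    return completed, None


# Hypothetical step names, stubbed to succeed for the example.
steps: list[Step] = [
    ("confirm latest offsite backup", lambda: True),
    ("promote hot backup server", lambda: True),
    ("verify shipboard synchronization", lambda: True),
]
done, failed = run_checklist(steps)
print(done, failed)
```

Stopping at the first failed step keeps later steps from running against a system that is in an unexpected state, which is one reason to prefer a strict order over ad hoc recovery.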
In the end, the hot backup failover was completed in record time. MLS restored online services for affected European customers within approximately 4 hours. This time included all restoration activities, as well as an extensive assessment of the restored system to ensure the fundamentals were working. Meanwhile, many of the other 3.5 million sites remained offline one week after the incident. MLS also initiated a more comprehensive quality assurance analysis, which would occur over the following days.
From Our Customers’ Perspective
For the affected European customers, the DRP process proceeded through the early hours of the morning.
Our customers awoke to a fully operational system and business as usual. The only indication of the disaster was a string of emails received throughout the night from the MLS operations team. Each email contained the latest update of the situation, as well as the progress on the activities being undertaken to restore service. Capping it all off was a personal email from the CEO of MLS which summarized the entire event and listed all the actions the MLS operations team had taken.
No involvement whatsoever was needed from our customers’ IT departments, business owners, or learning and training administrators. As with our regular LMS services, MLS is “service-first”: we take the work of LMS technology and learning administration in-house and off the desks of our customers.