Complex systems are unpredictable. No matter how well you've tested something, system failures are inevitable, and they are often caused by events beyond your control. What matters is how you respond to the crisis. If you find yourself in the middle of an outage, all your efforts need to go towards identifying and fixing the issue. So the question remains: what can you do to make sure things go as smoothly as possible?
Things to do in the event of an ERP system downtime
Have a generic response plan: If you go into a system outage with a plan in place, you will be far better prepared when things go wrong. During the outage, you and your team should refer to the protocol and take the appropriate measures to get back to work as soon as possible. Outages are often out of our control, but having a plan for recovery is a great way to minimize the pain.
Assign someone to take control when an outage occurs: Ideally, this person is an IT expert who knows your system well and can lead a team or division through the recovery. If multiple people try to lead the rescue at once, it is counter-productive and wastes time.
Store backup data on an external server or a disconnected system: To limit the damage and recover quickly from a system failure, you should always have your data backed up on an external server or on a system that is disconnected from your main environment.
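A backup is only useful if the copy is intact. As a minimal sketch, assuming the external server is reachable as a mounted directory, the Python below copies a file and compares checksums before trusting the copy. The names `back_up_file` and `sha256_of` are hypothetical helpers made up for this illustration, not part of any ERP product.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up_file(source: Path, backup_dir: Path) -> Path:
    """Copy source into backup_dir and verify the copy by checksum.

    backup_dir is assumed to be a mount point for the external server.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copies contents and file metadata
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"Backup of {source} failed verification")
    return target
```

Calling `back_up_file(Path("erp_export.db"), Path("/mnt/external"))` would copy the file and raise an error if the copy does not match; both paths are placeholders.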
Consolidate collaboration software to communicate effectively: Prepare for a potential outage before it occurs by making the right collaboration software and contact information readily available. Know how to get in touch with stakeholders and responders quickly. If you have to hunt for this information after the outage strikes, the result is chaos, stress, and potentially lost revenue.
Notify clients: If any client is directly affected by a system outage, it is essential to let them know as soon as possible. Acknowledge the problem, make sure they know that you are working to remedy the issue as quickly as possible, and sincerely apologize for any inconvenience.
Identify the issue quickly: More often than not, you and your team will be able to identify the problem immediately or within a short time. It's still important to make this your No. 1 priority when you encounter an outage so you can solve it right away. Ensure that you have a good knowledge base and a troubleshooting manual on hand to diagnose the problem.
Have your team drill the recovery plan regularly: The key to a rapid recovery from an outage is a plan that is regularly tested. Don't make a plan and then put it on a shelf, where it will go out of date. Have a disaster recovery plan, and test it at least twice a year to ensure that it still works. Plan for the loss of power, phones or internet, the physical building, key people, and access to computers.
Record and document everything: When a system outage occurs, you must document everything. Doing so makes it easier to define the problem and gives you red flags to watch for so that you can prevent it from happening again. Document what you did before the incident and what happened during and after it. You can use a sheet of paper or a file on a computer, as long as you end up with a history of everything.
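If you keep the record on a computer, even a tiny script can give you a consistent, timestamped history. The sketch below is one possible approach, not a prescribed tool: it appends entries to a JSON Lines file, and the function names and the three phase labels ("before", "during", "after") are assumptions chosen to mirror the advice above.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_event(log_path: Path, phase: str, note: str) -> dict:
    """Append one timestamped entry to a JSON Lines incident log."""
    if phase not in ("before", "during", "after"):
        raise ValueError(f"unknown phase: {phase}")
    entry = {
        "time": datetime.now(timezone.utc).isoformat(),
        "phase": phase,
        "note": note,
    }
    # Append mode keeps earlier entries untouched.
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return entry


def read_log(log_path: Path) -> list:
    """Return all entries in the order they were recorded."""
    with log_path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Keep the log file somewhere that stays reachable when the ERP system itself is down, otherwise the history is unavailable exactly when you need it.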
Use a continuity system: A sound continuity system that manages potential network outages will protect you and get the right people contacted as quickly as possible. This way, it doesn't matter when the outage finally happens, because the continuity system will alert them right away.
Get as many workers as possible under the outage manager: In the event of an outage, everyone who is able to help should stop what they are doing and concentrate on recovery, under the direction of the outage manager. You will want a well-trained, knowledgeable team that can handle these kinds of situations. If only a few people work on a fix, the troubleshooting and repair can take a lot longer.
Prioritize immediate recovery: A system outage gets hectic, stressful, and confusing. Stakeholders will call to ask why the system is down while you are still troubleshooting. It's essential to prioritize the tasks that restore the system. Identify the most pressing issues, then concentrate your time and energy on them instead of stretching yourself thin trying to put out every fire.
Gauge the severity of the outage: Understand how serious the outage is so that you know how many workers to allocate to the repair. Then focus on recovering data as quickly as possible so that as little of it as possible is lost. Make sure the appointed team gets to the source of the issue (and does not just resolve the symptoms) so that you can put measures in place to keep it from happening again.
Work towards eliminating variables: One of the best ways to troubleshoot and recover from a system failure quickly is to gather data and eliminate as many variables as possible until the cause is isolated. While this is going on, be sure to let your team and customers know if the outage has caused delays or damage. The sooner you respond, the better people feel about the resolution.
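Eliminating variables can be as simple as running a fixed set of health checks and crossing off everything that passes. The following is a rough sketch of that idea in Python. The function `eliminate_variables` and the component names in the example are hypothetical, and the checks themselves are stand-ins that you would replace with real tests of your own environment.

```python
from typing import Callable, Dict, List


def eliminate_variables(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run each check and return the components that could not be ruled out."""
    suspects = []
    for name, check in checks.items():
        try:
            healthy = check()
        except Exception:
            # A check that crashes cannot rule its component out.
            healthy = False
        if not healthy:
            suspects.append(name)
    return suspects


# Stand-in checks for illustration only; real ones would probe the system.
example_checks = {
    "network": lambda: True,
    "database": lambda: False,
    "disk space": lambda: True,
}
remaining = eliminate_variables(example_checks)
```

Here `remaining` contains only `"database"`, so the team can focus there instead of investigating every component at once.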
Outages are going to happen; that's Murphy's law. The important thing is to plan for them and to acknowledge them when they occur. Contact your server administrator and your development team and convey the urgency of the matter. Most hosting companies retain backups for at least one day, so you have recourse. It's important not to panic and make things worse.