This past weekend our server team patched the Windows server that runs the DPA database, and since their patch and the subsequent reboot, the DPA Datastore Service won't start. The Event Log contains an event from PostgreSQL that says it timed out waiting for the server to start.
When I try to start the service manually, it immediately stops and writes an error to the Event Log saying it can't start because there's a lock file in place. So I used this article: https://emcservice.force.com/CustomersPartners/kA2j0000000R4ksCAC
It gives these four steps to resolve the issue which I've tried to no avail:
Has anyone else encountered this, and if so, how did you fix it?
I have a SR open with Support about this, just trying to see if anyone else might have a quick solution.
Hopefully, Support will get back to you quickly. I'd expect they can find a quick fix for you.
If your server is a VM, and you're feeling adventurous, you might try the following:
1) take a snapshot of your VM
2) use the installation executable you used to install the version of DPA you have.
3) run the installer and see if it will do a "repair" installation
If it succeeds, you're back in business. If not, revert your VM to the snapshot and await support from EMC.
Let us know if that helps!
It is best to have Technical Support investigate the current state of the Datastore database and see why it will not start and hopefully find a way to resolve it. There could be any number of reasons. Examining the Datastore logs would be a good start here.
Based on what you mention in your description, it sounds like your Server team likely patched the server and then rebooted it without considering the running applications on that server. Performing maintenance on any active database server in this manner is not recommended and very risky. It is important to understand that you have a database with active transactions occurring, which are utilizing the file system and memory. When the server is simply rebooted (shutdown and restart), the database will be told to immediately shut down by the OS and likely the OS won't wait for it to be a "clean" shut down. The problem is that if there are active transactions that were waiting or needed time to finish, they may be cancelled, interrupted, and/or lost. This can lead to database corruption.
Considering the Postgres database lock file was still present on the file system, it sounds like this was not a clean shut down of Postgres, as that lock file is automatically removed on a clean shut down. Since it was not a clean shut down, there could be other issues upon the next restart of Postgres. Postgres will automatically attempt to recover from non-clean shut downs if it can, but this is something that should not be relied upon to work every time. If enough corruption has occurred, this can cause automatic recovery to fail and could make any sort of recovery impossible.
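To illustrate the lock file point, here is a minimal Python sketch that only reports whether a leftover lock file looks stale. It relies on the standard Postgres behaviour of writing the server's process ID on the first line of postmaster.pid in the data directory. The data directory path is an assumption (I don't know where your DPA install keeps it), and the function names are made up for this example. It does not delete anything; whether the file is safe to remove is still a call for Support.

```python
import subprocess
from pathlib import Path


def read_lock_pid(data_dir):
    """Return the PID on the first line of postmaster.pid, or None if there is no lock file."""
    lock_file = Path(data_dir) / "postmaster.pid"
    if not lock_file.is_file():
        return None
    lines = lock_file.read_text().splitlines()
    if not lines or not lines[0].strip().isdigit():
        raise ValueError(f"unexpected contents in {lock_file}")
    return int(lines[0].strip())


def windows_pid_exists(pid):
    """Ask the Windows tasklist tool whether a process with this PID exists."""
    result = subprocess.run(
        ["tasklist", "/FI", f"PID eq {pid}", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=False,
    )
    # CSV output quotes every field, so a match contains "<pid>" in quotes.
    return f'"{pid}"' in result.stdout


def lock_state(data_dir, pid_exists=windows_pid_exists):
    """Classify the lock file as 'no lock file', 'in use', or 'stale'."""
    pid = read_lock_pid(data_dir)
    if pid is None:
        return "no lock file"
    return "in use" if pid_exists(pid) else "stale"
```

Calling lock_state() with your real data directory would print one of the three states. Keep in mind that after a reboot the old PID can be reused by an unrelated process, so an "in use" result is worth double-checking against the process name.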
Any time maintenance is going to be performed on the DPA Application and Datastore servers, the proper steps before that maintenance or at a minimum before rebooting are to:
- Shut down the DPA Application service (wait for a complete normal shut down)
- Shut down the DPA Datastore service (wait for a complete normal shut down)
- Reboot the Datastore server first (wait for it to finish restarting and for Postgres to be running normally again)
- Then reboot the DPA Application server. (the DPA Application needs the DPA Datastore database running and available so it can start up normally)
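For a server team that wants to script the two "shut down and wait" steps above, a rough Python sketch might look like the following. It uses the standard Windows sc.exe tool to stop each service and then polls until the service reports STOPPED. The service names are hypothetical placeholders (check the real names in services.msc), and it assumes each service is stopped from the server it runs on.

```python
import subprocess
import time

# Hypothetical service names; replace with the names shown in services.msc.
STOP_ORDER = ["DPA Application Service", "DPA Datastore Service"]


def run_sc(args):
    """Run sc.exe with the given arguments and return its text output."""
    result = subprocess.run(["sc.exe", *args], capture_output=True, text=True, check=False)
    return result.stdout


def stop_and_wait(service, run=run_sc, timeout_s=600, poll_s=5, sleep=time.sleep):
    """Request a stop, then poll 'sc query' until the state is STOPPED or the timeout passes."""
    run(["stop", service])
    waited = 0
    while waited <= timeout_s:
        # A service that is still shutting down reports STOP_PENDING, not STOPPED.
        if "STOPPED" in run(["query", service]):
            return True
        sleep(poll_s)
        waited += poll_s
    return False


def stop_all(services=STOP_ORDER, **kwargs):
    """Stop the Application service first, then the Datastore service."""
    for service in services:
        if not stop_and_wait(service, **kwargs):
            raise TimeoutError(f"{service} did not stop cleanly; do not reboot yet")
```

The point of raising on a timeout is that the patching job should halt rather than reboot underneath a database that is still shutting down.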
I should also mention that a DPA Datastore export (command: dpa ds export <path>) should be taken regularly, as this is the only supported method of backing up the DPA Datastore database. An export is essentially a snapshot of the entire database at the time it is taken, and it can be used to restore a corrupted database back to that point in time. Regularly scheduled exports are highly recommended in any production environment.
Side Note: A backup application backing up the DPA Datastore server file system is not adequate protection, and the resulting backup may not restore the database correctly. Moreover, a backup application could cause corruption on an active, running database. This is not a recommended method to protect the DPA Datastore database.
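As one way to schedule that, here is a small Python sketch that wraps the dpa ds export <path> command and fails loudly if the export does not succeed. Several things here are assumptions rather than documented behaviour: that dpa can be launched by that name on the host (you may need the full path or a .bat wrapper), that it returns a non-zero exit code on failure, that it accepts an existing empty folder as the target, and the dated folder naming, which is just for illustration.

```python
import subprocess
from datetime import date
from pathlib import Path


def export_command(export_root, today=None, dpa_cmd="dpa"):
    """Build the 'dpa ds export <path>' command for a dated target folder."""
    today = today or date.today()
    # Hypothetical naming scheme: one folder per day under export_root.
    target = Path(export_root) / f"dpa-export-{today.isoformat()}"
    return [dpa_cmd, "ds", "export", str(target)], target


def run_export(export_root, run=subprocess.run):
    """Run the export and raise if the command reports a failure."""
    command, target = export_command(export_root)
    # Assumption: the export accepts an existing, empty target folder.
    target.mkdir(parents=True, exist_ok=True)
    result = run(command, check=False)
    # Assumption: a failed export returns a non-zero exit code.
    if result.returncode != 0:
        raise RuntimeError(f"DPA Datastore export failed with exit code {result.returncode}")
    return target
```

Run from Task Scheduler, a failure here gives an early warning that something is wrong with the Datastore, well before the export is actually needed for a restore.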
EMC Technical Support
Thank you for the detailed reply, Daniel. I'm still waiting for Support to analyze the log files they requested. If it ends up being the case that this was all caused by an ungraceful shutdown/reboot, I'll use the steps you've provided to give our server team a proper shutdown procedure for future patching.
It was the failure of the dpa ds export command that alerted me that something wasn't right. Otherwise, DPA seems to be functioning properly.