When you set up an Oracle Database Appliance (ODA), the network configuration with odacli configure-firstnet checks whether the default gateway is pingable. If it is not, the configuration cannot be completed.
However, at one of our customers the network administrators later reconfigured the router so that pings are no longer answered and simply time out (this is part of the company's security strategy). The network works, but the ODA can no longer ping the default gateway (nobody can).
When we tried to patch an ODA from release 19.21 to 19.24, we therefore ran into an ugly error while the Grid Infrastructure was being upgraded from 19.21 to 19.24, which left us with two unzipped home directories:
"DCS-10001:Internal error encountered: Failed to set ping target on public network."
If you look into dcs-agent.log, you can find a more detailed error message:
DEBUG [Server patching : JobId=<something>] [] c.o.d.a.r.s.g.GiOperations: ping failed: unknown gateway <IP-Address of Gateway>
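To find this message quickly, a small search helper like the following can be used. It is only a sketch: the function name find_ping_failures is made up for illustration, and the log path in the usage line is an assumption about the usual DCS agent log location, so adjust it to your system.

```shell
#!/bin/sh
# Hypothetical helper: print every "ping failed" line from a DCS agent log.
# $1 = path to the log file (the path is passed in, not hard-coded).
find_ping_failures() {
    # -F treats the pattern as a fixed string, not a regular expression
    grep -F 'ping failed: unknown gateway' "$1"
}
```

Usage (log path assumed): find_ping_failures /opt/oracle/dcs/log/dcs-agent.log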
Creating the prepatch report for the server worked fine ([root@oda1 opt]# /opt/oracle/dcs/bin/odacli create-prepatchreport -s -v 19.24.0.0.0). It finished successfully without errors, so we did not expect to run into an error later on.
But when we ran the server patch ([root@oda1 opt]# /opt/oracle/dcs/bin/odacli update-server -v 19.24.0.0.0) we got the error above.
We asked the network team to fix the ping problem, tested the ping manually and started odacli update-server -v 19.24.0.0.0 a second time. Sometimes, when an ODA update gets stuck, it is enough to simply restart update-server, but this time we got another error:
“DCS-10001:Internal error encountered: Failed to patch GI with RHP : DCS-10001:Internal error encountered: PRGO-1664 : The specified source working copy \"OraGrid192100\" is a software-only working copy...”
Thank God we had run "odabr backup -snap" before starting the update, and by running "odabr restore -snap -force" we were able to restore the "before" image of the operating system. If you start odabr restore without the -force option, you get the warning "Clusterware patchlevel at backup time is not what you have now. ODABR can't be considered a tool to perform Grid Infrastructure downgrade" and the snapshot is not restored.
Unfortunately, with newer ODA software releases the Grid Infrastructure is not part of the odabr backup, which means we still had two Grid Infrastructure homes and the problem with the software-only working copy.
Searching support.oracle.com does not really help; a search for PRGO-1664 returns no useful hits. But there is a small section in the note for the odabr tool (2466177.1): "if a GI update was in progress and you have done a restore", the executable permissions may be set incorrectly.
To fix this, set GRID_HOME to the old Grid Infrastructure home as user root and run [root@oda1]# $GRID_HOME/crs/install/rootcrs.pl -lock
This changes the executable permissions back (in our case in the 19.21 grid directory). After this change, patching, including odacli update-server -v 19.24.0.0.0, ran fine.
Depending on when you took the odabr backup, you need to start again with patching the DCS stack or with creating the prepatch report.
An enhancement request has been filed with development so that in future ODA software versions the gateway ping check becomes part of the prepatch report and no longer leads to a corrupted GI update.
In the meantime, it can be helpful to test manually whether the default gateway is pingable. To find the gateway's IP address, run "ip route show". It prints a line beginning with "default via <IP Address>". This IP address is your gateway, and a "ping <IP Address>" will (hopefully) work. If it does, you will not run into the problem we had. We will add the ping test to our manual checklist: it takes 10 seconds and saves an hour of work later on.
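The manual check can also be scripted. The following is a minimal sketch, not an Oracle-provided tool: the function names default_gateway and check_gateway are made up for illustration, and the ping options (3 packets, 2 second timeout) are arbitrary choices that assume the Linux iputils ping.

```shell
#!/bin/sh
# Hypothetical helper: read `ip route show` output on stdin and
# print the IP address from the first "default via <IP>" line.
default_gateway() {
    awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Hypothetical helper: ping the default gateway and report the result.
# Returns 0 if the gateway answers, 1 if it does not, 2 if none was found.
check_gateway() {
    gw=$(ip route show | default_gateway)
    if [ -z "$gw" ]; then
        echo "no default gateway found" >&2
        return 2
    fi
    # -c 3: send three packets, -W 2: wait at most two seconds per reply
    if ping -c 3 -W 2 "$gw" >/dev/null 2>&1; then
        echo "gateway $gw answers ping"
    else
        echo "gateway $gw does NOT answer ping" >&2
        return 1
    fi
}
```

Source the file and call check_gateway before starting odacli update-server. If it reports that the gateway does not answer, clarify this with the network team before patching.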