More fun with patching! This time I’m doing a fresh virtualized install, and I decided to take my own sage advice and install 18.104.22.168.0 first to get the firmware patches. I ran into a bunch of other issues, which will be the topic of a different post, but I digress. I got 22.214.171.124.0 fully installed and ODA_BASE deployed, and everything was happy.
Remember that starting with version 126.96.36.199.0, you have to patch each node separately with the --local option for the infra patches. So I started the patch on node 0, and it got almost all the way to the end, to step 12, where oakd is being patched. There I ran into the “known issue” in 888888.1 item 9:
9. During the infra patching, after step 12 completed, IPMI, HMP done, if it appeared to be hang during Patching OAK with the following two lines
INIT: Sending processes the TERM signal
INIT: no more processes left in this runlevel
JDK is not patched, the infra patching is not complete to the end.
Workaround: To reboot the appeared hang node manually, then run
# oakcli update -patch 188.8.131.52 --clean
# oakcli update -patch 184.108.40.206.0 --infra --local
To let it complete the infra patch cleanly.
I waited about 30 minutes at this step before I started to wonder. Sure enough, after checking some log files in /opt/oracle/oak/onecmd/tmp/, I saw that the patch process thought oakd was fully patched. What I found is that oakd gets whacked because the patch doesn’t fully complete. After doing the reboot that’s recommended in the workaround above, sure enough, oakd is not running. What’s more, now when I boot ODA_BASE the console never gets to the login prompt, so you can’t do anything there, even though you can ssh in just fine. So I ran the --clean option and then kicked off the patch again. This time it complained that oakd wasn’t running on the remote node. It was in fact running on node 1; it was the oakd on node 0 that was down. I suspect that when the ODA talks to oakd on the other node, it goes through the local oakd to do so.
So I manually restarted oakd by running /etc/init.d/init.oak restart, and after that oakd was running. I rebooted ODA_BASE on node 0 just to be sure everything was clean, then kicked off the infra patch again. This time it went all the way through and finished. The problem now is that the ODA_BASE console is unresponsive no matter what I do, so I’ll be opening a case with Oracle support to get a WTF. I’ll update this post with their answer/solution. If I were a betting man, I’d say they’ll tell me to update to 220.127.116.11.0 to fix it. We’ll see…
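For anyone who wants to script the check-and-restart step above, here is a minimal sketch. It makes a few assumptions that are not from the Oracle note: that the daemon shows up under the process name "oakd" (so pgrep -x can find it), that a short fixed wait is enough for it to come back, and that you set PATCH_VERSION yourself to the release you are applying. The helper names (oakd_running, ensure_oakd_running) and the INIT_OAK and OAKD_WAIT variables are made up for illustration. Treat it as a starting point, not a supported procedure.

```shell
#!/bin/sh
# Sketch: make sure the local oakd is up before (re)running the infra patch.
# Assumptions: the daemon's process name is "oakd"; PATCH_VERSION is a
# placeholder that you set yourself; the wait time is an arbitrary guess.

INIT_OAK="${INIT_OAK:-/etc/init.d/init.oak}"   # init script used in the post

oakd_running() {
    # Exact process-name match; assumes the daemon appears as "oakd".
    pgrep -x oakd >/dev/null 2>&1
}

ensure_oakd_running() {
    if oakd_running; then
        echo "oakd is running"
        return 0
    fi
    echo "oakd is not running, restarting it"
    "$INIT_OAK" restart || return 1
    sleep "${OAKD_WAIT:-10}"   # give the daemon a moment to come up
    oakd_running               # succeed only if oakd is now running
}

# Kick off the patch only when a version was supplied explicitly.
if [ -n "${PATCH_VERSION:-}" ]; then
    ensure_oakd_running && oakcli update -patch "$PATCH_VERSION" --infra --local
fi
```

Run it on the node you are patching with PATCH_VERSION set in the environment. With PATCH_VERSION unset it only defines the helper functions and does nothing else, which makes it safe to source and poke at by hand.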
As an aside, one of the things that 18.104.22.168.0 does is an in-place upgrade of Oracle Linux 5.11 to version 6.7 for ODA_BASE. I’ve never done a successful upgrade that way, and in fact Red Hat doesn’t support it. I guess I can see why they would want to do an upgrade rather than a fresh install, but it still feels very risky to me.