Virtualized ODA X6-2HA – working with VMs

It’s been a while since I built a virtualized ODA with VMs on a shared repo, so I thought I’d go through the basic steps.

  1. install the OS
    1. install Virtual ISO image
    2. configure networking
    3. install ODA_BASE patch
    4. deploy ODA_BASE
    5. configure networking in ODA_BASE
    6. deploy ODA_BASE with configurator
  2. create a shared repository.  This is where your specific situation comes into play.  Depending on your hardware, you may have more or less space in DATA or RECO.  Your DBA will be able to tell you how much they need in each and where you can borrow a few terabytes (or however much you need) for your VMs.
  3. (optionally) create a separate shared repository to store your templates.  This depends on how many VMs of the same kind you’ll be deploying.  If it makes no sense to keep the templates around once you’ve created your VMs, don’t bother with this step.
  4. import the template into the repository
    1. download the assembly file from Oracle (it will unzip into an .ova archive file)
    2. ***CRITICAL*** copy the .ova to /OVS on either node’s DOM0, not into ODA_BASE
    3. import the assembly (point it to the file sitting in DOM0 /OVS)
  5. modify the template config as needed (number of vCPUs, memory, etc.)
  6. clone the template to a VM
  7. add a network to the VM (usually net1 for the first public network, net2 for the second, and net3+ for any VLANs you’ve created)
  8. boot the VM and start the console (the easiest way is to VNC into ODA_BASE and launch it from there)
  9. set up your hostname, networking, etc. the way you want it
  10. reboot VM to ensure changes persist
  11. rinse and repeat as needed

If you need to configure HA, a preferred node, or anything else, this is the time to do it.
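If you’d rather script steps 2 through 8 from ODA_BASE, here’s a rough dry-run sketch.  It only prints the oakcli commands unless you set DRY_RUN=0.  The repo, template, VM and .ova names are made-up placeholders, the 2048 GB repo size is just an example of the number you’d agree on with your DBA, and the option spellings are from my memory of the oakcli reference, so check each one against oakcli’s built-in help on your release before running anything for real.

```shell
#!/bin/sh
# Dry-run sketch of the shared-repo VM workflow, run from ODA_BASE.
# Every name below is a hypothetical placeholder; override via environment.
# Commands are only printed unless DRY_RUN=0.
set -eu

REPO="${REPO:-vmrepo1}"                      # shared repo name
REPO_DG="${REPO_DG:-DATA}"                   # disk group to borrow from: DATA or RECO
REPO_SIZE="${REPO_SIZE:-2048}"               # GB; confirm whether your release wants a unit suffix
TEMPLATE="${TEMPLATE:-ol6_tmpl}"             # template name
OVA="${OVA:-/OVS/OVM_OL6_x86_64_PVHVM.ova}"  # must sit in Dom0 /OVS, not in ODA_BASE
VM="${VM:-vm01}"                             # VM cloned from the template
NET="${NET:-net1}"                           # net1 = first public network
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

vm_workflow() {
  run oakcli create repo "$REPO" -dg "$REPO_DG" -size "$REPO_SIZE"
  run oakcli import vmtemplate "$TEMPLATE" -assembly "$OVA" -repo "$REPO" -node 0
  run oakcli configure vmtemplate "$TEMPLATE" -vcpu 2 -memory 8G
  run oakcli clone vm "$VM" -vmtemplate "$TEMPLATE" -repo "$REPO" -node 0
  run oakcli modify vm "$VM" -addnetwork "$NET"
  run oakcli start vm "$VM"
  run oakcli show vmconsole "$VM"
}

vm_workflow
```

Something like `REPO=myrepo VM=app01 sh vm_plan.sh` (the script name is arbitrary) prints the plan for different names; nothing touches the appliance until you flip DRY_RUN to 0.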


ODA Software – Closed for Business!

I’ve deployed a number of these appliances over the last couple of years, both virtualized and bare metal.  When people realize that Oracle Linux is running under the hood, they sometimes think it’s OK to throw rpmforge up in there and have at it.  What’s worse, one customer actually tried to do a yum update on the OS itself from the Oracle public YUM repo!  Ack…


I guess I can see wanting to stay patched to the latest available kernel or version of tools, but it needs to be understood that this appliance is a closed ecosystem.  The beauty of patching the ODA is that I don’t have to chase down all the firmware updates for HDD/SSD/NVM disks, ILOM, BIOS, etc.  That legwork has already been done for me.  Plus, all the patches are tested together as a unit on each platform, which lets me sleep better at night.  Sure, the patches take about 4-5 hours all told, but when you’re done, you’re done!  I’m actually wondering if Oracle will eventually implement busybox or something like it for the command line interface to hide the OS layer from end users.  With their move to a web interface for provisioning the ODA X6-2S/M/L, it seems they’ve taken a step in that direction.


If you decide to add repositories to your ODA to install system utilities like sysstat, it’s generally OK, but I need to say this:  the Oracle hard line is that no additional software should be installed on the ODA at all.  In support of that, I will say that I’ve had problems patching when the Oracle public YUM repo is configured, and I also ran into the expired RHN key error that started rearing its ugly head at the beginning of 2017.  Both of these are easily fixed, but why put yourself in that position in the first place?
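If you inherited an ODA and aren’t sure what’s been bolted onto it, it’s worth checking which yum repos are enabled before you patch.  Here’s a small read-only sketch; enabled_repos is my own hypothetical helper, not an Oracle tool.  It lists every section in /etc/yum.repos.d/*.repo that isn’t explicitly disabled, since yum treats a repo as enabled unless it says enabled=0.

```shell
#!/bin/sh
# Read-only check: list yum repo sections that are enabled on this host.
# enabled_repos is a hypothetical helper, not an Oracle tool.
# A section counts as enabled unless it sets enabled=0/no/false/off,
# because yum's default for "enabled" is 1.
set -eu

enabled_repos() {
  dir="${1:-/etc/yum.repos.d}"
  for f in "$dir"/*.repo; do
    [ -f "$f" ] || continue            # no .repo files: the glob stays literal
    awk -v file="$(basename "$f")" '
      function flush() {
        if (section != "" && enabled) print file ": " section
      }
      /^[ \t]*\[/ {                    # start of a new [section]
        flush()
        section = $0
        sub(/^[ \t]*\[/, "", section)
        section = substr(section, 1, index(section, "]") - 1)
        enabled = 1                    # yum default
        next
      }
      /^[ \t]*enabled[ \t]*=/ {
        val = $0
        sub(/^[^=]*=[ \t]*/, "", val)
        sub(/[ \t\r]*$/, "", val)
        val = tolower(val)
        enabled = (val == "1" || val == "yes" || val == "true" || val == "on")
      }
      END { flush() }
    ' "$f"
  done
}

enabled_repos "${1:-/etc/yum.repos.d}"
```

Anything that shows up besides what the ODA shipped with is a candidate to disable before you start the patch.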


Also, in closing, I’d like to recommend to all my customers/readers that you make it a priority to patch your ODA at least once a year.  There are real consequences to being out of date, and they have bitten folks.  I can think of one case where the customer’s ODA hadn’t been updated in 3-4 years.  The customer experienced multiple hard drive failures within a week or two, and because they had their ODA loaded to the hilt, the ASM rebuild was impacting performance dramatically.  The reason the drives failed so close to each other, and more importantly the way they failed, was outdated disk firmware.  Newer firmware was available that changed how disk failures are handled: it is more sensitive to “blips” and fails the disk out instead of letting it stay in service.  With the old firmware, the disk was dying for a while and causing degraded performance.  Another likely reason the disks failed early-ish is the amount of load the customer was placing on the system.  Anywho… just remember to patch, OK?



Create VM in Oracle VM for x86 using NFS share

I’m using OVM Manager 3.4.2 and OVM Server 3.3.2 to test an upgrade for one of our customers.  I’m using StarWind iSCSI server to present the shared storage to the cluster, but in production you should use enterprise-grade hardware for this.  There’s an easier way to get a PVM guest: create an HVM VM and install from an ISO stored in a repository, then power the VM off, change the type to PVM, and power it back on.  That may not work with all operating systems, however, so I’m going over how to create a new PVM VM from an ISO image shared from an NFS server.

* Download the ISO (I'm using Oracle Linux 6.5 64-bit for this example)
* Copy the ISO image to OVM Manager (any NFS server is fine)
* Mount the ISO on a loopback device
# mount -o loop /var/tmp/V41362-01.iso /mnt

* Share the folder via NFS
# service nfs start
Starting NFS services: [ OK ]
Starting NFS quotas: [ OK ]
Starting NFS mountd: [ OK ]
Starting NFS daemon: [ OK ]
Starting RPC idmapd: [ OK ]

# exportfs *:/mnt/

# showmount -e
Export list for ovmm:
/mnt *

* Create new VM in OVM Manager
* Edit VM properties and configure as PVM
* Set additional properties such as memory, cpu and network
* On the Boot Order tab, enter the network boot path formatted like this:
  nfs:{ip address or FQDN of NFS host}:/{path to ISO image top level directory}

For example, our NFS server is and the path where I mounted the ISO is at /mnt.  Leave the {}'s off of course:


You should be able to boot your VM at this point and perform the install of the OS.
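If you’re building more than one of these, a tiny helper keeps the boot path format straight.  This is only a sketch: nfs_boot_path is a hypothetical function, not part of OVM Manager, and ovmm.example.com stands in for your NFS host.  It assembles the nfs:{host}:/{path} string from the steps above, adding a leading slash if it’s missing and dropping a trailing one.

```shell
#!/bin/sh
# Build the OVM Manager network boot path for a PVM install over NFS:
#   nfs:{ip address or FQDN of NFS host}:/{path to ISO image top level directory}
# nfs_boot_path is a hypothetical helper; OVM Manager only needs the final string.
set -eu

nfs_boot_path() {
  host="${1:-}"
  dir="${2:-}"
  if [ -z "$host" ] || [ -z "$dir" ]; then
    echo "usage: nfs_boot_path HOST DIR" >&2
    return 1
  fi
  dir="/${dir#/}"        # add a leading slash if it is missing
  dir="${dir%/}"         # drop one trailing slash: /mnt/ -> /mnt
  printf 'nfs:%s:%s\n' "$host" "${dir:-/}"
}

# Example: ISO loop-mounted on /mnt and exported from a placeholder host
nfs_boot_path ovmm.example.com /mnt/
```

Run as-is, it prints nfs:ovmm.example.com:/mnt, which is the form you’d paste into the boot path field (with your own host).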

Putting the Oracle SPARC M7 Chip through its paces

From time to time I get an opportunity to dive under the hood of some pretty cool technologies in my line of work.  Being an Oracle Platinum Partner, Collier IT specializes in Oracle-based hardware and software solutions.  On the hardware side we work with Exadata, Oracle Database Appliance and the Oracle ZFS Appliance, just to name a few.  We have a pretty nice lab that includes our own Exadata and ODA, and just recently a T7-2.


Featuring the new SPARC M7 chip released in October 2015 with Software in Silicon technology, the M7-x and T7-x server lines represent a huge leap forward in Oracle Database performance.  The difference between the M7 and T7 servers is basically size and power.  The chip itself is called M7, not to be confused with the M7-x server models; the T7-x servers use the same M7 processor.  Hopefully that clears up any confusion going forward.  Here’s a link to a datasheet that outlines the server line in more detail.


In addition to faster on-chip encryption and real-time data integrity checking, SQL query acceleration provides an extremely compelling use case for consolidation while maintaining a high level of performance and security with virtually no overhead.  The SPARC line of processors has come a very long way since its infancy.  Released in late 1987, it was designed from the start to provide a highly scalable architecture around which to build a compute package ranging from embedded processors all the way up to large server CPUs, all using the same core instruction set.  The name SPARC itself stands for Scalable Processor ARChitecture.  Being based on the RISC (Reduced Instruction Set Computer) architecture, its operations are designed to be as simple as possible.  This helps achieve nearly one instruction per CPU cycle, which allows for greater speed and simpler hardware.  Furthermore, this helps promote consolidation of other functions, such as memory management or floating-point operations, on the same chip.


Some of what the M7 chip is doing has actually been done in principle for decades.  Applications such as hardware video acceleration or cryptographic acceleration leverage instruction sets hard-coded into the processor itself, yielding incredible performance.  Think of it as a CPU that has only one job in life: to do one thing and do it very fast.  Modern CPUs such as Intel x86 have many, many jobs to perform, and they have to juggle all of them at once.  They are very powerful; however, because of the sheer number of jobs they are asked to perform, they don’t really excel at any one thing.  Call them a jack of all trades and master of none.  What a dedicated hardware accelerator does for video playback, for example, Oracle is doing for database instructions such as SQL in the M7 chip.  The M7 is still a general-purpose CPU, but one that can execute database-related instructions in hardware at machine-level speeds with little to no overhead.  Because of this, the SPARC M7 can outperform general-purpose processors that have to timeshare those types of instructions with all the other workloads they’re being asked to perform.


A great analogy would be comparing an athlete who competes in the decathlon to a sprinter.  The decathlete is very good at running fast, but he also needs to be proficient in nine other events.  Because of this, the decathlete cannot be as fast as the sprinter, who focuses on doing just one thing and being the best at it.  In the same vein, the M7 chip executes SQL instructions like a sprinter.  The same applies to encryption and real-time data compression.


Having explained this concept, we can now get into practical applications.  The most common use case will be accelerating Oracle Database workloads, and I’ll spend some time digging into that in my next article.  Bear in mind that other workloads, such as cryptography and data compression, are accelerated in hardware as well.


Over the past few weeks, we’ve been running benchmark comparisons between three very different Oracle Database hardware configurations: an Exadata X5, an Oracle Database Appliance X5, and an Oracle T7-2.  Collier IT is developing a white paper on this, which I will be contributing to.  Because the data is not yet fully analyzed, I can’t go into specifics on the results.  What I can say is that the T7-2 performed amazingly well from a price/performance perspective compared to the other two platforms.


Stay tuned for more details on a new test with the S7 and a Nimble CS-500 array, as well as a more in-depth look at how the onboard acceleration works, including some practical examples.








OVM Server for x86 version 3.4.2 released!

Oracle has just released the latest version of Oracle VM for x86 and announced it at OpenWorld.  There are some really cool additions that enhance the stability and usability of Oracle VM.  Here are some of the new features:


Installation and Upgrades

Oracle VM Manager support for previous Oracle VM Server releases
As of Oracle VM Release 3.4.2, Oracle VM Manager supports current and previous Oracle VM Server releases. For more information, see Chapter 6, Oracle VM Manager Support for Previous Oracle VM Server releases.


Support for NVM Express (NVMe) devices
Oracle VM Server now discovers NVMe devices and presents them to Oracle VM Manager, where the NVMe device is available as a local disk that you can use to store virtual machine disks or create storage repositories.

The following rules apply to NVMe devices:

Oracle VM Server for x86
  • To use the entire NVMe device as a storage repository or for a single virtual machine physical disk, you should not partition the NVMe device.
  • To provision the NVMe device into multiple physical disks, you should partition it on the Oracle VM Server where the device is installed. If an NVMe device is partitioned then Oracle VM Manager displays each partition as a physical disk, not the entire device.

    You must partition the NVMe device outside of the Oracle VM environment. Oracle VM Manager does not provide any facility for partitioning NVMe devices.

  • NVMe devices can be discovered if no partitions exist on the device.
  • If Oracle VM Server is installed on an NVMe device, then Oracle VM Server does not discover any other partitions on that NVMe device.
Oracle VM Server for SPARC
  • Oracle VM Manager does not display individual partitions on an NVMe device but only a single device.

    Oracle recommends that you create a storage repository on the NVMe device if you are using Oracle VM Server for SPARC. You can then create as many virtual disks as required in the storage repository. However, if you plan to create logical storage volumes for virtual machine disks, you must manually create ZFS volumes on the NVMe device. See Creating ZFS Volumes on NVMe Devices in the Oracle VM Administration Guide.

Using Oracle Ksplice to update the dom0 kernel
Oracle Ksplice capabilities are now available that allow you to update the dom0 kernel for Oracle VM Server without requiring a reboot. Your systems remain up to date with their OS vulnerability patches and downtime is minimized. A Ksplice update takes effect immediately when it is applied. It is not an on-disk change that only takes effect after a subsequent reboot.


This does not impact the underlying Xen hypervisor.

Depending on your level of support, contact your Oracle support representative for assistance before using Oracle Ksplice to update the dom0 kernel for Oracle VM Server. For more information, see Oracle VM: Using Ksplice Uptrack Document ID 2115501.1, on My Oracle Support.

Extended SCSI functionality available for virtual machines
Oracle VM now provides additional support for SCSI functionality to virtual machines:

  • Linux guests can now retrieve vital product data (VPD) page 0x84 information from physical disks if the device itself makes it available.
  • Microsoft Windows Server guests can use SCSI-3 persistent reservation to form a Microsoft Failover Cluster in an upcoming Oracle VM Paravirtual Drivers for Microsoft Windows release. See the Oracle VM Paravirtual Drivers for Microsoft Windows documentation for information about the availability of failover cluster capabilities on specific Microsoft Operating System versions.
Dom0 kernel upgraded
The dom0 kernel for Oracle VM Server is updated to Oracle Unbreakable Enterprise Kernel Release 4 Quarterly Update 2 in this release.

Package additions and updates
  • The ovmport-1.0-1.el6.4.src.rpm package is added to the Oracle VM Server ISO to support Microsoft Clustering and enable communication between Dom0 and DomU processes using the libxenstore API.
  • The Perl package is updated to perl-5.10.1-141.el6_7.1.src.rpm.
  • The Netscape Portable Runtime (NSPR) package is updated to nspr-4.11.0-1.el6.x86_64.rpm.
  • The openSCAP package is updated to openscap-1.2.8-2.0.1.el6.rpm.
  • The Linux-firmware package is updated to linux-firmware-20160616-44.git43e96a1e.0.12.el6.src.rpm.

Performance and Scalability

Oracle VM Manager performance enhancements
This release enhances the performance of Oracle VM Manager by reducing the number of non-critical events that Oracle VM Server sends to Oracle VM Manager when a system goes down.


If you are running a large Oracle VM environment, it is recommended to increase the amount of memory allocated to the Oracle WebLogic Server. This ensures that adequate memory is available when required. See Increasing the Memory Allocated to Oracle WebLogic Server in the Oracle VM Administration Guide for more information.

Oracle VM Server for x86 performance optimization
For information on performance optimization goals and techniques for Oracle VM Server for x86, see Optimizing Oracle VM Server for x86 Performance, on Oracle Technology Network.

Xen 4.4.4 performance and scalability updates
  • Improved memory allocation: Host system performance is improved by releasing memory more efficiently when tearing down domains, for example, migrating a virtual machine from one Oracle VM Server to another or deleting a virtual machine. This ensures that the host system can manage other guest systems more effectively without experiencing issues with performance.
  • Improved aggregate performance: Oracle VM Server now uses ticket locks for spinlocks, which improves aggregate performance on large scale machines with more than four sockets.
  • Improved performance for Windows and Solaris guests: Microsoft Windows and Oracle Solaris guests with the HVM or PVHVM domain type can now specify local APIC vectors to use as upcall notifications for specific vCPUs. As a result, the guests can more efficiently bind event channels to vCPUs.
  • Improved workload performance: Changes to the Linux scheduler ensure that workload performance is optimized in this release.
  • Improved grant locking: Xen-netback multi-queue improvements take advantage of the grant locking enhancements that are now available in Oracle VM Server Release 3.4.2.
  • Guest disk I/O performance improvements: Block scalability is improved through the implementation of the Xen block multi-queue layer.


Oracle VM Manager Rule for Live Migration
To prevent failure of live migration, and subsequent issues with the virtual machine environment, a rule has been added to Oracle VM Manager, as follows:

Oracle VM Manager does not allow you to perform a live migration of a virtual machine to or from any instance of Oracle VM Server with a Xen release earlier than xen-4.3.0-55.el6.22.18. This rule applies to any guest OS.

Table 3.1 Live Migration Paths between Oracle VM Server Releases using Oracle VM Manager Release 3.4.2


Where the live migration path depends on the Xen release, you should review the following details:

Xen Release (from) | Xen Release (to) | Live Migration Available?
xen-4.3.0-55.el6.x86_64 | xen-4.3.0-55.el6.0.17.x86_64 | No
xen-4.3.0-55.el6.22.18.x86_64 and newer | xen-4.3.0-55 | Yes

For example, as a result of this live migration rule, all virtual machines in an Oracle VM server pool running Oracle VM Server Release 3.3.2 with Xen version xen-4.3.0-55.el6.22.9.x86_64 must be stopped before migrating to Oracle VM Server Release 3.4.2.


Run the following command on Oracle VM Server to find the Xen version:

# rpm -qa | grep "xen"
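Instead of eyeballing the rpm output, you can check the installed Xen build against the xen-4.3.0-55.el6.22.18 floor with a short script.  This is a sketch under my own assumption that comparing the numeric fields left to right is good enough for the xen-4.3.0-55.el6.N.M naming in the table above; it is not a general rpm version comparison, and xen_at_least is a hypothetical helper name.  It uses rpm’s --qf (query format) option to get just the VERSION-RELEASE of the xen package instead of grepping every package with xen in its name.

```shell
#!/bin/sh
# Compare this server's Xen build with the live-migration floor for OVM Manager 3.4.2.
# Assumption: comparing numeric fields left to right is good enough for the
# xen-4.3.0-55.el6.N.M naming. It is NOT a general rpm version comparison.
set -eu

MIN_XEN="4.3.0-55.el6.22.18"

# xen_at_least HAVE WANT: exit 0 if HAVE >= WANT (missing fields count as 0).
xen_at_least() {
  awk -v have="$1" -v want="$2" 'BEGIN {
    gsub(/[^0-9]+/, " ", have); gsub(/[^0-9]+/, " ", want)
    n = split(have, a, " "); m = split(want, b, " ")
    k = (n > m) ? n : m
    for (i = 1; i <= k; i++) {
      x = (i <= n) ? a[i] + 0 : 0
      y = (i <= m) ? b[i] + 0 : 0
      if (x > y) exit 0
      if (x < y) exit 1
    }
    exit 0
  }'
}

# On an Oracle VM Server, ask rpm for just VERSION-RELEASE of the xen package.
if command -v rpm >/dev/null 2>&1 && rpm -q xen >/dev/null 2>&1; then
  have=$(rpm -q --qf '%{VERSION}-%{RELEASE}\n' xen)
  if xen_at_least "$have" "$MIN_XEN"; then
    echo "xen-$have: meets the live-migration floor"
  else
    echo "xen-$have: below the floor, stop its guests before moving to 3.4.2"
  fi
fi
```

Fed the 3.3.2 example above (xen-4.3.0-55.el6.22.9), xen_at_least returns non-zero, which lines up with the rule that those guests have to be stopped before the move.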
PVHVM hot memory modification
As of this release, it is possible to modify the memory allocated to running PVHVM guests without a reboot. Additionally, Oracle VM Manager now allows you to set the allocated memory to a value that is different to the maximum memory available.

  • Hot memory modification is supported on x86-based PVHVM guests running on Linux OS and guests running on Oracle VM Server for SPARC. For x86-based PVHVM guests running on Oracle Solaris OS, you cannot change the memory if the virtual machine is running.
  • See the Oracle VM Paravirtual Drivers for Microsoft Windows documentation for information about the availability of hot memory modification on PVHVM guests that are running a Microsoft Windows OS. You must use a Windows PV Driver that supports hot memory modification or you must stop the guest before you modify the memory.
  • Oracle VM supports hot memory modification through Oracle VM Manager only. If you have manually created unsupported configurations, such as device passthrough, hot memory modification is not supported.


  • Oracle MySQL patch update: This release of Oracle VM includes the July 2016 Critical Patch Update for MySQL. (23087189)
  • Oracle WebLogic patch update: This release of Oracle VM includes the July 2016 Critical Patch Update for WebLogic. (23087185)
  • Oracle Java patch update: This release of Oracle VM includes the July 2016 Critical Patch Update for Java. (23087198).
  • Xen security advisories: The following Xen security advisories are included in this release:
    • XSA-154 (CVE-2016-2270)
    • XSA-170 (CVE-2016-2271)
    • XSA-172 (CVE-2016-3158 and CVE-2016-3159)
    • XSA-173 (CVE-2016-3960)
    • XSA-175 (CVE-2016-4962)
    • XSA-176 (CVE-2016-4480)
    • XSA-178 (CVE-2016-4963)
    • XSA-179 (CVE-2016-3710 and CVE-2016-3712)
    • XSA-180 (CVE-2014-3672)
    • XSA-182 (CVE-2016-6258)
    • XSA-185 (CVE-2016-7092)
    • XSA-187 (CVE-2016-7094)
    • XSA-188 (CVE-2016-7154)



ODA Patching – get ahead of yourself?

I was at a customer site deploying an X5-2 ODA.  They are standardizing on the patch level.  Even though is currently the latest, they don’t want to be on the bleeding edge.  Recall that the patch doesn’t include infrastructure patches (mostly firmware), so you have to install first, run the --infra patch to get the firmware, and then update to


We unpacked the patch on both systems and then had an epiphany.  Why don’t we just unpack the patch as well and save some time later?  What could possibly go wrong?  Needless to say, when we went to install or even verify the patch it complained as follows:

ERROR: Patch version must be


OK, so there has to be a way to clean that patch off the system so I can use right?  I stumbled across the oakcli manage cleanrepo command and thought for sure that would fix things up nicely.  I ran it and got this output:


[root@CITX-5ODA-ODABASE-NODE0 tmp]# oakcli manage cleanrepo --ver
Deleting the following files...
Deleting the files under $OAK_REPOS_HOME/pkgrepos/orapkgs/OAK/
Deleting the files under /DOM0OAK/
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Disk/Seagate/ST95000N/SF04/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Disk/Seagate/ST95001N/SA03/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Disk/WDC/WD500BLHXSUN/5G08/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Disk/HGST/H101860SFSUN600G/A770/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Disk/Seagate/ST360057SSUN600G/0B25/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Disk/HITACHI/H106060SDSUN600G/A4C0/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Disk/HITACHI/H109060SESUN600G/A720/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Disk/HITACHI/HUS1560SCSUN600G/A820/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Disk/HGST/HSCAC2DA6SUN200G/A29A/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Disk/HGST/HSCAC2DA4SUN400G/A29A/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Disk/STEC/ZeusIOPs-es-G3/E12B/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Disk/STEC/Z16IZF2EUSUN73G/9440/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Expander/ORACLE/DE2-24P/0018/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Expander/ORACLE/DE2-24C/0018/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Expander/ORACLE/DE3-24C/0291/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Controller/LSI-es-Logic/0x0072/
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Controller/LSI-es-Logic/0x0072/
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Ilom/SUN/X4370-es-M2/
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Disk/HITACHI/H109090SESUN900G/A720/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Disk/STEC/Z16IZF4EUSUN200G/944A/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Disk/HGST/H7240AS60SUN4.0T/A2D2/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Disk/HGST/H7240B520SUN4.0T/M554/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Disk/HGST/H7280A520SUN8.0T/P554/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Expander/SUN/T4-es-Storage/0342/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Controller/LSI-es-Logic/0x0072/
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Controller/LSI-es-Logic/0x005d/4.230.40-3739/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Controller/LSI-es-Logic/0x0097/
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Controller/Mellanox/0x1003/2.11.1280/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Ilom/SUN/X4170-es-M3/
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Ilom/SUN/X4-2/
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Ilom/SUN/X5-2/
Deleting the files under $OAK_REPOS_HOME/pkgrepos/orapkgs/HMP/
Deleting the files under /DOM0HMP/
Deleting the files under $OAK_REPOS_HOME/pkgrepos/orapkgs/IPMI/
Deleting the files under /DOM0IPMI/
Deleting the files under /JDK/1.7.0_91/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/orapkgs/ASR/5.3.1/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/orapkgs/OPATCH/
Deleting the files under $OAK_REPOS_HOME/pkgrepos/orapkgs/OPATCH/
Deleting the files under $OAK_REPOS_HOME/pkgrepos/orapkgs/OPATCH/
Deleting the files under $OAK_REPOS_HOME/pkgrepos/orapkgs/GI/
Deleting the files under $OAK_REPOS_HOME/pkgrepos/orapkgs/DB/
Deleting the files under $OAK_REPOS_HOME/pkgrepos/orapkgs/DB/
Deleting the files under $OAK_REPOS_HOME/pkgrepos/orapkgs/DB/
Deleting the files under $OAK_REPOS_HOME/pkgrepos/orapkgs/DB/
Deleting the files under $OAK_REPOS_HOME/pkgrepos/orapkgs/OEL/6.7/Patches/6.7.1
Deleting the files under $OAK_REPOS_HOME/pkgrepos/orapkgs/OVM/3.2.9/Patches/
Deleting the files under $OAK_REPOS_HOME/pkgrepos/orapkgs/OVS/
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Controller/LSI-es-Logic/0x0072/
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Controller/LSI-es-Logic/0x0072/
Deleting the files under $OAK_REPOS_HOME/pkgrepos/orapkgs/GI/


So I assumed that this fixed the problem.  Nope…


[root@CITX-5ODA-ODABASE-NODE0 tmp]# oakcli update -patch --verify

ERROR: Patch version must be



OK, so more searching through the CLI manual and the oakcli help pages came up with bupkis.  So I decided to strace the oakcli command I had just run.  As usual, there was a LOT of noise I didn’t care about or didn’t understand.  I did find, however, that it was reading the contents of a file that looked interesting:


[pid 5509] stat("/opt/oracle/oak/pkgrepos/System/VERSION", {st_mode=S_IFREG|0777, st_size=19, ...}) = 0
[pid 5509] open("/opt/oracle/oak/pkgrepos/System/VERSION", O_RDONLY) = 3
[pid 5509] read(3, "version=\n", 8191) = 19
[pid 5509] read(3, "", 8191) = 0
[pid 5509] close(3) = 0
[pid 5509] fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
[pid 5509] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f159799d000
[pid 5509] write(1, "\n", 1
) = 1
[pid 5509] write(1, "ERROR: Patch version must be 12."..., 40ERROR: Patch version must be
) = 40
[pid 5509] exit_group(0) = ?


There were a dozen or so lines after that, but I had what I needed.  Apparently /opt/oracle/oak/pkgrepos/System/VERSION contains the version of the latest patch that has been unpacked.  The system software version is kept somewhere else, because after I unpacked the patch, I ran an oakcli show version and it reported  But the VERSION file referenced earlier said  I assume unpacking the patch updates this file.  So what I wound up doing was changing the VERSION file back to as well as deleting the folder /opt/oracle/oak/pkgrepos/System/  Once I did this, everything worked as expected.  I was able to verify and install the --infra portion of and continue on my merry way.


This highlights the fact that there isn’t a known way (to me, at least) to delete an unpacked patch via oakcli or any Python scripts I’ve been able to find yet.  Also, as an aside, I tried just deleting the VERSION file, assuming oakcli would rebuild it, and it didn’t.  I got this:


[root@CITX-5ODA-ODABASE-NODE0 System]# oakcli update -patch --verify
ERROR : Couldn't find the VERSION file to extract the current allowed version


So I just recreated the file and all was good.  I was hoping the oak software didn’t maintain some sort of binary-format database that kept track of all this information; I think I got lucky in this case.  Hope this helps someone out in a pinch!
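Since the fix boiled down to editing (or recreating) one small file, here’s a cautious sketch of that step.  The path comes from the strace output above, the single version=<x> line format is inferred from the read() call, and get_repo_version/set_repo_version are hypothetical helpers of my own.  The version to write back is deliberately left for you to supply, and a timestamped backup is made before anything is overwritten.  It doesn’t touch the unpacked patch folder under /opt/oracle/oak/pkgrepos/System/.

```shell
#!/bin/sh
# Inspect / rewrite the pkgrepos VERSION file read by "oakcli update -patch".
# Path taken from the strace output; the "version=<x>" format is inferred from
# the read() call. get_repo_version and set_repo_version are hypothetical helpers.
set -eu

VERSION_FILE="${VERSION_FILE:-/opt/oracle/oak/pkgrepos/System/VERSION}"

# Print the version string oakcli will read.
get_repo_version() {
  sed -n 's/^version=//p' "${1:-$VERSION_FILE}"
}

# Back up the current file (if any), then write a single version=<x> line.
# A recreated file gets your umask permissions; the one in my strace was 0777.
set_repo_version() {
  ver="$1"
  file="${2:-$VERSION_FILE}"
  if [ -f "$file" ]; then
    cp -p "$file" "$file.bak.$(date +%Y%m%d%H%M%S)"
  fi
  printf 'version=%s\n' "$ver" > "$file"
}

# Read-only by default: just show the current value if the file exists.
if [ -f "$VERSION_FILE" ]; then
  get_repo_version
fi
# To roll back: set_repo_version <the version you are actually patching to>
```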

New Oracle ODA X6 configurations officially released today!

Today, Oracle has announced the release of two new ODA configurations squarely targeted at the S in SMB.  I blogged about this back on June 9th here.  A few differences to note:

  • Two new commands replace oakcli (oakcli is gone).
    • odacli – perform “lifecycle” activities for the ODA appliance (provisioning and configuring)
    • odaadmcli – administer and configure the running appliance attributes
  • All new web based user interface used to deploy appliance.  Command line obviously still available but not required anymore to deploy.
  • No more virtualization or shared storage on the Small and Medium configurations


I’m not sure if I’ll have a chance to lay hands on the new hardware any time soon but if I do I’ll definitely give first impressions here!

ODA X6-2 in the wild!


It looks like Oracle has brought their newest server (the X6-2) into the ODA appliance lineup.  It’s already an option on the Exadata, BDA and ZDLRA.  There are now three different configurations available, two of which don’t include shared storage and have a much lower price point.  You can run Oracle Database SE2 or EE on the two smaller configurations; however, neither one offers the virtualization option that’s been around since the original V1 ODA.


Here are the 3 options:

Oracle Database Appliance X6-2S ($18k):
One E5-2630 v4 2.2GHz 10 core CPU
6.4 TB (2 x 3.2 TB) NVMe SSDs *
128 GB (4 x 32 GB) DDR4-2400 Main Memory **
Two 480 GB SATA SSDs (mirrored) for OS
Two onboard 10GBase-T Ethernet ports
Dual-port 10GbE SFP+ PCIe

* You can add up to 2 more NVMe SSDs for a total of 4
** An optional memory expansion kit is available that brings this configuration up to 384GB


Oracle Database Appliance X6-2M ($24k):
Two E5-2630 v4 2.2GHz 10 core CPUs
6.4 TB (2 x 3.2 TB) NVMe SSDs *
256 GB (8 x 32 GB) DDR4-2400 Main Memory **
Two 480 GB SATA SSDs (mirrored) for OS
Four onboard 10GBase-T Ethernet ports
Dual-port 10GbE SFP+ PCIe

* You can add up to 2 more NVMe SSDs for a total of 4
** An optional memory expansion kit is available that brings this configuration up to 768GB


Oracle Database Appliance X6-2HA (?):
TBD – information about this configuration isn’t available yet.  More info coming soon!

X5-2 ODA upgrade from to observations


More fun with patching!  So this time I’m doing a fresh virtualized install, and I decided to take my own sage advice of installing first to get the firmware patches.  I ran into a bunch of other issues, which will be the topic of a different post, but I digress.  I got fully installed, ODA_BASE deployed, and everything was happy.


Remember that starting with version, you have to patch each node separately with the --local option for the infra patches.  So I started the patch on node 0, and it got almost all the way to the end, at step 12 where oakd is being patched.  I ran into the “known issue” in 888888.1 item 9:

9.  During the infra patching, after step 12 completed, IPMI, HMP done, if it appeared to be hang during Patching OAK with the following two lines
                               INIT: Sending processes the TERM signal
                               INIT: no more processes left in this runlevel
JDK is not patched, the infra patching is not complete to the end.  
Workaround:  To reboot the appeared hang node manually, then run 
# oakcli update -patch --clean

# oakcli update -patch --infra --local
To let it complete the infra patch cleanly.  
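If you’re watching for this condition on more than one node, a simple text check for the two INIT lines quoted above can save some guesswork.  This is a minimal Python sketch that assumes you’ve captured the ODA_BASE console output as text; the function name is hypothetical and nothing here is part of the ODA tooling.  A match only tells you the console looks like the known issue, so you still need to check whether oakd is actually running.

```python
# Sketch only: flags console output that matches the infra-patch hang
# described in known issue 9 of MOS note 888888.1. Assumes you have saved the
# ODA_BASE console output as text; the function name is hypothetical and
# nothing here is part of the ODA tooling.

# The two marker lines, quoted from the known-issue text.
HANG_MARKERS = (
    "INIT: Sending processes the TERM signal",
    "INIT: no more processes left in this runlevel",
)


def looks_like_known_infra_hang(console_text: str) -> bool:
    """Return True when both marker lines appear, in order, in the text."""
    first = console_text.find(HANG_MARKERS[0])
    if first == -1:
        return False
    second = console_text.find(HANG_MARKERS[1], first + len(HANG_MARKERS[0]))
    return second != -1


if __name__ == "__main__":
    sample = (
        "Patching OAK\n"
        "INIT: Sending processes the TERM signal\n"
        "INIT: no more processes left in this runlevel\n"
    )
    print(looks_like_known_infra_hang(sample))  # True
```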

I waited about 30 minutes at this step before I started to wonder, and sure enough, after checking some log files in /opt/oracle/oak/onecmd/tmp/, I saw that the patch thought oakd was fully patched.  What I found is that oakd gets whacked because the patch doesn’t fully complete.  After the reboot recommended in the workaround above, sure enough, oakd was not running.  What’s more, now when I boot ODA_BASE the console never gets to the login prompt and you can’t do anything there, even though you can ssh in just fine.  So I ran the --clean option, then kicked off the patch again.  This time it complained that oakd wasn’t running on the remote node.  In fact, oakd was running on node1 but not on node0.  I suspect that when the ODA communicates with oakd on the other node, it goes through the local oakd to do so.


So I manually restarted oakd by running /etc/init.d/init.oak restart, and then oakd was running.  I rebooted ODA_BASE on node0 just to be sure everything was clean, then kicked off the infra patch again.  This time it went all the way through and finished.  The problem now is that the ODA_BASE console is unresponsive no matter what I do, so I’ll be opening a case with Oracle support to get a WTF.  I’ll update this post with their answer/solution.  If I were a betting man I’d say they’ll tell me to update to to fix it.  We’ll see…


As an aside, one of the things that does is perform an in-place upgrade of Oracle Linux 5.11 to version 6.7 for ODA_BASE.  I’ve never done a successful update that way and, in fact, Red Hat doesn’t support it.  I guess I can see why they would want to do an update rather than a fresh install, but it still feels very risky to me.

ODA Software v12. possible bug

I’ve been updating some X5-2 ODAs for a customer of mine to version in preparation for deployment.  I came across a stubborn bug that proved to be a little tricky to solve.  ODA_BASE wasn’t fully completing the boot cycle after initial deployment, and as a result I couldn’t get into the ODA_BASE console to configure firstnet.


The customer has some strict firewall rules for the network these ODAs sit in, so I also couldn’t connect to the VNC console on port 5900.  If you’re gonna implement on an X5-2 ODA, I’d recommend installing first, then updating to  I’ve not been able to determine for sure what the problem was.  I originally thought it had something to do with firmware, because doesn’t update any of the firmware due to a big ODA_BASE OS version update from 5.11 to 6.7.  Apparently the thought was that the update would either be too big or take too long to download/install, so they skipped firmware in this release.  Here is the readme for the update:


This Patch bundle consists of the Jan 2016 GI Infrastructure and RDBMS –,, and  The Grid Infrastructure release upgrade is included in this patch bundle.  The database patches,, and are included in this patch bundle. Depending on the current version of the system being patched, usually all other infrastructure components like Controller, ILOM, BIOS, and disk firmware etc will also be patched; due to this release focus on the major OS update from OL5 to OL6.7; all other infrastructure components will not be patches.  In a virtualized environment, usually all other infrastructure components on dom0 will also be patched; in this release, we skip them.  To avoid all other infrastructure components version too far behind, the minimum version required is for infra and GI.  As part of the Appliance Manager, a new parameter has been introduced to control the rolling of ODA patching from one node to another.  This is the first release to provide this functionality to allow you to control when the second node to be patched.


I wound up having to re-image to and then upgrade as I stated above.  That fixed the problem.  I’m not sure what caused it; it may have been a bad download or a glitch in the ODA_BASE bundle, because I checked against our own X5-2 ODA and it has the same problem with a fresh install of and all of the firmware is up to date.  In hindsight, I probably should have given more credence to this message, but it would have added hours onto the install process.  As it is, the troubleshooting more than doubled the install time.  Lesson learned…

Troubleshooting ODA Network connectivity

Setting up an ODA in a customer’s environment can either go very well or give you lots of trouble.  It all depends on having your install checklist completed, reviewed by the customer and any questions answered ahead of time.


I’ve installed dozens of ODAs in a variety of configurations, ranging from a simple bare metal install to a complex virtualized install with multiple VMs and networks.  Now understand that I’m not a network engineer, nor do I play one on TV, but I know enough about networking to have a civil conversation with a 2nd level network admin without getting too far out of my comfort zone.  Knowing this, I can certainly appreciate the level of complexity involved in configuring and supporting an enterprise grade network.


Having said that, I find that when there are issues with a deployment, whether it’s an ODA, ZFS appliance, Exadata or other device, network misconfigurations are the culprit at least 80% of the time.  I can’t tell you how many times I’ve witnessed misconfigurations where the network admin swore up and down that the settings were correct when in fact they were wrong.  It usually involves checking, re-checking and checking yet again to finally uncover the culprit.  Below, I’ll outline some of the snafus I’ve been involved with and the troubleshooting that can help resolve them.


  • Cabling: Are you sure the cables are all plugged into the right place?

If you didn’t personally cable the ODA and you’re having network issues, don’t go too long without validating the cable configuration yourself.  In this case, the fancy setup charts are a lifesaver!  On the X5-2 ODAs, for example, the InfiniBand private interconnect is replaced by the 10gb fiber Ethernet option if the customer needs 10gb Ethernet over fiber.  There is only one expansion slot available, so unfortunately it’s either/or.  As a result, the private interconnect runs over net0 and net1 with crossover cables (green and yellow) between the two compute nodes instead of the InfiniBand cables.  This can be missed very easily.  Also make sure the storage cables are all connected to the proper ports for your configuration, whether it’s one storage shelf or two.  This will typically be caught shortly after deploying the OS image, whether it’s virtualized or bare metal: a storagetopology check runs during the install process and catches most cabling mistakes, but best not to chance it.

  • Switch configuration: Trunk port vs. Access port

When you configure a switch port, you need to tell the switch what kind of traffic will pass through that port.  One of the important items is which network(s) the server attached to this port needs to talk on.  If you’re configuring a standalone physical server, chances are you won’t need to talk on more than one VLAN.  In this case, it’s usually appropriate to configure the switch port as an access port.  You can still put the server on a non-default VLAN (a VLAN other than 1), but the VLAN “tags” get stripped off at the switch and the server never sees them.

If, however, you’re setting up a VMware server or a machine that uses virtualization technology, it’s more likely that the VMs running on that server will need to talk on more than one VLAN through the same network adapter(s).  In this case, you need to set the port mode to trunked.  You then need to make sure all the VLANs the server will communicate on are assigned to that trunk port.  The server is then responsible for reading the VLAN tags and passing the traffic to the appropriate destination on the server.  This is one of the areas where the switch is most often configured incorrectly: the network engineer fails to configure trunk mode on the port, forgets to assign the proper VLANs to it, or even sets a native VLAN on the port.

There is a difference between the default VLAN and a native VLAN.  The default VLAN is always present and is typically needed for communication between network devices; protocols like Cisco’s CDP use this VLAN.  The native VLAN, if configured, is treated similarly to an access port from the perspective of the network adapter on the server: its traffic arrives untagged, so the server NIC does not need a VLAN interface configured on top of it to talk on the native VLAN.  If you want to talk on any other VLAN on this port, however, you need to configure a VLAN interface on the server to receive those packets.  I’ve not seen the native VLAN used in a lot of configurations where more than one VLAN is needed, but it is most certainly a valid configuration.  Have the network team check these settings and make sure you understand how they apply to your device.
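To make the tagged vs. untagged distinction concrete: an IEEE 802.1Q tag is four bytes inserted right after the source MAC address, a TPID of 0x8100 followed by a 16-bit field whose low 12 bits are the VLAN ID.  A frame without that tag is what the server sees on an access port or on the native VLAN.  This Python sketch assumes you have a raw Ethernet frame as bytes and simply classifies it; the test-frame builder and its addresses are made up for illustration.

```python
import struct
from typing import Optional

# Sketch: classify a raw Ethernet II frame as 802.1Q-tagged or untagged.
# Assumes the frame is given as bytes starting at the destination MAC.
# An 802.1Q tag sits right after the source MAC: TPID 0x8100, then a
# 16-bit TCI whose low 12 bits are the VLAN ID.
TPID_8021Q = 0x8100


def vlan_id(frame: bytes) -> Optional[int]:
    """Return the VLAN ID of a tagged frame, or None if it is untagged."""
    if len(frame) < 14:
        raise ValueError("frame is shorter than an Ethernet header")
    (tpid,) = struct.unpack_from("!H", frame, 12)  # bytes 12-13, after both MACs
    if tpid != TPID_8021Q:
        return None  # untagged: access-port or native-VLAN traffic
    if len(frame) < 18:
        raise ValueError("truncated 802.1Q tag")
    (tci,) = struct.unpack_from("!H", frame, 14)
    return tci & 0x0FFF  # top 3 bits are priority (PCP), next bit is DEI


def build_frame(vid: Optional[int]) -> bytes:
    """Build a minimal test frame; the addresses are made up."""
    dst = b"\xff" * 6  # broadcast
    src = bytes.fromhex("001122334455")
    tag = b"" if vid is None else struct.pack("!HH", TPID_8021Q, vid & 0x0FFF)
    return dst + src + tag + struct.pack("!H", 0x0800) + bytes(46)  # IPv4 + padding


if __name__ == "__main__":
    print(vlan_id(build_frame(456)))   # 456
    print(vlan_id(build_frame(None)))  # None
```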

  • Switch configuration: Aggregated ports vs. regular ports

Most switches have the ability to cobble together anywhere from 2 to 8 ports to provide higher throughput/utilization as well as redundancy at the same time.  What this is called depends on your switch vendor: Cisco calls it EtherChannel, HP calls it Dynamic LACP trunking and Extreme Networks refers to it as sharing (LAG).  Whatever you call it, it’s an implementation of a portion of the IEEE 802.3 standard that is commonly referred to as Link Aggregation, or LACP (Link Aggregation Control Protocol).  Normally, when you configure a pair of network interfaces on a server together, it’s to provide redundancy and avoid a SPOF (Single Point Of Failure).  I’ll refer to the standard Linux implementation, mainly because I’m familiar with the load balancing methods it typically employs.  This isn’t to say that other OSes don’t have this capability (almost all do); I’m just not very experienced with all of them.

Active-Backup (Linux bonding driver mode=1) is a very simple implementation in which a primary interface is used for all traffic until that interface fails.  The traffic then moves over to the backup interface and communication is restored almost seamlessly.  There are other load balancing modes besides this one that don’t require any special configuration on the switch; each has its strengths and weaknesses.

LACP, which does require a specific configuration on the switch ports involved, tends to be more performant while still maintaining redundancy, because all the links in the group carry traffic at the same time.  The server’s network driver and the switch also exchange LACP control frames, sent to the multicast group MAC address (01:80:c2:00:00:02), to keep both partners up to date on the status of each link.  Traffic is spread across the links per flow rather than per packet, so with enough concurrent flows the load splits close to 50/50 across the NICs in the LACP group, roughly doubling aggregate throughput for a two-port group.  Any single flow, however, is still limited to the speed of one link.
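LACP itself doesn’t pick the link for each frame; the bonding driver’s transmit hash policy does.  Here’s a simplified Python model in the spirit of the Linux bonding driver’s layer2 policy (last byte of the source MAC XOR last byte of the destination MAC XOR the EtherType, modulo the number of links).  Treat it as an illustration of the behavior, not the driver’s exact code; the function name and MAC addresses are made up.

```python
# Sketch: why link aggregation spreads *flows*, not individual packets.
# Simplified model of the Linux bonding "layer2" transmit hash policy:
#   hash = src_mac[5] XOR dst_mac[5] XOR ethertype; link = hash % n_links
# The real driver may differ in detail; this only illustrates the idea.
ETHERTYPE_IPV4 = 0x0800


def pick_link(src_mac: str, dst_mac: str, n_links: int,
              ethertype: int = ETHERTYPE_IPV4) -> int:
    """Return the index of the bond member this frame would be sent on."""
    src_last = int(src_mac.split(":")[-1], 16)
    dst_last = int(dst_mac.split(":")[-1], 16)
    return (src_last ^ dst_last ^ ethertype) % n_links


if __name__ == "__main__":
    server = "00:10:e0:aa:bb:10"  # made-up MAC addresses
    for i in range(1, 9):
        peer = f"00:10:e0:cc:dd:{i:02x}"
        print(peer, "-> link", pick_link(server, peer, n_links=2))
```

Every frame between the same two MACs lands on the same link, so one big transfer won’t go any faster over a two-port bond, while traffic to many different peers spreads out.  One consequence worth knowing: with the layer2 policy, all routed traffic is addressed to the gateway’s MAC and therefore rides a single link; the bonding driver also offers a layer3+4 policy that hashes on IP addresses and ports instead.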

The reason I’m talking about this in the first place is the configuration that needs to be in place on the switch if you’re going to use LACP.  If you configure your network driver for Active-Backup mode but the switch ports are set to LACP, you likely won’t see any packets at all on the server.  Likewise, if you have LACP configured on the server but the switch isn’t properly set up to handle it, you’ll get the same result.  This is another setting that commonly gets misconfigured, along with related parameters such as STP (Spanning Tree Protocol), lacp_rate and passive vs. active LACP.  Sometimes the configuration also has to be split between two switches (again, no SPOF), and an MLAG configuration needs to be properly set up to allow LACP to work across them.  Effectively, MLAG is one way of making two switches appear as one from a network protocol perspective, and it is required to span multiple switches within an LACP port group.  The takeaway here is to have the network admin verify their configuration on the switch(es) and ports involved.
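When the server side is Linux, the bonding driver reports its current mode in /proc/net/bonding/<bond>, which makes the server half of this check easy.  This Python sketch assumes that file’s usual “Key: value” layout and parses only the mode and each member’s MII status; the sample text is illustrative rather than captured from an ODA, and the function names are hypothetical.  It can’t see the switch, so you still need the network admin to confirm their side.

```python
from pathlib import Path

# Sketch: compare the server's bonding mode with what the switch expects.
# Assumes a Linux host using the kernel bonding driver, which reports its
# state in /proc/net/bonding/<bond>. Only a few well-known fields are parsed.


def parse_bonding(text: str) -> dict:
    """Extract the bonding mode and each member interface's MII status."""
    info = {"mode": None, "slaves": {}}
    current = None
    for raw in text.splitlines():
        key, sep, value = raw.partition(":")
        if not sep:
            continue  # blank or header-less line
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            info["mode"] = value
        elif key == "Slave Interface":
            current = value
            info["slaves"][current] = None
        elif key == "MII Status" and current is not None:
            info["slaves"][current] = value  # per-member status
    return info


def mode_matches_switch(info: dict, switch_uses_lacp: bool) -> bool:
    """LACP on the switch needs 802.3ad mode on the server, and vice versa."""
    server_uses_lacp = "802.3ad" in (info["mode"] or "")
    return server_uses_lacp == switch_uses_lacp


def read_bond(name: str = "bond0") -> dict:
    """Read and parse the live status file for a bond (Linux only)."""
    return parse_bonding(Path("/proc/net/bonding", name).read_text())


# Illustrative sample, not real output from an ODA.
SAMPLE = """Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth2
MII Status: up

Slave Interface: eth2
MII Status: up

Slave Interface: eth3
MII Status: down
"""

if __name__ == "__main__":
    info = parse_bonding(SAMPLE)
    print(info["mode"])    # fault-tolerance (active-backup)
    print(info["slaves"])  # {'eth2': 'up', 'eth3': 'down'}
    print(mode_matches_switch(info, switch_uses_lacp=True))  # False
```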

  • Link speed: how fast can the server talk on the network?

Some servers can communicate at 10gb/s rather than the more common 1gb/s, most typically over copper or fiber media.  It used to be that you had to force switches to talk at 1gb/s in order for the server to negotiate that speed.  This was back when 1gb/s was newer and the handshake that takes place between the NIC and the switch port at connection time was not as mature as it is now.  However, as a holdover from those halcyon days of yore, some network admins still set port speeds manually rather than letting them auto-negotiate like a good network admin should.  The result is servers connecting at 1gb/s when they should be running at 10gb/s.  Again, just something to keep in mind if you’re having speed issues.
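On Linux you don’t have to guess what a port negotiated: the kernel exposes it in /sys/class/net/<interface>/speed, in Mb/s (ethtool shows the same thing interactively).  The Python sketch below assumes that sysfs layout and flags interfaces running below the speed you expect; the demo uses a throwaway fake tree so it runs anywhere, and the interface names, speeds and function names are examples only.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Sketch: flag interfaces that negotiated below the speed you expect.
# Assumes Linux, where /sys/class/net/<iface>/speed holds the negotiated
# speed in Mb/s. Reading it can fail, or return -1, when the link is down
# or the driver doesn't report a speed.
SYSFS_NET = Path("/sys/class/net")


def negotiated_speed_mbps(iface: str, root: Path = SYSFS_NET) -> Optional[int]:
    """Return the negotiated speed in Mb/s, or None if it can't be determined."""
    try:
        value = int((root / iface / "speed").read_text().strip())
    except (OSError, ValueError):
        return None
    return value if value > 0 else None


def slow_links(expected: dict, root: Path = SYSFS_NET) -> dict:
    """Map each interface below its expected speed to what it reports."""
    report = {}
    for iface, want in expected.items():
        got = negotiated_speed_mbps(iface, root)
        if got is None or got < want:
            report[iface] = got
    return report


if __name__ == "__main__":
    # Demo against a fake sysfs tree; drop root=... to check the real one.
    with tempfile.TemporaryDirectory() as tmp:
        fake = Path(tmp)
        for iface, speed in {"eth2": "10000", "eth3": "1000"}.items():
            (fake / iface).mkdir()
            (fake / iface / "speed").write_text(speed + "\n")
        print(slow_links({"eth2": 10000, "eth3": 10000}, root=fake))  # {'eth3': 1000}
```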

  • Cable Quality: what speed is your cable rated at?

There are currently four common ratings for copper Ethernet cables.  They are by no means the only ones, but they are the most commonly used in datacenters, and they all have to do with how fast you can send data through the cable.  Cat 5 is capable of transmitting up to 1gb/s.  Cat 5e improved on Cat 5 by limiting crosstalk (interference) between the 8 conductors (4 twisted pairs) of a standard Ethernet cable.  Cat 6 and 6a are further improvements on those standards, allowing speeds of up to 10gb/s.  Basically, the newer the Cat number/letter, the faster you can safely transmit data without data loss or corruption.  The reason I mention this is that I’ve been burned on more than one occasion using Cat 5 for 1gb/s and getting so much crosstalk that it severely limited throughput and caused a lot of errors.  Replacing the cable with a new Cat 5 or higher rated cable almost always fixed the problem.  If you’re having communication problems, rule this out early on so you’re not chasing your tail in other areas.
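If you want something more concrete than “newer is better”, here’s a small Python lookup of commonly cited BASE-T limits for those four categories: Cat 5 and 5e at 1gb/s over 100 meters, Cat 6 at 10gb/s over roughly 55 meters, and Cat 6a at 10gb/s over the full 100 meters.  Treat the table as a rule-of-thumb assumption and confirm against the actual cable and NIC specifications; the function name is hypothetical.

```python
# Sketch: is a copper run rated for the speed you want?
# Rule-of-thumb BASE-T limits (an assumption; check the real specs):
#   Cat 5 / 5e: 1 Gb/s to 100 m; Cat 6: 10 Gb/s to about 55 m;
#   Cat 6a: 10 Gb/s to 100 m.

# category -> list of (speed in Mb/s, maximum run length in meters)
RATINGS = {
    "5": [(1000, 100)],
    "5e": [(1000, 100)],
    "6": [(1000, 100), (10000, 55)],
    "6a": [(1000, 100), (10000, 100)],
}


def rated_for(category: str, speed_mbps: int, run_m: float) -> bool:
    """True if some rating for this category covers both speed and length."""
    return any(
        speed_mbps <= max_speed and run_m <= max_len
        for max_speed, max_len in RATINGS.get(category.lower(), [])
    )


if __name__ == "__main__":
    print(rated_for("6", 10000, 30))   # True
    print(rated_for("6", 10000, 80))   # False
    print(rated_for("6a", 10000, 80))  # True
    print(rated_for("5e", 10000, 10))  # False
```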

  • IP Networking: Ensuring you have accurate network configurations

I’ve had a lot of problems in this area.  The biggest one seems to be that not all customers take the time to review and fill out the pre-install checklist, which prompts you for all the networking information you’ll need to do the install.  If you’ve been given IP information, make sure it’s correct before you tear your hair out.  I’ve been given multiple configurations at the same customer for the same appliance, and each time there was something critical wrong that kept me from talking on the network.  Configuring VLANs can be especially trying because if you have it wrong, you just won’t see any traffic.  With regular non-VLAN configurations, if you put yourself on the wrong physical switch port or network, you can always sniff the network (tcpdump is now installed as part of the ODA software).  This doesn’t really work with VLAN traffic.  Other things to verify are your subnet mask and default gateway; if either of these is misconfigured, you’re gonna have problems.  Also, as I mentioned earlier, don’t make the mistake of assuming you have to create a VLAN interface on the ODA just because you’re connected to a trunked port.  Remember that native VLAN traffic is passed on to the server with the VLAN tags stripped off, so it uses a regular network interface (e.g. net1).
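Some of these mistakes can be caught before you ever touch the appliance.  This Python sketch uses the standard ipaddress module to check an address, subnet mask and default gateway from the checklist against each other; the function name and sample values are made up.  Passing these checks doesn’t prove the values are right for the customer’s network, only that they’re consistent with each other.

```python
import ipaddress

# Sketch: sanity-check IP information from a pre-install checklist.
# Catches a few common mistakes: an invalid mask, a host address that is
# really the network/broadcast address, and a gateway outside the subnet.


def check_ip_config(address: str, netmask: str, gateway: str) -> list:
    """Return a list of problems; an empty list means nothing obvious is wrong."""
    problems = []
    try:
        iface = ipaddress.IPv4Interface(f"{address}/{netmask}")
    except ValueError as exc:  # e.g. a non-contiguous mask like 255.0.255.0
        return [f"invalid address/mask: {exc}"]
    try:
        gw = ipaddress.IPv4Address(gateway)
    except ValueError as exc:
        return [f"invalid gateway: {exc}"]
    net = iface.network
    # /31 and /32 have no separate network/broadcast addresses to avoid.
    if net.prefixlen < 31 and iface.ip in (net.network_address, net.broadcast_address):
        problems.append(f"{iface.ip} is the network or broadcast address of {net}")
    if gw not in net:
        problems.append(f"gateway {gw} is not inside {net}")
    elif gw == iface.ip:
        problems.append("gateway is the same as the host address")
    return problems


if __name__ == "__main__":
    print(check_ip_config("10.1.20.15", "255.255.255.0", "10.1.20.1"))  # []
    print(check_ip_config("10.1.20.15", "255.255.255.0", "10.1.21.1"))
    # ['gateway 10.1.21.1 is not inside 10.1.20.0/24']
```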

These are just some of the pitfalls you may encounter.  I hope some of this has helped!

OVM 3.4 released!


Last Thursday, Oracle released the latest iteration of its flagship Type 1 hypervisor, Oracle VM for x86.  Some of the new features include:

  • Xen 4.4 hypervisor
  • Storage Live Migration
  • FCoE and UEFI boot support
  • KDump in DOM0
  • Direct OVA import
  • Automatic VNC/Serial Console on OVM Manager
  • OSWatcher installed and configured to run at boot on OVM Server
  • Up to 256 vCPUs in a single VM
  • General performance improvements in administration tools
  • Simple name persistence in Repositories (keep original name when moving to new OVM Manager)
  • SNMP MIBs for monitoring OVM Server
  • VIP for server pools deprecated

I’ll be kicking the tires soon and provide some more insight on the implications of these new features.  Stay Tuned!

How to create VLANs in DOM0 on a virtualized ODA


I’ve been working with a local customer for the last week or so to help them set up a pair of ODAs in virtualized mode.  In one of the datacenters, they needed everything on a VLAN, including DOM0.  Normally, I just configure net1 for the customer’s network and I’m off to the races.  In this case, there are a few additional steps we have to do.

The first thing you’ll need to do is install the ODA software from the install media.  Once this is done, you need to log into the console, since we don’t have any IP information configured yet.  Below is a high level checklist of the steps needed to complete this activity:


  • Determine which VLAN DOM0 needs to be on
  • Pick a name for the VLAN interface.  It doesn’t have to be eth2 or anything like that.  I usually go with “VLAN456” if my VLAN ID is 456 so it’s self-descriptive.
  • Run the following command in DOM0 on node 0 (assuming your VLAN ID is 456)

# oakcli create vlan VLAN456 -vlanid 456 -if bond0


At this point, you’ll have the following structures in place on each compute node:



We now have networking set up so that eth2 and eth3 are bonded together (bond0).  Then we put a VLAN bond interface (bond0.456) on top of the bond pair.  Finally, we create a VLAN bridge (VLAN456) that can be used to forward that network into the VM and also allows DOM0 to talk on that VLAN.  I’ve shown in the example above what it looks like to connect more than one VLAN to a bond pair.  If you need access to both VLANs from within DOM0, then each VLAN interface on each node will need an IP address assigned to it, and you’ll need to rerun configure firstnet for each interface.  Note also that if you need to access more than one VLAN from a bond pair, you’ll need to set the switch ports that eth2 and eth3 are connected to into trunked mode so they can pass more than a single VLAN.  Your network administrator will know what this means.
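Because the names are derived mechanically from the VLAN ID, the layout is easy to script when you have several VLANs to plan.  This is a hypothetical Python helper, assuming the bond0.<id> VLAN interface and VLAN<id> bridge naming described above; it only builds strings and doesn’t run anything.  The oakcli command simply mirrors the one shown earlier, and the second VLAN ID in the demo (457) is made up.

```python
# Sketch: the DOM0 naming scheme from this post as a tiny planning helper.
# Assumes a bond (e.g. bond0 = eth2 + eth3), a VLAN interface named
# <bond>.<vlan id> on top of it, and a bridge named VLAN<vlan id>.
# Hypothetical helper: it only builds strings and executes nothing.


def vlan_plan(vlan_id: int, bond: str = "bond0") -> dict:
    """Return the interface names and the oakcli command for one VLAN."""
    if not 1 <= vlan_id <= 4094:  # 0 and 4095 are reserved in 802.1Q
        raise ValueError(f"VLAN ID {vlan_id} is outside 1-4094")
    bridge = f"VLAN{vlan_id}"
    return {
        "vlan_interface": f"{bond}.{vlan_id}",
        "bridge": bridge,
        "command": f"oakcli create vlan {bridge} -vlanid {vlan_id} -if {bond}",
    }


if __name__ == "__main__":
    for vid in (456, 457):  # two VLANs sharing the same bond pair
        print(vlan_plan(vid))
```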



After that’s in place, you can continue to deploy ODA_BASE, do a configure firstnet in ODA_BASE (remember to assign the VLAN interface to ODA_BASE), yadda yadda…


Then, as you configure ODA_BASE and create your VM(s), the NetBack and NetFront drivers create the virtual interfaces responsible for plumbing the network into the VM.  Here’s a completed diagram with a VM that has access to both VLANs:

[Diagram: final VLAN layout, with a VM attached to both VLANs]


Happy Hunting!



UPDATE: The way this customer wound up configuring their switches at the end of the day was to put the ODA and ODA_BASE on the native VLAN.  In this case, even though the switch port is trunked to have access to one or more VLANs at a time, the native VLAN traffic is passed untagged down to the server.  This means you do not need a special VLAN interface on the ODA to talk on this network; just use the regular net1 or net2 interface.  Now, if you want to talk on any other VLANs through that switch port, you will need to follow the procedure above and configure a VLAN interface for each of those VLANs.

OVM 3.3.4 Released

OVM 3.3.4 has finally been released after what seems like months since the last update.  Even so, it appears to contain only minor enhancements and mostly bug fixes.  You can find it under patches 20492240 and 20492250.


I was hoping for some new features or major updates in this release since it’s been so long.  Will have to hold my breath a little longer I guess :).

OVM Disaster Recovery In A Box (Part 4 of 5)

Now that you’ve touched a file inside the VM, we have a way to prove that the VM replicated to the other side is actually the one we created.  Apparently in my case, faith is overrated.


Now that I’ve fire-hosed a TON of information at you on how to set up your virtual PROD and DR sites, this is a good breaking point to talk a little about how the network looks from a 10,000-foot view.  Here’s a really simple diagram that should explain how things work.  And when I say simple, we’re talking crayon art here, folks.  Really, does anyone have a link to any resources on the web or in a book that could help a guy draw better network diagrams?  OK, I digress.  Here’s the diagram:

[Diagram: OVM DR network diagram (Active/Passive)]


One of the biggest takeaways from this diagram is something that a LOT of people get confused about.  In OVM DR, you do NOT replicate OVM Manager, the POOL filesystem or the OVM servers on the DR side.  In other words, you don’t replicate the operating environment, only the contents therein (i.e. the VMs, via their storage repositories).  You basically have a complete implementation of OVM at each location, just as if each were a standalone site.  The only difference is that some of the repositories are replicated.  The only other potential difference (and I don’t show it or deal with it in my simulation) is raw LUNs presented to the VM.  Those would have to be replicated at the storage layer as well.


I’ve not bothered to clutter the diagram with the VM or storage networks; you know they’re there and that they’re serving their purpose.  You can see that replication is configured between the PROD Repo LUN and a LUN in DR.  This would be considered an Active/Passive DR solution.  I don’t show it, but in this scenario you could potentially have some DR workloads running at the DR site that aren’t replicated back to PROD.  Now, some companies might have a problem with shelling out all that money for infrastructure at the DR site and having it sit unused until a DR event occurs.  Those companies might decide to run some of their workload in the DR site and have PROD be its DR.  In this Active/Active scenario, your workflow would be pretty much the same; there are just more VMs and repositories at each site, so you need to be careful and plan well.  Here is what an Active/Active configuration would look like:

[Diagram: OVM DR network diagram (Active/Active)]


Again, my article doesn’t touch on Active/Active, but you could apply what you learn in these 5 articles to an Active/Active configuration fairly easily.  We’ll be focusing on Active/Passive, just as a reminder.  We now have a virtual machine running in PROD to facilitate our replication testing.  Make sure the VM runs and can ping the outside network so we know we have a viable machine.  Don’t expect lightning performance either; we’re running a VM inside a VM which is inside of a VM.  Not exactly recommended for production use.  OK, DO NOT use this as your production environment.  There, all the folks who ignore the warnings on hair dryers about using them in the shower should be covered now.


Below are the high level steps used to fail over to your DR site.  Once you’ve accomplished this, make sure to remember failback.  Most people are usually so excited about getting the failover to work that they forget they’ll have to fail back at some point once things have been fixed in PROD.


FAILOVER (this works if you’re doing a controlled fail over or if a real failure at prod occurs):

  • Ensure all PROD resources are nominal and functioning properly
  • Ensure all DR resources are nominal and functioning properly
  • Ensure replication between PROD and DR ZFS appliances is in place and replicating
  • On ZFSDR1, stop replication of PROD_REPO
  • On ZFSDR1, clone the PROD_REPO project to a new project, DRFAIL
  • Rescan physical disks on ovmdr1 (you may have to reboot to see the new LUN)
  • Verify new physical disk appears
  • Rename physical disk to PROD_REPO_FAILOVER
  • Take ownership of replicated repository in DR OVM Manager
  • Scan for VMs in the unassigned VMs folder
  • Migrate the VM to the DR pool
  • Start the VM
  • Check /var/tmp/ and make sure you see the ovmprd1 file that you touched when it was running in PROD.  This proves that it’s the same VM
  • Ping something on your network to establish network access
  • Ping or connect to something on the internet to establish external network access
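An empty file proves a VM came across, but a file with a unique token in it also proves it’s the exact copy you marked in PROD.  Here’s a minimal Python sketch of that variation on the touch-a-file check, assuming Python is available inside the VM; the function names are hypothetical and the path just mirrors the /var/tmp/ovmprd1 marker used in this series.

```python
import secrets
import tempfile
from pathlib import Path

# Sketch: a stronger version of the "touch a file" identity check.
# Write a random token into the marker in PROD, record the token outside
# the VM, and compare after failover. Function names are hypothetical.
MARKER = Path("/var/tmp/ovmprd1")


def write_marker(path: Path = MARKER) -> str:
    """Run in PROD: store a random token in the marker file and return it."""
    token = secrets.token_hex(16)
    path.write_text(token + "\n")
    return token


def verify_marker(expected: str, path: Path = MARKER) -> bool:
    """Run in DR after failover: True only if the same token is present."""
    try:
        return path.read_text().strip() == expected
    except FileNotFoundError:
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "ovmprd1"       # stand-in for /var/tmp/ovmprd1
        token = write_marker(demo)         # in PROD, before failover
        print(verify_marker(token, demo))  # in DR, after failover: True
```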



FAILBACK:

  • Ensure all PROD resources are nominal and functioning properly
  • Ensure all DR resources are nominal and functioning properly
  • Restart replication in the opposite direction from ZFSDR1 to ZFSPRD1
  • Ensure replication finishes successfully
  • Rescan physical disks on ovmprd1
  • Verify your PROD Repo LUN is still visible and in good health
  • Browse the PROD Repo and ensure your VM(s) are there
  • Power on your VMs in PROD and ensure that whatever data was modified while in DR has been replicated back to PROD successfully.
  • Ping something on your network to establish network access
  • Ping or connect to something on the internet to establish external network access


Now that we’ve shown you how all this works, I’ll summarize in part 5.