Create VM in Oracle VM for x86 using NFS share

I’m using OVM Manager 3.4.2 and OVM Server 3.3.2 to test an upgrade for one of our customers.  I’m using StarWind iSCSI server to present the shared storage to the cluster, but in production you should use enterprise-grade hardware for this.  There is an easier way to do this: create an HVM VM and install from an ISO stored in a repository, then power the VM off, change the type to PVM and power it back on.  That shortcut may not work with all operating systems, however, so I’m going over how to create a new PVM VM from an ISO image shared from an NFS server.

* Download ISO (I'm using Oracle Linux 6.5 64bit for this example)
* Copy ISO image to OVM Manager (any NFS server is fine)
* Mount ISO on the loopback device
# mount -o loop /var/tmp/V41362-01.iso /mnt

* Share the folder via NFS
# service nfs start
Starting NFS services: [ OK ]
Starting NFS quotas: [ OK ]
Starting NFS mountd: [ OK ]
Starting NFS daemon: [ OK ]
Starting RPC idmapd: [ OK ]

# exportfs *:/mnt/

# showmount -e
Export list for ovmm:
/mnt *
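Note that an export created on the command line like this won’t survive a reboot.  If you want it to persist across reboots, a minimal sketch (assuming the same /mnt mount point- read-only is plenty for an ISO) is to add the export to /etc/exports and reload:

# echo '/mnt *(ro)' >> /etc/exports
# exportfs -ra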

* Create new VM in OVM Manager
* Edit VM properties and configure as PVM
* Set additional properties such as memory, cpu and network
* At the boot order tab, enter the network boot path formatted like this:
  nfs:{ip address or FQDN of NFS host}:/{path to ISO image top level directory}

For example, our NFS server is 10.2.3.4 and the path where I mounted the ISO is /mnt.  Leave the {}'s off, of course:

  nfs:10.2.3.4:/mnt 

You should be able to boot your VM at this point and perform the install of the OS.

Nimble PowerShell Toolkit

I was working on an internal project to test the performance of a converged system solution.  The storage component is a Nimble AF7000 from which we’re presenting a number of LUNs.  There are almost 30 LUNs, and I’ve had to create, delete and provision them a number of times throughout the project.  It became extremely tedious to do this through the WebUI, so I decided to see if it could be scripted.

I know you can log into the Nimble via SSH and basically do what I’m trying to do- and I did test this with success.  However, I recently had a customer who wanted to use PowerShell to perform some daily snapshot/clone operations for an Oracle database running on Windows (don’t ask).  We decided to leverage the Nimble PowerShell Toolkit to perform the operations right from the Windows server.  The script was fairly straightforward, although we had to learn a little about PowerShell syntax along the way.  I’ve included a sanitized script below that does basically what I need.

$arrayname = "IP address or FQDN of array management address"
$nm_uid = "admin"
$nm_password = ConvertTo-SecureString -String "admin" -AsPlainText -Force
$nm_cred = New-Object -TypeName System.Management.Automation.PSCredential -ArgumentList $nm_uid,$nm_password
$initiatorID = Get-NSInitiatorGroup -name {name of initiator group} | select -expandproperty id

# Import Nimble Tool Kit for PowerShell
import-module NimblePowerShellToolKit

# Connect to the array
Connect-NSGroup -group $arrayname -credential $nm_cred

# Create 10 DATA Disks
for ($i=1; $i -le 10; $i++) {
    New-NSVolume -Name DATADISK$i -Size 1048576 -PerfPolicy_id 036462b75de9a4f69600000000000000000000000e -online $true
    $volumeID = Get-NSVolume -name DATADISK$i | select -expandproperty id
    New-NSAccessControlRecord -initiator_group_id $initiatorID -vol_id $volumeID
}

# Create 10 RECO Disks
for ($i=1; $i -le 10; $i++) {
    New-NSVolume -Name RECODISK$i -Size 1048576 -PerfPolicy_id 036462b75de9a4f69600000000000000000000000e -online $true
    $volumeID = Get-NSVolume -name RECODISK$i | select -expandproperty id
    New-NSAccessControlRecord -initiator_group_id $initiatorID -vol_id $volumeID
}

# Create 3 GRID Disks
for ($i=1; $i -le 3; $i++) {
    New-NSVolume -Name GRIDDISK$i -Size 2048 -PerfPolicy_id 036462b75de9a4f69600000000000000000000000e -online $true
    $volumeID = Get-NSVolume -name GRIDDISK$i | select -expandproperty id
    New-NSAccessControlRecord -initiator_group_id $initiatorID -vol_id $volumeID
}

I also wrote a script to delete the LUNs:

$arrayname = "IP address or FQDN of array management address"  
$nm_uid = "admin"
$nm_password = ConvertTo-SecureString -String "admin" -AsPlainText -Force
$nm_cred = New-Object -TypeName System.Management.Automation.PSCredential -ArgumentList $nm_uid,$nm_password
$initiatorID = Get-NSInitiatorGroup -name {name of initiator group} | select -expandproperty id

# Import Nimble Tool Kit for PowerShell
import-module NimblePowerShellToolKit

# Connect to the array 
Connect-NSGroup -group $arrayname -credential $nm_cred


# Delete 10 DATA Disks
for ($i=1; $i -le 10; $i++) {
    Set-NSVolume -name DATADISK$i -online $false
    Remove-NSVolume -name DATADISK$i
}

# Delete 10 RECO Disks
for ($i=1; $i -le 10; $i++) {
    Set-NSVolume -name RECODISK$i -online $false
    Remove-NSVolume -name RECODISK$i 
}

# Delete 3 GRID Disks
for ($i=1; $i -le 3; $i++) {
    Set-NSVolume -name GRIDDISK$i -online $false
    Remove-NSVolume -name GRIDDISK$i 
}

Obviously you’ll have to substitute some of the values such as $arrayname, $nm_uid, $nm_password and $initiatorID (make sure you remove the {}’s when you put your values in). Storing your password in plain text like this is very insecure, but it was a quick and dirty solution at the time. There are ways to read an encrypted password out of a locked-down text file and decrypt it into a variable. Or, if you don’t mind being interactive, you can skip providing the credentials and a dialog box will pop up asking for them every time the script runs.

It made the project go a lot faster- hopefully you can use this to model different scripts to do other things. The entire command set of the Nimble array is basically exposed through the toolkit, so there’s not a whole lot you can do in the WebUI that you can’t do here. When you download the toolkit, there is a README PDF that goes through all the commands. Once in PowerShell, you can also get help for each of the commands. For example:

PS C:\Users\esteed> help New-NSVolume

NAME
    New-NSvolume

SYNOPSIS
    Create operation is used to create or clone a volume. Creating volumes requires name and size attributes. Cloning
    volumes requires clone, name and base_snap_id attributes where clone is set to true. Newly created volume will not
    have any access control records, they can be added to the volume by create operation on access_control_records
    object set. Cloned volume inherits access control records from the parent volume.


SYNTAX
    New-NSvolume [-name] <String> [-size] <UInt64> [[-description] <String>] [[-perfpolicy_id] <String>] [[-reserve]
    <UInt64>] [[-warn_level] <UInt64>] [[-limit] <UInt64>] [[-snap_reserve] <UInt64>] [[-snap_warn_level] <UInt64>]
    [[-snap_limit] <UInt64>] [[-online] <Boolean>] [[-multi_initiator] <Boolean>] [[-pool_id] <String>] [[-read_only]
    <Boolean>] [[-block_size] <UInt64>] [[-clone] <Boolean>] [[-base_snap_id] <String>] [[-agent_type] <String>]
    [[-dest_pool_id] <String>] [[-cache_pinned] <Boolean>] [[-encryption_cipher] <String>] [<CommonParameters>]


DESCRIPTION
    Create operation is used to create or clone a volume. Creating volumes requires name and size attributes. Cloning
    volumes requires clone, name and base_snap_id attributes where clone is set to true. Newly created volume will not
    have any access control records, they can be added to the volume by create operation on access_control_records
    object set. Cloned volume inherits access control records from the parent volume.


RELATED LINKS

REMARKS
    To see the examples, type: "get-help New-NSvolume -examples".
    For more information, type: "get-help New-NSvolume -detailed".
    For technical information, type: "get-help New-NSvolume -full".

You can also use the -detailed parameter at the end to get a more complete description of each option. Additionally, you can use -examples to see the commands used in real-world situations. Have fun!

ODA Patching – get ahead of yourself?

I was at a customer site deploying an X5-2 ODA.  They are standardizing on the 12.1.2.6.0 patch level.  Even though 12.1.2.7.0 is currently the latest, they don’t want to be on the bleeding edge.  Recall that the 12.1.2.6.0 patch doesn’t include infrastructure patches (mostly firmware), so you have to install 12.1.2.5.0 first, run the --infra patch to get the firmware, and then update to 12.1.2.6.0.

We unpacked the 12.1.2.5.0 patch on both systems and then had an epiphany: why don’t we just unpack the 12.1.2.6.0 patch as well and save some time later?  What could possibly go wrong?  Needless to say, when we went to install (or even verify) the 12.1.2.5.0 patch, it complained as follows:

ERROR: Patch version must be 12.1.2.6.0

Ok, so there has to be a way to clean that patch off the system so I can use 12.1.2.5.0 right?  I stumbled across the oakcli manage cleanrepo command and thought for sure that would fix things up nicely.  Ran it and I got this output:

[root@CITX-5ODA-ODABASE-NODE0 tmp]# oakcli manage cleanrepo --ver 12.1.2.6.0
Deleting the following files...
Deleting the files under $OAK_REPOS_HOME/pkgrepos/orapkgs/OAK/12.1.2.6.0/Base
Deleting the files under /DOM0OAK/12.1.2.6.0/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Disk/Seagate/ST95000N/SF04/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Disk/Seagate/ST95001N/SA03/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Disk/WDC/WD500BLHXSUN/5G08/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Disk/HGST/H101860SFSUN600G/A770/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Disk/Seagate/ST360057SSUN600G/0B25/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Disk/HITACHI/H106060SDSUN600G/A4C0/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Disk/HITACHI/H109060SESUN600G/A720/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Disk/HITACHI/HUS1560SCSUN600G/A820/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Disk/HGST/HSCAC2DA6SUN200G/A29A/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Disk/HGST/HSCAC2DA4SUN400G/A29A/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Disk/STEC/ZeusIOPs-es-G3/E12B/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Disk/STEC/Z16IZF2EUSUN73G/9440/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Expander/ORACLE/DE2-24P/0018/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Expander/ORACLE/DE2-24C/0018/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Expander/ORACLE/DE3-24C/0291/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Controller/LSI-es-Logic/0x0072/11.05.03.00/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Controller/LSI-es-Logic/0x0072/11.05.03.00/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Ilom/SUN/X4370-es-M2/3.0.16.22.f-es-r100119/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Disk/HITACHI/H109090SESUN900G/A720/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Disk/STEC/Z16IZF4EUSUN200G/944A/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Disk/HGST/H7240AS60SUN4.0T/A2D2/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Disk/HGST/H7240B520SUN4.0T/M554/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Disk/HGST/H7280A520SUN8.0T/P554/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Expander/SUN/T4-es-Storage/0342/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Controller/LSI-es-Logic/0x0072/11.05.03.00/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Controller/LSI-es-Logic/0x005d/4.230.40-3739/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Controller/LSI-es-Logic/0x0097/06.00.02.00/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Controller/Mellanox/0x1003/2.11.1280/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Ilom/SUN/X4170-es-M3/3.2.4.26.b-es-r101722/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Ilom/SUN/X4-2/3.2.4.46.a-es-r101689/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Ilom/SUN/X5-2/3.2.4.52-es-r101649/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/orapkgs/HMP/2.3.4.0.1/Base
Deleting the files under /DOM0HMP/2.3.4.0.1/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/orapkgs/IPMI/1.8.12.4/Base
Deleting the files under /DOM0IPMI/1.8.12.4/Base
Deleting the files under /JDK/1.7.0_91/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/orapkgs/ASR/5.3.1/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/orapkgs/OPATCH/12.1.0.1.0/Patches/6880880
Deleting the files under $OAK_REPOS_HOME/pkgrepos/orapkgs/OPATCH/12.0.0.0.0/Patches/6880880
Deleting the files under $OAK_REPOS_HOME/pkgrepos/orapkgs/OPATCH/11.2.0.4.0/Patches/6880880
Deleting the files under $OAK_REPOS_HOME/pkgrepos/orapkgs/GI/12.1.0.2.160119/Patches/21948354
Deleting the files under $OAK_REPOS_HOME/pkgrepos/orapkgs/DB/12.1.0.2.160119/Patches/21948354
Deleting the files under $OAK_REPOS_HOME/pkgrepos/orapkgs/DB/11.2.0.4.160119/Patches/21948347
Deleting the files under $OAK_REPOS_HOME/pkgrepos/orapkgs/DB/11.2.0.3.15/Patches/20760997
Deleting the files under $OAK_REPOS_HOME/pkgrepos/orapkgs/DB/11.2.0.2.12/Patches/17082367
Deleting the files under $OAK_REPOS_HOME/pkgrepos/orapkgs/OEL/6.7/Patches/6.7.1
Deleting the files under $OAK_REPOS_HOME/pkgrepos/orapkgs/OVM/3.2.9/Patches/3.2.9.1
Deleting the files under $OAK_REPOS_HOME/pkgrepos/orapkgs/OVS/12.1.2.6.0/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Controller/LSI-es-Logic/0x0072/11.05.02.00/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/thirdpartypkgs/Firmware/Controller/LSI-es-Logic/0x0072/11.05.02.00/Base
Deleting the files under $OAK_REPOS_HOME/pkgrepos/orapkgs/GI/12.1.0.2.160119/Base

So I assumed that this fixed the problem.  Nope…

[root@CITX-5ODA-ODABASE-NODE0 tmp]# oakcli update -patch 12.1.2.5.0 --verify

ERROR: Patch version must be 12.1.2.6.0

Ok, so more searching through the CLI manual and the oakcli help pages turned up bupkis.  So I decided to strace the oakcli command I had just run.  As usual, there was a LOT of output I either didn’t care about or didn’t understand.  I did find, however, that it was reading the contents of a file that looked interesting to me:
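If you want to capture the same thing yourself, an invocation along these lines does the job (-f follows child processes, which oakcli spawns, and -o writes the trace to a file you can sift through afterwards):

# strace -f -o /tmp/oakcli.trc oakcli update -patch 12.1.2.5.0 --verify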

[pid 5509] stat("/opt/oracle/oak/pkgrepos/System/VERSION", {st_mode=S_IFREG|0777, st_size=19, ...}) = 0
[pid 5509] open("/opt/oracle/oak/pkgrepos/System/VERSION", O_RDONLY) = 3
[pid 5509] read(3, "version=12.1.2.6.0\n", 8191) = 19
[pid 5509] read(3, "", 8191) = 0
[pid 5509] close(3) = 0
[pid 5509] fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
[pid 5509] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f159799d000
[pid 5509] write(1, "\n", 1
) = 1
[pid 5509] write(1, "ERROR: Patch version must be 12."..., 40ERROR: Patch version must be 12.1.2.6.0
) = 40
[pid 5509] exit_group(0) = ?

There were a dozen or so lines after that, but I had what I needed.  Apparently /opt/oracle/oak/pkgrepos/System/VERSION contains the version of the latest patch that has been unpacked.  The system software version is kept somewhere else: after I unpacked the 12.1.2.6.0 patch, oakcli show version still reported 12.1.2.5.0, yet the VERSION file said 12.1.2.6.0.  I assume unpacking the 12.1.2.6.0 patch is what updated the file.  So what I wound up doing was changing the VERSION file back to 12.1.2.5.0 and deleting the folder /opt/oracle/oak/pkgrepos/System/12.1.2.6.0.  Once I did this, everything worked as I expected.  I was able to verify and install the --infra portion of 12.1.2.5.0 and continue on my merry way.
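For the record, the workaround boiled down to something like this- a sketch of what I just described, not a supported procedure, so use at your own risk:

# cd /opt/oracle/oak/pkgrepos/System
# echo "version=12.1.2.5.0" > VERSION
# rm -rf 12.1.2.6.0
# oakcli update -patch 12.1.2.5.0 --verify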

This highlights the fact that there isn’t a known way (to me at least) to delete an unpacked patch via oakcli or any Python scripts I’ve been able to find yet.  Also, as an aside, I tried just deleting the VERSION file, assuming oakcli would rebuild it.  It didn’t.  I got this:

[root@CITX-5ODA-ODABASE-NODE0 System]# oakcli update -patch 12.1.2.5.0 --verify
ERROR : Couldn't find the VERSION file to extract the current allowed version

So I just recreated the file and all was good.  I was hoping the OAK software didn’t maintain some sort of binary-formatted database to keep track of all this information- I think I got lucky in this case.  Hope this helps someone out in a pinch!

Troubleshooting ODA Network connectivity

Setting up an ODA in a customer’s environment can either go very well or give you lots of trouble.  It all depends on having your install checklist completed, reviewed by the customer and any questions answered ahead of time.

I’ve installed dozens of ODAs in a variety of configurations, ranging from a simple bare-metal install to a complex virtualized install with multiple VMs and networks.  Now understand that I’m not a network engineer, nor do I play one on TV, but I know enough about networking to have a civil conversation with a 2nd-level network admin without getting too far out of my comfort zone.  Knowing this, I can certainly appreciate the level of complexity involved in configuring and supporting an enterprise-grade network.

Having said that, I find that when there are issues with a deployment, whether it’s an ODA, ZFS appliance, Exadata or other device, at least 80% of the time a network misconfiguration is the culprit.  I can’t tell you how many times I’ve witnessed misconfigurations where the network admin swore up and down that everything was set correctly but in fact was wrong.  It usually involves checking, re-checking and checking yet again to finally uncover the culprit.  Below, I’ll outline some of the snafus I’ve been involved with and the troubleshooting that can help resolve the issue.

  • Cabling: Are you sure the cables are all plugged into the right place?

Make sure that if you didn’t personally cable the ODA and you’re having network issues, you don’t go too long without personally validating the cable configuration.  In this case, the fancy setup charts are a lifesaver!  On the X5-2 ODAs, for example, the InfiniBand private interconnect is replaced by the 10Gb fiber Ethernet option if the customer needs 10Gb Ethernet over fiber.  There is only one expansion slot available, so unfortunately it’s either/or.  In that configuration, the private interconnect is facilitated by net0 and net1 with crossover cables (green and yellow) between the two compute nodes instead of the InfiniBand cables.  This can be missed very easily.  Also make sure the storage cables are all connected to the proper ports for your configuration, whether it’s one storage shelf or two.  Cabling mistakes will typically be caught shortly after deploying the OS image, whether it’s virtualized or bare metal; there’s a storage topology check that runs during the install process that will catch most of them, but best not to chance it.

  • Switch configuration: Trunk port vs. Access port

When you configure a switch port, you need to tell the switch what kind of traffic will pass through that port.  One of the important items is which network(s) the server attached to that port needs to talk on.  If you’re configuring a standalone physical server, chances are you won’t need to talk on more than one VLAN.  In this case, it’s usually appropriate to configure the switch port as an access port.  You can still put the server on a non-default VLAN (a VLAN other than 1), but the VLAN tags get stripped off at the switch and the server never sees them.

If, however, you’re setting up a VMware server or a machine that uses virtualization technology, it’s more likely that the VMs running on that server will need to talk on more than one VLAN through the same network adapter(s).  In this case, you need to set the port mode to trunked.  You then need to make sure to assign all the VLANs that the server will need to communicate on to that trunk port.  The server is then responsible for analyzing the VLAN tags and passing the traffic to the appropriate destination on the server.  This is one of the areas where the switch is most often configured incorrectly.  Most of the time, the network engineer fails to configure trunk mode on the port, forgets to assign the proper VLANs to the port, or neglects to set a native VLAN where one is needed.

There is a difference between the default VLAN and a native VLAN.  The default VLAN is always present and is typically needed for intra-network device communication to take place; things like Cisco’s CDP protocol use this VLAN.  The native VLAN, if configured, is treated like an access port from the perspective of the network adapter on the server: the server NIC does not need a VLAN interface configured on top of it to be able to talk on the native VLAN.  If you want to talk on any other VLAN on that port, however, you need to configure a VLAN interface on the server to be able to receive those tagged packets.  I’ve not seen the native VLAN used in a lot of configurations where more than one VLAN is needed, but it is most certainly a valid configuration.  Have the network team check these settings and make sure you understand how they should apply to your device.
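To make the tagged case concrete, here’s a minimal sketch of a VLAN interface on an Oracle Linux style system.  The interface name, VLAN ID 123 and addressing are made-up examples, not values from any particular deployment:

# cat /etc/sysconfig/network-scripts/ifcfg-net1.123
DEVICE=net1.123
VLAN=yes
ONBOOT=yes
BOOTPROTO=static
IPADDR=10.2.3.50
NETMASK=255.255.255.0

Bring it up with ifup net1.123 and traffic on that interface gets tagged with VLAN 123, while untagged (native VLAN) traffic continues to flow over plain net1.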

  • Switch configuration: Aggregated ports vs. regular ports

Most switches have the ability to cobble together anywhere from 2 to 8 ports to provide higher throughput/utilization of the ports as well as redundancy at the same time.  This is referred to in different ways depending on your switch vendor.  Cisco calls it EtherChannel, HP calls it Dynamic LACP trunking and Extreme Networks refers to it as sharing (LAG).  However you refer to it, it’s an implementation of the IEEE 802.3ad standard, commonly referred to as Link Aggregation or LACP (Link Aggregation Control Protocol).  Normally when you want to configure a pair of network interfaces on a server together, it’s to provide redundancy and avoid a SPOF (Single Point Of Failure).  I’ll refer to the standard Linux implementation, mainly because I’m familiar with the different methods of load balancing that are typically employed.  This isn’t to say that other OSes don’t have this capability (almost all do), I’m just not very experienced with all of them.

Active-Backup (Linux bonding driver mode=1) is a very simple implementation in which a primary interface is used for all traffic until that interface fails.  The traffic then moves over to the backup interface and communication is restored almost seamlessly.  There are other load balancing modes besides this one that don’t require any special configurations on the switch, each has their strengths and weaknesses.

LACP, which does require a specific configuration on the switch ports involved, tends to be more performant while still maintaining redundancy.  The main reason for this is that there is out-of-band communication via the multicast group MAC address (01:80:c2:00:00:02) between the network driver on the server and the switch to keep both partners up to date on the status of the link.  This allows both ports to be utilized with an almost 50/50 split, evenly distributing the load across all the NICs in the LACP group and effectively doubling (or better) the throughput.

The reason I’m talking about this in the first place is the configuration that needs to be in place on the switch if you’re to use LACP.  If you configure your network driver for Active-Backup mode but the switch ports are set to LACP, you likely won’t see any packets at all on the server.  Likewise, if you have LACP configured on the server but the switch isn’t properly set up to handle it, you’ll get the same result.  This is another setting that commonly gets misconfigured.  Other parameters such as STP (Spanning Tree Protocol), lacp_rate and passive vs. active LACP are some of the more common misconfigurations.  Sometimes the configuration also has to be split between two switches (again- no SPOF), in which case an MLAG configuration needs to be properly set up to allow LACP to work across switches.  Effectively, MLAG is a way of making two switches appear as one from a network protocol perspective, and it is required to span multiple switches within a LACP port group.  The takeaway here is to have the network admin verify the configuration on the switch(es) and ports involved.
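For reference, here’s a minimal sketch of the two bonding modes discussed above, ifcfg-style (slave interface configs and IP settings omitted; miimon and lacp_rate are just commonly used values):

# /etc/sysconfig/network-scripts/ifcfg-bond0 - Active-Backup, no special switch config required
DEVICE=bond0
BONDING_OPTS="mode=active-backup miimon=100"

# /etc/sysconfig/network-scripts/ifcfg-bond0 - LACP, switch ports MUST be in an LACP group
DEVICE=bond0
BONDING_OPTS="mode=802.3ad miimon=100 lacp_rate=fast"

If the server says 802.3ad but the switch ports aren’t set up for LACP (or vice versa), you’ll see exactly the symptom described above- little or no traffic.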

  • Link speed: how fast can the server talk on the network?

Sometimes a server is capable of communicating at 10Gb/s versus the more common 1Gb/s, either via copper or fiber media (most typically).  It used to be that you had to force switches to talk at 1Gb/s in order for the server to negotiate that speed.  This was back when 1Gb/s was newer and the handshake protocol that takes place between the NIC and the switch port at connection time was not as mature as it is now.  However, as a holdover from those halcyon days of yore, some network admins are still prone to setting port speeds manually rather than letting them auto-negotiate like a good network admin should.  Thus you have servers connecting at 1Gb/s when they should be running at 10Gb/s.  Again, just something to keep in mind if you’re having speed issues.

  • Cable Quality: what speed is your cable rated at?

There are currently four common ratings for copper Ethernet cables.  They are by no means the only ones, but these are the most commonly used in datacenters.  They all have to do with how fast you can send data through the cables.  Cat 5 is capable of transmitting up to 1Gb/s.  Cat 5e was an improvement on Cat 5 that introduced enhancements limiting crosstalk (interference) between the 8 strands of a standard Ethernet cable.  Cat 6 and 6a are further improvements on those standards, now allowing speeds of up to 10Gb/s or more.  Basically, the newer the Cat number/letter, the faster you can safely transmit data without loss or corruption.  The reason I mention this is that I’ve been burned on more than one occasion using Cat 5 for 1Gb/s: there was too much crosstalk, which severely limited throughput and resulted in a lot of collisions.  Replacing the cable with a new Cat 5 or higher-rated cable almost always fixed the problem.  If you’re having communication problems, rule this out early on so you’re not chasing your tail in other areas.

  • IP Networking: Ensuring you have accurate network configurations

I’ve had a lot of problems in this area.  The biggest problem seems to be that not all customers have taken the time to review and fill out the pre-install checklist, which prompts you for all the networking information you’ll need to do the install.  If you’ve been given IP information, before you tear your hair out, make sure it’s correct.  I’ve been given multiple configurations at the same customer for the same appliance, and each time there was something critically wrong that kept me from talking on the network.  Configuring VLANs can be especially trying, because if you have it wrong, you just won’t see any traffic.  With regular non-VLAN configurations, if you put yourself on the wrong physical switch port or network, you can always sniff the network (tcpdump is now installed as part of the ODA software); this doesn’t really work with VLAN traffic.  Other things to verify would be your subnet mask and default gateway.  If either of these is misconfigured, you’re gonna have problems.  Also, as I mentioned earlier, don’t make the mistake of assuming you have to create a VLAN interface on the ODA just because you’re connected to a trunked port.  Remember, native VLAN traffic is passed on to the server with the VLAN tags stripped off, so it uses a regular network interface (i.e. net1).

These are just some of the pitfalls you may encounter.  I hope some of this has helped!

OVM Disaster Recovery In A Box (Part 4 of 5)

Now that you’ve touched a file inside the VM, we have a way to prove that the VM replicated to the other side is actually the one we created.  Apparently in my case, faith is overrated.

Now that I’ve fire-hosed a TON of information at you on how to set up your virtual prod and DR sites, this is a good breaking point to talk a little about how the network looks from a 10,000-foot view.  Here’s a really simple diagram that should explain how things work.  And when I say simple, we’re talking crayon art here, folks.  Really, does anyone have a link to any resources on the web or in a book that could help a guy draw better network diagrams?  Ok, I digress... here’s the diagram:

OVM DR Network Diagram

One of the biggest takeaways from this diagram highlights something that a LOT of people get confused about.  In OVM DR, you do NOT replicate OVM Manager, the pool filesystem or the OVM servers to the DR side.  In other words, you don’t replicate the operating environment, only the contents therein (i.e. the VMs, via their storage repositories).  You basically have a complete implementation of OVM at each location, just as if each were a standalone site.  The only difference is that some of the repositories are replicated.  The only other potential difference (and I don’t show it or deal with it in my simulation) is raw LUNs presented to the VMs.  Those would have to be replicated at the storage layer as well.

I’ve not bothered to clutter the diagram with the VM or storage networks- you know they’re there and that they’re serving their purpose.  You can see that replication is configured between the PROD Repo LUN and a LUN in DR.  This would be considered an Active/Passive DR solution.  In this scenario you could potentially have some DR-only workloads running at the DR site (I don’t show that, and they aren’t replicated back to PROD).  Now, some companies might have a problem with shelling out all that money for the infrastructure at the DR site and having it sit unused until a DR event occurs.  Those companies might just decide to run some of their workload at the DR site and have PROD be its DR.  In this Active/Active scenario, your workflow would be pretty much the same; there are just more VMs and repositories at each site, so you need to be careful and plan well.  Here is what an Active/Active configuration would look like:

OVM DR Network Diagram active active

Again- my article doesn’t touch on Active/Active, but you could easily apply what you learn in these 5 articles to accommodate an Active/Active configuration.  We’ll be focusing on Active/Passive, just as a reminder.  We now have a virtual machine running in PROD to facilitate our replication testing.  Make sure the VM runs and can ping the outside network so we know we have a viable machine.  Don’t expect lightning performance either; we’re running a VM inside a VM which is inside of a VM.  Not exactly recommended for production use.  Ok- DO NOT use this as your production environment.  There- all the folks who ignore the warnings on hair dryers about using them in the shower should be covered now.

Below are the high level steps used to fail over to your DR site.  Once you’ve accomplished this, make sure to remember failback.  Most people are usually so excited about getting the failover to work that they forget they’ll have to fail back at some point once things have been fixed in PROD.

FAILOVER (this works if you’re doing a controlled fail over or if a real failure at prod occurs):

  • Ensure all PROD resources are nominal and functioning properly
  • Ensure all DR resources are nominal and functioning properly
  • Ensure replication between PROD and DR ZFS appliances is in place and replicating
  • On ZFSDR1, stop replication of PROD_REPO
  • On ZFSDR1, clone the PROD_REPO project to a new project, DRFAIL
  • Rescan the physical disks on ovmdr1 (you may have to reboot to see the new LUN)
  • Verify the new physical disk appears
  • Rename the physical disk to PROD_REPO_FAILOVER
  • Take ownership of the replicated repository in the DR OVM Manager
  • Scan for VMs in the unassigned VMs folder
  • Migrate the VM to the DR pool
  • Start the VM
  • Check /var/tmp/ and make sure you see the ovmprd1 file that you touched when it was running in PROD.  This proves that it’s the same VM
  • Ping something on your network to establish network access
  • Ping or connect to something on the internet to establish external network access

FAILBACK:

  • Ensure all PROD resources are nominal and functioning properly
  • Ensure all DR resources are nominal and functioning properly
  • Restart replication in the opposite direction, from ZFSDR1 to ZFSPRD1
  • Ensure replication finishes successfully
  • Rescan the physical disks on ovmprd1
  • Verify your PROD Repo LUN is still visible and in good health
  • Browse the PROD Repo and ensure your VM(s) are there
  • Power on your VMs in PROD and ensure that whatever data was modified while in DR has been replicated back to PROD successfully.
  • Ping something on your network to establish network access
  • Ping or connect to something on the internet to establish external network access

Now that we’ve shown you how all this works, I’ll summarize in part 5.

Hardware Virtualized VM’s on ODA – One Click!

I previously wrote an article on how to install Windows on a virtualized ODA.  In that article I stated that running Windows on an ODA was not supported.  I’m starting to lean away from that stance for a couple of reasons.  One of them is the continued stream of Oracle InfoDocs I see being written on how to run an HVM virtual machine on an ODA.  The other, and perhaps more compelling, reason is an excerpt from the oakcli command reference documentation- specifically the “-os” parameter of the oakcli configure vm command:

oakcli configure vm

Use the oakcli configure vm command to configure a virtual machine on Oracle Database Appliance Virtualized Platform and to increase or decrease resource allocation to user domains. You must restart the domain for the resource allocation change to take effect.

Syntax

oakcli configure vm name [-vcpu cpucount -maxvcpu maxcpu -cpuprio priority 
-cpucap cap -memory memsize -maxmemory max_memsize -os sys -keyboard lang -mouse 
mouse_type -domain dom -network netlist -autostart astart -disk disks -bootoption
bootstrap -cpupool pool -prefnode 0|1 -failover true|false][-h]

Parameters

Parameter Description
name The name assigned to the virtual machine.
-vcpu cpucount Number of CPUs assigned to the virtual machine. The range is 1 to 72. This number depends on your Oracle Database Appliance configuration:

  • On Oracle Database Appliance X5-2, the range is from 1 to 72.
  • On Oracle Database Appliance X4-2, the range is from 1 to 48.
  • On Oracle Database Appliance X3-2, the range is from 1 to 32.
  • On Oracle Database Appliance V1, the range is from 1 to 24.
-maxvcpu maxcpu Maximum number of CPUs that the virtual machine can consume. The range is 1 to 72. This number depends on your Oracle Database Appliance configuration:

  • On Oracle Database Appliance X5-2, the range is from 1 to 72.
  • On Oracle Database Appliance X4-2, the range is from 1 to 48.
  • On Oracle Database Appliance X3-2, the range is from 1 to 32.
  • On Oracle Database Appliance version 1, the range is 1 to 24.
-cpuprio priority Priority for CPU usage, where larger values have higher priority. The range is 1 to 65535.
-cpucap cap Percentage of a CPU the virtual machine can receive. The range is 10 to 100.
-memory memsize Amount of memory given to the virtual machine: (1 to 248)G to (1 to 760)G or (1 to 253952)M to (1 to 778240)M, based on RAM. The default unit is M.
-maxmemory max_memsize Maximum amount of memory allowed for the virtual machine: (1 to 248)G to (1 to 760)G or (1 to 253952)M to (1 to 778240)M, based on RAM. The default unit is M.
-os sys Operating system used by the virtual machine (WIN_2003, WIN_2008, WIN_7, WIN_VISTA, OTHER_WIN, OL_4, OL_5, OL_6, RHL_4, RHL_5, RHL_6, LINUX_RECOVERY, OTHER_LINUX, SOLARIS_10, SOLARIS_11, OTHER_SOLARIS, or NONE)
-keyboard lang Keyboard used by virtual machine (en-us, ar, da, de, de-ch, en-gb, es, et, fi, fo, fr, fr-be, fr-ca, hr, hu, is, it, ja, lt, lv, mk, nl, nl-be, no, pl, pt, pt-br, ru, sl, sv, th, or tr)
-mouse mouse_type Mouse type used by the virtual machine (OS_DEFAULT, PS2_MOUSE, USB_MOUSE, or USB_TABLET)
-domain dom Domain type from the following options:

  • Hardware virtualized guest (XEN_HVM)

    – The kernel or operating system is not virtualization-aware and can run unmodified.

    – Device drivers are emulated.

  • Para virtualized guest (XEN_PVM)

    – The guest is virtualization-aware and is optimized for a virtualized environment.

    – PV guests use generic, idealized device drivers.

  • Hardware virtualized guest (XEN_HVM_PV_DRIVERS)

    The PV drivers are hypervisor-aware and significantly reduce the overhead of emulated device input/output.

  • Hardware virtualized guest (UNKNOWN)
-network netlist MAC address and list of networks used by the virtual machine
-autostart astart Startup option for virtual machine (always, restore, or never)
-disk disks List of disks (slot, disktype, and content) used by virtual machine
-bootoption bootstrap Boot option used to bootstrap the virtual machine (PXE, DISK, or CDROM)
-cpupool pool Named CPU pool assigned to the virtual machine
-prefnode 0|1 Preferred node on which the virtual machine will attempt to start (Node 0 or Node 1). This parameter is only valid for virtual machines created in shared repositories.
-failover true|false Allow (use the keyword “true”) or disallow (use the keyword “false”) the virtual machine to start or restart on a node other than the node defined by the -prefnode parameter. This parameter is only valid for virtual machines created in shared repositories.
-h (Optional) Display help for using the command.

Note the selection of Operating Systems you have to choose from.  The list includes the following Operating Systems:

  • Windows 2003
  • Windows 2008
  • Windows 7
  • Windows Vista
  • Other Windows
  • Oracle Linux 4, 5 and 6
  • Red Hat Linux 4, 5 and 6
  • Linux Recovery
  • Other Linux
  • Solaris 10 and 11
  • Other Solaris


To me, this is a strong indicator that you should be able to run a VM that isn’t created from a template- including Windows!  It gets even better.  I stumbled across an InfoDoc (2099289.1) created by a gentleman named Ruggero Citton.  In that document, he shows how to deploy an HVM virtual machine on an ODA with a single command.  This automates all the manual steps you used to have to perform to run an HVM virtual machine, including manually creating virtual disk images and editing the VM’s vm.cfg file.
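Based on the parameter table above, tagging a VM as a Windows HVM guest would look something along these lines- the VM name and sizings here are made up, and I haven’t validated every combination of flags:

# oakcli configure vm winvm1 -os WIN_2008 -domain XEN_HVM -vcpu 4 -maxvcpu 8 -memory 8192M -maxmemory 16384M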

Check out the InfoDoc for more information, including the Perl script that does all of the automation.  I was able to follow his instructions and successfully created a Windows 2008 R2 VM.  Based on the parameters of the oakcli configure vm command, I’m a lot more comfortable at least telling customers about this capability.  I still want to confirm that a VM created in this fashion wouldn’t jeopardize a customer’s support status- I’ll post an update when I find out for sure.

Testing network throughput in Linux

I was at a customer site the other day doing a POC to compare performance between an ODA and an AIX system running Oracle Database.  The network didn’t seem to be very busy at all, and I wanted to rule out throughput as a bottleneck for the performance issues.  I wound up using nc (netcat) and dd to measure the network throughput.  Here’s an example of what I did (on two different systems):

System 1:

[root@forge ~]# nc -vl 2222 >/dev/null

System 2:

[root@daryl ~]# dd if=/dev/zero bs=1024k count=256 | nc -v 10.10.155.10 2222
Connection to 10.10.155.10 2222 port [tcp/EtherNet/IP-1] succeeded!
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 22.7638 s, 11.8 MB/s

This tells me that one of the two systems is probably connected at 100Mb: a 100Mb/s link tops out at roughly 12.5 MB/s in theory, so 11.8 MB/s is right at the ceiling. Further investigation reveals that I was right:

System 1:

[root@forge ~]# ethtool eth0
Settings for eth0:
        Supported ports: [ TP ]
        Supported link modes:   1000baseT/Full
                                10000baseT/Full
        Supported pause frame use: No
        Supports auto-negotiation: No
        Advertised link modes:  Not reported
        Advertised pause frame use: No
        Advertised auto-negotiation: No
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: off
        MDI-X: Unknown
        Supports Wake-on: uag
        Wake-on: d
        Link detected: yes

System 2:

[root@daryl ~]# ethtool eth0
Settings for eth0:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Supported pause frame use: No
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Advertised pause frame use: Symmetric
        Advertised auto-negotiation: Yes
        Link partner advertised link modes:  10baseT/Half 10baseT/Full
                                             100baseT/Half 100baseT/Full
        Link partner advertised pause frame use: Symmetric
        Link partner advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: off
        Supports Wake-on: g
        Wake-on: g
        Current message level: 0x000000ff (255)
                               drv probe link timer ifdown ifup rx_err tx_err
        Link detected: yes

If you look at the output above, you’ll see that the line that starts with “Speed:” shows the currently connected link speed. Sure enough, daryl is stuck at 100Mb so we get the slower speed.
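One caveat if you try this yourself: nc option syntax varies between netcat implementations. The listener invocation above matches the nc shipped with recent Enterprise Linux releases; some older or alternative builds want the listening port passed with -p, along these lines:

# nc -v -l -p 2222 > /dev/null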

My oakcli deploy command failed… do I have to restart?

I was in the process of deploying the database in ODA_BASE on our virtualized X5 ODA today when step 9 failed due to DNS resolution issues- I had fat-fingered the hostname in DNS. Instead of starting the whole process over, I simply used the GridInst.pl command. It’s located at /opt/oracle/oak/onecmd/GridInst.pl and you can use it to resume the deploy. Granted, step 9 isn’t all that far into the process, but if you were on step 16, for example, it could come in handy!

Here’s the format and some useful bits of information to know:


[root@CIT-X5ODA-ODABASE-NODE0 onecmd]# ./GridInst.pl -h
Usage:
GridInst.pl -l [options]
GridInst.pl -s | -r [options]
GridInst.pl -v

ARGUMENTS:
-l List all the steps that exist
-s Run the step # at a time
-r Run the steps one after the other as long as no errors
are encountered
-d Debug. You will be prompted to enter Y at some
particular steps.
-o VM env run
-n Ignore errors whenever needed
-h Usage
-v Display the version number

EXAMPLES:
GridInst.pl -l
GridInst.pl -s 0
GridInst.pl -r 1-19
[root@CIT-X5ODA-ODABASE-NODE0 onecmd]#

Whether or not you’re running this in a virtualized environment makes a difference: you have to specify the -o option if you’re virtualized or the command won’t work. Since you have to specify the starting and ending step numbers, it comes in handy to know what all the step numbers are and what they do. Here’s how to get that info:


[root@CIT-X5ODA-ODABASE-NODE0 onecmd]# ./GridInst.pl -o -l
INFO : Logging all actions in /dev/null and traces in /dev/null
INFO : Loading configuration file /opt/oracle/oak/onecmd/onecommand.params...
The steps in order are...
Step 0 = ValidateParamFile
Step 1 = SetupNetwork
Step 2 = WriteNodelists
Step 3 = SetupSSHroot
Step 4 = SetupDNS
Step 5 = UpdateEtcHosts
Step 6 = SetTimezone
Step 7 = SetupNTP
Step 8 = SetupILOM
Step 9 = ValidateEnv
Step 10 = CreateUsers
Step 11 = SetupStorage
Step 12 = SetupSSHusers
Step 13 = InstallGIClone
Step 14 = RunGIClonePl
Step 15 = RunRootScripts
Step 16 = GIConfigAssists
Step 17 = CreateASMDiskgroups
Step 18 = InstallDBClone
Step 19 = RunDBClonePl
Step 20 = DbcaDB
Step 21 = SetupACFS
Step 22 = SetupASR
Step 23 = ResecureMachine
[root@CIT-X5ODA-ODABASE-NODE0 onecmd]#
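In my case, the deploy died at step 9, so resuming on the virtualized platform would look something like this (note the -o flag; double-check the step range against your own -l output before running it):

[root@CIT-X5ODA-ODABASE-NODE0 onecmd]# ./GridInst.pl -o -r 9-23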

Hopefully this tidbit of information helps someone. Also, observe that the hostname was already configured when I ran GridInst.pl to resume the deployment; it isn’t the default of “test” anymore. That’s because oakcli deploy had gotten to step 9, and by that time it had set the host and IP information. Normally, if you were to run this from step 0 (not sure if you can do that, but I may test it), the hostname would still be set to the default “test”. Just something to be aware of.

I was pulling my hair out looking for the command as a subset of oakcli deploy. I knew the ability existed I just didn’t know the command until a coworker reminded me of the GridInst.pl command. Thanks Dave!!

Resize OVM Repository on the fly

A quick three step process for resizing your repositories (increase only- no shrinking).

  1. Resize the LUN on your storage array
  2. Rescan the physical disks on each OVM server and verify the new LUN size is reflected (see the rescan sketch below)
  3. Log into each OVM server and run the following command against your repository LUN

# tunefs.ocfs2 -S /dev/mapper/{UUID of repository LUN}

NOTE: you can get the path needed for the command above from OVM Manager.  Highlight the repository and select the info perspective.  It will show you the path right on that screen!

Once you run the command above, go back into OVM Manager and verify that your repository has resized.  I’ve tested this process in my sandbox lab running OVM 3.3.2; however, be careful and test in your environment before doing this to production.
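As for step 2, you can rescan from within OVM Manager, but if you prefer the command line, a rough sketch looks like this (the map name is just an example- use your own repository LUN’s WWID from /dev/mapper):

# for dev in /sys/class/scsi_device/*/device/rescan; do echo 1 > $dev; done
# multipathd -k"resize map 36001f9300200e000001d000200000000"
# multipath -ll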

The maximum OCFS2 volume size is 64T.  I don’t know if that means that’s the maximum repository size but I don’t see anything contrary to that so far so I’m going with it for now.

Tracing disk performance visually in linux

Monitoring performance in Linux can wind up being a very boring proposition to some.  It typically involves a lot of command-line work and interpretation by the person running the commands.  If you’re a command-line junkie who’s really into iostat, sar, blktrace and bonnie++ then this stuff might be somewhat sacrilegious to you :).   If not- read on!

So typically, your basic I/O monitoring consists of watching iostat and maybe vmstat, as well as another handful of tools that tell you what’s going on with your I/O subsystem.  It can prove challenging to paint a picture that supports your assertion that there are in fact bottlenecks, or at the very least contention between reads and writes.  Gosh- it sure would be nice to show it visually and start to pick up on patterns and other things you just can’t see (or that are very tedious to sift through) in regular CLI-based utilities.  Well, help has arrived!

By using a handful of tools freely available to you, there’s an easy way to get what you’re after.  Follow my example tutorial below and hopefully you can apply it to your own scenarios and do something good with it.  Enjoy!

Install Seekwatcher, blktrace and other tools needed:

# yum install blktrace python python-dev python-matplotlib cython gcc mencoder png2theora
# cd /var/tmp
# wget http://oss.oracle.com/mercurial/mason/seekwatcher/archive/tip.tar.gz
# tar xvzf tip.tar.gz
# cd seekwatcher-b392aeaf693b/
# python setup.py install

Generate a report of disk performance:

# seekwatcher -t dd.trace -o output.png -p 'dd if=/dev/zero of=/images/bigfile bs=1M count=4096' -d /dev/sdc1

This command will write 4 gigabytes of zeros to the file /images/bigfile while monitoring the disk /dev/sdc1, which is where that filesystem resides.  The output of the command is a file called output.png that looks like this:

[output.png- seekwatcher graph of the dd run]

Generate a video of disk performance:

# seekwatcher -t dd.trace -o dd.mpg -p 'sync && echo 3 > /proc/sys/vm/drop_caches ; dd if=/dev/zero of=/images/bigfile bs=1M seek=500 count=4096' -d /dev/sdc1 --movie

This command also writes 4 gigabytes of zeros to the same file, however this time I told dd to seek 500MB into the file before writing (remember: seek=write, skip=read with dd).  I also had a command running in another window reading a different file at the same time, to generate some seeks and read activity so you can see it.  Note that I’m flushing the buffer cache right before the dd command this time, to avoid any potential of the buffer cache keeping physical writes from occurring. It could be unnecessary, as I don’t know for sure whether dd interacts with the filesystem layer (and therefore the buffer cache) when called in this manner; I just figured it would be a safe way to ensure the writes were physical and not virtual so they would show up here. The output of this command is a file called dd.mpg (my apologies, I can’t seem to figure out how to embed a video from my Google Drive, so you’ll have to click on the link).
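As an aside, the concurrent reader mentioned above was nothing fancy- something like this in another terminal is enough to generate the read and seek activity (the file name is arbitrary; any large file on the same disk will do):

# dd if=/images/someotherbigfile of=/dev/null bs=1M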

Solaris Commands – off the beaten path

Oracle blogger Giri Mandalika posted some useful and somewhat obscure commands that can be very handy when troubleshooting Solaris.  These commands mostly work with Solaris 10; however, some of them are specific to Solaris 11:

Interrupt Statistics : intrstat utility
The intrstat utility can be used to monitor interrupt activity generated by various hardware devices, along with the CPU that serviced the interrupt and the CPU time spent servicing those interrupts. On a busy system, the stats intrstat reports may help you figure out which devices are busy at work and keeping the system busy with interrupts.
eg.,
.. [idle system] showing the interrupt activity on first two vCPUs ..

# intrstat -c 0-1 5

device | cpu0 %tim cpu1 %tim
-------------+------------------------------
cnex#0 | 0 0.0 0 0.0
ehci#0 | 0 0.0 0 0.0
hermon#0 | 0 0.0 0 0.0
hermon#1 | 0 0.0 0 0.0
hermon#2 | 0 0.0 0 0.0
hermon#3 | 0 0.0 0 0.0
igb#0 | 0 0.0 0 0.0
ixgbe#0 | 0 0.0 0 0.0
mpt_sas#0 | 18 0.0 0 0.0
vldc#0 | 0 0.0 0 0.0

device | cpu0 %tim cpu1 %tim
-------------+------------------------------
cnex#0 | 0 0.0 0 0.0
ehci#0 | 0 0.0 0 0.0
hermon#0 | 0 0.0 0 0.0
hermon#1 | 0 0.0 0 0.0
hermon#2 | 0 0.0 0 0.0
hermon#3 | 0 0.0 0 0.0
igb#0 | 0 0.0 0 0.0
ixgbe#0 | 0 0.0 0 0.0
mpt_sas#0 | 53 0.2 0 0.0
vldc#0 | 0 0.0 0 0.0
^C


Check the outputs of the following as well.
# echo ::interrupts | mdb -k
# echo ::interrupts -d | mdb -k

Physical Location of Disk : croinfo & diskinfo commands
Both the croinfo and diskinfo commands provide information about the chassis, receptacle, and occupant relative to all disks or a specific disk. Note that croinfo and diskinfo share the same executable binary and function in an identical manner; the main difference is the defaults used by each of the utilities.
eg.,

# croinfo
D:devchassis-path t:occupant-type c:occupant-compdev
------------------------------ --------------- ---------------------
/dev/chassis//SYS/MB/HDD0/disk disk c0t5000CCA0125411FCd0
/dev/chassis//SYS/MB/HDD1/disk disk c0t5000CCA0125341F0d0
/dev/chassis//SYS/MB/HDD2 - -
/dev/chassis//SYS/MB/HDD3 - -
/dev/chassis//SYS/MB/HDD4/disk disk c0t5000CCA012541218d0
/dev/chassis//SYS/MB/HDD5/disk disk c0t5000CCA01248F0B8d0
/dev/chassis//SYS/MB/HDD6/disk disk c0t500151795956778Ed0
/dev/chassis//SYS/MB/HDD7/disk disk c0t5001517959567690d0

# diskinfo -oDcpd
D:devchassis-path c:occupant-compdev p:occupant-paths d:occupant-devices
------------------------------ --------------------- ----------------------------------------------------------------------------- -----------------------------------------
/dev/chassis//SYS/MB/HDD0/disk c0t5000CCA0125411FCd0 /devices/pci@400/pci@1/pci@0/pci@0/LSI,sas@0/iport@1/disk@w5000cca0125411fd,0 /devices/scsi_vhci/disk@g5000cca0125411fc
/dev/chassis//SYS/MB/HDD1/disk c0t5000CCA0125341F0d0 /devices/pci@400/pci@1/pci@0/pci@0/LSI,sas@0/iport@2/disk@w5000cca0125341f1,0 /devices/scsi_vhci/disk@g5000cca0125341f0
/dev/chassis//SYS/MB/HDD2 - - -
/dev/chassis//SYS/MB/HDD3 - - -
/dev/chassis//SYS/MB/HDD4/disk c0t5000CCA012541218d0 /devices/pci@700/pci@1/pci@0/pci@0/LSI,sas@0/iport@1/disk@w5000cca012541219,0 /devices/scsi_vhci/disk@g5000cca012541218
/dev/chassis//SYS/MB/HDD5/disk c0t5000CCA01248F0B8d0 /devices/pci@700/pci@1/pci@0/pci@0/LSI,sas@0/iport@2/disk@w5000cca01248f0b9,0 /devices/scsi_vhci/disk@g5000cca01248f0b8
/dev/chassis//SYS/MB/HDD6/disk c0t500151795956778Ed0 /devices/pci@700/pci@1/pci@0/pci@0/LSI,sas@0/iport@4/disk@w500151795956778e,0 /devices/scsi_vhci/disk@g500151795956778e
/dev/chassis//SYS/MB/HDD7/disk c0t5001517959567690d0 /devices/pci@700/pci@1/pci@0/pci@0/LSI,sas@0/iport@8/disk@w5001517959567690,0 /devices/scsi_vhci/disk@g5001517959567690

 
Monitoring Network Traffic Statistics : dlstat command
The dlstat command reports network traffic statistics for all datalinks or a specific datalink on a system.
eg.,

# dlstat -i 5 net0
LINK IPKTS RBYTES OPKTS OBYTES
net0 163.12M 39.93G 206.14M 43.63G
net0 312 196.59K 146 370.80K
net0 198 172.18K 121 121.98K
net0 168 91.23K 93 195.57K
^C

For the complete list of options along with examples, please consult the Solaris Documentation.

Fault Management : fmstat utility
Solaris Fault Manager gathers and diagnoses problems detected by the system software and initiates self-healing activities such as disabling faulty components. The fmstat utility can be used to check the statistics associated with the Fault Manager.
fmadm config lists out all active fault management modules that are currently participating in fault management. The -m option can be used to report the diagnostic statistics related to a specific fault management module; fmstat without any option reports stats from all fault management modules.
eg.,

# fmstat 5
module ev_recv ev_acpt wait svc_t %w %b open solve memsz bufsz
cpumem-retire 0 0 1.0 8922.5 96 0 0 0 12b 0
disk-diagnosis 1342 0 1.1 8526.0 96 0 0 0 0 0
disk-transport 0 0 1.0 8600.3 96 1 0 0 56b 0
...
...
zfs-diagnosis 139 75 1.0 8864.5 96 0 4 12 672b 608b
zfs-retire 608 0 0.0 15.2 0 0 0 0 4b 0
...
...
# fmstat -m cpumem-retire 5
NAME VALUE DESCRIPTION
auto_flts 0 auto-close faults received
bad_flts 0 invalid fault events received
cacheline_fails 0 cacheline faults unresolveable
cacheline_flts 0 cacheline faults resolved
cacheline_nonent 0 non-existent retires
cacheline_repairs 0 cacheline faults repaired
cacheline_supp 0 cacheline offlines suppressed
...
...

 
InfiniBand devices : List & Show Information about each device
ibv_devices lists out all available IB devices, whereas ibv_devinfo shows information about all devices or about a specific IB device.
eg.,

# ibv_devices
device node GUID
------ ----------------
mlx4_0 0021280001cee63a
mlx4_1 0021280001cee492
mlx4_2 0021280001cee4aa
mlx4_3 0021280001cee4ea

# ibv_devinfo -d mlx4_0
hca_id: mlx4_0
transport: InfiniBand (0)
fw_ver: 2.7.8130
node_guid: 0021:2800:01ce:e63a
sys_image_guid: 0021:2800:01ce:e63d
vendor_id: 0x02c9
vendor_part_id: 26428
hw_ver: 0xB0
board_id: SUN0160000002
phys_port_cnt: 2
port: 1
state: PORT_ACTIVE (4)
max_mtu: 2048 (4)
active_mtu: 2048 (4)
sm_lid: 56
port_lid: 95
port_lmc: 0x00
link_layer: IB

port: 2
state: PORT_ACTIVE (4)
max_mtu: 2048 (4)
active_mtu: 2048 (4)
sm_lid: 56
port_lid: 96
port_lmc: 0x00
link_layer: IB

Other commands and utilities such as ibstatus, fwflash or cfgadm can also be used to retrieve similar information.

PCIe Hot-Plugging : hotplug command
When the hotplug service is enabled on a Solaris system, the hotplug command can be used to bring hot-pluggable devices online or offline without physically adding or removing them from the system.
The following command lists out all the physical [hotplug] connectors along with their current status.
eg.,

# hotplug list -c
Connection State Description
________________________________________________________________________________
IOU2-EMS2 ENABLED PCIe-Native
IOU2-PCIE6 ENABLED PCIe-Native
IOU2-PCIE7 EMPTY PCIe-Native
IOU2-PCIE4 EMPTY PCIe-Native
IOU2-PCIE1 EMPTY PCIe-Native

For detailed instructions to hotplug a device, check the Solaris documentation out.

Using UUID’s to troubleshoot and understand Oracle VM (part 1 of 2)

As quoted from Wikipedia:

A universally unique identifier (UUID) is an identifier standard used in software construction. A UUID is simply a 128-bit value. The meaning of each bit is defined by any of several variants.

For human-readable display, many systems use a canonical format using hexadecimal text with inserted hyphen characters. For example:
de305d54-75b4-431b-adb2-eb6b9e546013.

The intent of UUIDs is to enable distributed systems to uniquely identify information without significant central coordination. In this context the word unique should be taken to mean “practically unique” rather than “guaranteed unique”. Since the identifiers have a finite size, it is possible for two differing items to share the same identifier. This is a form of hash collision. The identifier size and generation process need to be selected so as to make this sufficiently improbable in practice. Anyone can create a UUID and use it to identify something with reasonable confidence that the same identifier will never be unintentionally created by anyone to identify something else. Information labeled with UUIDs can therefore be later combined into a single database without needing to resolve identifier (ID) conflicts.

Adoption of UUIDs is widespread with many computing platforms providing support for generating UUIDs and for parsing/generating their textual representation.

As an expression of just how unique a UUID actually is, you would have to create 1 trillion UUIDs every nanosecond for 10 billion years to exhaust the number of UUIDs available.

Oracle VM uses UUIDs in multiple places, as does Linux itself. Below, we will identify the more important uses of UUIDs in Oracle VM and in Linux in general.

Storage Repository UUID:
Generated at filesystem creation time, this UUID is stored in a few places including the path of the actual repository when it’s mounted:

[root@OVM ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2              19G  9.3G  8.8G  52% /
/dev/sda1              99M   48M   46M  52% /boot
tmpfs                 903M     0  903M   0% /dev/shm
none                  903M  336K  902M   1% /var/lib/xenstored
/dev/mapper/36001f9300200e0000007000200000000
                       12G  359M   12G   3% /poolfsmnt/0004fb0000050000d465e496e0f1a989
/dev/mapper/36001f9300200e000001d000200000000
                      800G  449G  352G  57% /OVS/Repositories/0004fb0000010000c93e88783207de86
                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It’s also located in the .ovsrepo file identified by OVS_REPO_UUID= inside the filesystem when the repository is created by OVM Manager:

OVS_REPO_UUID=0004fb0000010000c93e88783207de86
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OVS_REPO_VERSION=3.0
OVS_REPO_MGR_UUID=0004fb0000010000d395249d92ccea86
OVS_REPO_ALIAS=My Big Fat Repository

Also you can see the UUID of the repository by using the mounted.ocfs2 command:

# mounted.ocfs2 -d
Device		Stack 	Cluster 	F 	UUID 								Label
/dev/sdb1 	o2cb 	ocfs2demo 		0004fb0000010000c93e88783207de86 	ocfs2demo
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

OVM Manager UUID:
When OVM Manager is installed, it generates a UUID that gets linked to the database instance for that installation and is stored in numerous locations. The first place it is stored is in the configuration file on the OVM Manager system itself which is located at /u01/app/oracle/ovm-manager-3/.config. It is identified by UUID=

[root@ovmmgr ~]# cd /u01/app/oracle/ovm-manager-3/
[root@ovmmgr ovm-manager-3]# cat .config
DBTYPE=MySQL
DBHOST=localhost
SID=ovs
LSNR=49500
OVSSCHEMA=ovs
APEX=8080
WLSADMIN=weblogic
OVSADMIN=admin
COREPORT=54321
UUID=0004fb00000100008edf7365808c02d3
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
BUILDID=3.3.1.1065

It can also be located inside the /.ovspoolfs file on the pool filesystem LUN, identified by OVS_POOLFS_MGR_UUID=. It’s also found on each OVM server in the /.ovsrepo file on each repository filesystem LUN, identified by OVS_REPO_MGR_UUID=.

LUN UUID:
When a LUN is created on a SAN, it is given a UUID, sometimes called a GUID (same difference). With a local disk on Linux, the OS assigns a UUID to each disk so it can identify it uniquely. When LUNs are presented to Linux (as well as most other OSes), the OS reads page 0x83 of the LUN’s VPD (Vital Product Data) data and determines its UUID. This is how the dm-multipath driver in the Linux kernel, for example, is smart enough to know that /dev/sdk, /dev/sdl, /dev/sdm and /dev/sdn are all different paths to the same LUN. The /dev/sd# construct is just an enumeration of each device presented to the kernel, in the order it’s presented, each time the system boots. This is also why it’s generally a good idea to identify a LUN or chunk of storage presented to a server by its UUID rather than its /dev/sd# label. In Linux, depending on the boot process and the order in which LUNs are enumerated by the kernel, there’s no guarantee that the first time the system boots it will see each LUN in the exact same order as the second time. What is currently /dev/sdk could wind up getting enumerated as /dev/sdp the next time the system boots. It’s all about timing, and certainly not a way to maintain consistency when trying to mount filesystems.
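If you’re curious what identifier the OS actually reads from VPD page 0x83, you can ask for it directly- a quick sketch (the device name is an example):

# /lib/udev/scsi_id --whitelisted --device=/dev/sdk

Run the same command against /dev/sdl, /dev/sdm and /dev/sdn and you’ll get the same ID back, which is exactly how dm-multipath knows to group them.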

This is why you may sometimes see the following type of notation in /etc/fstab:

#
# /etc/fstab
# Created by anaconda on Mon Jul 29 16:36:37 2013
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
UUID=978c7340-c929-4321-859a-2b168a258ebc /                   ext4    defaults        1 1

instead of this:

#
# /etc/fstab
# Created by anaconda on Mon Jul 29 16:36:37 2013
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/sda1	 /                   ext4    defaults        1 1
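If you need to find a filesystem’s UUID to use in fstab, blkid is the quickest route (the device name is an example; the UUID shown is the one from the fstab entry above):

# blkid /dev/sda1
/dev/sda1: UUID="978c7340-c929-4321-859a-2b168a258ebc" TYPE="ext4"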

In part 2, I’ll talk about ways to correlate the different locations of UUID’s to figure out why things aren’t working and how to fix them. I’ll give a few common scenarios and walk through how to deal with them. Stay tuned!