Nimble PowerShell Toolkit

I was working on an internal project to test the performance of a converged system solution.  The storage component is a Nimble AF7000 from which we're presenting a number of LUNs.  There are almost 30 LUNs, and I've had to create, delete and provision them a number of times throughout the project.  It became extremely tedious to do this through the WebUI, so I decided to see if it could be scripted.

I know you can log into the Nimble via SSH and basically do what I'm trying to do, and I did test this with success.  However, I recently had a customer who wanted to use PowerShell to perform daily snapshot/clone operations for an Oracle database running on Windows (don't ask).  We decided to leverage the Nimble PowerShell Toolkit to perform the operations right from the Windows server.  The script was fairly straightforward, although we had to learn a little about PowerShell syntax along the way.  I've included a sanitized script below that does basically what I need.

$arrayname = "IP address or FQDN of array management address"
$nm_uid = "admin"
$nm_password = ConvertTo-SecureString -String "admin" -AsPlainText -Force
$nm_cred = New-Object -TypeName System.Management.Automation.PSCredential -ArgumentList $nm_uid,$nm_password

# Import Nimble Tool Kit for PowerShell
Import-Module NimblePowerShellToolKit

# Connect to the array
Connect-NSGroup -group $arrayname -credential $nm_cred

# Look up the initiator group ID (remove the {}'s and substitute your own group name)
$initiatorID = Get-NSInitiatorGroup -name {name of initiator group} | select -expandproperty id

# Create 10 DATA Disks
for ($i=1; $i -le 10; $i++) {
    New-NSVolume -Name DATADISK$i -Size 1048576 -PerfPolicy_id 036462b75de9a4f69600000000000000000000000e -online $true
    $volumeID = Get-NSVolume -name DATADISK$i | select -expandproperty id
    New-NSAccessControlRecord -initiator_group_id $initiatorID -vol_id $volumeID
}

# Create 10 RECO Disks
for ($i=1; $i -le 10; $i++) {
    New-NSVolume -Name RECODISK$i -Size 1048576 -PerfPolicy_id 036462b75de9a4f69600000000000000000000000e -online $true
    $volumeID = Get-NSVolume -name RECODISK$i | select -expandproperty id
    New-NSAccessControlRecord -initiator_group_id $initiatorID -vol_id $volumeID
}

# Create 3 GRID Disks
for ($i=1; $i -le 3; $i++) {
    New-NSVolume -Name GRIDDISK$i -Size 2048 -PerfPolicy_id 036462b75de9a4f69600000000000000000000000e -online $true
    $volumeID = Get-NSVolume -name GRIDDISK$i | select -expandproperty id
    New-NSAccessControlRecord -initiator_group_id $initiatorID -vol_id $volumeID
}
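
Once the loops finish, a quick sanity check along these lines confirms everything was created and came online (this is a sketch; it assumes the volume objects returned by Get-NSVolume expose the same name, size and online attributes used above):

# Verify the new volumes exist and are online
foreach ($i in 1..10) { Get-NSVolume -name DATADISK$i | select name,size,online }
foreach ($i in 1..10) { Get-NSVolume -name RECODISK$i | select name,size,online }
foreach ($i in 1..3)  { Get-NSVolume -name GRIDDISK$i | select name,size,online }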

I also wrote a script, shown below, to delete the LUNs:

$arrayname = "IP address or FQDN of array management address"
$nm_uid = "admin"
$nm_password = ConvertTo-SecureString -String "admin" -AsPlainText -Force
$nm_cred = New-Object -TypeName System.Management.Automation.PSCredential -ArgumentList $nm_uid,$nm_password

# Import Nimble Tool Kit for PowerShell
Import-Module NimblePowerShellToolKit

# Connect to the array
Connect-NSGroup -group $arrayname -credential $nm_cred

# Look up the initiator group ID (remove the {}'s and substitute your own group name)
$initiatorID = Get-NSInitiatorGroup -name {name of initiator group} | select -expandproperty id


# Delete 10 DATA Disks
for ($i=1; $i -le 10; $i++) {
    Set-NSVolume -name DATADISK$i -online $false
    Remove-NSVolume -name DATADISK$i
}

# Delete 10 RECO Disks
for ($i=1; $i -le 10; $i++) {
    Set-NSVolume -name RECODISK$i -online $false
    Remove-NSVolume -name RECODISK$i 
}

# Delete 3 GRID Disks
for ($i=1; $i -le 3; $i++) {
    Set-NSVolume -name GRIDDISK$i -online $false
    Remove-NSVolume -name GRIDDISK$i 
}

Obviously you'll have to substitute some of the values such as $arrayname, $nm_uid, $nm_password and $initiatorID (make sure you remove the {}'s when you put your value in).  Storing the password in plain text like this is very insecure, but it was a quick and dirty solution at the time.  A better approach is to store the password in an encrypted file and read it back into a variable, as sketched below.  Or, if you don't mind being interactive, you can skip providing the credentials altogether and a dialog box will prompt you for them every time the script runs.
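
For what it's worth, here is a minimal sketch of the encrypted-file approach.  It assumes PowerShell 3.0 or later, the file path is just an example, and because Export-Clixml protects the password with DPAPI, the file can only be decrypted by the same user on the same machine:

# One-time setup, run interactively: save the array credential to an encrypted file
Get-Credential -UserName admin -Message "Nimble array credential" | Export-Clixml -Path C:\Scripts\nimble_cred.xml

# In the script: load the credential instead of hard-coding the password
$nm_cred = Import-Clixml -Path C:\Scripts\nimble_cred.xml
Connect-NSGroup -group $arrayname -credential $nm_cred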

It made the project go a lot faster, and hopefully you can use this to model scripts that do other things.  Essentially the entire command set of the Nimble array is exposed through the toolkit, so there isn't much you can do in the WebUI that you can't do here.  When you download the toolkit, there is a README PDF that walks through all the commands.  From within PowerShell, you can also get help for each of the commands.  For example:

PS C:\Users\esteed> help New-NSVolume

NAME
    New-NSvolume

SYNOPSIS
    Create operation is used to create or clone a volume. Creating volumes requires name and size attributes. Cloning
    volumes requires clone, name and base_snap_id attributes where clone is set to true. Newly created volume will not
    have any access control records, they can be added to the volume by create operation on access_control_records
    object set. Cloned volume inherits access control records from the parent volume.


SYNTAX
    New-NSvolume [-name] <String> [-size] <UInt64> [[-description] <String>] [[-perfpolicy_id] <String>] [[-reserve]
    <UInt64>] [[-warn_level] <UInt64>] [[-limit] <UInt64>] [[-snap_reserve] <UInt64>] [[-snap_warn_level] <UInt64>]
    [[-snap_limit] <UInt64>] [[-online] <Boolean>] [[-multi_initiator] <Boolean>] [[-pool_id] <String>] [[-read_only]
    <Boolean>] [[-block_size] <UInt64>] [[-clone] <Boolean>] [[-base_snap_id] <String>] [[-agent_type] <String>]
    [[-dest_pool_id] <String>] [[-cache_pinned] <Boolean>] [[-encryption_cipher] <String>] [<CommonParameters>]


DESCRIPTION
    Create operation is used to create or clone a volume. Creating volumes requires name and size attributes. Cloning
    volumes requires clone, name and base_snap_id attributes where clone is set to true. Newly created volume will not
    have any access control records, they can be added to the volume by create operation on access_control_records
    object set. Cloned volume inherits access control records from the parent volume.


RELATED LINKS

REMARKS
    To see the examples, type: "get-help New-NSvolume -examples".
    For more information, type: "get-help New-NSvolume -detailed".
    For technical information, type: "get-help New-NSvolume -full".

You can also use the -detailed parameter at the end to get a more complete description of each option.  Additionally, you can use -examples to see the commands used in real-world situations.  Have fun!


Windows Wifi troubleshooting tools

Have you ever tried connecting your laptop to a Wi-Fi network and for one reason or another it failed?  It can be extremely frustrating, even for a seasoned vet who knows their way around Windows.  The big problem is that you get virtually no information about the connection failure.  No logs, no error codes, nothing.

There is a reporting tool built into Windows 8 and newer that will compile a very detailed report showing each connection attempt, its status and a ton of other information.  Here's how to run the report and where it gets put:

 

  • Open a command prompt as administrator
  • Run the following command
    • netsh wlan show wlanreport
  • Note the path where the HTML file is generated.  It should be C:\ProgramData\Microsoft\Windows\WlanReport\wlan-report-latest.html
  • Open your favorite web browser and point it to that file.  Voila!
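
If you'd rather script the whole thing, here's a minimal sketch that should do it from an elevated PowerShell prompt, assuming the report lands in the default location noted above:

# Generate the WLAN report (requires elevation)
netsh wlan show wlanreport

# Open the generated report in the default browser
Invoke-Item 'C:\ProgramData\Microsoft\Windows\WlanReport\wlan-report-latest.html'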

 

Here’s a snippet of what some of the report looks like:

[Screenshot: a portion of the WLAN report showing the wireless session list]

There is a LOT more information below this, including extremely detailed information about all the network adapters on the system, the system itself and some script output.  If you click on an item in the session list above, it will take you to a detailed log of that session and why it was or wasn't able to connect.

 

Suffice it to say this is an invaluable tool to review logs and information all in one place.

Hardware Virtualized VMs on ODA – One Click!


I previously wrote an article on how to install Windows on a virtualized ODA.  In that article I stated that running Windows on an ODA was not supported.  I'm starting to lean away from that stance for a couple of reasons.  One of them is the continued stream of Oracle InfoDocs being written on how to run an HVM virtual machine on an ODA.  The other, and perhaps more compelling, reason is an excerpt from the oakcli command reference documentation, specifically the "-os" parameter of the oakcli configure vm command:

oakcli configure vm

Use the oakcli configure vm command to configure a virtual machine on Oracle Database Appliance Virtualized Platform and to increase or decrease resource allocation to user domains. You must restart the domain for the resource allocation change to take effect.

Syntax

oakcli configure vm name [-vcpu cpucount -maxvcpu maxcpu -cpuprio priority 
-cpucap cap -memory memsize -maxmemory max_memsize -os sys -keyboard lang -mouse 
mouse_type -domain dom -network netlist -autostart astart -disk disks -bootoption
bootstrap -cpupool pool -prefnode 0|1 -failover true|false][-h]

Parameters

name
    The name assigned to the virtual machine.

-vcpu cpucount
    Number of CPUs assigned to the virtual machine. The range is 1 to 72. This number depends on your Oracle Database Appliance configuration:

      • On Oracle Database Appliance X5-2, the range is from 1 to 72.
      • On Oracle Database Appliance X4-2, the range is from 1 to 48.
      • On Oracle Database Appliance X3-2, the range is from 1 to 32.
      • On Oracle Database Appliance Version 1, the range is from 1 to 24.

-maxvcpu maxcpu
    Maximum number of CPUs that the virtual machine can consume. The range is 1 to 72. This number depends on your Oracle Database Appliance configuration:

      • On Oracle Database Appliance X5-2, the range is from 1 to 72.
      • On Oracle Database Appliance X4-2, the range is from 1 to 48.
      • On Oracle Database Appliance X3-2, the range is from 1 to 32.
      • On Oracle Database Appliance Version 1, the range is from 1 to 24.

-cpuprio priority
    Priority for CPU usage, where larger values have higher priority. The range is 1 to 65535.

-cpucap cap
    Percentage of a CPU the virtual machine can receive. The range is 10 to 100.

-memory memsize
    Amount of memory given to the virtual machine: (1 to 248)G to (1 to 760)G or (1 to 253952)M to (1 to 778240)M, based on RAM. The default unit is M.

-maxmemory max_memsize
    Maximum amount of memory allowed for the virtual machine: (1 to 248)G to (1 to 760)G or (1 to 253952)M to (1 to 778240)M, based on RAM. The default unit is M.

-os sys
    Operating system used by the virtual machine (WIN_2003, WIN_2008, WIN_7, WIN_VISTA, OTHER_WIN, OL_4, OL_5, OL_6, RHL_4, RHL_5, RHL_6, LINUX_RECOVERY, OTHER_LINUX, SOLARIS_10, SOLARIS_11, OTHER_SOLARIS, or NONE).

-keyboard lang
    Keyboard used by the virtual machine (en-us, ar, da, de, de-ch, en-gb, es, et, fi, fo, fr, fr-be, fr-ca, hr, hu, is, it, ja, lt, lv, mk, nl, nl-be, no, pl, pt, pt-br, ru, sl, sv, th, or tr).

-mouse mouse_type
    Mouse type used by the virtual machine (OS_DEFAULT, PS2_MOUSE, USB_MOUSE, or USB_TABLET).

-domain dom
    Domain type from the following options:

      • Hardware virtualized guest (XEN_HVM)

        – The kernel or operating system is not virtualization-aware and can run unmodified.

        – Device drivers are emulated.

      • Para virtualized guest (XEN_PVM)

        – The guest is virtualization-aware and is optimized for a virtualized environment.

        – PV guests use generic, idealized device drivers.

      • Hardware virtualized guest (XEN_HVM_PV_DRIVERS)

        The PV drivers are hypervisor-aware and significantly reduce the overhead of emulated device input/output.

      • Hardware virtualized guest (UNKNOWN)

-network netlist
    MAC address and list of networks used by the virtual machine.

-autostart astart
    Startup option for the virtual machine (always, restore, or never).

-disk disks
    List of disks (slot, disktype, and content) used by the virtual machine.

-bootoption bootstrap
    Boot option used to bootstrap the virtual machine (PXE, DISK, or CDROM).

-cpupool pool
    Named CPU pool assigned to the virtual machine.

-prefnode 0|1
    Preferred node on which the virtual machine will attempt to start (Node 0 or Node 1). This parameter is only valid for virtual machines created in shared repositories.

-failover true|false
    Allow (use the keyword "true") or disallow (use the keyword "false") the virtual machine to start or restart on a node other than the node defined by the -prefnode parameter. This parameter is only valid for virtual machines created in shared repositories.

-h
    (Optional) Display help for using the command.

Note the selection of operating systems you have to choose from.  The list includes the following (a sample oakcli invocation follows the list):

  • Windows 2003
  • Windows 2008
  • Windows 7
  • Windows Vista
  • Other Windows
  • Oracle Linux 4, 5 and 6
  • Red Hat Linux 4, 5 and 6
  • Linux Recovery
  • Other Linux
  • Solaris 10 and 11
  • Other Solaris
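
Purely as an illustration of the syntax above (the VM name and sizing values are made up, and the domain has to exist already since oakcli configure vm only modifies an existing VM), flagging a user domain as a Windows HVM guest would look something like this:

oakcli configure vm winvm1 -os WIN_2008 -domain XEN_HVM -vcpu 2 -maxvcpu 4 -memory 8G -maxmemory 16G -autostart always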

 

To me, this is a strong indicator that you should be able to run a VM that isn't created from a template, including Windows!  It gets even better.  I stumbled across an InfoDoc (2099289.1) created by a gentleman named Ruggero Citton.  In that document, he shows how to deploy an HVM virtual machine on an ODA with a single command.  This automates all the manual steps you used to have to perform in order to run an HVM virtual machine, including manually creating virtual disk images and editing the VM's vm.cfg file.

 

Check out the InfoDoc for more information, including the Perl script that does all of the automation.  I was able to follow his instructions and successfully created a Windows 2008 R2 VM.  Based on the parameters of the oakcli configure vm command, I'm a lot more comfortable with at least telling customers about this capability.  I still want to confirm that a VM created in this fashion wouldn't jeopardize a customer's support from Oracle; I'll post an update when I find out for sure.

Are IT pillars narrowing?

Back in the 1990s when I got into the IT field, things were obviously different than they are now.  The ways in which they are different are somewhat alarming to me.  I'm talking about IT specializations such as network, server, desktop and virtualization engineers, for example.

I’m starting to see focus areas narrow to specific areas of expertise as things get more and more complex. Not too long ago if you were a server admin, chances are you worked on anything from racking and stacking the server to setting up ports on the switch to allocating storage and installing the OS.

Not so much these days. We have facilities engineers who rack the server and run the cables. We have a network team split into multiple sub teams based on function who get the network portion done. Then we have a SAN team who provisions the storage. Finally your server admin does the install and patches.

To some degree this is a good thing.  There are more and more complex technologies that require a higher level of training and knowledge to effectively configure and support.  Still, it makes me somewhat nervous that I might be required to limit my expertise to just a small focus area.

What happens when a paradigm shift occurs in the tech world?  Take Novell NetWare, for example.  It used to be almost a given that most companies used it for file and print.  Some of you reading this article today may not even know what NetWare is.  If I were a NetWare admin, I'd be kinda screwed right about now, wouldn't I?

Maybe I'm just being paranoid, but maybe not.  I personally have branched out into a number of fields including OS support, virtualization, storage and networking.  I enjoy all of these technologies, and it makes me somewhat more valuable because I can integrate them better than someone who focuses on only one pillar.  Even though I do it at the expense of not being as "deep" in any particular field, I'm okay with this because it lets me explore more and be better prepared for that big paradigm shift that's out there somewhere.

What are your thoughts on this?  I’d love to hear your point of view!

Join a Solaris 10 or 11 server to a Microsoft AD domain


I found a great InfoDoc that explains how to join an Oracle Solaris 10 or 11 server to Microsoft Active Directory for authentication.  The Solaris 10 procedure is not supported by Oracle, so you may want to stick with third-party tools like Centrify or Likewise there, if only to have a support mechanism in the event of any problems.

The Solaris 11 solution, however, is supported via Kerberos, and I know a few customers of my own who would be interested in implementing this!

The InfoDoc number is 1485462.1, and it contains references to other documents within MOS, so you'll need an account to access them.

Run a Windows guest VM on an ODA


The Oracle Database Appliance was originally released back in 2011 to bridge the gap between the Exadata and smaller configurations.  The first version of the ODA consisted of 2 compute nodes and a shared storage shelf.  The compute nodes had 2x 6-core processors and 96GB of memory.  The software started out as a bare metal deployment only and eventually grew to offer either that or a virtualized image based on Oracle VM for x86.  This was done to allow customers to more fully leverage the cores and memory in the ODA that weren't used by the database.  The first release to support this configuration was version 2.5, which came out around 2013.

For those of you who use an ODA (Oracle Database Appliance) in your workplace and have deployed the virtualized image to leverage the capabilities of OVM, you know that Oracle only offers templates based on Oracle Linux.  The entire OVM command set does not exist on the ODA as it does in a normal deployment of OVM with OVM Manager.  The interface used to interact with OVM on the ODA is the OAKCLI command, short for Oracle Appliance Kit Command Line Interface, and it's used to manage almost every aspect of the configuration.  There is no OVM Manager CLI, nor is there a GUI/BUI short of the configuration manager used to deploy the database, so we're limited to what OAKCLI offers.
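
For example, and hedging a bit here since subcommands and output vary by oakcli version, day-to-day inspection of the virtualized platform is done with commands like these rather than with any OVM Manager tooling:

oakcli show vm     # list the virtual machines defined on the appliance
oakcli show repo   # list the shared repositories that back them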

If you run a mixed shop consisting of both Linux and Windows, this article may be of some use to you.  In order to run Windows (or really anything other than Oracle Linux or Oracle Solaris) inside of OVM, you have to create what's called an HVM virtual machine, short for Hardware Virtualized Machine.  It's similar to how VMware works in that everything down to the BIOS and chipset is virtualized and presented to the OS; the operating system in this case has no idea it's being virtualized.  Oracle Linux and Oracle Solaris, on the other hand, when installed inside a PV (paravirtualized) VM, are fully aware of the fact that they are in a VM and take advantage of it inside the kernel to talk to the hardware in a more efficient manner.

What does all this mean to me?  It means that I can install any OS that is a supported OVM guest OS, not just the templates that are available for download and use on an ODA.  There are a lot of manual steps involved, and precious little of it uses the OAKCLI framework.  One thing you do need to understand is that this installation method is not in any way supported by Oracle.  Your mileage may vary when trying to get support for any issues related to guest performance, configuration or stability.  Having said that, Oracle has created a short tutorial on how to do this; the InfoDoc number is 1524138.1.

The thickening debate over thin provisioning

I've been working with different storage vendors for a number of years now.  Most if not all of them now support thin provisioning as a way to fully maximize your storage utilization.  Before thin provisioning came along, you had to pre-allocate X amount of storage to a given application, and it was then usable only by that application.  This proved to be potentially very wasteful: most admins don't allocate only the amount of storage they need, they project growth over at least a 2-3 year period, so the "left over" space that wasn't being used at the time was locked away and basically wasted.

 

Enter thin provisioning.  Now we have a way of overcommitting our storage to get better utilization out of what we have.  For example, let's say you allocate a 100GB thin provisioned LUN to a server or VM.  Initially that 100GB LUN occupies zero space on the SAN.  At some point the LUN is used to store data on the server; let's say the server has written 20GB of data to the volume.  On the SAN side, the only blocks of data that are used at this point are the 20GB that have been written.  The remaining 80GB is still unused on the SAN and can be allocated to other volumes.  Now here's where it gets tricky and you have to be careful.  Let's say we have a SAN with 1TB capacity (I know, small by today's standards, but work with me on this).  Here's an example of storage allocation to illustrate my point:

  • 5x 100GB thin provisioned LUNs -> Windows server
  • 1x 300GB thin provisioned LUN -> Linux server
  • 2x 400GB thin provisioned LUNs -> VMware environment

Wait a minute, that's 1.6TB and you told me I only had 1TB!  This is exactly what I'm referring to when I say you have to be careful.  In the example above, we've actually given out more storage than we have.  This magic is possible because the SAN tells the "clients" that they have whatever size LUN we've given them, and they all believe they actually have that much space.  In actuality, we have a pool of storage from which we allocate blocks to each of the LUNs we've presented.  Don't worry, this is actually a fairly common scenario in the real world.  The differentiator is that it's done with a full understanding of how the data is used, combined with close monitoring of the available space on the SAN.
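
To make the arithmetic explicit, here's a trivial PowerShell sketch of that sanity check using the hypothetical numbers above (counting 1TB as 1000GB to keep the figures round, as in the text):

# Space promised to hosts versus real capacity in the pool
$allocatedGB = (5 * 100) + 300 + (2 * 400)    # 1600GB presented to the clients
$physicalGB  = 1000                           # 1TB actually installed in the SAN

"Allocated: {0}GB  Physical: {1}GB  Overcommit: {2:N1}x" -f $allocatedGB, $physicalGB, ($allocatedGB / $physicalGB)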

 

Here's a quick explanation of what happens if you don't do it right.  As blocks of data are written to the LUNs, storage blocks are allocated on the SAN and marked as used, and the pool of available storage shrinks by that much.  At some point, if you continue to use the storage, you will find yourself in the unenviable position of running out of space on the SAN.  What's worse, all of the consumers of the storage still think they have more space to use, so they ALL run out of space at the same time.  We all know what happens when Windows can't write to the C: drive, Linux can't write to the root filesystem or VMware can't write to its datastore.  You're blessed with a blue or purple screen of death or kernel panics on all of your clients.  It's not the servers' fault; you told them they had the storage space.  It's not the SAN's fault; you told it to allocate that much storage to each server.  Guess whose fault it is?  You guessed it: YOURS!  This is a very real possibility, and I've worked with multiple customers who have run into this problem.  In some cases, depending on how many clients were impacted and how hard recovery was, or whether recovery was even possible, unlucky admins have experienced an unplanned RGE.  RGE stands for Resume Generating Event.  Ya don't wanna be that guy…


There is one nasty side effect of thin provisioning we haven't even discussed yet.  The thickening… oooh, that sounds scary!  Albeit not as scary as an RGE, it's still something to be aware of.  Basically, it has to do with what happens when you thin provision a LUN and the consumer has been using it for a while.  When you first assign thin provisioned storage to a consumer, it occupies no space.  In my example, I have an HP P4000 SAN connected to a VMware ESXi 5.1 server with a Windows 2008 R2 VM running on it.  I've assigned a 1TB thin provisioned LUN from the P4000 directly to the VM via an RDM (Raw Device Mapping) in VMware.  Here's a picture of what your storage looks like at this point:

[Diagram: starting point, with the 1TB thin provisioned LUN assigned and no space used at any layer]

At this point, everything's fine.  From all perspectives, I have 1TB of space available on my shiny new LUN and no space has been used yet.  Time goes by and I start using my storage in Windows.  I've copied about 850GB of data to the LUN, consisting of some ISO images, a few database backups and some program dumps.  Now here's what things look like storage-wise:

[Diagram: 850GB written, so Windows, VMware and the SAN all report roughly 850GB used]

OK, nothing unexpected here.  I've copied about 850GB of data to the LUN and, as expected, I'm using up about 850GB of storage on my SAN.  VMware also tracks how much space I'm using, and it reports the same amount.  The Windows VM also thinks it's using 850GB.  So far so good!  Now, more time has gone by and I've deleted some unneeded files: I got rid of some of the program dumps that are no longer valid and a few database backups as well.  Now I'm down to about 400GB used.  Here's where things tend to sneak up on you and make life interesting.  Windows thinks I'm only using 400GB of space, rightfully so, as I've deleted some files.  However, from the SAN's perspective, I'm still using 850GB.  How is this possible?  I deleted the files in Windows and that space is no longer needed.  Well, the SAN doesn't know that.  To get a better understanding of why there is now a discrepancy, we first need to discuss what actually happens when you delete files in Windows.

 

This is a very simplified description of what happens; I'm not going to go into too much detail here except to cover the basic principle of what happens when you delete files.  It is out of the scope of this document to talk about things like media descriptors and byte offsets or all of the other minute details of NTFS and how it works.  When you delete a file, you're actually just de-referencing the data that is stored on the disk.  There is an MFT (Master File Table) that keeps track of where all the files reside on disk, as well as a mapping or offset to the first block of data in each file.  There is also a cluster bitmap, a table that is responsible for letting Windows know where it can write new blocks of data and where it can't.  When a file is deleted in Windows, it is marked as deleted in the MFT.  The cluster bitmap is also updated to mark those previously allocated blocks as eligible to be written to again; this is why deleting files is so fast.  It would be terribly inefficient and wasteful for Windows to actually go out and write zeros to each and every block of a file that you deleted, wouldn't it?  It would take minutes to delete even a 50GB file.  So even though the file was marked as deleted in the MFT, the actual data that made up the contents of the file is still sitting on disk; it's just currently inaccessible through Windows Explorer.  Technically speaking, if you wanted to recover that file and nothing had been written to the disk since you deleted it, you could theoretically recreate the entry in the MFT if you knew the initial offset of the first block of the file's data.

Back to our scenario.  Here's a depiction of what things look like at this point:

[Diagram: after deleting 450GB in Windows, Windows reports about 400GB used while the SAN still reports 850GB]

Now that we understand in a basic fashion what happens when a file is deleted in Windows, we can move on to why the heck the SAN still thinks that data is there and being used.  Understand that the storage layer doesn't have any visibility into Windows, VMware or any other layer above the point where it presented the LUN.  It doesn't know that my Windows VM just deleted 450GB of data from the LUN.  It just knows that, over time, it's been asked to write blocks of data totaling 850GB.  So given what we now know happens when you delete a file in Windows, it makes sense that the storage layer still thinks it needs to hold on to the full 850GB of data.
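
As a purely conceptual sketch (a toy model, not how any real array or filesystem is implemented), here's one way to picture why the SAN's number never goes down on its own: a guest-side delete only updates the guest's own metadata, and nothing about that operation ever reaches the array:

# Toy model: 1 "block" = 1GB; the guest and the SAN each keep their own tally
$guestUsedGB = 0    # what the Windows cluster bitmap says is in use
$sanUsedGB   = 0    # blocks the SAN has been asked to write so far

function Write-GuestData([int]$GB) {
    # A write travels all the way down the stack, so both tallies grow
    $script:guestUsedGB += $GB
    $script:sanUsedGB   += $GB
}

function Remove-GuestData([int]$GB) {
    # A delete only touches the guest's MFT and cluster bitmap;
    # without UNMAP/TRIM the SAN never hears about it
    $script:guestUsedGB -= $GB
}

Write-GuestData 850     # copy the ISOs, backups and dumps
Remove-GuestData 450    # delete some of them in Windows

"Windows thinks it is using $guestUsedGB GB"    # 400
"The SAN thinks it holds $sanUsedGB GB"         # 850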

NOTE: VMware's VAAI UNMAP storage API, as well as the VMware Tools installed inside the guest VM, can give VMware some insight into what's actually going on inside Windows.  In the case of VAAI UNMAP, it can potentially reverse some of the negative effects of thin provisioning.  For the purpose of our discussion, however, let's suspend disbelief and assume that we don't have VAAI or the VMware Tools installed and configured.
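
As an aside, on a newer guest OS (Windows 8 / Server 2012 or later, so not the 2008 R2 VM in this example) with a storage stack that passes UNMAP all the way down, you can nudge space reclamation along from inside Windows.  A rough sketch:

# 0 means delete notifications (TRIM/UNMAP) are enabled on this system
fsutil behavior query DisableDeleteNotify

# Ask NTFS to re-send UNMAP/TRIM for the free space on drive D:
Optimize-Volume -DriveLetter D -ReTrim -Verbose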

To take this scenario one step further, let's look at one more tweak.  Say we've deleted the data and are now sitting at 400GB of used space from Windows' perspective.  What do you suppose would happen if we wrote another 500GB of new data in Windows?  Recall what happened when we deleted the other data in the first place: the blocks of data are simply de-referenced, and the SAN doesn't know anything about data being deleted.  Since the SAN thinks that the old data still exists, including the data we deleted in Windows, new writes would have to go somewhere else, wouldn't they?  Wait a minute: we had 850GB of data to begin with, then we deleted 450GB, but we added 500GB of new data.  By that math, that's 1350GB (the 850GB the SAN already thinks is in use plus 500GB of new writes), which is more than the 1TB that we have!  I'm sure you're asking yourself at this point, wouldn't I run out of space?  The answer is no, and I'll tell you why: because Windows told the SAN where to write the new 500GB of data.  Some of the blocks will be written where the deleted data was previously stored, and some will be written to empty blocks.  If you're thoroughly confused at this point, allow me to show you visually how this is possible.  First, though, you need to understand that the blocks Windows writes are logically mapped to blocks on the SAN.  A block written to storage location 0x8000 according to the Windows cluster bitmap may actually be written to storage location 0x15D1DC7 according to the SAN.  They both maintain a table of blocks and their corresponding locations, but they are two different tables, and those tables are mapped dynamically as data is written.  This diagram shows how NTFS maps data blocks as they are written, deleted and re-written:

[Diagram: NTFS block mapping as data is written, deleted and re-written]

The graphic above illustrates two important things.  The first is that even though we thought we had used up 850GB of the 1TB on the SAN, in reality we still had more space available in Windows because we deleted files.  With the diagram above I've shown how Windows tells the storage layer, "I know I told you to write these blocks before; forget that and write these new blocks there instead."  We're still limited to 1TB of total space, but because of how the blocks are logically mapped from Windows to the SAN, we still have room to store our data.  The second thing I've illustrated above is the thickening concept this article is all about.  If you think of a high-water mark on the SAN as data gets written, deleted and re-written, you start to understand why thin provisioned volumes eventually get "thickened" over time.  Because of this, you should have a set of guidelines in place to help you determine when it's appropriate to thin provision and when to use thick volumes in the first place.  One primary rule of thumb that I use is this: if the LUN will be written to frequently, it's probably not a good candidate for thin provisioning.  You won't really gain much space savings in the end, and you could potentially find yourself in that ugly out-of-space problem I mentioned early on.

Hopefully this has shed some light on what happens over time with thin provisioned volumes and what can happen if you’re not paying attention.