Nimble CASL filesystem overview

As a Nimble partner, Collier IT has worked with a number of customers to install, configure, and deploy Nimble arrays.  We also have a CS-300 and a CS-500 in our lab, so I get to use them on a daily basis.  I wanted to take some time to give a quick overview of how Nimble’s CASL filesystem works and how it differentiates itself from other storage vendors’ technologies.

 

These days, you can’t throw a rock in the Storage industry without hitting about a dozen new vendors every month touting their flashy new storage array and how it’s better than all the others out there.  I’ve used a lot of them and they all seem to share some common pitfalls:

 

 


  • Performance drops as the array fills up, sometimes at only 80% of capacity or less!

Nobody in their right mind would intentionally push their SAN up to 100% utilization.  It’s not that hard, though, to lose track of thin provisioning and how much actual storage you have left.  As most storage arrays get close to full capacity, they start to slow down.  The reason for this comes down to how the filesystem architecture handles free space and how it writes to disk.  Consider NetApp’s WAFL filesystem, for example.  It is based on the WIFS (Write In Free Space) concept.  As new blocks are written to disk, instead of overwriting blocks in place (which adds a TON of overhead), the write is redirected to a new location on disk and written as a full stripe across the disk group.  This allows for fast sequential reads because the blocks are laid out in a very contiguous manner.  Once the amount of free space starts to diminish, though, what’s left is very scattered across different locations on the array.  It’s not contiguous anymore, and significantly more time is spent seeking during reads and writes, slowing down the array.

One of the big benefits of the CASL filesystem is that even though it is also a WIFS filesystem, it does not fill holes.  Instead, it uses a lightweight sweeping process that runs in the background to consolidate scattered holes into contiguous free space that can accept full stripe writes.  The filesystem is designed from the ground up to run these sweeps very efficiently, and it also utilizes flash to speed up the process even more.  What this does is allow the array to ALWAYS write in a full stripe.  What’s more, CASL is also able to combine blocks of different sizes into a full stripe.  The big benefit here is that you get a very low overhead method of performing inline compression that doesn’t slow the array down.
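
To make the idea concrete, here is a minimal sketch of a background sweep, assuming a hypothetical stripe layout and a made-up utilization threshold; none of the names or numbers below come from Nimble’s implementation.  Sparse stripes have their live blocks salvaged and repacked into new full stripes, so foreground writes never have to fill holes.

```python
STRIPE_SIZE = 8          # blocks per stripe (illustrative only)

class Stripe:
    def __init__(self, blocks):
        self.blocks = list(blocks)          # None marks a hole (dead block)

    def live_blocks(self):
        return [b for b in self.blocks if b is not None]

    def utilization(self):
        return len(self.live_blocks()) / STRIPE_SIZE


def sweep(stripes, threshold=0.5):
    """Reclaim stripes whose utilization fell below `threshold`.

    Live blocks from sparse stripes are gathered and repacked into new
    full stripes; the old sparse stripes become wholly free space.
    """
    keep, salvaged = [], []
    for s in stripes:
        if s.utilization() < threshold:
            salvaged.extend(s.live_blocks())   # still-valid data to repack
        else:
            keep.append(s)

    # Repack salvaged blocks into full stripes (the last one may be short).
    for i in range(0, len(salvaged), STRIPE_SIZE):
        chunk = salvaged[i:i + STRIPE_SIZE]
        chunk += [None] * (STRIPE_SIZE - len(chunk))
        keep.append(Stripe(chunk))
    return keep
```

The point of the sketch is only that reclaiming holes happens out of band, so incoming writes always land as full stripes regardless of how full the array is.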

 

 


  • Write performance is inconsistent, especially when dealing with a lot of random write patterns

The CASL filesystem has been designed from the ground up to deal with the Achilles’ heel of most arrays: the cache design.  Typical arrays will employ flash or DRAM as a cache mechanism for a fast write acknowledgement.  This is all well and good until you get into a situation where you have a lot of random reads and writes over a sustained period of time at a high rate of throughput.  That describes most I/O workloads now, with virtualization and storage consolidation.  We aren’t just streaming video anymore, folks; we’re servicing dozens of storage profiles simultaneously, each with its own read and write characteristics.  The problem with other storage arrays’ caching mechanisms is that once this sustained load gets to the point where the controller(s) can’t flush the cache to spinning disk as fast as the data is coming in, you get throttled.

Nimble has a different approach to caching that was designed from the ground up to be not only scalable but media agnostic.  It doesn’t matter if you’re writing to spinning disk or SSD.  Here’s a quick breakdown (a rough code sketch follows the list):

  1. Write is received at storage array and stored in the active controller’s NVRAM cache
  2. Write is mirrored to standby controller’s NVRAM
  3. At this point, the write is acknowledged
  4. Write is shadow copied to DRAM
  5. While the data is in DRAM, it is analyzed for compression.  If the data is a good candidate for compression, the array determines the best compression algorithm to use for that type of data and it is compressed.  If it’s not a good candidate for compression (JPG for example) then it will not be compressed at all.
  6. Data is grouped into 4.5 MB stripes and then written to disk as a full RAID stripe write operation
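
Here is a highly simplified sketch of that six-step path.  The class and method names, the compression heuristic, and the acknowledgement placeholder are all stand-ins I made up for illustration, not Nimble’s actual code.

```python
import zlib

STRIPE_BYTES = 4_500_000   # ~4.5 MB full-stripe target from step 6


def looks_compressible(block: bytes) -> bool:
    # Crude stand-in for the array's heuristic: sample-compress the first
    # 4 KB and see whether it shrinks meaningfully (a JPG typically won't).
    sample = block[:4096]
    return len(zlib.compress(sample)) < 0.9 * len(sample)


class WritePath:
    def __init__(self, disk):
        self.nvram, self.peer_nvram, self.dram = [], [], []
        self.disk = disk                  # anything exposing write_full_stripe()
        self.pending = []                 # (possibly compressed) blocks awaiting a stripe

    def handle_write(self, block: bytes) -> None:
        self.nvram.append(block)          # 1. land in active controller NVRAM
        self.peer_nvram.append(block)     # 2. mirror to standby controller NVRAM
        self.acknowledge()                # 3. host sees the write as complete here
        self.dram.append(block)           # 4. shadow copy to DRAM

        # 5. compress only data that looks like a good candidate
        data = zlib.compress(block) if looks_compressible(block) else block
        self.pending.append(data)

        # 6. once ~4.5 MB has accumulated, write one full RAID stripe
        if sum(len(b) for b in self.pending) >= STRIPE_BYTES:
            self.disk.write_full_stripe(self.pending)
            self.pending.clear()

    def acknowledge(self) -> None:
        pass   # placeholder for the iSCSI/FC acknowledgement to the host
```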

Here, the big performance benefits come mainly from the reduction in I/O to spinning disk and targeted inline compression.  This is achieved because we’re not blindly flushing data to disk as cache mechanisms fill up.  Instead, we analyze the writes in memory, compress them inline, and write them out to disk in a much more efficient manner.  The compression leverages the processing power on the array, which is capable of compressing at 300 MB/sec per core or faster.  As a result, you see orders of magnitude fewer IOPS from the controller to the disk, thanks to both the compression of the data and the way it is written.  What would have been maybe 1000 IOPS can be reduced to as little as 10 IOPS in some cases!  This is why Nimble doesn’t have to spend a lot of money on 15k or even 10k SAS drives on the back end.
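
As a back-of-the-envelope illustration of that claim, here is a tiny calculation using assumed figures (4 KB host writes, 2:1 compression, 4.5 MB full stripes); the actual reduction depends entirely on the workload.

```python
# All figures here are assumptions for illustration, not measured Nimble numbers.
host_ios          = 1000            # random 4 KB writes from hosts
write_size        = 4 * 1024        # bytes per host write
compression_ratio = 2.0             # assumed 2:1 inline compression
stripe_bytes      = 4_500_000       # one full-stripe write to the back end

incoming    = host_ios * write_size                     # ~4.1 MB of host data
on_disk     = incoming / compression_ratio              # ~2.0 MB after compression
backend_ios = max(1, round(on_disk / stripe_bytes))     # full-stripe writes needed

print(f"{host_ios} host IOs -> about {backend_ios} back-end stripe write(s)")
```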

To protect against data loss before the data is written to disk, both controllers have super capacitors that keep the contents of NVRAM safe until power is restored, at which point the data is written to disk.  Redundant controllers also guard against data corruption or loss in the event of a primary controller failure.


  • Poor SSD write life

A common problem since SSDs came about is that they eventually “wear out”: the NAND flash substrate can only sustain a finite number of erase cycles before it becomes unusable.  Without getting into all the details of things like write amplification, garbage collection, and flash cell degradation, understand that the less you write to an SSD, generally the better off it will be.  Due to the nature of how typical arrays utilize SSD as a cache layer, there will inherently be a lot of writing.

As I described earlier when talking about write performance, Nimble designed their filesystem from the ground up to minimize the amount of writes to SSD, or to disk in general.  A side benefit is that, because fewer writes are needed, they also don’t need to use more expensive SLC SSDs in their arrays.


  • My read performance sucks, especially with random reads

Typical storage arrays employ multiple caching layers to help boost read performance.  It is understood that the worst case scenario is having to read all data from slow spinning disk.  Even the fastest 15k SAS drives can only sustain about 150-170 IOPS per drive.  So the standard drill is: when a read request comes in, the cache layer is queried for the data, and if it exists there and hasn’t been modified, it is sent to the client.  This is the fastest read operation.  Next you go to the secondary cache, typically SSD(s).  The same thing happens: if the data is there, it’s read from the slower SSD and served up to the client.  Finally, if the data isn’t cached, or if it has changed since it was cached, you experience a “cache miss” and the data is read from slow spinning disk.

Nimble is smarter about how it handles caching.  First NVRAM is checked for the data, since it holds the most recently written blocks.  Then DRAM is checked.  Flash cache is the next step: if the data is found there, it is checksummed and decompressed on the fly, then returned.  Finally, spinning disk serves up the data if none of it is in cache.  The beautiful thing about CASL is that it keeps track of read patterns and decides whether data that was just served up from disk should be held in a higher level cache.
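
Here is a rough sketch of that tiered lookup, with a simple hit counter standing in for whatever heat tracking CASL actually does; the names, promotion threshold, and checksum choice are all illustrative assumptions on my part.

```python
from zlib import compress, crc32, decompress

def read_block(addr, nvram, dram, flash, disk, read_counts, hot_threshold=3):
    """Toy tiered lookup: NVRAM -> DRAM -> flash cache -> spinning disk,
    with promotion of frequently read blocks into flash."""
    # 1. Recently written data may still be sitting in NVRAM.
    if addr in nvram:
        return nvram[addr]
    # 2. Then the DRAM cache.
    if addr in dram:
        return dram[addr]
    # 3. Then flash cache: verify the checksum and decompress on the fly.
    if addr in flash:
        payload, checksum = flash[addr]
        assert crc32(payload) == checksum, "flash cache corruption detected"
        return decompress(payload)
    # 4. Cache miss: read from spinning disk.
    data = disk.read(addr)
    # Track how often this block is read and promote it into flash cache
    # once it crosses the (made-up) hot threshold.
    read_counts[addr] = read_counts.get(addr, 0) + 1
    if read_counts[addr] >= hot_threshold:
        payload = compress(data)
        flash[addr] = (payload, crc32(payload))
    return data
```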

I haven’t talked about all of the technologies that CASL employs let alone some of the other benefits of owning Nimble storage.  Suffice it to say I’m excited about the future of Nimble.
