
Host-based Caching with QLogic FabricCache

Introduction

In a previous article, the topic of discussion was leveraging flash for a virtual infrastructure. Some environments today are still not fully virtualized; this is typically because the last workloads to be virtualized are the systems with high-horsepower application requirements, such as SQL Server and Oracle. These types of workloads can benefit greatly from flash-based caching. Many hardware/software combo solutions exist for in-OS caching in non-virtualized environments; those will be covered in a separate article. This article focuses on QLogic's host bus adapter-based caching solution, FabricCache.

How it works

QLogic’s take on using flash technology to accelerate SAN storage is to place the cache directly on the SAN, inside their HBAs, treating cache as a shared SAN resource.

Overview

The QLogic FabricCache 10000 Series Adapter consists of an FC HBA and a PCIe flash device. The two cards are ribbon-connected, giving the HBA direct access to the flash device; the PCIe bus provides only power to the flash card. While some consider this a two-card solution, keep in mind that a host using a PCIe flash adapter to accelerate data traffic would need an HBA anyway. That said, an existing HBA cannot be re-used with the FabricCache solution.

Caching

A LUN is pinned to a specific HBA/cache – all hosts that need the cached data for that LUN access it from the FabricCache adapter to which the LUN is pinned. For example: Server_A and Server_B are two ESX hosts in a cluster. Datastore_A is on LUN_A and Datastore_B is on LUN_B. LUN_A is tied to HBA_A in Server_A, while LUN_B is tied to HBA_B in Server_B. All of the cached data for LUN_A lives in the cache on HBA_A, so if a VM on Server_B is running on LUN_A, its cache resides on HBA_A in Server_A and Server_B reaches that cache across the SAN. A cache miss results in an access to the storage array. For this reason, all adapters must be in the same zone so they can see the other caches in the FabricCache cluster.
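To make the LUN-pinning behavior concrete, here is a minimal sketch in Python (hypothetical names and data structures, not QLogic's actual logic) of how a read from any host is routed to the cache on the adapter that owns the LUN, with a miss falling back to the array:

```python
# Conceptual sketch of FabricCache-style LUN pinning (hypothetical names, not
# QLogic's implementation): every LUN is owned by exactly one HBA's cache, and
# all hosts route reads for that LUN to the owning cache.

# LUN -> owning HBA, as in the Server_A / Server_B example above
lun_owner = {"LUN_A": "HBA_A", "LUN_B": "HBA_B"}

# per-HBA cache contents: (lun, block) -> data
hba_cache = {"HBA_A": {}, "HBA_B": {}}

def read_block(requesting_host, lun, block, array):
    """Route a read to the HBA cache that owns the LUN; a miss goes to the array."""
    owner = lun_owner[lun]              # lookup always happens on the owning HBA,
    cache = hba_cache[owner]            # even if the request comes from another host
    if (lun, block) in cache:
        return cache[(lun, block)], f"hit on {owner}"
    data = array[(lun, block)]          # cache miss -> read from the storage array
    cache[(lun, block)] = data          # populate the owner's cache for next time
    return data, f"miss, fetched from array via {owner}"

# A VM on Server_B reading from LUN_A still uses HBA_A's cache across the fabric.
array = {("LUN_A", 0): b"vmdk-block-0"}
print(read_block("Server_B", "LUN_A", 0, array))   # miss, then populated
print(read_block("Server_B", "LUN_A", 0, array))   # hit on HBA_A
```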

[Diagram: FabricCache cluster showing LUN-to-HBA cache ownership across two ESX hosts]

In a FabricCache cluster, all data access is in-band between the HBAs on the fabric, with no cache visibility to the host operating system, the hypervisor, or the HBA driver. FabricCache is a write-through cache: all writes pass through the cache, but the array must acknowledge each write before the operating system is told the operation is complete. QLogic justifies the lack of write caching by noting that roughly 80% of SAN traffic is reads, so a read cache accelerates the largest portion of SAN traffic.
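As a rough illustration of that write-through behavior (a sketch only, not QLogic code), the cache is updated for future reads, but the acknowledgement to the operating system is gated on the array:

```python
# Write-through sketch: the cache is kept coherent, but the OS only gets its
# acknowledgement after the array itself has acknowledged the write.
def write_through(cache, array, lun, block, data):
    cache[(lun, block)] = data      # keep the cache coherent for future reads
    array[(lun, block)] = data      # the write always goes to the array...
    array_ack = True                # ...and the array's ack is what gates completion
    return array_ack                # only now is the OS told the write is done

cache, array = {}, {}
print(write_through(cache, array, "LUN_A", 0, b"new-data"))   # True, after array update
```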

The technology can be managed from the command line using the QLogic Host CLI, or through a GUI called QConvergeConsole. A plug-in is also available for VMware’s vCenter. In these interfaces, an administrator can target specific LUNs to cache and assign each LUN to the specific HBA cache that should serve it.

When and Why FabricCache

In my opinion, the biggest reason to use FabricCache is the pricing model. Since the solution is all hardware, there is no annual software maintenance. A hardware warranty must be purchased for the first year, but that cost can typically be capitalized, resulting in an OpEx-free solution. Ongoing maintenance, at least when I checked, is not required.

Once you reach five systems with the adapter, the annual maintenance cost for those five systems equals the cost of purchasing a stand-by adapter and keeping it on the shelf in case of failure, thus avoiding recurring maintenance costs altogether.
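A quick back-of-the-envelope check of that break-even point (the adapter price and the 20% annual support rate below are my assumptions for illustration, not quoted QLogic figures):

```python
# Hypothetical numbers: one adapter list price and a ~20% annual hardware
# support rate (assumptions for illustration, not QLogic's published pricing).
adapter_price = 10_000            # cost of one spare FabricCache adapter (hypothetical)
support_rate = 0.20               # assumed annual support as a fraction of list price

hosts = 5
annual_support = hosts * adapter_price * support_rate
print(annual_support)             # 10000.0 -> at 5 hosts, one year of support money
                                  # roughly buys a cold-spare adapter instead
```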

A few scenarios where I believe this technology is a great fit are those where the supportability of the cache solution, the lifespan of the flash, and the recurring cost of the solution are paramount to the decision. These areas are discussed in greater detail in the sections below.

Supportability

  • No software aside from the HBA driver – this is very beneficial in that there is no third-party software acting as a shim in your operating system. Custom-written cache drivers are often supported only by the software’s manufacturer and not thoroughly tested by the operating system/hypervisor vendor, which can lead to complications when upgrading or integrating with other pieces of software.
  • Physical Clusters – since the cache is part of the SAN, it remains available during a failover.
  • Non-VMware Hypervisors – many hypervisor products have no host-based cache products available. Since this solution is all hardware, with a simple operating system driver for the HBA, it is a good candidate for systems that have no cache product of their own.

Longer Life/ Lower Cost

  • Uses SLC flash – SLC flash typically lasts about five to ten times longer than MLC or eMLC flash, and typically delivers twice the performance. The result is faster cache access and a longer time until failure.
  • Slightly higher CapEx due to the cost of SLC flash, but with the option to avoid OpEx.

When Not and Why Not FabricCache

There are also a few cases where I would avoid this solution in favor of an in-OS or in-guest solution.  These cases are listed below.

  • Virtual Environment – since the HBA owns the LUN, as shown in the diagram earlier in this article, all VMs would need to be on the same host to have access to local cache. After a vMotion, the host must access the cache on another host, since the VM has moved away from the adapter that owns its LUN.
  • Many organizations use two HBAs in a single host, each HBA connecting to a separate switch, for resiliency in the event of an HBA failure. This configuration would require the purchase of two FabricCache adapters, which makes the price almost unpalatable.

Another notable item with this solution: if a server in the FabricCache cluster becomes unavailable, no cache acceleration will be available for the LUNs serviced by the FabricCache adapter in the failed host.

Conclusion

This article focused on a host bus adapter-based caching solution, QLogic’s FabricCache: how it works and where I see it being a good fit in an infrastructure.

Considerations when Choosing Flash

Overview

There is a growing trend in the use of solid state drives in IT departments. Devices from smartphones and laptops to servers and storage arrays may contain some type of solid state storage. Past articles discussed software products used to leverage solid state storage; this article discusses the various types of flash cells available in solid state storage devices.

Flash Considerations

There are many types of flash cells in the solid state drives currently on the market. When purchasing solid state drives by themselves, the differences are not quite as clear as one might expect. This is even truer when looking at products with integrated flash modules, where the cell type is normally hidden by the implementation. In many cases, the type of flash cell used is up to the storage manufacturer – the customer has no input. Some even argue that the controller is more important than the underlying storage technology.

The most overlooked aspect of flash storage is that all flash memory suffers from wear: erasing or programming a cell stresses it because of the voltage applied. Each time this happens, a small charge is trapped in the transistor, causing a permanent shift in the cell’s characteristics. After a number of cycles, the cell becomes unusable; this is one reason flash-based devices such as phones can degrade over time. Buried deep in the product documentation is a specification called “maximum endurance,” which states the number of write/erase cycles available before the product exceeds its maximum usable life. It is important to take this metric into consideration when choosing a solid state drive.
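To see why the endurance rating matters, here is a rough lifetime estimate. The cycle counts follow the SLC/MLC ratings discussed later in this article; the drive size and daily write volume are hypothetical, and write amplification and wear leveling are ignored:

```python
# Rough endurance estimate: total data that can be written before the rated
# cycle count is exhausted, and how long that lasts under a given workload.
def years_of_life(capacity_gb, rated_cycles, writes_gb_per_day):
    total_writes_gb = capacity_gb * rated_cycles   # naive: no write amplification
    return total_writes_gb / writes_gb_per_day / 365

# 200GB drive writing 500GB/day (both hypothetical figures)
print(round(years_of_life(200, 100_000, 500), 1))  # SLC at 100k cycles -> ~109.6 years
print(round(years_of_life(200, 10_000, 500), 1))   # MLC at 10k cycles  -> ~11.0 years
```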

Cell Types

The sections below will take a closer look at the various type of flash chips found in solid state storage currently on the market.

Single-Level Cell (SLC)

SLC is fast and reliable, but also quite expensive. Featured in the best performing storage arrays, SLC uses a single cell to store one bit of data. SLC flash is typically rated for 100,000 erase/write cycles. SLC is typically two to four times as expensive as MLC, but has ten times the life. If the application for which the flash will be used is highly write/erase intensive, like a SQL temp DB, a solution with an SLC chip and a good controller will provide the best performance and highest endurance.
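Taken together, those two numbers mean SLC can actually be cheaper per lifetime write, as this small comparison shows (the absolute prices are placeholders; only the 2-4x price ratio and 10x endurance ratio from above matter):

```python
# Illustrative only: placeholder prices, using the worst-case 4x price ratio
# and the 10x endurance ratio quoted above.
mlc_price_per_gb, mlc_cycles = 1.0, 10_000
slc_price_per_gb, slc_cycles = 4.0, 100_000

print(mlc_price_per_gb / mlc_cycles)   # 0.0001 -> cost per GB per rated cycle, MLC
print(slc_price_per_gb / slc_cycles)   # 4e-05  -> SLC is ~2.5x cheaper per lifetime write
```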

Multi-Level Cell (MLC)

MLC is the most common form of flash. MLC is often found in consumer-grade products like cameras, phones, USB memory, and portable music players, but some MLC flash is present in enterprise storage. The main characteristic of MLC is low price, allowing a lower barrier of entry into a flash solution. It is cheaper to produce, but suffers from higher wear rates and lower write performance when compared to SLC. MLC is more complex and can interpret four digital states (two bits) from the signal stored in a single cell, whereas SLC stores only one bit, which is how the two got their names. This is also the reason MLC chips require more powerful error correction on the controller. Typically rated at 10,000 erase/write cycles, much lower than SLC devices, MLC storage offers much higher capacities at 25%-50% of the price one would pay for an SLC-based solid state device. MLC devices are a good fit in solutions that require significant read acceleration with minimal writes.

Triple-Level Cell (TLC)

Triple-Level Cell flash devices are simply MLC with greater density. TLC shares the same characteristics as MLC, with the negatives amplified by the additional level of cell density. The benefit of the increased density is greater storage capacity at a lower cost.

Enterprise Multi-Level Cell (EMLC)

When the term “Enterprise” is put in front of MLC, it typically indicates that the device has a more sophisticated controller. This is where some say the controller is what matters most. These advanced controllers include wear-leveling technologies to increase lifespan. Such controllers can push endurance to as much as 30,000 write/erase cycles, roughly tripling the average lifespan of a device based on MLC flash cells.

Wear Leveling

Wear leveling is a general term referencing various mechanisms a controller takes to increase the lifespan of the flash cells on a flash-based device.  These mechanisms include:

  • Write distribution, which moves write cycles around the chip so that the cells wear evenly
  • On-device deduplication, which reduces the volume of data written
  • Cell redundancy, which reserves a portion of the device’s capacity to replace cells that fail
  • Write optimization, which buffers data writes so they can be made in large chunks, reducing the number of write operations

Many times, a device with 825GB of raw flash will be sold with a 750GB usable capacity. This is an indication that wear-leveling technologies have been implemented on the controller. With wear-leveling in place, a larger TLC device marketed with a lower overall capacity results in a much longer lifespan for the device.
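Using the example above, the reserved (over-provisioned) portion is easy to quantify:

```python
# Over-provisioning in the 825GB-sold-as-750GB example above.
raw_gb, usable_gb = 825, 750
reserved = raw_gb - usable_gb
print(reserved, f"{reserved / raw_gb:.1%}")   # 75 GB, ~9.1% of raw capacity held back
                                              # as spare cells for wear leveling
```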

Conclusion

Despite endurance issues, flash devices are more reliable than spinning media. Keep in mind that while overall device life may be longer than spinning disk, performance decreases as wear increases. This is important to note, as flash is usually included in a solution to accelerate performance. When attempting to balance performance, lifespan, and price, remember that SLC is usually 4x the price of MLC, but SLC has 10x the erase/write cycles of MLC.

Leveraging Flash in a Virtual Environment

Introduction

Flash is an emerging trend in the storage landscape, as the traditional monolithic array can no longer meet the performance demands of modern workloads. Flash is available in three ways:

  • High-cost all flash arrays
  • Hybrid arrays used in replacement of existing storage
  • Easy-to-add host-based flash

Focusing on the virtual infrastructure, host-based flash seems to be the most cost-effective approach. Host-based flash adds much-needed performance while continuing to use the remaining storage capacity of the SAN-based storage devices already deployed.

Host-based Flash

Host-based flash architecture views storage differently, broken into two high-level tiers: data in motion and data at rest. Data at rest is static or lightly used data, the capacity side of storage, which continues to reside on SAN-based storage. Data in motion is the active data set, which resides in cache locally on the host while a copy is also kept on the SAN for long-term storage and backup purposes. This architecture effectively decouples performance from capacity. Its primary benefits are increased performance and an extended useful life for SAN storage devices.

Host-based flash accelerates the data in motion. This is done by keeping reads and writes (where applicable) local to the hypervisor host, on high-speed, low-latency solid state devices.

Components

There are two components to a host based caching infrastructure, those being hardware and software. The host-based hardware provides the cache capacity and performance, while the software provides the mechanism to determine which pieces of data are cached and facilitates compatibility with existing features like vMotion, HA, and DRS. Below are the various hardware and software options available:

Hardware

The hardware component can consist of multiple possibilities:

  • SAS/SATA attached solid state drives
  • PCIe-based solid state storage
  • System RAM

Software

The software component comes in three primary types:

  • Static Read Cache
  • Dynamic Read Cache
  • Dynamic Read/Write Cache

Hardware

In order to implement a host-based flash solution, flash hardware must be installed in the server. This section details the hardware options available to be implemented in such a solution.

Tier 1 – SATA/SAS attached SSD Flash

Most software products on the market today support SSD-based flash devices attached via SATA and SAS. Both rack-optimized and blade servers have slots for the disk controllers and drives. The cost of these devices is relatively low compared to the other options. These factors lower the barrier of entry, allowing flash devices to be easily added to an existing infrastructure.

Performance on these devices pales in comparison to PCIe devices, largely because two controllers must be crossed to reach the flash. The first is the SATA/SAS disk controller; the second is the controller built into the SSD, which translates SATA/SAS commands into flash operations.

It is worth noting that SATA/SAS-based SSDs have a limited write life, and RAID levels can be introduced to ensure flash availability. That said, the additional protection provided by RAID increases the load on the SATA/SAS controller, introducing latency as well as additional cost. Also, RAID 5 and 6 incur a write penalty for the parity calculation and parity writes that is undesirable in a flash infrastructure. Software designed to provide a flash virtualization platform instead provides cache redundancy across hosts, ensuring a cache is always available even if the flash in a single host is lost.
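The RAID write penalty mentioned above is easy to quantify: each small random write to RAID 5 costs four back-end I/Os (read data, read parity, write data, write parity), and RAID 6 costs six. A quick sketch of the effective write IOPS, with an illustrative per-SSD figure:

```python
# Effective random-write IOPS after the RAID write penalty. The per-device
# IOPS figure is illustrative; the penalties are the standard 2 (RAID 1/10),
# 4 (RAID 5), and 6 (RAID 6) back-end I/Os per front-end write.
def effective_write_iops(device_iops, devices, penalty):
    return device_iops * devices / penalty

raw = 30_000                            # per-SSD random write IOPS (illustrative)
print(effective_write_iops(raw, 4, 1))  # 120000.0 -> four SSDs, no RAID
print(effective_write_iops(raw, 4, 4))  # 30000.0  -> RAID 5 eats 3/4 of the write capacity
```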

Tier 2 – PCIe-based Flash

All software products on the market today support PCIe-based flash devices. Rack-optimized servers usually have more PCIe slots than are needed, although in smaller form-factor systems all available slots may be consumed by redundant 10Gb Ethernet and 8Gb FC HBA adapters. Additionally, some server vendors partner with flash vendors to provide specially designed PCIe flash devices that fit into a blade server; HP, for example, offers IO Accelerators for BladeSystem c-Class. Since PCIe devices run at the speed of the PCIe bus and have direct access to the CPU without crossing multiple controllers, they offer staggering performance over SATA/SAS SSDs, which is the name of the game in this architecture.
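The interface arithmetic illustrates the gap: a 6Gbps SATA/SAS link tops out around 600 MB/s after 8b/10b encoding, while even a modest PCIe 2.0 slot offers several times that (the lane count below is illustrative):

```python
# Interface ceilings, before any controller or flash overhead.
sata3_mb_s = 6e9 * 8 / 10 / 8 / 1e6        # 6Gbps link, 8b/10b encoding -> ~600 MB/s
pcie2_lane_mb_s = 5e9 * 8 / 10 / 8 / 1e6   # PCIe 2.0: 5 GT/s per lane, 8b/10b -> ~500 MB/s

print(round(sata3_mb_s))                    # 600
print(round(pcie2_lane_mb_s * 4))           # 2000 -> an x4 PCIe 2.0 card already has
                                            #         ~3x the ceiling of a SATA SSD
```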

What is also staggering about PCIe-based flash is the price tag, costing anywhere from 3x to 10x that of SSD. The devices are also limited by the capacity of the PCIe bus, which means that if many devices share the bus there may be some level of contention. Adding redundancy to PCIe flash devices requires additional devices as well as software to support mirroring. Again, this is where flash virtualization platform software comes into play.

Tier 3 – RAM

Some software products on the market support allocating a section of host RAM to provide the cache storage. All servers have RAM, although it is a precious and expensive resource. RAM is the fastest and highest performing method of caching data on a host, and also the most expensive. Redundancy is not a major concern, as there are usually upwards of 16 DIMM slots in a server, with all in use.

Hardware Price and Performance Comparison

The table below details the various options, highlighting both price and performance. One thing to note about the HP vs. Micron SSD options is that HP’s 3-year warranty ensures the drives will be replaced upon failure for 3 years, while the Micron drives are priced such that they can be mirrored or kept on hand in the event of failure. The remaining information is rather straightforward, reflecting the sections above.

| Vendor | Size | Class | List Price | B/W | Random Read | Random Writes | Sequential Reads | Sequential Writes | Price/GB |
|---|---|---|---|---|---|---|---|---|---|
| HP | 32GB | RAM | $1,399 | varies | 100k IOPS | unavail | 5500 MiB/s | 5500 MiB/s | $43 |
| HP | 16GB | RAM | $425 | varies | 100k IOPS | unavail | 5500 MiB/s | 5500 MiB/s | $27 |
| Micron | 175GB | Full PCIe | $3,200 | 16Gbps | unavail | unavail | unavail | unavail | $18 |
| Micron | 350GB | Full PCIe | $6,500 | 16Gbps | unavail | unavail | unavail | unavail | $19 |
| HP | 365GB | In-blade PCIe | $8,500 | 16Gbps | 71k IOPS | 32.5k IOPS | 860 MiB/s | 560 MiB/s | $23 |
| HP | 100GB | SSD | $670 | 6Gbps | 63k IOPS | 19.2k IOPS | 480 MiB/s | 185 MiB/s | $6.60 |
| HP | 200GB | SSD | $1,400 | 6Gbps | 63k IOPS | 32k IOPS | 480 MiB/s | 350 MiB/s | $7 |
| HP | 400GB | SSD | $2,660 | 6Gbps | 63k IOPS | 35k IOPS | 480 MiB/s | 450 MiB/s | $6.60 |
| Micron | 120GB | SSD | $192 | 6Gbps | 63k IOPS | 23k IOPS | 425 MiB/s | 200 MiB/s | $1.60 |
| Micron | 240GB | SSD | $324 | 6Gbps | 63k IOPS | 33k IOPS | 425 MiB/s | 330 MiB/s | $1.40 |
| Micron | 480GB | SSD | $579 | 6Gbps | 63k IOPS | 35k IOPS | 425 MiB/s | 375 MiB/s | $1.20 |

Software

The software component of a host-based flash infrastructure comes in multiple flavors. The sections below detail the various software options available.

Tier 1 – Static Read Cache

The basic cache is what some refer to as a static read cache. This means that a portion of a locally attached flash device is statically assigned to a specific VM. There is only one product on the market worth mentioning in this realm, detailed below:

VMware – vFlash Read Cache

VMware’s vFlash Read Cache is included free with vSphere Enterprise Edition, assuming the virtual infrastructure has been upgraded to vSphere 5.5 and the VM is running hardware version 10. It provides an effective method for caching commonly read blocks to the flash device. It fully supports vMotion, including pre-warming of the cache on the destination host prior to the vMotion completing. This avoids the need to pre-populate the flash. It is VMware certified, and supports both block-based and network-attached storage. Upgrading and patching of the hypervisor software is fully supported by VMware. The product is managed from within vCenter, although performance monitoring must be done via ESXTOP.

There are many complexities in configuring the static read cache, including choosing the correct block size; choosing the wrong block size can result in worse performance for the VMs leveraging the cache. The largest complexity is the fact that the cache is static. This means that an administrator must plan the flash usage, on a per-VM basis, in advance, which in turn means knowing the usage pattern of each VM in advance. When dealing with virtual infrastructures that include hundreds of virtual machines, this task is nearly impossible. Also, if additional flash is needed for newly deployed VMs and no more flash is available, additional hardware must be procured. The product also lacks write buffering to accelerate writes, and it is not fully clustered, so any time a host goes down its cache data is lost.

Tier 2 – Dynamic Read Cache

The next level of cache is what some refer to as a dynamic read cache (still a write-through cache: writes pass straight to the array). A local flash device is assigned as a pool of flash resources, and the dynamic read cache is smart enough to identify the usage patterns of the VMs assigned to the cache, growing and shrinking the per-VM cache usage as it sees fit. All products listed support vMotion, DRS, and HA. There are three products on the market worth mentioning in this realm, detailed below:

Proximal Data – AutoCache

AutoCache keeps its flash indexes in RAM, with an algorithm that keeps the index size small; other products store indexes on the flash device itself, which adds a latency penalty to lookups. It uses Most Recently Used, Most Frequently Used, and proprietary caching algorithms, with active real-time feedback to determine which algorithm is performing best at any given time. The result is that more resources go to the most effective algorithm, which adapts to changing workloads and prolongs the life of the flash.
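A conceptual sketch of that kind of feedback loop (hypothetical names and scoring rule, not Proximal Data's actual implementation) might track how often each policy would have hit on the recent request stream and shift cache capacity toward the winner:

```python
# Conceptual sketch only: score a recency-based policy against a
# frequency-based policy and split cache capacity toward whichever is hitting.
from collections import Counter

hits = Counter()                          # per-policy hit counts, recent window

def record(block, recency_cache, frequency_cache):
    """Count which policy would have served this request from cache."""
    hits["recency"] += block in recency_cache
    hits["frequency"] += block in frequency_cache

def cache_split(total_gb):
    """Divide cache capacity in proportion to each policy's recent hits."""
    total = hits["recency"] + hits["frequency"] or 1
    recency_gb = total_gb * hits["recency"] / total
    return recency_gb, total_gb - recency_gb

hits.update({"recency": 300, "frequency": 700})   # pretend feedback gathered so far
print(cache_split(400))                           # (120.0, 280.0): most space goes to
                                                  # the policy with the better hit rate
```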

List price is $999 per host for 0-500GB of flash, $1,999 for 500GB-1TB, and $2,999 for more than 1TB. With flash prices going down over time, customers starting out on the low end will eventually grow into the $2,999 price point. As with most software vendors, a 20% recurring maintenance cost is assumed, although not confirmed.

Fusion-io – ioTurbine

Fusion-io’s ioTurbine product seems to be one of the better-known dynamic read caching products on the market, likely because Fusion-io has been selling various PCIe-based flash devices for quite some time. They could be considered a one-stop shop for all of your flash needs, as they also offer SAN-based flash arrays, caches for storage, and operating system-specific cache software for both servers and workstations.

Little stands out regarding ioTurbine on the feature side, as it seems to match the feature sets of the other two players in this class of software. The caching software can be configured for specific VMDK files, specific VMFS datastores, or specific virtual machines. Caching uses the Most Recently Used algorithm, which caches only the most recently used data, regardless of frequency of use.

More stands out from an architectural standpoint. Hardware support is limited, as Fusion-io PCIe flash devices are the only supported hardware; since Fusion-io only makes the expensive PCIe devices, and no SSDs, the entry price is very high. The software requires a host driver and a dedicated virtual machine for management. Additionally, a driver can be installed in the guest for file-level control, but this approach does not scale for environments with more than a couple dozen systems on the cache. The documentation is unclear as to whether the host driver operates in user mode or kernel mode.

List Price $3,900 per host, regardless of the size of flash devices in the host. As with most software vendors, 20% recurring maintenance cost is assumed, although not confirmed.

SanDisk – FlashSoft

FlashSoft, which was acquired by SanDisk not long ago, offers caching software for Windows, Linux, and virtual environments. This software uses the Most Frequently Used algorithm, meaning the hottest blocks used by the VMDKs are cached, and it is configured on a per-VMDK basis only. Supporting both PCIe and SSD devices, the software includes the capability to track, monitor, and predict end of life for SSD devices; that said, HP provides the same type of tools for its SSD devices.

List Price $3,900 per host, regardless of the size of the flash devices in the host. As with most software vendors, 20% recurring maintenance cost is assumed, although not confirmed.

Tier 3 – Dynamic Read/Write Cache

The top-tier technology for host-based flash is what some would call a dynamic read/write cache, although it is really much more than that. The local flash devices across multiple hosts are assigned as a single clustered pool of acceleration resources.

PernixData – Flash Virtualization Platform (FVP)

PernixData provides the only dynamic host-based caching software that can cache both reads and writes. Few software providers have the resources to develop such software; the firm’s CTO helped develop the VMFS file system used by VMware ESX, and this inside knowledge has allowed the software to be the most tightly integrated with the vSphere hypervisor and underlying stack. The FVP software is fully clustered, using “Flash Cluster Technology” to allow any host to remotely access the flash devices on any other host, which ensures that data written to cache is fault tolerant in the event of hardware failure. The software can be configured to cache specific virtual machines or specific datastores. In contrast to customer stories about other caching software, PernixData’s FVP continues to operate problem-free when host hypervisor upgrades occur, which matters given that upgrades are released on roughly an annual basis. The caching software will also continue to provide virtual machines access to cache even if an SSD fails, by accessing the fault-tolerant copy of the cache on another host; the same occurs in an HA event, if host hardware fails. PernixData is also the only software that can leverage system RAM to act as a cache for storage data.
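A minimal sketch of the write-back idea described above (assumed behavior based on this description, not PernixData's code): the write is acknowledged once it sits in local flash and in a peer host's flash, and is flushed to the array afterward.

```python
# Write-back with peer replication, sketched from the description above
# (not PernixData's implementation). The OS gets its acknowledgement once the
# write sits in two hosts' flash; the array is updated asynchronously.
pending_destage = []                      # writes still owed to the array

def write_back(local_flash, peer_flash, lun, block, data):
    local_flash[(lun, block)] = data      # land the write in local flash
    peer_flash[(lun, block)] = data       # replicate so a host/SSD failure loses nothing
    pending_destage.append((lun, block, data))
    return True                           # acknowledge to the OS now, before the array

def destage(array):
    """Flush buffered writes to the array in the background."""
    while pending_destage:
        lun, block, data = pending_destage.pop(0)
        array[(lun, block)] = data

local, peer, array = {}, {}, {}
write_back(local, peer, "LUN_A", 7, b"dirty-block")
destage(array)
print(array)                              # {('LUN_A', 7): b'dirty-block'}
```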

The only drawback with this software is price. With a list price of $7,500 per host and a 20% annual maintenance fee, the entry point can be quite salty. Discounted pricing can bring the product to less than $3,000 per host, making it cheaper than Fusion-io.
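Using the list prices quoted in this article, a rough three-year cost per host works out as follows (year-one license plus two further years of maintenance; the 20% maintenance rate is the assumption stated earlier, not a confirmed vendor figure):

```python
# Three-year list-price cost per host, using the figures quoted in this article:
# year-one license plus two further years of the assumed maintenance.
products = {
    "Proximal Data AutoCache (<500GB)": (999, 200),
    "Fusion-io ioTurbine":              (3_900, 780),
    "SanDisk FlashSoft":                (3_900, 780),
    "PernixData FVP":                   (7_500, 1_570),
}
for name, (license_cost, maintenance) in products.items():
    print(name, license_cost + 2 * maintenance)
# AutoCache 1399, ioTurbine 5460, FlashSoft 5460, FVP 10640
```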

Product Capability Comparison

The table below compares the three tiers of caching software available for host-based flash.

| Capability | VMware vFRC | Fusion-io ioTurbine | Proximal Data AutoCache | PernixData FVP v2.0 |
|---|---|---|---|---|
| Write-Through Caching (Reads) | X | X | X | X |
| Write-Back Caching (Writes) | NO | NO | NO | X |
| Dynamically Assigned | NO | X | X | X |
| Caches to PCI Flash | X | X | X | X |
| Caches to SAS/SATA SSD | X | NO | X | X |
| Caches to RAM | NO | NO | NO | X |
| Pre-Assignment Required | X | NO | NO | NO |
| In-depth Knowledge of Workloads Required | X | NO | NO | NO |
| Clustered | NO | NO | NO | X |
| Supports vMotion | X | X | X | X |
| Supports vMotion Maintaining Cache | NO | X | X | X |
| Continued Acceleration upon Failure of SSD | NO | NO | NO | X |
| Supports DRS | X | X | X | X |
| Supports HA (cache info is lost) | NO | NO | NO | X |
| VMware Certified | X | NO | NO | X |
| Block Storage | X | X | X | X |
| NFS | ? | X | X | X |
| Outage Required to Add VM | X | X | X | NO |
| vSphere 5.0 or Better Required | NO | X | X | X |
| vSphere 5.5 H/W v10 Required | X | X | N/A | N/A |
| vCenter Plug-in for Configuration | X | NO | X | X |
| vCenter Plug-in for Performance Mgmt | NO | NO | X | X |
| Seamless Hypervisor Upgrade | X | NO | NO | X |
| Licensing | n/a | Per host | Per host | Per host |
| 1st Year Pricing per Host | Free with vSphere 5.5 EE | $3,900 | $999 (<500GB) | $7,500 |
| Recurring Cost per Host | Free with vSphere 5.5 EE | $780 | $200 | $1,567 |

Product Price Comparison

The table below compares the five caching software products available for host-based flash.

| Vendor | Product | License Model | License Cost | Maintenance Cost | Notes |
|---|---|---|---|---|---|
| VMware | vFlash Read Cache | vSphere feature | n/a | n/a | Assuming vSphere 5.5 EE |
| Proximal Data | AutoCache | 0-500GB | $999 | $200 | List price |
| Proximal Data | AutoCache | 500GB-1TB | $1,999 | $400 | List price |
| Proximal Data | AutoCache | >1TB | $2,999 | $600 | List price |
| Fusion-io | ioTurbine | Per host | $3,900 | $780 | List price |
| SanDisk | FlashSoft | Per host | $3,900 | $780 | List price |
| PernixData | Flash Virtualization Platform 2.0 | Per host | $7,500 | $1,570 | List price |

Conclusion

This document provided a comprehensive look at the options, both hardware and software, as well as pricing estimates for the components required to deploy a host-based flash cluster in a virtual infrastructure. PernixData seems to offer the best overall capabilities, given that it is fully clustered, the most tightly integrated with vSphere, and able to cache writes. Solid state drives are the best point of entry for an initial foray into the realm of host-based flash, given cost and ease of deployment.

With any new technology, proof of concept is the best place to start. PernixData is willing to provide evaluation software and loaner SSDs to facilitate testing in a proof of concept environment. This proof of concept should be executed on a fraction of production hosts, focusing on problematic virtual machines, to see how the solution performs in the real world.

Pending a successful POC and full deployment to production, an organization can expect to see reduced CPU utilization on the existing SAN storage as well as lower IOPS demand on its SAN-attached storage arrays. With virtual machines depending more on flash for reads, their performance will improve while also freeing up resources on the SAN-attached storage arrays, helping to improve performance for other SAN-based workloads.