Modern Day Page File Planning

Introduction

Page files can have a great effect on system performance and recovery in a Windows environment. Incorrect sizing can result in slow or unresponsive systems, excessive use of disk resources, and the inability to obtain sufficient information from dump files in the event of a crash. The old belief regarding proper sizing of the page file is this: regardless of the amount of RAM, a page file must be at least 1.5 times the quantity of physical RAM installed in the server. In modern 64-bit computing, this page file allocation rule is no longer relevant. Configuring systems with a page file of 1.5x RAM to support full memory dumps will turn a small outage into a major outage, due to the amount of time it takes to write the contents of RAM to disk. Additionally, the space wasted on system drives to hold the massive page file and dump file adds up quickly. This document discusses the modern-day approach to handling page files and page file sizing for the Windows operating system.

Initial Discussion

According to Microsoft there is no specific recommended size for the page file. At a high level, the page file size requirements are based on the hardware configuration, software running, and overall workload of the system. The recommendation is to monitor the performance characteristics of the system, and determine page file size based on the observed metrics. This is not an easy task, as many administrators manage thousands of Windows operating system instances. This approach requires gathering performance data from each individual system, interpreting the data, and modifying page file sizes, which requires a reboot to implement. While this approach is technically possible, a more realistic approach is to focus on the larger, more business critical workloads. Workloads such as SQL Server, Exchange, and Hyper-V are good places to start.

Application Specific Considerations

SQL Server

Systems running SQL Server generally have a large amount of RAM. The majority of the physical RAM allocated to the system should be committed to the SQL Server process and locked in physical RAM in order to prevent it from being paged out to the page file. Since SQL Server already manages its own memory space, this memory space should be considered “non-pageable” and not included in a calculation for page file size. If SQL Server is configured to lock pages in memory, set the SQL Server maximum memory setting so that some physical RAM remains available for other operating system processes. For SQL Server systems, a 2GB page file on the system drive and a 6GB page file on a non-system drive seems to work best as a starting point, but mileage may vary.

Hyper-V

Per Microsoft’s Hyper-V product group: for systems with the Hyper-V role installed running on Windows Server 2012 and Windows Server 2012 R2, the page file for the parent partition should be left at the default page file setting of “System Managed.”

General Windows Servers

For the remaining systems, having a page file that is much too large compared to the actual non-locked memory space can result in Windows overzealously paging out application memory to the page file, resulting in those applications suffering the consequences of page misses (slow performance). As long as the server is not running other memory-hungry processes, a page file size of 4GB is a good starting point.

Some products or services may use the page file for reasons other than those discussed in this document. Some roles and applications are not supported without a page file, including domain controllers, certificate servers, LDS servers, systems running DFS Replication, and Exchange Server.

Considerations for Page File Size

When sizing page files, it is important to remember that actual page file usage depends greatly on the amount of modified memory being managed by the system. Files that already exist on disk (txt, dll, doc, exe) are not written to a page file. Only modified data not already residing on disk (for example, unsaved text in MS Word) is memory that could potentially be backed by a page file. There are a handful of items to consider when sizing page files.

Crash Dump Settings

There are three possible settings for Windows crash dumps:

Windows Crash Dump Setting | Minimum Page File Size
Small memory dump | 1 MB
Kernel memory dump | Depends on kernel virtual memory usage
Complete memory dump | 1x physical RAM plus 257 MB

It is best to have a corporate standard for how memory dumps are obtained in the event of a blue screen. Since we are focusing on higher memory workloads, which are usually more critical workloads, it is recommended to configure the system to perform a kernel memory dump in the event of a system crash. While a small memory dump provides enough data to identify the faulting process, some stop codes involve drivers, and drivers are only listed in kernel memory dumps. Configuring a system to save kernel memory dumps ensures that all the information needed is present after a crash, rather than having to increase the dump level and wait for another unplanned outage. Complete memory dumps are excessive, as disk space would then be required to hold both a page file and a dump file, each the size of physical RAM.
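The minimum sizes listed for each crash dump setting can be expressed as a small helper. This is a minimal sketch: the function name and the setting labels are hypothetical, and the kernel memory dump case returns None because no fixed figure is given for it.

```python
from typing import Optional

MB = 1024 * 1024


def min_page_file_bytes(dump_setting: str, physical_ram_bytes: int) -> Optional[int]:
    """Return the minimum page file size for a crash dump setting.

    Returns None for a kernel memory dump, since that minimum depends on
    kernel virtual memory usage and has no fixed value.
    """
    if dump_setting == "small":
        return 1 * MB
    if dump_setting == "kernel":
        return None
    if dump_setting == "complete":
        # 1x physical RAM plus 257 MB
        return physical_ram_bytes + 257 * MB
    raise ValueError(f"unknown dump setting: {dump_setting!r}")


ram = 64 * 1024 * MB  # hypothetical server with 64 GB of RAM
complete_min = min_page_file_bytes("complete", ram)
print(complete_min // MB, "MB needed for a complete memory dump")
```

On a server with a large amount of RAM, the complete memory dump figure makes the disk space cost of that setting easy to see.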

 

With the release of Windows Server 2012, a new feature was introduced called “Automatic Memory Dump.” Enabled by default, this setting automatically selects the best crash dump model based on the frequency of crashes. The Automatic memory dump setting at first selects the small memory dump option, which requires a page file or dedicated dump file of at least 256 KB. If the system crashes, the feature bumps the system up to the kernel memory dump setting at startup, and then increases the minimum size of the system-managed page file.

System Commit Charge and Limit

The system commit charge is the total amount of committed memory that is in use by all processes and by the kernel. It is memory that has been promised to processes and must be backed by physical RAM or the page file. The system commit charge cannot be larger than the system commit limit, as the system commit limit is the sum of physical RAM and the page file. Both numbers are shown in Task Manager.

 

If the system commit charge is steadily below the commit limit, the page file size can be reduced. This metric must be gathered after the system has been in use for some time.
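The relationship between commit charge and commit limit can be sketched as simple arithmetic. The function names and the sample figures below are hypothetical; the peak commit charge would come from observing a real system over time.

```python
GB = 1024 ** 3


def commit_limit(physical_ram_bytes: int, page_file_bytes: int) -> int:
    """System commit limit: physical RAM plus the page file."""
    return physical_ram_bytes + page_file_bytes


def commit_headroom(peak_commit_charge_bytes: int,
                    physical_ram_bytes: int,
                    page_file_bytes: int) -> int:
    """Unused commit capacity at the observed peak commit charge."""
    return commit_limit(physical_ram_bytes, page_file_bytes) - peak_commit_charge_bytes


# Hypothetical server: 32 GB RAM, 16 GB page file, 20 GB peak commit charge.
limit = commit_limit(32 * GB, 16 * GB)
headroom = commit_headroom(20 * GB, 32 * GB, 16 * GB)
print(f"commit limit: {limit // GB} GB, headroom at peak: {headroom // GB} GB")
```

A consistently large headroom at peak suggests the page file is a candidate for reduction, subject to the crash dump requirements discussed earlier.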

Infrequently Accessed Pages

The purpose of the page file is to back infrequently accessed pages, so they can be removed from physical memory, providing more physical RAM for frequently accessed pages. This data cannot be obtained until the system has been running for some time.

Additional Considerations

Page File Location

If paging activity for the page file averages more than 10 pages per second, Microsoft recommends that the page file be moved off the system disk and onto a dedicated drive. If the page file reaches an average of 60 pages per second, use more than one dedicated page file disk to obtain better performance, possibly by way of striping. The recommendation is to use a dedicated disk for every 60 pages per second of I/O activity. These numbers are based on 7200 RPM disks.
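The placement guidance above can be turned into a rough planning helper. This is a sketch only: the function name is hypothetical, and rounding up to the next whole disk (so that exactly 60 pages per second maps to one disk) is an assumption about how to read the one-disk-per-60-pages rule.

```python
import math


def dedicated_page_file_disks(avg_pages_per_sec: float) -> int:
    """Return how many dedicated page file disks the guidance suggests.

    0 means the page file can stay on the system disk.
    Figures assume 7200 RPM disks, as in the guidance.
    """
    if avg_pages_per_sec <= 10:
        return 0
    # One dedicated disk per 60 pages/sec, rounded up (assumed rounding).
    return max(1, math.ceil(avg_pages_per_sec / 60))


for rate in (5, 45, 150):
    print(rate, "pages/sec ->", dedicated_page_file_disks(rate), "dedicated disk(s)")
```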

Number of Page Files

If the system is configured to have more than one page file, the page file that is first to respond is the page file that is used. It is likely that the page file on the faster disk will be used more frequently.

Calculating Page File Size

Putting all else aside for a moment, Microsoft has provided a few hard perfmon metrics to determine whether your page file is undersized or oversized.

Method 1

Four perfmon metrics can be observed to definitively determine whether the page file can be decreased, or needs to be increased.

Counter | Suggested Threshold
Memory\Available Bytes | No less than 4 MB
Memory\Pages Input/sec | No more than 10 pages
Paging File\% Usage | No more than 70 percent
Paging File\% Usage Peak | No more than 70 percent

Using perfmon logs to collect data during typical usage, one can understand paging activity and adjust the page file accordingly.
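Once the four counters have been collected, checking them against the suggested thresholds is mechanical. The sketch below assumes the samples have already been averaged from a perfmon log; the function name and sample values are hypothetical, and collecting the counters is out of scope here.

```python
MB = 1024 * 1024


def page_file_findings(available_bytes, pages_input_per_sec,
                       usage_pct, usage_peak_pct):
    """Return a list describing each suggested threshold the samples breach."""
    findings = []
    if available_bytes < 4 * MB:
        findings.append("Memory\\Available Bytes is below 4 MB")
    if pages_input_per_sec > 10:
        findings.append("Memory\\Pages Input/sec is above 10")
    if usage_pct > 70:
        findings.append("Paging File\\% Usage is above 70 percent")
    if usage_peak_pct > 70:
        findings.append("Paging File\\% Usage Peak is above 70 percent")
    return findings


# Hypothetical averages taken from a perfmon log.
healthy = page_file_findings(2048 * MB, 3.0, 22.0, 41.0)
strained = page_file_findings(2048 * MB, 3.0, 65.0, 88.0)
print(healthy)
print(strained)
```

An empty list suggests the page file is not under pressure; any finding is a prompt to look more closely before resizing.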

Method 2

Note the value of Process\Page File Bytes Peak and multiply it by 0.7.

Method 3

The minimum page file size can be determined by calculating the sum of peak private bytes used by all system processes, then subtracting the amount of system RAM. The maximum page file size can be determined by calculating the sum of peak private bytes, and adding a margin for safety.
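Method 3 reduces to two lines of arithmetic. In this sketch the function name is hypothetical, the safety margin is expressed as a fraction chosen by the administrator (no specific value is prescribed), and clamping the minimum at zero is an assumption for systems whose RAM exceeds the peak private bytes.

```python
GB = 1024 ** 3


def page_file_range(peak_private_bytes, physical_ram_bytes, safety_margin):
    """Return (minimum, maximum) page file size in bytes using Method 3.

    peak_private_bytes: peak private bytes of each process.
    safety_margin: fraction added on top of the total, e.g. 0.25 for 25%.
    """
    total = sum(peak_private_bytes)
    minimum = max(0, total - physical_ram_bytes)  # clamp at zero (assumption)
    maximum = int(total * (1 + safety_margin))
    return minimum, maximum


# Hypothetical server: three processes peaking at 12, 6, and 2 GB; 16 GB RAM.
minimum, maximum = page_file_range([12 * GB, 6 * GB, 2 * GB], 16 * GB, 0.25)
print(f"minimum {minimum // GB} GB, maximum {maximum // GB} GB")
```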

Conclusion

There are no hard and fast rules regarding page file sizing on systems, and the myth of sizing a page file at 1.5x the size of physical RAM is not a valid sizing guideline. Crash dump requirements, performance metrics, and free disk space are all driving factors. The only true way to properly size your page file is to monitor its usage and make adjustments.

Reference Items

SWAP Sizing

http://stackoverflow.com/questions/2588/appropriate-pagefile-size-for-sql-server

Microsoft Articles

http://support.microsoft.com/kb/889654

http://support.microsoft.com/kb/2860880

Swapping vs Paging

http://www.techarp.com/showarticle.aspx?artno=143&pgno=1

 

Virtual Memory Management

http://msdn.microsoft.com/en-us/library/ms810627.aspx

Missing Complete Dump Option

http://www.sophos.com/en-us/support/knowledgebase/111474.aspx

System Commit Limit

http://mcpmag.com/articles/2011/07/05/sizing-page-files-on-windows-systems.aspx

 

 


Leveraging Flash in a Virtual Environment

Introduction

Flash is an emerging trend in the storage landscape, as the traditional monolithic array can no longer meet the performance demands of modern workloads. Flash is available in three ways:

  • High-cost all flash arrays
  • Hybrid arrays used in replacement of existing storage
  • Easy to add host-based flash

Focusing on the virtual infrastructure, host-based flash seems to be the most cost effective approach. Leveraging host-based flash adds much needed performance, while continuing to leverage the remaining storage capacity of the existing SAN-based storage devices currently deployed.

Host-based Flash

A host-based flash architecture views storage differently, breaking it into two high-level tiers: data in motion and data at rest. Data at rest is static or lightly used data, the capacity side of data, which continues to reside on SAN-based storage. Data in motion is the active data set, residing in the cache locally on the host, with a copy also kept on the SAN for long-term storage and backup purposes. This architecture effectively decouples performance from capacity. The primary benefits of this architecture are increased performance and an extended useful life for SAN storage devices.

Host-based flash accelerates the data in motion. This is done by keeping reads and writes (where applicable) local to the hypervisor host, on high-speed, low-latency solid state devices.

Components

There are two components to a host based caching infrastructure, those being hardware and software. The host-based hardware provides the cache capacity and performance, while the software provides the mechanism to determine which pieces of data are cached and facilitates compatibility with existing features like vMotion, HA, and DRS. Below are the various hardware and software options available:

Hardware

The hardware component can consist of multiple possibilities:

  • SAS/SATA attached solid state drives
  • PCIe-based solid state storage
  • System RAM

Software

The software component comes in three primary types:

  • Static Read Cache
  • Dynamic Read Cache
  • Dynamic Read/Write Cache

Hardware

In order to implement a host-based flash solution, flash hardware must be installed in the server. This section details the hardware options available to be implemented in such a solution.

Tier 1 – SATA/SAS attached SSD Flash

Most software products on the market today support SSD-based flash devices attached via SATA and SAS. Both rack-optimized and blade servers have slots for the disk controllers and drives. The cost of these devices is relatively low compared to other options. These factors lower the barrier to entry, allowing flash devices to be easily added to an existing infrastructure.

Performance on these devices pales in comparison to PCIe devices. This performance hit comes into play with the need to cross two controllers to reach the flash device. The first controller is the SATA/SAS disk controller. The second controller is the controller built into the SSD, converting SATA/SAS to flash.

It is worth noting that SATA/SAS-based SSDs have a limited write life, and RAID levels can be introduced to ensure flash availability. That being said, the additional protection provided by RAID will increase the load on the SATA/SAS controller, introducing latency as well as additional cost. Also, RAID 5 and 6 incur a write penalty for calculating and writing parity, which is undesirable in a flash infrastructure. Software designed to provide a flash virtualization platform provides cache redundancy across hosts to ensure a cache is always available, even in the event of losing flash on a single host.
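The effect of a RAID write penalty can be illustrated with a rough calculation. The penalty factors below are the commonly cited rule-of-thumb values (2 for mirroring, 4 for RAID 5, 6 for RAID 6) and are an assumption brought in for illustration; real controllers with caching may behave differently. The function name and the sample workload are hypothetical.

```python
# Commonly cited back-end I/Os per front-end write (rule of thumb, assumed).
WRITE_PENALTY = {"raid0": 1, "raid1": 2, "raid10": 2, "raid5": 4, "raid6": 6}


def backend_iops(read_iops: int, write_iops: int, raid_level: str) -> int:
    """Estimate the I/O load the devices see for a given front-end workload."""
    return read_iops + write_iops * WRITE_PENALTY[raid_level]


# Hypothetical cache workload: 7,000 reads/sec and 3,000 writes/sec.
for level in ("raid1", "raid5", "raid6"):
    print(level, backend_iops(7000, 3000, level))
```

The same 10,000 front-end IOPS roughly doubles on the back end under RAID 5 in this example, which is why parity RAID is unattractive for a write-heavy cache.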

Tier 2 – PCIe-based Flash

All software products on the market today support PCIe-based flash devices. Rack-optimized servers usually have more PCIe slots than are needed, although in smaller form factor systems, all available slots may be consumed by redundant 10Gb Ethernet adapters and 8Gb HBAs. Additionally, some server vendors partner with flash vendors to provide specially designed PCIe flash devices that will fit into a blade server. HP offers IO Accelerators for BladeSystem c-Class. Since PCIe devices run at the speed of the PCIe bus and have direct access to the CPU without having to cross multiple controllers, they offer staggering performance over SSD, which is the name of the game in this architecture.

What is also staggering about PCIe-based flash is the price tag, costing anywhere from 3x to 10x that of SSD. That being said, the devices are also limited by the capacity of the PCIe bus, which means if there are a lot of devices on your PCIe bus, there may be some level of contention. Adding redundancy to PCIe flash devices requires additional devices as well as software to support mirroring the device. Again, this is where flash virtualization platform software comes into play.

Tier 3 – RAM

Some software products on the market support allocating a section of RAM on the host to provide the cache storage. All servers have RAM, although it is a precious and expensive resource. RAM is the fastest and highest performing method of caching data on a host, and also the most expensive. Redundancy is not a major concern, as there are usually upwards of 16 DIMM slots in a server, with all in use.

Hardware Price and Performance Comparison

The table below details the various options, highlighting both price and performance. One thing to note about the HP vs Micron SSD options is that HP’s 3 year warranty ensures the drives will be replaced upon failure for 3 years. The Micron drives are priced such that they can be mirrored or kept on hand in the event of failure. The remaining information is rather straightforward, reflecting what the sections above discuss.

Vendor and Size | Class | List Price | Bandwidth | Random Reads | Random Writes | Sequential Reads | Sequential Writes | Price/GB
HP 32GB | RAM | $1,399 | varies | 100k IOPS | unavail | 5500 MiB/s | 5500 MiB/s | $43
HP 16GB | RAM | $425 | varies | 100k IOPS | unavail | 5500 MiB/s | 5500 MiB/s | $27
Micron 175GB | Full PCIe | $3,200 | 16Gbps | unavail | unavail | unavail | unavail | $18
Micron 350GB | Full PCIe | $6,500 | 16Gbps | unavail | unavail | unavail | unavail | $19
HP 365GB | in-blade PCIe | $8,500 | 16Gbps | 71k IOPS | 32.5k IOPS | 860 MiB/s | 560 MiB/s | $23
HP 100GB | SSD | $670 | 6Gbps | 63k IOPS | 19.2k IOPS | 480 MiB/s | 185 MiB/s | $6.60
HP 200GB | SSD | $1,400 | 6Gbps | 63k IOPS | 32k IOPS | 480 MiB/s | 350 MiB/s | $7
HP 400GB | SSD | $2,660 | 6Gbps | 63k IOPS | 35k IOPS | 480 MiB/s | 450 MiB/s | $6.60
Micron 120GB | SSD | $192 | 6Gbps | 63k IOPS | 23k IOPS | 425 MiB/s | 200 MiB/s | $1.60
Micron 240GB | SSD | $324 | 6Gbps | 63k IOPS | 33k IOPS | 425 MiB/s | 330 MiB/s | $1.40
Micron 480GB | SSD | $579 | 6Gbps | 63k IOPS | 35k IOPS | 425 MiB/s | 375 MiB/s | $1.20
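The Price/GB figures in the table are simply list price divided by capacity. A short sketch of that calculation follows, using four rows from the table as sample data; the function name is hypothetical, and the table's own figures appear to be rounded, so small differences are expected.

```python
def price_per_gb(list_price: float, size_gb: float) -> float:
    """List price divided by capacity in GB."""
    return list_price / size_gb


# (description, list price in USD, capacity in GB), taken from the table.
devices = [
    ("HP 32GB RAM", 1399, 32),
    ("HP 365GB in-blade PCIe", 8500, 365),
    ("HP 400GB SSD", 2660, 400),
    ("Micron 480GB SSD", 579, 480),
]

# Sort cheapest-per-GB first to make the cost gap between classes obvious.
for name, price, size in sorted(devices, key=lambda d: price_per_gb(d[1], d[2])):
    print(f"{name}: ${price_per_gb(price, size):.2f}/GB")
```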

 

Software

The software component of a host-based flash infrastructure comes in multiple flavors. The sections below detail the various software options available.

Tier 1 – Static Read Cache

The basic cache is what some refer to as a static read cache. This means that a portion of a locally attached flash device is statically assigned to a specific VM. There is only one product on the market worth mentioning in this realm, detailed below:

VMware – vFlash Read Cache

VMware’s vFlash Read Cache is included free with vSphere Enterprise Edition, assuming the virtual infrastructure has been upgraded to vSphere 5.5 and the VM is running hardware version 10. It provides an effective method for caching commonly read blocks to the flash device. It fully supports vMotion, including pre-warming of the cache on the destination host prior to the vMotion completing. This avoids the need to pre-populate the flash. It is VMware certified, and supports both block-based and network-attached storage. Upgrading and patching of the hypervisor software is fully supported by VMware. The product is managed from within vCenter, although performance monitoring must be done via ESXTOP.

There are many complexities in configuring the static read cache, including choosing the correct block size. Choosing the wrong block size can result in worse performance for the VMs leveraging the cache. The largest complexity is the fact that the cache is static. This means that an administrator must plan the flash usage, on a per-VM basis, in advance, which in turn requires knowing the usage pattern of each VM in advance. When dealing with virtual infrastructures that include hundreds of virtual machines, this task is nearly impossible. Also, if additional flash is needed for newly deployed VMs and no flash remains available, additional hardware will need to be procured. This product also lacks write buffering to accelerate writes. It is not fully clustered, so any time a host goes down, the cache data is lost.

Tier 2 – Dynamic Read Cache

The next level cache is what some refer to as a dynamic read cache, or write-through cache. This means that a local flash device is assigned as a pool of flash resources. The dynamic read cache is smart enough to identify usage patterns of the VMs assigned to the cache, growing and shrinking the per-VM cache usage as it sees fit. All products listed support vMotion, DRS, and HA. There are three products on the market worth mentioning in this realm, detailed below:

Proximal Data – AutoCache

AutoCache keeps its flash indexes in RAM and uses an algorithm to keep the index size small. Other products store indexes on the flash device, resulting in an added latency penalty for lookups. It uses Most Recently Used, Most Frequently Used, and proprietary caching algorithms. Active real-time feedback determines which algorithms are performing best at any given time and directs more resources to the most effective algorithm, which adapts to changes and prolongs the life of the flash.

List Price of $999 for 0-500GB per host; $1999 for 500GB-1TB; >1TB for $2999 – with flash prices going down over time, customers starting out on the low end will eventually upgrade to the $2999 price point. As with most software vendors, 20% recurring maintenance cost is assumed, although not confirmed.
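The tiered list pricing above can be sketched as a lookup. The function names are hypothetical, the treatment of the exact 500GB and 1TB boundaries (and 1TB taken as 1000GB) is an assumption, and the 20% maintenance rate is the assumed, unconfirmed figure mentioned above.

```python
def autocache_list_price(flash_gb: float) -> int:
    """List price per host in USD for a given amount of flash."""
    if flash_gb <= 500:        # boundary handling is an assumption
        return 999
    if flash_gb <= 1000:       # 1TB taken as 1000GB (assumption)
        return 1999
    return 2999


def annual_maintenance(list_price: float, rate: float = 0.20) -> float:
    """Recurring maintenance at an assumed 20% of list price."""
    return list_price * rate


for flash_gb in (480, 960, 1200):
    price = autocache_list_price(flash_gb)
    print(f"{flash_gb}GB: ${price} license, ${annual_maintenance(price):.0f}/yr maintenance")
```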

Fusion-io – ioTurbine

Fusion-io’s ioTurbine product seems to be one of the more well-known dynamic read caching products on the market. This is likely due to the fact that Fusion-io has been selling various PCIe-based flash devices for quite some time. They could be considered a one-stop shop for all of your flash needs, as they also offer SAN-based flash arrays that act as caches for storage, along with operating-system-specific caching software for both servers and workstations.

Little stands out regarding ioTurbine on the feature side, as it seems to match the feature sets of the other two players in this class of software solutions. The caching software can be configured for specific VMDK files, specific VMFS datastores, and specific virtual machines. Caching is done using the Most Recently Used algorithm, which caches only the most recently used data, regardless of the frequency of use.
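To make the recency-based approach concrete, here is a minimal sketch of a read cache that keeps only the most recently used blocks and drops the block that has gone unused the longest. It is an illustration of the general idea, not Fusion-io's implementation; the class name, the `fetch` callback, and the block identifiers are all hypothetical.

```python
from collections import OrderedDict


class RecencyCache:
    """Keeps the most recently used blocks, ignoring how often each is used."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks = OrderedDict()  # oldest entry first, newest last

    def read(self, block_id, fetch):
        """Return (data, was_hit); fetch(block_id) reads from backing storage."""
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
            return self.blocks[block_id], True
        data = fetch(block_id)
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)    # drop the least recently used
        return data, False


cache = RecencyCache(capacity=2)
hits = [cache.read(b, lambda block: f"data-{block}")[1] for b in (1, 2, 1, 3)]
print(hits, list(cache.blocks))
```

In this run block 2 is dropped when block 3 arrives, even if block 2 had been read many times earlier, which is the trade-off of ignoring frequency.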

Much stands out from an architectural standpoint. Hardware support is limited, as Fusion-io PCIe flash devices are the only supported hardware. Since Fusion-io only makes the expensive PCIe devices, and no SSDs, the entry price is very high. The software requires a host driver and a dedicated virtual machine for management. Additionally, a driver can be installed in the guest for additional file-level control, but this approach is not scalable for environments with more than a couple dozen systems on the cache. Documentation is unclear as to whether the host driver operates in user mode or kernel mode.

List Price $3,900 per host, regardless of the size of flash devices in the host. As with most software vendors, 20% recurring maintenance cost is assumed, although not confirmed.

SanDisk – FlashSoft

FlashSoft, which was acquired by SanDisk not long ago, also offers caching software for Windows, Linux, and virtual environments. This software uses the Most Frequently Used algorithm, meaning the hottest blocks used by the VMDKs are cached. This software is configured on a per-VMDK basis only. Supporting both PCIe and SSD devices, the software comes with the capability to track, monitor, and predict end of life for SSD devices. That being said, HP provides the same type of tools for its SSD devices.
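A frequency-based cache can be sketched in a few lines: it counts how often each block is read and only lets a new block displace a cached one when the new block has been read more often. This is an illustration of the general idea, not SanDisk's implementation; the class name, the `fetch` callback, and the admission rule are hypothetical.

```python
from collections import Counter


class FrequencyCache:
    """Keeps the blocks with the highest access counts (the hottest blocks)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks = {}
        self.access_counts = Counter()

    def read(self, block_id, fetch):
        """Return (data, was_hit); fetch(block_id) reads from backing storage."""
        self.access_counts[block_id] += 1
        if block_id in self.blocks:
            return self.blocks[block_id], True
        data = fetch(block_id)
        if len(self.blocks) < self.capacity:
            self.blocks[block_id] = data
        else:
            coldest = min(self.blocks, key=lambda b: self.access_counts[b])
            # Admit the new block only if it is hotter than the coldest one.
            if self.access_counts[block_id] > self.access_counts[coldest]:
                del self.blocks[coldest]
                self.blocks[block_id] = data
        return data, False


cache = FrequencyCache(capacity=2)
for block in (1, 1, 1, 2, 3, 3):
    cache.read(block, lambda b: f"data-{b}")
print(sorted(cache.blocks))
```

Block 3 is not cached on its first read because it is no hotter than block 2; it replaces block 2 only after its second read.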

List Price $3,900 per host, regardless of the size of the flash devices in the host. As with most software vendors, 20% recurring maintenance cost is assumed, although not confirmed.

Tier 3 – Dynamic Read/Write Cache

The top tier technology for host-based flash is what some would call a dynamic read/write cache, although it is really much more than that. This means that the local flash devices across multiple hosts are assigned as a pool.

PernixData – Flash Virtualization Platform (FVP)

PernixData provides the only dynamic host-based caching software that can cache both reads and writes. Few software providers have the resources to develop such software; PernixData’s CTO helped develop the VMFS file system used by VMware ESX. This inside knowledge has allowed the software to be the most tightly integrated with the vSphere hypervisor and underlying stack. The FVP software is fully clustered, using “Flash Cluster Technology” to allow any host to remotely access the flash devices on any other host. This ensures that data written to cache is fault tolerant in the event of hardware failure. The software can be configured to cache specific virtual machines or specific datastores. Based on stories from customers who have used other caching software, PernixData’s FVP continues to operate problem-free when host hypervisor upgrades occur. This is important given that upgrades are released on an annual basis. The caching software will also continue to provide virtual machines access to cache even if an SSD fails. This is done by accessing the fault tolerant copy of the cache from another host. The same occurs in an HA event, if host hardware fails. PernixData is also the only software that can leverage system RAM to act as a cache for storage data.

The only drawback with this software is price. With a list price of $7,500 per host and a 20% annual maintenance fee, the entry point can be quite steep. Discounted pricing brings the product to less than $3,000 per host, making it less expensive than Fusion-io.

Product Capability Comparison

The table below compares the three tiers of caching software available for host-based flash.

Feature | VMware vFRC | Fusion-io ioTurbine | Proximal Data AutoCache | PernixData FVP v2.0
Write-Through Caching (Reads) | X | X | X | X
Write-Back Caching (Writes) | NO | NO | NO | X
Dynamically Assigned | NO | X | X | X
Caches to PCI Flash | X | X | X | X
Caches to SAS/SATA SSD | X | NO | X | X
Caches to RAM | NO | NO | NO | X
Pre-Assignment required | X | NO | NO | NO
In-depth knowledge of workloads required | X | NO | NO | NO
Clustered | NO | NO | NO | X
Supports vMotion | X | X | X | X
Supports vMotion Maintaining Cache | NO | X | X | X
Continued acceleration upon failure of SSD | NO | NO | NO | X
Supports DRS | X | X | X | X
Supports HA (cache info is lost) | NO | NO | NO | X
VMware Certified | X | NO | NO | X
Block Storage | X | X | X | X
NFS | ? | X | X | X
Outage Required to add VM | X | X | X | NO
vSphere 5.0 or better required | NO | X | X | X
vSphere 5.5 H/W v10 required | X | X | N/A | N/A
vCenter Plug-in for Configuration | X | NO | X | X
vCenter Plug-in for Performance Mgmt | NO | NO | X | X
Seamless Hypervisor Upgrade | X | NO | NO | X
Licensing | n/a | Per host | Per host | Per host
1st Year pricing per host | Free with vSphere 5.5 EE | $3,900 | $999 (<500GB) | $7,500
Recurring cost per host | Free with vSphere 5.5 EE | $780 | $200 | $1,567

 

Product Price Comparison

The table below compares the five caching software products available for host-based flash.

Vendor | Product | License Model | License Cost | Maintenance Cost | Notes
VMware | vFlash Read Cache | vSphere Feature | n/a | n/a | Assuming vSphere 5.5 EE
Proximal Data | AutoCache | 0-500GB | $999 | $200 | List Price
Proximal Data | AutoCache | 500GB-1TB | $1,999 | $400 | List Price
Proximal Data | AutoCache | >1TB | $2,999 | $600 | List Price
Fusion-io | ioTurbine | per host | $3,900 | $780 | List Price
SanDisk | FlashSoft | per host | $3,900 | $780 | List Price
PernixData | Flash Virtualization Platform 2.0 | per host | $7,500 | $1,570 | List Price
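The license and maintenance figures in the table can be combined into a rough multi-year cost per host. This sketch assumes the first year costs the license only and each later year costs one maintenance payment; that billing model is an assumption, as is the function name. The dollar figures are the list prices from the table.

```python
def cost_per_host(license_cost: int, maintenance_cost: int, years: int) -> int:
    """License in year one, then maintenance for each later year (assumed)."""
    return license_cost + maintenance_cost * (years - 1)


# (license cost, annual maintenance cost) in USD, list prices from the table.
products = {
    "Proximal Data AutoCache (0-500GB)": (999, 200),
    "Fusion-io ioTurbine": (3900, 780),
    "SanDisk FlashSoft": (3900, 780),
    "PernixData FVP 2.0": (7500, 1570),
}

three_year = {name: cost_per_host(lic, maint, 3)
              for name, (lic, maint) in products.items()}
for name, total in three_year.items():
    print(f"{name}: ${total:,} per host over 3 years")
```

Multiplying by the number of hosts in the cluster gives a list-price ceiling to compare against any discounted quotes.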

Conclusion

This document provided a comprehensive look at the options, both hardware and software, as well as pricing estimates for the components required to deploy a host-based flash cluster in a virtual infrastructure. PernixData seems to offer the best overall capabilities, given that it is fully clustered, the most tightly integrated with vSphere, and provides caching of writes. Solid state drives are the best point of entry for an initial foray into the realm of host-based flash, given their cost and ease of deployment.

With any new technology, proof of concept is the best place to start. PernixData is willing to provide evaluation software and loaner SSDs to facilitate testing in a proof of concept environment. This proof of concept should be executed on a fraction of production hosts, focusing on problematic virtual machines, to see how the solution performs in the real world.

Following a successful POC and full deployment to production, an organization can expect to see reduced CPU utilization on the existing SAN storage as well as a lower demand for IOPS on its SAN-attached storage arrays. With virtual machines depending more on flash for reads, their performance will improve, while resources freed up on the SAN-attached storage arrays help to improve performance for other SAN-based workloads.