Benefits of the SSD Cache in an Enterprise SAN
Hard disk drives (HDDs) are generally used as a secondary storage medium, and their mechanical nature introduces latency into data access. To overcome this latency, a component called a cache, typically smaller but built on a faster storage medium, stores data so that future requests can be served faster. HDDs are generally inexpensive but have high latency, whereas solid-state drives (SSDs) have lower latency but come at a higher price point. The ideal scenario combines inexpensive HDDs with two or more SSDs acting as a cache component. The SSDs are faster and can also hold a large amount of hot data.
StorTrends iTX, the software behind the StorTrends product line, contains an SSD cache within every storage pool. Each storage pool comprises a number of logical RAID drives. Placing an SSD cache component in front of the mechanical RAID drives helps the storage stack maintain effective performance for data reads and writes, particularly when the IO pattern is random. The SSD Cache component not only caches IO, but also continuously purges (flushes) dirty data to its physical location on the logical drives. This mechanism works in parallel with the ongoing IO to ensure that free blocks are always available to host new incoming IO.
A System Administrator can create a storage pool using SSD cache that is linked through iSCSI from a Windows or Linux system. While creating volumes, the user has the option to select various profiles for the SSD cache, including:
• Accelerate All IOs
• Accelerate Random IOs
• No SSD Cache Acceleration
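The effect of the three profiles on the IO path can be sketched as follows. This is a minimal illustration, not the iTX implementation: the `Profile` enum, the `uses_ssd_cache` function, and the `is_random` flag are hypothetical names, and how the software actually classifies an IO as random or sequential is not described here.

```python
from enum import Enum


class Profile(Enum):
    """The three volume profiles listed above (enum names are hypothetical)."""
    ACCELERATE_ALL = "Accelerate All IOs"
    ACCELERATE_RANDOM = "Accelerate Random IOs"
    NO_ACCELERATION = "No SSD Cache Acceleration"


def uses_ssd_cache(profile: Profile, is_random: bool) -> bool:
    """Return True if an IO is served through the SSD cache,
    False if it goes directly to the logical (RAID) drive."""
    if profile is Profile.ACCELERATE_ALL:
        return True          # every IO is cached
    if profile is Profile.ACCELERATE_RANDOM:
        return is_random     # sequential IO bypasses the cache
    return False             # no acceleration: always bypass


print(uses_ssd_cache(Profile.ACCELERATE_RANDOM, is_random=False))
```

In this sketch a sequential stream on an Accelerate Random IOs volume, and any IO on a No SSD Cache Acceleration volume, lands directly on the logical drive.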
The SSD Cache module purges at its data block granularity. In the StorTrends iTX software, the SSD Cache has four threads. These four threads are set based on a random calculation. While purging, each thread continuously fires 512 IOPS, for a total of 2,048 (roughly 2,000) IOPS from the SSD cache. The purge targets the logical drives, which are RAID sets built on mechanical drives. In StorTrends, this volume of IOPS from the SSD cache makes use of the maximum disk queue length of the HDDs in the RAID. Hence, in keeping with the SSD cache's continuous performance, the purge flushes dirty data to the logical drive in parallel, freeing blocks in the SSD Cache for new data.
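The purge arithmetic can be checked with a few lines. The thread count and per-thread IOPS come from the text; the 64 KiB block size used for the throughput estimate is only an assumption for illustration, since the purge block size is not stated here, and the function names are hypothetical.

```python
# Figures taken from the text: four purge threads, 512 IOPS each.
PURGE_THREADS = 4
IOPS_PER_THREAD = 512


def total_purge_iops(threads: int = PURGE_THREADS,
                     iops_per_thread: int = IOPS_PER_THREAD) -> int:
    """Aggregate purge IOPS when the given number of threads is active."""
    return threads * iops_per_thread


def purge_throughput_mib_s(block_kib: int, iops: int) -> float:
    """Approximate purge throughput in MiB/s for an ASSUMED block size."""
    return iops * block_kib / 1024


print(total_purge_iops())                            # 2048
print(purge_throughput_mib_s(64, total_purge_iops()))  # 128.0 (assumes 64 KiB blocks)
```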
In a system with a mixture of acceleration profiles, volumes with the Accelerate-All-IOs profile use both the read cache and the write cache for all IOs. Volumes configured with the Random-Only profile hit the SSD cache only for random IO and send sequential IO directly to the logical drive, while volumes with the No-SSD-Cache profile skip the SSD cache entirely. When a high IO load, consisting of 256 outstanding IOs at a granularity of 64KB, passes through the SSD Cache, the purge is stopped completely. This is to ensure a stable response to the application IO. The SSD cache module then waits for the outstanding IO to drop below that level. A continuously high IO load can therefore fill the SSD cache while no purge is running, making the SSD cache no longer usable.
Alternatively, when the SSD cache is not receiving a high load, the SSD cache purge is initiated using four threads to maximize the disk queue length at the logical drive level. This impacts the performance of the volumes whose IOs hit the logical drive directly. A sequential IO stream on a Random-Only profile volume, or any type of IO load on a No-SSD-Cache profile volume, can be affected by the active purge from the SSD cache. The north side IOs (IOs from the host/initiator) going to the logical drive wait for their turn in the disk queue. If there is a delay in completing an IO, the IO latency increases heavily and the performance of the system dips drastically. Thus, regardless of whether the IO load through the SSD cache is high or low, the purging or lack of purging is problematic.
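The all-or-nothing behavior described above can be expressed in a few lines. This is a sketch under stated assumptions: the 256 outstanding-IO threshold and the four threads come from the text, while the function and constant names are hypothetical.

```python
HIGH_LOAD_OUTSTANDING_IO = 256  # threshold from the text
MAX_PURGE_THREADS = 4           # thread count from the text


def all_or_nothing_purge_threads(outstanding_io: int) -> int:
    """Purge stops completely under high load, otherwise runs flat out."""
    if outstanding_io >= HIGH_LOAD_OUTSTANDING_IO:
        return 0                  # cache fills up: nothing is being flushed
    return MAX_PURGE_THREADS      # disk queue saturated: direct IO waits


for load in (32, 255, 256, 1024):
    print(load, all_or_nothing_purge_threads(load))
```

Both outcomes are extremes: zero threads lets the cache fill, and four threads competes with host IO that goes straight to the logical drive.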
The SSD Cache must therefore purge continuously and avoid filling up. It must not cancel the purge, even under high outstanding IO. Under medium outstanding IO, the SSD cache should not use all threads for purging, so that the north side IO for non-SSD cached volumes maintains consistent performance. This can be achieved by running the purge with a different number of threads depending on the outstanding IO. This solution is implemented in the StorTrends iTX software: the SSD cache purge intelligence can start and stop threads at runtime based on outstanding IO and dirty %.
Therefore, when outstanding IO is high and the dirty data exceeds 75%, the cache makes sure one thread runs continuously to purge the dirty data, so that the SSD-cached volumes continue to hit the SSD cache. When outstanding IO is low, it uses all threads to purge. When outstanding IO is medium, it checks the dirty % and decides how many threads to run.
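A minimal sketch of such a policy is shown below. Only the 75% dirty threshold, the four-thread maximum, and the 256 outstanding-IO "high" level come from the text. The `LOW_MARK` value, the medium-load thread mapping, and the choice to pause the purge under high load with 75% or less dirty data are assumptions made purely for illustration, and all names are hypothetical.

```python
MAX_THREADS = 4   # from the text
HIGH_MARK = 256   # "high" outstanding IO, from the text
LOW_MARK = 64     # ASSUMED boundary between low and medium load


def purge_threads(outstanding_io: int, dirty_pct: float) -> int:
    """Pick the number of purge threads from load and dirty percentage."""
    if outstanding_io < LOW_MARK:
        return MAX_THREADS                 # low load: purge with all threads
    if outstanding_io >= HIGH_MARK:
        # High load: keep one thread running once the cache is >75% dirty.
        # Pausing below that level is an assumption, not stated in the text.
        return 1 if dirty_pct > 75 else 0
    # Medium load: scale with dirty % (this mapping is illustrative only).
    if dirty_pct > 75:
        return 3
    if dirty_pct > 50:
        return 2
    return 1


print(purge_threads(outstanding_io=300, dirty_pct=80))  # 1
print(purge_threads(outstanding_io=10, dirty_pct=5))    # 4
```

The design point is that thread count becomes a function of two runtime inputs rather than a fixed on/off switch, so the purge never fully starves either the cache or the direct-to-disk IO.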
There are special cases in which this policy is not followed, such as when the SSD cache is in a planned or unplanned degraded state (i.e., one drive has been removed from the SSD cache and needs to be replaced). In this situation, the cache must quickly purge the dirty data before the other drive also fails (which may happen because both drives were put into use at the same time and may therefore have a similar lifetime). In this case, the purge runs with all four threads.
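The degraded-state exception amounts to an override applied on top of whatever the normal policy would choose. A small sketch, assuming a hypothetical `cache_degraded` flag and function name:

```python
MAX_PURGE_THREADS = 4  # from the text


def effective_purge_threads(policy_threads: int, cache_degraded: bool) -> int:
    """Use all threads when the SSD cache is degraded; otherwise keep
    the thread count chosen by the normal load-based policy."""
    if cache_degraded:
        return MAX_PURGE_THREADS   # flush dirty data before a second failure
    return policy_threads


print(effective_purge_threads(policy_threads=1, cache_degraded=True))  # 4
```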
In a StorTrends 3500i hybrid array running StorTrends iTX software, policy-based SSD cache purging helps keep the performance of both SSD-cached and non-SSD-cached volumes consistent. It also ensures that SSD-cached volumes never starve for blocks in the SSD cache.