EDITOR’S NOTE: This article was written by an intern author and was heavily copied from the following article: http://www.extremetech.com/extreme/210492-extremetech-explains-how-do-ssds-work. We advise our readers to read the well-written ExtremeTech article before going through ours.
SSDs have been a huge step forward in storage media. For years, we were stuck with spinning magnetic media and its various descendants. These were sturdy and reliable, to say the least, but fell short on one crucial aspect of storage: speed. To understand SSDs, the first step is to understand how they differ from their predecessors, Hard Disk Drives (HDDs).
Hard Disk Drives store data on a series of spinning magnetic platters, with an actuator arm that moves read/write heads across them. When a read or write instruction arrives, the actuator positions the head over the relevant location on the platter, and the result is returned to the CPU. When large amounts of data are to be read, the actuator must often visit many different locations, and for each one it has to wait for the platter to spin the right sector under the head. This is a prime reason why HDDs are so much slower than SSDs.
It was clear from the very beginning that HDDs would never keep up with the speeds at which CPUs and system memory operate. The time a typical HDD takes to execute an instruction is measured in milliseconds, while the operating times of a modern CPU are measured in nanoseconds. For comparison, 1 millisecond = 1,000,000 nanoseconds, which is a very wide margin. Even with smaller platters, on-disk caches and other advancements, HDDs appear lackluster next to the pace of CPUs and system memory. Even the fastest HDDs, such as the WD VelociRaptor family spinning at 10,000 RPM and some enterprise drives spinning at 15,000 RPM, are painfully slow in comparison to CPUs.
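To put the millisecond figures in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the common approximation that, on average, a drive waits half a revolution for the right sector to arrive, and it ignores seek time entirely; the function name is made up for this example.

```python
NS_PER_MS = 1_000_000  # 1 millisecond = 1,000,000 nanoseconds


def avg_rotational_latency_ms(rpm: float) -> float:
    """Average wait for a sector, approximated as half of one revolution."""
    ms_per_revolution = 60_000 / rpm  # 60,000 ms in a minute
    return ms_per_revolution / 2


for rpm in (10_000, 15_000):
    latency_ms = avg_rotational_latency_ms(rpm)
    latency_ns = latency_ms * NS_PER_MS
    print(f"{rpm:>6} RPM: ~{latency_ms:.1f} ms, i.e. ~{latency_ns:,.0f} ns")
```

Even at 15,000 RPM the platter alone costs a couple of milliseconds per access, which is millions of nanoseconds on the CPU's time scale.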
The fresh crop of storage solutions: SSDs
Solid state drives (SSDs) derive their name from the way they are built. An SSD has no moving or spinning mechanical parts, a key factor behind its impressive speeds. Instead of storing data on a spinning magnetic platter, an SSD stores bits of information in NAND flash, which is made up of what are called ‘floating gate transistors’. Unlike the cells used in typical DRAM, these do not have to be refreshed constantly. NAND flash retains data even when it is not powered, which makes it a kind of non-volatile memory. NAND is nowhere near as fast as main memory or CPUs, but it is still multiple orders of magnitude faster than mechanical storage.
The diagram above represents a typical NAND cell. Electrons are stored in the floating gate, and the cell reads as ‘0’ when charged and ‘1’ when uncharged. This runs against how we usually think of 0s and 1s, but that is how it works. NAND flash is organized hierarchically: the largest division is the block, which is further divided into pages that store the bits of information. Typical page sizes are 2KB, 4KB, 8KB or 16KB, with 128 to 256 pages per block. This works out to a minimum of 256KB and a maximum of 4MB per block.
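The block size range quoted above follows directly from multiplying page size by pages per block. A quick sketch of that arithmetic (the helper name is ours, not anything a drive exposes):

```python
KIB = 1024

PAGE_SIZES_KIB = (2, 4, 8, 16)   # typical page sizes from the text
PAGES_PER_BLOCK = (128, 256)     # typical pages per block from the text


def block_size_bytes(page_kib: int, pages: int) -> int:
    """Size of one block: page size multiplied by the number of pages."""
    return page_kib * KIB * pages


smallest = block_size_bytes(min(PAGE_SIZES_KIB), min(PAGES_PER_BLOCK))
largest = block_size_bytes(max(PAGE_SIZES_KIB), max(PAGES_PER_BLOCK))

print(f"smallest block: {smallest // KIB} KB")
print(f"largest block:  {largest // (KIB * KIB)} MB")
```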
There are a few things to note from the above chart:
- As the number of bits per cell increases, read and write latency climbs sharply, along with erase latency.
- TLC NAND’s latency is about 4x worse for reads and 6x worse for writes in comparison to SLC NAND.
- The impact is much worse on writes than on reads, and it isn’t proportional to capacity either: TLC NAND holds only 50% more data than MLC NAND while being almost twice as slow.
The reason why TLC NAND is slower than MLC or SLC has to do with how the data moves in and out of the NAND cell. With SLC, the controller only needs to know whether the bit is a ‘0’ or a ‘1’. MLC NAND can have up to four possible values for each cell, viz. ‘00’, ‘01’, ‘10’ and ‘11’. The situation grows even more complex with TLC NAND, where each cell can hold one of eight values and the controller has to apply and sense voltages far more precisely to tell them apart.
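The jump in complexity is easier to see when the states are counted. The sketch below assumes only that a cell storing n bits must distinguish 2^n charge levels; the function names are illustrative.

```python
BITS_PER_CELL = {"SLC": 1, "MLC": 2, "TLC": 3}


def cell_states(bits: int) -> list[str]:
    """Every bit pattern a cell storing `bits` bits has to distinguish."""
    return [format(value, f"0{bits}b") for value in range(2 ** bits)]


for name, bits in BITS_PER_CELL.items():
    states = cell_states(bits)
    print(f"{name}: {len(states)} states -> {states}")

# Capacity grows linearly with bits per cell, the state count exponentially.
capacity_gain = BITS_PER_CELL["TLC"] / BITS_PER_CELL["MLC"]
state_growth = len(cell_states(3)) / len(cell_states(2))
print(f"TLC vs MLC: {capacity_gain:.1f}x the data, {state_growth:.0f}x the states")
```

Going from MLC to TLC buys 50% more capacity but doubles the number of levels the controller has to tell apart within the same voltage window.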
Data manipulation on SSDs
For all the structural and functional advantages that SSDs boast, they have one significant handicap. While SSDs can read and write at the page level (assuming that the surrounding cells are empty), overwriting data is a much more arduous task, as it involves an erase followed by a write. In theory, SSDs could erase data at the page level, but the voltage required to do so stresses the surrounding cells to the point that their contents could be corrupted. Therefore, to prevent undesired loss of information, SSDs erase data at the block level.
When data is to be rewritten, the drive has to follow a rather intricate path:
- The original contents of the block are copied into the drive’s on-board memory, and then the block is erased.
- The original contents, together with the updated data, are written to the now-erased block.
- If the SSD is full and there are no empty pages available for writing, the SSD must first scan through the NAND to find blocks that are marked for deletion, erase them, and only then write the new data.
This is why SSDs can slow down as they age. A brand new SSD has many more empty pages than an old one, so it can skip the search through the NAND and immediately get down to writing.
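The copy, erase and rewrite steps described above can be sketched with a toy model. This is only an illustration under simplifying assumptions: the `Block` class, the four-page block and the `overwrite_page` helper are all invented for the example and do not reflect any real controller's firmware.

```python
PAGES_PER_BLOCK = 4  # toy value; real blocks hold 128 to 256 pages


class Block:
    """A toy NAND block: pages are programmed one by one, erased all at once."""

    def __init__(self):
        self.pages = [None] * PAGES_PER_BLOCK
        self.erase_count = 0

    def program(self, index, data):
        if self.pages[index] is not None:
            raise ValueError("page must be erased before it can be programmed")
        self.pages[index] = data

    def erase(self):
        self.pages = [None] * PAGES_PER_BLOCK
        self.erase_count += 1


def overwrite_page(block, index, new_data):
    """Replace one page in place; returns how many pages were physically written."""
    cached = list(block.pages)   # 1. copy the block's contents into cache memory
    block.erase()                # 2. erase the whole block
    cached[index] = new_data     # 3. apply the update to the cached copy
    written = 0
    for i, data in enumerate(cached):
        if data is not None:     # 4. write everything back
            block.program(i, data)
            written += 1
    return written


block = Block()
for i, data in enumerate(["A", "B", "C", "D"]):
    block.program(i, data)

pages_written = overwrite_page(block, 1, "B'")
print(block.pages, pages_written, block.erase_count)
```

Changing a single page here costs one block erase and four page writes, which is the overhead that the rest of the article keeps coming back to.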
SSDs tackle the problem described above with a technique called ‘garbage collection’. Garbage collection is a background process that eliminates stale data and frees up blocks for writing while the system is idle or in a low-power state. This ensures that the next time new data is to be written, the SSD doesn’t have to search through the NAND for an empty block.
The above diagram depicts how garbage collection works. Apart from writing the replacement pages (A’-D’) for the old pages (A-D), the drive also writes four new pages (E-H). Once the new information has been written to empty pages, the original pages are marked as stale. When the computer is idle, the SSD copies the contents of Block X that are to be retained to a new block (Y) and then erases Block X to free up space.
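The Block X to Block Y sequence can be mimicked in a few lines. The sketch below is a simplified model, assuming a 12-page block and three page states (free, valid, stale); the function names and data layout are ours and only illustrate the idea.

```python
FREE, VALID, STALE = "free", "valid", "stale"


def new_block(n_pages):
    return [{"state": FREE, "data": None} for _ in range(n_pages)]


def write_page(block, data):
    """Program the first free page in the block and return it."""
    for page in block:
        if page["state"] == FREE:
            page["state"], page["data"] = VALID, data
            return page
    raise RuntimeError("block has no free pages")


def garbage_collect(old_block, spare_block):
    """Copy valid pages to the spare block, then erase the old block."""
    copied = 0
    for page in old_block:
        if page["state"] == VALID:
            write_page(spare_block, page["data"])
            copied += 1
    for page in old_block:  # a block erase frees every page at once
        page["state"], page["data"] = FREE, None
    return copied


block_x = new_block(12)
originals = [write_page(block_x, name) for name in "ABCD"]
for name in "EFGH":                       # four new pages
    write_page(block_x, name)
for page in originals:                    # updates go to empty pages...
    write_page(block_x, page["data"] + "'")
    page["state"] = STALE                 # ...and the old copies become stale

block_y = new_block(12)
copied = garbage_collect(block_x, block_y)
print(copied, [p["data"] for p in block_y if p["state"] == VALID])
```

Only the eight valid pages make the trip to Block Y; the four stale ones are simply dropped when Block X is erased.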
The TRIM command
When you delete a file from an HDD on Windows, the file isn’t actually wiped out of existence. Instead, the OS simply marks the space as available, so the data in that location can be overwritten the next time a write is needed. This is why deleted files can often be recovered as long as they haven’t been overwritten. With a traditional HDD, the OS doesn’t need to keep track of where data is written or what state the sectors are in. With an SSD, this matters a great deal.
The TRIM command tells the SSD that it can skip rewriting certain data the next time it performs a block erase, because that data has been marked stale and no longer needs to be preserved. This reduces write amplification and the total number of writes performed. Both reads and writes wear NAND flash, but writes have a far greater effect on the longevity of an SSD. Fortunately, this problem has been greatly reduced by smarter SSD controllers and more durable NAND, but as the saying goes, ‘prevention is better than cure’.
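A small sketch can show what TRIM saves. It assumes a block holding pages from two files, one of which the OS has deleted; the page labels and the `pages_to_preserve` helper are hypothetical and stand in for bookkeeping that real drives do internally.

```python
def pages_to_preserve(block_pages, trimmed):
    """Pages the drive must copy elsewhere before it can erase the block."""
    return [page for page in block_pages if page not in trimmed]


# Hypothetical block contents: two pages of a document, two of a deleted photo.
block_pages = ["doc-0", "doc-1", "photo-0", "photo-1"]

# Without TRIM the drive has no idea the photo was deleted.
without_trim = pages_to_preserve(block_pages, trimmed=set())

# With TRIM the OS has told the drive which pages are no longer needed.
with_trim = pages_to_preserve(block_pages, trimmed={"photo-0", "photo-1"})

print(len(without_trim), "pages rewritten without TRIM")
print(len(with_trim), "pages rewritten with TRIM")
```

In this toy case TRIM halves the number of pages that have to be rewritten when the block is eventually erased.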
Write Amplification and Wear Leveling
When you update a file, the amount of data actually written can be far larger than the update itself. Because SSDs write at the page level but erase at the block level, altering a file can mean instructing the controller to erase an entire block and write all of its original contents plus the updated information to the now-erased block or to a new block. This means that while updating a 4KB file, you could end up writing as much as 4MB of data. Over the years, the TRIM command and garbage collection have effectively reduced write amplification, but a flawless solution is yet to be engineered.
As discussed earlier, writes and erasures consume a significant chunk of an SSD’s life, so it is important to ensure that certain NAND blocks aren’t rewritten over and over again while others sit untouched. This is known as wear leveling, which refers to spreading writes and erasures evenly across the entire SSD so that some parts aren’t worn out while others remain fully functional. Wear leveling is meant to increase the life expectancy of an SSD, but in doing its job it ends up increasing write amplification. In other words, to distribute writes evenly across the entire drive, some blocks have to be erased and rewritten even when their contents haven’t actually changed. A decent wear leveling algorithm aims to minimize this overhead and bring about a net boost in the life of an SSD.
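The worst case quoted above can be expressed as a ratio, usually called the write amplification factor: the data physically written to flash divided by the data the host asked to write. A minimal sketch of that arithmetic, with a helper name of our own choosing:

```python
KIB = 1024
MIB = 1024 * KIB


def write_amplification(host_bytes: int, flash_bytes: int) -> float:
    """Data physically written to flash per byte the host asked to write."""
    return flash_bytes / host_bytes


# Worst case from the text: a 4KB update forces a full 4MB block rewrite.
worst_case = write_amplification(host_bytes=4 * KIB, flash_bytes=4 * MIB)
print(f"write amplification: {worst_case:.0f}x")
```

Real drives rarely hit this worst case, but the ratio shows why small random updates are so much harder on flash than their size suggests.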
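One simple way to picture wear leveling is a policy that always sends the next write to the least-worn block. The sketch below assumes exactly that policy, which is a deliberate simplification: real algorithms are proprietary and considerably more involved, and this toy does not model the extra writes that leveling itself causes.

```python
def simulate_writes(n_blocks: int, n_writes: int, leveled: bool) -> list[int]:
    """Return per-block erase counts after rewriting the same data n_writes times."""
    erase_counts = [0] * n_blocks
    for _ in range(n_writes):
        if leveled:
            # Pick the least-worn block (ties go to the lowest index).
            target = min(range(n_blocks), key=lambda b: erase_counts[b])
        else:
            # Naive drive: the data always lives in the same block.
            target = 0
        erase_counts[target] += 1
    return erase_counts


print("no leveling:  ", simulate_writes(4, 100, leveled=False))
print("with leveling:", simulate_writes(4, 100, leveled=True))
```

Without leveling, one block absorbs all 100 erase cycles while the rest stay fresh; with leveling, each block takes 25.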
The SSD controller
SSD controllers have a strenuous job at hand. Picture everything explained so far being performed simultaneously, in a span of microseconds. If keeping track of it all makes your head spin, you have a fair idea of what an SSD controller deals with every moment it is running.
Controllers often employ DDR3 memory pools as on-disk caches to assist during read, write and erase cycles. A fast cache is especially important for SSDs, and many manufacturers also set aside some SLC NAND to act as a buffer. Because the NAND flash in an SSD is typically connected to the controller through a series of parallel memory channels, you can think of the drive controller as performing some of the same load balancing work as a high-end storage array. SSDs don’t deploy RAID internally, but wear leveling, garbage collection, and SLC cache management all have parallels in the big iron world.
Some drives also use data compression algorithms to cut down the amount of data being written. The SSD controller is also responsible for error correction.
Unfortunately, not much is known about the inner workings of SSD controllers, since manufacturers keep their secrets closely guarded. Many people take the rated IOPS or the type of NAND used to be the key to SSD performance, but much of NAND flash’s underlying performance is governed by the controller, which makes it an even more important component.
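To illustrate the load balancing idea, the sketch below spreads a run of pages across parallel channels in round-robin order. Round-robin is purely our assumption for the example; how a given controller actually schedules its channels is not public.

```python
def stripe_pages(pages, n_channels):
    """Distribute pages across channels in round-robin order."""
    channels = [[] for _ in range(n_channels)]
    for i, page in enumerate(pages):
        channels[i % n_channels].append(page)
    return channels


# Eight pages over four channels: each channel only handles two of them,
# so the channels can work on their share at the same time.
layout = stripe_pages(list(range(8)), n_channels=4)
for channel, assigned in enumerate(layout):
    print(f"channel {channel}: pages {assigned}")
```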
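The effect of compression on write volume can be approximated on a PC. The sketch below uses Python's standard `zlib` module purely as a stand-in, since the algorithms used inside drives are not disclosed, and it assumes a 4KB page size.

```python
import os
import zlib

PAGE_SIZE = 4096  # assumed page size for this example


def pages_needed(payload: bytes) -> int:
    """Number of whole pages required to store the payload."""
    return -(-len(payload) // PAGE_SIZE)  # ceiling division


repetitive = b"solid state drive " * 2048    # highly compressible data
random_data = os.urandom(len(repetitive))    # essentially incompressible data

for label, payload in (("repetitive", repetitive), ("random", random_data)):
    raw = pages_needed(payload)
    packed = pages_needed(zlib.compress(payload))
    print(f"{label}: {raw} pages raw, {packed} pages compressed")
```

Compression only helps when the data has redundancy to remove; already-compressed or random data gains nothing and may even grow slightly.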
Blooming days ahead?
With SSD prices falling noticeably in recent years, we could well surmise that the future looks bright for them, but we shouldn’t ignore concerns such as SSD longevity and data fade. These issues have been considerably reduced, but they remain a nuisance, to say the least. Another concern that could threaten the throne of SSDs is that shrinking the node size of NAND cells makes them more fragile, so both data retention times and write performance are lower for 20nm NAND than for 40nm NAND. The capacity of SSDs was a debated issue a year ago, but that concern has faded since the introduction of 3D NAND in the SSD market, which could prove to be a pioneering fix for the worries of SSD manufacturers.
NAND is expected to defend its title of ‘King of the Hill’ for another 4-5 years, even with technologies such as magnetic RAM and phase change memory on the horizon. Both are still in the early phases of their development and pose little threat to SSD manufacturers for now, since they have countless hurdles to overcome before they end up on our desks and leave us in awe.
Currently, SSDs are unarguably the best that money can buy as far as speed is concerned, and if you haven’t got yours yet, now would be the perfect time to grab one, considering that prices have dropped to remarkable lows in recent years.