Title: What you should know about Flash Storage
Flash storage is a frequent topic on our support
channels. Toradex invests a lot of resources into
making the storage as reliable as possible.
Nevertheless, it is important to understand some
basics of the underlying storage device. One of
the most important things to know is that flash
storage wears out: writing large amounts of data
to the built-in storage device can eventually
destroy it.
With this post, we want to give you a basic
overview of potential issues flash storage can
have. Let's start with a short technology
overview.

Flash types

Raw Flash vs Managed Flash
Currently, Toradex computer modules use NOR,
NAND, and eMMC flash. NOR and NAND are raw
storage devices. The main differences between
NAND and NOR are that NOR allows random access,
does not need error correction, and has a higher
cost per bit. NAND, on the other hand, can only
be read in pages, and some bits in a page may be
wrong and need to be corrected by an error
correction mechanism.
eMMC flash combines NAND memory with a built-in
controller that handles most of the nasty things
you have to take care of when dealing with NAND
flash. eMMC is also called managed NAND. With raw
NAND and NOR flash, on the other hand, the OS and
device drivers are responsible for handling these
issues. We will discuss the different kinds of
challenges later in this blog post. Here is a
small overview of the flash types used on our
computer modules.

Evolution of NAND Flash

From SLC to MLC
The bit density of NAND flash has evolved over
time. The first NAND devices were Single Level
Cell (SLC) flash, which means every flash cell
stores a single bit. Multi Level Cell (MLC) flash
stores two or more bits per cell, so the bit
density increases. This sounds great, but MLC has
downsides as well: MLC NAND comes with a higher
bit error rate and lower endurance. All eMMC
devices use MLC NAND. Some eMMC devices allow you
to switch part or all of the storage into a
pseudo-SLC (pSLC) mode. This reduces the storage
capacity but increases the endurance of the
device.
Here is a rough comparison of SLC and MLC.

Endurance

Limited number of erase cycles
As already mentioned, one of the most important
things you have to know about any flash
technology used on our devices is that you can
write and erase flash only a limited number of
times.
Writing huge amounts of data to the flash device
is not a good idea! As shown in the table above,
depending on the type of flash, you have between
10K and 100K erase cycles available before the
data potentially gets corrupted or lost. The term
"erase cycles" can be confusing. One limitation of
flash storage is that it cannot be rewritten
without being erased first. Furthermore, erasing
cannot be done at the bit level, only in larger
chunks called blocks. In the worst case, this
means that if you want to write only a single
byte, you potentially have to erase and rewrite
one whole block. The block size can be up to
512 KB. The effect of erasing and writing more
than you actually want is called write
amplification. The flash file system may need
additional write operations on top of that. If
you want to estimate the lifetime of the flash
storage on your embedded device, you should take
all of this into consideration.
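
To make the lifetime estimate a bit more concrete,
here is a minimal sketch in Python. It assumes
ideal wear leveling (every block is erased equally
often) and treats write amplification as a single
constant factor. The function names and the example
numbers are our own illustration, not values from a
datasheet, so please replace them with the figures
for your device and workload.

```python
def worst_case_write_amplification(payload_bytes, block_bytes):
    """Bytes physically rewritten per payload byte if every write
    forces whole erase blocks to be erased and rewritten."""
    if payload_bytes <= 0 or block_bytes <= 0:
        raise ValueError("sizes must be positive")
    blocks_touched = -(-payload_bytes // block_bytes)  # ceiling division
    return blocks_touched * block_bytes / payload_bytes


def estimate_lifetime_years(capacity_bytes, erase_cycles,
                            bytes_written_per_day, write_amplification):
    """Rough lifetime in years, assuming ideal wear leveling."""
    if bytes_written_per_day <= 0 or write_amplification < 1:
        raise ValueError("need a positive write rate and amplification >= 1")
    # Total amount of data the flash can absorb before wearing out.
    total_writable = capacity_bytes * erase_cycles
    # Data that physically hits the flash per day.
    effective_per_day = bytes_written_per_day * write_amplification
    return total_writable / effective_per_day / 365.0


if __name__ == "__main__":
    GIB = 1024 ** 3
    # Hypothetical device: 4 GiB, 3000 erase cycles, 1 GiB of
    # application data per day, write amplification factor of 4.
    years = estimate_lifetime_years(4 * GIB, 3000, 1 * GIB, 4)
    print(f"Estimated lifetime: {years:.1f} years")
    # Writing a single byte into a 512 KB erase block, worst case.
    print(worst_case_write_amplification(1, 512 * 1024))
```

The real write amplification depends heavily on the
file system and the write pattern, so an estimate
like this should be backed up by measurements on
the final product.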

Increase lifetime of flash
The following section shows how the lifetime of
NAND or eMMC flash can be improved. Don't worry,
all these things are already handled by Toradex;
there is no need for any action on your side.

Prevent wearing

Wear leveling
Let's assume you are aware that flash can be
erased and written only a limited number of times,
and you only update small amounts of data
periodically. If this data were always written to
the same flash cells, you could write it at most
15K times on MLC flash. Even though you never
touched any of the other flash cells, your data
could get lost and the flash would be broken,
because the cells you kept writing to are worn
out. Smart flash drivers use wear leveling. This
technique ensures that all flash cells wear at a
similar rate instead of the same cells being used
over and over.
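
The effect of wear leveling can be illustrated with
a small simulation. This is only a toy model with a
made-up function name: it always picks the
least-worn block as the next target, which is a
simplification and not how UBI or an eMMC controller
is actually implemented.

```python
def simulate_writes(num_blocks, num_writes, wear_leveling):
    """Return the erase count of every block after num_writes updates
    of one small piece of data."""
    erase_counts = [0] * num_blocks
    for _ in range(num_writes):
        if wear_leveling:
            # Toy policy: always use the block with the fewest erases.
            target = min(range(num_blocks), key=erase_counts.__getitem__)
        else:
            # Without wear leveling the same block is hit every time.
            target = 0
        erase_counts[target] += 1
    return erase_counts


if __name__ == "__main__":
    without = simulate_writes(100, 15000, wear_leveling=False)
    leveled = simulate_writes(100, 15000, wear_leveling=True)
    print("max erase count without wear leveling:", max(without))
    print("max erase count with wear leveling:   ", max(leveled))
```

In this model, 15000 updates wear out a single
block without wear leveling, while with wear
leveling across 100 blocks each block sees only 150
erase cycles.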

Detect and correct errors

Error Correction Codes
On a NAND flash device, single bits can start
flipping and corrupt your data. This can be caused
by wear or by other disturbances. Therefore, the
data is protected by Error Correction Codes (ECC).
ECC makes it possible first to detect corrupted
data and second to correct it. Depending on the
flash controller and the NAND or eMMC flash
itself, more or fewer errors can be detected and
corrected.
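
To show the principle of detecting and correcting a
flipped bit, here is a small sketch using the
classic Hamming(7,4) code, which protects 4 data
bits with 3 parity bits and can correct one flipped
bit. We picked it only because it is short enough to
read; real NAND controllers typically use stronger
codes such as BCH, and this is not the code used on
our modules.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7,
    parity bits at positions 1, 2 and 4)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct up to one flipped bit. Returns (data_bits, position),
    where position is the 1-based index of the corrected bit or 0."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # syndrome points at the bad bit
    if position:
        c[position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], position


if __name__ == "__main__":
    stored = hamming74_encode([1, 0, 1, 1])
    stored[4] ^= 1  # simulate a single bit flip in the flash cell
    data, fixed_at = hamming74_decode(stored)
    print(data, "corrected bit at position", fixed_at)
```

The number of corrected bits reported by such a
decoder is exactly the kind of information a flash
driver can use to judge the health of a block.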

Bad block handling
Since ECC enables us to find erroneous blocks, we
can stop using these bad blocks. Depending on the
ECC and the number of bits that can be corrected,
a threshold is set that defines the maximum number
of errors that is accepted before further action
is taken. Once this threshold is reached, the data
is corrected and moved to a good block on the
device. The previous location is marked as bad.
Bad blocks are no longer used, as they are
potentially broken.
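
The threshold logic described above can be sketched
as follows. The class and method names are
hypothetical and the model is heavily simplified (no
actual data is copied); it is meant to illustrate
the decision flow, not the implementation in UBI or
in the WinCE flash driver.

```python
class BadBlockTable:
    """Toy model: retire a block once ECC had to correct too many bits."""

    def __init__(self, spare_blocks, threshold):
        self.spare_blocks = list(spare_blocks)  # unused good blocks
        self.threshold = threshold              # accepted corrected bits
        self.bad_blocks = set()
        self.remap = {}                         # logical -> physical block

    def physical(self, logical):
        """Physical block currently holding the logical block's data."""
        return self.remap.get(logical, logical)

    def report_read(self, logical, corrected_bitflips):
        """Call after a read that ECC could still correct.
        Returns the physical block that holds the data afterwards."""
        current = self.physical(logical)
        if corrected_bitflips < self.threshold:
            return current  # still healthy enough, keep using it
        if not self.spare_blocks:
            raise RuntimeError("no spare blocks left")
        replacement = self.spare_blocks.pop(0)
        # A real driver would write the corrected data to the
        # replacement block at this point.
        self.bad_blocks.add(current)
        self.remap[logical] = replacement
        return replacement


if __name__ == "__main__":
    table = BadBlockTable(spare_blocks=[100, 101], threshold=3)
    print(table.report_read(5, corrected_bitflips=1))  # stays on block 5
    print(table.report_read(5, corrected_bitflips=3))  # moved to block 100
    print(sorted(table.bad_blocks))
```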

Power fail tolerance
What happens to your device in case of a sudden
power loss while writing to the flash? On embedded
devices, you expect that the device still boots
properly and that your data is not corrupted. To
achieve that, all software layers and hardware
parts involved have to be capable of handling such
a situation. You will find more details on how we
reach that goal in the next section.

Implementation Details on Toradex SoMs
As seen above, having a proper setup for the
underlying storage type is crucial. Let's go into
the details of the current setup used in the
Toradex BSPs.

NAND-based devices
The following figure gives you a generic overview
of the setup of our WinCE and Linux BSPs on
NAND-based devices.

Storage device
On all our devices using NAND, we use SLC NAND.

Hardware Driver
The hardware driver offers a generic interface
between the NAND device and the upper layers. This
layer is also responsible for detecting and
correcting errors. On Linux, all our current
images use MTD. On WinCE, we use the Microsoft
Flash PDD layer. There are some exceptions, such
as the Colibri T20, where we use a device-specific
PDD layer on WinCE.

Flash Translation Layer
This layer is responsible for wear leveling and
bad block management. On Linux, this is done by
the UBI subsystem, while on WinCE, it is done by
the Microsoft Flash MDD layer. Again, on the
Colibri T20, we use a device-specific layer and
not the Microsoft Flash MDD.

Filesystem
The file system is the part that manages the
partitions and the files stored in them. A user
accesses the file system through the file API (on
Linux, through the VFS layer). On Linux, we
currently use UBIFS, while on WinCE, we use
Transaction-Safe exFAT (TexFAT). Both are
power-cut tolerant. The underlying layers are
power-cut tolerant as well, because they support
atomic operations.
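
The idea of an atomic operation can also be applied
at the application level. The following sketch is
our own illustration and not part of the BSP: it
writes the new content to a temporary file, flushes
it to the storage device, and then renames it over
the old file, so that after a power cut either the
old or the new version should be present. How strong
this guarantee really is depends on the file system
and its mount options, so treat it as an assumption
to verify on your device.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the file at path with data (bytes) in one rename step."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same file system as the
    # target, otherwise the final rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to the device
        os.replace(tmp_path, path)  # swap old and new file
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "settings.bin")
        atomic_write(target, b"version 1")
        atomic_write(target, b"version 2")
        with open(target, "rb") as f:
            print(f.read())
```

Keep in mind that every such update still costs
flash writes, so it does not replace the need to
keep the write load low.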

eMMC-based devices
The following table shows the setup on Toradex
System on Modules that use eMMC flash devices.

Storage device
Compared to raw NAND, most of the magic is done by
the eMMC itself. Higher layers do not have to take
care of wear leveling, error correction, or bad
block management.

Hardware Driver
This is the interface between the MMC controller
and the file system.

Filesystem
As on the NAND-based devices, we use TexFAT on
WinCE, while our Linux images use the ext3
filesystem. Again, both are power-cut tolerant.

Conclusion and Recommendations
- Toradex does its best to provide reliable and
enduring flash storage. Nevertheless, you should
always keep an eye on flash usage during
application development.
- Reduce write access to the flash device
- Know the write behavior of your final product
- Check whether the requested lifetime of your
product is feasible with this write behavior
- Run stress tests and long-term tests
- Not using the full capacity greatly improves the
efficiency of wear leveling algorithms
If you need any further information or you think
we could improve our default setup, please get in
contact with our engineers.
Thank you