The persistent memory style of programming is attractive for many reasons, and it turns out you don’t need non-volatile memory hardware for your software to benefit.
This post complements my PIRL 2019 talk, “Persistent Memory Programming on Conventional Hardware,” and provides updates on recent developments. My article by the same title in the July/August 2019 issue of ACM Queue magazine covers the topic in greater detail: https://dl.acm.org/citation.cfm?id=3358957
Both the Queue article and the PIRL talk present working example code that illustrates how to program in “the persistent memory style” on conventional computers, without novel non-volatile memory (NVM) hardware.
The main attraction of the persistent memory style of programming is simplicity: Instead of keeping persistent data in complex, opaque external persistent stores such as relational databases or key-value stores, applications manipulate persistent data directly, in memory, via STORE instructions. There’s only one format for persistent data, the in-memory format; translation to/from a distinct storage format is no longer necessary. Most importantly, the programmer thinks in the single paradigm of imperative algorithms manipulating in-memory data structures. The programmer no longer mentally “context switches” to another paradigm for persistence, e.g., declarative SQL manipulating relational tables.
It’s easier to appreciate the essence of persistent memory programming if we first consider it separately from crash tolerance. If crash tolerance is not a requirement, then persistent memory programming mainly consists of laying out application data structures in file-backed memory mappings. A pointer cast on the return value of mmap() allows our code to interpret persistent data in the backing file as application-defined data structures and to enjoy the same degree of type checking as in conventional code. We nearly always want to layer atop raw persistent memory a persistent heap that allocates from a file-backed memory mapping. In addition to allocation and free() analogues, a persistent heap exposes a root pointer that applications can access and modify. Applications must ensure that all live persistent data are reachable from the root, because the root is the entry point to persistent data when programs resume execution following a shutdown.
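The mmap() idiom described above can be sketched in a few lines of C. This is a minimal illustration, not code from the Queue article: the struct pm_region layout, the PM_MAGIC value, and the pm_open()/pm_close() helpers are all hypothetical names invented here. There is no persistent heap in this sketch; the struct at the start of the mapping simply plays the role of the root, and there is no crash tolerance.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical layout of the persistent region (an assumption of this
   sketch). It sits at offset 0 of the backing file and acts as the root. */
struct pm_region {
    uint64_t magic;      /* marks a region that has been initialized */
    uint64_t run_count;  /* application data that survives restarts  */
};

#define PM_MAGIC 0x504d454d31ULL  /* arbitrary tag chosen for this sketch */

/* Map the file at `path` read/write, creating and sizing it on first use.
   Returns NULL on failure. */
struct pm_region *pm_open(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;
    /* A newly created file is extended with zero bytes. */
    if (ftruncate(fd, sizeof(struct pm_region)) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, sizeof(struct pm_region),
                   PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    /* The pointer cast: file contents are now a typed data structure. */
    struct pm_region *r = (struct pm_region *)p;
    if (r->magic != PM_MAGIC) {  /* first use: initialize in place */
        r->magic = PM_MAGIC;
        r->run_count = 0;
    }
    return r;
}

/* Push changes to the backing file and unmap. Conventional msync() is
   not failure-atomic; this sketch ignores crashes entirely. */
int pm_close(struct pm_region *r)
{
    if (msync(r, sizeof *r, MS_SYNC) != 0)
        return -1;
    return munmap(r, sizeof *r);
}
```

A program would call pm_open(), update r->run_count with ordinary assignments, and call pm_close() before exiting; the next run finds the updated value in place, with no serialization step.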
Equipped with the simple but powerful toolkit of tricks and idioms outlined above, we can easily design persistent data structures from scratch. More importantly, it can be remarkably easy to retrofit persistence onto complex legacy software that was never designed to be persistent. My Queue article shows how to transform a C++ Standard Template Library (STL) container into a persistent container in a few dozen lines of code.
Now let’s consider crash tolerance, which is often a requirement in practice. A number of crash-tolerance mechanisms have been proposed for NVM in the research literature (NV-heaps, Mnemosyne, Atlas) and in industry (PMDK, NVM Direct); all require byte-addressable NVM hardware. If we’re programming in the persistent memory style on conventional hardware, the most natural crash-tolerance mechanism is failure-atomic msync() (FAMS). FAMS simply strengthens the semantics of conventional msync(), guaranteeing that the backing file always reflects the most recent successful msync() call, regardless of failures. To achieve crash consistency with FAMS, the programmer simply calls FAMS when persistent data in memory are consistent, i.e., when they satisfy application-level invariants and correctness criteria, thereby ensuring that post-crash recovery code finds persistent data in a consistent state.
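The discipline of calling FAMS only when data are consistent can be illustrated with a small C sketch. Everything here is hypothetical: struct bank, its invariant, and the helper names are invented for illustration. Most importantly, commit() calls conventional msync() purely as a placeholder so that the sketch compiles on any POSIX system; conventional msync() is not failure-atomic, and a real program would substitute a FAMS implementation at that point.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

enum { N_ACCOUNTS = 4 };

/* Hypothetical persistent state.
   Invariant: the balances always sum to `total`. */
struct bank {
    int64_t total;
    int64_t balance[N_ACCOUNTS];
};

/* Commit point. ASSUMPTION: a failure-atomic msync() would be called
   here. Plain msync() is only a stand-in and is NOT failure-atomic. */
int commit(struct bank *b)
{
    return msync(b, sizeof *b, MS_SYNC);
}

int invariant_holds(const struct bank *b)
{
    int64_t sum = 0;
    for (int i = 0; i < N_ACCOUNTS; i++)
        sum += b->balance[i];
    return sum == b->total;
}

/* Move `amount` between accounts, then commit. Returns -1 if the
   transfer is refused or the commit fails. */
int transfer(struct bank *b, int from, int to, int64_t amount)
{
    assert(from >= 0 && from < N_ACCOUNTS && to >= 0 && to < N_ACCOUNTS);
    if (amount < 0 || b->balance[from] < amount)
        return -1;
    b->balance[from] -= amount;  /* invariant temporarily violated */
    b->balance[to]   += amount;  /* invariant restored             */
    assert(invariant_holds(b));  /* commit only in a consistent state */
    return commit(b);
}

/* Map the backing file; on first use, put `initial` in account 0. */
struct bank *bank_open(const char *path, int64_t initial)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, sizeof(struct bank)) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, sizeof(struct bank),
                   PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return NULL;
    struct bank *b = (struct bank *)p;
    if (b->total == 0) {         /* fresh, zero-filled file */
        b->balance[0] = initial;
        b->total = initial;
        if (commit(b) != 0) {
            munmap(b, sizeof *b);
            return NULL;
        }
    }
    return b;
}
```

The point of the sketch is the placement of the commit: never between the two balance updates, only after both. With a genuine FAMS implementation behind commit(), recovery code would always find a file in which the balances sum to the total.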
FAMS has been implemented at least a half dozen times in user-space libraries, in file systems, and in the Linux kernel. Two implementations are in commercial products that have been shipping to paying customers for years. Extensive experience both in research settings and in the field has shown that FAMS makes it remarkably easy to make complex legacy production software crash tolerant.
My favorite example involves HP Indigo printing presses. Indigo is a billion-dollar business, and an individual press can cost a million dollars. Several years ago, power outages were a major pain point for Indigo. A brief power failure could bring down an Indigo press for hours or even days; sometimes a field engineer had to visit the customer site to bring the press back into production. The problem was that power failures corrupted print workflow data structures, requiring painfully slow recovery. Indigo engineers had attempted to make their complex, highly performance-tuned software crash tolerant but had failed. Then I gave Indigo a user-space implementation of failure-atomic msync(). It took a single software engineer a few weeks to “slide it beneath” Indigo’s key data structures and to insert FAMS calls. As a result, crash recovery times fell from hours or days to a few minutes. Since 2013 all Indigo presses have shipped with FAMS-based crash tolerance; it runs 24×7, cannot be disabled, and has completely eliminated the power-failure problem. The Indigo case study is described in an HP Labs tech report:
A major attraction of FAMS is that implementations impose no restrictions on the underlying storage layer. Any type of storage will do, from humble hard disks and SSDs to geographically replicated, strongly durable, high-availability, scalable cloud storage. Application operators thus have complete freedom to configure the storage layer to trade monetary cost for performance, reliability, availability, scalability, and other requirements; the application remains unchanged.
During my PIRL talk, I announced the release of a new user-space library implementation of FAMS for Linux called “famus” (failure-atomic msync() in user space). The Queue article provides details on how to download
Following my 30-minute PIRL talk, I took audience questions for ten minutes. One audience member pointed out that the C++17 language standard expresses misgivings about mmap(), to which I replied that a standards-track proposal (“bless”) aims to reconcile mmap() and C++. Another audience member pointed out that different compiler versions may pack structs differently, which causes trouble for persistent memory programs. In a similar vein, we discussed the problem of “schema evolution,” i.e., the need to modify persistent data structures in response to changing application requirements. Programmers may pad data structures with extra space in anticipation of future growth, thereby averting or at least postponing trouble if new fields must be added, but in the worst case the data structures will have to be reformatted. Other questions addressed multi-threaded code and the correct use of persistent heap root pointers; written explanations of these topics are in the Queue article.
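The packing and schema-evolution concerns raised in the Q&A can be made concrete with a short C sketch. The record layout, field names, and version constant below are hypothetical, chosen only for illustration. The sketch uses fixed-width types, reserves padding for future fields, and turns layout assumptions into compile-time checks, so that a compiler that lays the struct out differently fails the build instead of silently misreading persistent data.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>
#include <string.h>

enum { RECORD_VERSION = 1, RECORD_SIZE = 64 };

/* Hypothetical persistent record. Fixed-width types pin down field sizes;
   `reserved` leaves room to add fields later without changing the size. */
struct record {
    uint32_t version;   /* layout version, checked when data are reopened */
    uint32_t flags;
    uint64_t id;
    double   value;
    uint8_t  reserved[RECORD_SIZE - 24];  /* padding for future growth */
};

/* Compile-time layout checks: the build breaks if a compiler or ABI
   would place the fields differently. */
static_assert(sizeof(struct record) == RECORD_SIZE, "record size changed");
static_assert(offsetof(struct record, id) == 8, "id moved");
static_assert(offsetof(struct record, value) == 16, "value moved");
static_assert(offsetof(struct record, reserved) == 24, "reserved moved");

/* Initialize a record in place, zeroing the reserved bytes so that a
   future version can tell "never written" from real data. */
void record_init(struct record *r, uint64_t id, double value)
{
    memset(r, 0, sizeof *r);
    r->version = RECORD_VERSION;
    r->id = id;
    r->value = value;
}

/* Recovery code would refuse, or migrate, records of another version. */
int record_is_compatible(const struct record *r)
{
    return r->version == RECORD_VERSION;
}
```

A later version that adds a field would carve it out of reserved, bump RECORD_VERSION, and keep RECORD_SIZE fixed. When the reserved space runs out, the worst case mentioned above applies and existing data must be reformatted.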
Since delivering my talk at PIRL I’ve written a second user-space library implementation of FAMS, I’ve written a second article about persistent memory programming on conventional hardware, and I’ve delivered a second talk on these topics. The new library, “famus_snap”, is remarkably concise because it leverages efficient per-file snapshots on filesystems such as BtrFS, XFS, and OCFS2; XFS developer Christoph Hellwig suggested the design of famus_snap and reviewed my implementation. The new article, which includes famus_snap source code listings, is forthcoming in the Winter issue of USENIX ;login: magazine. The new talk was delivered at the Storage Networking Industry Association (SNIA) Storage Developer Conference (SDC) on 25 September. The main difference between the PIRL talk and the SNIA SDC talk is that the SDC talk explicitly recommends teaching persistent memory programming without crash consistency before teaching crash consistency mechanisms: Programming without a crash-tolerance requirement sets the gold standard for ergonomics, against which the additional work required for crash tolerance can be judged. My SNIA SDC talk is available here: https://sniasdc19.pathable.com/meetings/1071902
In the months ahead I plan to torture-test my FAMS implementations against sudden whole-system power interruptions and write an article about these tests. I also intend to implement a persistent memory allocator designed for the FAMS cost model (“pay $1 for every page you’ve changed since the last FAMS call”).
If you have questions about persistent memory programming or about failure-atomic msync(), please write to me at email@example.com