- CPUs keep getting faster, disks are becoming much bigger and cheaper (but not much faster), and memories are growing exponentially in size.
- The one parameter that is not improving in leaps and bounds is disk seek time; as a result, a performance bottleneck is arising in many file systems.
- The idea that drove the design of LFS (the Log-structured File System) is that as CPUs get faster and RAM memories get larger, disk caches are also increasing rapidly in size. Consequently, it is now possible to satisfy a very substantial fraction of all read requests directly from the file-system cache, with no disk access needed.
- Most disk accesses will therefore be writes. In most file systems, writes are done in very small chunks. Small writes are highly inefficient, since a 50-μsec disk write is often preceded by a 10-msec seek and a 4-msec rotational delay. With these parameters, disk efficiency drops to a fraction of 1 percent.
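- As a rough check of that claim, with the numbers above (50 μsec = 0.05 msec), the fraction of time the disk spends actually transferring data is

```latex
\text{efficiency} = \frac{\text{transfer}}{\text{seek} + \text{rotation} + \text{transfer}}
                  = \frac{0.05\,\text{ms}}{10\,\text{ms} + 4\,\text{ms} + 0.05\,\text{ms}}
                  \approx 0.0036 \quad (\text{about } 0.36\%)
```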
- While the writes can be delayed, doing so exposes the file system to serious consistency problems if a crash occurs before the writes are done.
- From this reasoning, the LFS designers decided to re-implement the UNIX file system in such a way as to achieve the full bandwidth of the disk. The basic idea is to structure the entire disk as a log.
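- To make the idea concrete, here is a minimal sketch (not any particular LFS implementation; the segment size and the `disk_write_sequential()` stub are assumptions): writes are buffered into an in-memory segment and flushed to the end of the on-disk log as one large sequential transfer, so the seek and rotational delays are paid once per segment rather than once per small write.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 4096
#define SEG_BLOCKS 256                 /* e.g., a 1 MB segment */

/* Stand-in for a real device-driver call that writes 'nblocks'
 * consecutive blocks starting at 'addr' in one sequential I/O. */
static void disk_write_sequential(uint64_t addr, const void *buf, int nblocks)
{
    (void)addr; (void)buf; (void)nblocks;        /* sketch only */
}

static uint8_t  segment[SEG_BLOCKS][BLOCK_SIZE]; /* in-memory segment buffer   */
static int      seg_used = 0;                    /* blocks buffered so far     */
static uint64_t log_end  = 0;                    /* next free block of the log */

/* Append one block (data or metadata) to the log; returns its disk address. */
uint64_t lfs_append(const void *block)
{
    memcpy(segment[seg_used], block, BLOCK_SIZE);
    uint64_t addr = log_end + seg_used;

    if (++seg_used == SEG_BLOCKS) {              /* segment full: one big write */
        disk_write_sequential(log_end, segment, SEG_BLOCKS);
        log_end += SEG_BLOCKS;
        seg_used = 0;
    }
    return addr;
}
```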
- The logging algorithms have also been applied successfully to the problem of consistency checking.
- The resulting implementations are known as log-based transaction-oriented (or journaling) file systems.
- Such file systems are actually in use (NTFS, ext3, ReiserFS).
- Recall that a system crash can cause inconsistencies among on-disk file system data structures, such as directory structures, free-block pointers, and free FCB pointers.
- A typical operation, such as file create, can involve many structural changes within the file system on the disk.
- Directory structures are modified,
- FCBs are allocated,
- Data blocks are allocated,
- The free counts for all of these blocks are decreased.
- These changes can be interrupted by a crash, and inconsistencies among the structures can result.
- For example, the free FCB count might indicate that an FCB had been allocated, but the directory structure might not point to the FCB.
- The consistency check may not be able to recover the structures, resulting in loss of files and even entire directories.
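- In code form, the create example above touches several independent on-disk structures. The sketch below is illustrative only; the helper names (`alloc_fcb()`, `write_dir_entry()`, and so on) are assumptions, each standing for an update to one on-disk structure.

```c
/* Hypothetical helpers, each modifying one on-disk structure. */
int  alloc_fcb(void);                    /* allocates an FCB, decrements free-FCB count    */
int  alloc_data_block(void);             /* allocates a block, decrements free-block count */
void init_fcb(int fcb, int data_block);  /* points the FCB at its first data block         */
void write_dir_entry(const char *name, int fcb);  /* adds the directory entry              */

int fs_create(const char *name)
{
    int fcb = alloc_fcb();               /* step 1 */
    if (fcb < 0)
        return -1;

    int block = alloc_data_block();      /* step 2 */
    if (block < 0)
        return -1;

    init_fcb(fcb, block);                /* step 3 */
    write_dir_entry(name, fcb);          /* step 4 */

    /* A crash after step 1 but before step 4 leaves the free-FCB count
     * claiming the FCB is in use while no directory entry points to it,
     * exactly the inconsistency described above. */
    return fcb;
}
```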
- The solution to this problem is to apply log-based recovery techniques to file-system metadata updates.
- Both NTFS and the Veritas (improved UFS) file system use this method, and it is an optional addition to UFS on Solaris 7 and beyond.
- Fundamentally, all metadata changes are written sequentially to a log. Each set of operations for performing a specific task is a transaction.
- Once the changes are written to this log, they are considered to be committed, and the system call can return to the user process, allowing it to continue execution.
- Meanwhile, these log entries are replayed across the actual file system structures.
- As the changes are made, a pointer is updated to indicate which actions have completed and which are still incomplete.
- When an entire committed transaction is completed, it is removed from the log file, which is actually a circular buffer.
- The log may be in a separate section of the file system or even on a separate disk.
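- As a hedged sketch of the write path just described (the record layout, the `journal_*` names, and `apply_to_file_system()` are assumptions, not any particular file system's on-disk format): the metadata changes of one operation are appended to a circular log as a transaction, the commit record is what makes the system call safe to return, and a background task later replays the entries against the real structures and advances the pointer.

```c
#include <stdint.h>

#define LOG_SLOTS 1024                        /* circular log, sketch only */

enum rec_type { REC_UPDATE, REC_COMMIT };

struct log_record {
    uint32_t      txn_id;                     /* which transaction this belongs to */
    enum rec_type type;
    uint64_t      disk_addr;                  /* where the change eventually goes  */
    uint8_t       new_data[64];               /* the metadata change itself        */
};

static struct log_record jlog[LOG_SLOTS];
static int log_head = 0;                      /* next free slot (newest entry)     */
static int log_tail = 0;                      /* oldest not-yet-applied entry      */

/* Hypothetical helper: applies one logged change to its real location. */
void apply_to_file_system(uint64_t disk_addr, const uint8_t *new_data);

static void log_append(struct log_record r)
{
    jlog[log_head] = r;
    log_head = (log_head + 1) % LOG_SLOTS;    /* wrap: the log is a circular buffer */
}

/* A file create as one transaction: log every metadata change, then commit.
 * Once the REC_COMMIT record is in the log, the system call may return. */
void journal_create(uint32_t txn)
{
    log_append((struct log_record){ txn, REC_UPDATE, /* directory block */ 100, {0} });
    log_append((struct log_record){ txn, REC_UPDATE, /* FCB block       */ 200, {0} });
    log_append((struct log_record){ txn, REC_UPDATE, /* free bitmap     */ 300, {0} });
    log_append((struct log_record){ txn, REC_COMMIT, 0, {0} });           /* committed */
}

/* Background replay: apply the oldest logged change to the real structures
 * and advance the pointer; fully applied transactions thereby leave the log. */
void journal_replay_one(void)
{
    if (log_tail == log_head)
        return;                               /* nothing pending */
    struct log_record *r = &jlog[log_tail];
    if (r->type == REC_UPDATE)
        apply_to_file_system(r->disk_addr, r->new_data);
    log_tail = (log_tail + 1) % LOG_SLOTS;    /* pointer marks completed work */
}
```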
- If the system crashes, the log file will contain zero or more transactions.
- Any transactions it contains were not completed to the file system, even though they were committed by the OS, so they must now be completed.
- The transactions can be executed from the pointer until the work is complete so that the file-system structures remain consistent.
- The only problem occurs when a transaction was aborted, that is, it was not committed before the system crashed.
- Any changes from such a transaction that were applied to the file system must be undone, again preserving the consistency of the file system.
- This recovery is all that is needed after a crash, eliminating any problems with consistency checking.
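- Under the same assumed log layout as the sketch above, recovery could look like the following: scan the log from the pointer, redo the updates of every transaction that has a commit record, and undo any applied changes belonging to transactions that never committed (the `undo_in_file_system()` helper is, again, an assumption).

```c
#define MAX_TXNS 64                           /* sketch bound on concurrent transactions */

/* Hypothetical helper: reverts a change that an aborted transaction had
 * already applied to the file system before the crash. */
void undo_in_file_system(uint64_t disk_addr);

void journal_recover(void)
{
    /* Pass 1: find which transactions in the log reached their commit record. */
    int committed[MAX_TXNS] = {0};
    for (int i = log_tail; i != log_head; i = (i + 1) % LOG_SLOTS)
        if (jlog[i].type == REC_COMMIT)
            committed[jlog[i].txn_id % MAX_TXNS] = 1;

    /* Pass 2: redo committed work; undo anything from aborted transactions. */
    for (int i = log_tail; i != log_head; i = (i + 1) % LOG_SLOTS) {
        struct log_record *r = &jlog[i];
        if (r->type != REC_UPDATE)
            continue;
        if (committed[r->txn_id % MAX_TXNS])
            apply_to_file_system(r->disk_addr, r->new_data);  /* committed but unapplied: redo  */
        else
            undo_in_file_system(r->disk_addr);                /* aborted: roll back if applied  */
    }
    log_tail = log_head;                      /* structures are consistent again */
}
```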