TL;DR: Lustre Unveiled (1): The Lustre File System
Lustre Unveiled: Evolution, Design, Advancements, and Current Trends offers a comprehensive journey through Lustre, covering its history and evolution, detailed architecture and design elements, a comparison with other prominent storage technologies, a case study of Lustre on a real-world supercomputer, and the future development of Lustre.
In this post I share my digest of the paper’s section introducing the Lustre file system.
Introduction
The name “Lustre” is a fusion of “Linux” and “cluster”, signifying its development as a storage architecture tailored for cluster environments.
Lustre has been widely used in large-scale HPC environments, powering 6 of the top 10 and 60 of the top 100 supercomputers in 2023.
The Lustre File System
Architecture Overview
Lustre’s architecture is built on a distributed, object-based storage framework that is managed by servers and accessible to client computers through an efficient network transport.
Servers provide two separate functionalities:
- Metadata Servers (MDS) - managing the filesystem’s namespace, access control, and the initial allocation of data objects;
- Object Storage Servers (OSS) - allocating storage space for files and storing the actual data content;
In Lustre, a file consists of one metadata object and one or more data objects, stored on two kinds of storage targets:
- Metadata Targets (MDT) - storing metadata objects;
- Object Storage Targets (OST) - storing data objects;
Additionally, there is a Management Server (MGS) dedicated to keeping track of servers, clients, storage targets, and filesystem configuration parameters.
Lustre clients connect to the filesystem over the network using the Lustre Networking (LNet) protocol.
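To make this model concrete, here is a minimal, purely illustrative Python sketch of a file as one metadata object (on an MDT) referencing one or more data objects (on OSTs). The class and field names are my own and do not correspond to Lustre's internal data structures.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DataObject:
    """A data object holding (part of) a file's content on one OST."""
    ost_index: int   # which OST stores this object
    object_id: int   # identifier of the object on that OST

@dataclass
class MetadataObject:
    """A metadata object stored on an MDT: attributes plus the file layout."""
    name: str
    uid: int
    gid: int
    mtime: float
    layout: List[DataObject] = field(default_factory=list)  # references to data objects

# A regular file = one metadata object + one or more data objects
f = MetadataObject(name="results.dat", uid=1000, gid=1000, mtime=0.0,
                   layout=[DataObject(ost_index=0, object_id=101),
                           DataObject(ost_index=1, object_id=102)])
print(len(f.layout), "data objects back this file")
```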
Key Components
Management Server (MGS): an MGS is a host machine that handles configuration data and filesystem registries covering all active servers and clients within the filesystem.
- stores persistent configuration information for one or more Lustre filesystems on a block device (Management Target, MGT) and provides this info to clients and servers in the cluster
- MDS and OSS nodes communicate with the MGS to provide information when bootstrapping the filesystem services
- clients contact the MGS to gather configuration information when mounting filesystems, as sketched below
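As an aside on how a client names the MGS when mounting: a Lustre client mount specification combines the MGS node's LNet identifier with the filesystem name (for example `10.0.0.1@tcp0:/lustrefs`). The toy Python sketch below merely splits such a string; the concrete address and filesystem name are made-up examples, not values from the paper.

```python
def parse_mount_spec(spec: str):
    """Split a Lustre client mount spec like '10.0.0.1@tcp0:/lustrefs'
    into the MGS network identifier and the filesystem name."""
    mgs_nid, _, fsname = spec.partition(":/")
    return mgs_nid, fsname

print(parse_mount_spec("10.0.0.1@tcp0:/lustrefs"))  # ('10.0.0.1@tcp0', 'lustrefs')
```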
Metadata Server (MDS): an MDS is a host machine that manages the namespace and provides metadata services to clients for the filesystem
- this namespace encompasses all filesystem metadata, including the hierarchy of directory and file names, and the metadata objects that store attributes such as user and group ownership and creation/modification/access times
- the MDS oversees various file metadata operations, such as lookup, open/close, create, rename, and unlink
- additional MDS nodes are beneficial for handling heavy metadata workloads
Metadata Target (MDT): an MDT is a logical storage target utilized by an MDS to store metadata information
- Lustre uses a unique MDT inode for each regular file, directory, symbolic link and special file
- the inode also holds the file layout, referencing the one or more OSTs that store the file data
- Lustre always has one MDT for the root directory, known as the primary MDT
- multiple MDTs may be used to increase capacity and performance by storing disjoint portions of the namespace, as sketched below
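As a rough illustration of how disjoint portions of the namespace can be spread over several MDTs, the toy sketch below hashes directory entry names onto MDT indices. This is only a conceptual picture of distributed directories; it is not Lustre's actual hash function or placement policy.

```python
import zlib

def mdt_for_entry(name: str, mdt_count: int) -> int:
    """Pick the MDT index that would hold the metadata for 'name'
    (toy hash-based placement, not Lustre's real policy)."""
    return zlib.crc32(name.encode()) % mdt_count

for entry in ["alice", "bob", "checkpoint_0001"]:
    print(entry, "->", "MDT%04d" % mdt_for_entry(entry, mdt_count=4))
```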
Object Storage Server (OSS): an OSS is a host machine that manages file data objects and moderates client access to these data objects by exposing the object attributes and data content to Lustre clients
- multiple OSS nodes are employed to collectively provide higher network bandwidth and attach additional storage
Object Storage Target (OST): an OST is a logical storage target utilized by an OSS to store file contents
- an OSS typically manages multiple OSTs to increase storage capacity, data parallelism, and fault tolerance
- both storage capacity and performance can be scaled by incorporating additional OSS nodes and/or OSTs (see the striping sketch below)
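File data is spread over the OSTs named in the file layout in fixed-size stripes, which is where the data parallelism comes from. The sketch below maps a file byte offset to a stripe object and an offset inside it, RAID-0 style; the stripe size and count are made-up example values, not Lustre defaults.

```python
def locate(offset: int, stripe_size: int, stripe_count: int):
    """Map a file byte offset to (index of the OST object in the layout,
    byte offset inside that object), RAID-0 style."""
    stripe_number = offset // stripe_size            # which stripe of the file
    stripe_index = stripe_number % stripe_count      # which object in the layout
    obj_offset = (stripe_number // stripe_count) * stripe_size + offset % stripe_size
    return stripe_index, obj_offset

# Example: 1 MiB stripes across 4 OST objects
print(locate(offset=5 * 1024 * 1024 + 10, stripe_size=1 << 20, stripe_count=4))
# -> (1, 1048586): the second object in the layout, just past its first MiB
```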
Object Storage Device (OSD): an OSD is a lower-level software abstraction within the Lustre storage stack that manages access to a storage device
- OSTs and MDTs (and optionally MGTs) manage their local block storage devices using a local disk filesystem, referred to as the backend filesystem; the OSD API provides an abstraction layer for accessing these backend filesystems uniformly (see the sketch below)
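The OSD layer can be pictured as a narrow interface that MDTs and OSTs program against, with one implementation per backend filesystem (ldiskfs or ZFS in practice). The Python interface below is a hypothetical analogy of that idea, not the real in-kernel OSD API.

```python
from abc import ABC, abstractmethod

class OSD(ABC):
    """Hypothetical stand-in for the OSD abstraction: targets call these
    methods without caring which backend filesystem sits underneath."""
    @abstractmethod
    def write(self, object_id: int, offset: int, data: bytes) -> None: ...
    @abstractmethod
    def read(self, object_id: int, offset: int, length: int) -> bytes: ...

class InMemoryOSD(OSD):
    """Toy backend used only to show how a target would drive the interface."""
    def __init__(self):
        self.objects = {}
    def write(self, object_id, offset, data):
        buf = bytearray(self.objects.get(object_id, b""))
        buf[len(buf):offset] = b"\0" * (offset - len(buf))  # zero-fill any hole
        buf[offset:offset + len(data)] = data
        self.objects[object_id] = bytes(buf)
    def read(self, object_id, offset, length):
        return self.objects.get(object_id, b"")[offset:offset + length]

osd = InMemoryOSD()
osd.write(object_id=101, offset=0, data=b"hello lustre")
print(osd.read(object_id=101, offset=0, length=5))  # b'hello'
```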
Lustre Client: a Lustre client serves as the bridge between user applications and the metadata and data stored on the servers
- the client provides a POSIX-compliant namespace and I/O interfaces, along with non-POSIX extensions
- Lustre clients communicate directly with each MDS and OSS, and interactions with different servers proceed concurrently (see the client I/O example below)
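Since the client exposes a POSIX-compliant namespace, applications simply use ordinary file system calls against the mount point, and the client handles the MDS/OSS communication underneath. The mount point below is a hypothetical example path that assumes a Lustre filesystem is mounted there; the non-POSIX extensions (e.g., per-file layout control) are not shown.

```python
import os

# Ordinary POSIX I/O against a (hypothetical) Lustre mount point.
path = "/mnt/lustre/project/output.dat"

fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)        # open/create is a metadata operation (MDS)
os.write(fd, b"data written through the Lustre client\n")  # data content goes to OSTs
os.close(fd)

with open(path, "rb") as f:
    print(f.read())
```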
Lustre Networking (LNet): Lustre clients connect to the filesystem over the network using the LNet protocol, which facilitates communication between clients and servers and abstracts details of the underlying network protocols and interfaces
- LNet facilitates efficient high performance data transfer via Remote Direct Memory Access (RDMA) over low-latency networks
- LNet is adaptable, accommodating InfiniBand, Ethernet, and proprietary high-performance networks; NID naming is sketched below
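LNet endpoints are identified by network identifiers (NIDs) written as `address@network`, for example `10.2.0.15@o2ib0` on InfiniBand or `192.168.1.5@tcp0` on Ethernet. The toy parser below only illustrates that naming scheme; the concrete addresses are made up.

```python
def parse_nid(nid: str):
    """Split an LNet NID such as '10.2.0.15@o2ib0' into its
    address and network parts (toy parser, no validation)."""
    address, _, network = nid.partition("@")
    return {"address": address, "network": network}

for nid in ("10.2.0.15@o2ib0", "192.168.1.5@tcp0"):
    print(parse_nid(nid))
```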
Reference
Anjus George, Andreas Dilger, Michael J. Brim, Richard Mohr, Amir Shehata, Jong Youl Choi, Ahmad Maroof Karimi, Jesse Hanley, James Simmons, Dominic Manno, Veronica Melesse Vergara, Sarp Oral, and Christopher Zimmer. 2025. Lustre Unveiled: Evolution, Design, Advancements, and Current Trends. ACM Trans. Storage 21, 3, Article 21 (June 2025), 109 pages. https://doi.org/10.1145/3736583