The Fire-Flyer File System (3FS) is a high-performance distributed file system designed to address the challenges of AI training and inference workloads. It leverages modern SSDs and RDMA networks to provide a shared storage layer that simplifies development of distributed applications. Key features and benefits of 3FS include:
Performance and Usability
Disaggregated Architecture Combines the throughput of thousands of SSDs and the network bandwidth of hundreds of storage nodes, enabling applications to access storage resource in a locality-oblivious manner.
Strong Consistency Implements Chain Replication with Apportioned Queries (CRAQ) for strong consistency, making application code simple and easy to reason about.
File Interfaces Develops stateless metadata services backed by a transactional key-value store (e.g., FoundationDB). The file interface is well known and used everywhere. There is no need to learn a new storage API.
Diverse Workloads
Data Preparation Organizes outputs of data analytics pipelines into hierarchical directory structures and manages a large volume of intermediate outputs efficiently.
Dataloaders Eliminates the need for prefetching or shuffling datasets by enabling random access to training samples across compute nodes.
Checkpointing Supports high-throughput parallel checkpointing for large-scale training.
KVCache for Inference Provides a cost-effective alternative to DRAM-based caching, offering high throughput and significantly larger capacity.
Performance
1. Peak throughput
The following figure demonstrates the throughput of read stress test on a large 3FS cluster. This cluster consists of 180 storage nodes, each equipped with 2×200Gbps InfiniBand NICs and sixteen 14TiB NVMe SSDs. Approximately 500+ client nodes were used for the read stress test, with each client node configured with 1x200Gbps InfiniBand NIC. The final aggregate read throughput reached approximately 6.6 TiB/s with background traffic from training jobs.
https://github.com/deepseek-ai/3FS
Generative AI, Robot Operating System (ROS 2), Computer Vision, Natural Language Processing service, Generative AI Chatbot, Machine Learning, Mobile App, Web App? Yes, I do provide!
Call me: +84854147015
WhatsApp: +601151992689
https://amatasiam.web.app
Email: ThomasTrungVo@Gmail.Com
Facebook: https://www.facebook.com/voduytrung
X: https://x.com/ThomasTrung
No comments:
Post a Comment