SharePointTaskMaster: DeepSeek EPLB - Expert Parallelism Load Balancer

When using expert parallelism (EP), different experts are assigned to different GPUs. Because the load of different experts may vary depending on the current workload, it is important to keep the load of different GPUs balanced. As described in the DeepSeek-V3 paper, we adopt a redundant experts strategy that duplicates heavy-loaded experts. Then, we heuristically pack the duplicated experts to GPUs to ensure load balancing across different GPUs. Moreover, thanks to the group-limited expert routing used in DeepSeek-V3, we also attempt to place the experts of the same group to the same node to reduce inter-node data traffic, whenever possible.

To facilitate reproduction and deployment, we open-source our deployed EP load balancing algorithm in eplb.py. The algorithm computes a balanced expert replication and placement plan based on the estimated expert loads. Note that the exact method to predict the loads of experts is out of this repo's scope. A common method is to use moving average of historical statistics.

Code of EPLB.py

The Algorithm

The load balancing algorithm comes with two policies used for different cases.

Hierarchical Load Balancing

When the number of server nodes divides the number of expert groups, we use the hierarchical load balancing policy to harness the group-limited expert routing. We first pack the expert groups to nodes evenly, ensuring the loads of different nodes are balanced. Then, we replicate the experts within each node. Finally, we pack the replicated experts to individual GPUs to ensure different GPUs are load-balanced. The hierarchical load balancing policy can be used in prefilling stage with a smaller expert-parallel size.

Global Load Balancing

In other cases, we use the global load balancing policy that replicates the experts globally regardless of expert groups, and pack the replicated experts to individual GPUs. This policy can be adopted in decoding stage with a larger expert-parallel size.

https://github.com/deepseek-ai/EPLB/

Generative AI, Robot Operating System (ROS 2), Computer Vision, Natural Language Processing service, Generative AI Chatbot, Machine Learning, Mobile App, Web App? Yes, I do provide!

Call me: +84854147015

WhatsApp: +601151992689

https://amatasiam.web.app

Email: ThomasTrungVo@Gmail.Com

Facebook: https://www.facebook.com/voduytrung

X: https://x.com/ThomasTrung

SharePointTaskMaster

Thursday, March 6, 2025

DeepSeek EPLB - Expert Parallelism Load Balancer

No comments:

Post a Comment