Thursday, March 6, 2025

DeepSeekMoE 16B is a Mixture-of-Experts (MoE) language model with 16.4B parameters

DeepSeekMoE 16B is a Mixture-of-Experts (MoE) language model with 16.4B parameters. It employs an innovative MoE architecture built on two principal strategies: fine-grained expert segmentation and shared expert isolation. It is trained from scratch on 2T English and Chinese tokens and achieves performance comparable to DeepSeek 7B and LLaMA2 7B while using only about 40% of the computation. For research purposes, the model checkpoints of DeepSeekMoE 16B Base and DeepSeekMoE 16B Chat are released to the public; they can be deployed on a single GPU with 40GB of memory without quantization.
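To make the two strategies concrete, here is a minimal, illustrative PyTorch sketch of an MoE feed-forward layer: a few shared experts are applied to every token (shared expert isolation), while a router selects a small number of many narrow routed experts per token (fine-grained expert segmentation). All dimensions, expert counts, and the top-k value are invented for the example and do not reflect DeepSeekMoE 16B's actual configuration or the repository's implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Expert(nn.Module):
    # One fine-grained expert: a small feed-forward block.
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.up = nn.Linear(d_model, d_hidden, bias=False)
        self.down = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.up(x)))

class MoELayer(nn.Module):
    # Illustrative MoE layer: shared experts always run, routed experts
    # are chosen per token by a learned router (hypothetical sizes).
    def __init__(self, d_model=512, d_hidden=256,
                 n_routed_experts=64, n_shared_experts=2, top_k=6):
        super().__init__()
        self.routed = nn.ModuleList(
            Expert(d_model, d_hidden) for _ in range(n_routed_experts))
        self.shared = nn.ModuleList(
            Expert(d_model, d_hidden) for _ in range(n_shared_experts))
        self.router = nn.Linear(d_model, n_routed_experts, bias=False)
        self.top_k = top_k

    def forward(self, x):  # x: (n_tokens, d_model)
        # Shared expert isolation: these experts process every token,
        # capturing common knowledge so routed experts can specialize.
        out = sum(e(x) for e in self.shared)
        # Fine-grained segmentation: pick top-k of many small experts per token.
        scores = F.softmax(self.router(x), dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)  # (n_tokens, top_k)
        for k in range(self.top_k):
            for e_id in idx[:, k].unique():
                mask = idx[:, k] == e_id
                out[mask] += weights[mask, k, None] * self.routed[int(e_id)](x[mask])
        return out

tokens = torch.randn(4, 512)
layer = MoELayer()
print(layer(tokens).shape)  # torch.Size([4, 512])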


The model code and weights are available on GitHub:



https://github.com/deepseek-ai/DeepSeek-MoE
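The repository documents inference through Hugging Face Transformers. Below is a minimal sketch of loading the chat checkpoint on a single GPU in bfloat16 (roughly 33 GB of weights for 16.4B parameters, which fits within a 40GB card). The model id "deepseek-ai/deepseek-moe-16b-chat" and the use of trust_remote_code are assumptions taken from the linked repository's README; adjust them to your local setup.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/deepseek-moe-16b-chat"  # assumed Hub id from the repo README
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # bf16 weights fit on a single 40GB GPU
    device_map="auto",
    trust_remote_code=True,       # the MoE layers ship as custom model code
)

messages = [{"role": "user", "content": "What is a Mixture-of-Experts model?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))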



Generative AI, Robot Operating System (ROS 2), Computer Vision, Natural Language Processing services, Generative AI chatbots, Machine Learning, mobile apps, web apps? Yes, I provide all of these!


Call me: +84854147015

WhatsApp: +601151992689

https://amatasiam.web.app

Email: ThomasTrungVo@Gmail.Com

Facebook: https://www.facebook.com/voduytrung

X: https://x.com/ThomasTrung

