DeepSeekMoE 16B is a Mixture-of-Experts (MoE) language model with 16.4B parameters. It employs an innovative MoE architecture built on two principal strategies: fine-grained expert segmentation and shared expert isolation. Trained from scratch on 2T English and Chinese tokens, it achieves performance comparable to DeepSeek 7B and LLaMA2 7B while using only about 40% of the computation. For research purposes, the model checkpoints of DeepSeekMoE 16B Base and DeepSeekMoE 16B Chat are released to the public, and they can be deployed on a single GPU with 40GB of memory without the need for quantization.
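To make the deployment claim concrete, here is a minimal loading sketch with Hugging Face Transformers. It assumes the chat checkpoint is published on the Hub under an ID like "deepseek-ai/deepseek-moe-16b-chat" and that the repository ships custom modeling code (hence trust_remote_code=True); adjust the model ID to whatever the release actually uses.

```python
# Hedged sketch: load the 16.4B-parameter checkpoint in bfloat16 (~2 bytes per
# parameter, roughly 33 GB), which fits on a single 40GB GPU without quantization.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/deepseek-moe-16b-chat"  # assumed Hub ID for the chat release
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half-precision weights keep memory under 40 GB
    device_map="auto",           # place the model on the single available GPU
    trust_remote_code=True,      # the repo provides its own MoE modeling code
)

prompt = "Explain Mixture-of-Experts in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```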
The model code file is distributed together with the released checkpoints.
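To illustrate the two architectural ideas named above, the sketch below implements a toy MoE layer in PyTorch with many narrow routed experts (fine-grained segmentation) plus a few always-active shared experts (shared expert isolation). The expert counts, sizes, and top-k value are illustrative placeholders, not the released model's configuration, and this is not the official modeling code.

```python
# Minimal MoE layer sketch: fine-grained routed experts + isolated shared experts.
# All hyperparameters below are illustrative, not DeepSeekMoE 16B's actual values.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Expert(nn.Module):
    """A small FFN; fine-grained segmentation means many such narrow experts."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.up = nn.Linear(d_model, d_hidden, bias=False)
        self.down = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.up(x)))


class MoELayer(nn.Module):
    def __init__(self, d_model=1024, d_expert=256, n_routed=16, n_shared=2, top_k=4):
        super().__init__()
        self.top_k = top_k
        # Routed experts: each token activates only top_k of them via a learned gate.
        self.routed = nn.ModuleList(Expert(d_model, d_expert) for _ in range(n_routed))
        # Shared experts: applied to every token, capturing common knowledge.
        self.shared = nn.ModuleList(Expert(d_model, d_expert) for _ in range(n_shared))
        self.gate = nn.Linear(d_model, n_routed, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        scores = self.gate(x).softmax(dim=-1)            # (tokens, n_routed)
        weights, idx = scores.topk(self.top_k, dim=-1)   # per-token expert choices
        weights = weights / weights.sum(dim=-1, keepdim=True)

        out = sum(e(x) for e in self.shared)             # shared experts: every token
        for k in range(self.top_k):                      # routed experts: sparse
            for e_id in idx[:, k].unique():
                mask = idx[:, k] == e_id
                out[mask] += weights[mask, k:k + 1] * self.routed[e_id](x[mask])
        return out


# Usage: MoELayer()(torch.randn(8, 1024)) returns an (8, 1024) tensor.
```

Because each token only runs the shared experts plus its top-k routed experts, the per-token compute stays a fraction of the total parameter count, which is how a 16.4B-parameter model can match dense 7B models at roughly 40% of the computation.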