DeepSeek Coder is a series of code language models, each trained from scratch on 2T tokens with a composition of 87% code and 13% natural language in both English and Chinese. deepseek-coder-1.3b-base is a 1.3B-parameter model with Multi-Head Attention trained on 1 trillion tokens.
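For orientation, here is a minimal code-completion sketch using the Hugging Face `transformers` library. The checkpoint name `deepseek-ai/deepseek-coder-1.3b-base` matches the public Hugging Face release; the prompt and generation settings are illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base (non-instruct) checkpoint; it is a plain causal LM,
# so you prompt it with code to complete rather than chat messages.
model_id = "deepseek-ai/deepseek-coder-1.3b-base"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "# write a quick sort algorithm in Python\ndef quick_sort(arr):"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding is sufficient for a short completion demo.
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```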
| Capability | Description |
| --- | --- |
| Fine-tuning (Docs) | DeepSeek Coder 1.3B Base can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model; see the sketch after this table. |
| On-demand Deployment (Docs) | On-demand deployments let you run DeepSeek Coder 1.3B Base on dedicated GPUs with Fireworks' high-performance serving stack, with high reliability and no rate limits; see the query example further below. |
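Since the fine-tuning row mentions LoRA, the sketch below shows the technique generically on the open checkpoint using the `peft` library. This is not Fireworks' managed fine-tuning pipeline (whose interface is covered in the linked docs); the rank, alpha, and target modules are illustrative assumptions for this Llama-style architecture.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

# Generic LoRA setup on the open-weights checkpoint (not the
# Fireworks-managed pipeline). Hyperparameters are illustrative.
model_id = "deepseek-ai/deepseek-coder-1.3b-base"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

lora_config = LoraConfig(
    r=16,                                 # low-rank adapter dimension
    lora_alpha=32,                        # scaling factor for adapter updates
    target_modules=["q_proj", "v_proj"],  # assumed attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only adapter weights receive gradients
```

From here the wrapped model can be trained with a standard `transformers` Trainer loop; because only the small adapter matrices train, LoRA keeps fine-tuning and deployment cheap, which is the property the table refers to.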
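For querying a deployment, Fireworks exposes an OpenAI-compatible HTTP API, so a plain completions call is a reasonable sketch. The API key placeholder and the model identifier below are assumptions to adapt to your own account and deployment name.

```python
from openai import OpenAI

# Fireworks serves an OpenAI-compatible API; point the client at it.
# Replace the key and model identifier with your own account's values.
client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="YOUR_FIREWORKS_API_KEY",
)

# Base models take raw text completions rather than chat messages.
response = client.completions.create(
    model="accounts/fireworks/models/deepseek-coder-1-3b-base",
    prompt="# write a function that checks if a number is prime\n",
    max_tokens=128,
    temperature=0.2,
)
print(response.choices[0].text)
```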