DeepSeek Coder is a series of code language models, each trained from scratch on 2T tokens with a composition of 87% code and 13% natural language in both English and Chinese. DeepSeek Coder 6.7B Base is a 6.7B-parameter model with Multi-Head Attention, trained on 2 trillion tokens using a 16K window and an additional fill-in-the-blank task.
| Feature | Description |
| --- | --- |
| Fine-tuning (Docs) | DeepSeek Coder 7B Base can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model. |
| On-demand deployment (Docs) | On-demand deployments give you dedicated GPUs for DeepSeek Coder 7B Base using Fireworks' reliable, high-performance system with no rate limits. |
DeepSeek Coder 7B Base is a base language model developed by DeepSeek AI as part of its DeepSeek Coder family. The model is trained from scratch on 2 trillion tokens, with a composition of 87% code and 13% natural language in English and Chinese. It uses a fill-in-the-blank auxiliary task during training.
This model is optimized for code-focused tasks such as code generation, code completion, and fill-in-the-middle infilling.
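Because the model was trained with a fill-in-the-blank objective, it can complete code in the middle of a file, not just at the end. The sketch below shows fill-in-the-middle prompting with the open `deepseek-ai/deepseek-coder-6.7b-base` checkpoint from Hugging Face; the sentinel tokens follow the DeepSeek Coder repository's example and should be verified against the tokenizer before use.

```python
# Minimal fill-in-the-middle (FIM) sketch with the open Hugging Face checkpoint.
# The FIM sentinel tokens are taken from the DeepSeek Coder repository's example.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base")
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-coder-6.7b-base", device_map="auto"
)

# The model generates the code that belongs where the "hole" token sits.
prompt = (
    "<｜fim▁begin｜>def quick_sort(arr):\n"
    "    if len(arr) <= 1:\n"
    "        return arr\n"
    "    pivot = arr[0]\n"
    "    left, right = [], []\n"
    "<｜fim▁hole｜>\n"
    "        if arr[i] < pivot:\n"
    "            left.append(arr[i])\n"
    "        else:\n"
    "            right.append(arr[i])\n"
    "    return quick_sort(left) + [pivot] + quick_sort(right)<｜fim▁end｜>"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
# Decode only the newly generated tokens (the infilled middle section).
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```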
The model supports a context length of 4,096 tokens.
The full 4,096-token context window is available on Fireworks' on-demand deployments, which provide dedicated GPU access without rate limits.
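As a hedged sketch of calling the model through Fireworks' OpenAI-compatible completions endpoint while keeping the prompt plus generated tokens inside the 4,096-token window; the model identifier string below is an assumption, so confirm the exact value on the model page.

```python
# Hedged sketch: text completion against Fireworks' OpenAI-compatible REST API.
import os
import requests

API_URL = "https://api.fireworks.ai/inference/v1/completions"
MODEL = "accounts/fireworks/models/deepseek-coder-7b-base"  # assumed identifier
CONTEXT_WINDOW = 4096

prompt = "# Write a Python function that reverses a singly linked list\n"
max_new_tokens = 256  # leave the rest of the 4,096-token window for the prompt

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {os.environ['FIREWORKS_API_KEY']}"},
    json={"model": MODEL, "prompt": prompt, "max_tokens": max_new_tokens},
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```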
The model has roughly 6.7 billion parameters, rounded up to 7B in the model name.
Fireworks supports LoRA-based fine-tuning of this model on dedicated GPUs.
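For intuition on what LoRA-style fine-tuning does, here is a minimal, illustrative sketch using the open `deepseek-ai/deepseek-coder-6.7b-base` checkpoint with the Hugging Face `peft` library. This is not Fireworks' managed fine-tuning workflow (see the fine-tuning docs for that), and the hyperparameters are arbitrary example values.

```python
# Illustrative only: attach LoRA adapters to the open checkpoint with peft.
# Fireworks' managed fine-tuning handles this server-side.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base")

lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling factor applied to the update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the small adapter weights are trainable
```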
Token metering is based on combined input and output tokens.
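As a small sketch of what that means in practice: the completions response includes a `usage` block, and the billed amount is the sum of its input and output counts. The field names below follow the standard OpenAI-style schema and are assumed to match what Fireworks returns; confirm against the API reference.

```python
# Sketch: summing metered tokens from an OpenAI-compatible completions response.
def billed_tokens(response_json: dict) -> int:
    usage = response_json["usage"]
    return usage["prompt_tokens"] + usage["completion_tokens"]

example = {"usage": {"prompt_tokens": 120, "completion_tokens": 256, "total_tokens": 376}}
print(billed_tokens(example))  # 376: input and output tokens are billed together
```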