

StarCoder2 3B

StarCoder2-3B is a 3B-parameter model trained on 17 programming languages from The Stack v2, excluding data whose owners submitted opt-out requests. The model uses Grouped Query Attention (GQA) and a 16,384-token context window with sliding-window attention over 4,096 tokens. It was trained with the Fill-in-the-Middle (FIM) objective on more than 3 trillion tokens.
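Sliding-window attention can be pictured as a causal attention mask in which each token sees only a limited band of recent positions. The Python sketch below is an illustration of the general technique, not StarCoder2's actual implementation. It assumes each query attends to itself plus the preceding window - 1 positions, and the function names are hypothetical.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    # mask[i][j] is True when query position i may attend to key position j:
    # causal (j <= i) and within the last `window` positions (i - j < window).
    # Assumption: the window counts the current token itself.
    return [[0 <= i - j < window for j in range(seq_len)] for i in range(seq_len)]


def visible_keys(position: int, window: int) -> int:
    # Number of key positions a query at `position` (0-indexed) can attend to
    # under the mask above.
    return min(position + 1, window)


if __name__ == "__main__":
    # Toy size for readability; StarCoder2-3B uses a 4,096-token window
    # inside a 16,384-token context.
    for row in sliding_window_mask(seq_len=6, window=3):
        print("".join("x" if allowed else "." for allowed in row))
    # The last position of a full 16,384-token context still sees only 4,096 keys.
    print(visible_keys(16_383, 4_096))
```

With the toy sizes, the printed mask is a diagonal band three positions wide, so per-token attention cost stays bounded by the window even as the sequence grows toward the full context length.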


Fireworks Features

On-demand Deployment

On-demand deployments give you dedicated GPUs for StarCoder2 3B using Fireworks' reliable, high-performance system with no rate limits.


Info

Provider: BigCode
Model Type: LLM
Context Length: 16,384 tokens
Pricing: $0.10 per 1M tokens
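At the listed rate, estimating the token cost of a request is simple arithmetic. The sketch below assumes the single listed price of $0.10 per 1M tokens applies equally to prompt and completion tokens; the constant and function names are hypothetical.

```python
# Hypothetical constant: USD per 1M tokens, taken from the listing above.
PRICE_PER_MILLION_TOKENS = 0.10


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    # Assumption: prompt and completion tokens are billed at the same rate.
    total_tokens = prompt_tokens + completion_tokens
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS


if __name__ == "__main__":
    # A request that fills the full 16,384-token context.
    print(f"${estimate_cost(12_384, 4_000):.7f}")
```

For example, a request that uses the full 16,384-token context comes to about $0.0016 under this assumption.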