Get started in minutes
Start fast with Serverless
Use popular models instantly with pay-per-token pricing. Perfect for vibe testing model quality and prototyping.
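If you want to try it immediately, here is a minimal sketch of a serverless call through the OpenAI-compatible chat completions API. The base URL, API key environment variable, and model name below are placeholders, not values from this page; substitute the ones from your account.

```python
import os
from openai import OpenAI

# Placeholder base URL, key variable, and model name; replace with the values
# shown in your serverless console.
client = OpenAI(
    base_url="https://api.example.com/inference/v1",
    api_key=os.environ["PROVIDER_API_KEY"],
)

response = client.chat.completions.create(
    model="example-chat-model",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```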
Deploy models & autoscale on dedicated GPUs
Get high performance on dedicated GPUs with fast autoscaling and minimal cold starts. Optimize deployments for speed and throughput.
Fine-tune models for best quality
Boost quality with supervised and reinforcement fine-tuning for models up to 1T+ parameters. Start training in minutes and deploy immediately.
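As a rough illustration, supervised fine-tuning data is commonly prepared as JSONL chat examples like the sketch below. The exact schema, upload step, and job-creation call depend on the fine-tuning docs, so treat this as an assumed layout rather than the required format.

```python
import json

# One common supervised fine-tuning layout: one JSON object per line, each
# holding a list of chat messages. Field names vary by platform.
examples = [
    {
        "messages": [
            {"role": "user", "content": "Summarize: The cat sat on the mat."},
            {"role": "assistant", "content": "A cat rested on a mat."},
        ]
    },
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```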
Not sure where to start? Choose Serverless to prototype quickly, then move to Deployments to optimize and run production workloads, or to Fine-tuning to improve quality. Need help optimizing deployments, fine-tuning models, or setting up production infrastructure? Talk to our team - we’ll help you get the best performance and reliability.
What you can build
100+ Supported Models
Text, vision, audio, image, and embeddings
Migrate from OpenAI
Drop-in replacement - just change the base URL
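In practice, migrating usually means changing only how the client is constructed; existing chat.completions calls keep working. This sketch uses the official openai Python SDK with a placeholder base URL and API key variable.

```python
import os
from openai import OpenAI

# Only the client construction changes; the base URL and key variable below
# are placeholders for your provider's values.
client = OpenAI(
    base_url="https://api.example.com/inference/v1",  # was the OpenAI default
    api_key=os.environ["PROVIDER_API_KEY"],           # was OPENAI_API_KEY
)
```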
Function Calling
Connect models to tools and APIs
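A sketch of function calling through the OpenAI-compatible chat API: the get_weather tool, model name, and base URL are illustrative placeholders, not names from this page.

```python
import json
import os
from openai import OpenAI

client = OpenAI(base_url="https://api.example.com/inference/v1",
                api_key=os.environ["PROVIDER_API_KEY"])

# Describe a tool the model may decide to call.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="example-chat-model",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# If the model chose the tool, its arguments arrive as a JSON string.
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    print(tool_calls[0].function.name, json.loads(tool_calls[0].function.arguments))
```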
Structured Outputs
Reliable JSON responses for agentic workflows
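A minimal sketch of requesting JSON output via response_format. Schema-constrained modes exist for some models but their exact options vary, so this shows only the simplest JSON mode; the model name and base URL are placeholders.

```python
import json
import os
from openai import OpenAI

client = OpenAI(base_url="https://api.example.com/inference/v1",
                api_key=os.environ["PROVIDER_API_KEY"])

# Ask for a JSON object and mention JSON in the prompt so the model knows the
# expected shape of the reply.
response = client.chat.completions.create(
    model="example-chat-model",
    messages=[{
        "role": "user",
        "content": "Extract the city and date from: 'Meet me in Berlin on May 3'. "
                   "Reply as JSON with keys city and date.",
    }],
    response_format={"type": "json_object"},
)
print(json.loads(response.choices[0].message.content))
```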
Vision Models
Analyze images and documents
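A sketch of sending an image to a vision-capable chat model; the model name, base URL, and image URL are placeholders.

```python
import os
from openai import OpenAI

client = OpenAI(base_url="https://api.example.com/inference/v1",
                api_key=os.environ["PROVIDER_API_KEY"])

# Vision-capable chat models accept image parts alongside text in one message.
response = client.chat.completions.create(
    model="example-vision-model",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```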
Speech to Text
Real-time or batch audio transcription
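A sketch of batch transcription through an OpenAI-compatible audio endpoint; the model name, base URL, and file name are placeholders, and real-time streaming transcription uses a different interface not shown here.

```python
import os
from openai import OpenAI

client = OpenAI(base_url="https://api.example.com/inference/v1",
                api_key=os.environ["PROVIDER_API_KEY"])

# Transcribe a local audio file in one call; Whisper-style models are common,
# but the exact model name depends on the provider.
with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="example-speech-to-text-model",
        file=audio_file,
    )
print(transcript.text)
```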
Embeddings & Reranking
Power search and context retrieval with embeddings and reranking
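A sketch of generating embeddings for retrieval; the model name and base URL are placeholders. Reranking typically uses a separate, provider-specific endpoint that this sketch does not cover.

```python
import os
from openai import OpenAI

client = OpenAI(base_url="https://api.example.com/inference/v1",
                api_key=os.environ["PROVIDER_API_KEY"])

# Embed a few documents; each result item carries one vector per input string.
docs = ["How do I reset my password?", "Shipping takes 3-5 business days."]
result = client.embeddings.create(model="example-embedding-model", input=docs)
vectors = [item.embedding for item in result.data]
print(len(vectors), len(vectors[0]))
```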
Batch Inference
Run async inference jobs at scale, faster and cheaper
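One possible shape for an async batch job, modeled on the OpenAI Batch API. Whether your provider exposes this exact flow is an assumption, so check the batch inference docs; file names, the model name, and the base URL are placeholders.

```python
import os
from openai import OpenAI

client = OpenAI(base_url="https://api.example.com/inference/v1",
                api_key=os.environ["PROVIDER_API_KEY"])

# requests.jsonl holds one request object per line, e.g.:
# {"custom_id": "1", "method": "POST", "url": "/v1/chat/completions",
#  "body": {"model": "example-chat-model",
#           "messages": [{"role": "user", "content": "Hi"}]}}
batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")
job = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(job.id, job.status)
```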