Llama 3 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to align with human preferences for helpfulness and safety.
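To illustrate what "auto-regressive" means here, the following is a minimal sketch of greedy token-by-token generation. It is not Llama 3's implementation: the score table, toy_next_token_scores, and generate_greedy are hypothetical names made up for this example, and the toy scorer looks only at the last token, whereas a real transformer conditions on the entire prefix.

```python
from typing import Callable, Dict, List

# Hypothetical toy score table standing in for a trained model.
# Keys are the most recent token; values are scores for the next token.
TOY_SCORES: Dict[str, Dict[str, float]] = {
    "<s>": {"the": 0.9, "a": 0.1},
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "</s>": 0.3},
    "sat": {"</s>": 1.0},
}


def toy_next_token_scores(prefix: List[str]) -> Dict[str, float]:
    """Return next-token scores given the tokens generated so far.

    Assumption for illustration: only the last token is used. A real
    model would attend over the whole prefix.
    """
    return TOY_SCORES.get(prefix[-1], {"</s>": 1.0})


def generate_greedy(
    next_scores: Callable[[List[str]], Dict[str, float]],
    prompt: List[str],
    max_new_tokens: int = 10,
    eos: str = "</s>",
) -> List[str]:
    """Auto-regressive loop: each new token is chosen from scores that
    depend on all previously generated tokens, then appended to the prefix."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = next_scores(tokens)
        next_token = max(scores, key=scores.get)  # greedy: highest score wins
        if next_token == eos:
            break
        tokens.append(next_token)
    return tokens


if __name__ == "__main__":
    print(generate_greedy(toy_next_token_scores, ["<s>"]))
```

The loop feeds its own output back in as input at every step, which is the defining property of auto-regressive generation. Production systems typically sample from a probability distribution rather than always taking the top-scoring token.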
On-demand deployments let you run Llama 3 8B on dedicated GPUs using Fireworks' high-performance serving stack, with high reliability and no rate limits.
See the On-demand deployments guide for details.