Llama 3.1 Nemotron 70B API & Playground

Llama-3.1-Nemotron-70B-Instruct is a large language model customized by NVIDIA to improve the helpfulness of LLM generated responses to user queries. This model was trained using RLHF on a Llama-3.1-70B-Instruct model. As of 1 Oct 2024, this model is #1 on all three automatic alignment benchmarks (verified tab for AlpacaEval 2 LC), edging out strong frontier models such as GPT-4o and Claude 3.5 Sonnet.

Llama 3.1 Nemotron 70B API Features

Fine-tuning Docs	Llama 3.1 Nemotron 70B can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model
On-demand Deployment Docs	On-demand deployments allow you to use Llama 3.1 Nemotron 70B on dedicated GPUs with Fireworks' high-performance serving stack with high reliability and no rate limits.

Metadata

State

Ready

Created on

10/16/2024

Kind

Base model

Provider

NVIDIA

Hugging Face

nvidia/Llama-3.1-Nemotron-70B-Instruct

Specification

Calibrated

Yes

Mixture-of-Experts

Parameters

70.5B

Supported Functionality

Fine-tuning

Supported

Serverless

Not supported

Context Length

131k tokens

Function Calling

Not supported

Embeddings

Not supported

Rerankers

Not supported

Support image input

Not supported

Llama 3.1 Nemotron 70B

Llama 3.1 Nemotron 70B API Features

Fine-tuning

On-demand Deployment

Metadata

Specification

Supported Functionality