Yi 34B

The Yi series is a family of open-source large language models trained from scratch by 01.AI. Based on data available up to November 2023, Yi-34B ranked first among open-source models (such as Falcon-180B and Llama 2-70B) in both English and Chinese on benchmarks including the Hugging Face Open LLM Leaderboard (pre-trained) and C-Eval.

Yi 34B API Features

On-demand Deployment

On-demand deployments give you dedicated GPUs for Yi 34B using Fireworks' reliable, high-performance system with no rate limits.

Yi 34B FAQs

What is Yi 34B and who developed it?

Yi 34B is a 34.4 billion parameter base language model developed by 01.AI. It is part of the Yi series, trained from scratch to support both English and Chinese. As of late 2023, it ranked first among open-source models (including Falcon-180B and Llama 2-70B) on benchmarks such as the Hugging Face Open LLM Leaderboard and C-Eval.

What applications and use cases does Yi 34B excel at?

The model is suitable for:

  • Conversational AI
  • Code assistance
  • Agentic systems
  • Enterprise RAG
  • Search and multimedia reasoning

What is the maximum context length for Yi 34B?

Yi 34B supports a context length of 4,096 tokens.

What is the usable context window for Yi 34B?

The full 4,096-token window is available when running the model on Fireworks' on-demand infrastructure.

What is the maximum output length Fireworks allows for Yi 34B?

Outputs are constrained by the 4,096-token context length, which the prompt and the completion share.
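
Because the prompt and the completion share one window, the room left for output shrinks as the prompt grows. The following is a minimal sketch of that arithmetic, assuming the prompt's token count is already known (for example, from a tokenizer). The helper name `max_completion_tokens` is hypothetical and is not part of any Fireworks API.

```python
CONTEXT_LENGTH = 4096  # Yi 34B context window in tokens (prompt + completion)


def max_completion_tokens(prompt_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many tokens remain for the completion (hypothetical helper)."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    # A prompt that already fills or exceeds the window leaves no room for output.
    return max(context_length - prompt_tokens, 0)


print(max_completion_tokens(3000))  # 1096
print(max_completion_tokens(5000))  # 0
```

A value like this could be used as an upper bound when choosing a maximum output length for a request.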

What are the known limitations of Yi 34B?

  • No function calling, image input, or embeddings support
  • Not safety-aligned: may generate unsafe or unmoderated outputs
  • No RAG-specific tuning or tool use capabilities

How many parameters does Yi 34B have?

Yi 34B has 34.4 billion parameters.

Is fine-tuning supported for Yi 34B?

Standard fine-tuning is not supported, but LoRA (parameter-efficient fine-tuning) is supported via Fireworks' Serverless LoRA framework.

How are tokens counted (prompt vs completion)?

Token billing is based on total input + output token usage.
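
As a rough illustration of that sum, the sketch below totals input and output tokens across several requests. It assumes each response reports its usage as a mapping with `prompt_tokens` and `completion_tokens` fields (an OpenAI-style shape, assumed here rather than confirmed by this page). The helper name `billed_tokens` and the sample numbers are hypothetical.

```python
from typing import Iterable, Mapping


def billed_tokens(usages: Iterable[Mapping[str, int]]) -> int:
    """Sum prompt and completion tokens across a batch of requests."""
    return sum(u["prompt_tokens"] + u["completion_tokens"] for u in usages)


# Made-up usage records for two requests (illustrative values only).
usage_log = [
    {"prompt_tokens": 1200, "completion_tokens": 300},
    {"prompt_tokens": 800, "completion_tokens": 150},
]
print(billed_tokens(usage_log))  # 2450
```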

What rate limits apply on the shared endpoint?

  • Serverless: Not supported
  • On-demand: Available with no rate limits on dedicated infrastructure

Metadata

State: Ready
Created on: 3/3/2024
Kind: Base model
Provider: 01.AI
Hugging Face: 01-ai/Yi-34B

Specification

Calibrated: No
Mixture-of-Experts: No
Parameters: 34.3B

Supported Functionality

Fine-tuning: Not supported
Serverless: Not supported
Context Length: 4,096 tokens
Function Calling: Not supported
Embeddings: Not supported
Rerankers: Not supported
Image input: Not supported