Code Generation with Large Language Models - Fireworks AI Take

By Aravind Putrevu | 5/8/2024

“Programming is the art of telling another human being what one wants the computer to do. We should continually strive to transform every art into a science: in the process, we advance the art.” - Donald Knuth, The Art of Computer Programming.

Programming is the art of expressing our needs through logic so that a computer can act on them. Over the years, many people have become skilled at programming machines, leading to innovations across arenas and form factors we never imagined.

Today, we are taking another leap forward: Large Language Models and Generative AI can now generate code from simple natural-language prompts.

In this blog, we explain how code generation with Large Language Models (LLMs) is enabling developers to build new kinds of experiences and improving developer productivity.

What is Code Generation?

With the rise of Large Language Models (LLMs), generating text and extracting information has become easy. Additionally, LLMs can generate usable code right out of the box.

Imagine you are asked to write a program in Python to print a star pyramid. You would start thinking about the logic and the specifics, such as the height of the pyramid. Today, you can simply prompt an LLM like Mixtral.
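For instance, the model might return a short program like the one below. This is a minimal sketch of what an LLM could plausibly generate for that prompt; the exact output varies by model and wording:

```python
def print_star_pyramid(height):
    """Print a centered star pyramid of the given height."""
    for row in range(1, height + 1):
        # Row r has 2*r - 1 stars, left-padded so the pyramid is centered.
        print((" " * (height - row)) + ("*" * (2 * row - 1)))

print_star_pyramid(4)
```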


Why use LLMs for code generation?

Code is sacrosanct to us programmers, so code generated from a mere prompt may not seem trustworthy to many. However, LLMs have shown promising results in code generation.

Some compelling reasons include:

  • Efficiency and speed
  • Reducing barriers for non-experts
  • Overall improvement in code quality
  • Porting or translating code from one programming language to another

Code generation can significantly reduce development time and effort, enabling developers to focus on more critical aspects of their project.

How do LLMs generate code?

Interestingly, code generation isn’t so different from the text-generation use case of an LLM. Both fall under the larger umbrella of language generation tasks performed by models like Llama 3 8B.

Nonetheless, the purpose, syntax, semantics, and evaluation metrics for code differ from those for prose and present unique challenges.

Like any other LLM, a code generation model is pre-trained on diverse data sources. This pre-training allows the model to learn the structure of various programming languages by recognizing patterns, syntax, and semantics.

Some models, like Code Llama, are pre-trained and then fine-tuned on code-specific datasets. This further helps the model understand coding patterns, syntax, and best practices specific to programming.

Integrating LLMs into development workflow

Developers interface with code at different levels. Some use a specific IDE as their daily driver for coding; others use a command-line terminal for administration. We believe every developer workflow will be affected by LLMs in its own way.

We can broadly categorize these LLM-enabled experiences into two categories:

AI-assisted coding (copilots)

AI-assisted coding, often referred to as "Copilots," seamlessly integrates with the developer's existing tools, such as Integrated Development Environments (IDEs) or code editors. These copilots provide real-time suggestions, completions, and feedback as the developer writes code. Examples include Cursor and Sourcegraph's Cody.

For copilots, low latency and consistency are crucial. Even a slight delay or inconsistency in suggestions can disrupt the developer's flow and negatively impact their experience. To ensure a smooth integration, copilots must deliver suggestions within milliseconds and maintain this performance consistently, avoiding jitter or sudden spikes in response time.
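One way to make "consistency, not just speed" concrete is to track tail latency rather than the average, since a single spike is what the developer actually feels. A small sketch (the sample timings are illustrative, not Fireworks measurements):

```python
def p95_latency_ms(samples):
    """Return the 95th-percentile latency from per-request timings (ms)."""
    ordered = sorted(samples)
    # Nearest-rank percentile: the value below which ~95% of samples fall.
    index = max(0, round(0.95 * len(ordered)) - 1)
    return ordered[index]

# Nine fast responses and one spike: the mean looks fine, the p95 does not.
timings = [42, 45, 44, 43, 41, 47, 46, 44, 43, 120]
print(p95_latency_ms(timings))
```

Monitoring p95/p99 rather than the mean is what surfaces the jitter described above.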

Moreover, the quality and relevance of the suggestions are paramount. Copilots should leverage powerful models to understand the context and provide accurate, contextually relevant suggestions.

Standalone code generators

Standalone code generators are LLM-powered tools that generate code based on high-level prompts or requirements. These tools operate independently of the developer's primary coding environment and are often used for rapid prototyping, code scaffolding, or generating boilerplate code. For example, a developer platform generating code for an API in a specific language.

While low latency is still important for standalone code generators, it is less critical compared to copilots. Developers using these tools are more tolerant of slightly longer response times, as the interaction is not as tightly integrated into their coding flow.

However, the power and versatility of the underlying model are crucial for standalone code generators. These tools should be capable of handling complex prompts, understanding domain-specific requirements, and generating high-quality, functional code. The model's ability to capture nuances and generate code that adheres to best practices and coding conventions is essential.

Use Cases

There are many use cases in code generation, but a couple of them dominate more than others.

Code Completion and Writing Suggestions

One of the early use cases for LLMs in text generation is unblocking users in their task. Code completion and writing suggestions help developers choose from several candidate versions of code.
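A completion request can ask for several candidates so the developer can pick the best one. Below we only build the request payload; the model id is a hypothetical placeholder, and the `prompt`/`n`/`max_tokens` fields follow the common OpenAI-style completions convention rather than documenting a specific Fireworks endpoint:

```python
def build_completion_request(prefix, n_candidates=3, max_tokens=64):
    """Build an OpenAI-style completions payload asking for several candidates."""
    return {
        "model": "accounts/fireworks/models/example-code-model",  # hypothetical id
        "prompt": prefix,
        "n": n_candidates,        # number of alternative completions to return
        "max_tokens": max_tokens,
        "temperature": 0.2,       # low temperature keeps suggestions focused
    }

payload = build_completion_request("def fibonacci(n):\n    ")
print(payload["n"])
```

The returned candidates would then be surfaced in the editor for the developer to choose from.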


Debugging and resolving issues

LLMs are also changing how developers identify, diagnose, and resolve issues in their codebases. By virtue of training on vast amounts of code, LLMs learn to recognize patterns that often lead to bugs, offering preemptive alerts before the code is run or deployed.
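For example, a model trained on large code corpora will typically flag the classic Python mutable-default-argument bug below and suggest the fix. This is a hand-written illustration of the kind of pattern such tools catch, not model output:

```python
def append_buggy(item, items=[]):
    # Bug: the default list is created once and shared across calls,
    # so state leaks from one call to the next.
    items.append(item)
    return items

def append_fixed(item, items=None):
    # Fix: create a fresh list per call when none is supplied.
    if items is None:
        items = []
    items.append(item)
    return items

print(append_fixed(1), append_fixed(2))  # independent lists, as intended
```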


What are some popular open-source LLMs for Code Generation?

As the demand for code generation through Large Language Models (LLMs) continues to rise, developers are increasingly turning to open-source LLMs. Here are some popular open-source LLMs for code generation (some of them readily available on our serverless inference API).


OpenCodeInterpreter is a family of open-source code systems designed for generating, executing, and iteratively refining code. Supported by Code-Feedback, a dataset featuring 68K multi-turn interactions, OpenCodeInterpreter integrates execution and human feedback for dynamic code refinement.


DeepSeekCoder comprises a series of code language models (1.3B, 5.7B, 6.7B, and 33B) trained from scratch on 87% code and 13% natural language in English and Chinese, with each model pre-trained on 2T tokens. The models generate code for more than 80 programming languages.



StarCoder and StarCoderBase are large language models with 15.5 billion parameters, capable of handling 8,000 tokens of context and performing infilling tasks. They utilize multi-query attention for efficient large-batch inference. StarCoderBase was trained on 1 trillion tokens from The Stack, a curated collection of permissively licensed GitHub repositories, and then fine-tuned on 35 billion Python tokens to create StarCoder, a specialized model for Python programming.
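StarCoder's infilling works through special fill-in-the-middle tokens: the model receives the code before and after a gap and generates the middle. A sketch of assembling such a prompt (the token names follow the published StarCoder format, but verify them against the model's tokenizer before relying on them):

```python
def build_fim_prompt(prefix, suffix):
    """Assemble a fill-in-the-middle prompt in the StarCoder token format."""
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

# Ask the model to fill in the body of `add`, given the code around it.
prompt = build_fim_prompt("def add(a, b):\n    return ", "\n\nprint(add(2, 3))")
print(prompt)
```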


Phind-CodeLlama is a model fine-tuned from the CodeLlama-34B and CodeLlama-34B-Python models on a proprietary dataset of approximately 80k high-quality programming problems and solutions, resulting in impressive performance on the HumanEval benchmark.


What are the challenges in code generation?

Millisecond response

In interactive coding environments or tools that provide real-time feedback, such as integrated development environments (IDEs), a delay of more than a few milliseconds can disrupt the flow of thought and significantly degrade the user experience.

Also, when these environments are accessed by a large number of users simultaneously, the ability to provide responses in milliseconds ensures that the service remains responsive and reliable. This is particularly important for cloud-based development tools and services, which must scale to meet peak demand without compromising on speed.

Total Cost of Offering

Today, builders start with closed model providers, but as they hit scale, they want to customize their offering to accommodate diverse use cases and fine-tune models regularly. The total cost of offering for code generation becomes expensive because code generation is inherently “token heavy”.

You need to serve responses faster and serve customized use cases through fine-tuned models, all while keeping the total cost of offering reasonable.
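A back-of-the-envelope sketch of why token-heavy workloads get expensive: file-sized prompts mean thousands of tokens per request, and the monthly total scales linearly with both. The per-token price here is a made-up placeholder, not a Fireworks rate:

```python
def monthly_cost_usd(requests_per_day, prompt_tokens, completion_tokens,
                     price_per_million_tokens=0.50):  # placeholder price
    """Estimate monthly spend from token volume (30-day month)."""
    tokens_per_request = prompt_tokens + completion_tokens
    monthly_tokens = requests_per_day * 30 * tokens_per_request
    return monthly_tokens / 1_000_000 * price_per_million_tokens

# 100k requests/day with file-sized (4k-token) prompts adds up quickly.
print(monthly_cost_usd(100_000, 4_000, 500))
```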

Long prompts

Highly contextualized use cases, like debugging a code base, necessitate long prompts for information distillation, because developers add a large number of lines, or even an entire file, as context to the prompt.

In complex scenarios where the task at hand involves multiple layers of logic or requires adherence to specific org guidelines, long prompts can ensure that all necessary details are included. This precision helps in generating outputs that meet specific criteria or standards, reducing the need for extensive revisions.
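A sketch of assembling such a long-context prompt while respecting a model's context window. The 4-characters-per-token ratio is a rough heuristic standing in for a real tokenizer, and the budget is illustrative:

```python
def build_debug_prompt(instruction, files, max_tokens=8_000):
    """Concatenate source files into a prompt, truncating to a rough token budget."""
    budget_chars = max_tokens * 4  # crude heuristic: ~4 characters per token
    parts = [instruction]
    used = len(instruction)
    for name, source in files.items():
        section = f"\n\n# File: {name}\n{source}"
        if used + len(section) > budget_chars:
            break  # stop before overflowing the context window
        parts.append(section)
        used += len(section)
    return "".join(parts)

prompt = build_debug_prompt(
    "Find the bug in the following code:",
    {"app.py": "def main():\n    pass\n"},
)
print("# File: app.py" in prompt)
```

A production tool would use the model's actual tokenizer and smarter selection (e.g., only the relevant functions) instead of whole-file truncation.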

Model customizability

Model customizability means serving several fine-tuned models trained on customized data for different code-generation or programming-language-specific use cases. This requires the newer fine-tuned versions to deliver speed and efficiency while remaining reliable for their users.

Why Fireworks

Fireworks AI is an enterprise-scale LLM inference engine. Today, several AI-enabled developer experiences built on the Fireworks Inference platform serve millions of developers.

Fireworks’ lightning-fast serving stack enables enterprises to build mission-critical Generative AI applications with very low latency. With methods like prompt caching and speculative decoding, we deliver high-throughput performance with a low total cost of offering (TCO), in addition to bringing the best open-source LLMs to the platform on the day they launch.

If you have more questions, join our community and tag a Fireworks team member, or drop us a note to discuss building with LLMs from prototype to production.