Dedicated Deployments
Client-side performance optimization
Optimize your client code for maximum performance with dedicated deployments
When using a dedicated deployment, it is important to optimize client-side HTTP connection pooling for maximum performance. We recommend using our Python SDK, as it has good defaults for connection pooling and uses aiohttp for optimal performance with Python's asyncio library. It also includes retry logic for handling the 429 errors that Fireworks returns when the server is overloaded. We have run benchmarks that demonstrate the performance benefits.
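For clients that cannot use the SDK, the retry behavior mentioned above can be approximated by hand. The following is a minimal sketch, not the SDK's actual implementation: the RateLimited exception, the with_retries helper, and the backoff parameters are all hypothetical names and values chosen for illustration, and send stands in for whatever coroutine issues the request and raises on a 429 response.

```python
import asyncio
import random


class RateLimited(Exception):
    """Hypothetical exception standing in for an HTTP 429 response."""


async def with_retries(send, max_attempts=5, base_delay=0.5, max_delay=8.0):
    """Await `send()` and retry with exponential backoff when it raises RateLimited.

    `send` is an assumed zero-argument coroutine function supplied by the caller.
    The default delays are illustrative, not recommended values.
    """
    for attempt in range(max_attempts):
        try:
            return await send()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Exponential backoff with full jitter so that many clients
            # do not retry at the same instant.
            delay = min(max_delay, base_delay * 2 ** attempt)
            await asyncio.sleep(random.uniform(0, delay))
```

Jitter is a design choice here: without it, requests that were rejected together tend to retry together and hit the overloaded server again at the same moment.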
General optimization recommendations
Based on our benchmarks, we recommend the following:
- Use a client library optimized for high concurrency, such as aiohttp in Python or http.Agent in Node.js.
- Keep the connection pool size high (1000+).
- Increase concurrency until performance stops improving or you observe too many 429 errors.
- Use direct routing to avoid the global API load balancer and route requests directly to your deployment.
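The advice to increase concurrency until performance stops improving can be turned into a simple ramp-up loop. This is only a sketch under stated assumptions: find_concurrency and the measure callback are hypothetical, and the 5% thresholds for minimum throughput gain and maximum 429 rate are placeholder values rather than figures from the benchmarks.

```python
def find_concurrency(measure, start=8, limit=1024, min_gain=0.05, max_429_rate=0.05):
    """Double concurrency until throughput plateaus or 429s become too frequent.

    `measure(concurrency)` is an assumed callback that runs a short load test
    and returns (requests_per_second, fraction_of_responses_that_were_429).
    Returns the last concurrency level that was accepted, or None if even the
    starting level exceeded the 429 threshold.
    """
    best_level, best_rps = None, 0.0
    level = start
    while level <= limit:
        rps, rate_429 = measure(level)
        if rate_429 > max_429_rate:
            break  # the server is pushing back: keep the previous level
        if best_level is not None and rps < best_rps * (1 + min_gain):
            break  # gain is below min_gain: throughput has plateaued
        best_level, best_rps = level, rps
        level *= 2
    return best_level
```

In practice, measure would wrap a short burst of real requests against the deployment, so each step of the loop costs real traffic; a coarse doubling schedule keeps the number of steps small.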
Code example: Optimal concurrent requests (Python)
Here's how to implement optimal concurrent requests using asyncio and the LLM class:
main.py
This implementation:
- Uses asyncio.Semaphore to cap concurrency so that the server is not overwhelmed
- Allows configuration of the maximum number of concurrent connections to the Fireworks API
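The semaphore pattern described above can be sketched independently of any particular client. The example below is an illustration rather than the original main.py: run_all is a hypothetical helper, and send is an assumed coroutine function that performs one request (for example, a call made through the SDK), so no SDK method names are assumed here.

```python
import asyncio


async def run_all(prompts, send, max_concurrency=100):
    """Await `send(prompt)` for every prompt with at most `max_concurrency` in flight.

    `send` is an assumed coroutine function supplied by the caller;
    the default limit of 100 is a placeholder to be tuned per deployment.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(prompt):
        async with semaphore:  # waits here once the limit is reached
            return await send(prompt)

    # gather returns results in the same order as the input prompts
    return await asyncio.gather(*(bounded(p) for p in prompts))


async def fake_send(prompt):
    """Stand-in for a real request so the sketch runs without network access."""
    await asyncio.sleep(0.01)
    return prompt.upper()


if __name__ == "__main__":
    results = asyncio.run(run_all(["a", "b", "c"], fake_send, max_concurrency=2))
    print(results)
```

Passing max_concurrency as a parameter keeps the limit easy to adjust while tuning, since the best value depends on the deployment's capacity.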