Build Multi-Agent Systems That Scale
Focus on application logic. Orla optimizes cost and latency. Up to 3.45x faster and 41% cheaper on real-world agentic workloads.
[Diagram: you define your workflows and your models; Orla's runtime sits between them and the LLM inference engines, delivering lower inference cost, faster completion time, and fine-grained access control.]
Why Use Orla For Your Agentic Applications
- Cost-aware scheduling
  Route cheap stages to small models and expensive stages to large ones. Orla co-optimizes scheduling, KV cache, and model assignment across your entire workflow to minimize cost without sacrificing quality.
- Backend-agnostic
  Route stages to SGLang, vLLM, Ollama, or cloud APIs from a single workflow definition. Mix and match backends without rewriting your pipeline.
- Works with your stack
  Drop Orla into existing LangGraph graphs or use the native SDK. No framework lock-in. Adopt incrementally.
- Fine-grained access control
  Control which teams can use which models, tools, and data. Sensitivity labels propagate across stages so PII never reaches unauthorized backends. Policy management is decoupled from agent code.
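As a toy model of how sensitivity-label propagation can work (an illustration only, not pyorla's API): data carries a set of labels, a stage's output inherits the union of its inputs' labels, and a backend may only serve data whose labels its clearance covers.

```python
def labels_allowed(data_labels, backend_clearance):
    """A backend may serve the data only if its clearance covers every label."""
    return set(data_labels) <= set(backend_clearance)


def propagate(*input_label_sets):
    """A stage's output inherits the union of its inputs' labels, so a PII
    tag picked up early in the workflow follows the data downstream."""
    out = set()
    for labels in input_label_sets:
        out |= set(labels)
    return out


ticket_labels = {"pii"}
kb_labels = {"internal"}
merged = propagate(ticket_labels, kb_labels)  # both labels travel onward
labels_allowed(merged, {"pii", "internal"})   # on-prem backend: allowed
labels_allowed(merged, {"public"})            # external API: blocked
```

The point of the union rule is that a check at any later stage still sees the original PII tag, no matter how many stages the data has passed through.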
Quick look
A Workflow in 20 Lines
from langgraph.graph import END, StateGraph
from pyorla import ChatOrla, OrlaClient, Stage, new_sglang_backend
client = OrlaClient("http://localhost:8081")
# Register backends -- Orla optimizes cost and latency across them
light = new_sglang_backend("Qwen/Qwen3-4B", "http://sglang:30000/v1")
heavy = new_sglang_backend("Qwen/Qwen3-32B", "http://sglang:30001/v1")
client.register_backend(light)
client.register_backend(heavy)
# Define stages with different cost profiles
classify = Stage("classify", light) # fast + cheap
classify.set_max_tokens(512)
classify_llm = ChatOrla(stage=classify)
reply = Stage("reply", heavy) # thorough + accurate
reply.set_max_tokens(1024)
# Wire it up as a LangGraph StateGraph -- WorkflowState, classify_node,
# and reply_node are your application code
graph = StateGraph(WorkflowState)
graph.add_node("classify", lambda s: classify_node(s, classify_llm=classify_llm))
graph.add_node("reply", lambda s: reply_node(s, reply_stage=reply))
graph.set_entry_point("classify")
graph.add_edge("classify", "reply")
graph.add_edge("reply", END)
app = graph.compile()
result = app.invoke({"ticket": "My order hasn't arrived..."})
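The snippet leaves `WorkflowState`, `classify_node`, and `reply_node` to your application code. A minimal sketch of what they might look like (the state keys and prompts here are assumptions, not part of Orla's API):

```python
from typing import TypedDict


class WorkflowState(TypedDict, total=False):
    ticket: str    # input: the raw support ticket
    category: str  # written by the classify stage
    reply: str     # written by the reply stage


def classify_node(state: WorkflowState, classify_llm) -> dict:
    """Label the ticket with the cheap model; returns a partial state update."""
    result = classify_llm.invoke(
        f"Classify this support ticket in one word: {state['ticket']}"
    )
    text = getattr(result, "content", str(result))  # chat models return messages
    return {"category": text.strip().lower()}


def reply_node(state: WorkflowState, reply_stage) -> dict:
    """Draft a customer reply with the heavy model."""
    # Assumption: reply_stage is usable as a chat model here; in real code you
    # would likely wrap it first, e.g. ChatOrla(stage=reply_stage) as above.
    result = reply_stage.invoke(
        f"Write a helpful reply to this {state.get('category', 'general')} "
        f"ticket: {state['ticket']}"
    )
    return {"reply": getattr(result, "content", str(result))}
```

Each node returns only the keys it changed; LangGraph merges those partial updates into the shared state between stages.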
Under the Hood
You define the workflow. Orla figures out which models to use, how to schedule them across backends, and how to share inference state between steps. The result: lower cost and faster completion with no changes to your application logic.
Three core components make this work:
- Stage Mapper
  Routes each stage to the right model and backend, balancing cost and quality across heterogeneous infrastructure.
- Workflow Orchestrator
  Executes and schedules stages according to your workflow graph, enforcing access control policies on every request.
- Memory Manager
  Coordinates KV cache and shared inference state across stages.
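As a toy illustration of the Stage Mapper's trade-off (not Orla's actual algorithm, and the numbers are made up): pick the cheapest backend that still clears a per-stage quality floor.

```python
from dataclasses import dataclass


@dataclass
class BackendProfile:
    name: str
    cost_per_1k_tokens: float  # dollars per 1k tokens (illustrative)
    quality: float             # e.g. a benchmark score in [0, 1]


def map_stage(backends, min_quality):
    """Return the cheapest backend whose quality meets the stage's floor."""
    eligible = [b for b in backends if b.quality >= min_quality]
    if not eligible:
        raise ValueError("no backend meets the quality requirement")
    return min(eligible, key=lambda b: b.cost_per_1k_tokens)


profiles = [
    BackendProfile("qwen3-4b", cost_per_1k_tokens=0.0001, quality=0.62),
    BackendProfile("qwen3-32b", cost_per_1k_tokens=0.0008, quality=0.86),
]
map_stage(profiles, min_quality=0.5)  # cheap stage -> small model
map_stage(profiles, min_quality=0.8)  # demanding stage -> large model
```

The real mapper optimizes across the whole workflow graph at once (scheduling, KV cache, and model assignment together), rather than greedily per stage as this sketch does.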
Team
- Hayder Tirmazi
  Developer and Maintainer
- Rana Shahout
  Primary Researcher
- Michael Mitzenmacher
  Principal Investigator
- Minlan Yu
  Principal Investigator
Portable deployments
Build Your Agents Once, Ship Them Anywhere
Write agentic workflows once. Test and run them consistently across your stack.
Mix private infrastructure and cloud models in the same graph when you need both.
- Local machine
  Develop and debug with the same runtime you ship to production.
- CI
  Gate merges with the same agent tests and workflows you run locally.
- Private infrastructure
  Keep models and data on your network; call hosted APIs from the same graph where policy allows.
- Cloud models
  Use hosted APIs alone or beside private backends. One workflow, mixed stages, tuned for cost, latency, and compliance.
Cite this work
If you use Orla in your research, please cite our paper.
@misc{shahout2026orlalibraryservingllmbased,
  title={Orla: A Library for Serving LLM-Based Multi-Agent Systems},
  author={Rana Shahout and Hayder Tirmazi and Minlan Yu and Michael Mitzenmacher},
  year={2026},
  eprint={2603.13605},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2603.13605},
}
Help Orla grow
A GitHub star helps other developers discover Orla and keeps the project going.
Star on GitHub
Orla is a project of