Build Multi-Agent Systems That Scale
Focus on application logic. Orla optimizes cost and latency. Up to 3.45x faster and 41% cheaper on real-world agentic workloads.
[Diagram: you define your workflows and your models; Orla's runtime sits between them and the LLM inference engines, delivering lower inference cost, faster completion time, and fine-grained access control.]
Why Use Orla For Your Agentic Applications
- Cost-aware scheduling
  Route cheap stages to small models and expensive stages to large ones. Orla co-optimizes scheduling, KV cache, and model assignment across your entire workflow to minimize cost without sacrificing quality.
- Backend-agnostic
  Route stages to SGLang, vLLM, Ollama, or cloud APIs from a single workflow definition. Mix and match backends without rewriting your pipeline.
- Works with your stack
  Drop Orla into existing LangGraph graphs or use the native SDK. No framework lock-in. Adopt incrementally.
- Fine-grained access control
  Control which teams can use which models, tools, and data. Sensitivity labels propagate across stages so PII never reaches unauthorized backends. Policy management is decoupled from agent code.
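As a toy model of how sensitivity-label propagation can work (an illustration only, not pyorla's API): data carries a set of labels, a stage's output inherits the union of its inputs' labels, and a backend may only serve data whose labels its clearance covers.

```python
def labels_allowed(data_labels, backend_clearance):
    """A backend may serve the data only if its clearance covers every label."""
    return set(data_labels) <= set(backend_clearance)


def propagate(*input_label_sets):
    """A stage's output inherits the union of its inputs' labels, so a PII
    tag picked up early in the workflow follows the data downstream."""
    out = set()
    for labels in input_label_sets:
        out |= set(labels)
    return out


ticket_labels = {"pii"}
kb_labels = {"internal"}
merged = propagate(ticket_labels, kb_labels)  # both labels travel onward
labels_allowed(merged, {"pii", "internal"})   # on-prem backend: allowed
labels_allowed(merged, {"public"})            # external API: blocked
```

The point of the union rule is that a check at any later stage still sees the original PII tag, no matter how many stages the data has passed through.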
Quick look
A Workflow in 20 Lines
from langgraph.graph import END, StateGraph
from pyorla import ChatOrla, OrlaClient, Stage, new_sglang_backend
client = OrlaClient("http://localhost:8081")
# Register backends -- Orla optimizes cost and latency across them
light = new_sglang_backend("Qwen/Qwen3-4B", "http://sglang:30000/v1")
heavy = new_sglang_backend("Qwen/Qwen3-32B", "http://sglang:30001/v1")
client.register_backend(light)
client.register_backend(heavy)
# Define stages with different cost profiles
classify = Stage("classify", light) # fast + cheap
classify.set_max_tokens(512)
classify_llm = ChatOrla(stage=classify)
reply = Stage("reply", heavy) # thorough + accurate
reply.set_max_tokens(1024)
# Wire it up as a LangGraph StateGraph -- WorkflowState, classify_node,
# and reply_node are your application code
graph = StateGraph(WorkflowState)
graph.add_node("classify", lambda s: classify_node(s, classify_llm=classify_llm))
graph.add_node("reply", lambda s: reply_node(s, reply_stage=reply))
graph.set_entry_point("classify")
graph.add_edge("classify", "reply")
graph.add_edge("reply", END)
app = graph.compile()
result = app.invoke({"ticket": "My order hasn't arrived..."})
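The snippet leaves `WorkflowState`, `classify_node`, and `reply_node` to your application code. A minimal sketch of what they might look like (the state keys and prompts here are assumptions, not part of Orla's API):

```python
from typing import TypedDict


class WorkflowState(TypedDict, total=False):
    ticket: str    # input: the raw support ticket
    category: str  # written by the classify stage
    reply: str     # written by the reply stage


def classify_node(state: WorkflowState, classify_llm) -> dict:
    """Label the ticket with the cheap model; returns a partial state update."""
    result = classify_llm.invoke(
        f"Classify this support ticket in one word: {state['ticket']}"
    )
    text = getattr(result, "content", str(result))  # chat models return messages
    return {"category": text.strip().lower()}


def reply_node(state: WorkflowState, reply_stage) -> dict:
    """Draft a customer reply with the heavy model."""
    # Assumption: reply_stage is usable as a chat model here; in real code you
    # would likely wrap it first, e.g. ChatOrla(stage=reply_stage) as above.
    result = reply_stage.invoke(
        f"Write a helpful reply to this {state.get('category', 'general')} "
        f"ticket: {state['ticket']}"
    )
    return {"reply": getattr(result, "content", str(result))}
```

Each node returns only the keys it changed; LangGraph merges those partial updates into the shared state between stages.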
Under the Hood
You define the workflow. Orla figures out which models to use, how to schedule them across backends, and how to share inference state between steps. The result: lower cost and faster completion with no changes to your application logic.
Three core components make this work:
- Stage Mapper
  Routes each stage to the right model and backend, balancing cost and quality across heterogeneous infrastructure.
- Workflow Orchestrator
  Executes and schedules stages according to your workflow graph, enforcing access control policies on every request.
- Memory Manager
  Coordinates KV cache and shared inference state across stages.
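As a toy illustration of the Stage Mapper's trade-off (not Orla's actual algorithm, and the numbers are made up): pick the cheapest backend that still clears a per-stage quality floor.

```python
from dataclasses import dataclass


@dataclass
class BackendProfile:
    name: str
    cost_per_1k_tokens: float  # dollars per 1k tokens (illustrative)
    quality: float             # e.g. a benchmark score in [0, 1]


def map_stage(backends, min_quality):
    """Return the cheapest backend whose quality meets the stage's floor."""
    eligible = [b for b in backends if b.quality >= min_quality]
    if not eligible:
        raise ValueError("no backend meets the quality requirement")
    return min(eligible, key=lambda b: b.cost_per_1k_tokens)


profiles = [
    BackendProfile("qwen3-4b", cost_per_1k_tokens=0.0001, quality=0.62),
    BackendProfile("qwen3-32b", cost_per_1k_tokens=0.0008, quality=0.86),
]
map_stage(profiles, min_quality=0.5)  # cheap stage -> small model
map_stage(profiles, min_quality=0.8)  # demanding stage -> large model
```

The real mapper optimizes across the whole workflow graph at once (scheduling, KV cache, and model assignment together), rather than greedily per stage as this sketch does.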
Team
- Hayder Tirmazi
  Developer and Maintainer
- Rana Shahout
  Primary Researcher
- Michael Mitzenmacher
  Principal Investigator
- Minlan Yu
  Principal Investigator
Portable deployments
Build Your Agents Once, Ship Them Anywhere
Write agentic workflows once. Test and run them consistently across your stack.
Mix private infrastructure and cloud models in the same graph when you need both.
- Local machine
  Develop and debug with the same runtime you ship to production.
- CI
  Gate merges with the same agent tests and workflows you run locally.
- Private infrastructure
  Keep models and data on your network; call hosted APIs from the same graph where policy allows.
- Cloud models
  Use hosted APIs alone or beside private backends. One workflow, mixed stages, tuned for cost, latency, and compliance.
Cite this work
If you use Orla in your research, please cite our paper.
@misc{shahout2026orlalibraryservingllmbased,
  title={Orla: A Library for Serving LLM-Based Multi-Agent Systems},
  author={Rana Shahout and Hayder Tirmazi and Minlan Yu and Michael Mitzenmacher},
  year={2026},
  eprint={2603.13605},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2603.13605},
}
Help Orla grow
A GitHub star helps other developers discover Orla and keeps the project going.
Star on GitHub
Orla is a project of