Build Multi-Agent Systems That Scale

Focus on application logic. Orla optimizes cost and latency. Up to 3.45x faster and 41% cheaper on real-world agentic workloads.

Get Started Read the Paper
pip install pyorla

You define

  • Your workflows (LangGraph)
  • Your models (Qwen, Claude)
  • Your providers

Orla's Runtime

  • Lower inference cost
  • Faster completion time
  • Fine-grained access control

LLM Inference Engines

  • SGLang
  • vLLM
  • Ollama

Why Use Orla for Your Agentic Applications

  • Cost-aware scheduling

    Route cheap stages to small models and expensive stages to large ones. Orla co-optimizes scheduling, KV cache, and model assignment across your entire workflow to minimize cost without sacrificing quality.

  • Backend-agnostic

    Route stages to SGLang, vLLM, Ollama, or cloud APIs from a single workflow definition. Mix and match backends without rewriting your pipeline.

  • Works with your stack

    Drop Orla into existing LangGraph graphs or use the native SDK. No framework lock-in. Adopt incrementally.

  • Fine-grained access control

    Control which teams can use which models, tools, and data. Sensitivity labels propagate across stages so PII never reaches unauthorized backends. Policy management is decoupled from agent code.
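To make the label-propagation idea above concrete, here is a minimal, self-contained sketch in plain Python. It is illustrative only, not the pyorla API: the `Backend`, `Stage`, `propagate`, and `route` names are hypothetical, and the rule shown (downstream stages inherit every label from their inputs, and a backend must be cleared for all of a stage's labels) is one simple way such a policy can work.

```python
from dataclasses import dataclass, field

@dataclass
class Backend:
    name: str
    clearances: set = field(default_factory=set)  # labels this backend may see

@dataclass
class Stage:
    name: str
    labels: set = field(default_factory=set)      # labels attached to the data

def propagate(upstream: Stage, downstream: Stage) -> None:
    """Downstream stages inherit every sensitivity label from their inputs."""
    downstream.labels |= upstream.labels

def route(stage: Stage, backends: list) -> Backend:
    """Pick the first backend cleared for all of the stage's labels."""
    for b in backends:
        if stage.labels <= b.clearances:
            return b
    raise PermissionError(f"no backend cleared for labels {stage.labels}")

on_prem = Backend("on-prem-vllm", clearances={"pii", "internal"})
cloud   = Backend("cloud-api",    clearances={"internal"})

classify = Stage("classify", labels={"pii"})
reply    = Stage("reply")
propagate(classify, reply)   # reply now also carries the "pii" label

assert route(reply, [cloud, on_prem]).name == "on-prem-vllm"
```

Because the label travels with the data, the `reply` stage is barred from the cloud backend even though it carried no label of its own; this is what keeps policy decisions out of agent code.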

Works with

  • LangGraph
  • SGLang
  • vLLM
  • Ollama

Quick look

A Workflow in 20 Lines

from langgraph.graph import END, StateGraph
from pyorla import ChatOrla, OrlaClient, Stage, new_sglang_backend

client = OrlaClient("http://localhost:8081")

# Register backends -- Orla optimizes cost and latency across them
light = new_sglang_backend("Qwen/Qwen3-4B",  "http://sglang:30000/v1")
heavy = new_sglang_backend("Qwen/Qwen3-32B", "http://sglang:30001/v1")
client.register_backend(light)
client.register_backend(heavy)

# Define stages with different cost profiles
classify = Stage("classify", light)   # fast + cheap
classify.set_max_tokens(512)
classify_llm = ChatOrla(stage=classify)

reply = Stage("reply", heavy)       # thorough + accurate
reply.set_max_tokens(1024)

# Wire it up as a LangGraph StateGraph
# (WorkflowState, classify_node, and reply_node are your application code,
# defined elsewhere)
graph = StateGraph(WorkflowState)
graph.add_node("classify", lambda s: classify_node(s, classify_llm=classify_llm))
graph.add_node("reply",    lambda s: reply_node(s, reply_stage=reply))
graph.set_entry_point("classify")
graph.add_edge("classify", "reply")
graph.add_edge("reply", END)

app = graph.compile()
result = app.invoke({"ticket": "My order hasn't arrived..."})

Under the Hood

You define the workflow. Orla figures out which models to use, how to schedule them across backends, and how to share inference state between steps. The result: lower cost and faster completion with no changes to your application logic.

Three core components make this work:

  • Stage Mapper

    Routes each stage to the right model and backend, balancing cost and quality across heterogeneous infrastructure.

  • Workflow Orchestrator

    Executes and schedules stages according to your workflow graph, enforcing access control policies on every request.

  • Memory Manager

    Coordinates KV cache and shared inference state across stages.
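The Stage Mapper's cost/quality trade-off can be illustrated with a toy heuristic. This is a sketch, not Orla's actual algorithm: the model table, prices, and quality scores below are made up, and the rule shown (pick the cheapest model whose quality score meets the stage's floor) is just one simple policy.

```python
# Hypothetical model catalog: cost per 1K tokens and a quality score.
MODELS = {
    "qwen3-4b":  {"cost_per_1k": 0.05, "quality": 0.70},
    "qwen3-32b": {"cost_per_1k": 0.40, "quality": 0.90},
}

def map_stage(min_quality: float) -> str:
    """Return the cheapest model that satisfies the stage's quality floor."""
    eligible = {m: p for m, p in MODELS.items() if p["quality"] >= min_quality}
    if not eligible:
        raise ValueError(f"no model meets quality {min_quality}")
    return min(eligible, key=lambda m: eligible[m]["cost_per_1k"])

print(map_stage(0.60))  # undemanding stage -> "qwen3-4b"
print(map_stage(0.85))  # demanding stage   -> "qwen3-32b"
```

A real mapper also has to account for backend load, KV-cache placement, and latency targets, which is why Orla co-optimizes the three components rather than mapping each stage in isolation.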


Portable deployments

Build Your Agents Once, Ship Them Anywhere

Write agentic workflows once. Test and run them consistently across your stack.

Mix private infrastructure and cloud models in the same graph when you need both.

  • Local machine

    Develop and debug with the same runtime you ship to production.

  • CI

    Gate merges with the same agent tests and workflows you run locally.

  • Private infrastructure

    Keep models and data on your network; call hosted APIs from the same graph where policy allows.

  • Cloud models

    Use hosted APIs alone or beside private backends. One workflow, mixed stages, tuned for cost, latency, and compliance.

Cite this work

If you use Orla in your research, please cite our paper.

@misc{shahout2026orlalibraryservingllmbased,
      title={Orla: A Library for Serving LLM-Based Multi-Agent Systems},
      author={Rana Shahout and Hayder Tirmazi and Minlan Yu and Michael Mitzenmacher},
      year={2026},
      eprint={2603.13605},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2603.13605},
}

Help Orla grow

A GitHub star helps other developers discover Orla and keeps the project going.

Star on GitHub