
Building the #1 open source terminal-use agent using Letta

August 5, 2025

Our Results: 42.5% on Terminal-Bench (#1 open-source agent)

We built the #1 open-source agent for terminal use, achieving a 42.5% overall score on Terminal-Bench, ranking 4th overall and 2nd among agents using Claude 4 Sonnet. Our agent is implemented in under 200 lines of code using Letta's stateful agents SDK.

Our results using Claude 4 Sonnet roughly match Claude Code using Claude 4 Opus, a much larger and more expensive model. This result places our agent in an elite category on one of the most challenging benchmarks for AI agents - a benchmark where even the best proprietary models like Gemini 2.5 Pro, GPT-4.1, and o3 struggle to score above 30%.

This achievement demonstrates a critical point: agents with effective context management can achieve significant gains on long-running tasks. Letta makes it easy to build specialized agents on top of it, with minimal scaffolding and managed memory and state.

What is Terminal-Bench?

Terminal-Bench is a benchmark of more than 100 challenging tasks that evaluates AI agents on real-world command-line work. What makes Terminal-Bench particularly valuable is its focus on real-world complexity: its tasks include

  • Compiling code repositories and building Linux kernels from source
  • Training machine learning models
  • Setting up and configuring servers
  • Debugging system configurations

Each task runs in a containerized Docker environment and resembles the terminal work that engineers and scientists deal with every day.

Building a terminal-use agent with Letta 

Letta provides a stateful agents API layer compatible with any model provider (OpenAI, Anthropic, etc.). Letta provides tools for managing the context window (or agent memory) over time, such as rewriting segments of the context window (referred to as memory blocks), compacting context, and storing and retrieving external memories.

Our terminal-use agent uses Letta's built-in capabilities for context management and memory, specifically memory blocks.

The terminal-use agent has two memory blocks:

  • A read-only “task description” block 
  • A read-write “todo list” block used for planning 

The agent is able to modify the todo list over time using the memory_replace and memory_insert tools provided by Letta. 
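To make the memory-block mechanics concrete, here is a minimal, self-contained sketch of the two blocks and the edit operations described above. The block labels mirror the post, but the `MemoryBlock` class and method signatures are illustrative assumptions, not Letta's actual SDK types.

```python
from dataclasses import dataclass

@dataclass
class MemoryBlock:
    """Illustrative stand-in for a Letta memory block (not the real SDK type)."""
    label: str
    value: str
    read_only: bool = False

    def memory_replace(self, old: str, new: str) -> None:
        """Replace an exact substring of the block (mirrors memory_replace)."""
        if self.read_only:
            raise PermissionError(f"block '{self.label}' is read-only")
        if old not in self.value:
            raise ValueError("old text not found in block")
        self.value = self.value.replace(old, new, 1)

    def memory_insert(self, text: str) -> None:
        """Append a new line to the block (mirrors memory_insert)."""
        if self.read_only:
            raise PermissionError(f"block '{self.label}' is read-only")
        self.value += ("\n" if self.value else "") + text

# The two blocks from the post: a frozen task description and a mutable plan.
task = MemoryBlock("task_description", "Build the kernel from source.", read_only=True)
todo = MemoryBlock("todo_list", "- [ ] download sources")

# The agent revises its plan as it works.
todo.memory_insert("- [ ] configure build flags")
todo.memory_replace("- [ ] download sources", "- [x] download sources")
```

Because the task description is read-only, any attempt by the agent to edit it raises an error, keeping the original goal pinned in context while the todo list evolves.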

The agent is also given additional tools specifically for Terminal-Bench: 

  • send_keys to execute terminal commands
  • task_completed to signal task completion
  • quit_process to interrupt the currently running process
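A minimal sketch of these three tools, written as an in-memory fake terminal so the interface is easy to follow; the real tools would drive the live session inside the task's Docker container (likely via tmux, though that detail is an assumption).

```python
class FakeTerminal:
    """In-memory stand-in for the Terminal-Bench environment (illustrative only)."""

    def __init__(self) -> None:
        self.keystrokes: list[str] = []  # commands the agent has issued
        self.done = False                # set when the agent declares success

    def send_keys(self, keys: str) -> None:
        """Execute a terminal command (here: just record the keystrokes)."""
        self.keystrokes.append(keys)

    def quit_process(self) -> None:
        """Interrupt the currently running process (modeled as a Ctrl-C)."""
        self.keystrokes.append("C-c")

    def task_completed(self) -> None:
        """Signal task completion, which ends the agent loop."""
        self.done = True

# A short interaction: run a build, interrupt it, then declare the task done.
term = FakeTerminal()
term.send_keys("make -j4")
term.quit_process()
term.task_completed()
```

Exposing `quit_process` alongside `send_keys` matters for long-running tasks: it lets the agent recover from a hung command instead of waiting indefinitely.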

Visualizing the Letta Terminal-Bench agent's control flow inside the ADE

To solve a terminal task, we instantiate a Letta terminal-use agent and grant it access to the terminal environment. The agent observes the current terminal state, updates its internal todo list as necessary, and generates the next command to execute based on its plan. This cycle of observing the environment, updating memory, and executing actions repeats until the agent calls the task_completed tool. When the context window grows beyond a set threshold (40k tokens), Letta performs recursive summarization (i.e. compaction) of previous messages. The agent's ability to manage its memory (the message history and memory blocks) allows it to avoid common pitfalls like derailment and distraction while solving long-running, complex tasks.
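The compaction step can be sketched as follows. This is a toy model of recursive summarization: when the history exceeds a token budget (40k in practice; tiny here for illustration), the oldest messages are folded into a running summary. The `summarize` stub and the word-count tokenizer are assumptions; Letta uses an LLM and a real tokenizer.

```python
def count_tokens(msg: str) -> int:
    """Crude proxy for a real tokenizer: count whitespace-separated words."""
    return len(msg.split())

def summarize(messages: list[str]) -> str:
    """Stand-in for an LLM summarization call over older messages."""
    return f"[summary of {len(messages)} earlier messages]"

def compact(history: list[str], budget: int) -> list[str]:
    """Recursively fold the oldest half of the history into a summary
    until the total token count fits within the budget."""
    while sum(count_tokens(m) for m in history) > budget and len(history) > 1:
        half = max(1, len(history) // 2)
        history = [summarize(history[:half])] + history[half:]
    return history

# Ten verbose terminal observations, then compaction to a small budget.
history = [f"observation {i}: " + "output " * 20 for i in range(10)]
history = compact(history, budget=60)
```

Because earlier summaries can themselves be summarized on later passes, the oldest context degrades gracefully while the most recent observations stay verbatim, which is what keeps the agent on-task over hundreds of steps.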

With Letta, agent developers can rapidly specialize agents for specific tasks by focusing on building the right prompts, tools, and environment. Building the #1 open-source terminal-use agent with Letta shows that Letta's general memory management provides effective building blocks for more capable and performant agents beyond long-running chatbots.
