Building a Claude Agent from Scratch: Planning and Coordination with TodoWrite

Simple "listen and execute" agents are not sufficient for complex, multi-step tasks. This guide walks through the implementation of a task management system, the TodoManager, that lets a basic tool-using agent plan its work, keep its goals in view, and track progress throughout an extended session.

Introduction: The Need for Self-Reflection and State Management

Traditional AI agents operate on a straightforward paradigm: receive instructions, execute tasks, and return results. While this approach works well for simple, isolated operations, it falls short when dealing with complex, multi-stage projects that require sustained attention and coordination.

The Core Problem: Previous generation agents simply "listened to instructions and worked," making them prone to losing sight of their original objectives or becoming disoriented in complex task landscapes. Without a mechanism for tracking progress and maintaining state, these agents often:

Forget initial goals midway through execution
Lose context when switching between subtasks
Fail to recognize completion criteria
Cannot provide meaningful progress updates
Struggle with error recovery and course correction

The Solution: Introducing the TodoManager, a state management system that functions as both a "notebook" and a "supervisor" for the agent. It gives the agent a task memory that lasts for the whole session, progress tracking, and a reminder mechanism, all of which it needs for complex, real-world tasks.

Architecture Overview: From Stateless to Stateful Agents

The transformation from a basic tool-using agent to a stateful, self-managing agent represents a significant architectural evolution. Let's examine the key components and their interactions:

Core Components

TodoManager: Central state management system
TaskStatus Enum: Defines valid task states
TodoItem: Structured task representation
Tool Integration: Unified interface for state operations
Supervisor Logic: Automated progress monitoring and reminders

1. Task Status Enumeration: Defining the State Space

The foundation of our task management system begins with a clearly defined state space. We establish three primary task states that capture the essential lifecycle of any task:

public enum TaskStatus {
    PENDING("pending"),
    IN_PROGRESS("in_progress"),
    COMPLETED("completed");
    
    public final String label;
    
    TaskStatus(String label) { 
        this.label = label; 
    }
    
    public static TaskStatus fromLabel(String s) {
        for (TaskStatus ts : values()) {
            if (ts.label.equals(s)) return ts;
        }
        return PENDING;  // Unknown or null labels fall back to PENDING
    }
}
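
One behavior of fromLabel is easy to miss: any unrecognized label, such as a typo like "done", is silently treated as PENDING. Where that leniency is unwanted, a stricter lookup is one option. The sketch below is an assumption, not part of the original design: StrictStatus and parse are hypothetical names, and returning Optional is just one way to surface the failure.

```java
import java.util.Locale;
import java.util.Optional;

// Sketch only: a strict counterpart to TaskStatus.fromLabel.
// StrictStatus and parse(...) are hypothetical names, not from the original code.
enum StrictStatus {
    PENDING("pending"),
    IN_PROGRESS("in_progress"),
    COMPLETED("completed");

    final String label;

    StrictStatus(String label) {
        this.label = label;
    }

    // Returns the matching status, or Optional.empty() for an unknown or null label,
    // so the caller decides whether to reject the item or fall back to a default.
    static Optional<StrictStatus> parse(String s) {
        if (s == null) {
            return Optional.empty();
        }
        String normalized = s.trim().toLowerCase(Locale.ROOT);
        for (StrictStatus st : values()) {
            if (st.label.equals(normalized)) {
                return Optional.of(st);
            }
        }
        return Optional.empty();
    }
}
```

A caller could then throw on an empty result instead of quietly storing the task as pending.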

Understanding the State Transitions

Each state serves a specific purpose in the agent's workflow:

PENDING: Tasks that have been identified but not yet started. This state allows the agent to build a comprehensive task list before beginning execution, enabling better planning and prioritization.
IN_PROGRESS: Tasks currently being worked on. Critically, our system enforces a constraint that only one task can be in this state at any time, ensuring focused execution and preventing context-switching overhead.
COMPLETED: Tasks that have been successfully finished. This provides a clear record of accomplishment and enables progress tracking.

State-Driven Behavior: The agent uses these states to determine next actions. For example:

When all tasks are PENDING → Planning phase
When one task is IN_PROGRESS → Execution phase
When tasks move to COMPLETED → Review and update phase
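
The mapping from task states to phases in the list above can be sketched as a small function. This is an illustration under stated assumptions: PhaseSketch, Phase, and phaseOf are hypothetical names, the nested Status enum stands in for TaskStatus, and treating an empty list as the planning phase is a guess about the intended behavior.

```java
import java.util.List;

// Sketch only: PhaseSketch, Phase, and phaseOf are hypothetical names.
final class PhaseSketch {
    enum Status { PENDING, IN_PROGRESS, COMPLETED }
    enum Phase { PLANNING, EXECUTION, REVIEW }

    // Derives the agent's phase from the states of all tasks in the list.
    static Phase phaseOf(List<Status> statuses) {
        // Any task in progress means the agent is executing.
        if (statuses.contains(Status.IN_PROGRESS)) {
            return Phase.EXECUTION;
        }
        // Nothing started yet: still planning.
        // Treating an empty list as PLANNING is an assumption.
        boolean allPending = true;
        for (Status s : statuses) {
            if (s != Status.PENDING) {
                allPending = false;
                break;
            }
        }
        if (allPending) {
            return Phase.PLANNING;
        }
        // At least one task is completed and none is active: review and update.
        return Phase.REVIEW;
    }
}
```

In the article's design the LLM makes this judgment itself from the rendered list; a function like this would only matter if the host program wanted to react to the phase as well.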

2. TodoItem: The Task Entity Structure

Each task in our system is represented as a structured entity with clearly defined attributes:

public static class TodoItem {
    public String id;           // Unique identifier
    public String text;         // Task description
    public TaskStatus status;   // Current state
    
    public TodoItem(String id, String text, String status) {
        this.id = id; 
        this.text = text; 
        this.status = TaskStatus.fromLabel(status);
    }
}

Design Rationale

This structure provides several key benefits:

Structured Representation: Offers the LLM clear, machine-readable context about each task
Unique Identification: Enables precise task tracking and updates
Status Tracking: Allows the agent to understand task progression
Serialization Friendly: Easy to convert to/from JSON for LLM communication
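
To make the serialization point concrete, here is a minimal sketch of converting a task to and from the Map shape that a JSON library would produce or consume. TodoItemMapper, Item, toMap, and fromMap are hypothetical names; Item is a simplified stand-in for TodoItem that keeps the status as its string label, and no particular JSON library is assumed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Sketch only: TodoItemMapper and its members are hypothetical helpers.
final class TodoItemMapper {
    // Minimal stand-in for the article's TodoItem, with status kept as its label.
    static final class Item {
        final String id;
        final String text;
        final String status;

        Item(String id, String text, String status) {
            this.id = id;
            this.text = text;
            this.status = status;
        }
    }

    // Flattens an item into the key/value shape a JSON library would serialize.
    static Map<String, Object> toMap(Item item) {
        Map<String, Object> m = new LinkedHashMap<>(); // keeps key order stable
        m.put("id", item.id);
        m.put("text", item.text);
        m.put("status", item.status);
        return m;
    }

    // Rebuilds an item from a parsed JSON object. Missing id or text becomes "",
    // and a missing status defaults to "pending", mirroring TodoManager.update.
    static Item fromMap(Map<String, Object> m) {
        String id = Objects.toString(m.get("id"), "");
        String text = Objects.toString(m.get("text"), "");
        String status = Objects.toString(m.get("status"), "pending");
        return new Item(id, text, status);
    }
}
```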

3. TodoManager: The Core State Management System

The TodoManager class serves as the central repository and controller for all task-related state. It implements critical business rules and provides a clean interface for state operations:

// Assumes java.util.ArrayList, List, Map, and Objects are imported by the enclosing class.
public static class TodoManager {
    private List<TodoItem> items = new ArrayList<>();
    
    public String update(List<Map<String, Object>> newItems) throws Exception {
        // Validation and business rule enforcement
        if (newItems == null) {
            throw new Exception("items required");
        }
        // Business Rule 1: At most 20 todos
        if (newItems.size() > 20) {
            throw new Exception("Max 20 todos allowed");
        }
        
        List<TodoItem> validated = new ArrayList<>();
        int inProgressCount = 0;
        
        for (int i = 0; i < newItems.size(); i++) {
            Map<String, Object> item = newItems.get(i);
            // Objects.toString tolerates null and non-String values in the parsed tool input
            String text = Objects.toString(item.get("text"), "");
            String statusStr = Objects.toString(item.get("status"), "pending");
            String id = Objects.toString(item.get("id"), String.valueOf(i + 1));
            
            // Business Rule 2: Task text is required
            if (text.trim().isEmpty()) {
                throw new Exception("Item " + id + ": text required");
            }
            
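            // Business Rule 3: Count in_progress items (checked after the loop).
            // Note: fromLabel maps any unrecognized status to PENDING.
            TaskStatus status = TaskStatus.fromLabel(statusStr.toLowerCase());
            if (status == TaskStatus.IN_PROGRESS) {
                inProgressCount++;
            }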
            
            validated.add(new TodoItem(id, text.trim(), status.label));
        }
        
        // Business Rule 4: Only one task can be in_progress at a time
        if (inProgressCount > 1) {
            throw new Exception("Only one task can be in_progress at a time");
        }
        
        this.items = validated;  // Atomic update
        return render();         // Return visual representation
    }
    
    public String render() {
        if (items.isEmpty()) return "No todos.";
        
        StringBuilder sb = new StringBuilder();
        for (TodoItem item : items) {
            String marker = item.status == TaskStatus.PENDING ? "[ ]" :
                           item.status == TaskStatus.IN_PROGRESS ? "[>]" : "[x]";
            sb.append(String.format("%s #%s: %s%n", marker, item.id, item.text));
        }
        
        long done = items.stream()
                .filter(i -> i.status == TaskStatus.COMPLETED)
                .count();
        sb.append(String.format("%n(%d/%d completed)", done, items.size()));
        
        return sb.toString();
    }
}

Business Rules Enforcement

The TodoManager implements several critical business rules that ensure system integrity:

Rule 1: Task Quantity Limit (Max 20 todos)

Prevents abuse and bounds the tokens that the rendered list adds to each todo tool result
Encourages focused, prioritized task lists
Forces the agent to break large projects into manageable phases

Rule 2: Required Task Text

Ensures all tasks have meaningful descriptions
Prevents empty or placeholder tasks from cluttering the list
Maintains data quality for LLM context

Rule 3: In-Progress Tracking

Monitors how many tasks are actively being worked on
Enables the single-task constraint enforcement
Could feed metrics for agent focus analysis (in the code shown, the count stays local to update())

Rule 4: Single Active Task Constraint

Enforces focused execution (only one task in_progress at a time)
Prevents context-switching overhead
Encourages sequential, completed task flow
Reduces cognitive load on the LLM

Atomic Update Pattern

The update method employs an atomic (all-or-nothing) update pattern:

Validate all input items
Build a new validated list
Replace the entire items list in one operation
Return rendered representation

This approach ensures:

Consistency: No partial updates
Predictability: State transitions are clear
Debuggability: Easy to trace state changes
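
The validate-then-swap idea can be shown in isolation. The sketch below is a stripped-down, hypothetical stand-in for TodoManager (AtomicListSketch and snapshot are invented names) that stores only task texts and enforces only the "text required" rule, so the effect of a failed update is easy to see.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch only: AtomicListSketch is a hypothetical, simplified stand-in for TodoManager.
final class AtomicListSketch {
    private List<String> items = new ArrayList<>();

    // Validates every entry into a scratch list first. The field is reassigned
    // only after the whole input has passed, so a failure leaves it untouched.
    void update(List<String> newItems) {
        List<String> validated = new ArrayList<>();
        for (String text : newItems) {
            if (text == null || text.trim().isEmpty()) {
                throw new IllegalArgumentException("text required");
            }
            validated.add(text.trim());
        }
        this.items = validated; // single reference swap
    }

    // Read-only view of the current state.
    List<String> snapshot() {
        return Collections.unmodifiableList(items);
    }
}
```

If a second call to update contains a blank entry, it throws and snapshot() still returns the list from the first call. Note that "atomic" here means all-or-nothing with respect to validation; the plain field assignment is not by itself a thread-safety guarantee.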

Visual Rendering for LLM Consumption

The render() method transforms internal state into a format readable by both humans and the LLM:

[ ] #1: Analyze project structure
[>] #2: Implement core module
[ ] #3: Write unit tests
[x] #4: Set up CI/CD pipeline

(1/4 completed)

Visual Markers:

[ ] → Pending task (not started)
[>] → In-progress task (currently working)
[x] → Completed task (finished)

Progress Statistics: The completion ratio gives the LLM immediate feedback on overall progress, which helps it decide what to work on next and when the job is done.

4. Todo Tool Integration: Exposing State Management to the LLM

To enable the LLM to interact with the task management system, we expose the TodoManager through a unified tool interface:

// In tool enumeration
public enum ToolType {
    BASH("bash"), 
    READ_FILE("read_file"), 
    WRITE_FILE("write_file"),
    EDIT_FILE("edit_file"), 
    TODO("todo");  // New state management tool
    
    public final String name;
    ToolType(String name) { this.name = name; }
}

// Register Todo tool implementation
// State update tool: enables LLM to manipulate task state
// Accepts task list from LLM, updates internal state
TOOL_HANDLERS.put(ToolType.TODO.name, args -> {
    @SuppressWarnings("unchecked")
    List<Map<String, Object>> items = (List<Map<String, Object>>) args.get("items");
    return TODO_MANAGER.update(items);
});

Design Principles

State Operations as Tools: By abstracting state management as a tool call, we achieve:

Unified Interface: Consistent with other tool invocation patterns
Clear Boundaries: State changes happen through explicit, auditable operations
LLM Accessibility: The LLM can update state using familiar tool call syntax

Bidirectional Communication:

LLM → State: Through tool calls with task list updates
State → LLM: Through rendered output in tool results
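
For the LLM to call the tool, the request also has to describe the tool's input. The sketch below builds such a description as nested Maps, assuming the Anthropic Messages API tool format (name, description, input_schema containing a JSON Schema). TodoToolSchema and definition are hypothetical names, the description wording is invented, and the field names follow the handler and TodoManager.update shown earlier.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: TodoToolSchema and definition() are hypothetical names.
// Field names ("items", "id", "text", "status") follow TodoManager.update.
final class TodoToolSchema {
    static Map<String, Object> definition() {
        Map<String, Object> status = new LinkedHashMap<>();
        status.put("type", "string");
        status.put("enum", Arrays.asList("pending", "in_progress", "completed"));

        Map<String, Object> itemProps = new LinkedHashMap<>();
        itemProps.put("id", stringType());
        itemProps.put("text", stringType());
        itemProps.put("status", status);

        Map<String, Object> item = new LinkedHashMap<>();
        item.put("type", "object");
        item.put("properties", itemProps);
        // update() defaults id and status, so only text is mandatory.
        item.put("required", Arrays.asList("text"));

        Map<String, Object> items = new LinkedHashMap<>();
        items.put("type", "array");
        items.put("maxItems", 20); // mirrors Business Rule 1
        items.put("items", item);

        Map<String, Object> props = new LinkedHashMap<>();
        props.put("items", items);

        Map<String, Object> inputSchema = new LinkedHashMap<>();
        inputSchema.put("type", "object");
        inputSchema.put("properties", props);
        inputSchema.put("required", Arrays.asList("items"));

        Map<String, Object> tool = new LinkedHashMap<>();
        tool.put("name", "todo");
        tool.put("description",
                "Replace the full task list. At most one item may be in_progress.");
        tool.put("input_schema", inputSchema);
        return tool;
    }

    private static Map<String, Object> stringType() {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("type", "string");
        return m;
    }
}
```

The schema only guides the model; TodoManager.update still has to validate, because the single in_progress rule cannot be expressed with these schema keywords.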

5. Supervisor Logic: The Nag Reminder System

The architecture also includes a built-in supervisor mechanism that keeps the LLM from forgetting to maintain task state:

// In agentLoop, declared once before the loop starts
int roundsSinceTodo = 0;  // Counter: tracks rounds without todo update

// At the start of each round
boolean usedTodo = false; // Flag: indicates if todo was used this round

// During tool execution
if ("todo".equals(toolName)) {  // Null-safe comparison
    usedTodo = true;
}

// After each round: supervisor check
roundsSinceTodo = usedTodo ? 0 : roundsSinceTodo + 1;  // Reset or increment

if (roundsSinceTodo >= 3) {  // If no todo update for 3+ rounds
    Map<String, Object> nag = new HashMap<>();
    nag.put("type", "text");
    nag.put("text", "<reminder>Update your todos.</reminder>");
    toolResults.add(0, nag);  // Insert at front of results list
    
    System.out.println(">>> Supervisor reminder: Update todo list!");
    // Forced reminder: prevents LLM from forgetting state updates
}

Anti-Forgetfulness Mechanism

The Problem: LLMs, despite their impressive capabilities, can lose track of meta-tasks like state maintenance, especially when deeply engaged in complex problem-solving.

The Solution: A threshold-based reminder system that:

Tracks Usage: Monitors how many rounds have passed since the last todo update
Tolerates Short-Term Lapses: Allows up to 2 rounds without reminders (avoiding excessive nagging)
Intervenes at Threshold: Triggers a reminder after 3 rounds without state updates
Prioritizes Visibility: Inserts reminder at the front of results to ensure LLM sees it first

Structured Prompting

The reminder uses a special tag format (<reminder>...</reminder>) that:

Signals System Message: Helps the LLM distinguish the injected reminder from ordinary tool results
Maintains Context: Doesn't disrupt the normal tool result flow
Clear Instruction: Directly states the required action

Progressive Intervention Strategy

Rounds Without Todo	System Behavior
0-2	Silent monitoring
3+	Insert reminder message
Continuous neglect	Escalating reminders (future enhancement)
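
The table above can be sketched as a small class that owns the counter. NagSupervisor and endRound are hypothetical names. The first threshold (3 rounds) and its message come from the article's code; the second tier (ESCALATE_AFTER and its wording) is an invented illustration of the "future enhancement" row, not existing behavior.

```java
import java.util.Optional;

// Sketch only: NagSupervisor is a hypothetical wrapper around the counter logic.
final class NagSupervisor {
    static final int REMIND_AFTER = 3;   // threshold used in the article
    static final int ESCALATE_AFTER = 6; // hypothetical second tier

    private int roundsSinceTodo = 0;

    // Call once at the end of each agent round.
    // Returns the reminder text to inject, or empty during silent monitoring.
    Optional<String> endRound(boolean usedTodo) {
        roundsSinceTodo = usedTodo ? 0 : roundsSinceTodo + 1;
        if (roundsSinceTodo >= ESCALATE_AFTER) {
            return Optional.of(
                "<reminder>Todo list is stale. Update it before any other tool call.</reminder>");
        }
        if (roundsSinceTodo >= REMIND_AFTER) {
            return Optional.of("<reminder>Update your todos.</reminder>");
        }
        return Optional.empty();
    }
}
```

Keeping the counter in one object means the agent loop only has to pass in whether the todo tool was used and add the returned text, if any, to the results it sends back.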

6. Architectural Evolution: Comparing Agent Generations

The progression from basic tool-using agents to stateful, self-managing agents represents a significant leap in capability:

Dimension	AgentWithTools	AgentWithTodo
State Management	Stateless	Stateful (TodoManager)
Progress Tracking	Not supported	Full task progress management
Long-Term Memory	Not supported	Task list memory
Supervision	None	Automated nag reminders
Task Management	Tool-level	Project-level
Self-Reflection	None	Built-in state awareness
Context Retention	Limited to conversation	Persistent task state

Key Improvements

From Reactive to Proactive:

Previous agents reacted to immediate instructions
New agents maintain awareness of overall project state

From Atomic to Compound Tasks:

Previous agents handled individual tool calls
New agents coordinate multi-step workflows

From Amnesiac to Memory-Equipped:

Previous agents lost context between turns
New agents retain task state across the entire session

7. Practical Implementation Patterns

Pattern 1: Task Decomposition

When receiving a complex request, the agent should first break it down into manageable tasks:

User: "Build a REST API for user management"

Agent creates todos:
[ ] #1: Design database schema
[ ] #2: Implement user model
[ ] #3: Create authentication endpoints
[ ] #4: Add CRUD operations
[ ] #5: Write integration tests
[>] #6: Set up project structure (in progress)

Pattern 2: Progressive Updates

As work progresses, the agent updates task states:

After completing project setup, the schema, and the user model:
[x] #1: Design database schema
[x] #2: Implement user model
[>] #3: Create authentication endpoints (in progress)
[ ] #4: Add CRUD operations
[ ] #5: Write integration tests
[x] #6: Set up project structure

(3/6 completed)

Pattern 3: Error Recovery

When encountering obstacles, the agent can adjust tasks:

After discovering authentication complexity:
[x] #1: Design database schema
[x] #2: Implement user model
[>] #3: Research OAuth2 best practices (added)
[ ] #4: Create authentication endpoints
[ ] #5: Add CRUD operations
[ ] #6: Write integration tests
[x] #7: Set up project structure

(3/7 completed)
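
Because the todo tool replaces the whole list on every call, each of these patterns amounts to sending a new, complete list. The sketch below shows what a valid "next" list looks like for the common case of finishing one task and starting the next. TodoAdvance and advance are hypothetical names; in the article's design the LLM writes the new list itself, so this helper is only an illustration of the transition.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: TodoAdvance and advance(...) are hypothetical.
final class TodoAdvance {
    // Returns a new list in which every in_progress item becomes completed and
    // the first pending item becomes in_progress. The input is not modified.
    static List<Map<String, Object>> advance(List<Map<String, Object>> current) {
        List<Map<String, Object>> next = new ArrayList<>();
        boolean started = false;
        for (Map<String, Object> item : current) {
            Map<String, Object> copy = new LinkedHashMap<>(item);
            Object status = copy.get("status");
            if ("in_progress".equals(status)) {
                copy.put("status", "completed");
            } else if ("pending".equals(status) && !started) {
                copy.put("status", "in_progress");
                started = true;
            }
            next.add(copy);
        }
        return next;
    }
}
```

The result never contains more than one in_progress item, so it passes the single active task rule when handed to TodoManager.update.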

8. Benefits and Trade-offs

Advantages

Improved Task Completion:

Clear visibility into remaining work
Progress counts that make stalled work easy to spot
Reduced likelihood of abandoned tasks

Better Context Management:

Persistent state across conversation turns
Reduced need for context repetition
Clearer handoff between agent and human

Enhanced Debugging:

Audit trail of task progression
Easy identification of where things went wrong
Clear completion criteria

Human Collaboration:

Humans can see agent's plan and progress
Easy to intervene and reprioritize
Transparent workflow

Considerations

Overhead:

Additional tokens for state representation
Extra tool calls for state updates
Slightly increased latency

Complexity:

More code to maintain
Additional failure modes to handle
Learning curve for effective use

Potential for Gaming:

Agent might update todos without real progress
Need for verification mechanisms
Balance between trust and validation
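
One possible shape for such a verification mechanism is a heuristic that flags todo updates which mark tasks completed even though no other tool ran since the previous update. This is a sketch of one idea, not a mechanism from the original system: CompletionAudit, newlyCompleted, and looksUnearned are hypothetical names, and matching tasks by id assumes ids stay stable between updates, which does not hold if the agent renumbers the list.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch only: CompletionAudit is hypothetical and implements a simple heuristic.
final class CompletionAudit {
    // Ids that are completed in `next` but were not completed in `previous`.
    static List<String> newlyCompleted(List<Map<String, Object>> previous,
                                       List<Map<String, Object>> next) {
        Set<String> doneBefore = new HashSet<>();
        for (Map<String, Object> item : previous) {
            if ("completed".equals(item.get("status"))) {
                doneBefore.add(String.valueOf(item.get("id")));
            }
        }
        List<String> result = new ArrayList<>();
        for (Map<String, Object> item : next) {
            String id = String.valueOf(item.get("id"));
            if ("completed".equals(item.get("status")) && !doneBefore.contains(id)) {
                result.add(id);
            }
        }
        return result;
    }

    // Flags an update that completes tasks although no other tool
    // (bash, read_file, ...) was called since the last todo update.
    static boolean looksUnearned(List<String> newlyCompleted,
                                 int otherToolCallsSinceLastUpdate) {
        return !newlyCompleted.isEmpty() && otherToolCallsSinceLastUpdate == 0;
    }
}
```

A flagged update does not prove the agent skipped the work; it is only a cheap signal that a human or a stricter check might want to look at.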

9. Future Enhancements

The current implementation provides a solid foundation for several potential enhancements:

Hierarchical Task Management

Support for subtasks and nested todos
Parent-child task relationships
Rollup progress reporting

Dependency Tracking

Task prerequisites and blockers
Automatic dependency resolution
Critical path analysis

Time Estimation

Estimated vs. actual time tracking
Velocity metrics
Predictive completion dates

Integration with External Systems

Sync with project management tools (Jira, Trello)
Git commit linking
Calendar integration

Advanced Supervision

Machine learning-based reminder timing
Context-aware intervention thresholds
Adaptive nagging strategies

Conclusion: The Path to Truly Autonomous Agents

The TodoWrite system represents a crucial step toward truly autonomous AI agents. By providing agents with the ability to:

Plan: Break down complex goals into manageable tasks
Track: Maintain awareness of progress and completion
Reflect: Self-monitor and adjust course as needed
Communicate: Clearly express state to human collaborators

...we move beyond simple tool-using assistants toward genuine collaborative partners capable of handling complex, multi-step projects with less supervision.

The key insight is that state management is not optional for serious agent applications. Just as human professionals rely on task lists, calendars, and project management tools, AI agents require equivalent mechanisms to maintain coherence and effectiveness across extended engagements.

As the field of AI agents continues to evolve, systems like TodoWrite will become standard infrastructure—the invisible scaffolding that enables agents to tackle increasingly ambitious challenges while maintaining reliability, transparency, and trustworthiness.

The journey from stateless to stateful agents is not just a technical improvement; it's a fundamental shift in how we conceive of AI assistance. We're no longer building tools that respond to commands; we're creating partners that share our goals, track our progress, and work alongside us toward meaningful outcomes.