MiroThinker Technical Reports and Public Resources Compilation
This compilation organizes the technical reports and publicly available resources for MiroThinker, an open-source research agent system. To date, MiroMind has released three technical reports, covering MiroThinker 1.0, MiroThinker 1.7, and MiroFlow.
Official Resources
Project Websites
| Resource | Link | Description |
|---|---|---|
| Project Homepage | mirothinker.io | Official introduction, technical features, model version comparisons |
| Web Demo | dr.miromind.ai | Interactive web-based demo for direct testing |
| Company Page | miromind.ai | MiroMind team introduction and project ecosystem |
Open Source Code and Model Resources
GitHub Repositories
- Main Repository: MiroMindAI/MiroThinker
- MiroFlow: MiroMindAI/MiroFlow - A Deep Research agent framework, possibly serving as part of MiroThinker's harness system
Hugging Face Models and Datasets
| Model/Dataset Name | Parameters | Context | Tool Calls | Link |
|---|---|---|---|---|
| MiroThinker-1.7-mini | 30B | 256K | 300 | HF Link |
| MiroThinker-1.7 | 235B | 256K | 300 | HF Link |
| MiroThinker-v1.5-30B | 30B | 256K | 400 | HF Link |
| MiroThinker-v1.5-235B | 235B | 256K | 400 | HF Link |
| MiroThinker-v1.0 (8B/30B/72B) | Multiple | 256K | 600 | HF Collection |
| MiroVerse-v0.1 (Dataset) | 147K+ trajectories | - | - | HF Link |
Core Project Ecosystem
The MiroMind Open Deep Research (ODR) ecosystem consists of four interconnected components:
MiroMind ODR (Open Deep Research)
├── MiroThinker → Model (Tool-augmented reasoning LLM)
├── MiroFlow → Agent Framework (Reproducible multi-agent orchestration)
├── MiroVerse → Dataset (147K+ research trajectory samples)
└── MiroTrain → Training Infrastructure (RL and long-context training support)
Technical Innovations and Algorithm Overview
Core Innovation: Interactive Scaling
MiroThinker introduces Interactive Scaling as a "third dimension" of model performance, alongside model scale and context length. Under this view, the amount of tool interaction an agent performs is something to measure and optimize in its own right when assessing research agent capabilities.
Training Methodology
The training pipeline proceeds through the following stages:
- Mid-training Phase: Reinforces planning and tool interaction capabilities
- SFT (Supervised Fine-Tuning): Establishes base competencies
- DPO (Direct Preference Optimization): Aligns model outputs with human preferences
- RL (Reinforcement Learning): Further optimizes decision-making
A critical innovation is the time-sensitive sandbox training approach, which prevents "future information leakage" during training. The model has to reason through problems sequentially and cannot rely on information that would not have been available at a given reasoning step.
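The idea of blocking future information can be illustrated with a small sketch. It is not MiroMind's implementation: the `Document` schema, the `filter_by_cutoff` helper, and the example URLs are all hypothetical, and the sketch assumes every retrieved document carries a reliable publication date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Document:
    """A retrieved document with its publication date (hypothetical schema)."""
    url: str
    published: date
    text: str


def filter_by_cutoff(docs: list[Document], cutoff: date) -> list[Document]:
    """Keep only documents published on or before the cutoff date."""
    return [d for d in docs if d.published <= cutoff]


docs = [
    Document("https://example.com/a", date(2023, 5, 1), "early report"),
    Document("https://example.com/b", date(2024, 9, 15), "later follow-up"),
]

# A task dated 2024-01-01 should only be able to see the first document.
visible = filter_by_cutoff(docs, cutoff=date(2024, 1, 1))
print([d.url for d in visible])
```

Applying the filter at the tool boundary, rather than inside the model prompt, keeps the restriction independent of how the agent phrases its queries.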
Reasoning Mechanism
MiroThinker supports a complete hypothesis-driven research loop:
Hypothesis → Search → Verify → Revise
This closed-loop reasoning is supported by dual validation mechanisms:
- Local Validation: Verifies single-step logical consistency
- Global Validation: Ensures overall coherence and consistency across the entire reasoning chain
MiroThinker v1.0 supports up to 600 tool calls per task, which allows long and thorough research processes. The model table above lists lower limits for later releases: 400 for v1.5 and 300 for 1.7.
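The loop and its tool-call budget can be sketched as below. This is an illustrative skeleton under stated assumptions, not the MiroThinker agent: `research_loop` and the `propose`, `search`, and `verify` callables are hypothetical stand-ins, and only a single-step (local) check is modeled. Global validation across the whole chain is omitted.

```python
from typing import Callable, Optional


def research_loop(
    question: str,
    propose: Callable[[str, list[str]], str],
    search: Callable[[str], str],
    verify: Callable[[str, str], bool],
    max_tool_calls: int = 600,
) -> tuple[Optional[str], int]:
    """Run Hypothesis -> Search -> Verify -> Revise within a tool-call budget.

    Returns the accepted hypothesis (or None if the budget ran out)
    together with the number of tool calls used.
    """
    evidence: list[str] = []
    calls = 0
    while calls < max_tool_calls:
        hypothesis = propose(question, evidence)  # Hypothesis, or Revise on later rounds
        result = search(hypothesis)               # Search: counts as one tool call
        calls += 1
        evidence.append(result)
        if verify(hypothesis, result):            # Verify: local, single-step check
            return hypothesis, calls
    return None, calls


# Stub components for demonstration only.
def propose(question: str, evidence: list[str]) -> str:
    return f"guess-{len(evidence)}"


def search(hypothesis: str) -> str:
    return "supports" if hypothesis == "guess-2" else "contradicts"


def verify(hypothesis: str, result: str) -> bool:
    return result == "supports"


answer, used = research_loop("example question", propose, search, verify)
print(answer, used)
```

With these stubs the third hypothesis is accepted after three tool calls. Lowering `max_tool_calls` below three makes the loop return `None`, which is the behavior a budget cap is meant to enforce.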
Tool Integration
The framework integrates multiple external tools:
- Web Search: Serper API integration
- Web Scraping: Jina AI for content extraction
- Code Execution: E2B sandboxed execution environment
- Document Parsing: Multi-format document processing
- Multimodal Processing: Image and video analysis capabilities
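One common way to expose several tools to an agent is a name-to-callable registry. The sketch below assumes that design for illustration only. `ToolRegistry` and the tool names are hypothetical, the tools are stubs, and no real Serper, Jina AI, or E2B API is called.

```python
from typing import Callable

ToolFn = Callable[[str], str]


class ToolRegistry:
    """Maps tool names to callables so an agent can dispatch by name."""

    def __init__(self) -> None:
        self._tools: dict[str, ToolFn] = {}

    def register(self, name: str, fn: ToolFn) -> None:
        """Add a tool, refusing to overwrite an existing name."""
        if name in self._tools:
            raise ValueError(f"tool already registered: {name}")
        self._tools[name] = fn

    def call(self, name: str, argument: str) -> str:
        """Invoke a registered tool, raising KeyError for unknown names."""
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](argument)


registry = ToolRegistry()
# Stub tools: real integrations would call the external services instead.
registry.register("web_search", lambda query: f"[stub search results for {query!r}]")
registry.register("scrape", lambda url: f"[stub page text from {url}]")

print(registry.call("web_search", "MiroThinker"))
```

Keeping dispatch behind a single `call` method gives one place to count tool calls or apply per-tool limits.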
Official Documentation
Documentation Resources
| Document Type | Location | Content |
|---|---|---|
| README | GitHub/README.md | Quick start, configuration, benchmark results |
| Tool Documentation | libs/miroflow-tools/README.md | MCP tool configuration, API key instructions |
| Deployment Guide | GitHub Wiki / docs/ directory | SGLang/vLLM deployment, quantization, Docker support |
Technical Reports
| Paper Title | arXiv ID | Release Date | Core Contribution |
|---|---|---|---|
| MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling | 2511.11793 | November 2025 | Introduces Interactive Scaling, v1.0 benchmark results |
| MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification | 2603.15726 | March 2026 | Introduces verification mechanisms, 1.7 and H1 version technical details |
| MiroFlow: Towards High-Performance and Robust Open-Source Agent Framework | 2602.22808 | February 2026 | Agent framework design, high concurrency and reproducible evaluation support |
Official Blog
The official blog at miromind.ai/blog provides updates, though the technical depth varies across posts.
Chinese Community Third-Party Analyses
Several Chinese technology media outlets have published analyses of MiroThinker.
Significance and Impact
MiroThinker's main contribution to open-source research agents is to treat interactive tool usage as a first-class scaling dimension alongside model size and context length. The MiroMind team's position is that carefully orchestrated tool interaction can improve research capabilities without proportional increases in model parameters.
The release of the MiroVerse dataset (147K+ research trajectories) gives the community training data that may accelerate further research in this domain. The modular architecture separates the core model (MiroThinker), orchestration framework (MiroFlow), training infrastructure (MiroTrain), and dataset (MiroVerse), so researchers can work on individual components while staying compatible with the broader ecosystem.