Towards General Agentic Intelligence via Environment Scaling

In one sentence: The paper shows how to grow smarter AI agents by building many realistic, simulated “playgrounds” (environments) with diverse tools, then training the agent in two stages so it learns general tool-using skills first and specializes later, which dramatically improves its ability to call functions and complete tasks reliably.

Why this problem matters

Most AI assistants can chat, but struggle when they must use real tools — like booking APIs, spreadsheets, or calendars — across many different apps and websites. The authors argue that an agent’s tool skills depend on the variety of environments it practices in, just like a chef improves by cooking many cuisines, not just one. So the core question becomes: How do we scale up such environments so the agent can truly generalize?

The big idea: environment scaling

The paper proposes a two-part recipe. First, automatically create many realistic, fully simulated environments that behave like real apps — each with its own database and toolset — so agents can practice safely and repeatedly. Second, use a two-stage training plan that first teaches broad, general skills and then fine-tunes for specific industries or domains. This builds both flexibility and depth.

How they build the environments

  • Collect lots of APIs and cluster them into “domains,” like travel, retail, or healthcare, each becoming one environment with its own data layout.
  • Turn each tool (API) into executable code that actually runs against a database, not just a mock — so actions are verifiable.
  • Generate tasks by sampling tool sequences and arguments, initializing the database accordingly, and wrapping the whole sequence in a natural-language user intent (like “Find a flight under $500 and email the itinerary”). This yields realistic, automatically testable tasks.
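The steps above can be sketched in a few lines. In this minimal, illustrative version (the tool names, database layout, and task format are my assumptions, not the paper's actual code), an environment is just an in-memory database plus executable tool functions, and a task is a sampled tool sequence wrapped in a user intent:

```python
# Illustrative "travel" environment: a tiny in-memory database
# plus executable tools that read and write it.
db = {
    "flights": [
        {"id": "F1", "dest": "Paris", "price": 450},
        {"id": "F2", "dest": "Paris", "price": 620},
    ],
    "emails_sent": [],
}

def search_flights(dest, max_price):
    """Tool: a real query against the database, not a mock."""
    return [f for f in db["flights"] if f["dest"] == dest and f["price"] <= max_price]

def email_itinerary(flight_id):
    """Tool: writes a verifiable side effect into the database."""
    db["emails_sent"].append(flight_id)
    return {"status": "sent", "flight": flight_id}

TOOLS = {"search_flights": search_flights, "email_itinerary": email_itinerary}

def sample_task():
    """Task generation: a tool sequence with arguments, plus a user intent."""
    plan = [("search_flights", {"dest": "Paris", "max_price": 500}),
            ("email_itinerary", {"flight_id": "F1"})]
    intent = "Find a flight to Paris under $500 and email the itinerary."
    return intent, plan

# Executing the reference plan mutates the database, which is what
# makes the task's outcome checkable later.
intent, plan = sample_task()
for name, args in plan:
    result = TOOLS[name](**args)
```

Because every action lands in `db`, a grader can inspect the final state rather than trusting the agent's transcript.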

In plain terms: it’s like setting up dozens of mini apps with real data and real buttons, then asking the agent to use them to solve everyday requests.

The two-stage training plan

  • Stage 1: General training across many domains to teach the “grammar” of tool use — how to plan, call tools, read results, and adapt.
  • Stage 2: Specialization within target verticals (e.g., finance or travel) to refine accuracy, vocabulary, and workflows for those areas.
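In data terms, the two stages amount to different sampling mixtures over the same pool of environment trajectories. Here is a hedged sketch of that idea (the domain names, pool sizes, and mixture weights are made-up placeholders, not values from the paper):

```python
import random

random.seed(0)

# Hypothetical pool of training trajectories, keyed by domain.
trajectories = {
    "travel":  ["travel_traj_%d" % i for i in range(100)],
    "retail":  ["retail_traj_%d" % i for i in range(100)],
    "finance": ["finance_traj_%d" % i for i in range(100)],
}

def make_batch(mixture, size=8):
    """Sample a training batch according to per-domain weights."""
    domains = list(mixture)
    weights = [mixture[d] for d in domains]
    picks = random.choices(domains, weights=weights, k=size)
    return [random.choice(trajectories[d]) for d in picks]

# Stage 1: uniform mixture across domains -> general tool-use "grammar".
stage1_batch = make_batch({"travel": 1, "retail": 1, "finance": 1})

# Stage 2: skew heavily toward the target vertical -> specialization.
stage2_batch = make_batch({"travel": 0.05, "retail": 0.05, "finance": 0.9})
```

The point of the sketch is only that specialization reuses the same machinery with a narrower mixture, rather than requiring a new pipeline.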

This is similar to learning to drive in many cities (Stage 1), then training as a taxi driver in Paris specifically (Stage 2).

Why this works better

  • Practice diversity: The more varied the environments, the better the agent can generalize to unseen apps or new tool behaviors.
  • Full verifiability: Because tools operate on real databases, it’s easy to check if the agent did the right thing — no guesswork.
  • Safer iteration: Fully simulated worlds let researchers generate as much training experience as needed without hitting rate limits or privacy issues in live systems.
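The verifiability point above can be made concrete: because every tool call mutates a database, checking success reduces to a state comparison, with no human judge in the loop. A minimal sketch of such a checker, assuming the task generator emits a goal state alongside the user intent (an assumption of mine, though it matches the verifiable setup described here):

```python
def check_task(final_db, goal):
    """Reward = 1.0 if every goal row is present in the final database state."""
    for table, expected_rows in goal.items():
        for row in expected_rows:
            if row not in final_db.get(table, []):
                return 0.0
    return 1.0

# The generator knows which rows the correct tool sequence would create,
# so it can emit them as the goal for automatic grading.
final_db = {"emails_sent": [{"flight": "F1"}], "bookings": []}
goal = {"emails_sent": [{"flight": "F1"}]}
reward = check_task(final_db, goal)
```

A binary, state-based reward like this is exactly what makes large-scale reinforcement learning in simulated environments practical.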

What they measure

The team evaluates on standard agent benchmarks focused on function calling and multi-step tasks — τ-bench, τ²-bench, and ACEBench — and shows significant improvements in tool-use quality after applying environment scaling with the two-stage training plan. In short: the agent gets better at using tools across domains.

How this connects to Tongyi DeepResearch

Tongyi DeepResearch is Alibaba’s open agentic model designed for long-horizon research, and it embraces this full-stack approach: synthetic data generation, continual pretraining on agent interactions, and on-policy reinforcement learning — paired with agent inference modes (like ReAct and a heavy, iterative research mode). The environment scaling paper explains the “why” and “how” of training the tool-using brain behind such agents.

A mental model for non-experts

  • Think of the agent as a smart intern learning to get stuff done across many apps.
  • The lab builds many realistic practice offices (environments), each with its own tools and files.
  • The intern first learns how to use any office and any tool (Stage 1), then spends time in a chosen field like accounting or marketing (Stage 2).
  • Because the offices are simulated but realistic, supervisors can check every step and give exact feedback, so the intern learns fast.

What this means for real products

  • Better reliability: Agents become less brittle when a tool behaves slightly differently.
  • Faster onboarding: Adapting agents to new industries becomes a targeted second-stage step instead of starting from scratch.
  • Safer deployment: Verifiable, simulated training reduces risky trial-and-error in production systems.

Practical tips if building such agents

  • Start by defining tool schemas and making them executable against a test database so actions are checkable.
  • Generate tasks by composing tool sequences from a domain graph to ensure variety.
  • Train broadly first, then narrow to domain specialties; keep evaluating with realistic multi-step tasks.
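The second tip, composing tool sequences from a domain graph, can be sketched as a random walk over tools, where an edge A → B means B can plausibly consume A's output. The graph below is purely illustrative (these tool names and edges are my assumptions):

```python
import random

random.seed(1)

# Hypothetical domain graph: an edge A -> B means tool B can follow tool A.
DOMAIN_GRAPH = {
    "search_flights": ["book_flight", "email_itinerary"],
    "book_flight": ["email_itinerary", "add_to_calendar"],
    "email_itinerary": [],
    "add_to_calendar": [],
}

def sample_tool_sequence(start, max_len=4):
    """A random walk over the graph yields a coherent multi-step task skeleton."""
    seq = [start]
    while len(seq) < max_len:
        successors = DOMAIN_GRAPH.get(seq[-1], [])
        if not successors:
            break
        seq.append(random.choice(successors))
    return seq

seq = sample_tool_sequence("search_flights")
```

Walking the graph rather than sampling tools independently guarantees each generated task is a plausible workflow, which keeps task variety high without producing nonsense sequences.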

Key takeaways

  • General tool-using intelligence comes from rich practice across many environments, not just more model parameters.
  • Fully simulated, verifiable environments are the most scalable way to provide that practice.
  • A two-stage training approach — general then specialized — yields robust, transferable skills that hold up on real agent benchmarks.

Towards General Agentic Intelligence via Environment Scaling was originally published in Data Science in Your Pocket on Medium, where people are continuing the conversation by highlighting and responding to this story.
