Towards General Agentic Intelligence via Environment Scaling

In one sentence: The paper shows how to grow smarter AI agents by building many realistic, simulated “playgrounds” (environments) with diverse tools, then training the agent in two stages so it learns general tool-using skills first and specializes later, which dramatically improves its ability to call functions and complete tasks reliably.

Why this problem matters

Most AI assistants can chat, but struggle when they must use real tools — like booking APIs, spreadsheets, or calendars — across many different apps and websites. The authors argue that an agent’s tool skills depend on the variety of environments it practices in, just like a chef improves by cooking many cuisines, not just one. So the core question becomes: How do we scale up such environments so the agent can truly generalize?

The big idea: environment scaling

The paper proposes a two-part recipe. First, automatically create many realistic, fully simulated environments that behave like real apps — each with its own database and toolset — so agents can practice safely and repeatedly. Second, use a two-stage training plan that first teaches broad, general skills and then fine-tunes for specific industries or domains. This builds both flexibility and depth.

How they build the environments

  • Collect lots of APIs and cluster them into “domains,” like travel, retail, or healthcare, each becoming one environment with its own data layout.
  • Turn each tool (API) into executable code that actually runs against a database, not just a mock — so actions are verifiable.
  • Generate tasks by sampling tool sequences and arguments, initializing the database accordingly, and wrapping the whole sequence in a natural-language user intent (like “Find a flight under $500 and email the itinerary”). This yields realistic, automatically testable tasks.
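The steps above can be sketched in a few lines. In this minimal, illustrative version (the tool names, database layout, and task format are my assumptions, not the paper's actual code), an environment is just an in-memory database plus executable tool functions, and a task is a sampled tool sequence wrapped in a user intent:

```python
# Illustrative "travel" environment: a tiny in-memory database
# plus executable tools that read and write it.
db = {
    "flights": [
        {"id": "F1", "dest": "Paris", "price": 450},
        {"id": "F2", "dest": "Paris", "price": 620},
    ],
    "emails_sent": [],
}

def search_flights(dest, max_price):
    """Tool: a real query against the database, not a mock."""
    return [f for f in db["flights"] if f["dest"] == dest and f["price"] <= max_price]

def email_itinerary(flight_id):
    """Tool: writes a verifiable side effect into the database."""
    db["emails_sent"].append(flight_id)
    return {"status": "sent", "flight": flight_id}

TOOLS = {"search_flights": search_flights, "email_itinerary": email_itinerary}

def sample_task():
    """Task generation: a tool sequence with arguments, plus a user intent."""
    plan = [("search_flights", {"dest": "Paris", "max_price": 500}),
            ("email_itinerary", {"flight_id": "F1"})]
    intent = "Find a flight to Paris under $500 and email the itinerary."
    return intent, plan

# Executing the reference plan mutates the database, which is what
# makes the task's outcome checkable later.
intent, plan = sample_task()
for name, args in plan:
    result = TOOLS[name](**args)
```

Because every action lands in `db`, a grader can inspect the final state rather than trusting the agent's transcript.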

In plain terms: it’s like setting up dozens of mini apps with real data and real buttons, then asking the agent to use them to solve everyday requests.

The two-stage training plan

  • Stage 1: General training across many domains to teach the “grammar” of tool use — how to plan, call tools, read results, and adapt.
  • Stage 2: Specialization within target verticals (e.g., finance or travel) to refine accuracy, vocabulary, and workflows for those areas.
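In data terms, the two stages amount to different sampling mixtures over the same pool of environment trajectories. Here is a hedged sketch of that idea (the domain names, pool sizes, and mixture weights are made-up placeholders, not values from the paper):

```python
import random

random.seed(0)

# Hypothetical pool of training trajectories, keyed by domain.
trajectories = {
    "travel":  ["travel_traj_%d" % i for i in range(100)],
    "retail":  ["retail_traj_%d" % i for i in range(100)],
    "finance": ["finance_traj_%d" % i for i in range(100)],
}

def make_batch(mixture, size=8):
    """Sample a training batch according to per-domain weights."""
    domains = list(mixture)
    weights = [mixture[d] for d in domains]
    picks = random.choices(domains, weights=weights, k=size)
    return [random.choice(trajectories[d]) for d in picks]

# Stage 1: uniform mixture across domains -> general tool-use "grammar".
stage1_batch = make_batch({"travel": 1, "retail": 1, "finance": 1})

# Stage 2: skew heavily toward the target vertical -> specialization.
stage2_batch = make_batch({"travel": 0.05, "retail": 0.05, "finance": 0.9})
```

The point of the sketch is only that specialization reuses the same machinery with a narrower mixture, rather than requiring a new pipeline.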

This is similar to learning to drive in many cities (Stage 1), then training as a taxi driver in Paris specifically (Stage 2).

Why this works better

  • Practice diversity: The more varied the environments, the better the agent can generalize to unseen apps or new tool behaviors.
  • Full verifiability: Because tools operate on real databases, it’s easy to check if the agent did the right thing — no guesswork.
  • Safer iteration: Fully simulated worlds let researchers generate as much training experience as needed without hitting rate limits or privacy issues in live systems.
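The verifiability point above can be made concrete: because every tool call mutates a database, checking success reduces to a state comparison, with no human judge in the loop. A minimal sketch of such a checker, assuming the task generator emits a goal state alongside the user intent (an assumption of mine, though it matches the verifiable setup described here):

```python
def check_task(final_db, goal):
    """Reward = 1.0 if every goal row is present in the final database state."""
    for table, expected_rows in goal.items():
        for row in expected_rows:
            if row not in final_db.get(table, []):
                return 0.0
    return 1.0

# The generator knows which rows the correct tool sequence would create,
# so it can emit them as the goal for automatic grading.
final_db = {"emails_sent": [{"flight": "F1"}], "bookings": []}
goal = {"emails_sent": [{"flight": "F1"}]}
reward = check_task(final_db, goal)
```

A binary, state-based reward like this is exactly what makes large-scale reinforcement learning in simulated environments practical.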

What they measure

The team evaluates on standard agent benchmarks focused on function calling and multi-step tasks — τ-bench, τ²-bench, and ACEBench — and shows significant improvements in tool-use quality after applying environment scaling with the two-stage training plan. In short: the agent gets better at using tools across domains.

How this connects to Tongyi DeepResearch

Tongyi DeepResearch is Alibaba’s open agentic model designed for long-horizon research, and it embraces this full-stack approach: synthetic data generation, continual pretraining on agent interactions, and on-policy reinforcement learning — paired with agent inference modes (like ReAct and a heavy, iterative research mode). The environment scaling paper explains the “why” and “how” of training the tool-using brain behind such agents.

A mental model for non-experts

  • Think of the agent as a smart intern learning to get stuff done across many apps.
  • The lab builds many realistic practice offices (environments), each with its own tools and files.
  • The intern first learns how to use any office and any tool (Stage 1), then spends time in a chosen field like accounting or marketing (Stage 2).
  • Because the offices are simulated but realistic, supervisors can check every step and give exact feedback, so the intern learns fast.

What this means for real products

  • Better reliability: Agents become less brittle when a tool behaves slightly differently.
  • Faster onboarding: Adapting agents to new industries becomes a targeted second-stage step instead of starting from scratch.
  • Safer deployment: Verifiable, simulated training reduces risky trial-and-error in production systems.

Practical tips if building such agents

  • Start by defining tool schemas and making them executable against a test database so actions are checkable.
  • Generate tasks by composing tool sequences from a domain graph to ensure variety.
  • Train broadly first, then narrow to domain specialties; keep evaluating with realistic multi-step tasks.
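The second tip, composing tool sequences from a domain graph, can be sketched as a random walk over tools, where an edge A → B means B can plausibly consume A's output. The graph below is purely illustrative (these tool names and edges are my assumptions):

```python
import random

random.seed(1)

# Hypothetical domain graph: an edge A -> B means tool B can follow tool A.
DOMAIN_GRAPH = {
    "search_flights": ["book_flight", "email_itinerary"],
    "book_flight": ["email_itinerary", "add_to_calendar"],
    "email_itinerary": [],
    "add_to_calendar": [],
}

def sample_tool_sequence(start, max_len=4):
    """A random walk over the graph yields a coherent multi-step task skeleton."""
    seq = [start]
    while len(seq) < max_len:
        successors = DOMAIN_GRAPH.get(seq[-1], [])
        if not successors:
            break
        seq.append(random.choice(successors))
    return seq

seq = sample_tool_sequence("search_flights")
```

Walking the graph rather than sampling tools independently guarantees each generated task is a plausible workflow, which keeps task variety high without producing nonsense sequences.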

Key takeaways

  • General tool-using intelligence comes from rich practice across many environments, not just more model parameters.
  • Fully simulated, verifiable environments are the most scalable way to provide that practice.
  • A two-stage training approach — general then specialized — yields robust, transferable skills that hold up on real agent benchmarks.

Towards General Agentic Intelligence via Environment Scaling was originally published in Data Science in Your Pocket on Medium, where people are continuing the conversation by highlighting and responding to this story.
