AI Agent by Google
So Google has finally jumped on the multi AI agent bandwagon as well, releasing an important framework to fuel research & development called “AI Co-Scientist”, which tackles R&D tasks with the help of a team of AI agents powered by Gemini 2.0.
What is Google’s AI Co-Scientist?

The AI Co-Scientist is a multi-agent AI system built on Gemini 2.0, designed to assist scientists in generating novel hypotheses, research proposals, and experimental protocols. It acts as a virtual collaborator, accelerating scientific discovery by leveraging advanced AI capabilities. Below is an explanation of how it works and its special features:
How AI Co-Scientist Works
Multi-Agent Architecture

The system is composed of specialized agents that work together to generate, evaluate, and refine hypotheses. These agents include (a rough code sketch of how these roles fit together follows the list):
Generation Agent: Creates initial hypotheses and research proposals.
Reflection Agent: Reviews and critiques hypotheses for correctness, novelty, and feasibility.
Ranking Agent: Compares and ranks hypotheses using a tournament-based system.
Proximity Agent: Identifies similarities between hypotheses to avoid redundancy.
Evolution Agent: Refines and improves existing hypotheses.
Meta-Review Agent: Synthesizes feedback from other agents and generates a comprehensive research overview.
Supervisor Agent: Manages the workflow, allocates resources, and coordinates the agents.
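Google has not released the implementation, so the sketch below is purely illustrative: it shows, under assumed class and function names, how such specialized roles could be wired together, with the actual Gemini 2.0 calls stubbed out.

```python
# Hypothetical sketch of the agent roles described above; none of these
# class or function names come from Google's implementation.
from dataclasses import dataclass, field

@dataclass
class Hypothesis:
    text: str
    elo: float = 1200.0                     # assumed starting Elo rating
    critiques: list = field(default_factory=list)

class GenerationAgent:
    def propose(self, research_goal: str, n: int = 3) -> list[Hypothesis]:
        # The real system would call Gemini 2.0 here; we stub it out.
        return [Hypothesis(f"Hypothesis {i} for: {research_goal}") for i in range(n)]

class ReflectionAgent:
    def critique(self, h: Hypothesis) -> Hypothesis:
        h.critiques.append("Check novelty, correctness, and feasibility.")
        return h

class RankingAgent:
    def rank(self, hypotheses: list[Hypothesis]) -> list[Hypothesis]:
        # Placeholder for the tournament-based Elo ranking described later.
        return sorted(hypotheses, key=lambda h: h.elo, reverse=True)

class Supervisor:
    """Coordinates the worker agents for one round of the pipeline."""
    def __init__(self):
        self.generator = GenerationAgent()
        self.reflector = ReflectionAgent()
        self.ranker = RankingAgent()

    def run_round(self, research_goal: str) -> list[Hypothesis]:
        hypotheses = self.generator.propose(research_goal)
        hypotheses = [self.reflector.critique(h) for h in hypotheses]
        return self.ranker.rank(hypotheses)

print(Supervisor().run_round("find new antifibrotic drug targets")[0].text)
```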
Workflow
- The scientist starts by providing a research goal to the AI co-scientist. This goal is passed through a configuration step that sets up the initial framework for the research.
- The Supervisor agent then takes over, assigning tasks to different workers and maintaining the research overview with detailed hypotheses.
- The context memory stores information about the research context and previous findings, which the agents consult to inform their work. The workers carry out the tasks assigned by the Supervisor agent, using the research overview for guidance.
- The scientist can provide additional feedback at any point, which is fed back into the system to guide the AI and keep the research aligned with their expectations. This feedback loop is crucial for steering the research in the right direction (a minimal sketch of the loop follows this list).
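A toy version of that loop might look like this; the names (ContextMemory, configure, run_co_scientist) are assumptions for illustration, not part of Google's system.

```python
# Toy version of the workflow: goal -> configuration -> supervisor dispatch ->
# workers -> scientist feedback. All names here are illustrative assumptions.

class ContextMemory:
    """Stores the evolving research context and previous findings."""
    def __init__(self):
        self.findings: list[str] = []

    def add(self, note: str) -> None:
        self.findings.append(note)

def configure(research_goal: str) -> dict:
    # Parse the natural-language goal into a simple research plan config.
    return {"goal": research_goal, "max_rounds": 3}

def run_co_scientist(research_goal: str, get_feedback) -> list[str]:
    config = configure(research_goal)
    memory = ContextMemory()
    overview: list[str] = []
    for round_id in range(config["max_rounds"]):
        # The supervisor assigns work; workers produce or refine hypotheses.
        hypothesis = f"Round {round_id}: hypothesis for {config['goal']}"
        memory.add(hypothesis)
        overview.append(hypothesis)
        # The scientist can steer the process at any point.
        feedback = get_feedback(overview)
        if feedback:
            memory.add(f"Scientist feedback: {feedback}")
    return overview

# Example usage with a no-op feedback callback:
print(run_co_scientist("mechanisms of liver fibrosis", lambda overview: ""))
```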
Scientific Reasoning and Hypothesis Generation
The system uses a generate, debate, and evolve approach inspired by the scientific method.
It employs self-play-based scientific debates and tournament-based ranking to iteratively improve hypotheses.
The system scales test-time compute to enhance reasoning and hypothesis quality over time.
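As a rough illustration of the generate, debate, and evolve idea, here is a toy loop where scaling test-time compute simply means more candidates and more debate rounds; the debate itself is stubbed with a random pick rather than a model-judged argument.

```python
# Illustrative "generate, debate, evolve" loop (not Google's actual code).
# Scaling test-time compute here just means more candidates and more rounds.
import random

def generate(goal: str, n: int) -> list[str]:
    return [f"{goal}: candidate {i}" for i in range(n)]

def debate(a: str, b: str) -> str:
    # Stand-in for a self-play scientific debate judged by the model;
    # here the winner is picked at random.
    return a if random.random() < 0.5 else b

def evolve(hypothesis: str) -> str:
    return hypothesis + " (refined)"

def co_scientist_round(goal: str, n_candidates: int, n_rounds: int) -> str:
    pool = generate(goal, n_candidates)
    for _ in range(n_rounds):
        random.shuffle(pool)
        # Pairwise debates: winners survive and are refined for the next round.
        winners = [debate(pool[i], pool[i + 1]) for i in range(0, len(pool) - 1, 2)]
        pool = [evolve(w) for w in winners] or pool
    return pool[0]

print(co_scientist_round("drug repurposing for AML", n_candidates=8, n_rounds=2))
```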
Interaction with Scientists
Scientists provide a research goal in natural language, which the system parses into a research plan configuration.
Scientists can interact with the system through a natural language interface, providing feedback, suggesting ideas, and refining hypotheses.
The system generates research overviews and detailed hypotheses tailored to the scientist’s goals.
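The exact structure of that research plan configuration has not been published; a hypothetical version might carry fields like these.

```python
# Hypothetical shape of a parsed research plan configuration; the actual
# fields used by the AI Co-Scientist have not been published.
from dataclasses import dataclass, field

@dataclass
class ResearchPlanConfig:
    goal: str                                        # the scientist's natural-language goal
    preferences: list = field(default_factory=list)  # e.g. "focus on in-vitro assays"
    constraints: list = field(default_factory=list)  # e.g. budget or ethics limits
    prior_ideas: list = field(default_factory=list)  # scientist-supplied starting hypotheses

config = ResearchPlanConfig(
    goal="Identify new targets for antimicrobial resistance",
    preferences=["prioritize mechanisms testable in a wet lab"],
)
print(config)
```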
Tool Use
The AI Co-Scientist integrates external tools like web search and specialized AI models (e.g., AlphaFold) to enhance hypothesis quality and grounding.
It can access databases, literature, and other resources to validate and refine its outputs.
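A grounding step could look roughly like the sketch below; web_search and alphafold_predict are placeholder stubs standing in for real search APIs and structure-prediction services, not actual client calls.

```python
# Rough sketch of grounding a hypothesis with external tools. Both helpers
# below are placeholder stubs, not real search or AlphaFold client calls.
from typing import Optional

def web_search(query: str) -> list[str]:
    # Placeholder: would call a search API and return literature snippets.
    return [f"snippet about {query}"]

def alphafold_predict(protein: str) -> str:
    # Placeholder: would query a structure-prediction model such as AlphaFold.
    return f"predicted structure for {protein}"

def ground_hypothesis(hypothesis: str, protein: Optional[str] = None) -> dict:
    evidence = web_search(hypothesis)
    structure = alphafold_predict(protein) if protein else None
    return {"hypothesis": hypothesis, "evidence": evidence, "structure": structure}

print(ground_hypothesis("candidate target reduces liver fibrosis", protein="example-protein"))
```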
Self-Improvement
The system uses an Elo auto-evaluation metric to assess hypothesis quality.
It iteratively refines hypotheses through recursive self-critique and feedback loops, improving outputs over time.
What is the Elo Rating System?
- The Elo rating system is a method for calculating the relative skill levels of players in competitive games like chess.
- Players gain or lose points based on the outcome of matches against other players. If a higher-rated player wins, they gain fewer points, but if a lower-rated player wins, they gain more points.
How Elo Auto-Evaluation Works
- Hypotheses are generated and ranked in a tournament system with pairwise comparisons.
- The Ranking Agent simulates debates to determine the better hypothesis based on criteria like novelty and correctness.
- Winners gain Elo points, losers lose points; lower-rated hypotheses beating higher-rated ones result in larger changes.
- Over time, the Elo ratings evolve, and higher-rated hypotheses are prioritized for refinement and validation (a minimal Elo update is sketched below).
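Here is a minimal, standard Elo update for a single pairwise comparison; the K-factor and starting rating the AI Co-Scientist actually uses are not public, so the values below are assumptions.

```python
# Standard Elo update for a single pairwise comparison. The K-factor and
# starting rating actually used by the AI Co-Scientist are not public,
# so the values here are assumptions.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update_elo(winner: float, loser: float, k: float = 32.0) -> tuple:
    e_win = expected_score(winner, loser)
    # The winner gains little when it was already expected to win,
    # and a lot when it upsets a higher-rated opponent.
    winner += k * (1.0 - e_win)
    loser -= k * (1.0 - e_win)
    return winner, loser

# Example: a lower-rated hypothesis (1200) beats a higher-rated one (1400):
print(update_elo(winner=1200.0, loser=1400.0))  # winner gains about 24 points
```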
Metrics and Performance

The reported numbers compare as follows (average expert ranking: lower is better):

| Model | Novelty | Impact | Avg. expert ranking |
| --- | --- | --- | --- |
| AI co-scientist | 3.64 | 3.09 | 2.36 |
| Gemini 2.0 Flash Thinking | 3.55 | 2.82 | 2.73 |
| Gemini 2.0 Pro Experimental | 3.27 | 3.00 | 2.45 |
| OpenAI o1 | 3.09 | 3.09 | 2.45 |

The AI co-scientist has the highest novelty rating and a solid impact rating, and its average expert ranking of 2.36 is the lowest, indicating that experts preferred its outputs most often.
The AI co-scientist stands out because it scores highest on novelty, holds a strong impact score, and is preferred by experts more often than the alternatives. This suggests it is more reliable and produces more valuable insights than the other models.
Conclusion
The AI Co-Scientist represents a significant step toward AI-assisted scientific discovery. Its ability to generate novel hypotheses, collaborate with scientists, and validate findings through experiments demonstrates its potential to accelerate research across various fields. By combining advanced AI reasoning with human expertise, it offers a powerful tool for tackling complex scientific challenges.