Close Menu
    What's Hot

    New Linux Flaw, PAN-OS Exploit, AI-Powered Attacks, OAuth Phishing and More

    The Pentagon is pushing for AI on the battlefield. This top military leader is urging caution

    Ex-F.B.I. Officials Form New Group to Help Agents Grapple With Patel’s Changes

    Facebook X (Twitter) Instagram
    Trending
    • New Linux Flaw, PAN-OS Exploit, AI-Powered Attacks, OAuth Phishing and More
    • The Pentagon is pushing for AI on the battlefield. This top military leader is urging caution
    • Ex-F.B.I. Officials Form New Group to Help Agents Grapple With Patel’s Changes
    • Best Sleep Trackers of 2026: Oura, Whoop, and Eight Sleep
    • Transcript of an interview with Andrew Bailey
    • World Cup: Former USA defender Matt Besler says ‘expectations are higher than ever’ for the host nation – but can they handle the pressure? | Football News
    • Strava declares war on scrapers ahead of IPO
    • Are Texans Ready for Talarico’s Kind of Christianity?
    interluknewsinterluknews
    • Home
    • Business
      • Corporate News
      • Industry Insights
      • Startups & Entrepreneurship
      • Technology & Innovation
    • Economy
      • Economic Policy
      • Financial Analysis
      • Inflation & Interest Rates
      • Trade & Markets
    • Global
      • Conflicts & Security
      • Diplomacy
      • Global Trends
      • International Affairs
    • Lifestyle
      • Fashion
      • Food & Dining
      • Personal Development
      • Travel
    • Opinion
      • Columns
      • Editorials
      • Expert Opinions
      • Reader Voices
    • More
      • Politics
        • Elections
        • Government & Policy
        • International Relations
        • Political Analysis
      • Sports
        • Cricket
        • Football / Soccer
        • International Sports
        • Local Sports
      • Technology
        • Artificial Intelligence
        • Cybersecurity
        • Gadgets & Reviews
        • Tech News
      • South Africa News
    Facebook X (Twitter) Instagram
    interluknewsinterluknews
    Startups & Entrepreneurship

    How Sakana trained a 7B model to orchestrate GPT-5, Claude Sonnet 4 and Gemini 2.5 Pro

    adminBy adminMay 7, 2026No Comments7 Mins Read
    Share Facebook Twitter Pinterest Copy Link Telegram LinkedIn Tumblr Email
    How Sakana trained a 7B model to orchestrate GPT-5, Claude Sonnet 4 and Gemini 2.5 Pro
    Share
    Facebook Twitter LinkedIn Pinterest Email

    Every LangChain pipeline your team hardcodes starts breaking the moment the query distribution shifts — and it always shifts. That bottleneck is what Sakana AI set out to eliminate.

    Researchers at Sakana AI have introduced the “RL Conductor,” a small language model trained via reinforcement learning to automatically orchestrate a diverse pool of worker LLMs. Conductor dynamically analyzes inputs, distributes labor among workers, and coordinates among agents.

    This automated coordination achieves state-of-the-art results on difficult reasoning and coding benchmarks, outperforming individual frontier models like GPT-5 and Claude Sonnet 4 as well as expensive human-designed multi-agent pipelines. It achieves this performance at a fraction of the cost and with fewer API calls than competitors. RL Conductor is the backbone of Fugu, Sakana AI’s commercial multi-agent orchestration service.

    The limitations of manual agentic frameworks

    Large language models have strong latent capabilities. But tapping these capabilities to their fullest is a great challenge. Extracting this level of performance relies heavily on manually designed agentic workflows, which serve as critical components in commercial AI products. 

    However, these frameworks fall short because they are inherently rigid and constrained. In comments to VentureBeat, Yujin Tang, co-author of the paper, explained the exact breaking point of current systems: “While using frameworks with hard-coded pipelines like LangChain and Mixture-of-Agents can work well for specific use cases … In production, an inherent bottleneck arises when targeting domains with large user bases with very heterogeneous demands.” 

    Tang noted that achieving “real-world generalization in such heterogeneous applications inherently necessitates going beyond human-hardcoded designs.”

    Another bottleneck for building robust agentic systems is that no single model is optimal for all tasks. Different models are fine-tuned to specialize in distinct domains. One model might excel at scientific reasoning, while another is superior at code generation, mathematical logic, or high-level planning. 

    Because models have these varying characteristics and complementary skills, manually predicting and hard-coding the ideal combination of models for every query is practically impossible. An optimal agentic framework should be able to analyze a problem and delegate subtasks to the most suitable expert in the pool.

    Conducting an orchestra of agents

    The RL Conductor is designed to overcome the limitations of rigid, human-designed frameworks. As the name implies, it conducts an orchestra of agents by dividing challenging problems, delegating targeted subtasks, and designing communication topologies for a set of worker LLMs. 

    Instead of relying on fixed code or static routing, the Conductor orchestrates these models by generating a customized workflow. For each step in the workflow, the model generates a natural language instruction for a specific aspect of the task, assigns an agent to carry it out, and defines an “access list” that dictates which past subtasks and responses from other agents are included in that agent’s context.

    By defining everything in natural language, the Conductor builds flexible workflows tailored to each input. It can construct simple sequential chains, parallel tree structures, or even recursive loops depending on the problem’s demands. 

    Screenshot 2026-05-07 at 6.07.40 PM

    RL Conductor (source: Sakana AI)

    Importantly, the model learns these strategies not by human design but through reinforcement learning (RL) and reward maximization. During training, the model is given a task, a pool of workers, and a reward signal based on whether its answer and output format are correct.

    Through a simple trial-and-error RL algorithm, the model organically discovers which combinations of instructions and communication structures yield the highest reward. As a result, it automatically adopts advanced orchestration strategies such as targeted prompt engineering, iterative refinement, and meta-prompt optimization. 

    The model learns to dynamically adjust its strategies and leverage the distinct strengths of its worker agents without any human developer having to hard-code the process.

    Conductor in action

    To test RL Conductor in action, the researchers fine-tuned the 7-billion parameter Qwen2.5-7B using the framework. During training, the Conductor was tasked with designing agentic workflows of up to five steps. It was given access to a worker pool containing seven different models: three closed-source giants (Gemini 2.5 Pro, Claude-Sonnet-4, and GPT-5) and four open-source models (including DeepSeek-R1-Distill-Qwen-32B, Gemma3-27B, and Qwen3-32B).

    The team evaluated the Conductor across a variety of highly challenging benchmarks, comparing it against individual frontier models acting alone, self-reflection agents prompted iteratively to improve their own answers, and state-of-the-art multi-agent routing frameworks like MASRouter, Mixture-of-Agents (MoA), RouterDC, and Smoothie. The small 7B Conductor set new benchmarks across the board. It achieved an average score of 77.27% across all tasks, hitting 93.3% on the AIME25 math benchmark, 87.5% on GPQA-Diamond, and 83.93% on LiveCodeBench, according to the researchers.

    Remarkably, it achieved these marks while remaining highly efficient. While baseline models like MoA burned through 11,203 tokens per question, the Conductor used an average of just 1,820 tokens, taking an average of only three steps per workflow.

    rl-conductor-performance

    RL Conductor outperforms other baselines on key industry benchmarks (source: arXiv)

    A closer look at the experimental details shows exactly why the framework is so effective. The Conductor automatically learned to measure task difficulty. For simple factual recall questions, it often solved the problem in a single step or used a basic two-agent setup. However, for complex coding problems, it built extensive workflows involving up to four agents with dedicated planning, implementation, and verification phases.

    The Conductor also learned that frontier models have different strengths. To achieve record scores on coding benchmarks, the Conductor frequently assigned Gemini 2.5 Pro and Claude Sonnet 4 to act as high-level planners, and only brought in GPT-5 at the very end to write the final optimized code. In a particularly clever display of adaptability, the Conductor would sometimes completely abdicate its own role, handing the entire planning process over to Gemini 2.5 Pro and allowing it to dictate the subtasks for the rest of the pool.

    Beyond math and coding benchmarks, Sakana AI is already putting the underlying architecture to work in front-office utility. “We have been using our Fugu models based on the Conductor technology internally for various practical enterprise applications: software development, deep research, strategy development, and even visual tasks like slide generation,” Tang said.

    Bringing orchestration to the enterprise: Sakana Fugu

    While the 7B model described in the research paper was an exploratory blueprint and is not publicly available, Sakana AI has productized the Conductor framework into its flagship commercial AI product, Sakana Fugu. Now in its beta phase, Fugu serves as a multi-agent orchestration system accessible through a standard OpenAI-compatible API.

    Tang noted Fugu targets “the large market of industries where AI adoption has yet to bring large productivity gains due to the generalization limitations of current hard-coded pipelines, such as finance and defense.”

    For enterprise developers, this allows seamless integration into existing applications without the headache of managing multiple API keys or manually routing tasks across different vendors. Behind the API interface, Fugu automates complex collaboration topologies and role assignments across a pool of models. To support varying business needs, Sakana released two variants: Fugu Mini, built for low-latency operations, and Fugu Ultra, designed for maximum performance on demanding workloads.

    Addressing governance concerns around autonomous agents spinning up invisible workflows, Tang pointed out that the interpretability risks are functionally similar to the hidden reasoning traces of current top-tier closed APIs, and the system is managed with established guardrails to minimize hallucinations. 

    For enterprise architects weighing when to deploy RL-orchestration versus traditional routing, the decision often comes down to engineering resources. “We believe the absolute sweet spot comes whenever users and their teams feel they are spending a disproportionate amount of time guiding their underlying agents,” Tang said. However, he cautioned that the framework isn’t necessary for everything, noting that “it’s hard to beat the economic proposition of a local model running directly on the user’s machine for simple queries.”

    As the diversity of specialized open- and closed-source AI models continues to grow, static hardcoded pipelines will inevitably become obsolete. Looking ahead, this dynamic orchestration will likely extend beyond text and code environments. “There is indeed a large potential to fill this gap with cross-modal Conductor frameworks becoming the foundation for more autonomous, self-coordinating physical AI systems,” Tang said.

    Claude Gemini GPT5 model orchestrate Pro Sakana Sonnet trained
    Follow on Google News Follow on Flipboard
    Share. Facebook Twitter Pinterest LinkedIn Tumblr Email Copy Link
    Previous ArticleOpenAI launches new voice intelligence features in its API
    Next Article U.S. and Iran Trade Strikes as Tensions in Strait of Hormuz Rise
    admin
    • Website

    Related Posts

    Critical WP Maps Pro Flaw Actively Exploited to Create Admin Accounts

    June 1, 2026

    How You Can Use Tax “Stacking” to Pay Less in Taxes and Keep More Rental Income in Your Pocket

    June 1, 2026

    ‘This is fine’ artist KC Green reaches agreement with AI startup Artisan

    May 31, 2026
    Leave A Reply Cancel Reply

    Demo
    Latest Posts

    New Linux Flaw, PAN-OS Exploit, AI-Powered Attacks, OAuth Phishing and More

    The Pentagon is pushing for AI on the battlefield. This top military leader is urging caution

    Ex-F.B.I. Officials Form New Group to Help Agents Grapple With Patel’s Changes

    Best Sleep Trackers of 2026: Oura, Whoop, and Eight Sleep

    Latest Posts

    Subscribe to News

    Get the latest sports news from NewsSite about world, sports and politics.

    Advertisement
    Demo

    We are a digital news platform delivering timely, accurate, and insightful coverage of politics, global affairs, business, economy, sports, and more. Our mission is to keep readers informed with reliable news, clear analysis, and stories that truly matter.
    We're social. Connect with us:

    Facebook X (Twitter) Instagram Pinterest YouTube

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    Type above and press Enter to search. Press Esc to cancel.

    Powered by
    ...
    ►
    Necessary cookies enable essential site features like secure log-ins and consent preference adjustments. They do not store personal data.
    None
    ►
    Functional cookies support features like content sharing on social media, collecting feedback, and enabling third-party tools.
    None
    ►
    Analytical cookies track visitor interactions, providing insights on metrics like visitor count, bounce rate, and traffic sources.
    None
    ►
    Advertisement cookies deliver personalized ads based on your previous visits and analyze the effectiveness of ad campaigns.
    None
    ►
    Unclassified cookies are cookies that we are in the process of classifying, together with the providers of individual cookies.
    None
    Powered by