    Databricks tested a stronger model against its multi-step agent on hybrid queries. The stronger model still lost by 21%.

    By admin, April 14, 2026 (6-minute read)

    Data teams building AI agents keep running into the same failure mode. Questions that require joining structured data with unstructured content (sales figures alongside customer reviews, or citation counts alongside academic papers) break single-turn RAG systems.

    New research from Databricks puts a number on that failure gap. The company’s AI research team tested a multi-step agentic approach against state-of-the-art single-turn RAG baselines across nine enterprise knowledge tasks and reported gains of 20% or more on Stanford’s STaRK benchmark suite, along with consistent improvement across Databricks’ own KARLBench evaluation framework, according to the research. Databricks argues the performance gap between single-turn RAG and multi-step agents on hybrid data tasks is an architectural problem, not a model quality problem.

    The work builds on Databricks’ earlier instructed retriever research, which showed retrieval improvements on unstructured data using metadata-aware queries. This latest research adds structured data sources, such as relational tables and SQL warehouses, into the same reasoning loop, addressing the class of questions enterprises most commonly fail to answer with current agent architectures.

    “RAG works, but it doesn’t scale,” Michael Bendersky, research director at Databricks, told VentureBeat. “If you want to make your agent even better, and you want to understand why you have declining sales, now you have to help the agent see the tables and look at the sales data. Your RAG pipeline will become incompetent at that task.”

    Single-turn retrieval cannot encode structural constraints

    The core finding is that standard RAG systems fail when a query mixes a precise structured filter with an open-ended semantic search. 

    Consider a question like “Which of our products have had declining sales over the past three months, and what potentially related issues are brought up in customer reviews on various seller sites?” The sales data lives in a warehouse. The review sentiment lives in unstructured documents across seller sites. A single-turn RAG system cannot split that query, route each half to the right data source and combine the results.
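
The split described above can be illustrated with a toy sketch. It is not Databricks' implementation: the table layout, the sample rows and the function names (`declining_products`, `reviews_for`, `answer`) are all hypothetical, and a plain text filter stands in for real semantic search. The point is only that the structured half and the unstructured half are answered by different tools and then combined.

```python
import sqlite3

# Hypothetical toy data: monthly unit sales per product (structured side).
SALES = [
    ("kettle", "2026-01", 120), ("kettle", "2026-02", 95), ("kettle", "2026-03", 70),
    ("toaster", "2026-01", 80), ("toaster", "2026-02", 85), ("toaster", "2026-03", 90),
]

# Hypothetical review snippets (unstructured side).
REVIEWS = [
    ("kettle", "The lid cracked after two weeks."),
    ("kettle", "Started leaking from the base."),
    ("toaster", "Browns evenly, very happy."),
]


def declining_products(conn):
    """Structured step: products whose sales fell in every consecutive month."""
    rows = conn.execute(
        "SELECT product, units FROM sales ORDER BY product, month"
    ).fetchall()
    series = {}
    for product, units in rows:
        series.setdefault(product, []).append(units)
    return [p for p, u in series.items() if all(a > b for a, b in zip(u, u[1:]))]


def reviews_for(product):
    """Unstructured step: a stand-in for semantic search over review text."""
    return [text for p, text in REVIEWS if p == product]


def answer():
    """Route each half of the question to its own source, then combine."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (product TEXT, month TEXT, units INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", SALES)
    result = {p: reviews_for(p) for p in declining_products(conn)}
    conn.close()
    return result
```

Calling `answer()` on this toy data returns only the product with falling sales, paired with its review snippets. A single embedding lookup has no natural way to express the "fell every month" condition, which is the structural constraint the article refers to.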

    To confirm this is an architecture problem rather than a model quality problem, Databricks reran published STaRK baselines using a current state-of-the-art foundation model. The stronger model still lost to the multi-step agent by 21% on the academic domain and 38% on the biomedical domain, according to the research. 

    STaRK is a benchmark published by Stanford researchers covering three semi-structured retrieval domains: Amazon product data, the Microsoft Academic Graph and a biomedical knowledge base. 

    How the Supervisor Agent handles what RAG cannot

    Databricks built the Supervisor Agent as the production implementation of this research approach, and its architecture illustrates why the gains are consistent across task types. The approach includes three core steps:

    Parallel tool decomposition. Rather than issuing one broad query and hoping the results cover both structured and unstructured needs, the agent fires SQL and vector search calls simultaneously, then analyzes the combined results before deciding what to do next. That parallel step is what allows it to handle queries that cross data type boundaries without requiring the data to be normalized first.

    Self-correction. When an initial retrieval attempt hits a dead end, the agent detects the failure, reformulates the query and tries a different path. On a STaRK benchmark task that requires finding a paper by an author with exactly 115 prior publications on a specific topic, the agent first queries both SQL and vector search in parallel. When the two result sets show no overlap, it adapts and issues a SQL JOIN across both constraints, then calls the vector search system to verify the result before returning the answer.

    Declarative configuration. The agent is not tuned to any specific dataset or task. Connecting it to a new data source means writing a plain-language description of what that source contains and what kinds of questions it should answer. No custom code is required.
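
The first two steps can be sketched as a small control loop. This is an illustrative toy under stated assumptions, not the Supervisor Agent's code: the corpus, the publication counts and the names `sql_tool`, `vector_tool` and `agent` are invented for the example, and word overlap stands in for embedding similarity. It loosely follows the STaRK example above: run both tools in parallel, notice that the results do not overlap, retry with both constraints joined, then verify.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy corpus: (paper_id, author, topic_text).
PAPERS = [
    (1, "A. Rao", "graph neural networks for retrieval"),
    (2, "B. Chen", "protein folding with transformers"),
    (3, "B. Chen", "graph retrieval benchmarks"),
]
# Hypothetical structured table: author -> prior publication count.
PUB_COUNTS = {"A. Rao": 40, "B. Chen": 115}


def sql_tool(pub_count):
    """Stand-in for a SQL call: papers whose author has exactly pub_count publications."""
    return {pid for pid, author, _ in PAPERS if PUB_COUNTS[author] == pub_count}


def vector_tool(query, top_k=1):
    """Stand-in for vector search: rank papers by word overlap with the query."""
    words = set(query.lower().split())
    ranked = sorted(PAPERS, key=lambda p: -len(words & set(p[2].split())))
    return {pid for pid, _, _ in ranked[:top_k]}


def agent(pub_count, query):
    # Step 1: fire the structured and unstructured tools in parallel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        sql_future = pool.submit(sql_tool, pub_count)
        vec_future = pool.submit(vector_tool, query)
        sql_hits, vec_hits = sql_future.result(), vec_future.result()

    overlap = sql_hits & vec_hits
    if overlap:
        return sorted(overlap), "parallel"

    # Step 2: no overlap, so reformulate as one query joining both constraints.
    words = set(query.lower().split())
    joined = {
        pid for pid, _, text in PAPERS
        if pid in sql_hits and words & set(text.split())
    }

    # Step 3: verify the joined result against a wider vector search.
    verified = joined & vector_tool(query, top_k=2)
    return sorted(verified), "corrected"
```

With this toy data, `agent(115, "graph neural retrieval")` takes the corrected path, because the top vector hit belongs to a different author than the one the SQL filter selects. `agent(40, "graph neural retrieval")` succeeds on the first parallel pass.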

    “The agent can do things like decomposing the question into a SQL query and a search query out of the box,” Bendersky said. “It can combine the results of SQL and RAG, reason about those results, make follow-up queries and then reason about whether the final answer was actually found.”

    It’s not just about hybrid retrieval

    The distinction Databricks draws isn’t about retrieval technique; it’s about architecture.

    “We almost don’t see it as a hybrid retrieval where you combine embeddings and search results, or embeddings and tables,” Bendersky said. “We see this more as an agent that has access to multiple tools.”

    The practical consequence of that framing is that adding a new data source means connecting it to the agent and writing a description of what it contains. The agent handles routing and orchestration without additional code. 
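
As a rough illustration of what "configuration rather than code" can look like, the sketch below registers sources as plain-language descriptions and renders them into a tool catalog that an agent's prompt could include. The `Source` class, the `render_tool_catalog` function and the source names are hypothetical; the article does not describe Databricks' actual configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """Hypothetical declarative entry: no retrieval logic, only a description."""
    name: str
    kind: str          # e.g. "sql" or "vector_search"
    description: str   # plain-language summary of contents and intended questions


def render_tool_catalog(sources):
    """Turn the registered descriptions into text an agent prompt could carry."""
    lines = ["You can call these tools:"]
    for s in sources:
        lines.append(f"- {s.name} ({s.kind}): {s.description}")
    return "\n".join(lines)


sources = [
    Source("sales_warehouse", "sql", "Monthly unit sales per product and region."),
]

# Adding a source is one more description, not a new ingestion pipeline.
sources.append(
    Source("seller_reviews", "vector_search", "Customer review text from seller sites.")
)

catalog = render_tool_catalog(sources)
```

In this sketch the only thing that changes between deployments is the list of descriptions, which mirrors the article's claim that instructions and tool descriptions were the only per-task differences.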

    Custom RAG pipelines require data to be converted into a format the retrieval system can read, typically text chunks with embeddings. SQL tables have to be flattened, and JSON has to be normalized. Every new data source added to the pipeline means more conversion work. Databricks’ research argues that as enterprise data grows to include more source types, that burden makes custom pipelines increasingly impractical compared to an agent that queries each source in its native format.

    “Just bring the agent to the data,” Bendersky said. “You basically give the agent more sources, and it will learn to use them pretty well.”

    What this means for enterprises

    For data engineers evaluating whether to build custom RAG pipelines or adopt a declarative agent framework, the research offers a clear direction: if the task involves questions that span structured and unstructured data, building custom retrieval is the harder path. The research found that across all tested tasks, the only things that differed between deployments were instructions and tool descriptions. The agent handled the rest.

    The practical limits are real but manageable. The approach works well with five to ten data sources. Adding too many at once, without curating which sources are complementary rather than contradictory, makes the agent slower and less reliable. Bendersky recommends scaling incrementally and verifying results at each step rather than connecting all available data upfront.
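
One way to read the "scale incrementally and verify" advice is as a simple gate: add one source, re-run an evaluation set, and keep the source only if the score holds. The sketch below is an assumption about how a team might do that, not a procedure from the research; `onboard_incrementally`, `toy_evaluate` and the score values are made up for illustration.

```python
def onboard_incrementally(candidates, evaluate, tolerance=0.0):
    """Add sources one at a time; keep each only if the score does not drop."""
    active = []
    best = evaluate(active)
    for source in candidates:
        score = evaluate(active + [source])
        if score >= best - tolerance:
            active.append(source)
            best = max(best, score)
    return active, best


def toy_evaluate(sources):
    """Hypothetical scorer: a real one would run the agent on held-out questions."""
    gains = {"sales_warehouse": 0.3, "seller_reviews": 0.2, "stale_wiki": -0.1}
    return round(0.4 + sum(gains[s] for s in sources), 2)


kept, score = onboard_incrementally(
    ["sales_warehouse", "stale_wiki", "seller_reviews"], toy_evaluate
)
```

Here the contradictory `stale_wiki` source lowers the toy score and is left out, while the two complementary sources are kept.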

    Data accuracy is a prerequisite. The agent can query across mismatched formats, such as JSON review feeds alongside SQL sales tables, without requiring normalization. It cannot fix source data that is factually wrong. Adding a plain-language description of each data source at ingestion time helps the agent route queries correctly from the start.

    The research positions this as an early step in a longer trajectory. As enterprise AI workloads mature, agents will be expected to reason across dozens of source types, including dashboards, code repositories and external data feeds. The research argues the declarative approach is what makes that scaling tractable, because adding a new source stays a configuration problem rather than an engineering one.

    “This is kind of like a ladder,” Bendersky said. “The agent will slowly get more and more information and then slowly improve overall.” 
