Close Menu
    What's Hot

    Gwyneth Paltrow’s puzzling dairy substitute—arugula—takes off on social media like a rocket

    Obscure Group With Trump Ties Plans to Route Funds to His Allies for Legal Fights

    Four Immigrant Workers Burned to Death in Italy

    Facebook X (Twitter) Instagram
    Trending
    • Gwyneth Paltrow’s puzzling dairy substitute—arugula—takes off on social media like a rocket
    • Obscure Group With Trump Ties Plans to Route Funds to His Allies for Legal Fights
    • Four Immigrant Workers Burned to Death in Italy
    • Trump Agitates Brazil’s Politics, Again
    • Iranian Drone and Missile Barrage Against Kuwait Inflames Regional Tensions
    • Meta mercifully spun out VR fitness game Supernatural instead of just killing it
    • Enclayve Is a Drab Black Box for Your Private Group Chats
    • Senior official from Germany’s AfD meets top Kremlin associates
    interluknewsinterluknews
    • Home
    • Business
      • Corporate News
      • Industry Insights
      • Startups & Entrepreneurship
      • Technology & Innovation
    • Economy
      • Economic Policy
      • Financial Analysis
      • Inflation & Interest Rates
      • Trade & Markets
    • Global
      • Conflicts & Security
      • Diplomacy
      • Global Trends
      • International Affairs
    • Lifestyle
      • Fashion
      • Food & Dining
      • Personal Development
      • Travel
    • Opinion
      • Columns
      • Editorials
      • Expert Opinions
      • Reader Voices
    • More
      • Politics
        • Elections
        • Government & Policy
        • International Relations
        • Political Analysis
      • Sports
        • Cricket
        • Football / Soccer
        • International Sports
        • Local Sports
      • Technology
        • Artificial Intelligence
        • Cybersecurity
        • Gadgets & Reviews
        • Tech News
      • South Africa News
    Facebook X (Twitter) Instagram
    interluknewsinterluknews
    Artificial Intelligence

    Teaching AI agents to ask better questions by playing “Battleship” | MIT News

    adminBy adminJune 3, 2026No Comments7 Mins Read
    Share Facebook Twitter Pinterest Copy Link Telegram LinkedIn Tumblr Email
    Teaching AI agents to ask better questions by playing “Battleship” | MIT News
    Share
    Facebook Twitter LinkedIn Pinterest Email

    In 2026, the hype for artificial intelligence agents is louder than ever before. These semi-autonomous programs can “think” and execute well-defined tasks in areas like customer service and software development, typically using language models (LMs). But fields like medical diagnosis and scientific discovery require them to inquire about a vast range of solutions in uncertain environments, which LMs struggle with.

    Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and Harvard University’s School of Engineering and Applied Sciences (SEAS) peered deeper into LMs to understand their main issues in high-stakes settings. Their test: “Battleship,” a classic guessing game that’s helped cognitive scientists study how humans seek information. 

    CSAIL and SEAS scholars added a twist by reframing the game around asking and answering natural language questions. In their “Collaborative Battleship” game, one participant is a “captain” who inquires about where hidden ships are, while their teammate plays the “spotter” by responding to those questions in real-time.

    The researchers first had over 40 humans play the game together, collecting their questions and yes-no answers to build the “BattleshipQA” dataset. These results were a helpful point of comparison when the team tested state-of-the-art LMs (like GPT-5) and smaller models (like Llama 4 Scout) on their game. Without training the models beforehand, they found that top LMs can “beat” humans at “Battleship” — that is, complete the game in fewer turns — but smaller systems are far less rational.

    The chief issue was that many models are simply not adept at coming up with useful questions. To get LMs to inquire in ways that reveal more information about hidden ships, the researchers gave each model a Monte Carlo inference strategy, which carefully measures the likelihood of different options being correct with each response. The result: AI models that can beat regular players at “Battleship,” regardless of scale.

    Perhaps the most striking results were Llama 4 Scout’s gains. As a relatively small LM, it only beat humans 8 percent of the time. But with refinements to its inference strategy, the model reached a “Battleship” win rate of 82 percent versus humans. This careful and efficient style of asking questions also enabled the model to outpace a frontier model (GPT-5), while operating at around 1 percent of its cost.

    On top of this improvement, the researchers shrank the gap between humans and LMs in answering questions. While GPT-5 was a reliable spotter that helped models finish games faster, smaller systems had a bad habit of giving the wrong answers about where ships were hidden. The models saw an accuracy boost of 15 percent on average when they began converting questions into code that explicitly tells them how to verify their answers (for example, having the model run a quick search of an area when asked if a ship was there). 

    “Today’s language models are primarily optimized to answer complex queries, but it’s less clear whether they learn to ask good questions for themselves,” says MIT PhD student and CSAIL researcher Gabriel Grand SM ’23, who is a lead author on a paper about the work. “Our work shows that asking informative questions depends on the ability to predict and simulate the world. We find that when we give agents access to a ‘world model,’ they ask better questions and make discoveries more efficiently.”

    A sea change for LMs

    The team’s first focus was getting LMs to ask better questions. By implementing Monte Carlo inference strategies, the LMs reason about potential guesses as individual particles. The ones that appear more valid with each answer from the spotter would be weighted more heavily, sort of like game balls that inflate or deflate each turn. With this more calculated, adaptive approach, the captain could make inquiries that extracted considerably more info from the spotter.

    The scientists then turned to the widely used programming language Python to help out AI spotters. Each question the captain asked was automatically converted into an encoded command. For example, a question like, “Is there a ship in column one that spans two rows?” turns into instructions for the spotter LM to search the area in question and assess how wide the digital game piece is. By giving the model clear directions in a language it understands particularly well, each system gave correct answers considerably more often. The lightweight system GPT-4o-mini saw a nearly 30 percent performance bump, for instance, and even the large model Claude 4 Opus jumped about eight points.

    “The field has seen a lot of success from ‘auto-formalization’ strategies, in which LMs generate code to verify their solutions,” says senior author Jacob Andreas, an MIT electrical engineering and computer science associate professor and CSAIL principal investigator. “What I find most exciting about this work is that it opens up the possibility of using these techniques to generate better solutions in the first place, by improving LMs’ exploration and information gathering capabilities. We are excited to scale this work up from scientific domains to applications like coding and mathematical problem-solving.”

    Let’s play something else

    But how would this approach fare in other board games? The team tested their newly equipped LMs at “Guess Who?”, where large and small models skillfully whittled down 100 options to correctly guess which hidden character had been chosen. Llama 4 Scout was successful 30 percent of the time, but after Grand and his colleagues’ tweaks, it completed the task on over 72 percent of its runs. Meanwhile, GPT-4o leapt from 62 percent to 90 percent. GPT-5 was the spotter in each game to ensure questions were answered as accurately as possible.

    While LMs have made promising progress in both games, there’s room for improvement. For instance, the models still struggle to answer complex questions, compared to humans. OpenAI researcher, recent Harvard graduate, and coauthor Valerio Pepe adds that “GPT-5 can beat your average ‘Battleship’ player, and gets a hair better with our methods. However, expert players are still hard to beat for all models, unlike in chess, where even top players don’t succeed against AI systems.”

    The researchers’ findings show that AI agents have untapped potential in “needle-in-a-haystack” discovery — navigating a massive space of options to find a rare solution to scientific challenges. While improved information-seeking skills would make them excellent research assistants with, say, identifying a compound’s molecular structure, the researchers caution that “Collaborative Battleship” is a somewhat simple test bed. They’d like to test LMs in more complex settings, where the systems have to consider far more options.

    Grand also plans to have humans and AI models collaborate to study whether they work better together. The models might also benefit from a bit of fine-tuning on game simulations, and with more computing power, LMs would have more advanced inference capabilities to predict how a game will evolve. 

    “As AI systems become more agentic, the hardest problems turn out to be social ones: tracking common ground, resolving misunderstandings, and adapting to different partners over time,” says Robert Hawkins, assistant professor of linguistics at Stanford University, who wasn’t involved in the paper. “This work elegantly captures these phenomena in a controlled collaborative setting, and makes a compelling case that the real bottleneck for AI agents isn’t just the calculation of optimal questions, but the pragmatic reasoning needed to make the most of their answers.”

    Grand and Pepe wrote the paper with two CSAIL principal investigators: MIT Associate Professor Jacob Andreas and MIT Professor Joshua Tenenbaum. Their work was supported, in part, by the MIT Siegel Family Quest for Intelligence, the MIT-IBM Watson AI Lab, the FinTechAI@CSAIL initiative, a Sloan Research Fellowship, Intel, the Air Force Office of Scientific Research, the Defense Advanced Research Projects Agency, the Office of Naval Research, and the National Science Foundation. They showcased their paper as an oral presentation at the International Conference on Learning Representations (ICLR) in April.

    agents Battleship MIT news playing Questions teaching
    Follow on Google News Follow on Flipboard
    Share. Facebook Twitter Pinterest LinkedIn Tumblr Email Copy Link
    Previous ArticleWhatsApp, Slack Notifications Could Hijack Google Gemini on Android
    Next Article Elon Musk and America’s Far Right Stoke Anger Over Murder of UK Teen
    admin
    • Website

    Related Posts

    Does Israel have nukes? ‘Most of the world assesses they do,’ says Rubio | Nuclear Weapons News

    June 3, 2026

    Players Championship 20: Ross Smith beats William O’Connor in Milton Keynes final to claim 10th career title | Darts News

    June 3, 2026

    Andoni Iraola set to sign Liverpool contract on Thursday after reaching agreement to become Arne Slot’s successor | Football News

    June 3, 2026
    Leave A Reply Cancel Reply

    Demo
    Latest Posts

    Gwyneth Paltrow’s puzzling dairy substitute—arugula—takes off on social media like a rocket

    Obscure Group With Trump Ties Plans to Route Funds to His Allies for Legal Fights

    Four Immigrant Workers Burned to Death in Italy

    Trump Agitates Brazil’s Politics, Again

    Latest Posts

    Subscribe to News

    Get the latest sports news from NewsSite about world, sports and politics.

    Advertisement
    Demo

    We are a digital news platform delivering timely, accurate, and insightful coverage of politics, global affairs, business, economy, sports, and more. Our mission is to keep readers informed with reliable news, clear analysis, and stories that truly matter.
    We're social. Connect with us:

    Facebook X (Twitter) Instagram Pinterest YouTube

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    Type above and press Enter to search. Press Esc to cancel.

    Powered by
    ...
    ►
    Necessary cookies enable essential site features like secure log-ins and consent preference adjustments. They do not store personal data.
    None
    ►
    Functional cookies support features like content sharing on social media, collecting feedback, and enabling third-party tools.
    None
    ►
    Analytical cookies track visitor interactions, providing insights on metrics like visitor count, bounce rate, and traffic sources.
    None
    ►
    Advertisement cookies deliver personalized ads based on your previous visits and analyze the effectiveness of ad campaigns.
    None
    ►
    Unclassified cookies are cookies that we are in the process of classifying, together with the providers of individual cookies.
    None
    Powered by