
    Alibaba’s small, open source Qwen3.5-9B beats OpenAI’s gpt-oss-120B and can run on standard laptops

    By admin, March 2, 2026

    Despite political turmoil in the U.S. AI sector, AI advances in China are continuing apace.

    Earlier today, the Qwen Team at e-commerce giant Alibaba, the AI research group behind a growing family of open source Qwen language and multimodal models, unveiled its newest batch, the Qwen3.5 Small Model Series, which consists of:

    • Qwen3.5-0.8B & 2B: Two models, both optimized for “tiny” and “fast” performance, intended for prototyping and deployment on edge devices where battery life is paramount.

    • Qwen3.5-4B: A strong multimodal base for lightweight agents, natively supporting a 262,144 token context window.

    • Qwen3.5-9B: A compact reasoning model that outperforms the 13.5x larger gpt-oss-120B, the open source model from U.S. rival OpenAI, on key third-party benchmarks including multilingual knowledge and graduate-level reasoning.

    To put this into perspective, these models are among the smallest general-purpose models recently shipped by any lab around the world. They are closer in scale to MIT offshoot LiquidAI’s LFM2 series, whose models also have several hundred million to several billion parameters, than to the flagship models from OpenAI, Anthropic, and Google’s Gemini series, which reportedly use an estimated trillion parameters (the internal settings a model learns during training).

    The weights for the models are available right now globally under Apache 2.0 licenses — perfect for enterprise and commercial use, including customization as needed — on Hugging Face and ModelScope.

    The technology: hybrid efficiency and native multimodality

    The technical foundation of the Qwen3.5 small series is a departure from standard Transformer architectures. Alibaba has moved toward an Efficient Hybrid Architecture that combines Gated Delta Networks (a form of linear attention) with sparse Mixture-of-Experts (MoE).
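The sparse Mixture-of-Experts idea mentioned above can be illustrated with a minimal top-k routing sketch. This is a generic illustration, not Alibaba's routing code: the function names, the toy experts, and the router logits (which a real model would compute from the input with a learned layer) are all assumptions made for the example.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized so the selected experts sum to 1.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    return [(i, probs[i] / mass) for i in top]

def moe_forward(x, experts, router_logits, k=2):
    """Run only the selected experts and blend their outputs."""
    out = [0.0] * len(x)
    for idx, weight in route_top_k(router_logits, k):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out

# Hypothetical experts: each is a simple elementwise function.
experts = [
    lambda v: [2.0 * a for a in v],
    lambda v: [a + 1.0 for a in v],
    lambda v: [-a for a in v],
    lambda v: [a * a for a in v],
]
router_logits = [0.1, 2.0, -1.0, 2.0]  # assumed router scores for one token
selected = route_top_k(router_logits, k=2)
blended = moe_forward([1.0, 2.0], experts, router_logits, k=2)
```

Only two of the four toy experts run for this input, which is the sense in which a sparse model touches a fraction of its parameters per token.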

    This hybrid approach addresses the “memory wall” that typically limits small models; by using Gated Delta Networks, the models achieve higher throughput and significantly lower latency during inference.
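To make the linear-attention idea more concrete, here is a simplified, single-head sketch of a gated delta rule update as it is commonly described for Gated Delta Networks. It is not Alibaba's implementation: it omits normalization, learned gates, and the chunked parallel form used in training, and the names `gated_delta_step`, `alpha`, and `beta` are chosen for this example.

```python
def gated_delta_step(state, q, k, v, alpha, beta):
    """One recurrent step of a gated delta rule (simplified, single head).

    state: d_v x d_k matrix (list of rows) acting as a fixed-size memory.
    alpha: decay gate in [0, 1]; beta: write strength in [0, 1].
    Returns (new_state, output).
    """
    new_state = []
    for row, v_i in zip(state, v):
        # What the memory currently returns for key k in this row.
        recalled = sum(s * k_j for s, k_j in zip(row, k))
        # Erase the old association for k, decay, then write the new value.
        new_state.append([
            alpha * (s - beta * recalled * k_j) + beta * v_i * k_j
            for s, k_j in zip(row, k)
        ])
    # Read the updated memory with the query.
    output = [sum(s * q_j for s, q_j in zip(row, q)) for row in new_state]
    return new_state, output

d_k, d_v = 2, 2
state = [[0.0] * d_k for _ in range(d_v)]
key = [1.0, 0.0]  # unit-norm key, so a full-strength write is recalled exactly

# Write one value under the key, then overwrite it with another.
state, out1 = gated_delta_step(state, key, key, [3.0, 4.0], alpha=1.0, beta=1.0)
state, out2 = gated_delta_step(state, key, key, [5.0, 6.0], alpha=1.0, beta=1.0)
```

The state stays a fixed `d_v x d_k` matrix no matter how many tokens are processed, whereas a standard attention cache grows with sequence length. That fixed size is the property usually credited for the throughput and latency gains.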

    Furthermore, these models are natively multimodal. Unlike previous generations that “bolted on” a vision encoder to a text model, Qwen3.5 was trained using early fusion on multimodal tokens. This allows the 4B and 9B models to exhibit a level of visual understanding—such as reading UI elements or counting objects in a video—that previously required models ten times their size.

    Benchmarking the “small” series: performance that defies scale

    Newly released benchmark data illustrates just how aggressively these compact models are competing with—and often exceeding—much larger industry standards. The Qwen3.5-9B and Qwen3.5-4B variants demonstrate a cross-generational leap in efficiency, particularly in multimodal and reasoning tasks.

    Figure: Qwen3.5 Small Models Series benchmarks against other similarly-sized/classed models. Credit: Alibaba Qwen

    Multimodal dominance: In the MMMU-Pro visual reasoning benchmark, Qwen3.5-9B achieved a score of 70.1, outperforming Gemini 2.5 Flash-Lite (59.7) and even the specialized Qwen3-VL-30B-A3B (63.0).

    Graduate-level reasoning: On the GPQA Diamond benchmark, the 9B model reached a score of 81.7, surpassing gpt-oss-120b (80.1), a model with over ten times its parameter count.

    Video understanding: The series performs strongly in video reasoning. On the Video-MME (with subtitles) benchmark, Qwen3.5-9B scored 84.5 and the 4B scored 83.5, both well ahead of Gemini 2.5 Flash-Lite (74.6).

    Mathematical prowess: On the HMMT Feb 2025 evaluation (drawn from the Harvard-MIT Mathematics Tournament), the 9B model scored 83.2 and the 4B variant scored 74.0, suggesting that high-level STEM reasoning no longer requires massive compute clusters.

    Document and multilingual knowledge: The 9B variant leads the pack in document recognition on OmniDocBench v1.5 with a score of 87.7. Meanwhile, it maintains a top-tier multilingual presence on MMMLU with a score of 81.2, outperforming gpt-oss-120b (78.2).

    Community reactions: “more intelligence, less compute”

    Coming on the heels of last week’s release of Qwen3.5-Medium, an already compact and powerful open source model capable of running on a single GPU, the even smaller Qwen3.5 Small Models Series sparked immediate interest among developers focused on “local-first” AI.

    “More intelligence, less compute” resonated with users seeking alternatives to cloud-based models.

    AI and tech educator Paul Couvert of Blueshell AI captured the industry’s shock regarding this efficiency leap.

    “How is this even possible?!” Couvert wrote on X. “Qwen has released 4 new models and the 4B version is almost as capable as the previous 80B A3B one. And the 9B is as good as GPT OSS 120b while being 13x smaller!”

    Couvert’s analysis highlights the practical implications of these architectural gains:

    • “They can run on any laptop”

    • “0.8B and 2B for your phone”

    • “Offline and open source”

    As developer Karan Kendre of Kargul Studio put it: “these models [can run] locally on my M1 MacBook Air for free.”

    This sentiment of “amazing” accessibility is echoed across the developer ecosystem. One user noted that a 4B model serving as a “strong multimodal base” is a “game changer for mobile devs” who need screen-reading capabilities without high CPU overhead.

    Indeed, Hugging Face developer Xenova noted that the new Qwen3.5 Small Model series can even run directly in a user’s web browser and perform sophisticated operations, such as video analysis, that previously demanded far more compute.

    Researchers also praised the release of Base models alongside the Instruct versions, noting that it provides essential support for “real-world industrial innovation.”

    The release of Base models is particularly valued by enterprise and research teams because it provides a “blank slate” that hasn’t been biased by a specific set of RLHF (Reinforcement Learning from Human Feedback) or SFT (Supervised Fine-Tuning) data, which can often lead to “refusals” or specific conversational styles that are difficult to undo.

    Now, with the Base models, those interested in customizing the model for specific tasks and purposes have an easier starting point: they can apply their own instruction tuning and post-training without having to strip away Alibaba’s.

    Licensing: a win for the open ecosystem

    Alibaba has released the weights and configuration files for the Qwen3.5 series under the Apache 2.0 license. This permissive license allows for commercial use, modification, and distribution without royalty payments, removing the “vendor lock-in” associated with proprietary APIs.

    • Commercial use: Developers can integrate models into commercial products royalty-free.

    • Modification: Teams can fine-tune (SFT) or apply RLHF to create specialized versions.

    • Distribution: Models can be redistributed in local-first AI applications like Ollama.

    Contextualizing the news: why small matters so much right now

    The release of the Qwen3.5 Small Series arrives at a moment of “Agentic Realignment.” We have moved past simple chatbots; the goal now is autonomy. An autonomous agent must “think” (reason), “see” (multimodality), and “act” (tool use). While doing this with trillion-parameter models is prohibitively expensive, a local Qwen3.5-9B can perform these loops for a fraction of the cost.
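The “think, see, act” loop described above can be sketched in a few lines. Everything here is hypothetical scaffolding: `decide` is a rule-based stand-in for a call to a local model, and the toy “desktop” and the `move_to_pictures` tool exist only to make the loop runnable.

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    goal: str
    history: list = field(default_factory=list)
    done: bool = False

def run_agent(state, observe, decide, tools, max_steps=5):
    """Loop: see (observe), think (decide), act (call a tool)."""
    for _ in range(max_steps):
        observation = observe()  # "see"
        action, arg = decide(state.goal, observation, state.history)  # "think"
        if action == "finish":
            state.done = True
            break
        result = tools[action](arg)  # "act"
        state.history.append((action, arg, result))
    return state

# Toy environment: file name -> folder it currently lives in.
desktop = {"a.png": "Desktop", "notes.txt": "Desktop"}

def observe():
    return sorted(name for name, folder in desktop.items() if folder == "Desktop")

def decide(goal, observation, history):
    # Stand-in for a model call: move the first stray image, else stop.
    images = [n for n in observation if n.endswith(".png")]
    return ("move_to_pictures", images[0]) if images else ("finish", None)

def move_to_pictures(name):
    desktop[name] = "Pictures"
    return f"moved {name}"

final = run_agent(
    AgentState(goal="tidy images"),
    observe,
    decide,
    {"move_to_pictures": move_to_pictures},
)
```

The `max_steps` cap is a deliberate design choice in this sketch: it bounds the cost of a loop that never decides it is finished.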

    By scaling Reinforcement Learning (RL) across million-agent environments, Alibaba has endowed these small models with “human-aligned judgment,” allowing them to handle multi-step objectives like organizing a desktop or reverse-engineering gameplay footage into code. Whether it is a 0.8B model running on a smartphone or a 9B model powering a coding terminal, the Qwen3.5 series is effectively democratizing the “agentic era.”

    The Qwen3.5 series’ shift from “chatbots” to “native multimodal agents” transforms how enterprises can distribute intelligence. By moving sophisticated reasoning to the “edge” (individual devices and local servers), organizations can automate tasks that previously required expensive cloud APIs or high-latency processing.

    Strategic enterprise applications and considerations

    The 0.8B to 9B models are re-engineered for efficiency, utilizing a hybrid architecture that activates only the necessary parts of the network for each task.

    • Visual Workflow Automation: Using “pixel-level grounding,” these models can navigate desktop or mobile UIs, fill out forms, and organize files based on natural language instructions.

    • Complex Document Parsing: With scores exceeding 90% on document understanding benchmarks, they can replace separate OCR and layout parsing pipelines to extract structured data from diverse forms and charts.

    • Autonomous Coding & Refactoring: Enterprises can feed entire repositories (up to 400,000 lines of code) into the 1M context window for production-ready refactors or automated debugging.

    • Real-Time Edge Analysis: The 0.8B and 2B models are designed for mobile devices, enabling offline video summarization (up to 60 seconds at 8 FPS) and spatial reasoning without taxing battery life.
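For the repository-refactoring use case above, a rough budget check helps decide whether a codebase fits in a context window at all. The sketch below is back-of-the-envelope arithmetic only. The tokens-per-line ratio is an assumption that varies with programming language and tokenizer, and treating “1M” as exactly 1,000,000 tokens is also an assumption; the 262,144 figure is the native window cited earlier for the 4B model.

```python
def fits_in_context(lines_of_code, tokens_per_line, context_tokens, reserve_tokens=0):
    """Return (estimated_tokens, fits) for a repository of a given size.

    reserve_tokens leaves room for instructions and the model's reply.
    """
    estimated = int(lines_of_code * tokens_per_line)
    return estimated, estimated + reserve_tokens <= context_tokens

NATIVE_CONTEXT = 262_144      # native window cited for Qwen3.5-4B
EXTENDED_CONTEXT = 1_000_000  # assumed size of the "1M" window

# 400,000 lines fit a 1M window only at about 2.5 tokens per line.
tight = fits_in_context(400_000, 2.5, EXTENDED_CONTEXT)

# At an assumed 10 tokens per line, the same repository does not fit.
loose = fits_in_context(400_000, 10, EXTENDED_CONTEXT)

# Lines that fit the native window at the assumed 10 tokens per line.
max_lines_native = NATIVE_CONTEXT // 10
```

Teams planning whole-repository prompts may want to measure their own tokens-per-line ratio with the model's tokenizer rather than rely on a single headline figure.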

    The table below outlines which enterprise functions stand to gain the most from local, small-model deployment.

    Function | Primary Benefit | Key Use Case
    --- | --- | ---
    Software Engineering | Local Code Intelligence | Repository-wide refactoring and terminal-based agentic coding.
    Operations & IT | Secure Automation | Automating multi-step system settings and file management tasks locally.
    Product & UX | Edge Interaction | Integrating native multimodal reasoning directly into mobile/desktop apps.
    Data & Analytics | Efficient Extraction | High-fidelity OCR and structured data extraction from complex visual reports.

    While these models are highly capable, their small scale and “agentic” nature introduce specific operational “flags” that teams must monitor.

    • The Hallucination Cascade: In multi-step “agentic” workflows, a small error in an early step can lead to a “cascade” of failures where the agent pursues an incorrect or nonsensical plan.

    • Debugging vs. Greenfield Coding: While these models excel at writing new “greenfield” code, they can struggle with debugging or modifying existing, complex legacy systems.

    • Memory and VRAM Demands: Even “small” models (like the 9B) require significant VRAM for high-throughput inference; the “memory footprint” remains high because the total parameter count still occupies GPU space.

    • Regulatory & Data Residency: Using models from a China-based provider may raise data residency questions in certain jurisdictions, though the Apache 2.0 open-weight version allows for hosting on “sovereign” local clouds.
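The memory point in the list above comes down to simple arithmetic: every parameter must be stored at some precision. The sketch below estimates weight storage only, treating the model names as nominal parameter counts (an assumption) and ignoring the KV cache, activations, and runtime overhead, all of which add to the real footprint.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory needed just to hold the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

# Nominal parameter counts taken from the model names (assumed, not exact).
nine_b_fp16 = weight_memory_gb(9e9, 16)   # 16-bit weights
nine_b_int4 = weight_memory_gb(9e9, 4)    # 4-bit quantized weights
small_int4 = weight_memory_gb(0.8e9, 4)   # 0.8B model at 4 bits
```

Under these assumptions the 9B model needs about 18 GB for 16-bit weights and about 4.5 GB at 4 bits, which is why quantization usually decides whether a model of this size fits on a given laptop.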

    Enterprises should prioritize “verifiable” tasks—such as coding, math, or instruction following—where the output can be automatically checked against predefined rules to prevent “reward hacking” or silent failures.
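One way to read “verifiable” in practice is that each output passes through explicit rule checks before anything downstream uses it. The sketch below is a minimal, hypothetical gate: the rule names, the sample candidates, and the helper functions are invented for illustration, and the candidates stand in for successive model outputs.

```python
def check_output(output, rules):
    """Apply named rule functions to a model output; return the names that fail."""
    return [name for name, rule in rules.items() if not rule(output)]

def accept_first_valid(candidates, rules):
    """Return the first candidate passing every rule, or None if all fail."""
    for candidate in candidates:
        if not check_output(candidate, rules):
            return candidate
    return None

def is_integer(text):
    return text.strip().lstrip("-").isdigit()

# Rules for the toy task "What is 17 + 25?"
rules = {
    "is_integer": is_integer,
    "correct_sum": lambda text: is_integer(text) and int(text) == 17 + 25,
}

failures = check_output("about 40", rules)
accepted = accept_first_valid(["about 40", "42"], rules)
rejected = accept_first_valid(["forty-two", "41"], rules)
```

Returning `None` when nothing passes makes the failure explicit, so an agent can retry or escalate instead of continuing with an unchecked answer.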
