Alibaba launches open source Qwen3 model that surpasses OpenAI o1 and DeepSeek R1

April 28, 2025 4:56 PM

Credit: VentureBeat made with Qwen Chat

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More

Chinese e-commerce and web giant Alibaba’s Qwen team has officially launched a new series of open source AI large language multimodal models known as Qwen3 that appear to be among the state-of-the-art for open models, and approach performance of proprietary models from the likes of OpenAI and Google.

The Qwen3 series features two “mixture-of-experts” models and six dense models for a total of eight (!) new models. The “mixture-of-experts” approach involves having several different specialty model types combined into one, with only those relevant models to the task at hand being activated when needed in the internal settings of the model (known as parameters). It was popularized by open source French AI startup Mistral.

According to the team, the 235-billion parameter version of Qwen3 codenamed A22B outperforms DeepSeek’s open source R1 and OpenAI’s proprietary o1 on key third-party benchmarks including ArenaHard (with 500 user questions in software engineering and math) and nears the performance of the new, proprietary Google Gemini 2.5-Pro.

Overall, the benchmark data positions Qwen3-235B-A22B as one of the most powerful publicly available models, achieving parity or superiority relative to major industry offerings.

Hybrid (reasoning) theory

The Qwen3 models are trained to provide so-called “hybrid reasoning” or “dynamic reasoning” capabilities, allowing users to toggle between fast, accurate responses and more time-consuming and compute-intensive reasoning steps (similar to OpenAI’s “o” series) for more difficult queries in science, math, engineering and other specialized fields. This is an approach pioneered by Nous Research and other AI startups and research collectives.

With Qwen3, users can engage the more intensive “Thinking Mode” using the button marked as such on the Qwen Chat website or by embedding specific prompts like /think or /no_think when deploying the model locally or through the API, allowing for flexible use depending on the task complexity.

Users can now access and deploy these models across platforms like Hugging Face, ModelScope, Kaggle, and GitHub, as well as interact with them directly via the Qwen Chat web interface and mobile applications. The release includes both Mixture of Experts (MoE) and dense models, all available under the Apache 2.0 open-source license.

In my brief usage of the Qwen Chat website so far, it was able to generate imagery relatively rapidly and with decent prompt adherence — especially when incorporating text into the image natively while matching the style. However, it often prompted me to log in and was subject to the usual Chinese content restrictions (such as prohibiting prompts or responses related to the Tiananmen Square protests).

In addition to the MoE offerings, Qwen3 includes dense models at different scales: Qwen3-32B, Qwen3-14B, Qwen3-8B, Qwen3-4B, Qwen3-1.7B, and Qwen3-0.6B.

These models vary in size and architecture, offering users options to fit diverse needs and computational budgets.

The Qwen3 models also significantly expand multilingual support, now covering 119 languages and dialects across major language families. This broadens the models’ potential applications globally, facilitating research and deployment in a wide range of linguistic contexts.

Model training and architecture

In terms of model training, Qwen3 represents a substantial step up from its predecessor, Qwen2.5. The pretraining dataset doubled in size to approximately 36 trillion tokens.

The data sources include web crawls, PDF-like document extractions, and synthetic content generated using previous Qwen models focused on math and coding.

The training pipeline consisted of a three-stage pretraining process followed by a four-stage post-training refinement to enable the hybrid thinking and non-thinking capabilities. The training improvements allow the dense base models of Qwen3 to match or exceed the performance of much larger Qwen2.5 models.

Deployment options are versatile. Users can integrate Qwen3 models using frameworks such as SGLang and vLLM, both of which offer OpenAI-compatible endpoints.

For local usage, options like Ollama, LMStudio, MLX, llama.cpp, and KTransformers are recommended. Additionally, users interested in the models’ agentic capabilities are encouraged to explore the Qwen-Agent toolkit, which simplifies tool-calling operations.

Junyang Lin, a member of the Qwen team, commented on X that building Qwen3 involved addressing critical but less glamorous technical challenges such as scaling reinforcement learning stably, balancing multi-domain data, and expanding multilingual performance without quality sacrifice.

Lin also indicated that the team is transitioning focus toward training agents capable of long-horizon reasoning for real-world tasks.

What it means for enterprise decision-makers

Engineering teams can point existing OpenAI-compatible endpoints to the new model in hours instead of weeks. The MoE checkpoints (235 B parameters with 22 B active, and 30 B with 3 B active) deliver GPT-4-class reasoning at roughly the GPU memory cost of a 20–30 B dense model.

Official LoRA and QLoRA hooks allow private fine-tuning without sending proprietary data to a third-party vendor.

Dense variants from 0.6 B to 32 B make it easy to prototype on laptops and scale to multi-GPU clusters without rewriting prompts.

Running the weights on-premises means all prompts and outputs can be logged and inspected. MoE sparsity reduces the number of active parameters per call, cutting the inference attack surface.

The Apache-2.0 license removes usage-based legal hurdles, though organizations should still review export-control and governance implications of using a model trained by a China-based vendor.

Yet at the same time, it also offers a viable alternative to other Chinese players including DeepSeek, Tencent, and ByteDance — as well as the myriad and growing number of North American models such as the aforementioned OpenAI, Google, Microsoft, Anthropic, Amazon, Meta and others. The permissive Apache 2.0 license — which allows for unlimited commercial usage — is also a big advantage over other open source players like Meta, whose licenses are more restrictive.

It indicates furthermore that the race between AI providers to offer ever-more powerful and accessible models continues to remain highly competitive, and savvy organizations looking to cut costs should attempt to remain flexible and open to evaluating said new models for their AI agents and workflows.

Looking ahead

The Qwen team positions Qwen3 not just as an incremental improvement but as a significant step toward future goals in Artificial General Intelligence (AGI) and Artificial Superintelligence (ASI), AI significantly smarter than humans.

Plans for Qwen’s next phase include scaling data and model size further, extending context lengths, broadening modality support, and enhancing reinforcement learning with environmental feedback mechanisms.

As the landscape of large-scale AI research continues to evolve, Qwen3’s open-weight release under an accessible license marks another important milestone, lowering barriers for researchers, developers, and organizations aiming to innovate with state-of-the-art LLMs.

Daily insights on business use cases with VB Daily

If you want to impress your boss, VB Daily has you covered. We give you the inside scoop on what companies are doing with generative AI, from regulatory shifts to practical deployments, so you can share insights for maximum ROI.

Read our Privacy Policy

Thanks for subscribing. Check out more VB newsletters here.

An error occured.

Subscribe to Updates

What's Hot

Alibaba launches open source Qwen3 model that surpasses OpenAI o1 and DeepSeek R1

Alibaba launches open source Qwen3 model that surpasses OpenAI o1 and DeepSeek R1

Hybrid (reasoning) theory

Model training and architecture

What it means for enterprise decision-makers

Looking ahead

Related Posts