Fueling seamless AI at scale

From large language models (LLMs) to reasoning agents, today’s AI tools bring unprecedented computational demands. Trillion-parameter models, workloads running on-device, and swarms of agents collaborating to complete tasks all require a new paradigm of computing to become truly seamless and ubiquitous.

First, technical progress in hardware and silicon design is critical to pushing the boundaries of compute. Second, advances in machine learning (ML) allow AI systems to achieve increased efficiency with smaller computational demands. Finally, the integration, orchestration, and adoption of AI into applications, devices, and systems is crucial to delivering tangible impact and value.

Silicon’s mid-life crisis

AI has evolved from classical ML to deep learning to generative AI. The most recent chapter, which took AI mainstream, hinges on two phases—training and inference—that are data and energy-intensive in terms of computation, data movement, and cooling. At the same time, Moore’s Law, which determines that the number of transistors on a chip doubles every two years, is reaching a physical and economic plateau.

For the last 40 years, silicon chips and digital technology have nudged each other forward—every step ahead in processing capability frees the imagination of innovators to envision new products, which require yet more power to run. That is happening at light speed in the AI age.

As models become more readily available, deployment at scale puts the spotlight on inference and the application of trained models for everyday use cases. This transition requires the appropriate hardware to handle inference tasks efficiently. Central processing units (CPUs) have managed general computing tasks for decades, but the broad adoption of ML introduced computational demands that stretched the capabilities of traditional CPUs. This has led to the adoption of graphics processing units (GPUs) and other accelerator chips for training complex neural networks, due to their parallel execution capabilities and high memory bandwidth that allow large-scale mathematical operations to be processed efficiently.

But CPUs are already the most widely deployed and can be companions to processors like GPUs and tensor processing units (TPUs). AI developers are also hesitant to adapt software to fit specialized or bespoke hardware, and they favor the consistency and ubiquity of CPUs. Chip designers are unlocking performance gains through optimized software tooling, adding novel processing features and data types specifically to serve ML workloads, integrating specialized units and accelerators, and advancing silicon chip innovations, including custom silicon. AI itself is a helpful aid for chip design, creating a positive feedback loop in which AI helps optimize the chips that it needs to run. These enhancements and strong software support mean modern CPUs are a good choice to handle a range of inference tasks.

Beyond silicon-based processors, disruptive technologies are emerging to address growing AI compute and data demands. The unicorn start-up Lightmatter, for instance, introduced photonic computing solutions that use light for data transmission to generate significant improvements in speed and energy efficiency. Quantum computing represents another promising area in AI hardware. While still years or even decades away, the integration of quantum computing with AI could further transform fields like drug discovery and genomics.

Understanding models and paradigms

The developments in ML theories and network architectures have significantly enhanced the efficiency and capabilities of AI models. Today, the industry is moving from monolithic models to agent-based systems characterized by smaller, specialized models that work together to complete tasks more efficiently at the edge—on devices like smartphones or modern vehicles. This allows them to extract increased performance gains, like faster model response times, from the same or even less compute.

Researchers have developed techniques, including few-shot learning, to train AI models using smaller datasets and fewer training iterations. AI systems can learn new tasks from a limited number of examples to reduce dependency on large datasets and lower energy demands. Optimization techniques like quantization, which lower the memory requirements by selectively reducing precision, are helping reduce model sizes without sacrificing performance.

New system architectures, like retrieval-augmented generation (RAG), have streamlined data access during both training and inference to reduce computational costs and overhead. The DeepSeek R1, an open source LLM, is a compelling example of how more output can be extracted using the same hardware. By applying reinforcement learning techniques in novel ways, R1 has achieved advanced reasoning capabilities while using far fewer computational resources in some contexts.

The integration of heterogeneous computing architectures, which combine various processing units like CPUs, GPUs, and specialized accelerators, has further optimized AI model performance. This approach allows for the efficient distribution of workloads across different hardware components to optimize computational throughput and energy efficiency based on the use case.

Orchestrating AI

As AI becomes an ambient capability humming in the background of many tasks and workflows, agents are taking charge and making decisions in real-world scenarios. These range from customer support to edge use cases, where multiple agents coordinate and handle localized tasks across devices.

With AI increasingly used in daily life, the role of user experiences becomes critical for mass adoption. Features like predictive text in touch keyboards, and adaptive gearboxes in vehicles, offer glimpses of AI as a vital enabler to improve technology interactions for users.

Edge processing is also accelerating the diffusion of AI into everyday applications, bringing computational capabilities closer to the source of data generation. Smart cameras, autonomous vehicles, and wearable technology now process information locally to reduce latency and improve efficiency. Advances in CPU design and energy-efficient chips have made it feasible to perform complex AI tasks on devices with limited power resources. This shift toward heterogeneous compute enhances the development of ambient intelligence, where interconnected devices create responsive environments that adapt to user needs.

Seamless AI naturally requires common standards, frameworks, and platforms to bring the industry together. Contemporary AI brings new risks. For instance, by adding more complex software and personalized experiences to consumer devices, it expands the attack surface for hackers, requiring stronger security at both the software and silicon levels, including cryptographic safeguards and transforming the trust model of compute environments.

More than 70% of respondents to a 2024 DarkTrace survey reported that AI-powered cyber threats significantly impact their organizations, while 60% say their organizations are not adequately prepared to defend against AI-powered attacks.

Collaboration is essential to forging common frameworks. Universities contribute foundational research, companies apply findings to develop practical solutions, and governments establish policies for ethical and responsible deployment. Organizations like Anthropic are setting industry standards by introducing frameworks, such as the Model Context Protocol, to unify the way developers connect AI systems with data. Arm is another leader in driving standards-based and open source initiatives, including ecosystem development to accelerate and harmonize the chiplet market, where chips are stacked together through common frameworks and standards. Arm also helps optimize open source AI frameworks and models for inference on the Arm compute platform, without needing customized tuning.

How far AI goes to becoming a general-purpose technology, like electricity or semiconductors, is being shaped by technical decisions taken today. Hardware-agnostic platforms, standards-based approaches, and continued incremental improvements to critical workhorses like CPUs, all help deliver the promise of AI as a seamless and silent capability for individuals and businesses alike. Open source contributions are also helpful in allowing a broader range of stakeholders to participate in AI advances. By sharing tools and knowledge, the community can cultivate innovation and help ensure that the benefits of AI are accessible to everyone, everywhere.

Learn more about Arm’s approach to enabling AI everywhere.

This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff.

This content was researched, designed, and written entirely by human writers, editors, analysts, and illustrators. This includes the writing of surveys and collection of data for surveys. AI tools that may have been used were limited to secondary production processes that passed thorough human review.

Subscribe to Updates

What's Hot

Fueling seamless AI at scale

Fueling seamless AI at scale

Silicon’s mid-life crisis

Understanding models and paradigms

Orchestrating AI

Related Posts