What Is an NPU? Neural Processing Units Explained Simply (2026)
Updated June 2026 · 10-minute read

NPU is one of those acronyms that started showing up everywhere without much explanation. You'll see it in phone specs, laptop reviews, and gadget marketing. Apple calls theirs a "Neural Engine." Qualcomm calls it an "AI Engine." Intel and AMD have their own names for it. They're all referring to essentially the same thing: a Neural Processing Unit.
Understanding what an NPU is matters more than it used to, because it now directly affects which AI features your devices can run, how fast they respond, how long your battery lasts, and how private your data is. This article explains what NPUs are, why they exist, how they differ from the other chips in your device, and what the TOPS numbers in spec sheets actually mean.
The Short Answer
An NPU is a specialised chip designed specifically to run AI calculations efficiently. It's not a replacement for the CPU or GPU in your device; it works alongside them, handling a specific type of mathematical operation that AI models require constantly: matrix multiplication.
The reason a dedicated chip matters is efficiency. A CPU can run AI calculations, but it wastes a lot of energy doing so because it's designed for general-purpose tasks. A GPU is better at AI calculations than a CPU, but it's primarily designed for graphics and still not optimal for the specific patterns AI inference uses. An NPU is purpose-built for exactly those calculations, which means it does them faster and with far less power consumption.
Why AI Needs So Much Maths
To understand why NPUs exist, it helps to understand what AI models are actually doing when they respond to you.
When you ask Siri a question, send a voice command to Alexa, or use Apple Intelligence's Writing Tools, the AI model takes your input and runs it through billions of mathematical operations to produce a response. These operations are almost entirely matrix multiplications: multiplying large grids of numbers together repeatedly, at very high speed.
A large language model like the ones powering modern AI assistants might perform trillions of these operations per second. Running this on a standard CPU would drain your phone battery in minutes and produce results too slowly to be useful. The NPU solves this by having thousands of small processing units that can perform matrix multiplications in parallel, doing in milliseconds what a CPU would take seconds to complete, while using a fraction of the power.
Simple analogy: A CPU is like a highly skilled generalist worker who can do any task but works alone. A GPU is like a team of workers who can tackle many tasks at once. An NPU is like a factory assembly line specifically designed for one type of task: it's not flexible, but for that one task it's dramatically faster and more efficient than anything else.
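To make "multiplying large grids of numbers" concrete, here is a minimal Python sketch of a matrix multiplication that also counts the multiply-and-add steps involved. It is an illustration only: the function name `matmul` and the tiny 2×2 inputs are made up for this example, and real NPUs do this work in hardware rather than in loops like these.

```python
def matmul(a, b):
    """Multiply two matrices (lists of rows) and count the multiply-add steps."""
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("columns of a must equal rows of b")
    result = [[0] * cols for _ in range(rows)]
    mac_count = 0  # multiply-accumulate steps performed so far
    for i in range(rows):
        for j in range(cols):
            # Each output cell is independent of every other cell,
            # which is what lets parallel hardware compute many at once.
            for k in range(inner):
                result[i][j] += a[i][k] * b[k][j]
                mac_count += 1
    return result, mac_count


product, macs = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(product)  # [[19, 22], [43, 50]]
print(macs)     # 8
```

The step count grows as rows × inner × columns, so the grids inside a real AI model, which can have thousands of rows and columns, quickly reach billions of steps. A CPU core works through them a few at a time; an NPU is built to compute many output cells simultaneously.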
NPU vs CPU vs GPU - What Each One Does
| Chip | Designed For | AI Performance | Power Efficiency for AI |
|---|---|---|---|
| CPU | General computing, logic, single tasks | Poor | ❌ Very inefficient |
| GPU | Graphics, parallel processing | Good | ⚠️ Moderate |
| NPU | AI inference, matrix maths | Excellent | ✅ Very efficient |
In practice, devices use all three in combination. The CPU handles the operating system and general app logic. The GPU handles display rendering and graphics-intensive tasks. The NPU handles AI inference: face recognition, voice processing, language model responses, photo analysis, and anything else that involves running an AI model.
This is why you can have a conversation with Siri or use Writing Tools on the latest iPhones without noticeably draining your battery: the NPU handles it efficiently while the CPU and GPU get on with their own jobs.
What TOPS Means (And Why It Matters)
When you see NPU performance listed in specs, it's usually measured in TOPS (Tera Operations Per Second). One TOPS means the chip can perform one trillion operations per second. Higher TOPS generally means the NPU can run larger, more capable AI models faster.
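The arithmetic behind a TOPS figure is simple division, and a short Python sketch shows it. The function name `seconds_for_ops` and the workload of 2 billion operations per pass are hypothetical, chosen only to make the numbers easy to follow. The result is a theoretical best case: real chips rarely sustain their headline rating, because memory speed and software overhead also limit throughput.

```python
def seconds_for_ops(total_ops, tops):
    """Ideal time to run total_ops on a chip rated at `tops` trillion ops per second."""
    if tops <= 0:
        raise ValueError("tops must be positive")
    return total_ops / (tops * 1e12)


# Hypothetical workload: one pass through a small model costing 2 billion operations
ops_per_pass = 2e9

for rating in (10, 40):
    milliseconds = seconds_for_ops(ops_per_pass, rating) * 1000
    print(f"{rating} TOPS: {milliseconds:.3f} ms per pass")
# 10 TOPS: 0.200 ms per pass
# 40 TOPS: 0.050 ms per pass
```

Quadrupling the TOPS rating cuts the ideal time to a quarter, which is why the number is a reasonable first-order guide to how large a model a chip can run in real time.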
Here's how current devices compare:
| Device / Chip | NPU Performance | Key AI Features Enabled |
|---|---|---|
| Apple A18 Pro (iPhone 16 Pro) | ~35 TOPS | Full Apple Intelligence suite |
| Apple M4 (MacBook Pro) | ~38 TOPS | Full Apple Intelligence + larger models |
| Qualcomm Snapdragon 8 Gen 3 | ~45 TOPS | On-device AI, Samsung Galaxy AI |
| Qualcomm Snapdragon X Elite (laptop) | ~45 TOPS | Windows Copilot+ full feature set |
| Intel Core Ultra 200V | ~47 TOPS | Copilot+ PC certified |
| MediaTek Dimensity 9300 | ~35 TOPS | On-device AI features, flagship Android phones |
Microsoft set 40 TOPS as the minimum requirement for its Copilot+ PC certification, the threshold it calculated is needed to run its AI features in real time at acceptable quality. Apple hasn't published a minimum threshold because it controls the hardware and software together, but the iPhone 15 Pro, with its A17 Pro chip, was the first iPhone with enough Neural Engine performance to support Apple Intelligence.
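As a quick illustration of how a threshold like this works, the sketch below checks the approximate figures from the table above against the 40 TOPS bar. It is a purely numerical comparison: the names `NPU_TOPS` and `meets_threshold` are invented for this example, and Copilot+ certification applies only to Windows PCs, so a phone or Mac chip would not qualify whatever its rating.

```python
COPILOT_PLUS_MIN_TOPS = 40  # minimum quoted in this article

# Approximate NPU ratings copied from the comparison table in this article
NPU_TOPS = {
    "Apple A18 Pro": 35,
    "Apple M4": 38,
    "Qualcomm Snapdragon 8 Gen 3": 45,
    "Qualcomm Snapdragon X Elite": 45,
    "Intel Core Ultra 200V": 47,
    "MediaTek Dimensity 9300": 35,
}


def meets_threshold(tops, minimum=COPILOT_PLUS_MIN_TOPS):
    """Return True when a TOPS rating reaches the minimum (inclusive)."""
    return tops >= minimum


for chip, tops in NPU_TOPS.items():
    verdict = "meets" if meets_threshold(tops) else "falls short of"
    print(f"{chip}: ~{tops} TOPS {verdict} the {COPILOT_PLUS_MIN_TOPS} TOPS bar")
```

Note that both Apple chips fall below the line here even though they run Apple Intelligence comfortably, which is a reminder that a single threshold only makes sense within one platform.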
The practical implication: TOPS is a useful comparison metric within the same generation of chips, but it's not perfectly comparable across different architectures. Apple's Neural Engine at 35 TOPS runs Apple Intelligence well because Apple has optimised its software specifically for that architecture. A Qualcomm chip at 45 TOPS doesn't necessarily outperform Apple's chip in real-world AI tasks just because the number is higher.
NPUs in Wearables and Gadgets
NPUs aren't only in phones and laptops. They're increasingly common in the smaller devices that make up the AI gadget category.
Smartwatches
Apple Watch's S-series chips include dedicated Neural Engine hardware that runs health AI features (sleep staging, AFib detection, crash detection, and fall detection) entirely on the watch without sending data to the cloud. Samsung's Exynos W930 in the Galaxy Watch series serves the same purpose. The on-device processing means these health features work instantly and privately.
Smart glasses
XREAL One's X1 chip is a purpose-built spatial AI processor, essentially an NPU designed specifically for the real-time head movement tracking and display stabilisation that AR glasses require. The latency requirement (under 3ms) is impossible to meet with cloud processing, so the AI must run locally. The X1 chip makes this possible without requiring a large battery or generating significant heat in a frame that weighs under 80 grams.
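A latency budget like this can be sketched as a simple sum of stages. In the Python example below, only the 3ms budget comes from the figure quoted above; the stage names and every per-stage number are illustrative assumptions, not measurements of any real device or network.

```python
BUDGET_MS = 3.0  # latency target quoted in this article


def fits_budget(stage_latencies_ms, budget_ms=BUDGET_MS):
    """Add up the per-stage latencies and report whether the total fits the budget."""
    total = sum(stage_latencies_ms.values())
    return total <= budget_ms, total


# Illustrative numbers only (assumptions, not measurements)
local_pipeline = {
    "sensor_read": 0.5,
    "on_chip_inference": 1.0,
    "display_update": 1.0,
}
cloud_pipeline = {
    "sensor_read": 0.5,
    "network_round_trip": 20.0,
    "server_inference": 1.0,
    "display_update": 1.0,
}

for name, pipeline in (("local", local_pipeline), ("cloud", cloud_pipeline)):
    ok, total = fits_budget(pipeline)
    print(f"{name}: {total:.1f} ms, within budget: {ok}")
# local: 2.5 ms, within budget: True
# cloud: 22.5 ms, within budget: False
```

Whatever the exact figures, the shape of the result holds: once a network round trip enters the pipeline, it alone is likely to exceed a budget measured in single-digit milliseconds, no matter how fast the server is.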
Robot vacuums
Premium AI robot vacuums from Roborock and iRobot include onboard vision processors that run obstacle detection models locally. Identifying a charging cable versus a sock versus pet waste in real time, while navigating, requires dedicated AI inference hardware. Running this in the cloud would introduce too much latency and require reliable WiFi for every cleaning session.
AI voice recorders
Devices like the Plaud NotePin include onboard audio processing chips that handle voice activity detection and basic audio compression locally, with more complex transcription offloaded to the cloud. The local chip ensures recording works without internet and manages battery efficiently during long recording sessions.
Why NPUs Matter for Privacy
The privacy implications of NPUs are significant and often underappreciated. When an AI feature runs on an NPU rather than in the cloud, your data doesn't leave your device. This changes the privacy equation substantially.
Consider face recognition on your phone. If it runs in the cloud, photos of your face go to a server somewhere. If it runs on the NPU, the processing happens inside your device and nothing is transmitted. Apple's Face ID has always worked this way: the mathematical representation of your face never leaves the Secure Enclave on your device. Many Apple Intelligence features, including Writing Tools, photo editing, and image generation, are designed to run on-device in the same way.
This is why "on-device AI" and "NPU" are closely linked concepts. On-device AI is the goal: keeping your data local. NPUs are the hardware that makes it practical, providing enough performance to run AI models locally without unacceptable battery drain.
What to Look For When Buying AI Gadgets
You don't need to memorise TOPS numbers to make good buying decisions, but a few questions are worth asking about any AI gadget:
Does it have dedicated AI hardware? A device with an NPU or equivalent dedicated AI chip will perform AI features faster, more consistently, and more efficiently than one relying on a general CPU. For always-on features like health monitoring, dedicated hardware matters a lot for battery life.
Does the AI run on-device or in the cloud? On-device means privacy and offline capability. Cloud means more powerful models but requires internet and involves data leaving your device. Both are valid depending on the use case; understanding which applies helps you make an informed privacy decision.
Is the hardware powerful enough for future features? AI features are expanding rapidly. Devices with more capable NPUs will support more advanced AI capabilities as software updates arrive. This is part of why Apple Intelligence requires A17 Pro or later: older chips would limit what features could be added in future.
Frequently Asked Questions
Do all phones have an NPU?
Most flagship and upper-mid-range smartphones released since 2022 include a dedicated NPU. Budget phones may use a CPU or GPU for AI tasks instead, which is slower and less power efficient. The NPU is now a standard component in any phone marketed with AI features.
What does Apple call its NPU?
Apple calls it the Neural Engine. It has been part of Apple's A-series chips since the A11 Bionic in 2017, making Apple one of the earliest companies to include dedicated AI inference hardware in a consumer smartphone. The Neural Engine's performance has grown significantly with each chip generation.
What is a Copilot+ PC and what does it have to do with NPUs?
Copilot+ PC is Microsoft's certification for laptops with NPUs capable of at least 40 TOPS. Microsoft introduced this requirement to ensure that AI features like Windows Recall, real-time translation, and generative AI tools run smoothly on-device. Any laptop marketed as a Copilot+ PC is guaranteed to have a capable NPU; this is the main hardware requirement of the certification.
Is a higher TOPS number always better?
Higher TOPS indicates more AI processing capacity, but it isn't the only factor in real-world performance. The efficiency of the software, the architecture of the chip, and how well the AI models have been optimised for that specific hardware all matter. Apple's Neural Engine consistently performs well in real-world tasks despite sometimes lower TOPS numbers than competing chips, because Apple optimises iOS and its AI features specifically for its own hardware.
Will my phone's NPU become outdated quickly?
NPU capability doubles roughly every two to three years, similar to GPU performance historically. A phone with a capable NPU today (iPhone 15 Pro and later, Snapdragon 8 Gen 3 phones, Galaxy S24 and later) will run AI features well for several years. The constraint is usually software support rather than the hardware itself becoming technically incapable.
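The doubling rule of thumb can be turned into a rough projection. The Python sketch below simply applies the "doubles every two to three years" estimate from this answer to a 35 TOPS starting point; the function name `projected_tops` is invented for the example, and the output is an extrapolation under that assumption, not a forecast of any real product.

```python
def projected_tops(current_tops, years, doubling_period_years):
    """Project a TOPS rating forward, assuming it doubles every doubling_period_years."""
    if doubling_period_years <= 0:
        raise ValueError("doubling period must be positive")
    return current_tops * 2 ** (years / doubling_period_years)


# Starting from a ~35 TOPS phone NPU and looking four years ahead
for period in (2, 3):
    future = projected_tops(35, 4, period)
    print(f"Doubling every {period} years: ~{future:.0f} TOPS after 4 years")
# Doubling every 2 years: ~140 TOPS after 4 years
# Doubling every 3 years: ~88 TOPS after 4 years
```

On either assumption, a flagship phone four years from now would have a few times the NPU capacity of today's, which is why today's capable chips are more likely to be limited by software support than by raw speed.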
Can an NPU run ChatGPT or large language models?
Smaller language models, typically 7 billion parameters or fewer, can run on current flagship NPUs. The Snapdragon 8 Gen 3 can run 7B models locally. Apple's Neural Engine runs the models that power Apple Intelligence. Full-size GPT-4 class models (hundreds of billions of parameters) still require data centre hardware. On-device AI in 2026 runs capable but smaller models, not the largest frontier models.
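One reason the cut-off sits around 7 billion parameters is memory, and the arithmetic is easy to check. The Python sketch below estimates how much space a model's weights take at different levels of precision. The function name `weights_size_gb` is invented for this example, the bit widths are common choices rather than figures for any specific phone, and the estimate covers the weights only, not the extra working memory a model needs while running.

```python
def weights_size_gb(parameter_count, bits_per_weight):
    """Approximate storage for model weights in gigabytes (1 GB = 1e9 bytes)."""
    return parameter_count * bits_per_weight / 8 / 1e9


SEVEN_BILLION = 7e9

for bits in (16, 8, 4):
    size = weights_size_gb(SEVEN_BILLION, bits)
    print(f"7B model at {bits}-bit weights: {size:.1f} GB")
# 7B model at 16-bit weights: 14.0 GB
# 7B model at 8-bit weights: 7.0 GB
# 7B model at 4-bit weights: 3.5 GB
```

At 4-bit precision a 7B model fits in a few gigabytes, which is within reach of a flagship phone. A model with hundreds of billions of parameters would need tens to hundreds of gigabytes even when compressed this way, which is why those still live in data centres.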
Chip specifications sourced from manufacturer documentation and independent benchmark testing as of June 2026. NPU performance figures improve with each hardware generation; check current specs when purchasing.
