Tuesday, 1 April 2014

Apple’s A7 Cyclone CPU detailed: A desktop class chip that has more in common with Haswell than Krait


Apple A7 SoC


Some six months after Apple shocked the world with its 64-bit A7 SoC, which appeared in the iPhone 5S and then the iPad Air, we finally have some hard details on the Cyclone CPU’s architecture. It seems almost every tech writer was wrong about the A7: The CPU is not just a gradual evolution of its Swift predecessor — it’s an entirely different beast that’s actually more akin to a “big core” Intel or AMD CPU than a conventional “small core” CPU.

These new details come from Apple’s recent source code commits to the LLVM project. For some reason, Apple waited six months before committing the changes (the Swift core was committed very close to its release). The files clearly outline the name of the CPU’s microarchitecture (Cyclone), and all of the key details that ultimately dictate the CPU’s performance, power consumption, optimal usage scenarios, and ability to scale to higher clock speeds.
Code snippet from LLVM, showing Apple’s Cyclone core microarchitecture

To begin with, Cyclone is very wide. It can decode, issue, and retire up to six instructions per clock cycle. By way of comparison, Swift and Krait (Qualcomm’s current mobile CPU core) top out at three instructions per clock. There is also a massive 192-entry re-order buffer (ROB), the same size as Haswell’s ROB, which makes sense given that both cores make heavy use of out-of-order execution (OoOE).
The branch mispredict penalty goes up slightly compared with Swift, but interestingly there’s a range of penalties from 14 to 19 cycles, the same range as Intel’s newer CPU cores (Sandy Bridge and later).
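
To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers quoted above (six-wide issue, a 192-entry ROB, a 14 to 19 cycle mispredict penalty). The `CoreParams` class and both helper functions are hypothetical names made up for illustration; they are not taken from Apple's LLVM commit, and the results are simple upper bounds rather than measurements.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CoreParams:
    """Headline figures quoted in the article (not measured values)."""
    name: str
    issue_width: int      # instructions decoded/issued/retired per clock
    rob_entries: int      # re-order buffer size
    mispredict_min: int   # best-case branch mispredict penalty, in cycles
    mispredict_max: int   # worst-case branch mispredict penalty, in cycles


def lost_issue_slots(core: CoreParams) -> Tuple[int, int]:
    """Upper bound on issue slots wasted by one mispredicted branch."""
    return (core.issue_width * core.mispredict_min,
            core.issue_width * core.mispredict_max)


def rob_window_cycles(core: CoreParams) -> float:
    """Cycles of full-width issue that fit in the ROB at once."""
    return core.rob_entries / core.issue_width


cyclone = CoreParams("Cyclone", issue_width=6, rob_entries=192,
                     mispredict_min=14, mispredict_max=19)

print(lost_issue_slots(cyclone))   # (84, 114)
print(rob_window_cycles(cyclone))  # 32.0
```

In other words, a single mispredicted branch can cost a six-wide core the equivalent of 84 to 114 instruction slots, which is why a wide design needs good branch prediction, and the ROB is deep enough to hold about 32 cycles of full-width work in flight.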
Apple Cyclone CPU block diagram [Image credit: Anandtech]
 

On the actual number crunching side of things, Cyclone is seriously beefy: It has four integer ALUs (up from Swift’s two), two load/store units (up from one), two branch units (up from one), and three FP/NEON units. Working together with the six decoder units and 192-entry ROB, Cyclone can sustain three FP/NEON adds in parallel per clock. To accommodate all of this beastliness, Cyclone doubles the instruction and data caches to 64KB each (per core).
In short, Cyclone is a serious CPU. In the words of Anandtech, “With six decoders and nine ports to execution units, Cyclone is big… bigger than anything else that goes in a phone.” When Apple announced the A7 SoC, one of the slides said it had a “64-bit desktop-class architecture” — and now we know that wasn’t just marketing hyperbole. Where Swift was very similar to Krait and other mobile ARM cores, Cyclone is a big departure from the usual thin-and-light approach of building mobile CPUs.

Apple A7 SoC slide, showing “desktop-class” architecture

The question, of course, is why. Much like octa-core mobile chips, there simply aren’t many mobile applications that can take advantage of a big, hot CPU core. This will change eventually, as battery tech improves and mobile computing continues to grow in popularity, but this won’t be a short-term thing.

So, perhaps a better question to ask is what’s Apple’s long-term plan for its A-series SoCs? Presumably the A8, which should debut with the iPhone 6 in September, will be big, wide, powerful, and power hungry as well. If the A8 makes the jump to 20nm at TSMC, which is likely, we can expect a clock speed bump, and other refinements that will further improve performance. It’s worth noting that, despite being a big core, Cyclone doesn’t appear to consume any more power than Intel’s Silvermont or Qualcomm’s Krait — probably because it’s clocked slower, and because its beefy performance allows it to finish tasks more quickly, and thus enter a low power state sooner — aka “race to sleep.”
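
The “race to sleep” argument is easy to check with some simple arithmetic. The Python sketch below compares the energy used over a fixed time window by a fast, power-hungry core and a slower, more frugal one. Every number in it is hypothetical and chosen only to illustrate the calculation; none of them are measurements of Cyclone, Silvermont, or Krait, and `task_energy_joules` is a made-up helper name.

```python
def task_energy_joules(active_watts, active_seconds, idle_watts, window_seconds):
    """Energy over a fixed window: run the task, then idle for the remainder."""
    if active_seconds > window_seconds:
        raise ValueError("task does not finish inside the window")
    idle_seconds = window_seconds - active_seconds
    return active_watts * active_seconds + idle_watts * idle_seconds


# Hypothetical figures, for illustration only.
WINDOW = 10.0  # seconds

# "Big" core: draws twice the power but finishes the task in 2 seconds.
big_core = task_energy_joules(active_watts=2.0, active_seconds=2.0,
                              idle_watts=0.05, window_seconds=WINDOW)

# "Small" core: draws half the power but needs 5 seconds for the same task.
small_core = task_energy_joules(active_watts=1.0, active_seconds=5.0,
                                idle_watts=0.05, window_seconds=WINDOW)

print(f"big core:   {big_core:.2f} J")    # big core:   4.40 J
print(f"small core: {small_core:.2f} J")  # small core: 5.25 J
```

With these made-up numbers the big core comes out ahead, but only because it finishes more than twice as fast while drawing twice the power. If the speedup were smaller than the increase in power draw, the small core would win instead.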

Still, though, why the sudden shift towards a big core, when everyone else is still focusing on smaller cores? The only sensible answer, in my opinion, is that Apple is thinking far ahead to the future. It’s clear that more and more of our computing time is being spent on smartphones and tablets, so it stands to reason that more complex, classically desktop-oriented tasks will slowly make the jump to mobile. Imagine if Adobe released some kind of iOS app that processed massive 20-megapixel Raw images from your DSLR — suddenly, Cyclone and its successors make a lot of sense.
Or, of course, maybe Apple is eventually planning to use its A-series chips in its MacBooks as well — a possibility that I discussed way back in 2011. Apple did describe the A7 as “desktop-class” after all. Watch out, Intel!
