Intel's Atom Architecture: The Journey Begins
by Anand Lal Shimpi on April 2, 2008 12:05 AM EST- Posted in
- CPUs
It Does Multiple Threads Though: The Case for SMT
Despite being 2-issue, it's not always easy to execute two instructions from a single thread in parallel due to data dependencies between the two. Intel's solution to this problem was to enable SMT (Simultaneous Multi-Threading) on Atom (not all models unfortunately) to allow the concurrent execution of up to two threads. Welcome the return of Hyper Threading.
Remember the rule of thumb for power/performance tradeoffs? Intel's decision to enable SMT on Atom was the perfect example of just that. SMT increased power consumption by less than 20% on Atom, however it also yielded a 30 - 50% increase in performance on the in-order core. The decision couldn't be easier.
The Atom has a 32-entry instruction scheduling queue, but when running with SMT enabled each thread has its own 16-entry queue. The scheduler doesn't have to switch between threads each clock, it can do so intelligently, the only limitation is that it can only dispatch 2 ops per clock (since it is a 2-wide machine). If one thread is waiting on data to complete an instruction, on the next clock tick the scheduler can choose to dispatch an op from a separate thread that will hopefully be able to execute.
Making Atom multithreaded made perfect sense from a logical standpoint. The downside to an in-order core is that if there is an instruction that is waiting on data to begin execution the rest of the pipeline stalls while that dependency is resolved. The chances that you'll have two independent instructions from two independent threads both with misses in cache is highly unlikely.
Execution Units
Atom isn't a superwide processor, with an in-order front end and no on-die memory controller it's unlikely that we'll see tremendous instruction throughput. Data dependencies would do a good job of ensuring that tons of execution units remain idle, so Atom's designers did their best to only include the bare minimum when it came to execution units.
There's no dedicated integer multiplier or divider, these functions are shared with the SIMD FP units. There are two SSE units and the scheduler can dispatch either a float or an integer SIMD op to both ports in a given clock.
All of the functional units are 64-bits wide with the exception of supporting full width SIMD integer and single precision FP ADDs.
46 Comments
View All Comments
lopri - Thursday, April 3, 2008 - link
This article is as much propagana-ish as it is technical. Did you read the last page of the article?clnee55 - Friday, April 4, 2008 - link
Since Anand wrote this article. I let him answer your accusationGulWestfale - Wednesday, April 2, 2008 - link
i believe that the graphics core in the chipset is a powerVR gen5 derivative; intel already uses some of their tech in its existing mainboards and wikipedia states that intel has licensed gen5 tech for one of its chipsets, the GMA500 (which is the same as poulsbo?) gen5 is also DX10-capable, which matches the info in your article.http://en.wikipedia.org/wiki/PowerVR#Series_5_.28S...">http://en.wikipedia.org/wiki/PowerVR#Series_5_.28S...
yyrkoon - Wednesday, April 2, 2008 - link
and wikipedia has been known to be wrong . . . a lot lately it seems.My point here *is*, I would probably trust anandtech more than wikipedia now days, as it seems any Joe can put up a 'reference' without citation.
jones377 - Wednesday, April 2, 2008 - link
Following the references link from the Wiki article...http://www.imgtec.com/News/Release/index.asp?NewsI...">http://www.imgtec.com/News/Release/index.asp?NewsI...
Poulsbo uses a PowerVR 3D core
Anand Lal Shimpi - Wednesday, April 2, 2008 - link
Yep, you guys are correct, I wasn't aware that it was public yet :) I've updated the article.Take care,
Anand