Original Link: https://www.anandtech.com/show/1910



Intel's move to their 65nm process has gone extremely well.  We've had 65nm Presler, Cedar Mill and Yonah samples for the past couple of months now and they have been just as good as final, shipping silicon.  Just a couple of months ago, we previewed Intel's 65nm Pentium 4, showcased its reduction in power consumption and took an early look at the overclocking potential of the chips. 

Intel's 65nm Pentium 4s will be the last Pentium 4s to come out of Santa Clara.  While we'd strongly suggest waiting to upgrade until we've seen what Conroe will bring us, there are those who can't wait another six months.  For those who are building or buying systems today, we need to find out if Intel's 65nm Pentium 4 processors are any more worthwhile than the rather disappointing chips that we had at 90nm. 

The move to 90nm for Intel was highly anticipated, but it could not have been any more disappointing from a performance standpoint.  In a since abandoned quest for higher clock speeds, Intel brought us Prescott at 90nm with its 31 stage pipeline - up from 20 stages in the previous generation Pentium 4s.  Through some extremely clever and effective engineering, Prescott actually wasn't any slower than its predecessors, despite the increase in pipeline stages.  What Prescott did leave us with, however, was a much higher power bill.  Deeply pipelined processors generally consume a lot more power, and Prescott did just that. 

Intel tried to minimize the negative effects of Prescott as much as possible through technologies like their Enhanced Intel SpeedStep (EIST).  However, at the end of the day, the fastest Athlon 64 consumed less power under full load than the slowest Prescott at idle.  Considering that most PCs actually spend the majority of their time idling, this was truly a letdown from Intel. 

With 65nm, the architecture of the chips won't change at all - in fact, the single-core 65nm Pentium 4s based on the Cedar Mill core will be identical to the current Pentium 4 600 series that we have today (with the inclusion of Intel's Virtualization Technology).  So with no architectural changes, the power consumption at 65nm should be lower than at 90nm.  As we found in our first article on Intel's 65nm chips, power consumption did indeed go down quite a bit; however, it's still not low enough to be better than AMD.  It will take Conroe before Intel can offer a desktop processor with lower power consumption than AMD's 90nm Athlon 64 line. 

In an odd move, just before the end of 2005, Intel is introducing their first 65nm processor.  Not the Cedar Mill based Pentium 4 and not even the Presler based Pentium D, but rather the Presler based Pentium Extreme Edition 955. 

The Presler core is Intel's dual-core 65nm successor to Smithfield, which as you will remember was Intel's first dual-core processor.  Presler does actually offer one architectural improvement over Smithfield and that is the use of a 2MB L2 cache per core, up from 1MB per core in Smithfield.  Other than that, Presler is pretty much a die-shrunk version of Smithfield. 

With 2MB cache on each core, the transistor count of Presler has gone up a bit.  While Smithfield weighed in at a whopping 230M transistors, Presler is now up to 376M.  The move to 65nm has actually made the chip smaller at 162 mm², down from 206 mm².  With a smaller die size, Presler is actually cheaper for Intel to make than Smithfield, despite having twice the cache.  Equally impressive is that Cedar Mill, the single core version, measures in at a meager 81 mm².
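
To put the shrink in perspective, here is a minimal sketch of the arithmetic using only the transistor counts and die sizes quoted above. The density metric and the variable names are illustrative assumptions, not figures from Intel.

```python
# Transistor counts (millions) and die areas (mm^2) as quoted in the text.
chips = {
    "Smithfield (90nm)": {"transistors_m": 230, "area_mm2": 206},
    "Presler (65nm)": {"transistors_m": 376, "area_mm2": 162},
}


def density(chip):
    """Return millions of transistors per square millimeter."""
    return chip["transistors_m"] / chip["area_mm2"]


smithfield = chips["Smithfield (90nm)"]
presler = chips["Presler (65nm)"]

# Relative change going from Smithfield to Presler.
count_change = presler["transistors_m"] / smithfield["transistors_m"] - 1
area_change = presler["area_mm2"] / smithfield["area_mm2"] - 1
density_ratio = density(presler) / density(smithfield)

for name, chip in chips.items():
    print(f"{name}: {density(chip):.2f}M transistors per mm^2")
print(f"Transistor count: {count_change:+.0%}")
print(f"Die area:         {area_change:+.0%}")
print(f"Density ratio:    {density_ratio:.2f}x")
```

By this rough measure, Presler packs about 63% more transistors into about 21% less area, or roughly twice the transistor density of Smithfield. Much of that increase is cache, which is denser than logic, so the ratio should not be read as a pure process-scaling number.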

The Extreme Edition incarnation of Presler brings back support for the 1066MHz FSB, which you may remember was lost with the original move to dual-core.  Given that both cores on the chip have to share the same bus, more FSB bandwidth will always help performance.
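
As a rough sketch of what the faster bus buys, the snippet below estimates peak FSB bandwidth. It assumes the Pentium 4 bus's 64-bit (8-byte) data path and treats the quoted 800MHz and 1066MHz figures as effective transfer rates. The even split between the two cores is an idealization for illustration only, since real bus arbitration is not that tidy.

```python
BUS_WIDTH_BYTES = 8  # assumption: 64-bit front-side bus data path


def fsb_bandwidth_gbs(transfers_mhz, cores=1):
    """Peak bandwidth in GB/s (10^9 bytes/s), divided evenly across `cores`."""
    return transfers_mhz * 1e6 * BUS_WIDTH_BYTES / 1e9 / cores


for fsb in (800, 1066):
    total = fsb_bandwidth_gbs(fsb)
    per_core = fsb_bandwidth_gbs(fsb, cores=2)
    print(f"{fsb}MHz FSB: {total:.2f} GB/s peak, "
          f"{per_core:.2f} GB/s per core if shared evenly")

# Relative gain of the 1066MHz bus over the 800MHz bus.
gain = fsb_bandwidth_gbs(1066) / fsb_bandwidth_gbs(800) - 1
print(f"Bandwidth gain: {gain:+.1%}")
```

Under those assumptions, the move from 800MHz to 1066MHz raises peak bandwidth by about a third, which is the headroom that two cores contending for one bus stand to gain.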

The Pentium Extreme Edition 955 runs at 3.46GHz (1066MHz FSB), thus giving it a clock speed advantage over all of Intel's other dual-core processors.  And as always, the EE chip offers Hyper Threading support on each of its two cores allowing the chip to handle a maximum of four threads at the same time.  Since it's an Extreme Edition chip, the 955 will be priced at $999.  If you're curious about the cheaper, non-Extreme versions of Presler, here is Intel's 65nm dual-core roadmap for 2006:

Intel Dual Core Desktop

CPU   Core      Clock    FSB      L2 Cache
???   Conroe    ???      ???      4MB
???   Conroe    ???      ???      2MB
950   Presler   3.4GHz   800MHz   2x2MB
940   Presler   3.2GHz   800MHz   2x2MB
930   Presler   3.0GHz   800MHz   2x2MB
920   Presler   2.8GHz   800MHz   2x2MB

As you can see, the Extreme Edition 955 will be the first, but definitely not the only dual-core 65nm processor out in the near future, so don't let the high price tag worry you. The remaining 900 series Pentium D chips should come with prices much closer to the equivalent 800 series.



Power Consumption and The Test

When we went to measure power consumption on our Pentium EE 955 platform, we were met with some extremely troubling results.  Not only did we not see the power consumption figures that we originally saw with Presler and Cedar Mill a couple of months back, but power consumption was actually higher at 65nm than it was at 90nm.  We contacted Intel and were assured that the problem had to do with an issue with our motherboard, and a new motherboard is en route to us now.  When we do receive the new motherboard, we will take a look at power consumption once more to get an idea of the final state of Intel's 65nm power consumption, but until then, we don't want to draw any conclusions based on what we've seen. 

The Test

CPU:                  AMD Athlon 64 X2 4800+ (2.4GHz/1MBx2)
                      AMD Athlon 64 X2 3800+ (2.0GHz/512KBx2)
                      AMD Athlon 64 FX-57 (2.8GHz/1MB)
                      Intel Pentium Extreme Edition 955 (3.46GHz/2MBx2)
                      Intel Pentium Extreme Edition 840 (3.2GHz/1MBx2)
                      Intel Pentium D 820 (2.8GHz/1MBx2)
Motherboard:          ASUS A8N-SLI Deluxe
                      Intel BadAxe 975X
Motherboard BIOS:     ASUS: Version 1013 Dated 08/10/2005
Chipset:              NVIDIA nForce4 SLI
                      Intel 975X
Chipset Drivers:      nForce4 6.70
                      Intel 7.0.0.1020
Memory:               OCZ PC3500 DDR 2-2-2-7
                      DDR2-667 5-5-5-15
Video Card:           ATI Radeon X1800 XT
Video Drivers:        ATI Catalyst 5.13
Desktop Resolution:   1280 x 1024 - 32-bit @ 60Hz
OS:                   Windows XP Professional SP2



Literally Dual Core

One of the major changes with Presler is that unlike Smithfield, the two cores are not a part of the same piece of silicon. Instead, you actually have a single chip with two separate dies on it.  By splitting the die in two, Intel can reduce total failure rates and even be far more flexible with their manufacturing (since one Presler chip is nothing more than two Cedar Mill cores on a single package). 


The chip at the bottom of the image is Presler; note the two individual cores.

Intel's architecture, featuring no on-die memory controller, allows for such a split to be made without any major changes.  Even on Smithfield, all traffic between the cores actually had to travel out one core, off the chip and onto the external FSB and then back into the other core.  With Presler, the same type of communication can take place without any disruptions. The only difference is that the data from core to core has a slightly longer distance to travel. 

In order to find out if there was an appreciable increase in core-to-core communication latency, we used a tool called Cache2Cache, which Johan first used in his series on multi-core processors.  Johan's description of the utility follows:
"Michael S. started this extremely interesting thread at the Ace's hardware Technical forum. The result was a little program coded by Michael S. himself, which could measure the latency of cache-to-cache data transfer between two cores or CPUs. In his own words: 'it is a tool for comparison of the relative merits of different dual-cores.'

"Cache2Cache measures the propagation time from a store by one processor to a load by the other processor. The results that we publish are approximately twice the propagation time. For those interested, the source code is available here."
Armed with Cache2Cache, we looked at the added latency seen by Presler over Smithfield:

Cache2Cache Latency in ns (Lower is Better)

AMD Athlon 64 X2 4800+     101
Intel Smithfield 2.8GHz    253.1
Intel Presler 2.8GHz       244.2

Not only did we not find an increase in latency between the two cores on Presler, communication actually occurs faster than on Smithfield.  We made sure that it had nothing to do with the faster FSB by clocking the chip at 2.8GHz with an 800MHz FSB and repeated the tests only to find consistent results. 

We're not sure why, but core-to-core communication is faster on Presler than on Smithfield.  That being said, a difference of less than 9ns just isn't going to be noticeable in the real world - given that we've already seen that the Athlon 64 X2's 100ns latency doesn't really help it scale better when going from one to two cores.
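
For a sense of scale, the Cache2Cache figures can be converted from nanoseconds into core clock cycles. This is a minimal sketch of that conversion using the clock speeds listed for the tested chips; the function name is just an illustrative choice.

```python
# (chip, core clock in GHz, Cache2Cache latency in ns) from the table above.
results = [
    ("AMD Athlon 64 X2 4800+", 2.4, 101.0),
    ("Intel Smithfield 2.8GHz", 2.8, 253.1),
    ("Intel Presler 2.8GHz", 2.8, 244.2),
]


def ns_to_cycles(latency_ns, clock_ghz):
    """A clock of N GHz completes N cycles per nanosecond."""
    return latency_ns * clock_ghz


for chip, clock_ghz, latency_ns in results:
    cycles = ns_to_cycles(latency_ns, clock_ghz)
    print(f"{chip}: {latency_ns} ns is about {cycles:.0f} core cycles")

# The Smithfield-to-Presler difference expressed both ways.
delta_ns = 253.1 - 244.2
delta_cycles = ns_to_cycles(delta_ns, 2.8)
print(f"Presler advantage: {delta_ns:.1f} ns, about {delta_cycles:.0f} cycles")
```

Seen this way, the gap between Presler and Smithfield amounts to a couple dozen cycles out of roughly 700, which is consistent with it not being noticeable in the real world.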



Larger L2, but no increase in latency?
When Prescott first got a 2MB L2 cache, we noticed that along with a larger L2 came a 17% increase in access latency.  The end result was a mixed bag of performance, with some applications benefitting from the larger cache while others were hampered by the increase in L2 latency.  Overall, the end result was that the two performance elements balanced each other out and Prescott 2M generally offered no real performance improvement over the 1MB version. 

With Presler, each core also gets an upgraded 2MB cache, as compared to the 1MB L2 cache found in Smithfield.  The upgrade is similar to what we saw with Prescott, so we assumed that along with a larger L2 cache per core, Presler's L2 cache also received an increase in L2 cache latency over Smithfield. 

In order to confirm, we ran ScienceMark 2.0 and Cachemem:

                           Cachemem L2 Latency              ScienceMark L2 Latency
                           (128KB block, 64-byte stride)    (64-byte stride)
AMD Athlon 64 X2 4800+     17 cycles                        17 cycles
Intel Smithfield 2.8GHz    27 cycles                        27 cycles
Intel Presler 2.8GHz       27 cycles                        27 cycles
Intel Prescott 2M          27 cycles                        27 cycles
Intel Prescott 1M          23 cycles                        23 cycles

What we found was extremely interesting: Presler does have the same 27 cycle L2 cache as Prescott 2M, but so does Smithfield.  We had simply taken for granted that Smithfield was nothing more than two Prescott 1M cores put together, but this data shows us that Smithfield actually had the same higher latency L2 cache as Prescott 2M.  

Although we were expecting Presler to give us a higher latency L2 over Smithfield, it looks like Smithfield actually had a higher latency L2 to begin with.  This means that, at the same clock speed, Presler will be at least as fast as Smithfield, if not faster.  Normally, we take for granted that a new core means better performance, but Intel has let us down in the past; luckily, this time we're not put in such a situation. 
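
The latency figures in this section can be cross-checked with a little arithmetic. The sketch below reproduces the 17% increase from Prescott 1M to Prescott 2M and converts cycle counts to nanoseconds for the chips whose clock speeds are stated. The helper names are illustrative, and the Prescott parts are left out of the nanosecond conversion because their clock speeds are not given.

```python
def increase_pct(old, new):
    """Percentage increase going from `old` to `new`."""
    return (new - old) / old * 100


def cycles_to_ns(cycles, clock_ghz):
    """At N GHz, one cycle lasts 1/N nanoseconds."""
    return cycles / clock_ghz


# Prescott 1M (23 cycles) versus Prescott 2M (27 cycles).
print(f"L2 latency increase: {increase_pct(23, 27):.1f}%")

# L2 latency in absolute time for chips with stated clock speeds.
l2_latency = [
    ("AMD Athlon 64 X2 4800+", 17, 2.4),
    ("Intel Smithfield 2.8GHz", 27, 2.8),
    ("Intel Presler 2.8GHz", 27, 2.8),
]
for chip, cycles, clock_ghz in l2_latency:
    print(f"{chip}: {cycles} cycles = {cycles_to_ns(cycles, clock_ghz):.2f} ns")
```

Because cycle counts depend on clock speed, the absolute-time view is the fairer way to compare the AMD and Intel caches, although the X2 still comes out ahead in both.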



Presler vs. Smithfield - A Brief Look

Other than the larger L2 cache, Presler as incorporated in the Pentium Extreme Edition 955 provides us with two more enhancements over Smithfield: 1066MHz FSB support and a higher clock speed (3.46GHz).

We wanted to isolate the performance improvement due to the larger L2 cache aside from the other improvements to Presler, so we underclocked our sample and its FSB, and compared it to a Pentium D 820 (2.8GHz). 

Looking at a small subset of our tests, we can get a feel for where you can expect the largest performance gains due simply to the increase in L2 cache size.  Remember that since L2 access latency on Smithfield was already at 27 cycles, Presler's cache isn't any slower, so what we end up measuring is how large of an impact a 2MB cache has in some of our benchmarks. 

Winstone      Business Winstone 2004   Multimedia Content Creation Winstone 2004
Presler       19.0                     30.2
Smithfield    18.5                     29.9

Under Business Winstone 2004, we see a boost of just under 3%, thanks to the larger cache size.  Historically, the biggest improvements in Winstone have come from lower latency caches and higher clock speeds, so it's not too much of a surprise to see a minimal impact here.  Content Creation Winstone 2004 shows no real performance impact either. 

Media Encoding   3dsmax 7 Composite   DVD Shrink   WME9      H.264   iTunes
Presler          2.03                 9.1m         31.3fps   10.5m   50s
Smithfield       2.05                 8.9m         31.0fps   10.5m   50s

Our 3D rendering, video encoding and audio encoding tests basically all agree with the earlier results - the added cache doesn't really improve performance here, but that's to be expected, given the nature of the applications (and the already quite large 1MB L2 cache to which we are comparing). 

Gaming        Battlefield 2   Call of Duty 2   Quake 4
Presler       77.3            76.2             130.6
Smithfield    73.0            75.6             125.5

It isn't until we look at some of our 3D gaming tests that we start to see some more tangible performance gains.  In games, there are some decent performance improvements to be had, ranging anywhere from 0 to just under 6%, thanks to the larger cache alone. 
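
The per-game gains behind that range can be worked out directly from the gaming table. This is a minimal sketch, with the table values copied in and an illustrative helper for the percentage.

```python
# Results for the underclocked Presler sample and the Pentium D 820
# (Smithfield), copied from the gaming table above. Higher is better.
gaming = {
    "Battlefield 2": {"Presler": 77.3, "Smithfield": 73.0},
    "Call of Duty 2": {"Presler": 76.2, "Smithfield": 75.6},
    "Quake 4": {"Presler": 130.6, "Smithfield": 125.5},
}


def gain_pct(new, old):
    """Percentage improvement of `new` over `old`."""
    return (new - old) / old * 100


gains = {
    game: gain_pct(scores["Presler"], scores["Smithfield"])
    for game, scores in gaming.items()
}

for game, gain in gains.items():
    print(f"{game}: {gain:+.1f}% from the larger L2")
```

Battlefield 2 shows the largest gain and Call of Duty 2 the smallest, with Quake 4 in between.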

Couple the larger cache with a faster FSB and higher clock speed, and the Pentium Extreme Edition 955 is shaping up to be a decent improvement over its predecessor. 



Multi-core support in Games?

Both Quake 4 and Call of Duty 2 now have SMP support, supposedly offering performance improvements on dual core and/or Hyper Threading enabled processors. 

For Call of Duty 2, you simply install the new patch and off you go; SMP support is enabled.  To verify, we ran our CoD 2 benchmark and kept a log of the total processor utilization over time.  Below is a shot of perfmon with a fresh install of CoD2 (sans SMP patch):

Note how the total CPU utilization for our dual-core testbed hovers right around 50%, with the maximum being just under 52% (the remaining 2% can be attributed to driver and other overhead that can eat up extra CPU cycles). 

Now, let's look at CoD2 CPU utilization with the SMP patch installed:

While the average CPU utilization only goes up by around 9%, the maximum CPU utilization increases tremendously, now up to 83%, showing us that the second core is being used. 
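
The reasoning behind reading these utilization numbers can be sketched in a few lines. A single busy thread can saturate at most one CPU, so on a two-CPU system it tops out at 50% of the total, and anything well above that (allowing for the roughly 2% of overhead mentioned earlier) implies a second core is doing work. The function names and the overhead allowance are illustrative assumptions, not part of perfmon.

```python
def single_thread_ceiling(logical_cpus):
    """Highest total utilization (%) that one busy thread can produce."""
    return 100.0 / logical_cpus


def uses_multiple_cpus(peak_total_pct, logical_cpus, overhead_pct=2.0):
    """True if the peak exceeds what one thread plus overhead can explain."""
    return peak_total_pct > single_thread_ceiling(logical_cpus) + overhead_pct


# Peaks read from the two perfmon logs on the dual-core testbed:
# just under 52% without the patch (52.0 used as an upper bound), 83% with it.
for label, peak in (("without SMP patch", 52.0), ("with SMP patch", 83.0)):
    verdict = "yes" if uses_multiple_cpus(peak, logical_cpus=2) else "no"
    print(f"CoD2 {label}: peak {peak}% -> second core in use: {verdict}")
```

Note that with Hyper Threading enabled, the operating system would see four logical CPUs, and the single-thread ceiling would drop to 25%.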

We looked at performance at 1024x768.  Obviously, the higher the resolution, the smaller the impact of a faster CPU; conversely, the lower the resolution, the greater the impact will be as the game becomes less GPU limited. 

To ensure a fair comparison, we tested using the SMP patch and simply disabled SMP manually by setting the r_smp_backend variable to "0".  We confirmed that SMP support was actually disabled by running perfmon and measuring CPU utilization. 

Call of Duty 2                                SMP Disabled   SMP Enabled
AMD Athlon 64 FX-57 (2.8GHz)                  80.6           N/A
AMD Athlon 64 X2 4800+ (2.4GHz)               79.8           70.3
AMD Athlon 64 X2 3800+ (2.0GHz)               78.7           68.1
Intel Pentium Extreme Edition 955 (3.46GHz)   79.8           68.4
Intel Pentium Extreme Edition 840 (3.2GHz)    78.1           68
Intel Pentium D 820 (2.8GHz)                  75.6           67.1

Surprisingly enough, we actually saw pretty large performance drops in CoD2 with SMP enabled across both AMD and Intel platforms.  This is unfortunate, but the withdrawn SMP support of Quake 3 makes it less than shocking. We do expect that things will get better as time goes on. 

Quake 4 was a different story; with r_useSMP enabled, we saw some extremely large performance gains with the move to dual core:

Quake 4                                       SMP Disabled   SMP Enabled
AMD Athlon 64 FX-57 (2.8GHz)                  115.4          N/A
AMD Athlon 64 X2 4800+ (2.4GHz)               114.9          147.4
AMD Athlon 64 X2 3800+ (2.0GHz)               100.9          143.2
Intel Pentium Extreme Edition 955 (3.46GHz)   98.9           142.3
Intel Pentium Extreme Edition 840 (3.2GHz)    89.0           133.6
Intel Pentium D 820 (2.8GHz)                  80.6           125.5

Enabling Hyper Threading on top of dual core didn't do anything at all for performance on the Pentium EE 955.  Either the SMP patch only spawns two threads, or the instruction mix of Quake 4 with the patch does not mix well with Intel's Pentium EE 955. 
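
To compare how the two games respond to SMP, the change from SMP disabled to SMP enabled can be computed per processor from the two tables above. This is a small sketch of that calculation; the single-core FX-57 is omitted because it has no SMP-enabled result.

```python
# (SMP disabled, SMP enabled) results per dual-core CPU, from the tables above.
call_of_duty_2 = {
    "Athlon 64 X2 4800+": (79.8, 70.3),
    "Athlon 64 X2 3800+": (78.7, 68.1),
    "Pentium EE 955": (79.8, 68.4),
    "Pentium EE 840": (78.1, 68.0),
    "Pentium D 820": (75.6, 67.1),
}
quake_4 = {
    "Athlon 64 X2 4800+": (114.9, 147.4),
    "Athlon 64 X2 3800+": (100.9, 143.2),
    "Pentium EE 955": (98.9, 142.3),
    "Pentium EE 840": (89.0, 133.6),
    "Pentium D 820": (80.6, 125.5),
}


def smp_change_pct(disabled, enabled):
    """Percentage change in performance when SMP is switched on."""
    return (enabled - disabled) / disabled * 100


for title, table in (("Call of Duty 2", call_of_duty_2), ("Quake 4", quake_4)):
    print(title)
    for cpu, (disabled, enabled) in table.items():
        print(f"  {cpu}: {smp_change_pct(disabled, enabled):+.1f}%")
```

Every processor loses performance in Call of Duty 2 and gains it in Quake 4. In Quake 4, the slowest chips with SMP disabled gain the most, which is what you would expect if the SMP-enabled results are running into a limit other than the CPU.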

While we're only looking at two games, this is a start for multithreaded game development.  You can expect to see a lot of examples where dual-core does absolutely nothing for gaming, but as time goes on, the situation will change. 



Dual Core and Hyper Threading: Detriment or Not?

A question that we've always had is whether or not the inclusion of Hyper Threading support on Intel's dual-core Extreme Edition processors actually improves performance.  To answer that question, we have to look at two separate situations: multithreaded application performance and multitasking performance. 

For multithreaded application performance, we can now turn to a number of benchmarks.  We'll start off with 3dsmax 7 (higher numbers are better for the composite score, lower numbers are better for the rest of the numbers):

3dsmax 7      Composite Score   3dsmax 5 rays   CBALLS2   SinglePipe2   UnderWater
HT Enabled    3.0               12.922s         17.297s   83.515s       119.641s
HT Disabled   2.51              14.937s         21.141s   102.734s      141.641s

Here, the performance advantage is clear - enabling Hyper Threading provides Intel with another 14-19% over the base dual core Presler.  The same applies to almost all of the media encoding tests (if minutes or seconds are specified, lower numbers mean better performance):

Media Encoding   DVD Shrink   WME9      H.264   iTunes
HT Enabled       7.1m         46.5fps   9.96m   38s
HT Disabled      8.0m         38.6fps   8.53m   40s

Our Quicktime 7 H.264 encoding test is, generally speaking, an outlier from what we've seen of the impact of HT on multithreaded applications.  The rest of the applications show a clear benefit to being able to execute four threads simultaneously, even if the execution resources of the cores are shared with the remaining two threads. 
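
Since the 3dsmax and media encoding tables mix scores, frame rates and completion times, a quick sketch helps put the Hyper Threading results on a common footing. The helper below is an illustrative assumption about how to normalize them: for timed tests it reports the share of time saved, and for scores and frame rates it reports the increase, so a positive number always means HT helped.

```python
def ht_benefit_pct(ht_on, ht_off, lower_is_better):
    """Positive = HT helped. Time saved for timed tests, increase otherwise."""
    if lower_is_better:
        return (ht_off - ht_on) / ht_off * 100
    return (ht_on - ht_off) / ht_off * 100


# (test, HT enabled, HT disabled, lower_is_better) from the two tables above.
tests = [
    ("3dsmax 7 composite score", 3.0, 2.51, False),
    ("3dsmax 5 rays (s)", 12.922, 14.937, True),
    ("CBALLS2 (s)", 17.297, 21.141, True),
    ("SinglePipe2 (s)", 83.515, 102.734, True),
    ("UnderWater (s)", 119.641, 141.641, True),
    ("DVD Shrink (min)", 7.1, 8.0, True),
    ("WME9 (fps)", 46.5, 38.6, False),
    ("H.264 (min)", 9.96, 8.53, True),
    ("iTunes (s)", 38.0, 40.0, True),
]

for name, ht_on, ht_off, lower_is_better in tests:
    benefit = ht_benefit_pct(ht_on, ht_off, lower_is_better)
    print(f"{name}: {benefit:+.1f}%")
```

The four 3dsmax render times shrink by roughly 14% to 19% with HT enabled, and H.264 encoding is the only test in either table where HT costs time.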

Armed with the latest SMP patches for Call of Duty 2 and Quake 4 (SMP was enabled in both games), we can also take a look at the impact of HT on Presler:

Gaming        Call of Duty 2   Quake 4
HT Enabled    68.4             142.3
HT Disabled   69.3             142.3

Call of Duty 2 is another example where HT actually reduces performance, but given that enabling SMP itself reduces performance, we'd venture a guess that you shouldn't really be drawing any conclusions based on its data.  Quake 4, on the other hand, shows no difference in performance with HT on or off. 

From what we've seen, with most individual multithreaded applications, enabling HT will improve performance even if you have a dual core processor.  The degree of performance improvement will vary from application to application, but generally speaking, it's going to be positive (if anything at all). 

The more interesting situation is what happens when you're multitasking - does Hyper Threading really help on top of the inherent benefits of a dual core processor?  To find out, we put together a couple of multitasking scenarios aided by a tool that Intel provided us to help all of the applications start at the exact same time.  We're not necessarily concerned with the actual performance of these applications, but rather with the impact that the number of simultaneous applications has on each other and how that varies with HT being enabled or not. 
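
The idea of starting several workloads at the same instant and timing each one can be sketched with a thread barrier. This is not the tool that Intel provided, whose internals are not described here; it is a hypothetical illustration, and the function name, the task labels and the sleep-based placeholder workloads are all assumptions.

```python
import threading
import time


def run_simultaneously(tasks):
    """Start every task at (as near as possible) the same instant.

    `tasks` maps a label to a zero-argument callable. Returns a dict of
    {label: (start_time, elapsed_seconds)}.
    """
    barrier = threading.Barrier(len(tasks))
    results = {}
    lock = threading.Lock()

    def worker(label, task):
        barrier.wait()  # block until every worker thread is ready
        start = time.perf_counter()
        task()
        elapsed = time.perf_counter() - start
        with lock:
            results[label] = (start, elapsed)

    threads = [
        threading.Thread(target=worker, args=item) for item in tasks.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


# Placeholder workloads; a real harness would launch external programs
# here, for example with subprocess.run([...]).
demo = run_simultaneously({
    "task_a": lambda: time.sleep(0.05),
    "task_b": lambda: time.sleep(0.10),
})
for label, (start, elapsed) in sorted(demo.items()):
    print(f"{label}: finished in {elapsed:.2f}s")
```

The barrier is what matters in this design: no task begins until all of them are ready, so differences in completion time reflect contention between the tasks rather than staggered launches.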

We took five applications (Grisoft AVG Anti-Virus 7, Lame MP3 Encoder 3.97a, Windows Media Encoder 9, Info-ZIP extraction utility and Splinter Cell: Chaos Theory) and used various combinations of them to try to figure out if there are multitasking benefits to a dual core processor with Hyper Threading enabled.  Note that some of these applications are multithreaded themselves, so just because we chose five applications doesn't mean that there are only five threads of execution; in reality, there are many more. 

We tested four different scenarios:
  1. A virus scan + MP3 encode
  2. The first scenario + a Windows Media encode
  3. The second scenario + unzipping files, and
  4. The third scenario + our Splinter Cell: CT benchmark.
The tables below compare the total time in seconds for all of the timed tasks (everything but Splinter Cell) to complete during the tests:

AMD Athlon 64 X2 4800+              AVG     LAME    WME     ZIP     Total
AVG + LAME                          22.9s   13.8s   -       -       36.7s
AVG + LAME + WME                    35.5s   24.9s   29.5s   -       90.0s
AVG + LAME + WME + ZIP              41.6s   38.2s   40.9s   56.6s   177.3s
AVG + LAME + WME + ZIP + SCCT       42.8s   42.2s   46.6s   65.9s   197.5s

Intel Pentium EE 955 (no HT)        AVG     LAME    WME     ZIP     Total
AVG + LAME                          24.8s   13.7s   -       -       38.5s
AVG + LAME + WME                    39.2s   22.5s   32.0s   -       93.7s
AVG + LAME + WME + ZIP              47.1s   37.3s   45.0s   62.0s   191.4s
AVG + LAME + WME + ZIP + SCCT       40.3s   47.7s   58.6s   83.3s   229.9s

Intel Pentium EE 955 (HT Enabled)   AVG     LAME    WME     ZIP     Total
AVG + LAME                          25.0s   13.3s   -       -       38.3s
AVG + LAME + WME                    34.4s   21.6s   30.2s   -       86.2s
AVG + LAME + WME + ZIP              41.5s   28.1s   37.7s   54.2s   161.5s
AVG + LAME + WME + ZIP + SCCT       51.4s   33.0s   45.3s   71.1s   200.8s

As you can see, the Presler setup with HT enabled takes less time to complete the tasks as soon as you get beyond two simultaneous applications than the Presler system without HT enabled.  However, including the Athlon 64 X2 4800+ in the picture, we see that despite only being able to execute two threads at the same time, it does just as good of a job as the Presler HT system that can execute twice as many threads.  But to get the full picture, we have to measure one last data point: Splinter Cell performance. 
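
The totals in the three tables can be recomputed from the per-task times, which also makes the comparison between systems easy to quantify. This is a minimal sketch with the table values copied in; the shortened system labels and helper names are illustrative.

```python
SCENARIOS = [
    "AVG + LAME",
    "AVG + LAME + WME",
    "AVG + LAME + WME + ZIP",
    "AVG + LAME + WME + ZIP + SCCT",
]

# Per-task completion times in seconds (order: AVG, LAME, WME, ZIP),
# one row per scenario, copied from the three tables above.
timings = {
    "X2 4800+": [
        [22.9, 13.8],
        [35.5, 24.9, 29.5],
        [41.6, 38.2, 40.9, 56.6],
        [42.8, 42.2, 46.6, 65.9],
    ],
    "EE 955 (no HT)": [
        [24.8, 13.7],
        [39.2, 22.5, 32.0],
        [47.1, 37.3, 45.0, 62.0],
        [40.3, 47.7, 58.6, 83.3],
    ],
    "EE 955 (HT)": [
        [25.0, 13.3],
        [34.4, 21.6, 30.2],
        [41.5, 28.1, 37.7, 54.2],
        [51.4, 33.0, 45.3, 71.1],
    ],
}


def totals(system):
    """Sum the per-task times for each scenario, rounded to 0.1s."""
    return [round(sum(times), 1) for times in timings[system]]


def change_pct(new, old):
    """Percentage change of `new` relative to `old` (negative = faster)."""
    return (new - old) / old * 100


x2 = totals("X2 4800+")
no_ht = totals("EE 955 (no HT)")
ht = totals("EE 955 (HT)")

for i, scenario in enumerate(SCENARIOS):
    print(f"{scenario}: HT vs. no HT {change_pct(ht[i], no_ht[i]):+.1f}%, "
          f"HT vs. X2 {change_pct(ht[i], x2[i]):+.1f}%")
```

The recomputed sums match the published totals, except that the second X2 scenario comes to 89.9s rather than 90.0s, presumably because the individual times were rounded before being listed.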

In the fourth scenario, we ran a total of five applications: AVG, Lame, WME, InfoZip and Splinter Cell.  The first four applications took a total of 197.5 seconds to complete on the Athlon 64 X2 4800+ system, ever so slightly quicker than the 200.8 seconds of the Presler HT system.  However, that does not take into account Splinter Cell performance - now let's see how our fifth application fared:

Splinter Cell: CT                   Average    Min        Max
Intel Pentium EE 955 (no HT)        71.0 fps   27.8 fps   128.1 fps
Intel Pentium EE 955 (HT enabled)   77.2 fps   32.5 fps   139.6 fps
AMD Athlon 64 X2 4800+              66.9 fps   10.5 fps   185.0 fps

The Athlon 64 X2 4800+ actually is faster in the Splinter Cell: CT benchmark without anything else running, but here we see a very different story.  Although its 66.9 fps average frame rate is reasonably competitive with the Presler HT system, its minimum frame rate is barely over 10 fps - approximately one-third that of the Presler HT. 

While the regular Presler setup without HT managed to pull in higher frame rates than the AMD system, it did so while performing significantly worse in the remaining four applications.  The Presler HT vs. Athlon 64 X2 comparison is important because the two are virtually tied in the performance of the first four applications - but juggling all five of the applications is better done on the Presler HT system. 

We would say that if implemented properly, the benefits of an SMT system like Hyper Threading are definitely a good companion to a dual core desktop processor.  The usable limit, even for today's applications and usage models, is far from just two threads.  



Overall Performance using Winstone 2004

Business Winstone 2004

Business Winstone 2004 tests the following applications in various usage scenarios:
. Microsoft Access 2002
. Microsoft Excel 2002
. Microsoft FrontPage 2002
. Microsoft Outlook 2002
. Microsoft PowerPoint 2002
. Microsoft Project 2002
. Microsoft Word 2002
. Norton AntiVirus Professional Edition 2003
. WinZip 8.1

Business Winstone 2004

The Pentium EE 955 does a little better than the previous generation Extreme Edition, but AMD continues to be the dominant performer here. Presler's 27 cycle L2 cache doesn't exactly help it out here, so it's not much of a surprise.

Multimedia Content Creation Winstone 2004

Multimedia Content Creation Winstone 2004 tests the following applications in various usage scenarios:
. Adobe® Photoshop® 7.0.1
. Adobe® Premiere® 6.50
. Macromedia® Director MX 9.0
. Macromedia® Dreamweaver MX 6.1
. Microsoft® Windows Media™ Encoder 9 Version 9.00.00.2980
. NewTek's LightWave® 3D 7.5b
. Steinberg™ WaveLab™ 4.0f
All chips were tested with Lightwave set to spawn 4 threads.

Multimedia Content Creation Winstone 2004

Once again, the EE 955 offers a performance improvement over the EE 840, but at best, it is equal to the performance of the Athlon 64 X2 3800+.



Overall Performance using SYSMark 2004

Office Productivity SYSMark 2004

SYSMark's Office Productivity suite consists of three tests, the first of which is the Communication test. The Communication test consists of the following:
"The user receives an email in Outlook 2002 that contains a collection of documents in a zip file. The user reviews his email and updates his calendar while VirusScan 7.0 scans the system. The corporate web site is viewed in Internet Explorer 6.0. Finally, Internet Explorer is used to look at samples of the web pages and documents created during the scenario."
The next test is Document Creation performance:
"The user edits the document using Word 2002. He transcribes an audio file into a document using Dragon NaturallySpeaking 6. Once the document has all the necessary pieces in place, the user changes it into a portable format for easy and secure distribution using Acrobat 5.0.5. The user creates a marketing presentation in PowerPoint 2002 and adds elements to a slide show template."
The final test in our Office Productivity suite is Data Analysis, which BAPCo describes as:
"The user opens a database using Access 2002 and runs some queries. A collection of documents are archived using WinZip 8.1. The queries' results are imported into a spreadsheet using Excel 2002 and are used to generate graphical charts."

SYSMark 2004 Office Productivity Overall

The EE 955 is far more competitive here, virtually offering the same performance as the Athlon 64 FX-57 and X2 4800+.

ICC SYSMark 2004

The first category that we will deal with is 3D Content Creation. The tests that make up this benchmark are described below:
"The user renders a 3D model to a bitmap using 3ds max 5.1, while preparing web pages in Dreamweaver MX. Then the user renders a 3D animation in a vector graphics format."
Next, we have 2D Content Creation performance:
"The user uses Premiere 6.5 to create a movie from several raw input movie cuts and sound cuts and starts exporting it. While waiting on this operation, the user imports the rendered image into Photoshop 7.01, modifies it and saves the results. Once the movie is assembled, the user edits it and creates special effects using After Effects 5.5."
The Internet Content Creation suite is rounded out with a Web Publishing performance test:
"The user extracts content from an archive using WinZip 8.1. Meanwhile, he uses Flash MX to open the exported 3D vector graphics file. He modifies it by including other pictures and optimizes it for faster animation. The final movie with the special effects is then compressed using Windows Media Encoder 9 series in a format that can be broadcast over broadband Internet. The web site is given the final touches in Dreamweaver MX and the system is scanned by VirusScan 7.0."

SYSMark 2004 Internet Content Creation Overall

In the ICC tests, the EE 955 is once again quite competitive, but unable to outperform the Athlon 64 X2 4800+. The performance improvement over the previous-gen Extreme Edition is a respectable 9%.

SYSMark 2004 Overall

Overall SYSMark performance appears to be a virtual tie between the Pentium Extreme Edition 955 and the Athlon 64 X2 4800+.



Overall Performance using WorldBench 5

Our final set of overall system performance tests comes from WorldBench 5, which is a pretty good tool for looking at older application performance as well as single-threaded performance. 

WorldBench 5 Overall

WorldBench performance is somewhere in-between what we saw with Winstone and SYSMark, with the Pentium Extreme Edition 955 falling behind the Athlon 64 X2 4800+. The majority of the applications in WorldBench are single threaded, so the 4-thread advantage of the Pentium EE 955 isn't able to flex its muscle at all. Not to mention that many of these applications favor the performance of AMD's shorter pipelined architecture.



3D Rendering Performance using 3dsmax 7

Once again, we're using an updated version of the SPECapc 3dsmax test for version 7 of the application.  The scenes being rendered haven't actually changed, but the reference numbers used to compute the composite scores have, so these scores aren't directly comparable to results from earlier SPECapc tests.

3dsmax 7 - SPECapc Benchmark

The Pentium Extreme Edition 955 holds a slight 5% performance advantage over the Athlon 64 X2 4800+, which gives it the lead in our 3dsmax 7 test.



Media Encoding Performance using DVD Shrink, WME9, Quicktime and iTunes

First up is DVD Shrink 3.2.0.15. Our test was simple - we took a copy of Star Wars Episode VI and ripped the full DVD to the hard drive without compression, effectively giving us an exact copy of the disc on the hard drive.  Then, using the copy of the DVD on the hard drive (to eliminate any DVD drive bottlenecks), we performed a DVD shrink operation to shrink the movie to fit on a single 4.5GB DVD disc.  All of the options were left on their defaults, so the test ends up being pretty easy to run and reproduce.  The scores reported are DVD encoding times in minutes, with lower numbers meaning better performance. 

The DVD Shrink test is quite important as DVD Shrink is quite possibly one of the easiest tools to rip a DVD.  The easier a tool is to use, the more likely it's going to be used, and arguably, the more important performance using it happens to be. 

DVD Shrink 3.2.0.15

MPEG-2 encoding performance using DVD Shrink clearly favors the Pentium Extreme Edition; while the Athlon 64 X2 4800+ is competitive, it falls behind both EE chips.

Moving on, we have our Windows Media Encoder 9 test, which uses the advanced profile settings for video encoding.  We left all settings at their defaults and just proceeded with an MPEG-2 to WMV-HD conversion.  The values reported are in frames per second, with higher numbers being better.

Windows Media Encoder 9 - Advanced Profile

The situation under Windows Media Encoder 9 is no different, with the EE taking the lead once more.

Next up, we have Quicktime Pro 7.0.3 and we perform an MPEG-2 to H.264 encoding task.  All of the settings are left at their defaults, with the exception that we optimize the output file for download with a 256kbps data rate while leaving the resolution untouched.  We also adjust the video options to optimize for the best quality.  We report the transcoding time in minutes, with lower values being better. 

H.264 Encoding with Quicktime Pro 7.0.3

The tables are turned under Quicktime 7, with the Athlon 64 X2s taking the lead and pushing the Pentium EE 955 to third place.

Finally, we have an MP3 encoding test using iTunes 6.0.1.3.  For this test, we simply took a 304MB wav file and converted it to a 192kbps MP3 file, measuring the encode time in seconds.  The only iTunes option that we changed was to prevent the playback of the song while encoding. 

MP3 Encoding with iTunes 6.0.1.3

MP3 encoding performance is a close race between the Athlon 64 X2 4800+ and the EE 955, with AMD taking the slight lead.



Gaming Performance using Battlefield 2, Call of Duty 2 and Quake 4

Gaming performance is pretty respectable for the Pentium EE 955, with the chip being quite competitive with AMD's Athlon 64 X2 4800+.

The most interesting thing we found is that even with a high end GPU like the Radeon X1800 XT, a number of games are still quite GPU limited even at 1024x768, which is why you don't see F.E.A.R. and Splinter Cell: CT here. Even some of the games that we did include required us to turn down some of the detail settings to start to stress the CPUs.

The pendulum often swings between games being CPU and GPU limited, and it seems that with the latest generation of games, we are definitely more GPU limited.

Battlefield 2

Call of Duty 2

Quake 4



Final Words

The Pentium Extreme Edition 955 finally starts to bring some respectable performance to Intel's high end processors, but there is no clear cut victory. In applications and usage scenarios where the EE's ability to execute four threads simultaneously comes into play, it generally can remain quite competitive with the Athlon 64 X2 4800+. However, looking at older applications, single threaded scenarios and some multithreaded applications that aren't optimized for more than two threads, the EE 955 falls significantly behind.

There are a few other conclusions that we can draw based on what we've seen thus far. For starters, Hyper Threading is quite important to the performance of the Extreme Edition 955. While it isn't always perfect, when under very heavy multitasking loads, the ability to execute more threads translates into better overall performance for the entire system.

We've also been able to take an early look at the state of multithreaded game development, through the latest Call of Duty 2 and Quake 4 patches. Although the performance in CoD2 was terrible in SMP mode, Quake 4 gave us some hope, with performance gains approaching the 50% mark on dual core processors at CPU bound resolutions.

As far as the processor at hand is concerned, Intel has done a reasonable job with the Pentium EE 955, but with Conroe not too far away, we just can't justify recommending it. If you absolutely must upgrade today, the Athlon 64 X2 is still probably going to be a better bang for your buck. However, as we have seen in the benchmarks, there are advantages to being able to execute four threads simultaneously.

It is pretty much a toss-up at this point, but we'd recommend sticking with AMD for now and re-evaluating Intel's offerings when Conroe arrives. If all goes well, we will have a cooler running, faster processor with Conroe that may provide some even tougher competition for AMD's Athlon 64 X2.

While we're not emphatically recommending Intel's latest and greatest, we are impressed with Intel's transition to 65nm thus far. If Intel can use Cedar Mill and Presler to ramp up their 65nm process, hopefully it will be primed and ready for Conroe's introduction in 2006. From what we've seen of Yonah, Intel does have their work cut out for them in order to truly regain the performance crown with Conroe, but anything is possible. A successful migration to 65nm would be a definite step in the right direction for Intel.

More than anything, we're hoping not to be disappointed by Conroe. We vividly remember recommending to wait for the original Pentium 4's release and then once more for Prescott's release, and both times being terribly disappointed by Intel's decisions. Let's hope that with the Pentium M team at the helm, Conroe's introduction will be a change of tradition for Intel.
