
Original Link: https://www.anandtech.com/show/1910
Intel's Pentium Extreme Edition 955: 65nm, 4 threads and 376M transistors
by Anand Lal Shimpi on December 30, 2005 11:36 AM EST. Posted in CPUs
Intel's move to their 65nm process has gone extremely well. We've had 65nm Presler, Cedar Mill and Yonah samples for the past couple of months now and they have been just as good as final, shipping silicon. Just a couple of months ago, we previewed Intel's 65nm Pentium 4, showcased its reduction in power consumption and took an early look at the overclocking potential of the chips.
Intel's 65nm Pentium 4s will be the last Pentium 4s to come out of Santa Clara. While we'd strongly suggest waiting to upgrade until we've seen what Conroe will bring us, there are those who can't wait another six months. For those who are building or buying systems today, we need to find out if Intel's 65nm Pentium 4 processors are any more worthwhile than the rather disappointing chips that we had at 90nm.
The move to 90nm for Intel was highly anticipated, but it could not have been any more disappointing from a performance standpoint. In a since abandoned quest for higher clock speeds, Intel brought us Prescott at 90nm with its 31 stage pipeline - up from 20 stages in the previous generation Pentium 4s. Through some extremely clever and effective engineering, Prescott actually wasn't any slower than its predecessors, despite the increase in pipeline stages. What Prescott did leave us with, however, was a much higher power bill. Deeply pipelined processors generally consume a lot more power, and Prescott did just that.
Intel tried to minimize the negative effects of Prescott as much as possible through technologies like their Enhanced Intel SpeedStep (EIST). However, at the end of the day, the fastest Athlon 64 consumed less power under full load than the slowest Prescott at idle. Considering that most PCs actually spend the majority of their time idling, this was truly a letdown from Intel.
With 65nm, the architecture of the chips won't change at all - in fact, the single-core 65nm Pentium 4s based on the Cedar Mill core will be identical to the current Pentium 4 600 series that we have today (with the inclusion of Intel's Virtualization Technology). So with no architectural changes, the power consumption at 65nm should be lower than at 90nm. As we found in our first article on Intel's 65nm chips, power consumption did indeed go down quite a bit; however, it's still not low enough to be better than AMD. It will take Conroe before Intel can offer a desktop processor with lower power consumption than AMD's 90nm Athlon 64 line.
In an odd move, just before the end of 2005, Intel is introducing their first 65nm processor. Not the Cedar Mill based Pentium 4 and not even the Presler based Pentium D, but rather the Presler based Pentium Extreme Edition 955.
The Presler core is Intel's dual-core 65nm successor to Smithfield, which as you will remember was Intel's first dual-core processor. Presler does actually offer one architectural improvement over Smithfield and that is the use of a 2MB L2 cache per core, up from 1MB per core in Smithfield. Other than that, Presler is pretty much a die-shrunk version of Smithfield.
With 2MB cache on each core, the transistor count of Presler has gone up a bit. While Smithfield weighed in at a whopping 230M transistors, Presler is now up to 376M. The move to 65nm has actually made the chip smaller at 162 mm², down from 206 mm². With a smaller die size, Presler is actually cheaper for Intel to make than Smithfield, despite having twice the cache. Equally impressive is that Cedar Mill, the single core version, measures in at a meager 81 mm².
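As a rough back-of-the-envelope check on those figures, the sketch below computes transistor density from the counts and die areas quoted above. The function name is our own, and the calculation ignores the fact that cache and logic pack at different densities, so treat it as an illustration rather than a process comparison.

```python
# Transistor counts (millions) and die areas (mm^2) quoted in the text.
chips = {
    "Smithfield (90nm)": (230, 206),
    "Presler (65nm)": (376, 162),
}

def density_m_per_mm2(transistors_m, area_mm2):
    """Millions of transistors per square millimetre."""
    return transistors_m / area_mm2

for name, (transistors_m, area_mm2) in chips.items():
    print(f"{name}: {density_m_per_mm2(transistors_m, area_mm2):.2f} M/mm^2")

# Ratio of the two densities: how much more tightly Presler is packed.
ratio = density_m_per_mm2(376, 162) / density_m_per_mm2(230, 206)
print(f"Presler packs about {ratio:.1f}x as many transistors per mm^2")
```

Much of that jump comes from the doubled L2 cache, which is denser than logic, so the ratio overstates what the shrink alone delivers.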
The Extreme Edition incarnation of Presler brings back support for the 1066MHz FSB, which you may remember was lost with the original move to dual-core. Given that both cores on the chip have to share the same bus, more FSB bandwidth will always help performance.
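To put the FSB change in numbers, here is a minimal sketch of peak theoretical bus bandwidth. It assumes the 64-bit (8-byte) wide front-side bus that Pentium 4 class processors use, which is not stated in this article, and it treats the quoted FSB figure as the transfer rate.

```python
BUS_WIDTH_BYTES = 8  # assumption: 64-bit data bus, not stated in the article

def fsb_peak_gb_s(transfers_mhz):
    """Peak theoretical bandwidth in GB/s (decimal) for a given transfer rate."""
    return transfers_mhz * 1e6 * BUS_WIDTH_BYTES / 1e9

for rate in (800, 1066):
    print(f"{rate}MHz FSB: {fsb_peak_gb_s(rate):.1f} GB/s peak, shared by both cores")
```

These are ceilings rather than measured throughput; real-world bandwidth will be lower.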
The Pentium Extreme Edition 955 runs at 3.46GHz (1066MHz FSB), thus giving it a clock speed advantage over all of Intel's other dual-core processors. And as always, the EE chip offers Hyper Threading support on each of its two cores allowing the chip to handle a maximum of four threads at the same time. Since it's an Extreme Edition chip, the 955 will be priced at $999. If you're curious about the cheaper, non-Extreme versions of Presler, here is Intel's 65nm dual-core roadmap for 2006:
Intel Dual Core Desktop
CPU | Core | Clock | FSB | L2 Cache |
??? | Conroe | ??? | ??? | 4MB |
??? | Conroe | ??? | ??? | 2MB |
950 | Presler | 3.4GHz | 800MHz | 2x2MB |
940 | Presler | 3.2GHz | 800MHz | 2x2MB |
930 | Presler | 3.0GHz | 800MHz | 2x2MB |
920 | Presler | 2.8GHz | 800MHz | 2x2MB |
As you can see, the Extreme Edition 955 will be the first, but definitely not the only dual-core 65nm processor out in the near future, so don't let the high price tag worry you. The remaining 900 series Pentium D chips should come with prices much closer to the equivalent 800 series.
Power Consumption and The Test
When we went to measure power consumption on our Pentium EE 955 platform, we were met with some extremely troubling results. Not only did we not see the power consumption figures that we originally saw with Presler and Cedar Mill a couple of months back, but power consumption was actually higher at 65nm than it was at 90nm. We contacted Intel and were assured that the problem lay with our motherboard, and a new motherboard is en route to us now. When we receive it, we will take another look at power consumption to get an idea of the final state of Intel's 65nm power consumption, but until then, we don't want to draw any conclusions based on what we've seen.
The Test
CPU: | AMD Athlon 64 X2 4800+ (2.4GHz/1MBx2); AMD Athlon 64 X2 3800+ (2.0GHz/512KBx2); AMD Athlon 64 FX-57 (2.8GHz/1MB); Intel Pentium Extreme Edition 955 (3.46GHz/2MBx2); Intel Pentium Extreme Edition 840 (3.2GHz/1MBx2); Intel Pentium D 820 (2.8GHz/1MBx2) |
Motherboard: | ASUS A8N-SLI Deluxe; Intel BadAxe 975X |
Motherboard BIOS: | ASUS: Version 1013 Dated 08/10/2005 |
Chipset: | NVIDIA nForce4 SLI; Intel 975X |
Chipset Drivers: | nForce4 6.70; Intel 7.0.0.1020 |
Memory: | OCZ PC3500 DDR 2-2-2-7; DDR2-667 5-5-5-15 |
Video Card: | ATI Radeon X1800 XT |
Video Drivers: | ATI Catalyst 5.13 |
Desktop Resolution: | 1280 x 1024 - 32-bit @ 60Hz |
OS: | Windows XP Professional SP2 |
Literally Dual Core
One of the major changes with Presler is that unlike Smithfield, the two cores are not a part of the same piece of silicon. Instead, you actually have a single chip with two separate die on it. By splitting the die in two, Intel can reduce total failure rates and even be far more flexible with their manufacturing (since one Presler chip is nothing more than two Cedar Mill cores on a single package).
The chip at the bottom of the image is Presler; note the two individual cores.
In order to find out if there was an appreciable increase in core-to-core communication latency, we used a tool called Cache2Cache, which Johan first used in his series on multi-core processors. Johan's description of the utility follows:
"Michael S. started this extremely interesting thread at the Ace's hardware Technical forum. The result was a little program coded by Michael S. himself, which could measure the latency of cache-to-cache data transfer between two cores or CPUs. In his own words: "it is a tool for comparison of the relative merits of different dual-cores."
"Cache2Cache measures the propagation time from a store by one processor to a load by the other processor. The results that we publish are approximately twice the propagation time. For those interested, the source code is available here."
Armed with Cache2Cache, we looked at the added latency seen by Presler over Smithfield:
CPU | Cache2Cache Latency in ns (Lower is Better) |
AMD Athlon 64 X2 4800+ | 101 |
Intel Smithfield 2.8GHz | 253.1 |
Intel Presler 2.8GHz | 244.2 |
Not only did we not find an increase in latency between the two cores on Presler, communication actually occurs faster than on Smithfield. We made sure that it had nothing to do with the faster FSB by clocking the chip at 2.8GHz with an 800MHz FSB and repeated the tests only to find consistent results.
We're not sure why, but core-to-core communication is faster on Presler than on Smithfield. That being said, a difference of less than 9ns just isn't going to be noticeable in the real world - given that we've already seen that the Athlon 64 X2's 100ns latency doesn't really help it scale better when going from one to two cores.
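For a sense of scale, the short sketch below restates the Cache2Cache gap between the two Intel chips as a percentage and as core clock cycles. It only rearranges the numbers from the table above; the variable names are ours.

```python
# Cache2Cache results (ns) from the table above.
latency_ns = {
    "AMD Athlon 64 X2 4800+": 101.0,
    "Intel Smithfield 2.8GHz": 253.1,
    "Intel Presler 2.8GHz": 244.2,
}
CLOCK_GHZ = 2.8  # clock used for both Intel chips in this test

smithfield = latency_ns["Intel Smithfield 2.8GHz"]
presler = latency_ns["Intel Presler 2.8GHz"]

delta_ns = smithfield - presler
delta_pct = delta_ns / smithfield * 100
delta_cycles = delta_ns * CLOCK_GHZ  # ns multiplied by cycles per ns

print(f"Presler is {delta_ns:.1f} ns ({delta_pct:.1f}%) quicker than Smithfield")
print(f"That is roughly {delta_cycles:.0f} core clock cycles at {CLOCK_GHZ}GHz")
```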
Larger L2, but no increase in latency?
When Prescott first got a 2MB L2 cache, we noticed that along with a larger L2 came a 17% increase in access latency. The end result was a mixed bag of performance, with some applications benefitting from the larger cache while others were hampered by the increase in L2 latency. Overall, the end result was that the two performance elements balanced each other out and Prescott 2M generally offered no real performance improvement over the 1MB version.
With Presler, each core also gets an upgraded 2MB cache, as compared to the 1MB L2 cache found in Smithfield. The upgrade is similar to what we saw with Prescott, so we assumed that along with a larger L2 cache per core, Presler's L2 cache also received an increase in L2 cache latency over Smithfield.
In order to confirm, we ran ScienceMark 2.0 and Cachemem:
CPU | Cachemem L2 Latency (128KB block, 64-byte stride) | ScienceMark L2 Latency (64-byte stride) |
AMD Athlon 64 X2 4800+ | 17 cycles | 17 cycles |
Intel Smithfield 2.8GHz | 27 cycles | 27 cycles |
Intel Presler 2.8GHz | 27 cycles | 27 cycles |
Intel Prescott 2M | 27 cycles | 27 cycles |
Intel Prescott 1M | 23 cycles | 23 cycles |
What we found was extremely interesting: Presler does have the same 27 cycle L2 cache as Prescott 2M, but so does Smithfield. We had simply taken for granted that Smithfield was nothing more than two Prescott 1M cores put together, but this data shows us that Smithfield actually had the same higher latency L2 cache as Prescott 2M.
Although we were expecting Presler to give us a higher latency L2 over Smithfield, it looks like Smithfield actually had a higher latency L2 to begin with. This means that, at the same clock speed, Presler will be at least as fast as Smithfield, if not faster. Normally, we take for granted that a new core means better performance, but Intel has let us down in the past; luckily, this time we're not put in such a situation.
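Cycle counts are only comparable at equal clock speeds, so the sketch below converts the L2 latencies to nanoseconds for the chips whose clocks are given in this article. It also reproduces the roughly 17% penalty that Prescott 2M paid over Prescott 1M. The helper name is ours; the Prescott parts are left out of the nanosecond conversion because their clocks aren't listed here.

```python
def cycles_to_ns(cycles, clock_ghz):
    """Convert a latency in core clock cycles to nanoseconds."""
    return cycles / clock_ghz

# (L2 latency in cycles, clock in GHz) for the chips whose clocks are given.
chips = {
    "AMD Athlon 64 X2 4800+": (17, 2.4),
    "Intel Smithfield": (27, 2.8),
    "Intel Presler": (27, 2.8),
}
for name, (cycles, ghz) in chips.items():
    print(f"{name}: {cycles} cycles = {cycles_to_ns(cycles, ghz):.2f} ns")

# Relative cost of the larger cache: Prescott 1M (23 cycles) vs Prescott 2M (27).
increase_pct = (27 - 23) / 23 * 100
print(f"Prescott 1M -> 2M: {increase_pct:.0f}% more L2 latency")
```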
Presler vs. Smithfield - A Brief Look
Other than the larger L2 cache, Presler as incorporated in the Pentium Extreme Edition 955 provides us with two more enhancements over Smithfield: 1066MHz FSB support and a higher clock speed (3.46GHz).
We wanted to isolate the performance improvement due to the larger L2 cache aside from the other improvements to Presler, so we underclocked our sample and its FSB, and compared it to a Pentium D 820 (2.8GHz).
Looking at a small subset of our tests, we can get a feel for where you can expect the largest performance gains due simply to the increase in L2 cache size. Remember that since L2 access latency on Smithfield was already at 27 cycles, Presler's cache isn't any slower, so what we end up measuring is how large of an impact a 2MB cache has in some of our benchmarks.
Winstone | Business Winstone 2004 | Multimedia Content Creation Winstone 2004 |
Presler | 19.0 | 30.2 |
Smithfield | 18.5 | 29.9 |
Under Business Winstone 2004, we see a boost of just under 3%, thanks to the larger cache size. We have seen the biggest improvements in Winstone, thanks to lower latency caches and higher clock speeds, so it's not too much of a surprise to see a minimal impact here. Content Creation Winstone 2004 shows no real performance impact either.
Media Encoding | 3dsmax 7 Composite | DVD Shrink | WME9 | H.264 | iTunes |
Presler | 2.03 | 9.1m | 31.3fps | 10.5m | 50s |
Smithfield | 2.05 | 8.9m | 31.0fps | 10.5m | 50s |
Our 3D rendering, video encoding and audio encoding tests basically all agree with the earlier results - the added cache doesn't really improve performance here, but that's to be expected, given the nature of the applications (and the already quite large 1MB L2 cache to which we are comparing).
Gaming | Battlefield 2 | Call of Duty 2 | Quake 4 |
Presler | 77.3 | 76.2 | 130.6 |
Smithfield | 73.0 | 75.6 | 125.5 |
It isn't until we look at some of our 3D gaming tests that we start to see some more tangible performance gains. In games, there are some decent performance improvements to be had, ranging anywhere from 0 to just under 6%, thanks to the larger cache alone.
Couple the larger cache with a faster FSB and higher clock speed, and the Pentium Extreme Edition 955 is shaping up to be a decent improvement over its predecessor.
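The gaming percentages quoted above can be reproduced directly from the table. This is a minimal sketch with the frame rates typed in by hand:

```python
# Average frame rates from the gaming table above (Presler, Smithfield).
results = {
    "Battlefield 2": (77.3, 73.0),
    "Call of Duty 2": (76.2, 75.6),
    "Quake 4": (130.6, 125.5),
}

# Percent gain of Presler over Smithfield, attributable to the larger cache.
gains = {
    game: (presler - smithfield) / smithfield * 100
    for game, (presler, smithfield) in results.items()
}
for game, gain in gains.items():
    print(f"{game}: {gain:+.1f}%")
```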
Multi-core support in Games?
Both Quake 4 and Call of Duty 2 now have SMP support, supposedly offering performance improvements on dual core and/or Hyper Threading enabled processors.
For Call of Duty 2, you simply install the new patch and off you go; SMP support is enabled. To verify, we ran our CoD 2 benchmark and kept a log of the total processor utilization over time. Below is a shot of perfmon with a fresh install of CoD2 (sans SMP patch):
Now, let's look at CoD2 CPU utilization with the SMP patch installed:
We looked at performance at 1024x768. The higher the resolution, the smaller the impact of a faster CPU; the lower the resolution, the greater the impact, as the game becomes less GPU limited.
To ensure a fair comparison, we tested using the SMP patch and simply disabled SMP manually by setting the r_smp_backend variable to "0". We confirmed that SMP support was actually disabled by running perfmon and measuring CPU utilization.
Call of Duty 2 | SMP Disabled | SMP Enabled |
AMD Athlon 64 FX-57 (2.8GHz) | 80.6 | N/A |
AMD Athlon 64 X2 4800+ (2.4GHz) | 79.8 | 70.3 |
AMD Athlon 64 X2 3800+ (2.0GHz) | 78.7 | 68.1 |
Intel Pentium Extreme Edition 955 (3.46GHz) | 79.8 | 68.4 |
Intel Pentium Extreme Edition 840 (3.2GHz) | 78.1 | 68 |
Intel Pentium D 820 (2.8GHz) | 75.6 | 67.1 |
Surprisingly enough, we actually saw pretty large performance drops in CoD2 with SMP enabled across both AMD and Intel platforms. This is unfortunate, but the withdrawn SMP support of Quake 3 makes it less than shocking. We do expect that things will get better as time goes on.
Quake 4 was a different story; with r_useSMP enabled, we saw some extremely large performance gains with the move to dual core:
Quake 4 | SMP Disabled | SMP Enabled |
AMD Athlon 64 FX-57 (2.8GHz) | 115.4 | N/A |
AMD Athlon 64 X2 4800+ (2.4GHz) | 114.9 | 147.4 |
AMD Athlon 64 X2 3800+ (2.0GHz) | 100.9 | 143.2 |
Intel Pentium Extreme Edition 955 (3.46GHz) | 98.9 | 142.3 |
Intel Pentium Extreme Edition 840 (3.2GHz) | 89.0 | 133.6 |
Intel Pentium D 820 (2.8GHz) | 80.6 | 125.5 |
The dual core with Hyper Threading platform gained nothing at all in performance over dual core alone: either the SMP patch only spawns two threads, or the instruction mix of Quake 4 with the patch does not mix well with Intel's Pentium EE 955.
While we're only looking at two games, this is a start for multithreaded game development. You can expect to see a lot of examples where dual-core does absolutely nothing for gaming, but as time goes on, the situation will change.
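To make the contrast between the two games easier to see, the sketch below turns both SMP tables into percent changes per CPU. The FX-57 is omitted because it is single core and has no SMP-enabled result; the function name is ours.

```python
# (SMP disabled, SMP enabled) average fps from the two tables above.
cod2 = {
    "Athlon 64 X2 4800+": (79.8, 70.3),
    "Athlon 64 X2 3800+": (78.7, 68.1),
    "Pentium EE 955": (79.8, 68.4),
    "Pentium EE 840": (78.1, 68.0),
    "Pentium D 820": (75.6, 67.1),
}
quake4 = {
    "Athlon 64 X2 4800+": (114.9, 147.4),
    "Athlon 64 X2 3800+": (100.9, 143.2),
    "Pentium EE 955": (98.9, 142.3),
    "Pentium EE 840": (89.0, 133.6),
    "Pentium D 820": (80.6, 125.5),
}

def smp_scaling(table):
    """Percent change in fps when SMP is switched on, per CPU."""
    return {cpu: (on - off) / off * 100 for cpu, (off, on) in table.items()}

for title, table in (("Call of Duty 2", cod2), ("Quake 4", quake4)):
    print(title)
    for cpu, change in smp_scaling(table).items():
        print(f"  {cpu}: {change:+.1f}%")
```

On these numbers, every CPU loses more than 10% in Call of Duty 2, while in Quake 4 the gains are largest on the chips with the lowest SMP-disabled scores.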
Dual Core and Hyper Threading: Detriment or Not?
A question that we've always had is whether or not the inclusion of Hyper Threading support on Intel's dual-core Extreme Edition processors actually improves performance. To answer that question, we have to look at two separate situations: multithreaded application performance and multitasking performance.
For multithreaded application performance, we can now turn to a number of benchmarks. We'll start off with 3dsmax 7 (higher numbers are better for the composite score, lower numbers are better for the rest of the numbers):
3dsmax 7 | Composite Score | 3dsmax 5 rays | CBALLS2 | SinglePipe2 | UnderWater |
HT Enabled | 3.0 | 12.922s | 17.297s | 83.515s | 119.641s |
HT Disabled | 2.51 | 14.937s | 21.141s | 102.734s | 141.641s |
Here, the performance advantage is clear - enabling Hyper Threading provides Intel with another 14-19% over the base dual core Presler. The same applies to almost all of the media encoding tests (if minutes or seconds are specified, lower numbers mean better performance):
Media Encoding | DVD Shrink | WME9 | H.264 | iTunes |
HT Enabled | 7.1m | 46.5fps | 9.96m | 38s |
HT Disabled | 8.0m | 38.6fps | 8.53m | 40s |
Our Quicktime 7 H.264 encoding test is, generally speaking, an outlier from what we've seen of the impact of HT on multithreaded applications. The rest of the applications show a clear benefit to being able to execute four threads simultaneously, even if the execution resources of the cores are shared with the remaining two threads.
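Because these tables mix higher-is-better scores with lower-is-better times, it helps to normalize everything to a single speedup figure. The sketch below does that for the 3dsmax and media encoding results above; the helper and its `higher_is_better` flag are our own convention.

```python
# (test, HT enabled, HT disabled, higher_is_better) from the two tables above.
results = [
    ("3dsmax 7 composite", 3.0, 2.51, True),
    ("3dsmax 5 rays (s)", 12.922, 14.937, False),
    ("CBALLS2 (s)", 17.297, 21.141, False),
    ("SinglePipe2 (s)", 83.515, 102.734, False),
    ("UnderWater (s)", 119.641, 141.641, False),
    ("DVD Shrink (min)", 7.1, 8.0, False),
    ("WME9 (fps)", 46.5, 38.6, True),
    ("H.264 (min)", 9.96, 8.53, False),
    ("iTunes (s)", 38, 40, False),
]

def ht_speedup(enabled, disabled, higher_is_better):
    """Speedup from enabling HT; values above 1.0 mean HT helped."""
    return enabled / disabled if higher_is_better else disabled / enabled

speedups = {name: ht_speedup(on, off, hib) for name, on, off, hib in results}
for name, s in speedups.items():
    verdict = "faster" if s > 1 else "slower"
    print(f"{name}: {s:.2f}x ({verdict} with HT)")
```

Normalized this way, the H.264 test is the only entry that falls below 1.0.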
Armed with the latest SMP patches for Call of Duty 2 and Quake 4 (SMP was enabled in both games), we can also take a look at the impact of HT on Presler:
Gaming | Call of Duty 2 | Quake 4 |
HT Enabled | 68.4 | 142.3 |
HT Disabled | 69.3 | 142.3 |
Call of Duty 2 is another example where HT actually reduces performance, but given that enabling SMP itself reduces performance, we'd venture a guess that you shouldn't really be drawing any conclusions based on its data. Quake 4, on the other hand, shows no difference in performance with HT on or off.
From what we've seen, with most individual multithreaded applications, enabling HT will improve performance even if you have a dual core processor. The degree of performance improvement will vary from application to application, but generally speaking, it's going to be positive (if anything at all).
The more interesting situation is what happens when you're multitasking - does Hyper Threading really help on top of the inherent benefits of a dual core processor? To find out, we put together a couple of multitasking scenarios aided by a tool that Intel provided us to help all of the applications start at the exact same time. We're not necessarily concerned with the actual performance of these applications, but rather with the impact that the number of simultaneous applications has on each other and how that varies with HT being enabled or not.
We took five applications (Grisoft AVG Anti-Virus 7, Lame MP3 Encoder 3.97a, Windows Media Encoder 9, Info-ZIP extraction utility and Splinter Cell: Chaos Theory) and used various combinations of them to try to figure out if there are multitasking benefits to a dual core processor with Hyper Threading enabled. Note that some of these applications are multithreaded themselves, so just because we chose five applications doesn't mean that there are only five threads of execution; in reality, there are many more.
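We can't share Intel's launcher, but the idea of starting several workloads at the same instant and timing each one can be sketched with a thread barrier. Everything here is hypothetical: the `run_together` helper is our own, and the sleeping lambdas are stand-ins for the real applications, which a proper harness would launch as separate processes.

```python
import threading
import time

def run_together(tasks):
    """Start every task at (nearly) the same instant; return seconds per task.

    `tasks` maps a label to a zero-argument callable. The callables used
    below are placeholders, not the applications from the article.
    """
    barrier = threading.Barrier(len(tasks))
    elapsed = {}

    def worker(label, func):
        barrier.wait()               # block until every worker is ready
        start = time.perf_counter()
        func()
        elapsed[label] = time.perf_counter() - start

    threads = [threading.Thread(target=worker, args=item) for item in tasks.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return elapsed

# Stand-in workloads; a real harness would launch the actual programs,
# for example with subprocess.run([...]).
demo = {
    "scan": lambda: time.sleep(0.02),
    "encode": lambda: time.sleep(0.01),
}
times = run_together(demo)
print({label: round(seconds, 3) for label, seconds in times.items()})
print("total:", round(sum(times.values()), 3))
```

The "total" printed here is the sum of the individual completion times, which is the same convention used for the Total column in our results.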
We tested four different scenarios:
- A virus scan + MP3 encode
- The first scenario + a Windows Media encode
- The second scenario + unzipping files, and
- The third scenario + our Splinter Cell: CT benchmark.
AMD Athlon 64 X2 4800+ | AVG | LAME | WME | ZIP | Total |
AVG + LAME | 22.9s | 13.8s | - | - | 36.7s |
AVG + LAME + WME | 35.5s | 24.9s | 29.5s | - | 90.0s |
AVG + LAME + WME + ZIP | 41.6s | 38.2s | 40.9s | 56.6s | 177.3s |
AVG + LAME + WME + ZIP + SCCT | 42.8s | 42.2s | 46.6s | 65.9s | 197.5s |
Intel Pentium EE 955 (no HT) | AVG | LAME | WME | ZIP | Total |
AVG + LAME | 24.8s | 13.7s | - | - | 38.5s |
AVG + LAME + WME | 39.2s | 22.5s | 32.0s | - | 93.7s |
AVG + LAME + WME + ZIP | 47.1s | 37.3s | 45.0s | 62.0s | 191.4s |
AVG + LAME + WME + ZIP + SCCT | 40.3s | 47.7s | 58.6s | 83.3s | 229.9s |
Intel Pentium EE 955 (HT Enabled) | AVG | LAME | WME | ZIP | Total |
AVG + LAME | 25.0s | 13.3s | - | - | 38.3s |
AVG + LAME + WME | 34.4s | 21.6s | 30.2s | - | 86.2s |
AVG + LAME + WME + ZIP | 41.5s | 28.1s | 37.7s | 54.2s | 161.5s |
AVG + LAME + WME + ZIP + SCCT | 51.4s | 33.0s | 45.3s | 71.1s | 200.8s |
As you can see, the Presler setup with HT enabled takes less time to complete the tasks as soon as you get beyond two simultaneous applications than the Presler system without HT enabled. However, including the Athlon 64 X2 4800+ in the picture, we see that despite only being able to execute two threads at the same time, it does just as good of a job as the Presler HT system that can execute twice as many threads. But to get the full picture, we have to measure one last data point: Splinter Cell performance.
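To quantify how much Hyper Threading helps as the load grows, the sketch below sums the per-application times for the two Presler configurations and reports the time saved in each scenario. The numbers are copied from the tables above; the dictionary layout is ours.

```python
# Per-application completion times (seconds) from the Presler tables above.
no_ht = {
    "AVG + LAME": [24.8, 13.7],
    "AVG + LAME + WME": [39.2, 22.5, 32.0],
    "AVG + LAME + WME + ZIP": [47.1, 37.3, 45.0, 62.0],
    "AVG + LAME + WME + ZIP + SCCT": [40.3, 47.7, 58.6, 83.3],
}
ht = {
    "AVG + LAME": [25.0, 13.3],
    "AVG + LAME + WME": [34.4, 21.6, 30.2],
    "AVG + LAME + WME + ZIP": [41.5, 28.1, 37.7, 54.2],
    "AVG + LAME + WME + ZIP + SCCT": [51.4, 33.0, 45.3, 71.1],
}

saved = {}
for scenario in no_ht:
    total_off = sum(no_ht[scenario])
    total_on = sum(ht[scenario])
    # Percent of total completion time saved by enabling HT.
    saved[scenario] = (total_off - total_on) / total_off * 100
    print(f"{scenario}: {total_off:.1f}s -> {total_on:.1f}s "
          f"({saved[scenario]:.1f}% less time with HT)")
```

With only two applications the difference is within noise, but it grows to double digits once four or more are running.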
In the fourth scenario, we ran a total of five applications: AVG, Lame, WME, InfoZip and Splinter Cell. The first four applications took a total of 197.5 seconds to complete on the Athlon 64 X2 4800+ system, ever so slightly quicker than the 200.8 seconds of the Presler HT system. However, that does not take into account Splinter Cell performance - now let's see how our fifth application fared:
Splinter Cell: CT | Average | Min | Max |
Intel Pentium EE 955 (no HT) | 71.0 fps | 27.8 fps | 128.1 fps |
Intel Pentium EE 955 (HT enabled) | 77.2 fps | 32.5 fps | 139.6 fps |
AMD Athlon 64 X2 4800+ | 66.9 fps | 10.5 fps | 185.0 fps |
The Athlon 64 X2 4800+ actually is faster in the Splinter Cell: CT benchmark without anything else running, but here we see a very different story. Although its 66.9 fps average frame rate is reasonably competitive with the 77.2 fps of the Presler HT system, its minimum frame rate is barely over 10 fps - approximately 1/3 that of the Presler HT.
While the regular Presler setup without HT managed to pull in higher frame rates than the AMD system, it did so while performing significantly worse in the remaining four applications. The Presler HT vs. Athlon 64 X2 comparison is important because the two are virtually tied in the performance of the first four applications - but juggling all five of the applications is better done on the Presler HT system.
We would say that if implemented properly, the benefits of an SMT system like Hyper Threading are definitely a good companion to a dual core desktop processor. The usable limit, even for today's applications and usage models, is far from just two threads.
Overall Performance using Winstone 2004
Business Winstone 2004
Business Winstone 2004 tests the following applications in various usage scenarios:
. Microsoft Access 2002
. Microsoft Excel 2002
. Microsoft FrontPage 2002
. Microsoft Outlook 2002
. Microsoft PowerPoint 2002
. Microsoft Project 2002
. Microsoft Word 2002
. Norton AntiVirus Professional Edition 2003
. WinZip 8.1
Multimedia Content Creation Winstone 2004
Multimedia Content Creation Winstone 2004 tests the following applications in various usage scenarios:
. Adobe® Photoshop® 7.0.1
. Adobe® Premiere® 6.50
. Macromedia® Director MX 9.0
. Macromedia® Dreamweaver MX 6.1
. Microsoft® Windows Media™ Encoder 9 Version 9.00.00.2980
. NewTek's LightWave® 3D 7.5b
. Steinberg™ WaveLab™ 4.0f
All chips were tested with Lightwave set to spawn 4 threads.
Overall Performance using SYSMark 2004
Office Productivity SYSMark 2004
SYSMark's Office Productivity suite consists of three tests, the first of which is the Communication test. The Communication test consists of the following:
"The user receives an email in Outlook 2002 that contains a collection of documents in a zip file. The user reviews his email and updates his calendar while VirusScan 7.0 scans the system. The corporate web site is viewed in Internet Explorer 6.0. Finally, Internet Explorer is used to look at samples of the web pages and documents created during the scenario."
The next test is Document Creation performance:
"The user edits the document using Word 2002. He transcribes an audio file into a document using Dragon NaturallySpeaking 6. Once the document has all the necessary pieces in place, the user changes it into a portable format for easy and secure distribution using Acrobat 5.0.5. The user creates a marketing presentation in PowerPoint 2002 and adds elements to a slide show template."
The final test in our Office Productivity suite is Data Analysis, which BAPCo describes as:
"The user opens a database using Access 2002 and runs some queries. A collection of documents are archived using WinZip 8.1. The queries' results are imported into a spreadsheet using Excel 2002 and are used to generate graphical charts."
ICC SYSMark 2004
The first category that we will deal with is 3D Content Creation. The tests that make up this benchmark are described below:
"The user renders a 3D model to a bitmap using 3ds max 5.1, while preparing web pages in Dreamweaver MX. Then the user renders a 3D animation in a vector graphics format."
Next, we have 2D Content Creation performance:
"The user uses Premiere 6.5 to create a movie from several raw input movie cuts and sound cuts and starts exporting it. While waiting on this operation, the user imports the rendered image into Photoshop 7.01, modifies it and saves the results. Once the movie is assembled, the user edits it and creates special effects using After Effects 5.5."
The Internet Content Creation suite is rounded up with a Web Publishing performance test:
"The user extracts content from an archive using WinZip 8.1. Meanwhile, he uses Flash MX to open the exported 3D vector graphics file. He modifies it by including other pictures and optimizes it for faster animation. The final movie with the special effects is then compressed using Windows Media Encoder 9 series in a format that can be broadcast over broadband Internet. The web site is given the final touches in Dreamweaver MX and the system is scanned by VirusScan 7.0."
Overall Performance using WorldBench 5
Our final set of overall system performance tests comes from WorldBench 5, which is a pretty good tool for looking at older application performance as well as single-threaded performance.
3D Rendering Performance using 3dsmax 7
Once again, we're using an updated version of the SPECapc 3dsmax test for version 7 of the application. The scenes being rendered haven't actually changed, but the reference numbers used to compute the composite scores have, so these scores aren't directly comparable to results from earlier SPECapc tests.
Media Encoding Performance using DVD Shrink, WME9, Quicktime and iTunes
First up is DVD Shrink 3.2.0.15. Our test was simple - we took a copy of Star Wars Episode VI and ripped the full DVD to the hard drive without compression, effectively giving us an exact copy of the disc on the hard drive. Then, using the copy of the DVD on the hard drive (to eliminate any DVD drive bottlenecks), we performed a DVD shrink operation to shrink the movie to fit on a single 4.5GB DVD disc. All of the options were left on their defaults, so the test ends up being pretty easy to run and reproduce. The scores reported are DVD encoding times in minutes, with lower numbers meaning better performance.
The DVD Shrink test is quite important as DVD Shrink is quite possibly one of the easiest tools to rip a DVD. The easier a tool is to use, the more likely it's going to be used, and arguably, the more important performance using it happens to be.
Moving on, we have our Windows Media Encoder 9 test, which uses the advanced profile settings for video encoding. We left all settings at their defaults and just proceeded with an MPEG-2 to WMV-HD conversion. The values reported are in frames per second, with higher numbers being better.
Next up, we have Quicktime Pro 7.0.3 and we perform an MPEG-2 to H.264 encoding task. All of the settings are left at their defaults, with the exception that we optimize the output file for download with a 256kbps data rate while leaving the resolution untouched. We also adjust the video options to optimize for the best quality. We report the transcoding time in minutes, with lower values being better.
Finally, we have an MP3 encoding test using iTunes 6.0.1.3. For this test, we simply took a 304MB wav file and converted it to a 192kbps MP3 file, measuring the encode time in seconds. The only iTunes option that we changed was to prevent the playback of the song while encoding.
Gaming Performance using Battlefield 2, Call of Duty 2 and Quake 4
Gaming performance is pretty respectable for the Pentium EE 955, with the chip being quite competitive with AMD's Athlon 64 X2 4800+.
The most interesting thing we found is that even with a high end GPU like the Radeon X1800 XT, a number of games are still quite GPU limited even at 1024x768, which is why you don't see F.E.A.R. and Splinter Cell: CT here. Even some of the games that we did include required us to turn down some of the detail settings to start to stress the CPUs.
The pendulum often swings between games being CPU and GPU limited, and it seems that with the latest generation of games, we are definitely more GPU limited.
Final Words
The Pentium Extreme Edition 955 finally starts to bring some respectable performance to Intel's high end processors, but there is no clear cut victory. In applications and usage scenarios where the EE's ability to execute four threads simultaneously comes into play, it generally can remain quite competitive with the Athlon 64 X2 4800+. However, looking at older applications, single threaded scenarios and some multithreaded applications that aren't optimized for more than two threads, the EE 955 falls significantly behind.
There are a few other conclusions that we can draw based on what we've seen thus far. For starters, Hyper Threading is quite important to the performance of the Extreme Edition 955. While it isn't always perfect, when under very heavy multitasking loads, the ability to execute more threads translates into better overall performance for the entire system.
We've also been able to take an early look at the state of multithreaded game development, through the latest Call of Duty 2 and Quake 4 patches. Although the performance in CoD2 was terrible in SMP mode, Quake 4 gave us some hope, with performance gains approaching the 50% mark on dual core processors at CPU bound resolutions.
As far as the processor at hand is concerned, Intel has done a reasonable job with the Pentium EE 955, but with Conroe not too far away, we just can't justify recommending it. If you absolutely must upgrade today, the Athlon 64 X2 is still probably going to be a better bang for your buck. However, as we have seen in the benchmarks, there are advantages to being able to execute four threads simultaneously.
It is pretty much a toss-up at this point, but we'd recommend sticking with AMD for now and re-evaluating Intel's offerings when Conroe arrives. If all goes well, we will have a cooler running, faster processor with Conroe that may provide some even tougher competition for AMD's Athlon 64 X2.
While we're not emphatically recommending Intel's latest and greatest, we are impressed with Intel's transition to 65nm thus far. If Intel can use Cedar Mill and Presler to ramp up their 65nm process, hopefully it will be primed and ready for Conroe's introduction later this year. From what we've seen of Yonah, Intel does have their work cut out for them in order to truly regain the performance crown with Conroe, but anything is possible. A successful migration to 65nm would be a definite step in the right direction for Intel.
More than anything, we're hoping not to be disappointed by Conroe. We vividly remember recommending to wait for the original Pentium 4's release and then once more for Prescott's release, and both times being terribly disappointed by Intel's decisions. Let's hope that with the Pentium M team at the helm, Conroe's introduction will be a change of tradition for Intel.