Another STAC-A2 record for Intel – what are these guys doing?
With quants developing innovative models at an unprecedented rate, it has become crucial to have access to computational power that enables a smooth and efficient run with room to do even more. In this article, James Reinders, President of James Reinders Consulting, reviews some of the research done by the Securities Technology Analysis Center (STAC) to find the best solutions that allow quant finance to enter its next evolutionary phase.
When Intel posted more great results on STAC-A2 (the Standard Benchmark Suite for Options Pricing and Risk Management), I was curious what we could learn about why Intel systems seem to consistently stay competitive.
Such year-after-year competitive results on STAC-A2 help explain their popularity with FSI customers, especially when one considers that these general-purpose machines are very useful even beyond their ability to excel at STAC-A2.
What’s the secret sauce?
The STAC-A2 website itself points out that “ongoing flow of new technologies promises to improve the speed, capacity, and accuracy of these workloads and/or reduce their cost of ownership or development. These include the latest CPUs, GPUs, and FPGAs, server architectures, programming languages and deployment environments including public and private cloud.”
Which of these secret sauces did Intel use? It turns out to be more of what we might call a solid foundation: excelling at the essential building blocks of general-purpose computing. That means using Intel® Threading Building Blocks (Intel® TBB) for parallelism, using Intel® AVX-512 (Intel’s very wide vector capability – the widest for any processor today) for vectorisation, and running on the latest Intel processors in a system well provisioned with memory.
This looks like the system/server configuration we all need for our demanding workloads! Audited results from a year ago (STAC-A2 (derivatives risk) on 448 cores with Intel® Omni-Path Architecture) still lead the industry today in how much data can be processed, something any quant can love!
Sneak a peek at the future?
Engineers at Intel have preliminary results that should continue this trend on Intel’s newest CPUs using their latest microarchitecture (Cascade Lake) in the AP version known as the Intel® Xeon® Platinum 9200 processor. We will have to wait for them to be posted as audited results on the STAC-A2 reporting website. I expect they will continue Intel’s long-running trend of repeatedly improving STAC-A2 results.
For now, there is Cascade Lake vs. Skylake data available thanks to STAC-A2 report SUT ID INTC190402, which used a pair of Intel Xeon Platinum 8280 processors (Cascade Lake microarchitecture), and report SUT ID INTC190401, which used a pair of Intel Xeon Platinum 8180 processors (Skylake microarchitecture).
The continued ramp in support for general parallelism, including wide vectors (AVX-512), came through in the comparison. Measured against the solution using Intel Xeon Platinum 8180 (“Skylake”) processors, the solution using Intel Xeon Platinum 8280 (“Cascade Lake”) processors demonstrated that:
- The Intel Xeon 8280 processor had up to 32% higher throughput and space efficiency (STAC-A2.β2.HPORTFOLIO.SPEED and STAC-A2.β2.HPORTFOLIO.SPACE_EFF, respectively)
- The Intel Xeon 8280 processor was up to 41% faster in warm runs of the large problem size (STAC-A2.β2.GREEKS.10-100k-1260.TIME.WARM)
- The Intel Xeon 8280 processor was up to 84% faster in cold runs and up to 23% faster in warm runs of the baseline problem size (STAC-A2.β2.GREEKS.TIME.COLD and STAC-A2.β2.GREEKS.TIME.WARM, respectively)
- The Intel Xeon 8280 processor was able to handle up to 16% more assets in the max assets test (105 assets vs 90 assets) (STAC-A2.β2.GREEKS.MAX_ASSETS)
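The percentages in the list above are relative differences between the two systems. As a minimal Python sketch of that arithmetic, the snippet below uses the only raw pair of numbers quoted here (105 versus 90 assets). The function name `relative_gain` is illustrative and is not part of any STAC tooling.

```python
def relative_gain(new: float, old: float) -> float:
    """Return the fractional increase of `new` over `old` (0.25 means 25% more)."""
    if old <= 0:
        raise ValueError("old must be positive")
    return (new - old) / old


# Max-assets figures quoted in the comparison above.
assets_8280 = 105
assets_8180 = 90

gain = relative_gain(assets_8280, assets_8180)
print(f"{gain:.1%} more assets")
```

The result is roughly 16.7%, consistent with the "up to 16%" figure quoted for the max assets test.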
Observation 1: Intel processor-based solutions offer leadership
As I look over the years of STAC-A2 reports, a trend is clear: according to STAC research, Intel Xeon processor-based systems have held their own against an onslaught of competition. This consistency in leadership over the years shows that code modernisation, even on older hardware, can be a worthwhile investment. Upgrading hardware and upgrading software both help, and the two do not need to happen simultaneously to deliver strong benefits when using Intel Xeon processors!
Observation 2: code modernisation matters (we cannot separate hardware and software)
Each year, submissions for STAC-A2 cite both hardware and software improvements leading to their latest results. None of the results that have been posted highlight the use of ‘years old software’ on new hardware. This is primarily because the hardware solutions utilise parallelism and other concurrency-oriented capabilities to achieve ever higher performance. Without co-tuned hardware and software, the results can be expected to be non-optimal. Four-year-old servers and four-year-old software both negatively impact performance.
Observation 3: without code modernisation, CPU flexibility still leads
This is a subtle but important point, because very few installations can claim to have modernised and optimised all of their code: without code modernisation, CPUs win by an even bigger margin. Since new hardware needs the latest highly tuned software to be at its best (and to match these amazing benchmarks), we should ask, “How will my current software run?”
Experience suggests that Intel Xeon processors are the most flexible architecture for dealing with code that is not fully modernised or optimised. While many tuning engineers will agree with me, I do not think any of us can show conclusive empirical proof of this belief. We can, however, all agree that this means you should test your actual software on the actual machine you envision using. One way to “goose” the performance of your software is to use the latest vendor software. In the case of Intel, that would be the latest libraries and tools bundled as Intel Parallel Studio XE.
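One way to run such a test is a small timing harness that separates the first (cold) run from later (warm) runs, loosely mirroring the cold and warm distinction in the STAC-A2 GREEKS timings. The sketch below is a minimal Python illustration, not STAC's methodology. The names `time_cold_and_warm` and `toy_workload` are hypothetical, and the workload is a placeholder to be replaced with a call into your own code.

```python
import statistics
import time
from typing import Callable, Dict


def time_cold_and_warm(workload: Callable[[], object], warm_runs: int = 5) -> Dict[str, float]:
    """Time the first call (cold) separately from later calls (warm)."""
    if warm_runs < 1:
        raise ValueError("warm_runs must be at least 1")

    # The first call pays any one-off costs (lazy initialisation, cache fills).
    start = time.perf_counter()
    workload()
    cold = time.perf_counter() - start

    # Later calls run with those one-off costs already paid.
    warm = []
    for _ in range(warm_runs):
        start = time.perf_counter()
        workload()
        warm.append(time.perf_counter() - start)

    return {"cold_s": cold, "warm_median_s": statistics.median(warm)}


def toy_workload() -> float:
    """Placeholder (assumption): stands in for your real pricing or risk job."""
    return sum(i * i for i in range(200_000)) ** 0.5


timings = time_cold_and_warm(toy_workload, warm_runs=3)
print(timings)
```

Running the same harness with the same workload on each candidate machine, and with old and new software builds, gives a like-for-like view of how much each change helps your own code.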
Processors we have, and those we could buy, fit STAC-A2 best
My three observations tell us that software and hardware investments both matter in getting top performance. Each STAC-A2 submission ends up making this point. The ability of Intel to keep logging better results, year after year, as new versions of Intel Xeon processors appear bodes well for choosing them for FSI computing needs.
Intel’s continued reporting of strong STAC-A2 results rests on a strong hardware foundation that makes many workloads excel, not just STAC-A2.
QuantMinds Americas – September 11 – David Cohen
If you are lucky enough to be attending QuantMinds in Boston (September 9-11, 2019), look for David Cohen, an Intel Senior Engineer and the CTO of Intel’s Storage Solutions group. Prior to Intel, David spent a dozen years working on Wall Street for Merrill Lynch and Goldman Sachs. He will give an interesting talk in the late morning on September 11 about one of the most important innovations for FSI workloads in years: non-volatile memories. They promise to revolutionise data-hungry FSI applications by greatly enlarging how much system memory we can consider “normal”.
Next up: revolutionise memory
Even if you cannot attend David’s talk, you can start getting excited about early STAC-A2 results using non-volatile memories from Intel by reading the news and studying the posted results:
- 2-socket system: a) database all in SSD versus b) database all in persistent memory
- 4-socket system: a) database all in SSD versus b) database nearly all in persistent memory
For more information on Intel’s high performance computing resources, check out intel.com/hpc.
About James Reinders
James Reinders likes fast computers and the software tools to make them speedy. With over 30 years in High Performance Computing (HPC) and Parallel Computing, including 27 years at Intel Corporation (retired June 2016), he is also the author of nine books in the HPC field, as well as numerous papers and blogs.
Software and workloads used in performance tests may have been optimised for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks. Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. Check with your system manufacturer or retailer or learn more at [intel.com]. Intel, the Intel logo, and Xeon are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. © Intel Corporation.