The trend for today’s CPUs is more cores … and lots of them. (Cases in point: 2nd Gen Intel® Xeon® Scalable processors scale up to 48 cores per CPU, and Intel® Xeon Phi™ processors have as many as 72!) In this environment, vectorizing your code is critical to delivering optimal application performance on core-rich nodes.
So how do you write vectorization-friendly code?
You start by identifying and removing barriers like those affecting memory access patterns and cache usage, and balancing multi-process programming (MPI) with multi-threaded programming (OpenMP).
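To make the memory-access barrier concrete, here is a minimal sketch in C. It is not taken from the presentation; the type `PointAoS` and the functions `scale_x_aos` and `scale_x_soa` are hypothetical names chosen for illustration. Both functions do the same arithmetic, but the first walks an array of structures, so consecutive `x` values are three floats apart, while the second walks a contiguous array, which is the unit-stride pattern that vector units generally handle best.

```c
#include <assert.h>
#include <stddef.h>

/* Array of structures: the x, y, z of one point sit next to each other. */
typedef struct {
    float x, y, z;
} PointAoS;

/* Scales only the x field. Consecutive x values are separated by the
   y and z fields, so a vectorized version needs strided loads/stores. */
void scale_x_aos(PointAoS *pts, size_t n, float s)
{
    assert(pts != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        pts[i].x *= s;
    }
}

/* Structure-of-arrays style: all x values are contiguous (unit stride),
   so each vector load or store touches only useful data. */
void scale_x_soa(float *restrict x, size_t n, float s)
{
    assert(x != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        x[i] *= s;
    }
}
```

Whether the strided version vectorizes at all, and how much slower it runs, depends on the compiler and the hardware, so treat this as a pattern to look for rather than a guaranteed result.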
This presentation is a deep dive on removing those barriers and striking that balance, demonstrated on the Texas Advanced Computing Center’s newest petascale system, Frontera, powered by Intel Xeon Scalable processors.
Watch Ian Wang, an HPC specialist from the University of Texas, discuss these concepts, including:
- The basics of vector-aware programming, dependency analysis, and optimization reports
- Guidance on using vector units, the proper placement of tasks/threads, the efficient use of memory bandwidth, and the impact of frequency scaling
- Software tools of the trade, including Intel® Math Kernel Library and Intel® C++ Compilers
- Code samples and step-by-step instructions
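As a small illustration of what dependency analysis looks for, the sketch below contrasts two loops. It is an assumption-laden example and not material from the presentation; `prefix_sum` and `add_arrays` are hypothetical names. The first loop has a loop-carried dependence, because each iteration reads the value the previous iteration wrote. The second has independent iterations, and the `restrict` qualifiers together with the standard OpenMP `#pragma omp simd` directive tell the compiler it may process several elements at once.

```c
#include <assert.h>
#include <stddef.h>

/* Loop-carried dependence: iteration i reads a[i - 1], which iteration
   i - 1 has just written. A compiler cannot simply run these
   iterations side by side in vector lanes. */
void prefix_sum(float *a, size_t n)
{
    assert(a != NULL || n == 0);
    for (size_t i = 1; i < n; ++i) {
        a[i] += a[i - 1];
    }
}

/* Independent iterations: out[i] depends only on a[i] and b[i].
   restrict promises the arrays do not overlap; the pragma is honored
   only when OpenMP SIMD support is enabled and is ignored otherwise. */
void add_arrays(float *restrict out, const float *restrict a,
                const float *restrict b, size_t n)
{
    assert((out != NULL && a != NULL && b != NULL) || n == 0);
    #pragma omp simd
    for (size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

A compiler's optimization report is the usual way to confirm which of these loops was actually vectorized and why the other was not.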
Download the software
- Get Intel® Math Kernel Library, one of five FREE Intel® Performance Libraries
- Get Intel® C++ Compiler as part of Intel® Parallel Studio XE or Intel® System Studio
- Get Intel® Advisor as a standalone tool, or as part of Intel® Parallel Studio XE or Intel® System Studio