Tachyum® today announced that it has successfully demonstrated enhanced hardware-assisted sampling running on its Prodigy® Universal Processor FPGA Emulation System, marking a breakthrough in the application of advanced compiler optimizations.
Compiler optimizations are critical for enhancing performance, reducing response times, minimizing storage footprints, and improving the overall Total Cost of Ownership (TCO) across a wide variety of modern computational workloads. Among the most powerful techniques employed by modern compilers are link-time optimizations (LTO), profile-guided optimizations (PGO) and automatic feedback-directed optimizations (AutoFDO). PGO and AutoFDO leverage runtime data to fine-tune software performance, achieving results beyond what static optimizations can deliver.
Tachyum’s latest demonstration shows that these advanced AutoFDO optimizations, enabled through its Prodigy hardware platform, deliver performance improvements of 10-15%. By collecting data during program execution, PGO and AutoFDO with LTO let compilers recompile code based on how the software actually runs, improving efficiency.
PGO traditionally involves modifying generated code to collect detailed execution information, allowing for full profiling but at the cost of added complexity. In contrast, AutoFDO takes a more practical approach by using specialized hardware blocks within the Prodigy processor to collect performance data with minimal overhead. This method enables feedback from production binaries without the need for code modification, providing a flexible and efficient solution for real-world applications. Though AutoFDO may have slightly lower profile precision compared to PGO, this is mitigated by increasing sampling time and merging data from multiple instances.
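The idea of merging sampled data from multiple instances can be illustrated with a minimal sketch. This is not Tachyum's tooling: the function names and sample counts below are hypothetical, and a real profile records far more than per-function totals. The sketch only shows that summing samples from several runs yields a larger pool from which to estimate where time is spent.

```python
from collections import Counter


def merge_profiles(profiles):
    """Sum per-function sample counts collected from several runs."""
    merged = Counter()
    for profile in profiles:
        merged.update(profile)
    return merged


def hot_fraction(profile):
    """Convert raw sample counts into each function's share of all samples."""
    total = sum(profile.values())
    return {name: count / total for name, count in profile.items()}


# Hypothetical sample counts from three instances of the same binary.
instance_profiles = [
    {"parse": 40, "encode": 55, "log": 5},
    {"parse": 35, "encode": 60, "log": 5},
    {"parse": 45, "encode": 50, "log": 5},
]

merged = merge_profiles(instance_profiles)
shares = hot_fraction(merged)

for name in sorted(shares, key=shares.get, reverse=True):
    print(f"{name}: {merged[name]} samples, {shares[name]:.1%} of total")
```

Each individual run is a noisy estimate; the merged profile draws on three times as many samples, which is the sense in which longer sampling and merging offset the lower precision of a sampled profile.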
Both PGO and AutoFDO require re-compilation to apply insights from the collected profiles. While the results of both techniques are typically similar, AutoFDO’s hardware-assisted approach offers significant advantages in terms of ease of implementation and scalability. Tachyum’s Prodigy Universal Processor platform supports an out-of-the-box AutoFDO flow for hardware-assisted profile collection, offering customers a choice of optimization techniques depending on their specific needs.
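For readers unfamiliar with the two flows, the sketch below lists the steps each one typically involves. It assumes a generic LLVM/Clang toolchain with Linux perf and the open-source create_llvm_prof converter; the announcement does not describe Prodigy's actual commands, so the tool choice, the file names, and the helper functions pgo_steps and autofdo_steps are illustrative assumptions only. The code builds the command lists and prints them; it does not run anything.

```python
def pgo_steps(source, binary):
    """Instrumented PGO: profile with a special build, then recompile."""
    profile_dir = binary + "-profraw"
    profdata = binary + ".profdata"
    return [
        # 1. Build an instrumented binary that writes raw profiles to profile_dir.
        ["clang", "-O2", "-fprofile-generate=" + profile_dir,
         source, "-o", binary + ".instr"],
        # 2. Run the instrumented binary on a representative workload.
        ["./" + binary + ".instr"],
        # 3. Merge the raw profiles into a single indexed profile.
        ["llvm-profdata", "merge", "-output=" + profdata, profile_dir],
        # 4. Recompile using the profile (here combined with LTO).
        ["clang", "-O2", "-flto", "-fprofile-use=" + profdata,
         source, "-o", binary],
    ]


def autofdo_steps(source, binary):
    """Sampling-based AutoFDO: profile the normal binary, then recompile."""
    prof = binary + ".prof"
    return [
        # 1. Build the regular optimized binary with line-table debug info.
        ["clang", "-O2", "-gline-tables-only", source, "-o", binary],
        # 2. Sample the unmodified binary using hardware branch records.
        ["perf", "record", "-b", "./" + binary],
        # 3. Convert the perf samples into a compiler-readable profile.
        ["create_llvm_prof", "--binary=./" + binary,
         "--profile=perf.data", "--out=" + prof],
        # 4. Recompile using the sampled profile (here combined with LTO).
        ["clang", "-O2", "-flto", "-gline-tables-only",
         "-fprofile-sample-use=" + prof, source, "-o", binary],
    ]


for title, steps in (("PGO", pgo_steps("app.c", "app")),
                     ("AutoFDO", autofdo_steps("app.c", "app"))):
    print(title)
    for command in steps:
        print("  " + " ".join(command))
```

In this sketch both flows end with a recompilation step, but only the PGO flow needs a separately instrumented build; the AutoFDO flow samples the same kind of binary that would run in production.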
“This latest milestone demonstrates our continued commitment to pushing the boundaries of computational performance,” said Dr. Radoslav Danilak, founder and CEO of Tachyum. “By enabling hardware-assisted sampling, we are not only simplifying the process of advanced compiler optimizations but also delivering tangible improvements to software performance, making it easier for developers to achieve faster, more efficient applications.”
Because Prodigy is a Universal Processor offering industry-leading performance for all workloads, Prodigy-powered data center servers can seamlessly and dynamically switch between computational domains (such as AI/ML, HPC, and cloud) with a single homogeneous architecture. By eliminating the need for expensive dedicated AI hardware and dramatically increasing server utilization, Prodigy reduces CAPEX and OPEX significantly while delivering unprecedented data center performance, power, and economics. Prodigy integrates 192 high-performance custom-designed 64-bit compute cores to deliver up to 4.5x the performance of the highest-performing x86 processors for cloud workloads, up to 3x that of the highest-performing GPU for HPC, and 6x for AI applications.
A video demonstrating hardware support for profile collection used for AutoFDO is now available for viewing at https://youtu.be/Xb3n0Y2We_g.
Follow Tachyum
https://x.com/tachyum
https://www.linkedin.com/company/tachyum
https://www.facebook.com/Tachyum/
About Tachyum
Tachyum is transforming the economics of AI, HPC, public and private cloud workloads with Prodigy, the world’s first Universal Processor. Prodigy unifies the functionality of a CPU, a GPU, and a TPU in a single processor to deliver industry-leading performance, cost and power efficiency for both specialty and general-purpose computing. As global data center emissions continue to contribute to a changing climate, with data centers projected to consume 10 percent of the world’s electricity by 2030, the ultra-low power Prodigy is positioned to help balance the world’s appetite for computing at a lower environmental cost. Tachyum received a major purchase order from a US company to build a large-scale system that can deliver more than 50 exaflops of performance, which will exponentially exceed the computational capabilities of the fastest inference or generative AI supercomputers available anywhere in the world today. When complete in 2026, the Prodigy-powered system will deliver a 25x multiplier vs. the world’s fastest conventional supercomputer, built just this year, and will achieve AI capabilities 25,000x larger than models for ChatGPT4. Tachyum has offices in the United States, Slovakia and the Czech Republic. For more information, visit https://www.tachyum.com/.
View source version on businesswire.com: https://www.businesswire.com/news/home/20241112762318/en/
Contacts
Mark Smith
JPR Communications
818-398-1424
marks@jprcom.com