
MangoBoost Achieves Record-Breaking MLPerf Inference v5.0 Results for Llama2-70B Offline on AMD Instinct™ MI300X GPUs

MangoBoost, a provider of cutting-edge system solutions designed to maximize AI data center efficiency, has set a new industry benchmark with its latest MLPerf Inference v5.0 submission. The company’s Mango LLMBoost™ AI Enterprise MLOps software has demonstrated unparalleled performance on AMD Instinct™ MI300X GPUs, delivering the highest-ever recorded results for Llama2-70B in the offline inference category.

This milestone marks the first-ever multi-node MLPerf inference result on AMD Instinct™ MI300X GPUs. By harnessing the power of 32 MI300X GPUs across four server nodes, Mango LLMBoost™ has surpassed all previous MLPerf inference results, including those from competitors using NVIDIA H100 GPUs.

Unmatched Performance and Cost Efficiency

MangoBoost’s MLPerf submission demonstrates a 24% performance advantage over the best-published MLPerf result from Juniper Networks utilizing 32 NVIDIA H100 GPUs. Mango LLMBoost™ achieved 103,182 tokens per second (TPS) in the offline scenario and 93,039 TPS in the server scenario on AMD MI300X GPUs, outperforming the previous best result of 82,749 TPS on NVIDIA H100 GPUs.

In addition to superior performance, Mango LLMBoost™ + MI300X offers significant cost advantages. With AMD MI300X GPUs priced between $15,000 and $17,000, compared to $32,000–$40,000 for NVIDIA H100 GPUs (Source: Tom's Hardware – H100 vs. MI300X Pricing), Mango LLMBoost™ delivers up to 62% cost savings while maintaining industry-leading inference throughput.

In terms of cost-efficiency, the Mango LLMBoost™ + MI300X system delivers approximately 2.8× more inference throughput per $1,000 spent than the H100-based system, making it the clear choice for high-performance, budget-conscious deployments.
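The ~2.8× figure can be checked with a quick back-of-the-envelope calculation from the numbers above. The per-GPU prices used here are midpoints of the quoted ranges, not exact list prices, so treat this as a sanity check rather than an official cost model:

```python
# Sanity-check of the ~2.8x throughput-per-dollar claim, using the
# MLPerf offline results quoted above and midpoint GPU prices
# (assumed values, not exact list prices).

MI300X_TPS = 103_182          # offline tokens/sec on 32x MI300X
H100_TPS = 82_749             # offline tokens/sec on 32x H100
NUM_GPUS = 32

mi300x_cost = NUM_GPUS * 16_000   # midpoint of $15k-$17k per GPU
h100_cost = NUM_GPUS * 36_000     # midpoint of $32k-$40k per GPU

# Tokens per second delivered per $1,000 of GPU spend
mi300x_eff = MI300X_TPS / (mi300x_cost / 1_000)
h100_eff = H100_TPS / (h100_cost / 1_000)

print(f"MI300X: {mi300x_eff:.1f} TPS per $1,000")
print(f"H100:   {h100_eff:.1f} TPS per $1,000")
print(f"Ratio:  {mi300x_eff / h100_eff:.1f}x")  # ~2.8x
```

With these midpoint prices, the MI300X system lands at roughly 200 TPS per $1,000 versus roughly 72 TPS per $1,000 for the H100 system, consistent with the ~2.8× figure.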

Mango LLMBoost™: A Scalable and Hardware-Flexible MLOps Solution

Mango LLMBoost™ is an enterprise-grade AI inference software that provides seamless scalability and cross-platform compatibility. It supports over 50 open models, including Llama, Qwen, and DeepSeek, with one-line deployment via Docker and built-in OpenAI-compatible APIs. The software is cloud-ready—available on AWS Marketplace, Microsoft Azure Marketplace, and Google Cloud Platform—and is also available for on-premises deployment for enterprises requiring full control and security.
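Because the server exposes OpenAI-compatible APIs, existing OpenAI clients and plain HTTP requests should work against it unchanged. A minimal sketch of what such a request could look like follows; the endpoint URL and model identifier are illustrative placeholders, not values documented by MangoBoost:

```python
import json

# Hypothetical local endpoint for an OpenAI-compatible server;
# the host, port, and path here are assumptions, not documented values.
ENDPOINT = "http://localhost:8000/v1/chat/completions"

# Standard OpenAI chat-completions request body. The model identifier
# is a placeholder; the actual served model name depends on deployment.
payload = {
    "model": "meta-llama/Llama-2-70b-chat-hf",
    "messages": [
        {"role": "user", "content": "Summarize MLPerf in one sentence."}
    ],
    "max_tokens": 128,
}

body = json.dumps(payload)

# Against a running server, this body would be sent via POST, e.g.:
#   import requests
#   r = requests.post(ENDPOINT, json=payload, timeout=60)
#   print(r.json()["choices"][0]["message"]["content"])
```

The point of OpenAI compatibility is that no client-side code changes are needed: pointing an existing client's base URL at the deployment is the whole migration.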

Key capabilities of Mango LLMBoost™ include:

  • Auto Parallelization – Efficiently distributes large models across GPUs and nodes.
  • Auto Config Tuning – Optimizes runtime parameters based on workload characteristics.
  • Auto Context Scaling – Dynamically adapts memory usage to maximize GPU utilization.
  • Auto Disaggregated Deployment – Ensures flexible deployment across multiple inference stages.

Collaboration with AMD: Unlocking the Full Potential of MI300X GPUs

MangoBoost’s record-breaking results were achieved through a close partnership with AMD, leveraging the ROCm software stack to maximize MI300X GPU performance. This collaboration has resulted in a scalable and efficient AI inference solution that can be deployed across single-node or multi-node clusters with ease.

Extending Performance Leadership to AWS and Beyond

Beyond the MLPerf results, Mango LLMBoost™ has been extensively tested on various cloud and on-premises configurations. On an 8×NVIDIA A100 GPU setup from AWS, Mango LLMBoost™ achieved up to 138x faster inference compared to Ollama and significantly outperformed HuggingFace TGI and vLLM across multiple model sizes, including LLaMA3.1-70B, DeepSeek-R1-Distill-Qwen-32B, and LLaMA3.1-8B. In terms of cost-efficiency, Mango LLMBoost™ also leads the pack with the lowest GPU cost per million tokens, reducing inference cost by over 99% compared to Ollama, and by over 30% even compared to vLLM on high-throughput workloads.

Expanding AI Infrastructure Solutions

In addition to the Mango LLMBoost™ software, MangoBoost offers hardware acceleration solutions based on Data Processing Units (DPUs) to enhance AI and cloud infrastructure, including:

  • Mango GPUBoost™ – RDMA acceleration for multi-node inference and training via RoCEv2.
  • Mango NetworkBoost™ – TCP/IP stack offloading for enhanced CPU efficiency.
  • Mango StorageBoost™ – High-performance NVMe/TCP initiator and target solutions for scalable AI storage.

For more information, please refer to our technical blog or reach out to contact@mangoboost.io.

About MangoBoost

MangoBoost delivers cutting-edge, full-stack system solutions that maximize AI data center efficiency. The company offers a high-performance DPU that seamlessly integrates with general-purpose GPUs, accelerators, and storage products, enabling cost-effective, standardized AI infrastructure. In addition, MangoBoost provides AI inference optimization software that enhances GPU efficiency for large-scale LLM workloads, accelerating deployment and reducing operational costs.

Founded in 2022 on a decade of research presented at top-tier computer systems conferences such as OSDI and ISCA, MangoBoost has secured over $60 million in funding and is rapidly expanding its presence in the U.S., Canada, and Korea. With a team of over 100 experts—many holding PhDs from world-class research institutions—MangoBoost continues to push the boundaries of AI infrastructure efficiency, backed by more than 30 patents protecting its core technologies.

For more information, visit MangoBoost’s website and LinkedIn page.
