As computational power surges in artificial intelligence (AI), traditional air-cooling methods are reaching their limits in thermal management. In this context, liquid cooling technologies stand out as the most viable alternative, propelling rapid advancement across the industry. The performance demands of AI servers and evolving regulatory requirements around Power Usage Effectiveness (PUE) in data centers have catalyzed this shift. This article examines current trends in AI server technology, the shift from air to liquid cooling, and the ramifications for the market and industry landscape.
During NVIDIA's GTC 2024 conference, CEO Jensen Huang unveiled new systems based on the Blackwell architecture: the B200 GPU and the GB200 superchip. Significant attention was drawn to the ambitious performance metrics of these new chips, which boast processing capabilities of 20 PetaFLOPS.
However, this remarkable performance is contingent on the use of the newly introduced FP4 precision and, critically, on the implementation of liquid cooling systems. This dependency suggests that to unlock the full potential of these advancements, a paradigm shift toward liquid cooling is not just beneficial but essential.
Insights from various domestic research institutions suggest that NVIDIA's moves may serve as a bellwether for the industry, helping catalyze the development of liquid cooling technologies. The conference marks a pivotal moment that may well accelerate the transition away from traditional air cooling, solidifying liquid cooling's position as the mainstream approach to thermal management.
The rapid evolution of the AI sector has led to GPUs outpacing traditional CPUs in processing capability, particularly in handling dense data sets.
However, this comes at a cost; the power consumption of GPUs significantly outweighs that of CPUs and is on a steep upward trajectory. A recent report from an investment firm outlines the slowing progress of chip production technologies in the post-Moore's Law era, with manufacturers focusing instead on increasing core counts. This shift has produced a notable surge in power demands for CPUs and GPUs alike, with Intel's processors seeing TDP (Thermal Design Power) values climb from 150W to as much as 385W over recent generations.
The increase in GPU TDP is even more pronounced. For instance, the early V100 NVLink GPU used for AI computation carried a TDP of 300W, while NVIDIA's latest introduction, the GB200, exceeds 1000W. Since air cooling typically handles thermal loads no greater than 800W, it is in danger of becoming ineffective for many modern applications.
In this respect, liquid cooling systems rise to the occasion, demonstrating superior thermal management that comfortably accommodates higher-power chips.
It's important to note that these challenges are not confined to cooling individual components; they extend to the thermal management of entire racks within data centers. According to the Uptime Institute's 2022 Global Data Center Survey, surging AI-related power requirements have pushed racks to draw higher loads. The NVIDIA DGX A100 server, for example, has a rated power of 4 kW and can peak at 6.5 kW. Since a standard 42U cabinet can house roughly five 5U AI servers, the total power requirement for a rack can surpass 20 kW. Traditional air-cooling configurations generally top out around 12 kW, making them ill-suited to the demands of AI server racks.
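The rack-level arithmetic above can be sketched in a few lines, using the figures cited in the survey discussion (a rough illustration, not a sizing tool):

```python
# Back-of-the-envelope rack thermal load from the cited figures.
RATED_POWER_KW = 4.0       # NVIDIA DGX A100 rated power
PEAK_POWER_KW = 6.5        # cited peak draw
SERVERS_PER_RACK = 5       # roughly five 5U AI servers in a 42U cabinet
AIR_COOLING_CAP_KW = 12.0  # typical ceiling for air-cooled racks

rated_load = RATED_POWER_KW * SERVERS_PER_RACK
peak_load = PEAK_POWER_KW * SERVERS_PER_RACK

print(f"rated rack load: {rated_load:.1f} kW")   # 20.0 kW
print(f"peak rack load:  {peak_load:.1f} kW")    # 32.5 kW
print(f"exceeds air-cooling ceiling: {rated_load > AIR_COOLING_CAP_KW}")  # True
```

Even at rated (not peak) power, the rack load clears the air-cooling ceiling by a wide margin, which is the core of the argument for liquid cooling at rack scale.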
On top of the technological advancements and power constraints, national policies are increasingly steering the industry toward liquid cooling solutions.
Data from the China Academy of Information and Communications Technology indicates that the average PUE across Chinese data centers stood at 1.52 in 2022. Moreover, the green energy technology alliance reports that fewer than half of all data centers achieved a PUE below 1.40, an indication that significant efficiency improvements are still needed.
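For reference, PUE is simply the facility's total energy draw divided by the energy delivered to IT equipment, so a value of 1.0 would mean zero overhead. A minimal sketch with hypothetical figures:

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy over IT energy."""
    return total_facility_kwh / it_equipment_kwh

# A hypothetical facility drawing 1,520 kWh for every 1,000 kWh of IT load
# lands exactly at the 1.52 average cited above.
print(f"{pue(1520.0, 1000.0):.2f}")  # 1.52
```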
Aligned with the country's dual carbon targets, a series of regulatory frameworks have emerged that push for stringent PUE benchmarks. New policies call for large data centers to lower their PUE below 1.30 by 2025, and to 1.25 for national key nodes. Such mandates are expected to incentivize the adoption of energy-efficient cooling methodologies.
The implications for cooling technologies extend beyond raw performance.
Cooling systems account for roughly 43% of the energy consumption within data centers, second only to the energy demands of the IT equipment itself. Liquid cooling solutions can replace high-energy systems such as air conditioning units and fans, potentially yielding energy savings of 20-30%. Beyond reducing energy consumption, liquid cooling can significantly lower chip temperatures, enhancing reliability and performance in a virtuous cycle.
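To make the 43% share concrete, here is a rough sketch, reading the cited 20-30% savings as applying to the cooling subsystem's energy (the 1,000 kW facility figure is an illustrative placeholder, not from the article):

```python
TOTAL_FACILITY_KW = 1000.0   # hypothetical facility draw
COOLING_SHARE = 0.43         # cooling's share of consumption, as cited

cooling_kw = TOTAL_FACILITY_KW * COOLING_SHARE
for savings in (0.20, 0.30):
    saved_kw = cooling_kw * savings
    print(f"{savings:.0%} cooling savings -> {saved_kw:.1f} kW "
          f"({saved_kw / TOTAL_FACILITY_KW:.1%} of the facility total)")
```

Under these assumptions, the cited savings band trims roughly 9-13% off the facility's total draw, which is the mechanism by which liquid cooling moves PUE toward the regulatory targets.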
While upfront capital costs for transitioning to liquid cooling are higher than for traditional air methods, the long-term savings on operational expenses can yield a payback period of roughly 2.2 years for a 10MW liquid-cooled data center. Furthermore, research from industry analysts notes that construction costs for liquid cooling installations have become increasingly competitive, falling from around 6500 yuan per kilowatt in 2022 to an expected 5000 yuan in 2023, closely in line with traditional air cooling costs.
The market for liquid cooling technologies encompasses various implementations, including cold plate systems, immersion cooling, and spray cooling.
Cold plate systems currently dominate thanks to their early development and mature ecosystem, offering advantages such as lower modification costs and shorter implementation timelines, albeit with slightly lower cooling efficiency than immersion methods. This positions cold plate systems as an intermediate phase between traditional air cooling and full immersion cooling.
Projections estimate that the market for liquid-cooled data centers in China will expand from 26.09 billion yuan in 2019 to approximately 128.32 billion yuan by 2025, with cold plate systems alone expected to exceed 75 billion yuan, representing a CAGR of 22%. The immersion cooling segment could see annual growth rates as high as 46%, driven by increasing uptake among top telecommunications operators, whose adoption across data center projects is predicted to exceed 50% by 2025.
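As a sanity check on these projections, the growth rate implied by the two overall-market endpoints follows from the standard CAGR formula:

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values over `years` periods."""
    return (end / start) ** (1 / years) - 1

# Overall liquid-cooled data-center market in China, 2019 -> 2025 (6 years).
print(f"{cagr(26.09, 128.32, 6):.1%}")  # roughly 30% per year
```

The overall market's implied rate sits above the 22% cited for cold plate alone, consistent with the faster-growing immersion segment pulling the blended figure upward.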
The operational landscape includes a variety of stakeholders, from upstream component manufacturers and cooling equipment providers to midstream server manufacturers utilizing cooling technologies and downstream service operators.
Key players in the upstream segment include pioneering firms like Yinvike, Feirongda, Shuguang Data, and Shenling Environmental Protection. Yinvike, in particular, has made noteworthy strides, collaborating with major tech companies like Intel to publish industry guidelines aimed at advancing liquid cooling solutions.
As major projects continue rolling out, Yinvike's financial performance reflects these successes. In the first three quarters of 2023, the firm posted revenues of 2.072 billion yuan, an impressive 39.51% year-on-year increase, and saw net profits surge nearly 80% to 210 million yuan, the highest growth rate in five years.
Feirongda has also established itself as a key player, specializing in thermal management solutions designed for server cooling. In the first half of 2023, the company's revenue from thermal management materials and products approached 704 million yuan, illustrating its deep engagement with top-tier clients in the space, including Huawei, Inspur, and ZTE.
Mid-tier players like Inspur and H3C are likewise innovating around liquid cooling designs, recognizing that the transition from air to liquid cooling must be aligned with server architecture from the ground up.