HBM in Computing: A Comprehensive Overview
Issuing time: 2025-02-10 18:00

HBM (High Bandwidth Memory) has emerged as a revolutionary technology in the computing landscape. It is a type of CPU/GPU memory based on 2.5D/3D packaging, in which DRAM dies are stacked vertically and interconnected through TSVs (Through-Silicon Vias). HBM's ability to deliver high bandwidth at low power consumption has made it widely used in the GPUs of AI training servers.

The demand for HBM in AI servers is driven by several factors. First, the number of GPUs mounted in an AI server has increased from 2 in ordinary servers to 8 today. Second, the number of HBM stacks mounted on a single GPU has also risen: in the HBM1 era a single GPU was typically equipped with 4 HBM1 stacks, whereas in current HBM2e or HBM3 designs a single GPU is typically paired with 6 HBM stacks. Moreover, the number of stacked DRAM layers and the capacity of each stack have grown. From HBM1 to HBM3, the density of a single DRAM die has increased from 2Gb to 16Gb, the stack height has increased from 4-Hi to 12-Hi, and the capacity of a single HBM stack has increased from 1GB to 24GB.

Global server shipments in 2025 are estimated to reach 17 million units, and the penetration rate of AI servers is currently below roughly 2%. Assuming the penetration rate rises to about 4% in 2024, and that each AI server carries 8 GPUs with each GPU paired with 6 HBM stacks totaling 80GB to 100GB or more, the incremental HBM market brought by AI servers in 2024 is expected to reach the tens-of-billions-of-dollars level. The adoption of 2.5D + 3D packaging for AI-server GPUs has in turn driven demand for core packaging technologies such as TSV and CoWoS.

HBM is one of the near-memory computing solutions, offering significant advantages in bandwidth, power consumption, and footprint, which match the requirements of AI chips well. The HBM market has seen explosive growth, with Hynix and Samsung dominating it. Mainstream AI training chips all use HBM today. For instance, Nvidia's H100 uses five HBM3 stacks for a total capacity of 80GB, and the H200, released at the end of 2023, uses six HBM3E stacks (making it the world's first GPU to use HBM3E) for a total capacity of 144GB. In March 2024, Nvidia announced the B100 and B200 at the GTC 2024 conference in San Jose, California, each carrying 192GB of HBM3E (eight 24GB, 8-layer stacks). The growing amount of HBM in Nvidia's GPUs reflects the rising demand.

However, HBM also has limitations and challenges. The capacity gap between HBM and advanced SRAM is significant: a single HBM3 stack can now reach 24GB, while advanced on-chip SRAM remains at the hundred-MB level. If model parameters exceed the SRAM capacity (as in LLM inference), external memory such as HBM or DDR is still required, and SRAM then serves merely as a cache rather than a replacement for HBM. In addition, the unit-area cost of SRAM is dozens of times that of HBM, making large-capacity SRAM integration uneconomical.

In conclusion, HBM has become a standard technology for AI chips and high-end GPUs, and its market is expected to keep growing as the demand for computing power and memory bandwidth continues to increase across applications.
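
The per-stack and per-GPU capacity figures quoted above follow from simple arithmetic: die density (in Gb) times stack height, divided by 8 to convert bits to bytes. The sketch below reproduces them; note that the 16Gb/8-Hi configuration assumed for the H100's HBM3 stacks and the 24Gb die density assumed for HBM3E are inferred from the 80GB and 24GB-per-stack figures in the text, not stated explicitly.

```python
# A minimal sketch reproducing the HBM capacity arithmetic cited above.
# Die densities and stack heights per generation come from the text; the
# 16Gb/8-Hi configuration for the H100's HBM3 and the 24Gb die for HBM3E
# are assumptions inferred from the stated 80GB and 24GB-per-stack figures.

def stack_capacity_gb(die_density_gbit: int, stack_height: int) -> float:
    """Capacity of one HBM stack in GB: density per die (Gb) x dies / 8 bits per byte."""
    return die_density_gbit * stack_height / 8

# Generational stack capacities named in the text
print(stack_capacity_gb(2, 4))    # HBM1: 2Gb dies, 4-Hi   -> 1.0 GB
print(stack_capacity_gb(16, 12))  # HBM3: 16Gb dies, 12-Hi -> 24.0 GB

# Per-GPU totals implied by the configurations in the text
h100 = 5 * stack_capacity_gb(16, 8)   # 5 x 16 GB HBM3 stacks  -> 80.0 GB
h200 = 6 * stack_capacity_gb(24, 8)   # 6 x 24 GB HBM3E stacks -> 144.0 GB
b200 = 8 * stack_capacity_gb(24, 8)   # 8 x 24 GB HBM3E stacks -> 192.0 GB
print(h100, h200, b200)               # 80.0 144.0 192.0
```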
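
Similarly, the AI-server demand estimate can be reproduced as a back-of-the-envelope calculation from the shipment, penetration, and per-GPU figures given above; the final dollar figure additionally depends on an HBM price per GB, which the text does not specify.

```python
# A rough sketch of the AI-server HBM sizing described above. All inputs are
# the article's figures; the price per GB needed for a dollar figure is left
# out, since the text does not give one.

server_shipments = 17_000_000   # estimated global server shipments (units)
ai_penetration = 0.04           # assumed AI-server penetration rate (~4%)
gpus_per_server = 8             # GPUs per AI server
hbm_per_gpu_gb = (80, 100)      # low/high HBM capacity per GPU (GB)

for per_gpu in hbm_per_gpu_gb:
    total_gb = server_shipments * ai_penetration * gpus_per_server * per_gpu
    print(f"{per_gpu} GB per GPU -> {total_gb / 1e9:.2f} billion GB of HBM")

# Multiplying the GB total by an assumed HBM price per GB then yields the
# dollar-denominated incremental market size the article refers to.
```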