NVIDIA, an industry leader in AI, has hit turbulence with its latest product. Unveiled at GTC 2025, the GB300 AI servers have drawn a mixed reception over supply chain delays and customer feedback, despite high expectations for their performance gains. The next-generation AI server is struggling to replicate the success of its predecessors, hindered by technical complexity and a market preference for established solutions.
The GB300 is an upgrade to the Blackwell series, introducing the B300 GPU. A single card draws up to 1,400W of power, delivers roughly a 50% increase in FP4 compute performance, and expands memory capacity from 192GB to 288GB using 12-high HBM3E stacks. Networking has advanced from ConnectX-7 to ConnectX-8, and optical module bandwidth has doubled to 1.6Tbps. These upgrades aim to meet the growing demands of AI inference and training, especially in emerging applications such as agentic and physical AI. Nevertheless, as the saying goes, taking too big a leap can lead to instability: NVIDIA's significant technological advances have yet to translate into broad market acceptance.
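To put the generational jump in perspective, the deltas cited above can be checked with back-of-envelope arithmetic. This is a minimal sketch using only the figures quoted in this article (192GB vs. 288GB memory, optical bandwidth "doubled to 1.6Tbps"), not an official NVIDIA spec sheet.

```python
def pct_change(old: float, new: float) -> float:
    """Percent change from old to new."""
    return (new - old) / old * 100

# B200 -> B300 memory capacity per GPU (GB), per the article
memory_gain = pct_change(192, 288)

# "Doubled to 1.6Tbps" implies a 0.8Tbps baseline for the prior generation
bandwidth_gain = pct_change(0.8, 1.6)

print(f"Memory: 192GB -> 288GB  (+{memory_gain:.0f}%)")
print(f"Optics: 0.8Tbps -> 1.6Tbps  (+{bandwidth_gain:.0f}%)")
```

The memory gain works out to the same +50% as the quoted FP4 uplift, which is why the GB300 is positioned as a roughly half-again step over the GB200 rather than a new architecture.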
Notably, the preceding GB200 has also faced problems. In 2024, NVIDIA shipped only around 15,000 GB200 AI servers, well short of the earlier Hopper series' volumes. A key factor was the GB200's initial struggle with low yields on TSMC's advanced packaging technologies (such as CoWoS-L). Although that obstacle has been partially resolved, deployment difficulties have lingered.
Deploying the GB200 NVL72 server racks has proven complex: a single setup takes 5 to 7 days and suffers frequent stability issues and system crashes. The NVLink copper interconnect design requires installation-specific adaptations, complicating the routing of around 5,000 cables. Configuration also depends heavily on NVIDIA engineers, leaving cloud service providers (CSPs) frustrated when cluster failures occur and reliant on NVIDIA's technical support.
Supply chain pressures are significant as well. The GB300's mass production, initially slated for late 2025, has been delayed: customer test samples are now expected only toward year-end, and volume production may slip to 2026. The setback stems from design complexities such as the need for liquid cooling and the increased power draw of the high-power design. The GB300's anticipated power consumption challenges data center infrastructure, with projected draw surpassing the GB200 NVL72 rack's 140kW, raising production costs and requiring retrofits of existing data centers for liquid cooling.
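The rack-power claim is easy to sanity-check from the numbers in this article. The sketch below assumes an NVL72-style rack of 72 GPUs and uses the article's "up to 1,400W" per B300 card; the overhead range for CPUs, NVLink switch trays, NICs, and power-conversion losses is a labeled rule-of-thumb guess, not an NVIDIA figure.

```python
GPUS_PER_RACK = 72        # assumed NVL72-style configuration
B300_WATTS = 1_400        # per-GPU draw cited in the article

# GPU load alone, in kilowatts
gpu_load_kw = GPUS_PER_RACK * B300_WATTS / 1_000

# Non-GPU components add on top; 40-50% overhead is an illustrative assumption
for overhead in (0.40, 0.50):
    total_kw = gpu_load_kw * (1 + overhead)
    print(f"GPUs {gpu_load_kw:.1f}kW + {overhead:.0%} overhead ~ {total_kw:.1f}kW")
```

Even the GPUs alone land near 100kW, and any plausible overhead pushes the rack past the GB200 NVL72's 140kW envelope, which is why existing air-cooled facilities need liquid-cooling retrofits to host it.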
Leading CSPs, including Microsoft, Google, and Amazon, are now approaching the GB300 cautiously after the GB200 fell short of expectations. Microsoft's trials with the GB200 NVL72, for instance, required repeated hardware debugging, disrupting data center capacity timelines. That dissatisfaction has pushed CSPs toward NVIDIA's more stable alternatives such as the HGX series. The Hopper-based HGX H100 runs at 60kW to 80kW per rack, compatible with existing air-cooled designs, and offers flexibility the GB200 and GB300 cannot match. The HGX B200, equipped with eight B200 GPUs and NVLink speeds of up to 400Gbps, is also widely favored on x86-based AI platforms popular with hyperscalers and small-to-medium cloud providers.
Market demand reflects the same trend. While GB300 orders have not met forecasts, demand for HGX systems is rising steadily. Analysts point to the GB300's pricing as a potential barrier: the GB200 NVL72 already costs about $3 million per unit, and the GB300's higher price, driven by its liquid cooling and performance features, may deter budget-conscious customers, especially when near-term ROI looks doubtful.
The competitive landscape is also shifting as rivals like AMD and Intel gain momentum. AMD's Instinct MI300 is making strides on price/performance and an open software platform, while Intel's Gaudi 3 excels at specific inference tasks. These alternatives expand customer choice but have not yet seriously threatened NVIDIA's market lead.
In response, NVIDIA is re-evaluating its strategy, including a possible single-GPU Blackwell variant to cut lead times. Whether these measures will restore customer confidence remains uncertain. NVIDIA must navigate between innovation and market demands, balancing supply chain efficiency against customer expectations for stability and affordability. The path ahead is intricate, and the company will need to adapt swiftly to market and operational challenges.