In 2013 Google started work on TPUs and deployed them internally in 2015. Sundar first publicly announced their existence in 2016 at I/O, letting the world know that they’d developed custom ASICs for TensorFlow. They made TPUs accessible to outside devs via Google Cloud in 2017 and also released the second generation that same year. And since we’re plotting a timeline here, the Attention is All You Need paper that launched the LLM revolution was published in June of that same year.
OpenAI got a lot of attention with GPT-4, a product built on the AIAYN paper, putting LLMs on the map globally, and Google has taken heat for not being the first mover. OpenAI last raised $6.6B at a $157B valuation late last year, which incidentally is the largest VC round ever, and they did this on the strength of GPT-4 and a straight-line extrapolation that GPT-5 will be AGI and/or ASI, or close enough that the hair-splitters won't matter.
But while OpenAI lines up Oliver Twist style, asking NVidia "please, sir, may I have some more" GPUs for its data centers, Google has vertically integrated the entire stack: from chips with their TPUs, to the interconnect, to the library (TensorFlow), to the applications they're so good at serving to a global audience at massive scale with super low latency, using the water-cooled data centers they pioneered back in 2018 and which NVidia is only now getting started with.
Google has been playing the long game since 2013 and earlier, and doesn't have to manufacture short-term hype to raise a mere $6 billion, because they have $24 billion in cash on their balance sheet, and that cash pile is growing.
What Google has done by vertically integrating the hardware is strategically similar to SpaceX's Starlink, with its vertically integrated launch capability. No other space-based ISP can compete with Starlink, because SpaceX will always be able to deploy its infrastructure more cheaply. Want to launch a satellite-based ISP? SpaceX launched the majority of global space payload last year, so guess who you'll be paying? Your competition.
NVidia's markup on the H100 is roughly 1000%, meaning they're selling it for about 10X what it costs to produce. Google has been producing its own TPUs at scale for a decade. Google's TPUs deliver slightly better performance than NVidia's H100 and are probably on par in dollars per unit of compute. Which means Google is paying roughly a tenth of what its competitors pay for accelerator compute.
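The arithmetic behind that claim can be sketched in a few lines. Note that every number here is a hypothetical placeholder, not a reported figure; only the ~10X markup ratio comes from the argument above.

```python
# Back-of-the-envelope sketch of the cost-advantage argument.
# All dollar figures are made up for illustration; only the 10X
# markup ratio is taken from the argument in the text.

H100_BUILD_COST = 3_000                 # hypothetical cost to produce one H100 (USD)
MARKUP = 10                             # the ~10X markup claimed above
H100_PURCHASE_PRICE = H100_BUILD_COST * MARKUP  # what a competitor pays NVidia

# Assumption: a TPU delivers comparable compute per chip, and Google
# pays roughly its own production cost rather than a vendor's price.
TPU_BUILD_COST = 3_000                  # hypothetical, assumed similar to H100's

advantage = H100_PURCHASE_PRICE / TPU_BUILD_COST
print(f"Cost advantage per unit of compute: ~{advantage:.0f}X")
```

Under these assumptions the ratio collapses to the markup itself: whatever NVidia charges above cost is margin a vertically integrated buyer never pays.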
And this doesn't take into account the engineering advantages of having the entire stack, from applications to interconnect to chips, in-house, and being able to tailor the hardware to their exact application and operational needs. When NVidia is compared to AMD, NVidia is often credited with a much closer relationship with developers, shipping CUDA fixes on very short timelines for its large customers. Google has that same tight loop, except the vendor and the customer are the same company.
As a final note, I don’t think it’s unreasonable to consider the kind of pure research that drives AI innovation as part of the supply chain. And so one might argue that Google has vertically integrated that too.
So amidst the noise and haste of startups and their launches, remember what progress there may be in silence.