Microsoft spent millions to put together a supercomputer for OpenAI

Now it's building one that is even bigger and even more sophisticated.
Ameya Paleja
Data center with servers of information


Nearly five years ago, a then little-known company, OpenAI, approached Microsoft with a special request: to put together computing horsepower at a scale Microsoft had never attempted before. Microsoft then spent millions of dollars assembling tens of thousands of powerful chips into a supercomputer. OpenAI used it to train its large language model, GPT, and the rest, as they say, is history.

Microsoft is no stranger to building artificial intelligence (AI) models that help users work more efficiently. The automatic spell checker that has helped millions of users is an example of an AI model trained on language. However, OpenAI wasn't looking to develop anything along the lines of a spell checker. The company wanted to fundamentally change how people interact with computers and needed something massive.

How Microsoft put together a supercomputer for OpenAI

When OpenAI provided Microsoft with the details of the scale it wanted to compute at, Microsoft realized that it needed to give OpenAI access to its cloud computing services for an extended period of time.

So, Microsoft began stringing together tens of thousands of NVIDIA A100 graphics chips, which offer large amounts of computing power and are frequently used for developing AI models. However, it was not just a matter of connecting the chips together.

When training AI models, work is divided between thousands of processing units that fire up at once, which requires meticulous power management. Beyond that, the processors also need to talk to one another to share the work they have done, and Microsoft had to write special software for this.
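
The division of work described above can be illustrated with a small sketch. It assumes a data-parallel setup, in which each worker computes a gradient on its own slice of the data and the workers then average their results before updating a shared model. This is not Microsoft's software; the function names are hypothetical, and the workers are simulated in a single Python process rather than on separate chips.

```python
# Hypothetical illustration only: data-parallel training simulated in one process.

def local_gradient(w, shard):
    """Gradient of the mean squared error for the model y = w * x on one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(values):
    """Stand-in for the communication step: every worker receives the average."""
    mean = sum(values) / len(values)
    return [mean] * len(values)

def train(data, num_workers=4, steps=200, lr=0.01):
    # Split the data so that each simulated worker gets an equal share.
    shards = [data[i::num_workers] for i in range(num_workers)]
    w = 0.0
    for _ in range(steps):
        # Each worker computes a gradient on its own shard.
        grads = [local_gradient(w, shard) for shard in shards]
        # The workers "talk to one another" and agree on a shared gradient.
        shared = all_reduce_mean(grads)
        # Every worker applies the same update, so the model copies stay in sync.
        w -= lr * shared[0]
    return w

# Toy data generated from y = 3 * x; training should recover w close to 3.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
w = train(data)
print(w)
```

In a real cluster the averaging step runs over a high-speed network between chips, which is why the communication software matters as much as the chips themselves.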

Now that OpenAI has trained its model, the more significant hurdle lies in making it available to many users. Microsoft has also launched its new Bing, which uses GPT-4, an advanced version of OpenAI's language model, and has put users on a waitlist so that it can scale its servers before rolling out the service.

In addition, through its Azure AI services, Microsoft is making the computing infrastructure available for other users to build their own AI models, which requires it to scale its resources even faster. The Microsoft Azure team has to keep an eye on every little detail, from chips to concrete and from pipes to trays, to rack up servers, because a disruption in the supply of any one component can thwart its plans to scale up rapidly.

What Microsoft put together for OpenAI a few years ago has blown away users. What the company is building now is even bigger and more sophisticated than before. One can only imagine what exciting models it will bring out.

The age of AI has just begun.

This piece contains information that first appeared in Bloomberg and Microsoft's blog.
