They were intellectually very interesting, but they were just silly, in that they were never going to connect with anything that moved the needle for the company, which was exactly the problem I was trying to get away from in being a research computer scientist, and so I sorted myself out. I found a pragmatic thing to go work on. Then I was like, "Okay. Elad Gil: Yeah.
Elad Gil:... I remember I was like, "All right. Was there a moment when you decided or you realized personally that AI should be a key technical bet for Microsoft?
I think Microsoft already understood that before I got there and then it was just, how do you focus all of the energy of the company on the right thing, because we had a lot of AI investment and a lot of AI energy and it was very diffuse when I got there. No lack of IQ and actually no lack of capital spending and everything else, but it was just getting peanut buttered across a whole bunch of stuff. The thing that really catalyzed what we were doing is, I mean, maybe this is a little bit too technical, but before I got there, the technical thing that had been happening with some of these AI systems that, to me, was very interesting is transfer learning was starting to work.
You saw a little bit of that with some of the cool stuff that DeepMind was doing with reinforcement learning with play transfer across some of the gaming applications that they were building, but the really exciting thing was when it started working for language with ELMo and then BERT and then RoBERTa and Turing and a bunch of things that we were doing. That was the point where there were so many language-based applications that you could imagine building on top of these things if it continued to get better and better, and so we were just looking for evidence that it was going to continue to get better and better and as soon as we found it, we just went all in. That was everything from doing a partnership with OpenAI to, at one point I seized the entire GPU budgets for the whole company, and I was like, "We will no longer peanut butter these resources around.
I mean, Elad, I think the question you were asking is how we decided to do the OpenAI partnership. The reason that we did the partnership was twofold. We felt, in addition to the high ambition things that we were doing inside of the company, that we needed high ambition partners.
And when we looked around, OpenAI was clearly the highest ambition partner that was in the field. And so that was one thing. And then the second thing was they really had a very similar vision to the one that I had about how these things were evolving into platforms and we were able to...
Elad Gil: I think one of the stunning things about the partnership in some sense was the timing. It was right around GPT-2, before GPT-3, and there was such a big step function between the two of them that I think it was less obvious in the GPT-2 days that this was going to be as important as it was. And so Satya has this thing that he talks about, no regrets investing.
And so this was one of those no regrets things in that the very, very worst thing that could happen is we would go spend a bunch of capital on computing infrastructure and we would learn what to do at very high scale for building these AI training environments. You've probably seen the famous OpenAI Compute Scale paper where they sort of plot on the log scale how many petaflop days or whatever the unit of total compute they were using on that graph that shows from 2012 when we first figured out how to train models with GPUs through, I think the plot ends sometime in 2018.
Elad Gil: It was a very bold move.
I guess a more recent move is that you announced a collaboration with Nvidia to build a supercomputer powered by Azure infrastructure combined with Nvidia GPUs. Could you tell us a little bit more about your supercomputing efforts in general and then maybe a little bit more about those collaborations with both Nvidia and OpenAI on the supercomputing side?
Kevin Scott: Yeah, so we built the first thing that we called an AI supercomputer.
I think we started working on it in 2019, and we deployed it at the end of that year. And it was the computing environment that GPT-3 was trained on. And we had been building a progressively more powerful set of these supercomputing environments.
But the designs of these systems, we can build smaller stamps of them and they get used by lots of people. So we have tons of people who are training very big models on Azure compute infrastructure, both folks inside the company and partners who can come in. And it was a thing that was not possible to do before where you could say, "Hey, I would like a compute grid of this size with this powerful network to do my thing on.
And we work super closely with them defining what the hardware requirements need to be in the coming generations of GPUs because we have a pretty clear sense of where models are going and what model architectures are evolving towards. Because obviously from an Azure perspective, lots of people are running open source models on top of Azure right now.
Kevin Scott: Yeah, it is an interesting thing that people are framing it as some kind of binary thing.
And you use it for performance and cost optimization reasons, and you use it for just precision and quality reasons sometimes. I think my biggest question mark there is how you go deal with all of the RAI and safety things. I was just playing around yesterday with that 12 billion parameter Dolly 2.
How do you think about that from the context of enabling AI for your business customers outside of your core products? Are there specific tools coming? And so the first thing that we built was GitHub Copilot, which is a coding tool.
And then you, as the developer, the same way that you would take a suggestion from a pair programmer, you scrutinize it, then code review it, and decide whether or not it makes sense for your application.
