Semi Doped
The business and technology of semiconductors. Alpha for engineers and investors alike.
Quick Takes: Nvidia Keynote at GTC
Vik and Austin unpack the Nvidia GTC keynote with fresh, top-of-mind takes, breaking down the key announcements, what matters and what doesn't. They discuss Groq's LPX, optics plus copper for scale up, new CPU requirements, CPO for networking, what agents mean for software, and much, much more.
Check out Austin's substack: https://www.chipstrat.com
Check out Vik's substack: https://www.viksnewsletter.com
Chapters
00:00 Introduction and Keynote Context
03:18 Keynote Highlights and Gaming Innovations
06:18 Generative AI: The Three Eras
09:28 Inference: The New Revenue Generator
12:21 NVIDIA's Tiered Approach to AI Models
15:30 The Groq Chip and Its Role
18:35 Vera Rubin System: A Full Data Center
21:18 CPU Demand and Performance
24:31 Networking Innovations and Future Directions
32:32 Innovations in PCB Technology
34:06 Scaling GPU Systems
36:57 Understanding the STX Rack and AI Storage
38:23 The Rosa CPU and Its Significance
40:07 Digital Twin Platforms and AI Factories
43:53 NVIDIA's New Software Innovations
47:09 The Future of Token Budgets in AI
54:15 Balancing CapEx and OpEx in AI Deployments
It's a honking big PCB. Like I've never seen a PCB this big. Like usually PCBs are something you can hold in your hand. This thing requires like two people to lift this PCB. Okay. And from what I've seen, this mid-plane PCB has like 52 layers and is like a work of engineering. To be honest, to even manufacture a board this complex, you know, is fantastic.
SPEAKER_03Welcome to another Semi Doped podcast. I'm Austin Lyons from Chipstrat, and with me is Vikram Sekar from Vik's Newsletter. If you're wondering why it looks like Austin's in a hotel room with a crappy webcam and terrible audio, that's because I'm in a hotel room with a crappy webcam and terrible audio, coming at you live from San Jose for NVIDIA's GTC. I'm not too far from the SAP Center, where they had the keynote today, and the conference center, where they'll be having a lot of stuff tomorrow. But I will say, Vik, I missed the keynote, so I had to stream it from the plane, because we had some weather in Iowa and my flight got canceled yesterday. So Vik and I wanted to come to you, listeners, and give you a keynote debrief and digest what we learned together. We wanted to get it to you ASAP, sort of like an emergency podcast, and that's why it is 8:50 p.m. Pacific time here. It feels like 10:50 p.m. to me, which is past my bedtime. But Vik, how are you doing over there?
SPEAKER_01I'm pretty good. You know, usually we record this podcast in your morning, my evening, so I'm kind of used to doing it late at night. But after you've traveled all over the place with delays and all of that, and then landed and met with all the other analysts and everything, and now you have to record a podcast... yeah, I feel for you. It's far worse than what I usually go through.
SPEAKER_03Yeah, totally. So, for listeners, we had a blizzard in Iowa. It came through Sunday afternoon, so they canceled my flight. Then it got colder and colder, and the wind was blowing overnight, so this morning everything kept getting pushed back. I think when I left Iowa it was seven degrees Fahrenheit, feels-like minus 17. Then I went down to Dallas and over to San Francisco, and in San Francisco it was like 78 Fahrenheit when I landed. So the feels-like temperature was a difference of almost 100 degrees Fahrenheit from when I left to when I arrived.
SPEAKER_01You know, remember I was joking in an X comment on your tweet that you're basically one of those frozen shrimp that you take from the freezer and put in a microwave to thaw it out, but you end up cooking it. So that's basically Austin today.
SPEAKER_03Yes. Exactly, exactly. So yeah, I landed, came down to the hotel, and since I'm an analyst, I got to go to the analyst relations dinner, and there were some NVIDIA folks there. Ian Buck was there. Gilad, the head of the networking business, was there, and a couple other folks. So that was cool, getting to mingle with people quickly. Had some good California wine. Now I'm plugged in, got some decaf, and Vik and I are gonna unpack the keynote with you.
SPEAKER_01Okay, let's do it. He said so much stuff, like two and a half hours' worth. In my time zone I got up early in the morning and was on Jensen's keynote before I even brushed my teeth, because I was asleep when he actually gave it. So I've now listened to the whole thing. He said a whole lot of stuff, but I'm wondering if it was really worth two and a half hours of talking. He had all these science-fiction videos of cool stuff. I know it's nice to see when you're in there, probably. But you know, I skip past a lot of science-fiction videos.
SPEAKER_03Yeah, so I have an interesting take here. When I was talking to the analyst relations team, they were saying that because NVIDIA is so big and has so many businesses, you know, autonomy, telco, data center, client, gaming, they're trying to be respectful of all of their customers and want to talk to each of them. But on the other hand, they're one of the world's biggest companies, so even in three hours they can't talk about everyone. And I thought it was interesting. For example, the keynote opened very early on with gaming. It was a really cool demo showing how they use generative AI to make gaming better. They talked about it for like a minute, and then that was that. But it was probably more than they talked about it at CES. So I think it was a nod to the gamers, like, hey, we got you covered, we're thinking about it.
SPEAKER_01On our podcast too, you know, we do care about consumer stuff once in a while. And I think this was cool, because of all the videos Jensen played, I liked this one the most. It showed a before and after of their deep learning super sampling technology, DLSS, which is an AI-powered upscaling suite. And it was really good too. It was awesome. I think anybody watching should at least see it; it's the first part of the keynote. You can see how you have these typical video game characters, where you can always tell they're video-game-ish, but what ends up happening after DLSS is awesome. It's just like a real person playing these games. It's amazing. I'm excited to play games now. I literally want to go buy a gaming card, if it's not a million dollars because of memory prices, and actually play some games.
SPEAKER_03Totally. Yeah, it went from kind of blocky faces to just very realistic, with stubble on people's chins and stuff. Like this guy.
SPEAKER_01Yeah, you've got strands of gray hair like we do. It's awesome.
SPEAKER_03Yeah, totally. So then Jensen quickly moved past consumer GPUs, and ultimately this was a data center conversation. And I liked how he teed up the framing and the business case talk, so let's start with that. I think Jensen did a really good job of reminding people that there have been three big step-change eras with generative AI. At first it was the training and post-training era, where it was like, whoa, GPT-3, GPT-3.5, this is amazing, it's a chatbot, but people were still concerned about hallucinations. So it was like, oh, this is a cool party trick, but you can't trust it. That was late 2022. Then in late 2024, the o1 model came out, and all of a sudden models could reason, and that of course helped solve the hallucination problem. And then we started to get into test-time compute scaling, where it's like, dude, if this thing can think, just let it think for longer at inference time and you'll get better intelligence. So you could start to see us shifting from the training era, where everyone needed NVIDIA GPUs for training, to inference, to even more inference, because now it's going to have real demand since it's trustworthy, and maybe 10x the tokens because you'll have it think longer to make sure it's right. And then Jensen was pointing out that we've now entered, truly, in late December 2025 into 2026, this third era, the agentic AI era. So think Claude Code, and as we've talked about, think OpenClaw.
And again, it's another 10x, another order of magnitude of token demand, because all of a sudden it's not humans talking to a chatbot, it's humans talking to AI, and then AI spawning all this other compute: tool calling, CPUs, long context, lots of storage. And Ben Thompson even wrote a great Stratechery article, published this morning I think, saying, hey, I used to think this was a bubble, and now I'm actually not so sure it's a bubble, because with agentic AI it feels truly useful, and it truly feels like we're going to need all the compute, not just GPUs but CPUs. And I think Jensen was taking a similar framing and just saying, guys, you have to believe us, we're in a new era. It's not that AI was a party trick, but especially with OpenClaw, you're seeing the world change. So let me pause there. What did you think of that framing? Did you think it was on? Does it resonate with your experience?
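The era framing above is really just order-of-magnitude arithmetic; a minimal sketch of it (the 10x multipliers per era come from the discussion, the baseline token count is purely illustrative):

```python
# Back-of-envelope token demand across the three eras described above.
# The ~10x multiplier per era is from the discussion; the baseline number
# of tokens for a simple chatbot answer is an illustrative assumption.

BASELINE_TOKENS = 1_000  # hypothetical tokens for one chatbot answer

eras = {
    "chatbot (late 2022)": 1,      # one-shot answers
    "reasoning (late 2024)": 10,   # test-time compute: think longer
    "agentic (2026)": 100,         # agents spawn tool calls and sub-tasks
}

for era, multiplier in eras.items():
    print(f"{era}: ~{BASELINE_TOKENS * multiplier:,} tokens per task")
```

The point of the sketch is that each era multiplies, rather than adds to, the previous era's token demand.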
SPEAKER_01Yeah, I mean, this is why I wrote that article about CPUs and the agentic AI boom. There's a real need for inferencing now. And what Jensen pointed out is that inferencing is what will generate revenue; AI has become a true revenue generator only because of inference. All this time people were deploying GPUs for training and all of that, and people initially thought inference was an afterthought. But what Jensen said was that inference is an incredibly hard problem because of the design space it has to address. You need low latency, you need high throughput, you need long context, it has to work with so many different tools, it has so many different use cases, and it has to run on so many different kinds of hardware. So it's actually an incredibly difficult problem that we are now really deploying against, and he calls this the agentic AI inference inflection. So yeah, we are here right now.
SPEAKER_03Yeah, totally. And okay, so what I also liked about the framing, and you started to call this out, is where Jensen talked about these different tiers. The setup is the Pareto curve of throughput on the y-axis, so how many tokens you can generate at a time, versus token speed or interactivity on the x-axis, at iso-power. Very early on, with these AI supercomputers, we were in the training era, so the talk was, hey, NVIDIA has scale-up and scale-out networking, you can have huge clusters and train faster. Then, as it shifted to inference, there was this simple Pareto curve: no matter what you're optimizing for, we own the frontier, so it's the cheapest or most efficient to use NVIDIA; we're the Pareto frontier, everyone else is behind us. And what I really appreciated was NVIDIA acknowledging that there's really one workload that matters at scale: 60% of the customers are CSPs and hyperscalers serving AI labs, and it's transformer-based LLM inference at scale. But as we've seen over the last three months, it's not just about throughput or tokens per dollar. Yes, there are use cases where you're fine with a 70B model that's good enough and you just want it as cheap as possible. But there are also use cases where you say, no, I want Opus 4.5 or 4.6, the biggest model. Or I want the fastest model. And especially with coding, you actually want the biggest model and the fastest model.
And so Jensen and NVIDIA started to break down that Pareto curve and introduced this concept of tiers. Maybe you have a free tier that's very high throughput and very low speed; Qwen 3 was the example they gave. It's a decent-sized model, you've got a very small context length, and you just give it away for free because it can respond to simple questions for simple use cases. This is top of the funnel: get new customers, get them hooked, because it's free inference. But then you want to introduce tiers beyond that, with a little bit larger model, a little faster, a little longer context length. So maybe you have a medium tier where you're charging $3 per million tokens. And then maybe a high and a premium tier, where he said Hopper could maybe only serve those lower tiers, and Grace Blackwell unlocked the higher tiers, because of the HBM capacity and especially the NVL272 scale up; all of a sudden you could legitimately do inference on really big models, have them fast, and have a pretty long context length. And then, finally, he transitioned this very nicely into introducing Groq, which we'll get to: well, what if 400 tokens per second is not fast enough and you actually want to unlock a thousand tokens per second?
So I'll pause there and get your reactions, but I just want to say, what I liked was that this was an implicit acknowledgement that we're past the one-size-fits-all story of GPUs are great for training and GPUs are perfect for inference. There are different use cases, there are different user demands across intelligence, context, and speed, and therefore they're going to start optimizing their systems so you can hit different points. And as we'll get into, he talked about maybe needing Groq to unlock some of those user experiences.
SPEAKER_01Yeah, so I have a couple of things to say about this. First, I like the free tier, the way he described it: everybody uses it to get a Qwen 3 model or whatever at almost zero cost. This is great because this is something I can get my mom to use. Like, don't Google anything; just use an LLM to get your answer, and it provides a good-enough answer. She's not going to go off and launch some coding task with database queries. She just wants, say, a travel plan for a trip she's planning next month. It's a perfect tier for that. It's going to become the basic use-case tier for everybody in the world; everybody should be using this for at least basic queries if you're not already. With that out of the way, we can go to the extreme end, Groq, which, as I've been saying to many people who ask me about this, is, like you say, an ultra-premium tier. It's ultra-premium because not everybody needs this; only a few people do. I think Meta initially said they would like this to be about 10% of their inference needs. In Jensen's talk, he actually said maybe 25% would be this kind of Groq thing. So you've got some delta in the numbers, but let's bracket it: 10 to 25%. That's all the ultra-fast inferencing will be. And maybe it's the enterprise that needs this, because for them time means money. They are willing to pay for the ultra-premium tier, which will accordingly carry an ultra-premium token cost, and they'll pay it because their ROI is significantly different from somebody like my mom, who's just trying to plan a trip. She does not need Groq.
So between these two use cases and everything in between, you still have a massive need for HBM. It's not going anywhere. I've had some investors ask me, what happens to HBM? Is Groq going to take over the need for HBM? Is the memory pressure going to ease off of HBM? I don't think so at all, because for the majority of use cases, this is not the technology that will be most useful; by that I mean the SRAM-based approach. It addresses a particular market where, like you say, what if 400 tokens per second per user isn't enough? What if you want 1,000? What if you want 10,000? Those are the kinds of questions the agentic AI world brings to the table. But for a majority of use cases, I think HBM-based inference systems are here to stay. So all of that, the Rubin GPUs, they're not going anywhere, and whatever anybody else is developing based on HBM systems will still be needed. That's where I think this chart was really useful in setting the stage on what kind of inference is useful where.
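The tier structure discussed above can be sketched as a simple routing table. Everything here is illustrative: the tier names, prices, speeds, and context lengths are made up for the sketch, not taken from the keynote (except the $3-per-million-token medium tier and the roughly 400 vs. 1,000 tokens-per-second split mentioned in the conversation):

```python
# Illustrative sketch of tiered inference serving: route each request to the
# cheapest tier that meets its speed and context requirements. All concrete
# numbers besides the $3 medium tier and the 400/1,000 tok/s split are assumed.
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    price_per_mtok: float   # $ per million output tokens
    tokens_per_sec: int     # per-user interactivity
    context_len: int        # max context window

TIERS = [  # sorted from cheapest to most expensive
    Tier("free",    0.00,    50,   8_000),   # small Qwen-class model, given away
    Tier("medium",  3.00,   100,  32_000),
    Tier("premium", 10.00,  400, 128_000),   # big model on an HBM rack
    Tier("ultra",   30.00, 1_000, 128_000),  # SRAM-based decode, Groq-style
]

def cheapest_tier(min_tps: int, min_context: int) -> Tier:
    """Pick the cheapest tier meeting a user's speed and context needs."""
    for tier in TIERS:
        if tier.tokens_per_sec >= min_tps and tier.context_len >= min_context:
            return tier
    raise ValueError("no tier fast enough")

# A trip-planning query is fine on the free tier; agentic coding wants ultra.
print(cheapest_tier(min_tps=10, min_context=4_000).name)     # free
print(cheapest_tier(min_tps=800, min_context=100_000).name)  # ultra
```

The sketch makes Vik's point concrete: the ultra tier only wins requests whose latency demands exceed what the HBM-backed tiers can serve, so the cheaper tiers, and the HBM behind them, keep the bulk of the traffic.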
SPEAKER_03Totally, totally agree. So should we dive more into the full Vera Rubin system and talk about all the components, including Groq?
SPEAKER_01Yeah, let's do that. I think what this GTC introduced was the seventh chip; at the last conference, CES, it was actually six chips that he showed. But this time you've got the Groq LP30 chip that is now part of a rack unit. So you can basically slot in this compute tray with eight Groq chips, I believe, into one rack unit, and you can build a rack of Groq chips, which is pretty sweet. And I was closely looking at the picture of that tray; it even has a CPU and like four DRAM sticks. That's interesting. So it's not just the LPUs; there is a CPU on that tray.
SPEAKER_03Yeah, speaking of that, I was going to pull it up on my side and look at it. I don't have the screenshot right in front of me, but yes, there are the Groq chips, there's a CPU. There was a callout of an FPGA, which I thought was interesting; I wonder what the FPGA is used for, and would love to know more. But I saw Patrick Kennedy from ServeTheHome, who has a really good YouTube channel, had tweeted, like, hey, that looks a lot like an Intel CPU. And people were like, well, how do you know? They never said anything about an Intel CPU. And he's like, yes, they didn't say anything, but I guarantee it's an Intel Xeon, because I've just dealt with a lot of CPUs and it looks a lot like an Intel Xeon, the way they have a heatsink on it and the way they physically attach it. And actually, at this dinner I went to right before, he was there, so I was asking him about it, and he's like, yeah man, I don't know what it is, but just based on experience it looks like an Intel CPU. So for all the Intel bulls out there, maybe the Groq rack is going to sell some extra Intel CPUs. I don't remember which version of Xeon he said; it's probably an older version that maybe they don't have enough supply of anyway, because Intel has some supply issues with their older CPUs. So we'll see how this all shakes out.
SPEAKER_01It'll be funny if it actually is Intel, because then I'm like, didn't you just say that CPUs are all the rage? Even Jensen said, oh, I didn't think CPUs were going to be such a big deal, but they are. And so he's introduced basically this CPU compute rack, which has, I don't know how many CPUs, four CPUs, two CPUs, I can't remember. But yeah, is it like eight CPUs to a tray?
SPEAKER_03Yeah, maybe eight in a tray, I don't know. Someone had a great tweet with a picture, because I think there are like eight CPUs and just a ton of memory. It's just all memory. And then someone was like, oh, I see why memory is so expensive right now.
SPEAKER_01I see, yeah, okay. So, however many CPUs per tray, basically now you can get these CPU compute trays based on Vera systems, and you can build a rack of CPUs. It's nothing new; we've had racks of CPUs for years. But the fact that you now need a rack of CPUs tells you something about CPU demand.
SPEAKER_03Totally. So, yes, for people who haven't seen the keynote end-to-end, basically the Vera Rubin system is now actually a full data center. Yes, it has GPU racks, where each GPU tray has GPUs and a head-node CPU, but it also has a CPU rack, or maybe multiple, I can't remember, of just Vera CPUs, and that is for the agentic AI and orchestration processing. The head node is all about feeding the GPU and keeping it fed, but all of that tool calling and everything, you want to happen on CPUs without reaching out to a different data center or the other end of the data center; you want that CPU rack as close as possible. So NVIDIA introduced the Vera CPU rack, and I have some quotes from Jensen. He said it has extremely high single-threaded performance, is incredibly good at data processing, with extreme energy efficiency. He said it's the only data center CPU in the world that uses LPDDR5, which is like mobile memory, so the performance per watt is unrivaled. He made sure to say it's in production. And then, to your point, Vik, he said, we never thought we'd be selling CPUs standalone; we are selling a lot of CPUs standalone; this is going to be a multi-billion-dollar business for us. And for reference, I think AMD and Intel each do something on the order of tens of billions of dollars a year in CPU revenue. So to say you're going to do multi-billion is not too shabby.
SPEAKER_01Yeah. And I know he likes to play up the Vera thing, and it has good single-threaded performance, but it has 88 Olympus cores with the equivalent of multi-threading; I forget their marketing term for it. Basically it has 176 threads. When you compare that to AMD's Venice dense parts, due I think later this year, those have like 512 threads and extremely good single-core performance. So there are some really competitive CPUs out there. I have a CPU yellow pages I created that compares various data center CPUs across various metrics, including memory. So if he's saying LPDDR5 is the thing that makes it unique, I'm going to go back to my yellow pages and check, because it doesn't seem like it to me. I feel like they have other things going on.
SPEAKER_03Yeah, totally. You'll have to dig into it. And for listeners, you can think of the number of cores as access to more agents per CPU that you can run in parallel.
SPEAKER_01Yeah, exactly. So maybe the Vera system is not the greatest for maximizing the number of agents, in my opinion, if you need like one agent per core; you're just going to have to put in more CPUs. But they're good, fast CPUs. So they're drawing a fine line here between what kind of CPU is required versus not. In my article, I proposed that we might see a time where the number of CPUs increases relative to the number of GPUs at rack scale. And now, when you start adding CPU racks next to GPU racks, the question is how many CPU racks you need to add, and that depends on what kind of workload you want to run. So there could be use cases where CPUs exceed GPUs in a rack-scale system. That's my theory, at least.
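The cores-to-agents point lends itself to back-of-envelope arithmetic. The 88 cores and 176 threads per CPU are the figures mentioned in the conversation; the one-agent-per-thread mapping and the tray and rack counts are assumptions for the sketch, not published specs:

```python
# Rough agents-per-rack arithmetic behind the CPU:GPU ratio discussion.
# 88 cores / 2 threads per core matches the 176-thread figure mentioned
# above; CPUs per tray and trays per rack are illustrative assumptions.

CORES_PER_CPU = 88
THREADS_PER_CORE = 2
CPUS_PER_TRAY = 8     # assumed, per the tray-photo discussion
TRAYS_PER_RACK = 18   # assumed

threads_per_cpu = CORES_PER_CPU * THREADS_PER_CORE              # 176
agents_per_rack = threads_per_cpu * CPUS_PER_TRAY * TRAYS_PER_RACK

# If each agent pins one hardware thread, one CPU rack hosts:
print(f"{agents_per_rack:,} concurrent agents per CPU rack")
```

Under these assumptions a single CPU rack hosts tens of thousands of thread-pinned agents, which is why the number of CPU racks needed, and hence the CPU:GPU ratio, depends so heavily on how agent-heavy the workload is.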
SPEAKER_03Totally. Yeah, totally. Yeah, no, I know. We should go count in his reference diagram like how many CPUs there were and get that updated CPU to GPU GPU ratio. Yeah, yeah. So exactly because it could get a lot closer to one to one. Definitely. Or more CPU than GPUs. Yeah, exactly, exactly, exactly. So to Vera Ruben in the full data center system. So now there's GPU racks, there's a Vera CPU rack. Um, there's a if you want, you can get a Grok three LPX rack, which I honestly was a little confused because it he called it Grok 3 LPX at first. And I was just thinking, like, oh, they've only got Grok 1, right? Like just they've only had one chip. Um, it was, you know, some sort of like 14 or 16 nanometer chip. Um, but and I think it was maybe fabricated at Global Foundries, but then on this call, he talked about Grok 3, and there was a shout-out. He's um Jensen said, I want to thank Samsung, who manufactures the Grok LP3 chip for us. They are cranking out as hard as they can. I really appreciate you guys. We'll ship in the second half about Q3 time frame. I know that Grok had publicly stated they were working on a next version of chip and that they were working with Samsung on that. I think maybe Samsung 4 nanometer, maybe. Yeah. Uh don't quote me. Um, so this seems to be that next iteration. But it was a little surprising that it wasn't just like grok two, but it's grok three. I I don't know. Would love to hear more. But um let me tell you some other quotes. I think on the Grok one, there was a picture where Jensen showed nicely, hey, we're gonna do pref. Well, first he said, um, hey, Grok has been attractive to me because uh, you know, it's a deterministic data flow processor, it's statically compiled, the compiler schedules it, the compiler figures everything out, um, the compute and data data, it's all there right at the same time, so you're not waiting on memory. Um, there's no dynamic scheduling. 
Therefore you can just have SRAM, the computation just flows through, and it's super low latency. And he said, obviously this is designed just for inference, ultra-fast inference, just one workload, but guess what? That workload is the workload of our era. So, woo-hoo, they're good to go. But then he did say, Groq only has a tiny amount of SRAM per chip, so the downside is it takes a lot of chips, and that has historically limited Groq. Oh, you want to run a 70B model? That'll be 576 chips, spread over, whatever, 12 racks or something. You're like, wow, that's pretty expensive to run a 70B model. But then Jensen framed it as: we, NVIDIA, with Dynamo as the software layer that decouples prefill from decode, essentially unlocked Groq to be the decode stand-in. If you recall, they had the Rubin CPX SKU, but he didn't talk about that at all today; now it's Groq LPX as the stand-in. So he said, guess what? Now all of a sudden we have unlocked Groq, and it's okay if it takes a ton of Groq chips; you'll just have a Groq server. And then he had a nice picture that said, we'll do the prefill on Vera Rubin, tons of HBM; we'll also handle the KV cache and do the attention-mechanism part of decode, and then we'll just send the activations over to the Groq rack. It'll do the rest of the decode, the feed-forward network part that happens sequentially, because that's bandwidth-limited, and since Groq is all SRAM, it's super high bandwidth, and it'll spit out the tokens. So I liked how he both acknowledged why Groq had a shortcoming and couldn't really take off on its own, and acknowledged how it fits in perfectly with what NVIDIA was building. And he was very specific: he never said they acquired Groq. It was, we licensed Groq's technology and essentially acquired the team.
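The prefill/decode split described above can be sketched as a toy data flow. This is not Dynamo's actual API; all function names and data shapes are invented to show the two-phase structure, with the compute-bound prefill on the HBM-rich side and the sequential, bandwidth-bound decode on the SRAM side:

```python
# Toy illustration of disaggregated serving: prefill (and the KV cache) on
# an HBM-rich system, sequential token-by-token decode on an SRAM-based
# system. All names here are invented; this is a structural sketch only.

def prefill_on_hbm(prompt: list[str]) -> dict:
    """Compute-bound phase: process the whole prompt at once, build KV cache."""
    return {"kv_cache": list(prompt), "activations": len(prompt)}

def decode_on_sram(state: dict, max_tokens: int) -> list[str]:
    """Bandwidth-bound phase: generate tokens one at a time from activations."""
    tokens = []
    for i in range(max_tokens):
        tokens.append(f"tok{i}")              # stand-in for one decode step
        state["kv_cache"].append(tokens[-1])  # each new token extends the cache
    return tokens

state = prefill_on_hbm(["explain", "CPO"])    # runs on the Vera Rubin side
output = decode_on_sram(state, max_tokens=3)  # runs on the Groq LPX side
print(output)
```

The structural point is that only the activations cross between the two systems, while prefill handles the whole prompt in one compute-heavy pass and decode loops sequentially, which is exactly where SRAM bandwidth pays off.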
But did you hear the piece about how the two systems communicate? Is it over NVLink or something? Jensen said the two systems work together, tightly coupled, today over Ethernet, with a special mode that reduces latency by half. And I thought it was interesting that it's over Ethernet, which still feels kind of like scale out.
SPEAKER_01Although he said it's Ethernet, I think eventually it's going to become NVLink, or more specifically NVLink Fusion, because the ability to use other hardware may become important in the future. I don't know. If NVIDIA wants to maintain compatibility with other platforms in the future, it could be useful if this were actually NVLink Fusion. We'll see.
SPEAKER_03Totally. Yes. So to that end, if it's NVLink Fusion, maybe you could plug in any hardware at the other end. It could be Groq's next versions. So it's LPX, but the first version, Groq 3 LPX, was also called LP30, so maybe the X is a stand-in for 30. And then I think they teased an LP35 and an LP40; I don't know if I wrote that down. So there are clearly more Groq chips coming. But presumably it could be other accelerators too, because Jensen did hammer home that they are fully vertically integrated but also horizontally open. If you only want certain parts of their stack, you can do that. So in theory, maybe you could do prefill with NVIDIA and decode with something else, maybe some other AI startup. Or the other way around: maybe you could do prefill with Intel Crescent Island or something and decode with NVIDIA. Who knows?
SPEAKER_01Yeah. And maybe they want the optionality to swap Intel CPUs into their CPU rack. I don't know. It's interesting; we'll see how it goes. One of the other things is he declared that Spectrum networking is now in production and uses CPO at scale. So that's cool, actually. We now have a sign that CPO is no longer the mythical beast that has always been threatening to come but never did. Now something's in production and it says CPO on it, right? I mean, that has to count for something.
SPEAKER_03Totally right. Okay, let's talk CPO, let's talk scale out and scale up. So when Jensen first introduced CPO, he was very clear to say that it was for scale out.
SPEAKER_01Yeah, it's for scale out. And then he went on to talk about the whole optics-versus-copper debate. So I'll let you continue; you have a good way of saying it.
SPEAKER_03Okay, for sure, but feel free to jump in. So there was a roadmap slide where Jensen was showing their future, like Oberon, which he accidentally called Opteron at first, which was funny. And he was very specific about saying, hey, there's a lot of talk about copper versus optical scale up, and where is NVIDIA gonna land? And he basically said, we're gonna do both. So Kyber is gonna have copper scale up. And then he said CPO scale up, which I felt like maybe was a slip of the tongue, but then he explained it. Here's how it actually works, right?
SPEAKER_01I'll tell you what he was talking about here. So when he was talking about scale up, he brought out that NVLink cartridge spine, right? That heavy spine thing he always shows. Yes. And he brought it out, like, look at all the copper cable in this. And I was thinking to myself, oh yeah, all the copper bulls now going, yeah, look at all that, Jensen's still holding up copper spines. That's good. So then he was like, okay, this is what has been used to scale up the Oberon rack, the NVL72, and that's the copper that connects everything in the rack. And then he went on: hey, do you want to see Rubin Ultra? And then he just summons it, the thing appears from the ground, it's very dramatic, it's awesome. So Rubin Ultra shows up, and he's like, look, this Rubin Ultra is not a rack where things go in horizontally, they go in vertically, right? And we saw this in the Kyber rack pictures, if you've seen those in the past. It goes in vertically. The reason it does that is because instead of the giant spine that connects stuff now, they have what is called a mid-plane PCB. And it's a honking big PCB. I've never seen a PCB this big. Usually PCBs are something you can hold in your hand; this thing requires like two people to lift. And from what I've seen, this mid-plane PCB has like 52 layers and is a work of engineering. To be honest, to even manufacture a board this complex is fantastic. And so in the Kyber rack, everything plugs into this mid-plane PCB.
And think of it like this: instead of all that copper cable going up and down the rack, now the mid-plane PCB is like the burger patty. On one bun is the compute, whatever is plugging into that side of it, and on the other side is the back-end networking. So you've got two buns, compute and networking, and in the middle is this mid-plane PCB that hooks it all up. So you can run really fast copper interconnects because of this mid-plane PCB approach, which is awesome. So far, so good; we're still on copper, right? Now, both systems, Oberon and Kyber, can scale up to a 576 GPU domain. How do they do that? Basically, you can take NVL72 racks, put them next to each other, and connect them with optical links. And it's still scale up. Just because it spans different racks doesn't mean it's scale out; it's still scale up because it's within the same domain. But you have optical connections between all of them because you need the reach and the speed. Same thing in the Kyber rack: you have 144 GPUs to a single rack, which is what they call a canister in their terminology. So you can hook up four canisters together with optical links and get to the 576 domain. So both copper and optics are in the scale up. That's how this whole thing works. It's very cool, actually.
SPEAKER_03Yes, yes. No, that's good. Exactly. The 576 specifically is where you're right: within a canister, it's copper scale up, but between the canisters, it's optical scale up, and it's still scale up because it's all one domain. So they're all doing the remote memory access, thinking they're all part of one system and sharing their HBM, even though they're literally separate racks next to each other.
SPEAKER_01Yeah. So this is why scale up has both copper and optics.
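The domain sizing Vik walks through above is simple arithmetic. Here is a quick sketch using the figures as quoted in the conversation (144 GPUs per Kyber canister, four canisters joined optically), not verified datasheet numbers:

```python
# Scale-up domain sizing as described in the discussion.
# Figures come from the conversation, not from a spec sheet.

GPUS_PER_CANISTER = 144   # one Kyber rack ("canister"), copper scale up inside
CANISTERS = 4             # canisters joined with optical scale-up links

domain = GPUS_PER_CANISTER * CANISTERS
print(domain)  # 576-GPU scale-up domain, one shared memory domain
```

The key point the arithmetic illustrates: copper handles the dense short-reach links inside a canister, while optics bridges the longer reach between canisters, and both are still "scale up" because all 576 GPUs sit in one domain.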
SPEAKER_03Yes, yes. And then still in that scenario, optical scale out as well.
SPEAKER_01Yes, that is a thing, right? I mean, optical scale out is definitely the future, or rather, it's here. The future is here.
SPEAKER_03Exactly, exactly. So then CPO, co-packaged optics, still fits in the optical scale out.
SPEAKER_01What I'm not sure of is, when you're connecting these canisters together in the Oberon or Kyber rack, is that a pluggable transceiver or is that CPO? Because he said it's going to be NVLink 6 connecting between those canisters, and NVLink 6 does not have CPO. You need the switch hardware to add CPO if you're going to connect this thing with CPO. So is it gonna be pluggable transceivers for now? Because I'm guessing it's not that many connections, unlike scale up within a rack, where it's a lot of cables. Between canisters, maybe it isn't that many cables. I don't know. I'd have to count the number of cables that go between canisters, how many of them are optics, how many pluggable transceivers are used there, what the power consumption of each pluggable transceiver is, and when that will go to CPO. Because I know people who listen to this are gonna ask all these questions, and I don't really know. Right, totally.
SPEAKER_03I'll try to take some pictures, and then you try to reverse engineer it from the power, thinking about it from first principles, and figure it out. Totally. So, okay, back to the Vera Rubin system. You have the GPU compute, you could have the Vera CPU rack, there are gonna be some CPO switches, you could have the Groq rack, and once you're on Kyber, you could even have a huge scale-up domain. The last thing that was mentioned was this STX rack with BlueField-4. At first I was a little confused by that, because he was talking about AI-native storage, and I was like, wait, I thought this was ICMS, the thing we've been talking about, this massive inference context memory storage server, whatever. But I kind of left thinking, oh, maybe ICMS is an implementation of this STX rack. How did you interpret it?
SPEAKER_01Yeah, I think that's what it is. I think you can put these STX trays in a separate rack and build out an ICMS storage unit with DPUs in it, and then put it next to all these compute racks, GPU racks, LPU racks, networking racks, and now you have a storage rack built with these STX trays. That's how I interpreted it, but he said very little about it.
SPEAKER_03Yeah, it was just confusing that they would change the name or abstract it or something.
SPEAKER_01Yeah, yeah. Storage, I think the S stands for storage, right?
SPEAKER_03There you go. Totally. So, all right, what next? I feel like we hit on Vera Rubin pretty good.
SPEAKER_01Yeah, yeah.
SPEAKER_03What else stuck out to you?
SPEAKER_01So I think that's the meat of what ended up happening. The other thing I took away from the keynote was that the Feynman CPU is gonna be called Rosa, for Rosalind. And we were discussing who this Rosalind is. Do you have a theory?
SPEAKER_03Well, I thought it was Rosalind Franklin, an English chemist and X-ray crystallographer. I remember her name having something to do with DNA and RNA. So that's who I thought it is. And apparently, according to Wikipedia, she died when she was 37, which is sad. But who do you think Rosa, or Rosalind, is?
SPEAKER_01I don't know, I've been trying to find out. But let's go to NVIDIA's company blog, because that's the best source, right? Yeah, for sure. They actually say it there. So you are right, it is named for Rosalind Franklin, for X-ray crystallography. So you win this round of Jeopardy, you're the Jeopardy champion. I thought it was somebody else, but I'll take it from NVIDIA's blog. You are right.
unknownThere you go.
SPEAKER_01But that's good to know: after Vera we have Rosa. So we have a name for the next CPU. Kind of interested to see what's in there. But that's in the Feynman architecture.
SPEAKER_03Yeah, it'll be fascinating to hear what's architecturally different because now they'll be fully designing it for the agentic AI era.
SPEAKER_01Yeah. I think there are a couple more things we should briefly mention, just to wrap this up. Because beyond this, if we make this whole episode two and a half hours, people might as well have just watched the GTC keynote. Why would they listen to us? So let's keep it quick. Okay, we have two more things. One is their idea of DSX, which is, I would say, a digital twin platform that's going to be useful for future building of AI factories. Very broad concept, but this seems like something they are introducing as a new platform for the digital twin universe.
SPEAKER_03Yes, yes. I thought this was super cool. The way I felt Jensen positioned it was: hey, all of a sudden we have to work with all these new companies as we're manufacturing data centers, and how do we best work with them? We decided to just create a simulation in Omniverse. We can simulate the whole thing and essentially co-design it together in Omniverse, run all the tests, the thermals, the power grid load, everything, and then we can all agree, yeah, this seems to work in simulation, and then we'll build it. But not only can you design and simulate it and make sure you build it according to spec, it sounded like you can close the loop. When you're actually operating the real data center, you can feed the data back into your Omniverse simulations. You can have real data from the grid, from whatever thermals you're measuring, actually feed back into your simulation to help you refine it and continue to iterate on it. I just thought this was pretty wild.
SPEAKER_01It seems pretty wild, but I don't know. I've been to many software tool presentations over my career, and in the demonstrations it's always amazing: wow, look at all the co-optimization and the co-design. But ultimately, when you get to working with it, you're like, oh, I guess this export doesn't work, that interface doesn't work, and there are so many bugs in it. In theory, it's a pretty good concept, but I want to talk to somebody who actually uses this platform. If anybody has actually used this Omniverse stuff, let us know, leave a comment or something. Really, I don't know how this digital twin simulation thing even works at data center scale. It seems like you can simulate everything from power to GPU usage to tokens and then feed it all back, and it sounds too amazing to me.
SPEAKER_03It is pretty meta to be like, yeah, you could simulate it. So it's like, okay, well, what model are you running on those GPUs?
SPEAKER_01Yeah, it sounds a little bit like, if only EDA tools were that amazing. I don't know. Anyway, cool idea. Cool idea.
SPEAKER_03The future possibilities of simulation. I think there are a lot of real-world manufacturers who make things and could be iterating a lot faster by simulating, but instead they're just making the real thing, finding out it doesn't work, then iterating and making another real thing. For example, wind tunnels with car design. It's like, oh yeah, you could build a million-dollar wind tunnel, or you could just simulate it all, change the shape of the mirror, and see what happens. So yes, anyone who's an expert in simulation, I'd love to learn more. Hit us up for sure.
SPEAKER_01Yeah, so going down the software path, let's talk about the last one. And I want to mention something: I think NVIDIA needs somebody who can make up acronyms properly. They don't think this stuff through. Okay, when they introduced their lithography platform, they called it cuLitho, which certain Spanish-speaking people in the audience told me means little donkey or something like that. Serious? Yes, that's hilarious. And now they're coming up with this Nemo Claw, OpenClaw thing and calling it agents as a service. What do you think that spells? Really, think through the acronyms. Okay, anyway.
SPEAKER_03I feel like Jensen even said something like AaaS. They're gonna move from SaaS to AaaS. Not agents as a service, AaaS.
SPEAKER_01Anyway, we keep this child-friendly, because a lot of people might play this with kids in the car, okay? So yeah, everything is OpenClaw. He showed a graph of GitHub stars for Linux versus something else versus OpenClaw. OpenClaw's GitHub stars are literally a vertical line, and Linux took 35 years to reach the same number of GitHub stars, or something like that.
SPEAKER_03Yes, well, to that end, I will say Linux existed before GitHub stars did. It's kind of like how your favorite old bands aren't big on Spotify. You're like, yeah, because back in the 70s, they weren't on Spotify, you know.
SPEAKER_02But I completely agree.
SPEAKER_03But definitely, obviously, OpenClaw going vertical is really cool, and for all the good reasons. And so I thought this Nemo Claw thing was interesting. I actually loved the idea. Let me see if I have this quote, because it was so good. Jensen was like: it can access sensitive information, it can execute code, and it can communicate externally. All right, chew on that again as if you're an IT security person. It can access sensitive information, execute code, and communicate externally. And he said, obviously this can't possibly be allowed. So it sounds like NVIDIA came in with their own version. I don't know if they just forked OpenClaw, but they tried to patch it up, make it secure, and give it guardrails so that it's enterprise-ready, which I think is pretty cool. I'd love to learn more there, because I've been doing a lot of agentic AI myself, and it's all internal software, if you will. I'm just trying to automate my life and my work, because there's so much manual stuff that I do, or things that I would love to do but can't. If I can write throwaway software to do it for me, awesome, let's do it. But I'm definitely scared of the idea of OpenClaw just having access to my iMessages and stuff like that. So I can definitely see this is a real need. I know San Francisco people probably don't love to think about it, but for enterprises, this is make or break. This is right up Microsoft's alley, you know, enterprise stuff.
SPEAKER_01Yeah. You know, I'm gonna say something that's not a popular opinion, and kind of contrary to what everybody believes in this software world. Because everybody's like, oh, you can write throwaway software, anybody can build these platforms, so what's the use of SaaS and this and that? I buy it, but I feel like sometimes the cost of building a tool like this, just in terms of token usage, actually outstrips what I'd pay for a SaaS service.
SPEAKER_02Yes.
SPEAKER_01Like, I was trying to build a bunch of these news aggregator things, right? We read so much of the news, and I wanted a way to parse it, collate all these sources, and show it to me. And I built all this stuff. I made Claude Code do it, and it writes into my Google Docs, and I can just come and look at it every morning and I know all the news. It's very cool, but then I had to upgrade to the Max plan because I used all those tokens, and it's $100 a month to pay for all that. But then I was looking at some services that do this, like Feedly Pro, for example. I got it for like a hundred bucks a year, and it does all this AI aggregating stuff. So there's only a marginal benefit to making tools yourself with these coding tools versus paying for a tool without the token cost. There is a cost to doing software tools yourself, you know?
SPEAKER_03Definitely. There's a token cost, a financial cost, and there's obviously a time cost and an opportunity cost. You need to focus on what makes your beer taste better. So for writing your own news aggregator or your own CRM or your own back-of-house stuff, it's like, do you really want to become an expert in that, or do you just want to do what you do best, right?
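Vik's build-vs-buy comparison above is easy to put on the back of an envelope. A sketch using the prices as quoted on the show (a $100/month coding-plan for a homegrown aggregator versus roughly $100/year for a Feedly Pro-style service); these are the conversation's figures, not current list prices:

```python
# Build-vs-buy token economics from the conversation:
# rolling your own AI news aggregator on a $100/month plan
# versus an off-the-shelf SaaS tool at ~$100/year.
# Dollar amounts are as quoted in the episode.

diy_monthly = 100          # Claude Max-style plan, dollars per month
saas_yearly = 100          # Feedly Pro-style service, dollars per year

diy_yearly = diy_monthly * 12
print(diy_yearly)                  # 1200: annualized DIY token cost
print(diy_yearly // saas_yearly)   # 12: DIY costs ~12x the SaaS tool
```

This ignores the time and opportunity costs Austin mentions, which only widen the gap unless the tool is genuinely niche to your domain.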
SPEAKER_01Yeah, exactly. So if something doesn't exist for what you do, because maybe it's a niche tool or a niche area you work in, and there's no tooling for it, but you know what tooling flow would help your work because you're an expert in that domain, then it makes sense to go agent-code something up yourself, and it's a great productivity enhancer. But declaring SaaS dead because of this is a weird stance to take, because I don't think everybody can code up a Feedly, or a CRM like HubSpot. Totally.
SPEAKER_03Nor do they want to commit people to maintaining it and adding features and stuff. So what happens? I worked at a company once, a very large grocery retailer, and they made all their own back-of-house stuff. They could have been running it on SAP or whatever for inventory, pricing, all that. But they said, we have a couple of bespoke features that are really important to us, because we operate as a business differently than other people, and SAP doesn't support them out of the box. So we'd have to pay someone gobs of money to customize on top of SAP, because we're just one customer, and SAP is not going to build a feature for one customer. And at that point, if we're gonna have to build a bunch of software on top of SAP, why not just build our own software? Now, there are trade-offs: you have to have teams of people for the rest of your company's existence to maintain that software when you could just be using SAP. But they had this dilemma: we want these custom features and we're just one customer. Now anyone like that can go to SAP or whoever and say, excuse me, I have these custom features. And now SAP, if they're using AI internally, can go, cool, we'll just create that for you. Yes, you're only one customer, and it used to not be worth it because we'd have to pay a few engineers to build that, but now we can have one intern essentially vibe-code it. The cost of writing custom features has come down, so we can customize this for you. And so it's a win-win: all the customers who could never get high enough priority on the product manager's backlog can now get the bespoke things they want.
And SAP, or whoever the SaaS company is, can potentially have more customers and serve more of their needs, because it's cheaper to write software.
SPEAKER_01Yeah, so this is a net positive for the software world in a sense, right? Because we had this recent SaaS-pocalypse or whatever, the SaaS-mageddon. But this is the case. And Jensen was also saying that each company needs to have its agentic strategy in place: how are you going to deploy agents within the company? I think that's a good point. Because, like you were saying, there's a security aspect. And the second thing is, are you going to use it for core development, or are you going to use it for bells-and-whistles products that can be quickly churned out to satisfy one particular customer? Maybe they can even tier it, like a custom support package. Exactly. Exactly. That exists.
SPEAKER_03Yes. And I agree with you. I thought it was pretty profound to think through: just like everyone needed a web strategy or whatever, now everyone needs an agentic strategy. Now, if I could give gentle feedback to NVIDIA, it's the framing that this throughput-versus-interactivity chart defines your revenue going forward, and every CEO is gonna think deeply about that. I thought, yeah, that totally resonates for Dario and Anthropic, when your business is selling inference as a service, selling your model as a service. But most businesses at the end of the day are not selling tokens as a service. They're still selling tractors or groceries or hotel rooms, or Starbucks selling coffee, right? So yes, I definitely believe that every domain will be using AI as an input and will be transformed by AI, but the idea that your business model will now be a reflection of how many tokens you can generate, and therefore you need to think about that chart, is just not right. It speaks to the 60% of their customers that are hyperscalers and AI labs. Because if you listen to that Dario and Dwarkesh podcast, Dario literally said this: I've got only so much money I can spend per year, and I need to spend enough on R&D to unlock the future models, and then I need to spend as much as possible on inference, because that's where I make money. The R&D stuff is just for next year. He'll make as much money as he has inference compute: if he has this much inference compute, he'll make this much money; if he only has that much, less. And of course, if he doesn't calculate it right, he could not make enough money.
And that makes a ton of sense for a few customers. But for those other 40% that are just your enterprises, they need a different message; talk to them. And frankly, and this is a conversation for another time, because we're on Jensen time now and probably almost at three hours at this point, I need to hear more about local AI, on-premise AI. Talk to me about why I want to buy an RTX 6000 Pro, because I want to spend five million dollars a year and get as much inference as I possibly can. I don't want to spend $5 million and realize I blew it all in the first month because Opus is expensive, right? So maybe I do want to buy a rack, or maybe I want to buy DGX Stations like we talked about before. I'd love to hear more from NVIDIA on the 40% of customers and how they're thinking about CapEx versus OpEx and planning their budgets, in light of AI transforming their business. And maybe, last thing here: I loved this quote. Jensen said, hey, every engineer going forward is going to need a token budget. They'll make a few hundred thousand dollars in base pay, and then I'll give them another half of that on top as a token budget so they can be amplified 10x. It's now a recruiting tool in Silicon Valley: how many tokens come with my job? Because those tokens make them more productive. I can relate to that, because I use the $200 Max plan on Anthropic's Claude for coding all the time, and if that goes away, I'd be like, uh, I'm going to a new job where I can have that, you know? And by the way, I'm making stuff that runs in AWS using Anthropic's API, and in one fell swoop I'll suddenly burn 50 bucks and be like, dude, you re-ran and reprocessed all my data? Come on. I need to put some guardrails around that, because that was super expensive.
But I can see all of this stuff making me more productive. On the other hand, okay, do people want a token budget that they're gonna go blow with OpenAI or with Anthropic? Or could you just say: hey, for every engineering team, we're gonna buy you a rack or a DGX Station or something, and then you divvy it up and keep it as busy as you can, right? Because then maybe you actually know exactly how much you're gonna spend and get all the tokens you need out of that.
SPEAKER_01Yeah, that is the way it's gonna go, for one simple reason. I've been reading all these people saying, oh, that small team is using $500 a day, $1,000 a day on token costs, and it's amazing because they don't have to hire a person. I mean, $1,000 a day is a little much, because at $300,000 a year, I guess you could hire another person. The only thing is, this AI agent doesn't sleep, so that's a benefit. A $300,000 employee needs to sleep and needs a little work-life balance. This thing doesn't sleep, so it's a good bet, I suppose. But ultimately this becomes an operating expense. If you buy a DGX system and staff up your on-premise AI, that's a depreciating capital asset, and longer term it makes much more sense that this is how you deploy it. And this token budget is not really going to be a thing. It's gonna become more like tool usage, a license: you buy so many licenses for your company, and you scale the licenses up or down per year depending on usage, right? The same way you build up infrastructure: if people are using more tokens and you need to add capacity, you'll add another DGX rack, and that's how it's gonna go. Nobody's gonna negotiate, like, I'm gonna get you 10,000 tokens per day, and that's how your job is defined. Nobody really knows how these tokens are even used. Do you know which job of yours is using more tokens versus less?
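Annualizing the numbers Vik tosses around above makes the OpEx-versus-CapEx tension concrete. A sketch using the conversation's figures (an agent burning $500 to $1,000 a day versus a ~$300k/year hire); the $500k DGX price and 5-year depreciation schedule are illustrative assumptions, not quoted anywhere:

```python
# OpEx vs CapEx framing from the conversation. The daily token
# spends and the $300k salary are the episode's figures; the DGX
# price and depreciation period below are hypothetical.

def yearly_token_spend(per_day: float, days: int = 365) -> float:
    """Agents don't sleep, so assume spend accrues every day."""
    return per_day * days

print(yearly_token_spend(1_000))   # 365000: more than a $300k hire
print(yearly_token_spend(500))     # 182500: well under one

# CapEx alternative: a hypothetical $500k DGX-class system,
# straight-line depreciated over 5 years.
capex, years = 500_000, 5
print(capex / years)               # 100000.0 per year as a capital asset
```

The point of the sketch: at $1,000/day the OpEx already exceeds a salaried engineer, which is why Vik expects on-prem capital assets and license-style provisioning to win out over per-engineer token budgets.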
SPEAKER_03Yeah, it's very hard to see that. You don't know it up front. I'll accidentally burn $50 of tokens, walk away from my desk, come back, refresh, and be like, oh crap, I had no idea, you know.
SPEAKER_01Yeah, exactly.
SPEAKER_03So I don't buy this, Jensen. I do believe it in Silicon Valley, you know, but I think the rest of the world's gonna think a lot differently. Go talk to a CFO somewhere on the East Coast or in the Midwest or in Europe, and they'll tell you, we're thinking about this differently.
SPEAKER_01Awesome. We've come up on an hour. We did it. We didn't hit two and a half like Jensen.
SPEAKER_03There you go. All right, man. Time for me to go to sleep. This was good. People, if you listened this far, we'd love to hear your feedback. Thanks for listening to Semi Doped. Follow us on YouTube and comment, send us emails. Thanks for listening on Spotify, Apple Music, everywhere you find us. Thanks for checking us out on X. We're always here to hear what you want to know, so send us requests for future podcasts. We'll catch you next time.