Semi Doped
The business and technology of semiconductors. Alpha for engineers and investors alike.
Is Intel Finally Back with a $300B Market Cap? Can OpenClaw Dream?
In this episode, Austin and Vik discuss whether Intel is finally back, with CPU partnerships with Google and heterogeneous inference with SambaNova, as its market cap soars above $300B. Vik tries to get his OpenClaw instance to dream every night.
Chapters
00:00 Anthropic's New Direction: Chip Development
02:30 Navigating Subscription Changes and Token Costs
05:25 Exploring Alternative AI Models
08:10 The Economics of AI: Rent vs. Buy
10:56 Intel's Resurgence and Market Dynamics
15:23 Intel's Strategic Partnerships and Market Positioning
19:37 The Role of IPUs in Modern Computing
25:08 Coexistence of x86 and ARM Architectures
29:55 Innovations in Chip Architecture and Future Prospects
Now that they cut me off from using my $200 plan to run OpenClaw, I need to go buy tokens. And if you go buy tokens from Anthropic and use them for your own OpenClaw setup, it's gonna burn through them like crazy. I did it a little bit; it used up like five or seven dollars in five minutes. I can't do this, man.
SPEAKER_00: Welcome to another Semi Doped podcast. I'm Austin Lyons of Chipstrat, and with me is Vik Shaker from Vik's Newsletter.
SPEAKER_01: Anthropic has been doing so many different things right now, I don't know where to start. But you know what the biggest problem is right now? They cut off my Claude Max plan from working with my OpenClaw setup. So I'm all out of tokens now.
SPEAKER_00: Yes, yes. Okay, tell people what you're talking about in case they don't have context, and then tell me: what are you doing? How are you getting around this?
SPEAKER_01: Suddenly on April 8th, they sent an email saying, hey, if you're using your subscription plan with agentic harnesses like OpenClaw, you can't do that anymore. You need to go buy API credits like the rest of the users. And seriously, for $200 a month, you could do so much OpenClaw on a frontier model like Opus. I had such a good time. It is so good at doing a lot of stuff — setting up some coding things, being my virtual assistant, and a very clever virtual assistant at that. It scrapes all my news for me and keeps me abreast of all the stuff we need to talk about on the podcast or write on the Substack. So it helped me a lot; Opus was a great model to run at the frontier level. But now that they cut me off from using my $200 plan to run OpenClaw, I need to go buy tokens, and if you buy tokens from Anthropic and use them for your own OpenClaw setup, it's gonna burn through them like crazy. I did it a little bit; it used up like five or seven dollars in five minutes. I can't do this, man. I'm not that level of billionaire, to just burn tokens this way. So what I did was — I mean, I love frontier models and all, but I consider my own intelligence not frontier, so I'm gonna make my OpenClaw match my own level of non-frontier intelligence. Fireworks AI has this thing called Firepass, which costs like $7 a week and gives you basically unlimited tokens to use Kimi K2.5 Turbo, which I'd say is slightly worse than Sonnet. But it does a non-zero amount of work for me, which is actually just fine for what I can afford to spend on agentic AI systems like this.
And what I did this week was turn on this thing called dreaming mode in OpenClaw, which is in its latest release. What it does is log all the instructions and experiences your agents have with you — you're telling it, hey, do this, do that, I like this, I don't like that — all its daily experiences get logged to a file. And during the dream state, it parses through all the experiences written to that markdown file and converts them into longer-term memories written into a memory.markdown file. I'm betting on the fact that the more I use my OpenClaw setup and all my agents within it, the better I'll get at telling it what I want, and the better it will learn what I'd like to do with it. And our relationship — this AI-human relationship — is going to blossom. Then it's going to do exactly the stuff I want for a reasonable token cost, because not every project of mine is a frontier-level-intelligence kind of task.
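The dreaming loop Vik describes — daily experiences appended to a log, then distilled into long-term memories — can be sketched roughly as below. This is a speculative illustration, not OpenClaw's actual internals: the file names, the `consolidate` function, and the stand-in `summarize` callable (which in the real system would be an LLM call) are all assumptions.

```python
# Hypothetical sketch of a "dreaming" pass: fold a day's raw experience
# log into a durable long-term memory file. Not OpenClaw's real code.
from datetime import date
from pathlib import Path

def consolidate(daily_log: Path, memory_file: Path, summarize) -> None:
    """Distill today's experiences into the long-term memory file."""
    experiences = daily_log.read_text().strip()
    if not experiences:
        return  # nothing logged today, nothing to dream about
    # In the real system an LLM would distill durable preferences and
    # facts; `summarize` stands in for that call here.
    memory = summarize(experiences)
    with memory_file.open("a") as f:
        f.write(f"\n## Dream pass {date.today().isoformat()}\n{memory}\n")
    daily_log.write_text("")  # start the next day with a clean slate

# Toy run with a stand-in summarizer
Path("experiences.md").write_text("User prefers short morning briefings\n")
Path("memory.md").touch()
consolidate(Path("experiences.md"), Path("memory.md"),
            summarize=lambda text: f"- Learned: {text.splitlines()[0]}")
print(Path("memory.md").read_text())
```

The appealing property is that the raw log resets daily while the memory file only grows with compressed, durable facts — which is presumably why Vik expects the agent to improve the longer it runs.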
SPEAKER_02: Yeah.
SPEAKER_01: I'm hoping this works. Let's see — I turned it on like yesterday, and I don't know how much it has to dream to be useful. Let's check back in a month.
SPEAKER_00: Totally, let's. Conceptually it's super interesting. There's a lot to unpack in your anecdote alone. First of all, what I was hearing you say was that when you started with OpenClaw, it got to log in and use your Claude Code or Claude Max subscription — let's say $100 or $200 a month. And you got a fixed amount of tokens, although you didn't know it; behind the scenes, your $200 bought you some amount of tokens, right? And they were probably subsidizing it — you probably got $1,000 worth of tokens for $200 or something. And they figured that doesn't matter when it's just humans using it, because you're not going to use $1,000 worth of tokens. But now that it's the machines and the agents going all the time, it's like, oh crap, people are actually burning through that $1,000 of tokens for $200 — it could be way more. So then they told you: hey, Vik, you cannot let your OpenClaw pretend to be you and log in. You, the human, can keep using it, but your OpenClaw has to use the API. And by the way, those are fairly priced on the market. You went and used it, and you're like, oh crap, that's gonna cost me a dollar per minute at this rate. I can't do it — maybe I was using $20,000 a month worth of tokens. But then what's interesting is you're talking about Opus in the past tense. Goodbye, Opus. We had a nice period together. I loved you. But I need to wean myself off you because you're too expensive now; I need a cheaper drug — a cheaper model. Now, what's interesting is I thought you were gonna say you went to OpenRouter and just started trying different, cheaper models, paying via API.
So instead of paying, whatever, $25 per million tokens, you're like, I'll pay $3 per million tokens. But the service you talked about — you're happy about committing to a fixed $7 per week, and you're still sort of getting unlimited tokens, it sounds like. Which of course raises the question: is this — what was it called, Fireworks? — are they also going to be losing money? Or is it fine because it's a cheaper model? It's the same problem. But now you're like, woo-hoo, I'm back to unlimited tokens, and I can wrap my head around $7 a week, $28 a month, that's fine. Which leads to the agentic AI computer conversation we've been having: interestingly, the value prop you're most excited about is a fixed dollar amount for unlimited tokens. That sounds a lot like buying your own token generator — your own DGX Spark or whatever. Of course, there are problems with those, which we can unpack another time: does it have enough compute, does it have enough memory to do this locally? I thought it was interesting to work through that conceptually, because maybe people just don't like paying consumption-based, per token, for their agentic AI.
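The burn rates tossed around here can be sanity-checked with quick arithmetic. All figures below are the rough numbers from the conversation (about $6 per 5 minutes of agentic API use, an assumed 8-hour workday, $200/month for the old plan, $7/week for Firepass) — illustrative, not measured.

```python
# Back-of-the-envelope check on the token economics discussed above.
burn_per_minute = 6 / 5          # ~$5-7 burned in 5 minutes of API use
hours_per_day = 8                # assume the agent runs a workday, not 24/7
monthly_api = burn_per_minute * 60 * hours_per_day * 30
print(f"Pay-per-token agent: ~${monthly_api:,.0f}/month")   # ~$17,280

subscription = 200               # old Claude Max plan, $/month
firepass = 7 * 52 / 12           # $7/week flat, converted to $/month
print(f"Max plan: ${subscription}/month; Firepass: ~${firepass:.0f}/month")
```

Even at a modest workday duty cycle, metered API use lands in the tens of thousands per month — which is why Austin's "maybe I was using $20,000 a month worth of tokens" framing is plausible, and why a ~$30/month flat rate is so attractive.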
SPEAKER_01: Yeah. Two things. One, I did consider running Gemma, because the Gemma 3 model needs about 30 GB of RAM, which I don't actually have, but it's an attainable amount of RAM if you buy a specced-out Mac mini or something. It's an expensive proposition, but it's a fixed cost that I can maybe swallow for now — pay it in the name of science and see what happens. Infinite inferencing, and if it can recover its fixed cost one time — whatever I spent on it, ten grand, five grand, I don't know.
SPEAKER_02: Yeah.
SPEAKER_01: If I can make something with it — via the Substack, the podcast, or writing my own SaaS tool, because you can build an AI SaaS tool now — whatever productivity enhancements make me five grand over the lifetime of that machine, which could easily be five years, the cost is recovered. It's a win already. So I'm willing to do that. I don't have it yet, but I really did consider running Gemma, and I considered running the smaller versions of Gemma as well. They have the "effective" E4B and E2B models, which seem really capable. And I think this is a trend that's going to continue, because these models pack a lot of intelligence for the amount of resources they consume. That's a very promising trend. I just need a Mac mini, which I don't have right now. But if I did, I would totally run local inferencing — maybe it's intelligent enough for my purposes. Second thing is, I really did go to OpenRouter and bought like $20, $25 of credits. And I've been using this Xiaomi MiMo V2 Pro, which is a substantially cheaper model. If you look at the models ranked by subgenre — like, best technology model — it's number one on OpenRouter, with the most tokens being used by people there. And they have the MiMo V2 Flash, I think, a lower version of it. It's supposed to be Opus-like intelligence at a fraction of the cost. So I don't have to do the fixed cost; I could go do that. I have credits, and I have my OpenClaw set up to switch between all these different models. So I might play around and see what happens.
SPEAKER_00: Totally. Well, you ought to use as much of the — not free, but unlimited — Sonnet-level tokens as you can, and then route to an Opus level when needed. The question is: that Xiaomi one, is it open source? It must be. I think it is, yeah. So then the question is, are any of these quote-unquote Opus-like open-source models actually Opus-level frontier intelligence? And what is your use case? Is it good enough for that — good enough that it beats the unlimited tokens you're already willing to pay for?
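The routing idea Austin sketches — default to the cheap unlimited tier, escalate to a frontier model only when a task warrants it — could look something like this. The model names, prices, and the keyword heuristic are placeholders, not anything OpenClaw or these providers actually ship.

```python
# Minimal sketch of cheap-by-default model routing. All names, prices,
# and the difficulty heuristic are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    cost_per_mtok: float  # $ per million tokens, hypothetical

CHEAP = Model("kimi-k2.5-turbo", 0.6)
FRONTIER = Model("opus-class-model", 25.0)

def route(task: str, hard_keywords=("prove", "refactor", "architecture")) -> Model:
    """Pick the frontier model only if the task looks genuinely hard."""
    return FRONTIER if any(k in task.lower() for k in hard_keywords) else CHEAP

print(route("summarize today's chip news").name)   # cheap tier
print(route("refactor the billing module").name)   # frontier tier
```

A real router would judge difficulty with a classifier or a cheap LLM call rather than keywords, but the economics are the same: most agent tasks never need frontier pricing.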
SPEAKER_01: And I think I was over-indexing on how much intelligence I actually need for my tasks. Don't think you need Opus all the time — you really don't. My virtual assistant, for example, is really great. I set it up with Discord, so it pops up in Discord like, hey, you had this follow-up email, just letting you know — did you send it yet? I'm like, oh no, oops, forgot. So I go send it and tell it, yeah, I'm done, mark it off the list. It maintains a markdown file on my Google Drive and keeps checking off stuff. And I wake up in the morning like, okay, tell me what I have on my plate today. It looks through the whole week's stuff and goes, hey, you did really well yesterday, highly productive day, but if you can knock these three things out in the morning, it'll free you up for deep work later. This kind of intelligence I get for $7 a week. It's enough for me — a massive productivity unlock for just $7 a week. It works.
SPEAKER_00: That makes a lot of sense. One other thing you said: if you buy, like, a $5,000 agent computer, that's gonna be more expensive than your $7-a-week unlimited tokens. But what feels nice about it — this is where my head went — is that you get a physical thing, an asset sitting on your desk. You feel like you bought something, even though it's gonna depreciate and cost more per week. But then I thought, wait a minute: when you're done with it, you'll be able to sell it and, if you want, recover that value. And unlike a typical used car — what did we learn through the COVID era? You can sometimes sell a used car for more than you bought it for, or it doesn't lose any value. Because of what? Supply crunches. And of course, you've got to buy an agent computer with a ton of memory in it. So that alone might mean you buy a Mac mini today, use it to generate all these tokens for a year, and then sell it for the exact same price, right?
SPEAKER_01: So I totally did this. Before the crypto boom — around ten years ago — I decided, okay, fine, I'm gonna play video games now, because I hadn't played video games in the longest time and I was like, what am I missing out on? Everything looks so good these days. So I bought a big graphics card, put it in my PC, and played, like, Battlefield or something for a year. And then I ran out of time; life got in the way. Then the crypto boom happened, and I thought, I'll just sell my GPU. I had bought it for $300, and I sold it for like $500. I was not even scalping it — that's how much people were paying for this stuff. I just said, how about $500? And they're like, great deal, here you go. So I made a profit on it. That's an investment. Invest in memory today, because haven't you heard, memory prices are only going up?
SPEAKER_00: Yes, yes. So invest in a token generator.
SPEAKER_01: Yeah, this is the rent-versus-buy deal. Some people are renters, some people buy. What do you want to do? You want to pay rent to get tokens? That's fine — it gives you flexibility. You don't like it, you stop. You can't exactly stop living in a house, though. You get the point; it's a shaky analogy.
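The rent-versus-buy trade can be made concrete with the numbers used on the show: a roughly $5,000 machine against $7/week for unlimited tokens. The resale value below is a hypothetical, standing in for Austin's point that memory supply crunches might hold the machine's price up.

```python
# Rent-vs-buy arithmetic with the episode's rough numbers.
machine_cost = 5_000        # one-time purchase, $
rent_per_week = 7           # Firepass-style flat rate, $/week

breakeven_weeks = machine_cost / rent_per_week
print(f"Break-even vs. renting: {breakeven_weeks:.0f} weeks "
      f"(~{breakeven_weeks / 52:.1f} years)")

# If resale value holds (Austin's supply-crunch scenario), ownership
# cost shrinks to purchase price minus resale, spread over the holding
# period. The $4,000 resale figure is purely hypothetical.
resale = 4_000
holding_weeks = 104         # two years
net_cost = machine_cost - resale
print(f"Net cost with resale: ${net_cost} over {holding_weeks} weeks "
      f"= ${net_cost / holding_weeks:.2f}/week")
```

Taken at face value, the flat rental doesn't break even against the machine for well over a decade — unless resale value stays high, which is exactly the wrinkle the GPU-during-crypto anecdote illustrates.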
SPEAKER_00: Totally, totally. All right, what should we talk about next? Yeah, there was the interview with Reiner Pope from MatX, which is cool. Everyone, if you haven't listened to it or read it, go check it out. MatX, if you don't know, is a startup making chips tailored to LLMs. I won't rehash it all here — just go listen; Reiner tells it all much more interestingly than I do. But he was at Google, he and his co-founder, and they left Google one week before ChatGPT launched. They had this idea that today's GPUs aren't built specifically for LLM inference, and that they could do it better. And they actually started thinking this way before LLMs had even been productized, if you will — so it was a bit of a bold product bet. Then, a week after they left their jobs, ChatGPT launches, and at that point the writing was on the wall. Go listen to it. It's very interesting; we get into some interesting technical stuff, but not too technical. That's my plug.
SPEAKER_01: Yeah, Reiner's great to listen to. I've seen him on the Cheeky Pint podcast before, and he's a really good explainer of things. So yeah, definitely listen to it.
SPEAKER_00: All right, what else happened recently? Intel. Intel's back. Intel's having a good week — they've had a lot of announcements in the past week or two.
SPEAKER_01: I got the news today that Intel's market cap apparently crossed $300 billion, which is 3x up from about six months ago.
SPEAKER_00: It's amazing. The stock was basically flat for 20 years, then down, down, down — and then all of a sudden straight up over the last six months.
SPEAKER_01: I mean, yeah, it's been up 50% just this month, in April. That's a good sign, right? I was looking at it — it's $60 right now, and it was $40 a few weeks ago. So Intel's doing great. I think this is coming primarily from a couple of things. The Terrafab announcement was great, and your call in the Terrafab discussion we had in an earlier episode was cool, because you were like, wait, is Elon going to build all the chips for his Terrafab project by placing orders with Intel? That's a big deal. Even if it's not one terawatt — even if he makes 10-20% of that, it's still a lot of chips, and that would be a big customer for Intel, and they really needed a big customer. A few other industry reports are now echoing what you said, so maybe that's a really big positive for Intel, and one reason it's going well. The other reason, which is very recent news — yesterday, I'd say — is that Intel and Google have signed a new multi-year deal for Xeon CPUs. So Intel Xeon is back — it's not going anywhere.
SPEAKER_00: Yeah, which is good. Okay, on the Terrafab thing: I'm only speculating and saying that's the best-case scenario for Intel. Intel's announcement was quite light on details — who knows, they could literally just be helping Terrafab with R&D, for example. We have no idea. But Occam's razor: the simplest explanation starts with what Intel is trying to be in the business of. Foundry — building chips for people. So if, for now, it's literally just "we can help you build or package some chips," that could make a lot of sense, and of course it's very beneficial for Intel. On the Intel-Google announcement: CPUs aren't dead. They talked about deploying Intel Xeons, and they also talked about the IPUs, which we can get into in a minute. But I'll say on the CPU one, I need to read it more deeply — I had a lot going on today — but I was a little underwhelmed, in that I thought this was going to be about CPU racks for GPUs or TPUs, something very tied to the AI boom. Like, we need more CPUs for AI — and you do need more CPUs as an extension of the head node in AI. But as we've been talking about with agentic AI, we're just going to need more CPUs, period, because now Vik is scraping X, or his virtual assistant wakes up and hits some cloud-native traditional CPU somewhere, right? All those data halls full of serving SaaS products and things — there are just going to be more CPUs anyway. I say this because Intel's announcement had a lot of verbiage that made it feel very tied to AI specifically.
But then they talked about how Google is already deploying Intel Xeons, and I clicked through to something like "at Google Cloud Next we announced our C4 and N4 machines," which are Intel Emerald Rapids — from two years ago. And those are the specific SKUs from Intel's portfolio mentioned in this announcement. So I'm like, wait a minute, this press release is about Google buying more Intel 7 CPUs, which are older-generation CPUs. There are lots of good reasons for this — Intel has had a CPU supply crunch where there was demand for their older CPUs and they couldn't make enough. But my initial excitement of "yeah, Intel's in the CPU game for AI" faded, and it felt more like the CSP — the cloud part of Google — saying we want more CPUs, and we're going to keep buying these older ones. Which felt way less exciting. Still, you're selling CPUs. And in case everyone forgets: how does Intel pay its bills? How does it fund Foundry? How does it pay salaries? With CPUs. So that is good. But I was a little like, maybe this isn't quite as exciting as I thought. You tell me your read — and what do you think about the IPU conversation?
SPEAKER_01: So on the reason it seems tied to AI: they may well deploy racks of Xeons. I don't see why not — they could put them in a Google data center and have more CPUs for every GPU at rack scale. The IPUs are infrastructure processing units — essentially separate chips that Google is co-designing with Intel to offload storage, networking, and security functions from the host CPUs. Which is cool, because now you can dedicate more cores and compute to real work instead of all that administrative maintenance — offload it to the IPU and do the serious processing on the CPU. That's interesting. It tells me that squeezing as much as possible out of a CPU, whether single-core performance or more cores, is important.
SPEAKER_00: Yes. This IPU reminds me of AWS Nitro, and of DPUs — all of that hardware is about, like you said, offloading things from the CPU so it can do more agentic AI, or just serve APIs, or whatever. Conceptually, I've had this idea that a lot of network tasks should live in the network. That's what I like about these IPUs and DPUs: if it's anything related to networking — permissions, even, potentially — handle all of that in the network. Don't even let it hit the CPU. That makes a ton of sense to me.
SPEAKER_01: Yeah, they're like the DPUs, right? And this is needed for handling so many storage requirements now — all this SSD storage work can be offloaded to DPUs, or IPUs in this case. I'd say they're similar. And you know what else I got out of this? I don't remember where I read it, but everybody was saying ARM is going to take over the data center — that 90% of all deployed CPUs in data centers, or for AI applications, are going to be ARM-based. I don't think that's going to be the case. I think x86 is still going to be used, and this is one of those examples: there's no reason one has to exist over the other. They can coexist.
SPEAKER_00: Sure. Yeah.
SPEAKER_01: You can buy some ARM CPUs, you can buy some x86, or you can put in your own Axion. Axion is not a very cutting-edge CPU, okay? It's power efficient; it's great for cloud deployments and stuff like that. It's not a powerhouse when it comes to CPUs, which Xeons are, right?
SPEAKER_00: Yeah, sure, totally. It's definitely an expanding pie — it doesn't have to be one versus the other. And if you were an ARM bull, you could say, of course people can port to ARM, and it's easier than ever because you can use LLMs to help you, right? Now that writing software is almost free. On the other hand, I had a conversation with Mohamed Awad at Arm at their recent event, and I was asking about exactly this. He made an interesting point: yes, obviously they see people porting things — even Meta stood up and said it's easier than ever to port thanks to LLMs. But if you've ever worked in an enterprise that's moving from one really old system to a newer one, it's a big ordeal regardless of how technically complicated it is, because you have to pick up old workflows and move them to a new system. And all of a sudden you're realizing you should refactor this workflow anyway, because it was sucky and it was baked around how the old software or UI worked. I say all this to tell people you should never assume everyone can just flip — to ARM, or from GPUs to something else. When people port, there's often a business process built around the old system, and you have to make sure that business process doesn't go down. These migrations can take years and lots of people. So I agree with your point: not everything's going to go to ARM overnight, people will continue to buy x86, and reality is complicated and messy.
And even if you're in an IT department and you hear that something's specs are really great compared to something else — even if it's a fit — you still have to weigh the opportunity cost, and frankly the personal incentives: do I want to take on the headache of migrating this old thing to that new thing, even if it's easier than ever? I know our listeners probably know this, but I do talk to people who haven't worked in industry much, and to them it feels like you just compare specs and whichever specs are better wins. It's never that simple.
SPEAKER_01: Yes. So when I said you can put x86 and ARM together, I don't mean they're fungible, in the sense of running whatever on whatever. But some workloads can be deployed to x86 and some to ARM. They don't have to be the same workloads; they can coexist as hardware in the data center and route differently from a software perspective. I just wanted to clarify that — it's a good point you brought up, because it's not that they're interchangeable, but they can coexist for different workloads.
SPEAKER_00: Totally. We'll probably continue to see more orchestration across XPUs and GPUs, multi-vendor, and across multi-vendor CPUs — x86, ARM. Totally.
SPEAKER_01: Yeah, all of it is going to happen. Okay, so that was cool news, but Intel had some more interesting partnerships. Do you want to hear about them? Yes, please. So Intel and SambaNova have been working together for quite a while, but they announced this new inference architecture, which is heterogeneous — heterogeneous? I don't know. Somebody needs to tell us how to say that word.
SPEAKER_00: I hear other people say it as "hetero-." Yeah, I know, I need to Google it.
SPEAKER_01: Yeah, we need to come up with a standard way of saying this word on this podcast. True. Basically, they're putting Xeon CPUs with SambaNova RDUs — reconfigurable dataflow units — which do the decode. Think of it like Vera CPUs with Groq LPUs: a unique architecture to do decode very quickly. SambaNova RDUs are the equivalent of LPUs in a functional sense, not a one-to-one sense; architecturally they're very different. But functionally they're capable of faster decode, because it's a unique architecture built not just with SRAM — Groq is all SRAM, while RDUs have SRAM, HBM, even DRAM. It's built with all of them. And the key secret sauce underlying SambaNova here is that they predetermine the data path well beforehand. They're like: I'm going to get this information from HBM, this information from SRAM, and this other little piece from DRAM, and we'll configure that path ahead of time so all the data arrives at the right moment in a deterministic fashion, because we know the latencies of every level of the memory hierarchy. So they can reconfigure the data path for a given task early. It's a very powerful concept, and it lets you use fewer chips to do decode — like 256 chips doing what Groq LPUs need thousands of chips to do, because the LPUs just don't have the memory; the SRAM capacity simply isn't there, right? So this is a fantastic example of the mix-and-match architectures coming out now for particular applications. SambaNova RDUs with Xeon 6 is a very interesting approach. What do you think of it?
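The "predetermined data path" idea — known per-tier latencies let you issue fetches on a fixed schedule so every operand lands exactly when compute needs it — can be illustrated with a toy scheduler. The latency numbers, operand names, and the whole `schedule` function are invented for illustration; SambaNova's actual compiler is proprietary and far more sophisticated.

```python
# Conceptual sketch of deterministic fetch scheduling across memory
# tiers. All latencies (in cycles) and operand names are made up.
TIER_LATENCY = {"SRAM": 1, "HBM": 10, "DRAM": 40}

def schedule(fetches: list, deadline: int) -> list:
    """Return (issue_cycle, tier, operand) triples such that every
    fetch completes exactly at `deadline`, given known tier latencies."""
    plan = [(deadline - TIER_LATENCY[tier], tier, op) for tier, op in fetches]
    # A negative issue cycle would mean the deadline is tighter than
    # the slowest tier allows.
    assert all(cycle >= 0 for cycle, _, _ in plan), "deadline too tight"
    return sorted(plan)

plan = schedule([("SRAM", "kv_cache"),
                 ("HBM", "layer_weights"),
                 ("DRAM", "cold_expert")], deadline=50)
for cycle, tier, op in plan:
    print(f"cycle {cycle:2d}: issue fetch of {op} from {tier}")
```

Because latencies are known constants rather than cache-hit probabilities, the slowest tier's fetch simply issues earliest — the opposite of reactive caching, where a miss stalls the pipeline unpredictably.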
SPEAKER_00: Well, I think lots of things.
SPEAKER_01: Good. Let's get it.
SPEAKER_00: I think the deterministic memory management is very interesting. Normally, with caching and different memory tiers, you have cache misses and things like that, so it gets non-deterministic. The idea that this is deterministic is very interesting, and it feels like it should matter a lot for LLMs, because it's all about data movement — weights, KV cache, that kind of thing. And you make the interesting point about Groq: great chip, great compute, deterministic — but because it's SRAM-only, the memory is physically bound to the compute; they're coupled. The only way to scale the memory is to also scale the compute, with lots and lots of chips. So when other people come along and make different memory decisions, they can scale memory with fewer chips, and maybe need one rack instead of ten. SambaNova is yet another example of people putting pressure on Groq's initial configuration, which was designed before LLMs. I'm confident their future LPUs, whatever they're titled on their roadmap, will make different memory-hierarchy decisions. But as we see yet another example of someone saying yes, we should have multi-vendor silicon, it raises all sorts of interesting questions about software orchestration across it. What's Intel doing? NVIDIA has Dynamo. Is everyone going to settle on one orchestration layer? Because the promise of multi-vendor is exciting: you can get the right chip for the right part of the workload, depending on whether it's memory-bound or compute-bound and on the memory decisions each vendor made. And now everyone's saying, yes, let's disaggregate — which is good.
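The memory-compute coupling argument reduces to simple division: when memory capacity per chip is tiny, the chip count is set by memory, not compute. The figures below are illustrative assumptions (roughly Groq-class on-die SRAM of ~230 MB versus a hypothetical mixed-tier chip with tens of GB attached), just to show the shape of the gap.

```python
# Rough illustration of why SRAM-only designs need so many chips.
# All capacities are illustrative assumptions, not vendor specs.
import math

model_memory_gb = 1_000      # weights + KV cache for a big deployment, hypothetical
sram_only_gb = 0.23          # ~230 MB of on-die SRAM per chip
mixed_tier_gb = 64           # per-chip capacity with HBM/DRAM attached, assumed

chips_sram = math.ceil(model_memory_gb / sram_only_gb)
chips_mixed = math.ceil(model_memory_gb / mixed_tier_gb)
print(f"SRAM-only: {chips_sram} chips; mixed-tier: {chips_mixed} chips")
```

Thousands of chips versus a handful — whatever the exact numbers, this is the gap behind the "256 chips versus thousands" claim and the one-rack-versus-ten framing.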
But how are they doing that software layer? And finally, it raises the question: now that everyone can slot in, fine — but at the end of the day, I don't think there are going to be ten winners. There will only be a few combinations: this GPU here, this chip there. There aren't going to be a hundred. So it's exciting to see Intel and SambaNova doing this, but I'm kind of like: who is it going to be — Groq, Cerebras, SambaNova, Etched, MatX, Taalas? How's this going to shake out?
SPEAKER_01: Yeah, I don't know. Maybe there are going to be many solutions, because there's price-to-performance, the kinds of workloads, the deals a company can make, IP boundaries people can't cross. There are so many factors that maybe push toward multiple solutions ultimately. We don't know. We'll see.
SPEAKER_00There'll be multiple solutions, but the cost of manufacturing a chip is so high today that you have to have a certain volume to break even. So there's only so many customers, five or six or something, right? So there's gonna be only so many combinations, and only certain people are gonna move enough volume that they can afford to make the next chip, you know?
SPEAKER_01Yeah, I can see that. Maybe what doesn't cut it in the server would make it to consumer AI eventually, because we're gonna have those desktop inference machines that we all want and love so much, because we don't want to pay API costs.
SPEAKER_00Yeah, yeah. Yep. And if your IP didn't make it in the big leagues, go play in the minors: use the same IP, have a cheaper SKU, maybe your margins aren't as high. I can see it with something like the Taalas chip, which is like a hard-coded LLM.
SPEAKER_01I know, it does like one particular thing, but I don't know. Maybe you can embed it in a children's toy in the future and it talks back to you in an intelligent fashion. That's a good enough application. You don't need frontier-intelligence LLMs for that kind of stuff. Maybe some rules about what it can and cannot say, but other than that, it should work, right? So I'm seeing things like this.
SPEAKER_00There you go. I'm gonna, yeah, I'm like, I'm only buying my children wooden toys from here on out. Yeah. Because this thing's got an LLM in it and who knows.
SPEAKER_01Yeah. No, one thing I wanted to ask, when you're saying deterministic about SambaNova: I honestly don't know the answer to this. I don't know how much more deterministic SambaNova is compared to Groq's VLIW architecture, the very long instruction word architecture. Because I know VLIW is really deterministic, down to the last clock cycle. I don't know how deterministic SambaNova is.
SPEAKER_00Yeah, okay, fair enough, because I don't know either. It sounded like it's deterministic in the memory management across the different types of memory, but we'll have to just talk to them sometime and learn. Let them tell us more.
SPEAKER_01Yes, I don't know all the details for them. But you know what I see as an advantage? Whatever it is, VLIW and Groq is not an expandable approach. It's not simple to make a Groq chip with DRAM or HBM alongside it, I would imagine. Or maybe the compiler team is so amazing that they can write a compiler to do anything, because they did write the compiler to make the original Groq chip. Maybe they can do more, I don't know. But it doesn't appear so: from what I understand of VLIW and the deterministic, per-clock-cycle, compiler-based approach, it's very difficult to just expand and do these kinds of things. And that fundamentally limits the Groq and LPU approach. Which is why, I don't know, this is controversial, but I never thought that this whole thing was some galaxy-brain move by Nvidia. It's a great chip, and I'm sure Groq's LPUs have more potential in the future. But their relative inelasticity is their own Achilles heel. That's what I've always felt. But maybe somebody can correct us and let us know how we're wrong, because I'm definitely no CPU expert. From just what I'm thinking about it, I'm like: if it's so difficult to write a compiler that you can't expand to HBM or DRAM, but you can with SambaNova, that sounds like a positive.
SPEAKER_00Yeah, it sounds like, if it's true, something that SambaNova should lean into as far as narrative, differentiation, and product positioning.
SPEAKER_01Yeah. Totally.
SPEAKER_00Totally. Well, we covered a lot of ground. We talked Intel, and we talked a lot about your OpenClaw, which I think is really interesting. It's fun to continue to listen to your struggles and your experiments, and I can't wait to see if it has good dreams a week from now, if it's more intelligent than it was. But I think we should wrap it here. So that's it, everyone. Thanks for listening. If you're enjoying Semi Doped, share it with your friends, subscribe to our newsletters, and thanks for all your comments on YouTube.