Huawei's Tau Scaling Law: Is the "EUV Killer" Real?

Semi Doped

The business and technology of semiconductors. Alpha for engineers and investors alike.

May 29, 2026 • Vikram Sekar and Austin Lyons

Huawei dropped a paper claiming 1.4nm-class performance without EUV, and the internet immediately declared ASML dead and US export controls useless. Austin and Vik recorded one day after Memorial Day to unpack what Huawei actually announced at ISCAS 2026 — and why the "EUV killer" headline gets the story backwards.

They walk through the tau scaling law (tau is delay, and the idea is to attack it at the system level instead of the transistor), logic folding via hybrid bonding, the Kirin 2026 that doubles transistor count without shrinking, and who can actually manufacture stacked logic. Then the other tau knobs: a unified memory bus and near-packaged optics. Along the way: BESI vs EV Group, die-to-wafer vs wafer-to-wafer bonding, and why hybrid bonding isn't export-controlled the way EUV is.

The takeaway is the opposite of the headline. Tau scaling is rational engineering under constraint, it's bullish for ASML (two DUV wafers per product, not fewer), and the moment EUV-enabled fabs stack their own advanced-node wafers, the gap widens instead of narrowing. Bullish advanced packaging, bullish EDA and multiphysics.

Chapters:
0:00 The "EUV killer" paper that broke the internet
2:28 What Huawei actually announced at ISCAS
4:00 Tau scaling: optimize delay, not transistors
8:58 The equation and the 10x AI claim
11:05 Logic folding: stacking logic on logic
17:24 Who builds it, and can hybrid bonding be banned?
24:16 Why this is bullish for ASML
29:49 The other tau knobs: memory and optics
35:18 Takeaways: packaging, EDA, multiphysics

Follow Semi Doped:
Get more of Austin and Vik daily, free!
Sign up: https://www.semidoped.com/

Follow Chipstrat:
Newsletter: https://www.chipstrat.com
X: https://x.com/chipstrat

Follow Vik:
Newsletter: https://www.viksnewsletter.com
X: https://x.com/vikramskr

SPEAKER_00 0:00

The fact that we are stacking logic on logic has two implications. The first one is that it is entirely centered around hybrid bonding, because that is the secret sauce that allows them to increase transistor density.

SPEAKER_01 0:17

Welcome to another Semi Doped episode. I'm Austin Lyons with Chipstrat, and with me is Vik Sekar from Vik's Newsletter. Hey Vik, what's up?

SPEAKER_00 0:26

Yeah, how's it going?

SPEAKER_01 0:28

It's going well.

SPEAKER_00 0:29

Yeah, it's a nice time to chat. We usually record on a Thursday, but today we're doing Tuesday. And it's perfect, because we were actually going to talk about something else, but then this paper dropped from Huawei that said something to the effect of: we're bypassing EUV, and we're going to be at the equivalent of TSMC's 14 angstrom node by 2031. And everything online, all at once, is hitting me in the face saying Huawei has circumvented EUV, ASML is dead in the water, and US dominance in leading-edge nodes is gone. At least that's the sentiment I get. And considering we recorded the deep dive on lithography last week, I texted you and said, okay, can we just talk about this early this week? Because it's so fresh in my mind, we should just do this.

SPEAKER_01 1:29

Yes, yes. So this is perfect. Let's talk about it. In fact, it was Memorial Day in the United States yesterday, so I was hardly even online, but the internet was blowing up. So I'm glad that you texted me. I'm glad that you read it. I look forward to learning from you through this conversation. And I will say, I wonder how the United States government, anyone in those circles, like the Commerce Department, is feeling. Because, of course, EUV lithography is the big geopolitical chip on the table. I bet some of them have to be freaking out if they saw on X that, oh yeah, what you used for controls is now rendered useless. They also have to be asking, wait a minute, is this real or is this marketing? So that's the goal here: to unpack what Huawei actually announced, and whether the marketing speak and the interpretation differ from the technical paper and the technical innovations.

SPEAKER_00 2:28

Yeah, so we wrote this down in the Semi Doped newsletter. This is where our first reactions go. So if you're not signed up to that as a listener, you should be; it's free on Substack. Just go to semidoped.com and sign up, because whatever news comes our way, we try to quickly write up the first thought that comes to mind. Usually it's the most natural response. After that, I got a few questions saying, hey, what do you think of this piece of news? And I responded and said, yeah, it's really interesting that Huawei has this approach of trying to improve the performance of chips overall without going to the fancy machines, which they can't get hold of because they're all under export control. So, without further ado, I think we should first explain what the whole claim was. There is this conference called ISCAS 2026 that I believe is being held in Shanghai this week. And He Tingbo, who's a Huawei director and the head of HiSilicon, which is the chip arm of Huawei, I think gave a talk. I did not hear the talk. Do you know if He Tingbo is a he or a she? Because I might totally get this wrong.

SPEAKER_01 3:53

Um, I saw a picture of a woman, so I think whoever gave the talk was a woman. I'm pretty sure.

SPEAKER_00 4:00

Okay, okay, I'm glad I asked. But anyway, the point is that there is this new guiding principle called the tau scaling law. And the spiel, at least, is that it's going to replace Moore's law, because over time Moore's law has stopped scaling, and we've been squeaking along ever since we hit the EUV nodes below 7 nanometers. So the whole idea is, okay, let's look at something else. And this is where I actually like the framing: the tau scaling law gets at something much more fundamental, which I truly appreciate. First of all, I need to explain why this tau thing. Tau is a measure of delay. It could be delay on the chip, delay through the interconnect, delay between racks, or delay between entire data centers. The whole idea of going to smaller transistors was essentially to minimize this delay. The smaller you made a transistor, the shorter the delay from its input to its output, and that meant it went faster. So for the longest time, the way to make things go faster was to reduce the transistor delay. Huawei's interpretation is to stop thinking about the dimensions of the transistor, which they really can't scale without EUV machines, and instead go one level further down and ask: what was the transistor solving anyway? And the answer was delay. So the next logical question is, okay, we can't improve the transistor delay anymore because we don't have the machines, so where else can we improve the delay? Because it's not like delay comes only from the chip, the GPU or the CPU or whatever it is. Delay is everywhere. Delay is in software, in interconnects, in how you hook up memory, in what memory protocols and handshakes you use. So they said, okay, let's reinvent everything and think of this from a whole-system perspective.
So this is what they call their tau scaling law. We will now scale down tau at the system level, not just at the transistor level as has been done historically, but over the entire system. Whether it's a phone to begin with or an entire AI data center, we are now scaling down delay.

SPEAKER_01 6:30

Okay, yeah, that makes sense. So, summarizing it, they're basically saying, hey, Moore's law is dead. And by the way, even if it's not dead, we can't shrink transistors anyway because we can't get our hands on EUV tools. So if we want to continue to increase performance and we can't reduce the geometric footprint of each transistor, how can we increase performance? And so they zoomed out and said, well, wait a minute, maybe the geometric scaling of transistors was ultimately about reducing delay. So they're trying to reorient around tau, this time delay, the resistance-capacitance product, and say, okay, fine, we have one knob that we can't turn, but what are all the other knobs that we could turn to continue to reduce delay? And I do like the point of not just delay at the transistor level, but extreme co-optimization, or STCO and DTCO, which we could talk about. I saw you had a tweet about this, which is just asking how we can look up and down the whole stack, from transistors and devices to circuits to systems to racks to interconnects to the whole data center to the software on top of it, and co-optimize across all of those.

SPEAKER_00 7:48

Yeah, so whenever they say something is a law, it's always nice to see some equation. And I read the whole paper; it's an easy read for a paper, actually. They had this nice equation which says the tau of the system is basically the delay through the transistor, the delay through the circuit, the delay through the chip, and the delay through the system, right? And what they want at every subsequent generation is that the tau of the next generation is the tau of this generation divided by some factor alpha, where they think that alpha factor is something like 1.3x a year for mobile, 1.5x for auto, and maybe even 10x for AI workloads. So think about that: by optimizing across the system, they think they can reduce the delay by 10x for AI workloads. That is a significant improvement, and that is why they feel they can get 1.4-nanometer-class performance by tweaking other parts of the system, not just the transistor.
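The relation described here can be put into a few lines of arithmetic. This is a minimal sketch, not the paper's formulation: it assumes the four delay terms simply add (the conversation only lists them), it treats alpha as a divisor applied once per generation, and the starting delays are made-up placeholders. The function names `system_tau` and `tau_after` are hypothetical.

```python
def system_tau(transistor, circuit, chip, system):
    """Total delay, ASSUMING the four contributions simply add."""
    return transistor + circuit + chip + system


def tau_after(tau_now, alpha, generations=1):
    """Apply tau_next = tau_now / alpha once per generation."""
    return tau_now / alpha ** generations


# Alpha values quoted in the episode; the starting delays are placeholders.
ALPHAS = {"mobile": 1.3, "auto": 1.5, "ai": 10.0}
tau0 = system_tau(transistor=1.0, circuit=2.0, chip=3.0, system=4.0)

for workload, alpha in ALPHAS.items():
    # Delay after 0, 1, 2 and 3 generations for each workload class.
    trend = [round(tau_after(tau0, alpha, g), 3) for g in range(4)]
    print(workload, trend)
```

The point of the sketch is only that a fixed alpha compounds: whichever term of the sum is attacked, the target is the total.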

SPEAKER_01 8:58

Okay, so this law, like most of these laws, is not an actual physical law but an observation. So in their paper, were they showing chips that they created and measured these constants on, and that's where they're seeing the scaling? Or where did the data behind this empirical observation come from?

SPEAKER_00 9:20

It's not anywhere, I think. They do have some silicon, so let's get to that in a bit. But their optimizations were interesting because, from what I could tell, they happen across three dimensions. The first dimension was that they just want more transistor density, right? That's the whole thing that EUV does. EUV lets you pack in more transistors per unit area of the chip by making transistors smaller. So they stepped back and asked, okay, we can't make transistors smaller, so how do we scale up the number of transistors in a chip? And they decided, okay, fine, we'll just take two chips, stack them one on top of the other, and hybrid bond them. Hybrid bonding is a very interesting packaging technique because you can have millions of connections between these two logic chips. The way it works is that you heat them and apply pressure, and they literally stick to each other. In the simplest terms, that's what hybrid bonding is. So you can have very, very fine connections that are closely spaced. The pitch between connections is something like 1.5 microns across a massive area of a chip. Think about that, right? So hybrid bonding is a very fine-pitch packaging technology, probably the most advanced packaging technology you can get. So that first dimension was, okay, let's just stack two chips together, and in the space of one chip we now get two chips. Hooray, you know? That's one way to get transistor density. Yeah, that's kind of cheating, because now you also use two times the silicon area, since you've got to sandwich two wafers together. But considering the cost of EUV, which we discussed in the last podcast episode, maybe it's not a big deal. Just saying.
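To see why a 1.5 micron pitch translates into "millions of connections", here is a rough back-of-the-envelope sketch. It assumes a square grid with every site populated, and the 100 mm^2 die area is a hypothetical figure chosen for illustration, not a number from the paper.

```python
def bonds_per_mm2(pitch_um):
    """Connections per square millimetre on a fully populated square grid."""
    per_mm = 1000.0 / pitch_um  # bond sites along one millimetre
    return per_mm ** 2


def total_bonds(pitch_um, die_area_mm2):
    """Total connections across a die of the given area."""
    return bonds_per_mm2(pitch_um) * die_area_mm2


PITCH_UM = 1.5         # pitch quoted in the episode
DIE_AREA_MM2 = 100.0   # hypothetical die area, for illustration only

print(f"{bonds_per_mm2(PITCH_UM):,.0f} bonds per mm^2")
print(f"{total_bonds(PITCH_UM, DIE_AREA_MM2):,.0f} bonds on the example die")
```

Under those assumptions the density is in the hundreds of thousands per square millimetre, so even a modest die lands in the tens of millions of connections.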

SPEAKER_01 11:04

Yeah, yeah, yeah, totally. Okay, so you're saying they've got a chip, and they can't increase the transistor density because they're at their fundamental limits with DUV and multi-patterning and whatnot. And historically, by the way, how the industry has been, quote unquote, getting around Moore's law is with systems of chips, you know, whether you're using chiplets or whatever. So it's like, okay, let's use 2.5D integration and put chips next to each other, and then we'll have CoWoS, you know, interposers connecting things. But you're saying that Huawei said, oh no, wait a minute. Instead of putting those transistors far apart and increasing delay because now we've got to route between them, what if we try to decrease the delay by stacking them in three dimensions, to increase the density, if you will, in a unit volume, really?

SPEAKER_00 11:57

Yeah, they call it logic folding, okay, which is a nice name, but if you think about it, it's really no different, I feel, from what Intel Foveros is, or how AMD stacked SRAM with V-Cache. I guess it's not technically logic-to-logic stacking when you're talking about AMD's V-Cache, because they put SRAM on top of a logic wafer to boost the L3 cache. But in principle, yeah, they hybrid bonded an SRAM wafer onto a logic chip, and that's kind of what this is all about. Logic-to-logic stacking is not easy, right? How are you going to get heat out of this thing? That's one example. Thermals are quite challenging. So there are a lot of challenges to doing this. And you've got to align it. Think about it: the connections are at a 1.5 micron pitch, which is ridiculously tiny, so the alignment between the bonds needs to be perfect. And hybrid bonding itself is a crazy packaging technology, because the surfaces that you're bonding need to be very flat and defect-free. When you squeeze them together, if there's a dust particle between the two, you know what happens: now you've got an open connection between the two sandwiched chips, and that's bad. So yeah, it's a challenging process, and the whole question is, does China actually have the equipment to do this? Funnily enough, they do, for two reasons. One, they have this expertise because they have been doing memory stacking for NAND at YMTC, using wafer-to-wafer stacking and hybrid bonding of chips. That's how NAND chips work. They have even done hundreds of layers of NAND. So they are familiar with it, but memory is a bit of an easier problem, because memory has so much redundancy that you can route around defects and have failovers in the memory architecture.
So stacking is a different problem in memory than it is for logic. Stacking two GPUs on top of each other is a significantly harder problem than trying to stack 400 layers of NAND memory, you know?

SPEAKER_01 14:18

Totally, totally. So, okay, you're saying stacking things is not a new concept historically. Even stacking logic is not necessarily a new concept, but normally when we're stacking, let's say, logic on an interposer, that interposer is passive, so it's not as big of a deal. You're just routing through it. And even if you're stacking memory on logic, well, of course, NAND and HBM, a lot of these things are already three-dimensional. So we're already used to figuring out how to create things in three dimensions and stack them. But it is a different beast when we stack logic on top of logic, because they're both active, they're both powered, they're both giving off heat, and you have to make sure all the connections are correct. And the way these things are built today, there's not this built-in redundancy where, oh, if something fails, you just route around it. But conceptually, logic on logic is not a new thing to the industry that Huawei has invented.

SPEAKER_00 15:24

No, it's not, and it's been around in principle. That's what makes it interesting. It's a challenging problem, and it's impressive that they do have silicon; it's called the Kirin 2026. This is a mobile SoC, and they actually have this implemented, and they have plans to keep going in the future. So they've already, you know, I don't know how much the paper has all these numbers. Oh, yeah, here, I have them, I think. Yeah, they've managed to double their transistor count, obviously, by stacking. And so they kind of jump nodes in principle because, you know, we spoke about this: there is no such thing as five nanometers or two nanometers anymore, because the transistor architectures have changed. It's just nomenclature now, anyway. So you can get to the same class of node by doing other things; gate-all-around was one of those other things you could do to get to two nanometers. So Huawei's approach is, yeah, we'll just stack transistors and we'll get the same transistor density. In principle, this is like, you know, CFET, maybe, the complementary FET, where people asked, why should I put an NMOS and a PMOS transistor next to each other? If CMOS, complementary MOS, requires P-type and N-type transistors, what if I put them on top of each other? You can save space. So this is along the same lines of inspiration, not exactly the same thing. But basically, why not put a whole wafer of transistors on top and stack them up like that? And you can jump generations forward. They've actually done it, and this is very impressive engineering. All kudos to them. I'm not going to take away from their engineering achievement here. So that is the whole logic stacking aspect of it.
There are two more things that I think are very useful, but I want to get to that after you ask me this question. Yes.

SPEAKER_01 17:25

Thank you for graciously letting me interrupt. So, on the logic stacking, one of the things this reminds me of, of course, is DeepSeek, in that they could not get enough compute and enough memory bandwidth with the chips that they were given. Allegedly, okay, maybe they did find their way to some H100s, but allegedly they had these stripped-down chips, which forced the DeepSeek team to innovate in other dimensions because they were constrained on one. And so they got to be the first to think deeply about other things, like, you know, how do we overlap communication and compute, and do other little tricks so that we still unlock the right performance. So what I'm thinking about is, okay, if Huawei is constrained to not use EUV, and therefore they're thinking about how to reduce delay in other parts of the system, one result is that it's forcing them to go to logic-to-logic stacking maybe sooner than the rest of the industry feels it has to. My question is, who is manufacturing it? And is this giving them an advantage in just getting more practice manufacturing logic on logic? Will they be able to run ahead a little bit because they are forced to build this for their customers sooner than, say, a TSMC?

SPEAKER_00 18:48

Yes. This is a good question, and I'm glad you asked it now, before I went on to talk about something else. Because the fact that we are stacking logic on logic has two implications. The first one is that it is entirely centered around hybrid bonding, because that is the secret sauce that allows them to increase transistor density. You can increase the density two times. Can you stack it three times? I don't know. Can you stack it four times? I don't know. So what is the limit on hybrid bonding here? How many layers? That's something I don't know, but remember, it gets extremely difficult, because if you are stacking logic on logic, I think you would want to do die-to-wafer stacking, not wafer-to-wafer stacking, because otherwise you will get wrecked on yield. As it is, these logic chips are kind of big, and yield is such an important thing, because you don't get as much as people imagine, especially at the very cutting edge. But maybe a 7 nanometer node is okay. So that is one aspect of it: it's entirely based on hybrid bonding. And the question is, are they capable of it? And the answer to that is, at this point, memory stacking is all wafer-to-wafer, but logic needs die-to-wafer, and that is kind of new even to BESI and the companies that specialize in hybrid bonding; even their product releases are very recent. Okay, so die-to-wafer bonding for logic chips is very cutting edge. And luckily, from what I was looking at, there is no export restriction on hybrid bonding machines. You have extremely tight limits on what you can do with EUV, but not so much on hybrid bonding machines. So that's one thing: they do have the machines that they can do this with. And if you look at BESI's business, 35% of it is actually China-based, if you look at the last quarter.
So a large fraction of their machines actually do go to China. So I'm pretty sure they've stocked up on some BESI machines before they let this cat out of the bag. Because if you already have a piece of silicon that's stacked up and working, as in the Kirin mobile SoCs, you can believe that they've been working on this for years. Okay, this is not an overnight achievement.
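The yield argument for die-to-wafer over wafer-to-wafer bonding can be sketched with a toy model. Everything numeric here is hypothetical: the per-die yield, the number of die sites, and the simplifications that testing is perfect and that defects on the two wafers are independent. It is an illustration of the reasoning, not data from Huawei, BESI, or any fab.

```python
def good_stacks_wafer_to_wafer(sites, die_yield, bond_yield=1.0):
    """Wafers are bonded blind, so a stack works only if both dies are good."""
    return sites * die_yield * die_yield * bond_yield


def good_stacks_die_to_wafer(sites, die_yield, bond_yield=1.0):
    """Tested-good top dies are placed only on tested-good base sites."""
    good_pairs = sites * die_yield  # limited by the good dies on each wafer
    return good_pairs * bond_yield


SITES = 100       # hypothetical die sites per wafer
DIE_YIELD = 0.7   # hypothetical per-die yield, not a real figure

w2w = good_stacks_wafer_to_wafer(SITES, DIE_YIELD)
d2w = good_stacks_die_to_wafer(SITES, DIE_YIELD)
print(f"wafer-to-wafer: about {w2w:.0f} good stacks")
print(f"die-to-wafer:   about {d2w:.0f} good stacks")
```

In this model the wafer-to-wafer result scales with the square of the die yield, which is why large logic dies with modest yield are penalized more than highly redundant memory.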

SPEAKER_01 21:30

Sure, sure. So you're saying SMIC is the manufacturer here, and they can't get EUV machines, but they can get hybrid bonding machines, and hybrid bonding is the secret sauce here to logic stacking. So, do you think that someone's gonna try to go say, now you can't buy hybrid bonding equipment anymore?

SPEAKER_00 21:51

So Huawei never said it's SMIC, by the way. Everybody assumes that, okay, because it's a reasonable assumption, and their stock went up and all that, which is cool. But I don't think it's SMIC who will do the stacking part of it either, because there are packaging specialists in China whose names we shouldn't get into here. I'm writing an article on Substack on this, so all those details will be there. But there is another company that will do the hybrid bonding, because there is a whole learning curve to hybrid bonding. There are companies who have patents just on hybrid bonding, and it is not easy to do. So that's a separate skill set, and China is working on that as well. So that's the one important thing: it's all hybrid bonding based. And the next question is, okay, does China have any local hybrid bonding machines? The answer is yes, they do, but I don't think they are at the same level of sophistication there as they are in, you know, wafer-to-wafer bonding, for example. So that is something where they still rely on BESI. And to answer your question, yes, they can impose restrictions on it for China, I presume. But it really comes down to whether they can get by with wafer-to-wafer bonding. There is a company called EV Group, an Austria-based company, and they are not in this axis of export restrictions that BESI is in, because BESI is a Netherlands-based company, and they're in the same boat as ASML. These countries have a lot of export restrictions. But if they can do wafer-to-wafer bonding, who knows? Maybe they're not subject to export restrictions, because Austria is not part of this thing.

SPEAKER_01 23:52

Yeah, well, okay, I did not expect that this conversation would get so much into hybrid bonding, so that's cool. One, I need to go learn more about the hybrid bonding market and who all the competitors are. And two, yes, these poor European countries, they're like, we invented something, we're awesome at wafer-to-wafer hybrid bonding, and then, you know, they're going to get caught up in the crossfire of geopolitics.

SPEAKER_00 24:16

You know, the one thing I anticipate is, oh wait, if China can't do EUV, is ASML affected by this news of tau scaling? No, because first of all, they were never buying EUV machines from ASML. They can't, okay? So there was never a business to begin with. Secondly, I will argue that this is actually good for ASML. Because remember, now they have to make two wafers using deep ultraviolet, you know, DUV, for every product. So they need more DUV machines, which is a positive for, I would say, ASML, right? There you go. I like it.

SPEAKER_01 24:54

That's a positive spin.

SPEAKER_00 24:56

Yeah, that's a positive spin on it. Uh so that's the whole thing about uh you know how this whole logic thing works.

SPEAKER_01 25:03

Okay, so then really quick on logic-to-logic, maybe one last thing. We're talking about companies that build the die and other companies that package them. It's kind of like the front end and the back end. Of course, Intel Foundry can do both: they can make the wafers, and they can also do the advanced packaging. And you mentioned Intel Foveros Direct. So can you say, like, 10 more seconds on Foveros, and whether this is a direction that Intel Foundry could support with logic-on-logic stacking and packaging?

SPEAKER_00 25:36

Yeah, I don't see why not. That is essentially what Intel Foveros is, as far as I understand it. And this is the other thing I wanted to mention; it comes to me now that you asked the question. Basically, there is no reason that any US fab like Intel, or anybody else like TSMC, shouldn't start stacking wafers now. Because if you stack a 7 nanometer wafer and get tau scaling to work, imagine what will happen when you stack a 5 nanometer, 3 nanometer, or 2 nanometer wafer. You're going to leapfrog past what China can do with tau scaling. So they may be able to catch up, but what this will do now is drive hybrid bonding at the advanced EUV nodes, because why not? It's not an overnight thing; it's a very complicated thing to do. But think of it long term. If you can stack 2 nanometer wafers, that is an amazing amount of compute in a small area. We may get to CFET before that, so maybe we don't need it. We may do hybrid bonding of gate-all-around FETs before CFET shows up, I don't know. Or we may hybrid bond CFET wafers together. The ultimate density move.

SPEAKER_01 27:07

Totally. We should do it all, right? The front-end folks should keep working on transistor innovations, and the packaging folks should keep improving stacking and hybrid bonding, and then slam it all together. But I do agree with your point, which is, okay, let's say I can do a billion transistors in this little area, and now I can stack them, so I get two billion. And then you say, oh, well, I'm on an advanced node, and I can do 1.5 billion transistors in the same area, and now I stack it, and I have three billion transistors, right? So it compounds totally.
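The compounding in this example is simple enough to check directly. The sketch below uses the transistor counts from the example in the conversation and assumes a plain two-layer stack that doubles the count; the function name `stacked_transistors` is made up for illustration.

```python
def stacked_transistors(per_die, layers=2):
    """Transistor count in the same footprint after stacking identical dies."""
    return per_die * layers


MATURE_NODE = 1.0e9     # transistors per die in the example
ADVANCED_NODE = 1.5e9   # transistors per die on the denser node in the example

gap_unstacked = ADVANCED_NODE - MATURE_NODE
gap_stacked = stacked_transistors(ADVANCED_NODE) - stacked_transistors(MATURE_NODE)

print(f"gap before stacking: {gap_unstacked:.2e} transistors")
print(f"gap after stacking:  {gap_stacked:.2e} transistors")
```

The ratio between the two nodes stays at 1.5x, but the absolute gap doubles once both sides stack, which is the sense in which the gap widens.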

SPEAKER_00 27:35

Yeah. So the whole tau scaling thing is not an "I will replace EUV and leapfrog you without the right tools" technology. It is a temporary measure where, yes, you can bump up the performance of silicon with this technique. But if the people who are EUV-enabled start doing the same thing, and they will, because that's how the industry works, they're not going to sit around and do nothing about it, then the gap widens. It doesn't narrow; it widens. That's a good thing.

SPEAKER_01 28:13

Yes, and maybe to make the point yet a third time: if, after Huawei's big tau scaling announcement, someone came to their silicon manufacturer, whoever that is, and said, would you like EUV as well? I'm sure Huawei would say, sounds great, let's have that and tau scaling.

SPEAKER_00 28:31

Yes, exactly. That's what you would do, right? That's the logical thing to do. So that's the whole aspect of logic folding, and that is mostly what the discussion online is about. But their paper actually talks about a few other dimensions that are at least worth mentioning. One is what they call the unified bus for memory. They say, look, if you have all these different memory standards talking to each other, and you have to have all these handshakes and converters and gearboxes, all of which adds latency, then you're wasting cycles. You're wasting tau. Don't waste tau. What we will do is have a universal language, which everything in the rack or the system or the data center, or, I don't know, the earth, if you could, will speak, so that there are no translations happening. That is one way to scale down the delay across the whole thing and speed stuff up. So, a very good one. I mean, I love it, right? It's a very good thing to do anyway, regardless of whether you have EUV or not.
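A toy latency budget shows what removing translations buys. This is only a sketch of the idea: the per-link and per-conversion latencies are invented placeholder numbers, the function `path_latency_ns` is hypothetical, and nothing here describes how Huawei's unified bus actually works.

```python
def path_latency_ns(link_ns, conversions, conversion_ns):
    """Sum of link latencies plus a fixed cost per protocol conversion."""
    return sum(link_ns) + conversions * conversion_ns


LINKS_NS = [20.0, 35.0, 20.0]   # placeholder per-link latencies
CONVERSION_NS = 50.0            # placeholder cost of one handshake/gearbox step

mixed = path_latency_ns(LINKS_NS, conversions=2, conversion_ns=CONVERSION_NS)
unified = path_latency_ns(LINKS_NS, conversions=0, conversion_ns=CONVERSION_NS)

print(f"mixed protocols: {mixed:.0f} ns")
print(f"single protocol: {unified:.0f} ns")
```

With these made-up values the conversions dominate the path, which is the situation the "don't waste tau" argument is aimed at; with cheap conversions the benefit would be correspondingly smaller.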

SPEAKER_01 29:49

Totally, totally. So, I mean, isn't that ultimately what we were trying to do with things like RDMA, asking how we make it so that GPUs can communicate with other GPUs and share their memory directly without so many handshakes? Was the industry already on this path? And does this just speed it up and say, hey, there's more latency to get rid of?

SPEAKER_00 30:10

Exactly. That's that's the whole point. The industry has been there already. Again, this is not a new concept, you know?

SPEAKER_01 30:17

Just a different prioritization.

SPEAKER_00 30:20

Yeah, exactly. When Jensen talks about extreme co-design, I mean, what do you think he's thinking about? This is what he's saying, right? Don't just think about one thing, like memory, in isolation and work with your own standard, so that when you try to plug it into a system it has to talk a different language, and now everybody's asking, oh, can you convert this language to that language? Don't do that. Let's look at the whole thing as one picture and then optimize everything at the system level. So this is the STCO argument, or you can call it extreme co-design, whatever fancy word you want to use. Tau scaling seems like the fanciest word I've heard yet.

SPEAKER_01 30:59

Yeah, very good. Their marketing folks did great. So, yes, you may have heard the term STCO, system technology co-optimization. Jensen took it up a notch with extreme co-design, and now Huawei is trying to one-up that with tau scaling, which, I mean, does sound pretty sweet.

SPEAKER_00 31:16

It is pretty sweet. It is pretty sweet.

SPEAKER_01 31:18

Yeah.

SPEAKER_00 31:19

Now, one other thing they want to save tau on is networking. They are like, why don't I just do near-packaged optics and eliminate DSPs from the entire system if possible? DSPs are terrible for latency; they are tau killers. DSP is the tau killer, like fear is the mind-killer in Dune, if you've ever read the books. Because you have to wait for all the bits to arrive, and then you have to wait for the parity bits to come, and then the DSP looks at it and goes, oh, are these bits correct? Oh, they're not correct, cool, then I have to correct the error, and it does all this computation. You need a leading-edge node to do this DSP stuff. It sucks power, it adds latency, and they want to do away with it. They're like, just get rid of DSPs, don't use all this pluggable stuff. Get as close as you can to CPO, which is maybe near-packaged optics. Maybe they'll try to package the optical engine right on top of this stack. Stack it up. Let's go. Logic, folded logic, whatever. Yeah, I don't know. But anyway, at least in the near term, they can put the optical engine as close as possible to the actual compute silicon. That's one way to reduce cycles. So they're looking at all of this stuff. Obviously, you can do software optimizations, all that kind of stuff. So at the system level, they want to squeeze out as much performance as possible, which is the only logical thing to do when you don't have access to leading-edge silicon. What do you do? You do everything else. Exactly. That's what tau scaling is.
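
The "wait for all the bits, then the parity bits" step described above can be put into rough numbers. This is a sketch under stated assumptions, not something taken from the paper: the codeword size is that of an RS(544,514) Reed-Solomon code with 10-bit symbols, a common Ethernet FEC, chosen here only as a familiar example, and the 100 Gb/s line rate and the function name `codeword_wait_ns` are illustrative.

```python
# Assumption: an RS(544,514) codeword with 10-bit symbols (544 * 10 = 5440 bits).
# Assumption: a round 100 Gb/s line rate, chosen only to keep the arithmetic simple.

def codeword_wait_ns(codeword_bits, line_rate_gbps):
    """Time for one full codeword to arrive; bits divided by Gb/s gives nanoseconds."""
    return codeword_bits / line_rate_gbps

codeword_bits = 544 * 10  # data symbols plus parity symbols, 10 bits each
wait_ns = codeword_wait_ns(codeword_bits, line_rate_gbps=100)

# The receiver cannot finish checking a block until the last parity symbol lands,
# so this wait is a floor on added delay before any decode computation starts.
print(f"bits per codeword: {codeword_bits}")
print(f"minimum wait per codeword: {wait_ns:.1f} ns")
```

Decode and correction time would come on top of this floor and depends on the implementation, so it is left out rather than guessed.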

SPEAKER_01 32:56

Okay, gotcha. Gotcha. Okay, yeah. I mean, of course, NPO and CPO make a ton of sense. Everything comes with trade-offs, though. So there's the whole question of whether the supply chain supports these things. Are they ready for it? Can they build it reliably? Do you have multiple sources? So it feels a little bit academic, in that anyone could sit down, look at a system, and ask, what are all the different ways we could wave a magic wand to reduce tau? I don't know if they talked about all the practical bits in the paper or if this was more just whiteboarding out where the bottom is.

SPEAKER_00 33:38

Very high level. It is interesting that they have some silicon to show for it, which I love. But it's also a lot of high-level, hand-wavy, equation-y stuff. The equations are not very complicated. You can read the paper, it's online, you can find it. It's not a very complex paper, it's very marketing-like, but it's a good read. Because I think it is another knob the industry hasn't entirely paid attention to. I think we are getting there, and this is one of those signs, right? We realize that we need more speed, we realize co-packaged optics is coming along, we realize that memory bottlenecks are the biggest problem. It's not compute FLOPS, it's memory and the interconnect that is really holding everything back now. And this is the next step: okay, how do we make the active silicon in CPUs, GPUs, or whatever more dense? And the insane way to do it is to start stacking them and hybrid bonding them. But you know, this whole thing is insane. So what's new? AI is insane to begin with. So what's new?

SPEAKER_01 34:46

Totally, totally. Yes. Never before, right, have we tried to co-optimize on such a grand scale and such a miniature scale at once. And so I do like looking at a different constraint to optimize around, up and down. So I guess maybe final takeaways: what does this mean? You kind of alluded to it before, but what does this mean for everyone else, for TSMC, for ASML? Other than the fact that they're not dead and EUV is not going away, are there any other takeaways?

SPEAKER_00 35:18

I don't have too many, unless something comes up that I haven't thought about. Whatever I thought about, I already said. As a broad summary, I think going to stacked chips is a positive for ASML, because you need more machines to make more chips for the same product. You need to make two times as many wafers, which is a good thing. The other thing is that you will see most of the industry now starting to optimize across the entire stack. That's already happening, nothing new about that. And then I'm guessing that we will start seeing some activity around stacking up wafers. Maybe somebody's gonna try to use Intel Foveros and try to stack GPUs. Stack GPUs, I don't know. That's just a guess. But yeah, I think once Huawei talks about this logic folding idea, more people are gonna do it. And that's a good thing. Gotta try more complicated stuff. That's how we move ahead. Totally.
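
The "two times as many wafers" point is simple arithmetic, sketched here with made-up volumes. The unit count, dies per wafer, and the helper name `wafers_needed` are all hypothetical, and yield is ignored entirely.

```python
import math

def wafers_needed(units, dies_per_wafer, layers):
    """Wafers to process when every finished product is a stack of `layers` dies."""
    wafers_per_layer = math.ceil(units / dies_per_wafer)
    return wafers_per_layer * layers

# Hypothetical volumes: one million products, 500 dies per wafer.
planar = wafers_needed(units=1_000_000, dies_per_wafer=500, layers=1)
folded = wafers_needed(units=1_000_000, dies_per_wafer=500, layers=2)

print(f"planar: {planar} wafers")
print(f"folded: {folded} wafers")
```

Every extra wafer still has to go through lithography, which is the basis for the claim that stacking means more scanner demand, not less.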

SPEAKER_01 36:21

I guess what's coming to mind for me is, obviously this is bullish advanced packaging, because it's just more complicated connections. And then also maybe bullish EDA and multiphysics. Oh my god, great point. Actually, that's a great takeaway. Yeah. Yep. So we're basically at time, so I won't get into it too much, but really quick, at a high level, where I'm thinking is: okay, now this is a three-dimensional problem that involves thermals, it involves mechanical stress, it involves electricity, it involves optics, potentially, if you're talking NPO or CPO. And so this becomes more and more of a challenge. This, of course, reminds me of the Synopsys-Ansys multiphysics engine. It's gonna be less and less that the silicon guy does this thing, and the packaging guy does that thing, and the thermals guy does that thing; all of it needs to be brought together to figure out how we stack logic on logic and remove the heat and still meet timing closure and still have reliability and so on.

SPEAKER_00 37:25

Yes. It is hard enough. EDA is a hard enough problem already where we are, with single-layer transistors and how you scale them up to make the GPUs we have today. There's a lot of advancement in the use of AI for EDA, and the whole EDA industry is actually very bullish right now because you can basically sell licenses per agent rather than per person, and it can do a lot of stuff now. And what this adds is a level of complexity that we haven't seen so far in the transistor world: you start stacking entire wafers and running complete logic across two wafers stacked, or maybe four wafers stacked in the future. That's crazy. So there are gonna be a lot of challenges there. And the paper actually does mention that this is a challenge. So yeah, that's very much a good point to bring up.

SPEAKER_01 38:23

Nice, cool. All right, well, folks, with that, we hope you liked this episode. Check out, of course, our Substacks. Also go to semidoped.com. You can sign up for our free newsletter. If you like Vik and me and our takes, we try to give takes there every single day. And sometimes I come in later than Vik, so I just get to take a take on his take, and he doesn't get to respond before I hit send. So we try to keep it lighthearted and fun too. But thanks for listening, and we'll catch you guys next time.

Austin Lyons

Host

Vikram Sekar

Host