TSMC Arizona outage saw fab halt, Apple wafers scrapped

(culpium.com)

188 points | by speckx 10 hours ago

11 comments

Animats 8 hours ago
This was apparently a Linde installation custom-built for TSMC in Arizona.[1] Nitrogen, oxygen, and argon are extracted from air on-site and purified. That's Linde's primary business: liquefying and distilling air. This isn't some little local company or a company operating outside its area of expertise.
Those gases are storable, so it's surprising there wasn't enough tank capacity to deal with outages.
The site plan [2] shows "Gas Plant 1", and future "Gas Plant 2" and "Gas Plant 3". The gas plants are across a small road from the fab and feed it directly. Once Gas Plants 2 and 3 are built, there will be redundancy, but at this stage there isn't a backup. The plan doesn't show a large tank farm, so they apparently can't store gases in bulk.
[1] https://www.aztechcouncil.org/utility-company-makes-progress...
[2] https://semiwiki.com/forum/threads/tsmc-phoenix-arizona-fab-...
- sevensor 6 hours ago
  Unless things have changed a lot since I fled semiconductor manufacturing, you would still need silane tanks at least. I’m as surprised as you that they don’t have buffer tanks.
  - cosmic_quanta 3 hours ago
    Why did you use the term "fled"? Is there an interesting story?
    From the outside looking in, I would love to work in semiconductor manufacturing.
- NoiseBert69 8 hours ago
  Linde is huge. They can produce and offer all gases in all available purity classes.
  - astrange 6 hours ago
    Is there some reason you'd want gases in a lower purity class?
    (Well, it's cheaper.)
    - sitharus 12 minutes ago
      The only reason is cost. If you don’t need the purity then save the money.
    - rpmisms 6 hours ago
      Fire suppression, welding gases, etc.
  - jack_tripper 8 hours ago
    Is this an ad?
    - RealityVoid 7 hours ago
      I don't think Linde™ needs an ad; everyone knows Linde™ is the most reliable partner in producing and storing gases in all available purity classes.
      (Joke aside, it's probably not an ad; they were just excited to explain why you see Linde on all sorts of gas tanks all over the place. It's actually quite common, and once you notice it you see it everywhere.)
      - wiredpancake 5 hours ago
        I've never seen Linde in my life (Australia).
        What is funny, though, is that at least in the Australia and UK regions they still use the BOC brand, which is a subsidiary of Linde.
        - mcbain 21 minutes ago
          Linde was definitely around as a distinct brand in Australia before they bought BOC in 2006, but since then, as you said, they just trade as BOC.
        - nandomrumber 2 hours ago
          Competitors in the space in Australia include Air Liquide and Supagas.
          Supagas tends to have better prices for smaller operators and hobbyists.
    - nebula8804 7 hours ago
      Possibly, but it's also plausible that it's a deep joke that everyone is in on.
      When you google the company, the marketing slogan that comes up is "Linde is Everywhere", and that works on so many levels. They sell air, air is everywhere, therefore Linde is everywhere.
      They are a company that sells air: that stuff that people breathe. Forget this AI nonsense. Jensen has to constantly pull something out of his rear to keep food on the table. These guys sell air. What a business. :)
- ErroneousBosh 6 hours ago
  > Those gases are storable, so it's surprising there wasn't enough tank capacity to deal with outages.
  It probably depends on the duration of the outage. I'd expect they have some storage, and if they plan on having the compressor plant down for longer than that can manage, they'll bring in tanks.
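  A minimal sketch of the trade-off described above, assuming a constant draw rate and made-up quantities (the thread gives no real storage or consumption figures): on-site storage covers an outage only if the outage is shorter than the stored volume divided by the draw rate. The function names are hypothetical.

```python
# Hypothetical helpers: quantities are illustrative and must share units
# (e.g. stored volume in m^3 and draw in m^3 per hour).


def ride_through_hours(stored: float, draw_per_hour: float) -> float:
    """Hours the on-site buffer lasts at a constant draw rate."""
    if draw_per_hour <= 0:
        raise ValueError("draw_per_hour must be positive")
    return stored / draw_per_hour


def needs_trucked_supply(outage_hours: float, stored: float, draw_per_hour: float) -> bool:
    """True if a planned outage outlasts the buffer, so extra tanks must be brought in."""
    return outage_hours > ride_through_hours(stored, draw_per_hour)


if __name__ == "__main__":
    # Made-up example: 1,200 units stored, 100 units/hour drawn -> 12 hours of cover.
    print(ride_through_hours(1200, 100))        # 12.0
    print(needs_trucked_supply(24, 1200, 100))  # True
    print(needs_trucked_supply(6, 1200, 100))   # False
```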
angelgonzales 9 hours ago
This isn’t very big news. Issues often occur during bring-up. Linde’s processes are possibly so power-intensive that failing over to generator power is not possible. TSMC is right to put Linde on notice, since Linde should have a PFMEA and control plan to eliminate any root causes of downtime. I suspect that in the long term TSMC plans to insource this if the issue persists. Scrap happens sometimes in manufacturing. If the writer only has journalism experience and no manufacturing experience, they may not have a conceptual understanding of acceptable first-pass yield. After all, the TSMC logo features failing parts!
- FaradayRotation 8 hours ago
  In many ways I agree with you, but the problem statement (constrained/exhausted gas supply from the vendor) makes it seem like this was not just a line down; the whole factory stopped for a few hours. A line down is a miserable migraine but still manageable... while a whole factory stoppage makes a lobotomy seem like a good idea. It also sounds like there was not enough forewarning to park critical customer wafers at a "safe" stage of the process.
  Even so, I would still call this another Monday at a semiconductor factory. Welcome! Here we play a nearly endless game of whack-a-mole. Here's your mallet and your towel. Now whack enough of the moles hard enough until they stop coming back (at least through the same holes). Beware the alpha moles.
  By any road, I am surprised to see even this high-level perspective on a quality event disclosed to the mainstream public; I thought this was not standard practice. I enjoyed the read.
- throwaway2037 5 hours ago
  > This isn’t very big news.
  The opening paragraph feels a bit pearl-clutching to me.
  > the company had to scrap thousands of wafers that were in production for clients at the site which include Apple, Nvidia, and AMD.
  Eh. So what? I am sure they scrap thousands of wafers for all kinds of other reasons. It would be better to know the cost per hour of a total plant shutdown. (Of course, I'm sure the author doesn't have this information.)
  > After all, the TSMC logo features failing parts!
  Final hat tip here. I never knew that.
- nutjob2 3 hours ago
  > After all, the TSMC logo features failing parts!
  I'm not sure about that, I think the blank spaces are just parts that have been picked. The dies have been cut and the good ones are being removed.
  - prezk 2 hours ago
    Normally a wafer would have die-sized spaces for test structures used for optical, electrical, chemical and other tests. Think of the TV test card https://en.wikipedia.org/wiki/Test_card
bob1029 9 hours ago
> forcing the facility to shut down for at least a few hours
> As a result, the company had to scrap thousands of wafers
Anything involving wet chemistry, photoresist, furnaces, etc. is very time-constrained. You can't let wafers sit around indefinitely. Certain process steps must be followed up very quickly to avoid scrap.
This is why you don't see redundant power for manufacturing lines. A 3nm line needs hundreds of megawatts to operate. You can't clear queued lots without a fully functional line, so there's not much you could save by keeping part of the line operational.
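These limits are commonly tracked as queue-time (Q-time) constraints between consecutive steps. A minimal sketch of why a few hours of downtime turns into scrap, using hypothetical lots and limits: any lot whose window closes before the line restarts is lost. `Lot` and `lots_to_scrap` are illustrative names, not any fab system's API.

```python
from dataclasses import dataclass


@dataclass
class Lot:
    lot_id: str
    ready_at_h: float   # hour the lot finished its previous, time-critical step
    max_queue_h: float  # hypothetical limit before the next step must start


def lots_to_scrap(lots: list[Lot], restart_at_h: float) -> list[str]:
    """Return IDs of lots whose queue-time window closes before the line restarts.

    Simplification: assumes every lot can be run the moment the line is back,
    so only the restart time matters.
    """
    return [lot.lot_id for lot in lots if lot.ready_at_h + lot.max_queue_h < restart_at_h]


if __name__ == "__main__":
    # Hypothetical WIP; the line goes down and only restarts at hour 4.
    wip = [
        Lot("A", ready_at_h=0.0, max_queue_h=2.0),  # window closes at 2.0 -> scrap
        Lot("B", ready_at_h=1.0, max_queue_h=8.0),  # window closes at 9.0 -> survives
        Lot("C", ready_at_h=3.0, max_queue_h=0.5),  # window closes at 3.5 -> scrap
    ]
    print(lots_to_scrap(wip, restart_at_h=4.0))  # ['A', 'C']
```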
- tantalor 9 hours ago
  Good idea for a Factorio mod?
  A new failure mode resets output progress back to zero if you lose power or some other input while crafting.
  You could design circuit networks to cut power to non-essential systems so the rest of the factory can keep producing.
  - tetha 9 hours ago
    Some mods in modded Minecraft had that, and it's a very punishing mechanic unless implemented well.
    It eats all of your power, and usually very expensive items too, very quickly. Assume you generate around 600 RF/tick, which is common with certain generator setups. One tick: 1024 RF and one input consumed, and crafting fails due to insufficient power. Wait a tick. Next tick: 1024 RF and one input consumed, and so on. This can void 10+ items per second, which can hurt very badly, even for common items.
    It also tends to kick you while you're down, because it only kicks in when everything else is already failing. Then the only thing still functioning is the thing voiding your energy and your expensive items. Or, even worse, you make one miscalculation about your power grid and all of your resources are gone, often before you can react.
    It can be interesting in the right packs, but it is GregTech-level hard.
    - bombcar 8 hours ago
      GT:NH has “easy mode” enabled in some regards - it won’t finish the craft but it WILL wait for power (actually keep trying) - so if you fix the power problems you can finish and not lose the mats.
      May or may not apply to multiblocks.
      - mjevans 3 hours ago
        I didn't finish GT:NH but if I ever set aside enough time to play a GregTech build again, GTNH is on the very short list.
      - DanHulton 2 hours ago
        Multiblocks power-fail and void, but then your machine shuts down until you restart it. This is much better than the behavior suggested above, where you'd void over and over, but it can still utterly mess up a large craft being orchestrated through AE2, which then waits forever for the failed craft to submit a part back into the system.
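        The three failure behaviors described in this subthread can be compared with a small, hypothetical simulation. It uses the 600 RF/tick and 1024 RF-per-attempt figures from the comment upthread and Minecraft's nominal 20 ticks per second; the policies are simplified assumptions, not code from GregTech, GT:NH, or any other mod.

```python
from enum import Enum

TICKS_PER_SECOND = 20  # Minecraft's nominal game tick rate


class Policy(Enum):
    VOID_AND_RETRY = 1  # lose the input on every failed attempt (the punishing mechanic)
    WAIT_FOR_POWER = 2  # keep the input and stall until power returns (GT:NH single blocks)
    VOID_AND_STOP = 3   # lose the input once, then shut down until restarted (multiblocks)


def items_voided(policy: Policy, gen_per_tick: int = 600,
                 cost_per_attempt: int = 1024, ticks: int = TICKS_PER_SECOND) -> int:
    """Count input items destroyed on an underpowered grid.

    Hypothetical model: an attempt starts as soon as the machine's buffer holds
    `cost_per_attempt` RF, draws that energy plus one input, and always fails
    because the grid cannot sustain it.
    """
    if gen_per_tick >= cost_per_attempt:
        raise ValueError("model assumes the grid cannot sustain a craft")
    if policy is Policy.WAIT_FOR_POWER:
        return 0  # the input stays in the machine; progress just pauses
    buffer = 0
    voided = 0
    for _ in range(ticks):
        buffer += gen_per_tick
        if buffer >= cost_per_attempt:
            buffer -= cost_per_attempt  # energy burned on a doomed attempt
            voided += 1                 # ...and the input item goes with it
            if policy is Policy.VOID_AND_STOP:
                break                   # machine halts until a player restarts it
    return voided


if __name__ == "__main__":
    for p in Policy:
        print(p.name, items_voided(p))
    # VOID_AND_RETRY 11, WAIT_FOR_POWER 0, VOID_AND_STOP 1
```

        Under these assumptions the repeat-voiding policy destroys 11 items per second, consistent with the "10+ items / second" estimate, while waiting for power loses nothing.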
    - hofrogs 9 hours ago
      GregTech doesn't use RF though, at least it didn't. Machines pull packets of amps through the wires from the generators/batteries, the whole system is pretty interesting. Also high-level circuits have to be manufactured in cleanrooms with a pretty complex tech chain.
      - tetha 8 hours ago
        Oh, GT's power is absolutely not RF. Back in the day, even GT's power could be cruel, though. You could over-volt your machines and thus void machines you spent literal days crafting. And the cables in the process too. And you could lose your entire infrastructure once it rained and you had no roof :)
      - squigz 8 hours ago
        I think GP just used RF as an example and was only referring to GT as a comparison in difficulty.
        GT's system of only pulling power on-demand is very nice though; no wasting fuel
    - squigz 9 hours ago
      Out of curiosity, which mods are this cruel? I've been playing GT (modern) lately and even it doesn't void your machine's items unless you break the machine itself.
      - tetha 8 hours ago
        Oh this was in the days of yore of modded 1.4 and early 1.7. I don't remember specific mods, I just remember the pain and frustration of this happening.
        I'm currently playing Stoneblock 4 and have been playing GT:NH and Nomifactory some time ago, and the more modern mods have learned a lot from those old janky things. Heck, back in the day every mod had a different power system and you needed a nonsense amount of conversion infrastructure, unless the modpack did a lot of work to combine all of this somehow, haha.
  - rtkwe 8 hours ago
    Power brownouts are pretty rare outside of the very early game. It's too easy and cheap to massively overproduce power for that to really harm players past that point, so I don't think there'd be much interest. Usually brownouts rapidly develop into full-blown blackouts and black starts, as your miners reduce output during the brownout, often reducing incoming fuel and leading to even less power being generated, in a self-reinforcing spiral.
    - tantalor 8 hours ago
      I would apply it to inputs as well.
      Suppose you can start production with only 1 of each input required for a recipe, but to keep it going you need to keep feeding all of the inputs to finish it. If any of them run out, then the recipe fails, you lose the inputs, and the machine stalls.
      This works better for high latency recipes (>10s) with lots of inputs, like low density structure, modules, and atomic bombs.
      - rtkwe 3 hours ago
        Usually the answer is just to slightly overproduce the inputs; only the new planet Gleba, with its freshness mechanic, even slightly discourages letting items sit on the conveyors. What's the benefit?
        - marcosdumay 1 hour ago
          You would want to use circuits and stop production if there aren't enough inputs for a recipe.
          It still looks kinda easy; the machines just do it automatically in the default game.
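          A minimal sketch of the input-starvation rule proposed above, plus the circuit-style guard suggested in reply. This is a hypothetical model, not Factorio's actual behavior or modding API; `Recipe`, `run_craft`, item names, and counts are made up.

```python
from dataclasses import dataclass


@dataclass
class Recipe:
    needs: dict[str, int]  # total units of each input fed over one craft


def run_craft(recipe: Recipe, stock: dict[str, int], guard: bool) -> tuple[bool, dict[str, int]]:
    """Attempt one craft under the proposed rule; mutates `stock`.

    Returns (completed, lost), where `lost` maps each input to the units
    destroyed if the craft was starved partway through.
    """
    if any(stock.get(item, 0) < 1 for item in recipe.needs):
        return False, {}  # cannot even start without one of each input
    if guard and any(stock.get(item, 0) < n for item, n in recipe.needs.items()):
        return False, {}  # circuit condition: wait until a full craft is buffered
    fed = {item: 0 for item in recipe.needs}
    # Inputs are pulled one unit at a time, round-robin, as the craft progresses.
    for step in range(max(recipe.needs.values())):
        for item, total in recipe.needs.items():
            if step >= total:
                continue
            if stock.get(item, 0) == 0:
                return False, fed  # starved: everything fed so far is lost
            stock[item] -= 1
            fed[item] += 1
    return True, {}


if __name__ == "__main__":
    recipe = Recipe({"steel": 2, "copper": 20, "plastic": 5})  # made-up counts
    stock = {"steel": 2, "copper": 20, "plastic": 3}            # plastic runs short
    print(run_craft(recipe, dict(stock), guard=True))   # (False, {}): held idle, nothing lost
    print(run_craft(recipe, dict(stock), guard=False))  # (False, {'steel': 2, 'copper': 4, 'plastic': 3})
```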
  - sidewndr46 9 hours ago
    Isn't that what spoilage is?
    - helpfulclippy 8 hours ago
      I thought of spoilage as a mechanic that punishes overproduction.
      - blmarket 7 hours ago
        It's a constraint to process items within a limited time (regardless of overproduction or power outages), which matches the problem description.
        Surely the reality is much more complex (like... yield/quality dropping as a function of time?)
  - dylan604 8 hours ago
    Mindustry has something similar with pumping various gases/liquids through plumbing. If you accidentally mix them while building new lines, things stop working when your gases get mixed up, forcing you to purge the line.
  - ActorNightly 9 hours ago
    Someone needs to make the whole chip manufacturing process into a factorio like game and let the gamers optimize it, then build the factories around that.
    - gnatman 7 hours ago
      Like Ender’s Game but instead of intergalactic shooting war it’s international chip war.
- j_walter 8 hours ago
  TSMC has backup generators in their AZ fab. You actually have to have backup power or a few hundred millisecond blip could cause days or weeks of tool down time. You should see what happens when you lose the ability to keep a clean room at temp/humidity/airflow...it's weeks or months.
- sevensor 8 hours ago
  It didn’t happen, but the facilities team at the fab where I worked was seriously considering installing a flywheel to cover power bumps. What I don’t get about this story is how this actually happened. All our process gases were out in a tank farm and we knew how much pressure we had. We would have stopped the line if there wasn’t enough to proceed. Were they separating air onsite or something?
  - jaggederest 7 hours ago
    I was very impressed that the modest little fab I worked at had thousands of lead-acid batteries for momentary takeover, and 8 five-megawatt locomotive engines for longer-term redundancy. Apparently their steady-state usage was 25MW, which still left room for a hot spare with two of the locomotive generator units down at the same time.
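    The redundancy arithmetic in that comment checks out. A quick sketch using only the figures given there, and assuming every unit delivers its full rating: carrying 25 MW takes five 5 MW units, which leaves three of the eight free, i.e. one hot spare plus two units down at once. `spare_units` is a hypothetical helper.

```python
import math


def spare_units(units: int, unit_mw: float, load_mw: float) -> int:
    """Generator units left over after enough run to carry the steady-state load."""
    running = math.ceil(load_mw / unit_mw)  # units that must be online
    if running > units:
        raise ValueError("not enough installed capacity for the load")
    return units - running


if __name__ == "__main__":
    # Figures from the comment: 8 units x 5 MW = 40 MW installed, 25 MW steady state.
    print(spare_units(units=8, unit_mw=5, load_mw=25))  # 3: one hot spare + two down at once
```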
  - bobmcnamara 8 hours ago
    Yes, Linde has an onsite plant and is building two more.
    For some processes, stopping will botch the wafer. In the event of a gas shortage, do plants plan which lines to take down first, and which lines should complete a process step?
    - sevensor 7 hours ago
      The way this worked at the fab where I was, was that facilities would have paged everybody, and whoever needed to hold wafers would do so. You could mark your equipment down or unavailable for a particular step. I don’t know what we would have done if it was “hey, we lost dry nitrogen a minute ago.” I think at that point you lose a lot of wafers in wet cleans.
      In the case of a power interruption at the fab, consequences were highly dependent on the equipment and the unit process. A prolonged power interruption to diffusion was the worst case scenario. You’d have 150 wafers in the furnace, and any significant deviation from the nominal temperature profile meant they were all scrap. Worse, if the furnace cooled off, you had to scrap the quartz boat the wafers rode in, too. Other processes had a smaller blast radius but were even more of a headache to disposition. Implant, you’d lose beam and probably lose vacuum too. Then the wafer in the chamber would be dusted and in an indeterminate state, and the rest of the wafers you’d have to sleuth out whether they were implanted or not. Sometimes you’d have a lot sitting in the end station and it wouldn’t be clear whether or not it had been run at all. At least in photolithography you could tell whether or not a wafer was patterned by looking at it.
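      A minimal sketch of the all-or-nothing furnace disposition described above. The 150-wafer load comes from the comment; the deviation tolerance and the "cooled off" threshold are hypothetical placeholders, since real limits are recipe-specific, and both temperature traces are assumed to be sampled at the same instants.

```python
from dataclasses import dataclass


@dataclass
class Disposition:
    wafers_scrapped: int
    scrap_boat: bool


def disposition_furnace_run(nominal_c: list[float], logged_c: list[float],
                            wafers: int = 150, tol_c: float = 2.0,
                            cool_drop_c: float = 100.0) -> Disposition:
    """All-or-nothing disposition of a diffusion batch after a power interruption.

    tol_c and cool_drop_c are hypothetical placeholders, not real recipe limits.
    """
    if not nominal_c or len(nominal_c) != len(logged_c):
        raise ValueError("traces must be non-empty and sampled at the same instants")
    worst_dev = max(abs(nom - got) for nom, got in zip(nominal_c, logged_c))
    worst_drop = max(nom - got for nom, got in zip(nominal_c, logged_c))
    # Any significant deviation scraps the whole load, not just some wafers.
    scrapped = wafers if worst_dev > tol_c else 0
    # If the tube cooled off well below the profile, the quartz boat goes too.
    return Disposition(scrapped, worst_drop > cool_drop_c)


if __name__ == "__main__":
    nominal = [1000.0] * 5
    print(disposition_furnace_run(nominal, [1000.0, 950.0, 800.0, 700.0, 650.0]))
    # Disposition(wafers_scrapped=150, scrap_boat=True)
```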
- Kye 8 hours ago
  A video showing those steps, for the curious: https://www.youtube.com/watch?v=dX9CGRZwD-w
  It's probably not 100% identical to TSMC's process.
agentifysh 9 hours ago
Seems like what is often downplayed or left unsaid in American media is the cultural mismatch between TSMC's Taiwanese engineers and their American counterparts.
So it always comes as a bit of a surprise to those out of the loop, but from what I've read from individual Taiwanese workers and their feedback, it's clear that there is significant regret on one side.
And it doesn't seem to be limited to just TSMC: another large company recently received an icy reception for its large investment in American manufacturing.
I think this is a big reason why a lot of these jobs simply wouldn't stay in America, as consumers would not be able to absorb the costs added by a "cultural premium" faster than innovation can reduce them.
- itake 9 hours ago
  Perhaps if the US workers earned OT like TW workers do, the "culture" gap would shrink.
  - j_walter 8 hours ago
    TW workers get a majority of their compensation in bonuses, so the OT portion is quite small and many do not even bother to claim it. The overall compensation gap between a TW and a US engineer at TSMC is also significant. Not to mention the lowest-paid hourly workers...in TW they make 2-3X minimum wage, but in the US it's more like 1.25X.
    - itake 6 hours ago
      That is not what I heard from my cousin at TSMC. The OT gives the workers a “living wage”. Most of his coworkers charge OT every week of the year.
      He admitted that, even with their OT and bonuses, he probably makes more than them on his W-2 salary.
      But my point still remains: if they want US (or TW) folks to work more hours, they need to pay for those hours.
      - j_walter 4 hours ago
        https://www.reddit.com/r/Semiconductors/comments/18x5vr5/sem...
        This Reddit post captures what I've seen at TSMC in Taiwan. $120K is normal pay at the director level...engineers make $2500-5000 a month. TSMC AZ starting pay for a new college grad w/ a BS is probably just under $100K/year in salary alone, with the potential to make over $120K within a few years with fully vested bonuses.
        - itake 1 hour ago
          I think your numbers match what my cousin shared. In both my conversations with my cousin and in the Reddit post, it is unclear whether the reported salaries are take-home pay or whether they exclude OT and bonuses, but I don't get your point?
          My point is: engineers in Taiwan work more hours because they are paid to work more hours (OT). Engineers in the USA are paid the same whether they work 35 hours or 60 hours.
          If TSMC wants to address the culture gap (get the Americans to work more), TSMC should pay up.
  - limagnolia 8 hours ago
    Is OT "overtime"? How is it legal not to pay overtime in any US factory? Unless they are salaried (exempt)?
    [-]
    - itake 8 hours ago
      Yeah, overtime. My cousin is an engineer at TSMC (he has worked both in Tainan and now in Arizona) and is W-2 exempt.
    - TimorousBestie 8 hours ago
      A lot of the workers there probably are exempt under American law.
      I’m not an expert on Taiwanese labor laws but their list of exempt labor categories in the LSA is much shorter than the one in the American FLSA.
  - lazide 8 hours ago
    Hahaha. The work culture between TW and the US is night and day - and it isn’t flattering for the US.
    - bnjms 7 hours ago
      How so? Are the Americans relatively lazy or just unwilling to put in tedious but necessary extended hours?
      - lazide 5 hours ago
        The entire approach is different. Especially with Taiwanese engineers, their entire focus is whatever work they are doing. Everything else (quite literally), their wives handle.
        Americans typically ask for things like work-life balance, non-abusive working hours, etc. They also no longer have the type of family setup that allows them to actually focus so much: they get pulled into child care duties, taking care of family members, planning their next vacation, etc.
        The general attitude is also more ‘yeah whatever’ to some extent.
        The amount of singular obsessive engineering you get out of one vs the other is hard to compare.
        - agentifysh 3 hours ago
          Hmm, this is interesting. I was always under the impression that Taiwanese wives were more progressive and men had to do a lot more of the lifting compared with other regional cultures in East Asia.
          My original thinking, after reading some of the anecdotes from TSMC engineers, was that they were obsessively dedicated, which means hours that look extreme by North American standards.
          It's the same at places like Samsung, where the company treats employees very well with perks and long career stability, but it's not free; it always requires huge sacrifice, similar to Japanese conglomerates, I'd imagine.
          I'm not sure which is better. In America it's definitely a transactional relationship, which comes with less stability than these East Asian giants offer, but their stability comes at the cost of not being able to switch if and when you find yourself at odds with them.
          Not sure what it was like at Nokia, another conglomerate that ultimately folded under competition, in a country whose labor/life protections are more stringent than what you would find enforced in East Asia.
          Getting a bit distracted here, but I'm noting how much culture plays a role in these large companies and their management styles.
- coredog64 8 hours ago
  There are like half a dozen semiconductor manufacturers in Phoenix that were here before TSMC arrived. There's a robust pipeline from ASU to these same manufacturers. Can we please just stop with the nonsensical notion that "Americans don't know how to fabricate semiconductors"?
  - barkingcat 7 hours ago
    American economics doesn't allow fabrication of semiconductors even if there is the know-how.
    Think about how Intel, who pioneered the know-how, can't build cutting-edge nodes at the levels they need to make it profitable.
    IBM had to sell their fabs to cater to the whims of "shareholders".
    It's the greed of stockholders that you need to blame.
    - astrange 6 hours ago
      TSMC is a publicly traded company just like the others. I'm not familiar with their governance but Google tells me the largest owner (a state development fund) has 6%.
      They have a special advantage because they don't compete with their customers, which leads to trust, which leads to customers paying for their R&D for them.
      Intel on the other hand just kind of sucks at their job. Skill issue basically. (But they aren't /that/ far behind.)
  - itake 8 hours ago
    It's not that the USA can't produce semiconductors. It's that semiconductor production at TSMC's scale (in terms of number of units, yield rates, and depth) currently requires highly skilled workers to work a lot of hours to "babysit" the wafer production.
    Maybe there is a world where TSMC can hire enough skilled workers and optimize processes enough to let people go home at 5pm, but that is not currently the case.
    - fy20 4 hours ago
      We won't be seeing a TSMC plant in France anytime soon then.
    - lysace 7 hours ago
      Yes. This. So, yeah, essentially fundamentally incompatible with the US economy.
      The US is going to have to heavily subsidize the payroll of tens of thousands of very accomplished EEs/etc to make this work. By doing that they will also wreck the HW part of SV.
      - astrange 6 hours ago
        There isn't really a HW part of SV. Hardware engineers aren't paid well enough to live there in droves like programmers. There are some of course, but the ones I know are in San Diego or Bremerton or Israel.
        Also, it's completely normal to run a factory 24/7. I think people are just impressed because TSMC is the only one they've read about?
        (However, it's correct that a TSMC fab is the most advanced and complicated process on the planet.)
        - lysace 6 hours ago
          Nvidia, AMD, Intel and Applied Materials probably employ like 100k people in SV?
      - jwagenet 4 hours ago
        SV already wrecked HW engineering by paying far more for SW than market rate HW such that anyone with financial ambition made the switch long ago.
- lysace 7 hours ago
  Spell it out. WTF.
perihelions 8 hours ago
Some HN threads about the past incidents mentioned in OP,
https://news.ycombinator.com/item?id=17686310 ("Computer Virus Cripples Several Taiwan Semiconductor Plants (bloomberg.com)"—2018, 100 comments)
https://news.ycombinator.com/item?id=19214952 ("TSMC's Photoresist Material Incident: $550M Loss (anandtech.com)"—2019, 15 comments)
behnamoh 9 hours ago
Does it further delay the launch of M5 Max/Ultra?
taurath 10 hours ago
In September
- joecool1029 9 hours ago
  But it's only being reported now; before this, it was only speculation based on the financial reports. How quickly do they normally report disruptions like this?
  I wouldn't think it would have to be too quick, since I've heard about fab disruptions from fires and such since the early 2000s. Probably just sometime after quarterly reporting, to set the record straight? Why not in the report?
  - samus 9 hours ago
    I also had the impression from the report that shareholders were miffed about this Q3 snag, so they had to publish this even though they would otherwise have treated it as business as usual.
caycep 8 hours ago
the rebels have hit the Tibanna gas supply, I see
richisferezs 8 hours ago
Guys, is it me, or did Sonnet 4.5 just become like 10x worse?