
In storage, the move away from planar (2D) MLC and TLC NAND towards 3D TLC stacking (and the even higher bit counts that follow) has introduced disturbance effects that literally shorten the memory's life cycle. When a cell is read, the applied voltage disturbs the state of adjacent cells, which then have to be rewritten to preserve their data, so the drive's life cycle is shortened just by reading from it. They are selling us crap.
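Roughly, the controller ends up doing something like this behind your back (a toy sketch; the threshold is made up and real firmware is vendor-specific):

    # Toy model of read-disturb handling: after enough reads, a block's data has
    # to be rewritten elsewhere, consuming an erase cycle with zero host writes.
    # The threshold is made up; real firmware is vendor-specific.
    READ_DISTURB_THRESHOLD = 100_000

    class Block:
        def __init__(self):
            self.read_count = 0
            self.erase_count = 0

        def read(self):
            self.read_count += 1
            if self.read_count >= READ_DISTURB_THRESHOLD:
                self.erase_count += 1   # relocation = wear caused purely by reading
                self.read_count = 0

    blk = Block()
    for _ in range(1_000_000):
        blk.read()
    print(blk.erase_count)  # 10 erase cycles consumed without writing anything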

From the little I understand about the problem, it could be solved by using more die area to separate the lines that run through the vertical stacks, which would mean something like the area cost of a 2D design but with extra complications. I have also read papers[1] that propose adding latency to mitigate (not solve) the problem.

So now, reading this news about processors and stacking, I wonder what inconveniences end users are going to suffer with processors built with these techniques, whether in computational reliability, vulnerabilities, and so on.

I wrote vulnerabilities (pure imagination and speculation on my part; I'm imagining something like a prefetch problem at the transistor level) because if it turns out to be real in the future, I can see the manufacturer shipping a fix that randomly increases latencies or something similar, sending computing power back ten years with an "oh, we didn't expect such a thing was possible when we designed it".

And of course there is the question of computational reliability itself.

Is all of this being taken care of? If not, I leave my comment here for the courts of the future.

[1] [2021] doi.org/10.1145/3445814.3446733 (use sci-hub)

[2] [2018] doi.org/10.1145/3224432 https://people.inf.ethz.ch/omutlu/pub/3D-NAND-flash-lifetime...



>So now, reading this news about processors and stacking, I wonder what inconveniences end users are going to suffer with processors built with these techniques, whether in computational reliability, vulnerabilities, and so on.

Denser logic doesn't have the same issues as dense non-volatile storage, since logic doesn't need any persistence.

That's what the likes of Micron and Samsung are good at fixing and working around when they launch and scale their X-nm processes for a specific storage technology, and it's what makes them better than their competitors.

Intel, TSMC, GloFo, etc. can all buy the latest-gen EUV machines from ASML if they want, yet TSMC is always a node ahead on logic and Micron and Samsung win at storage, because they're good at ironing out the kinks that come from shrinking those specific designs closer and closer to the sub-nm level while the others cannot (at least not as easily).

If fabbing cutting-edge silicon were as easy as just having the latest-gen ASML machines, ASML would hoard the cutting-edge machines for itself and become vertically integrated, fabbing its own cutting-edge chips as a side hustle before everyone else.


> They are selling us crap.

You can completely rewrite a modern 4 TB 3D TLC NAND drive every day for about two years (3000 TBW). How is that crap? Who even has such needs?
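Back-of-the-envelope, taking the rated endurance at face value and ignoring write amplification:

    # 3000 TBW at one full 4 TB drive write per day, write amplification ignored.
    capacity_tb = 4
    tbw_rating = 3000

    days = tbw_rating / capacity_tb
    print(days, days / 365)   # 750 days, a bit over two years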

You are talking about some arbitrary "quality" - I want to be able to rewrite it a zillion times - which makes no sense for 99.9% of use cases.

I'd much rather have a 4 TB drive that can be rewritten 1000 times than a same-priced 256 GB one that can be rewritten a million times.


>You can completely rewrite a modern [..]

3D NAND introduces degradation when data is read from the disk. You then need to account for how many times the disk is read, the unwritten free space that will be consumed to maintain the data as the disk is read, and so on.


And? The TBW guarantees are known in advance.


The TBW shown in a disk's specifications is the estimated write limit of each cell multiplied by the number of cells. It doesn't take into account that in order to read a cell's data, the adjacent cells end up being rewritten, consuming that estimated write budget little by little.

Therefore, if you fill the disk and only read data, it will sooner or later go into protection mode or lose data because of this.

They could only guarantee the TBW if extra memory were added to cover the writes consumed by reads in the current 3D NAND design. I don't know how else to explain that this is planned obsolescence: disks that self-destruct from being read.
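To put rough numbers on it (purely illustrative; the rewrite-per-read ratio below is an assumption of mine, not a measured figure):

    # Purely illustrative: how read-induced rewrites eat into a TBW budget.
    # The rewrite-per-read ratio is an assumption, not a measured figure.
    tbw_rating_tb = 3000
    host_writes_per_day_tb = 0.1    # a mostly-read workload
    reads_per_day_tb = 50
    rewrite_per_tb_read = 0.02      # TB internally rewritten per TB read (assumed)

    internal_writes = reads_per_day_tb * rewrite_per_tb_read     # 1.0 TB/day from reads alone
    total_writes = host_writes_per_day_tb + internal_writes

    print(tbw_rating_tb / host_writes_per_day_tb / 365)   # ~82 years if only host writes counted
    print(tbw_rating_tb / total_writes / 365)             # ~7.5 years once reads are accounted for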

We stopped seeing 10-year warranties when 3D NAND was introduced, so they know very well what they are doing.



Why do you link an industrial SSD storage standard with a write test? It only shows that the cells have the corresponding write limits at the beginning.

This is my last comment; I'm sorry, but I can't spend any more time on this.

Reading data consumes writes:

https://dl.acm.org/doi/10.1145/3445814.3446733

" Figure 1a plots the average SSD lifetime consumed by the read-only workloads across 200 days on three SSDs (the detailed parameters of these SSDs can be found from SSD-A/-B/-C in Table 1). As shown in the figure, the lifetime consumed by the read (disturbance) induced writes increases significantly as the SSD density increases. In addition, increasing the read throughput (from 17MBps to 56/68MBps) can greatly accelerate the lifetime consumption. Even more problematically, as the density increases, the SSD lifetime (plotted in Figure 1b) decreases. In addition, SSD-aware write-reduction-oriented system software is no longer sufficient for high-density 3D SSDs, to reduce lifetime consumption. This is because the SSDs entered an era where one can wear out an SSD by simply reading it."

Data retention consumes writes as well:

https://ghose.cs.illinois.edu/papers/18sigmetrics_3dflash.pd...

" 3D NAND flash memory exhibits three new error sources that were not previously observed in planar NAND flash memory:

(1) layer-to-layer process variation, a new phenomenon specific to the 3D nature of the device, where the average error rate of each 3D-stacked layer in a chip is significantly different;

(2) early retention loss, a new phenomenon where the number of errors due to charge leakage increases quickly within several hours after programming; and

(3) retention interference, a new phenomenon where the rate at which charge leaks from a flash cell is dependent on the data value stored in the neighboring cell. "
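In other words, the controller has to refresh data that is merely sitting there, something like this toy model (the age threshold is an assumption; real policies are driven by measured error rates):

    # Toy model of retention management: data that has sat too long is rewritten
    # before leaked charge turns into uncorrectable errors -- wear with no host writes.
    # The age threshold is an assumption; real policies track measured error rates.
    import time

    RETENTION_LIMIT_S = 90 * 24 * 3600   # assumed refresh deadline

    class Block:
        def __init__(self):
            self.programmed_at = time.time()
            self.erase_count = 0

        def maybe_refresh(self, now=None):
            now = time.time() if now is None else now
            if now - self.programmed_at > RETENTION_LIMIT_S:
                self.erase_count += 1        # one more cycle gone, data unchanged
                self.programmed_at = now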

I'll leave it there.


TLC is in a decent spot, which is why it's still being produced.

QLC is less so, since its endurance is only ~300 cycles. There's plenty of tension in the storage industry about this, with vendors saying "don't worry, be happy" and purchasers saying "wait, what read:write ratio are you assuming, and how much dedupe?"

PLC (probably <100 cycles) is very dubious, IMO, simply because it would only be suitable for very cold data - and at that point you're competing with magnetic media (which has been scaling quite nicely).
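Rough endurance math using the cycle counts above (4 TB capacity picked just for illustration, write amplification ignored):

    # Capacity times P/E cycles, nothing fancier; write amplification ignored.
    capacity_tb = 4
    print(capacity_tb * 300)   # QLC at ~300 cycles -> ~1200 TBW
    print(capacity_tb * 100)   # PLC at <100 cycles -> under 400 TBW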


I have hard drives which I only write to once - long-term archive. I would gladly swap them for QLC/PLC storage if the price is reasonable.

There is a market for any cycle count; it just needs to be reliable and respect the spec.


That's the tape market I mentioned. Agreed, tape doesn't fit the personal market, but it totally dominates anywhere that has scale.

The question is: what counts as reliable? If PLC is good for 50 erasures, are you really comfortable with that? And it's going to cost more than half of what QLC does, I assure you...

The interesting thing about flash is that people want to use the speed, which means they put it in places that have a high content-mutation rate. If it's just personal stuff - mostly cold, little mutation - that's fine, but that's not the main market.


There is a market for high-speed read-only data - S3 serving, and all kinds of mostly-read database scenarios (OLAP). You can have tiered storage: data is first consolidated/updated on TLC drives, and as it ages it is moved to PLC storage. RocksDB already supports something like this.
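Conceptually something like this (a generic sketch of age-based tiering, not RocksDB's actual API; the age threshold is arbitrary):

    # Generic age-based tiering sketch -- not RocksDB's actual API.
    # Hot data lives on the high-endurance tier; once it goes cold it gets one
    # final write onto the cheap low-endurance tier and is only read from there.
    import time

    AGE_THRESHOLD_S = 7 * 24 * 3600   # arbitrary: demote data untouched for a week

    hot_tier, cold_tier, last_write = {}, {}, {}   # hot_tier ~ TLC, cold_tier ~ QLC/PLC

    def put(key, value):
        hot_tier[key] = value
        last_write[key] = time.time()

    def get(key):
        return hot_tier.get(key, cold_tier.get(key))

    def demote_cold_data():
        now = time.time()
        for key in [k for k, t in last_write.items() if now - t > AGE_THRESHOLD_S]:
            cold_tier[key] = hot_tier.pop(key)   # one write to the cold tier
            del last_write[key]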


2D TLC wasn't quite decent, but 3D TLC is. I think some of TLC's bad reputation comes from 2D TLC.



