
> Each bit stored in dynamic memory must be refreshed, typically every 64ms (called Static Refresh). This is a rather costly operation. To avoid one major stall every 64ms, this process is divided into 8192 smaller refresh operations.

It implies that the length of refresh operation is roughly linear in the number of bits being refreshed. Why is it impossible to parallelize this?
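For scale, the two numbers in the quote pin down how often one of those smaller refresh operations has to happen. A quick sketch of the arithmetic (the function and constant names are just illustrative):

```python
REFRESH_WINDOW_MS = 64     # every bit must be refreshed within this window
REFRESH_COMMANDS = 8192    # number of smaller refresh operations per window


def refresh_interval_us(window_ms=REFRESH_WINDOW_MS, commands=REFRESH_COMMANDS):
    """Average spacing between refresh operations, in microseconds."""
    return window_ms * 1000 / commands


print(refresh_interval_us())  # 7.8125
```

So, taking the quoted figures at face value, a refresh would be due roughly every 7.8 microseconds.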

> Typically I get ~140ns per loop, periodically the loop duration jumps to ~360ns. Sometimes I get odd readings longer than 3200ns.

What's the cause of the 3200ns+ readings?



DRAM beats SRAM at density, power, and cost by pushing the refresh circuitry out of the N×M memory cells and into the periphery of the N×M cell matrix. There are only M amplifiers per N×M matrix, so you can only refresh M cells at a time and must perform N refreshes per refresh interval to catch every row. Could you put in N×M amplifiers to refresh all the cells at once? Sure, but then we would call it SRAM :-)


This is a very good comment, but I think your joke needs to be explained for most people out there. "Sense Amplifiers" are the part of "DRAM" which can perpetually hold a charge.

The rest of DRAM are tiny capacitors (kinda like batteries) that can only hold a charge for 64-milliseconds. Furthermore, a SINGLE read will destroy the data. So the DRAM design is to transfer the information to "sense amplifiers" each read, and then to transfer the information back at the end when the "row of data is closed".

Once you understand that DRAM capacitors are so incredibly tiny, RAS, CAS, PRECHARGE, and REFRESH suddenly make a LOT of sense.

* "Row" is all of your sense amplifiers.

* RAS: Transfer "one row" from DRAM into the sense amplifiers.

* CAS: Read from the sense amplifiers

* PRECHARGE: Write the sense-amplifiers back into DRAM. Sense-amplifiers are now empty, and ready to hold a new row.

* Refresh: Sense-amplifiers read, and then write, a row to "refresh" the data, as per the 64-millisecond data-loss issue. According to Micron, all sense amplifiers must be in the idle state first (i.e. after a PRECHARGE, when they are empty and ready to hold a new row).
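The command list above can be turned into a toy model. This is only a sketch of the simplified picture in this comment, not of real device timing: `ToyBank` and its method names are made up for illustration, and a drained capacitor is represented by `None`.

```python
class ToyBank:
    """Toy DRAM bank: destructive row reads plus one row of sense amplifiers."""

    def __init__(self, rows, cols):
        self.cols = cols
        self.cells = [[0] * cols for _ in range(rows)]  # the tiny capacitors
        self.open_row = None     # index of the row currently in the sense amps
        self.sense_amps = None   # contents of the open row

    def activate(self, row):
        """RAS: move one row out of the cells and into the sense amplifiers."""
        if self.open_row is not None:
            raise RuntimeError("PRECHARGE before opening another row")
        self.sense_amps = self.cells[row]
        self.cells[row] = [None] * self.cols  # the read drained the capacitors
        self.open_row = row

    def read(self, col):
        """CAS: read one column from the sense amplifiers (non-destructive)."""
        return self.sense_amps[col]

    def write(self, col, bit):
        """Write one column of the open row in the sense amplifiers."""
        self.sense_amps[col] = bit

    def precharge(self):
        """PRECHARGE: write the open row back and empty the sense amplifiers."""
        self.cells[self.open_row] = self.sense_amps
        self.open_row = None
        self.sense_amps = None

    def refresh(self, row):
        """REFRESH: read a row into the sense amplifiers, then write it back."""
        self.activate(row)
        self.precharge()


bank = ToyBank(rows=4, cols=8)
bank.activate(2)
bank.write(3, 1)
bank.precharge()
bank.refresh(2)        # the data survives the read-then-write-back cycle
bank.activate(2)
print(bank.read(3))    # 1
```

The point of the model is that there is exactly one `sense_amps` buffer per bank, so rows have to take turns.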

> Could you put in N×M amplifiers to refresh all the cells at once? Sure, but then we would call it SRAM :-)

Indeed. Sense Amplifiers are the "part" of DRAM which act like SRAM. Sense Amplifiers do NOT lose data when they are read from. They do NOT need to be refreshed. Etc. etc. Sense Amplifiers are effectively, the "tiny" SRAM inside of DRAM arrays that makes everything work.

The very point of "DRAM" is to make most of your RAM be these cheap capacitors. So the only solution is to read and write data to the sense amplifiers, as per the protocol.


I'm not an expert in the field but I was under the impression that SRAM worked completely differently, using a bi-stable transistor circuit and no capacitor, something like that: https://upload.wikimedia.org/wikipedia/commons/a/a5/Transist...

Such a circuit is stable and doesn't need any refresh or amplification.

Was I mistaken?


No, I was using "amplifier" in a slightly more general sense to mean "a circuit that uses power to turn a weakly driven signal into a strongly driven signal." You are absolutely correct that a SRAM cell would drive a near zero signal closer to zero and that this behavior differs from a linear amplifier which would drive a near zero signal further away from zero.

Here was my conundrum: a more general term like "active circuit" risked leaving people behind while a more specific term like "buffer" or "driver" didn't highlight the analogy between SRAM and DRAM. I chose "amplifier" as a compromise, hoping that people who were familiar enough to worry about bistability would be comfortable with the generalized definition while people who were barely hanging on would miss that detail entirely but still get my point. Sounds like I caught you in the middle. Sorry for the confusion.


Technically yes, since the gate (control input) of a field effect transistor is functionally a capacitor, with the source-drain connection acting as a very crude sense amplifier. (You can actually observe this with some discrete MOSFETs, by attaching them to a breadboard in series with an LED and tapping the gate line against VCC or ground to turn them on or off.)


> Why is it impossible to parallelize this?

One of the mentioned articles touches on this: http://utaharch.blogspot.com/2013/11/a-dram-refresh-tutorial...

> Upon receiving a refresh command, the DRAM chips enter a refresh mode that has been carefully designed to perform the maximum amount of cell refresh in as little time as possible. During this time, the current carrying capabilities of the power delivery network and the charge pumps are stretched to the limit.

I guess: it's actually hard to deliver power to refresh all the bits at once? Also note that ancient CPUs like the Z80 had the memory refresh machinery built into the CPU as opposed to the memory: https://en.wikipedia.org/wiki/Memory_refresh

> What's the cause of the 3200ns+ readings?

No idea. Random noise? Timing interrupt? Some peripheral doing DMA transfer? Kernel context switch? System Management Mode?

https://en.wikipedia.org/wiki/System_Management_Mode

Feel encouraged to run the code and try to debug it!


> I guess: it's actually hard to deliver power to refresh all the bits at once?

You can't refresh all the bits at once.

You only have something like 256kB worth of sense amplifiers across 2GB of RAM (Guesstimates from my memory: but the point is that you have much much FEWER sense-amplifiers than actual RAM). You need a sense amplifier to read RAM safely.

Each time you read from DRAM, it destroys the data. Only a sense amplifier can read data safely, store it for a bit, and then write it back. Since you only have 256kB of sense amplifiers, you have to refresh the data in chunks of 256kB at a time.
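Running the guesstimated numbers above (2GB of RAM, 256kB of sense amplifiers; both are rough figures from memory, not datasheet values) gives a chunk count. A small sketch, with illustrative names:

```python
KIB = 1024

total_bytes = 2 * KIB**3       # 2GB of RAM (guesstimate from the comment)
sense_amp_bytes = 256 * KIB    # 256kB of sense amplifiers (guesstimate)


def refresh_chunks(total, chunk):
    """How many sense-amplifier-sized chunks it takes to cover all of RAM."""
    return total // chunk


print(refresh_chunks(total_bytes, sense_amp_bytes))  # 8192
```

With these particular guesses the result happens to equal the 8192 refresh operations per 64ms quoted at the top of the thread.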

Not all sense amplifiers are "gang'ed up" together: they're actually broken up into 16-banks of sense amplifiers. But for whatever reason, modern DDR4 spec seems to ask all sense amplifiers to refresh together in a single command. In any case, you can at best, get 16x the parallelism (theoretically: since the spec doesn't allow for this) by imagining a "bank-specific refresh command".

That's a lot of complexity though, and I'm not even sure if you really gain anything from it. It's probably best to just refresh all 16 banks at the same time.


> modern DDR4 spec seems to ask all sense amplifiers to refresh together in a single command

Per-bank auto-refresh could be exploited by an elaborate memory controller algorithm that always prioritizes refreshing unused or least-used banks, except it was broken by design and you couldn't control which bank to refresh. Nobody even bothered implementing per-row refresh counters to skip freshly read rows. Rowhammer is a real shitshow exposing sloppy memory engineering in the industry.


I don’t know modern RAM chip organization, but in the old days the bit cells were laid out in a square and an entire row would refresh in parallel. A 1Mb chip would then refresh 1Kb at a time, taking 1K refresh cycles for the entire chip.
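The 1Mb example works out as follows, assuming a perfectly square cell array as described (the helper name is illustrative):

```python
import math


def square_array(total_bits):
    """Return (bits per row, number of rows) for a square cell array."""
    side = math.isqrt(total_bits)
    if side * side != total_bits:
        raise ValueError("total_bits is not a perfect square")
    return side, side


bits_per_row, rows = square_array(1 << 20)   # a 1Mb chip
print(bits_per_row, rows)  # 1024 1024
```

One refresh cycle covers one row of 1024 bits, so the whole chip needs 1024 cycles.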


This is correct, and the size of a DRAM "page" (which is not the same thing as a TLB page) has scaled up as memory chips have gotten larger.


This is the correct answer.


> Why is it impossible to parallelize this?

Within a single chip, it is impossible by design... not by any physical nature.

The DDR4 spec has 16-banks (organized into 4-bank groups), which could theoretically refresh individually. But that's not how the spec was written: a Refresh command will cause all 16-banks to start refreshing at the same time.

However, it is possible to "parallelize" this rather easily: that's why you have TWO sticks of RAM per channel. While one stick of RAM is going through a Refresh cycle, the other stick of RAM is still fully available for use.

My assumption is that it is a better protocol to issue all banks to refresh at the same time. Otherwise, you'd need to send 16x the Refresh commands (one for each of the 16-banks). At that point, most of your messages would be "Refresh" instead of RAS / CAS (open row, or open-column) commands, needed to read/write data.
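To put rough numbers on the "16x the Refresh commands" estimate: the sketch below reuses the 8192 commands per 64ms figure from the article and assumes, hypothetically, that a per-bank refresh command would do for one bank what an all-bank command does for each of the 16. Neither the function nor that assumption comes from the DDR4 spec.

```python
def refresh_commands(window_ms, commands_per_window, banks, per_bank):
    """Return (commands per window, average spacing in ns) for a scheme."""
    count = commands_per_window * (banks if per_bank else 1)
    spacing_ns = window_ms * 1e6 / count
    return count, spacing_ns


print(refresh_commands(64, 8192, 16, per_bank=False))  # (8192, 7812.5)
print(refresh_commands(64, 8192, 16, per_bank=True))   # (131072, 488.28125)
```

Under that assumption the command bus would have to carry a refresh command roughly every 0.5 microseconds instead of every 7.8.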

If you really want parallelism, get multi-rank RAM or stick more RAM per channel. But even then, if the two sticks of RAM refreshed at the same time, you'd have fewer "Refresh" commands in general. So it still might make more sense for memory controllers to keep sticks of RAM all in sync with regards to Refreshes.


> It implies that the length of refresh operation is roughly linear in the number of bits being refreshed.

I think you are misreading it. The point is to avoid locking all the memory each time; instead you only lock (via refresh) a small portion of memory each time.


It absolutely is linear with the number of rows being refreshed. When you send your DRAM chip a refresh command, it's effectively being read a bunch of times. You cannot use that DRAM chip during that period.

You seem to be talking about banking, I think.


Maybe one of the reasons it's hard to parallelize is that the refresh operation requires electricity, and there's obviously a limit on how much of that it can use.


It's because of the grid layout where the sense amps are shared between rows. Thus you can fundamentally only refresh one row at a time. Wikipedia has a good page on DRAM refresh.


Well, you can't really read from multiple addresses simultaneously.


Well, you can parallelize it by adding more sticks of RAM.


Maybe the kernel timer tick?



