Although NVidia’s current disastrous RTX 50-series is getting all the attention right now, this wasn’t the company’s first misstep. Back in 2014, when NVidia released the GTX 970, users were quickly dismayed to find that their ‘4 GB VRAM’ GPU actually had just 3.5 GB of full-speed memory, with the remaining 512 MB accessible at only 1/7th of the normal speed. NVidia eventually settled with disgruntled customers for $30 per card, but there’s a way to at least partially fix these GPUs, as demonstrated by a group of Brazilian modders (original video with horrid English auto-dub).
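Where does that ‘1/7th’ figure come from? A quick back-of-the-envelope calculation from the card’s public specs (my own arithmetic, not from the settlement documents): the final 512 MB hangs off a single 32-bit slice of the 256-bit bus.

```python
# Back-of-the-envelope bandwidth math for the GTX 970's split VRAM,
# assuming the card's public specs: a 256-bit bus (eight 32-bit
# controllers) of 7 Gbps GDDR5.

bus_width_bits = 256
data_rate_gbps = 7

total_gb_s = bus_width_bits * data_rate_gbps / 8   # 224 GB/s on paper

# The fast 3.5 GB segment spans seven of the eight 32-bit slices;
# the final 512 MB hangs off the remaining one.
fast_gb_s = total_gb_s * 7 / 8    # 196 GB/s
slow_gb_s = total_gb_s * 1 / 8    #  28 GB/s

print(f"fast segment: {fast_gb_s:.0f} GB/s")
print(f"slow segment: {slow_gb_s:.0f} GB/s "
      f"(= 1/{fast_gb_s / slow_gb_s:.0f}th of the fast segment)")
```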
The mod itself is quite straightforward: the original 512 MB, 7 Gbps GDDR5 memory modules are replaced with 1 GB, 8 Gbps chips, and a resistor is added on the PCB to make the GPU recognize the higher-density VRAM ICs. Although this doesn’t fix the fundamental split-VRAM issue of the ASIC, it does give it access to 7 GB of faster, higher-density VRAM. In benchmarks performance was massively increased, with Unigine Superposition showing nearly a doubling of the score.
In addition to giving this GTX 970 a new lease on life, the mod also shows just how important having more VRAM on a GPU is, which is all the more ironic when GPU manufacturers somehow still deem 8 GB of VRAM acceptable in 2025.
This is fun, but wake me up when someone successfully applies the technique to modern cards, so you can mod a budget GPU to have 24+ GB of RAM to self-host big LLMs.
It’s been done. You can find such monstrosities for sale on AliExpress. But with cheap second-hand 4060 16GB cards, LLMs are fun all day.
Which models specifically?
+1
Would love to make a cheap LLM box to text me nonsense all day long
Imagine how much money and power you could save by making friends and talking to real people instead
there are RX 580 16GB cards for like 130 bucks
2080 Ti 22GB
4080 48GB (they desolder the GPU and solder it onto a new 3090 Ti PCB that has clamshell RAM and is pin-compatible)
and we know those are legit
Do you know where I can purchase those, please?
You can do that by yourself… if you have the tools, chips, knowledge and a steady hand.
But there are Chinese companies doing this professionally.
Do you know where I can find those?
First, sorry I hit “report” instead of “reply” – my bad.
Second, I wouldn’t be surprised if GPU manufacturers started pulling tricks like parts-pairing (yes, parts-pairing a soldered-on RAM chip to the part it’s soldered to, sigh) or other shenanigans to prevent things like this, so their “budget” GPUs wouldn’t cut into sales of their “not so budget” GPUs.
The real surprise would be why they hadn’t done it the Apple way yet.
I certainly wouldn’t put the idea past them; but I’d be curious whether (when you are the one designing the memory controller) it would be more cost-effective to order slightly nonstandard RAM chips with an extra cryptographic module packaged in, or easier to keep the RAM generic but add the handful of efuses you’d need to blow the intended memory configuration into the GPU die relatively late in manufacturing.
Cryptographic pairing is powerful if you want to prevent replacement of a specific part with something else without your blessing; but in this case they don’t really need to enforce ‘this specific module as paired at the factory’, just ‘regardless of what lies the chip tells, it’s an X Gb chip of Y speed for a total capacity of Z GB on a whatever-bit bus’.
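A minimal sketch of what that efuse approach might look like, assuming boot firmware that compares the board’s strap-detected memory claim against fused-in limits. Every name, value, and the strap encoding here is invented for illustration; no actual vendor interface is implied:

```python
# Hypothetical sketch of efuse-limited memory configuration.
# All names, encodings, and values are made up for illustration;
# real GPU firmware and fuse maps look nothing like this.

# Limits blown into efuses late in manufacturing:
EFUSED_LIMITS = {"density_gbit": 8, "speed_gbps": 8}

def decode_straps(strap_bits: int) -> dict:
    """Decode board strap resistors into the claimed memory config
    (made-up encoding: low 4 bits = density exponent, next 4 = Gbps)."""
    return {
        "density_gbit": 1 << (strap_bits & 0xF),
        "speed_gbps": (strap_bits >> 4) & 0xF,
    }

def memory_config_allowed(strap_bits: int) -> bool:
    """Refuse to bring up the memory controller if the straps claim
    a denser or faster configuration than the efuses permit."""
    claimed = decode_straps(strap_bits)
    return (claimed["density_gbit"] <= EFUSED_LIMITS["density_gbit"]
            and claimed["speed_gbps"] <= EFUSED_LIMITS["speed_gbps"])

# e.g. straps claiming 16 Gbit chips would be rejected:
print(memory_config_allowed((8 << 4) | 4))  # 16 Gbit @ 8 Gbps -> False
```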
I did just that on polybius’ comment too.
And it can’t be undone.
And, for me, the ‘reply’ button is gone.
Quite annoying.
I’d also like it if model sizes got figured out so that it didn’t feel like you have to step down a huge amount to fit it on the GPU.
I’ve seen a lot of models that are just a bit too large for 16GB, and then the next size down is smaller than 8GB.
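For a rough sense of why that happens, a back-of-the-envelope footprint estimate (my own rule of thumb, not from this thread): weights take parameter count × bytes per parameter, plus some overhead for the KV cache and buffers.

```python
def estimate_vram_gb(params_billions: float, bits_per_param: float,
                     overhead_fraction: float = 0.2) -> float:
    """Back-of-the-envelope VRAM estimate for serving an LLM.

    params_billions:   model size, e.g. 13 for a 13B model
    bits_per_param:    16 for fp16, 8 for Q8, ~4.5 for typical Q4 quants
    overhead_fraction: rough allowance for KV cache/buffers (assumed)
    """
    weights_gb = params_billions * bits_per_param / 8
    return weights_gb * (1 + overhead_fraction)

# 13B at ~4.5 bits lands just under 9 GB (too big for an 8 GB card),
# while a 30B-class quant overflows 16 GB -- the awkward gap above.
for params, bits in [(7, 4.5), (13, 4.5), (30, 4.5)]:
    print(f"{params}B @ {bits} bits/param: "
          f"~{estimate_vram_gb(params, bits):.1f} GB")
```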
“Modern cards”… that makes the GTX 970 sound very old, but it’s actually merely ten years old, and still extremely advanced technology as such.
Only a few countries have the technology to build chips of such a high integration level.
I mean, sure, historically ten years are a long time in computing.
On the other hand, computer technology has stagnated in the past 20 years or so.
There haven’t been real innovations any more, like there were from the 80s through the 90s and 2000s.
The graphics of games from the past 20 years look pretty much the same, if not worse by now.
(There’s a trend in gaming to make the graphics of new games look dull.)
It took ages until MS released DirectX 12, for example.
It needed AMD Mantle to get things going.
DirectX 10 was released in 2007 with Vista, DirectX 12 in 2014. That’s 7 years.
Yes, there was DirectX 10.1 and 11, too, but they were more of a minor upgrade from a technological point of view.
With tessellation being worth mentioning, I think.
398mm^2 on TSMC 28nm is indeed not messing around in either absolute terms or state-of-broader-industry terms.
Aside from reaping generational improvements in basically all areas (process nodes, GDDR speed and density, advanced packaging, probably some overlooked power silicon without which feeding the newer parts would be much more difficult), the biggest change seems to be Nvidia’s willingness to flirt with the reticle limit.
The 970 and 980 were both the same die size, varying in completeness; the 990 either didn’t exist or never made it out of prototype.
For recent generations, the 4070 and 4080 are actually slightly smaller than the 970/980; but the 5090 goes straight to 750mm^2, while their enterprise stuff is just described as ‘reticle limited’.
Obviously the 5090 is not…really…a ‘mainstream’ product; but it’s still pretty notable that you can get reticle limited or close to reticle limited dies in a product that can be relatively trivially purchased and operated under household conditions.
In the context of ML, I would not consider the 970 a modern Nvidia card because it lacks tensor cores.
But even in general, it’s 1/3 as fast as an RTX 5070 despite their similar MSRP (after inflation). That’s below Maxwell’s Law, but still a huge difference.
And I would agree that game visual improvements have been marginal lately, but not on a 20 year time scale. Grand Theft Auto San Andreas was considered a high-fidelity AAA title 20 years ago. It looks unplayable by modern standards.
Thankfully (today I learned and figured I’d share), the auto-dub can be turned off under “settings”, “audio track”, “Portuguese”. First time I’ve seen this! I guess on the accessibility side it’s cool that it’s an option at all; I’d take the Windows Narrator voice reading the auto-translated subs any day over that dreck, though.
Hi, mobile users can do the same by changing to “desktop website” mode in their browser.
After this, when clicking on the gear symbol (settings) in the video window, it’s possible to select the original audio track.
It doesn’t look like this card is supported by nouveau for reclocking, meaning it’s going to run ungodly slow with that driver. You may find this card is a brick once Nvidia stops updating their own drivers and the existing ones fail to work on new Linux kernel versions.
We need 256GB of RAM to load LLMs, not 8GB.
Does this card support 256GB of RAM?
Yes! Where are those Frankensteins? We all need our own LLM.
The “Only 3.5GB!” panic was always a bit of a misnomer: the 3.5GB segment worked at full speed, and so did the 0.5GB segment. What did not work was simultaneous writes or simultaneous reads across the final 512MB boundary (i.e. the first 3GB was unaffected either way). If you were reading from one 512MB chunk and writing to the other, both would operate at full speed, and the drivers and the card’s memory controller itself knew this and scheduled accordingly to minimise simultaneous access.
The upshot is that the ‘issue’ was only discovered months later, owing to the card’s extremely competitive real-world performance at launch. The discovery of the memory controller limitation did not retroactively reduce performance, after all.
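To make the scheduling point concrete, here’s a toy placement policy in the spirit of what’s described above: keep hot buffers in the fast segment and steer cold ones to the slow one, so the two are rarely touched at the same time. This is a sketch under my own assumptions, not Nvidia’s actual driver logic:

```python
# Toy model of split-VRAM placement, loosely inspired by the GTX 970
# layout discussed above. NOT Nvidia's actual driver logic; just an
# illustration of "prefer the fast segment, spill cold buffers".

FAST_SEGMENT_MB = 3584   # 3.5 GB at full speed
SLOW_SEGMENT_MB = 512    # 0.5 GB behind the slower path

class SplitVramAllocator:
    def __init__(self) -> None:
        self.fast_free = FAST_SEGMENT_MB
        self.slow_free = SLOW_SEGMENT_MB

    def alloc(self, size_mb: int, hot: bool) -> str:
        """Place hot buffers (render targets, frequently sampled
        textures) in the fast segment; steer cold ones to the slow
        segment, falling back to whichever segment still fits."""
        preferred = ["fast", "slow"] if hot else ["slow", "fast"]
        for segment in preferred:
            free = getattr(self, f"{segment}_free")
            if size_mb <= free:
                setattr(self, f"{segment}_free", free - size_mb)
                return segment
        raise MemoryError("out of VRAM")

vram = SplitVramAllocator()
print(vram.alloc(2048, hot=True))    # -> fast
print(vram.alloc(256, hot=False))    # -> slow
print(vram.alloc(1024, hot=True))    # -> fast (1536 MB still free there)
```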