Modern games are quite often limited by the amount of volatile memory available to the GPU. Games can require many gigabytes of data during the rasterization process. So the obvious solution for better performance would be to buy a faster GPU, right? Well, not for [AssassinWarlord], who decided to find just what happens when you double the VRAM on an RTX 3070. The forum post is in German, but a translator gets the job done rather nicely.
For those of you following along at home, you will need a set of eight Samsung K4ZAF325BM-HC16 GDDR6 memory modules. In this case, the memory modules were salvaged from an AMD RX6900XT with a defective core. Naturally, you will need to re-ball the chips. To help the process, [AssassinWarlord] bought a stencil from AliExpress, with a 3D-printed holder for the memory modules.
It was a bit of a headache getting the card to actually reach the higher power states that the increased memory allows. Changing the “shunt” resistors allows the card to increase its power draw, but this proved ineffective. Some tuning and memory speed mapping at last proved successful in running the card stably.
But what happens when you benchmark the card? Well, not much. Most synthetic loads are compute-bound, not memory-bound. Therefore, overclocking the GPU would yield a greater performance increase than merely changing the memory modules. However, in some game titles, there is a marked performance benefit by nature of relying less on the glacially slow system memory. Is the performance difference worth the time and effort? No, probably not. The games that see an uptick in performance are few and far between (for now), so it’s not exactly a good effort-to-performance ratio.
If you are looking to follow along at home, make sure to check out this memory transplant next!
Thanks, [f!P[z]y] for the tip!

So, are we going to post this every half a year now?
It’s an actual hack in the actual spirit of HaD, unlike most of the click bait YouTube stuff, articles about stuff you shouldn’t actually be printing, or many other things that should be shamed instead of giving them a virtual high-five and publicity. This requires multiple skill sets and is something we should frequently be reminded exists.
So yeah.
Give me one of these articles every few months just to prove that people are still doing it.
I am not an expert, but while this upgrade has little impact on games isn’t it a potentially useful upgrade for running machine learning models:
With 32 GB of VRAM, you unlock a completely higher tier of intelligence. You can comfortably fit the following models, leaving enough leftover VRAM for a large 8k+ context window:
30B–34B Parameter Models (Medium-High Precision):
Yi-34B or Codellama-34B quantized at Q5_K_M (~24 GB) or Q8_0 (~36 GB requires slight offloading, but Q6 is perfect).
Command R (35B): A highly capable model for retrieval-augmented generation (RAG) that runs beautifully at Q4_K_M or Q5_K_M.
70B Parameter Models (Highly Compressed):
Llama 3 70B or Mistral Large quantized down to Q3_K_L or Q3_K_M (~28–30 GB). While highly compressed, a 70B model at 3-bit precision often outperforms a smaller 8B model at full precision. [1, 2]
Mixture of Experts (MoE) Models:
Mixtral 8x7B: This model has 47B total parameters but only uses about 13B per token. It requires roughly 26 GB of VRAM at Q4_K_M precision, making it an exclusive luxury for a 32 GB card. [1, 2, 3]
Alas, the card doesn’t have 32gb of VRAM, it has 16 (as the original 3070 had 8).
There are other, much simpler options if you want a 16gb gpu for machine learning.
This is really just a neat and technically impressive hack.
Also, I beg of you, please don’t cut-and-paste big chunks of AI answers into your posts especially if, as you say, you’re not an expert on the topic. Even if it were a 32gb card, the “info” provided is not useful to anyone; it’s a list of extremely outdated models and overly verbose statements of the obvious.
There are two kinds of “AI people”: People who use the AI to replace their brain, and people who use it to augment it. The first type are the actual worst, because they don’t know enough to stop the AI from doing something stupid, and then the spew the results all over the place and expect people to clap.
Indeed.
I’ve been trying out MS Copilot as a reference source and the results can be somewhat erratic to say the least. Even when clearly instructed to search for the most recent information and only use reliable sources.
It can also get ‘stuck in a rut’ where it will acknowledge it has made a mistake then respond by rephrasing the exact same answer. Or, as has happened on a few occasions, it will insist that Wikipedia and YouTube are reliable sources :)
Yeah large language models, such as ChatGPT, Copilot, etc, are good at LANGUAGE tasks. Such as re-wording an email to sound more professional, or more friendly. The purpose of an LLM is to generate new sentences.
They are emphatically NOT fact models. They are not for looking things up. That’s what a search engine is for. A language model is for producing new text that hasn’t been said before (aka fiction).
Last week I set up ollama so I could mess around with the new Gemm4 models that fit in a 16GB video card. At one point I asked it to write a specific image-cropping program, and the “thinking about it” preamble was 1100+ lines, and nearly all of that was going around and around a ton of variations on how to do the cropping. Funnily (?) enough, when it actually wrote code, it was very wrong. (To be fair I also know LLMs can actually generate the program I was looking for, and I may have inadvertently said something wrong in my prompt.)
Why in gods name would you be so moronic as to copy-and-paste an AI?
Shame on you.
The “only” viable (read proceed at your own risk) rtx DIY I’ve seen was the 3090. So we’d be going from 24g VRAM to 48g. I’d be over the moon for 24g VRAM. I do ml and some llm/AI work. But I’m a researcher so my needs overlap and are different in the venn diagram of usage here. But I know gamers know more than most phd I know (yes I’m speaking in gross aggregate). The question is.. Risk the 3070 for 20g Vram (for my math). Might be worth it… Depends on the soldering. Not my area. But man… I need the vram!
Hey guys I don’t know enough to comment but if we just actually take the resistor then solder the capacitor wouldn’t that increase the flux capacitor into a Giga state that would have verbose ramifications on the dextrososis 3fe side? Correct if I’m wrong…
Little impact in games? Going from 8 to 16 GB?
AI applications would certainly benefit from added VRam. Pity it’s not mentioned in the article.
That’s also wasted effort vs. benefit, imo.
That may be the real reason, but people are rightfully upset when you mention you’re doing something for the sake of AI.
Real hacking, and not stopping when issues appear!
I have a snapdragon x laptop I got last year with only 16gb of unified ram and I’d like to do this to it.
I can see a use for more VRAM.
Large amounts of user-created 3D content in games can bring any GPU down. Ever entered a big lobby on VRChat or ChilloutVR or something? 40 people with >100MB (compressed) of Avater each cause trouble with my 6GB, unless I limit it to 10 or so visible at once.
Also compute stuff, like blender renders, or training a computer vision model, can need large amounts of VRAM for bigger tasks. And i guess you can also sloperate harder, but that’s boring and not worth it.
Maybe someone can use AI to reverse-engineer the checksum computation used by Nvidia for the BIOS. Then the RAM timing tables can be fixed so you don’t have to run in P0 mode all the time when doing this upgrade.
This sounds like a chicken-and-egg problem: game developers will target (i.e. optimize for) the ratio of VRAM-to-compute found on the most common off-the-shelf video cards and make the compute-vs.-store decisions in their rendering pipeline accordingly. So even if a game could benefit in theory, as in there exists a possible rendering pipeline to generate the same final image from the same scene while making optimal use of the extra scribble space, that’s unlikely to be the render pipeline the game uses.
To really leverage all possible ratios of compute to memory a game would need a mechanism to poll the card on its properties (akin to what vkinfo outputs) and then know how to build the most efficient rendering pipeline for that card. Needless to say, this would be tricky to get right without a pretty advanced way of encoding the abstract intent, dependencies, and desired priorities which would preclude a lot of the clever rendering hacks that games have always used to wow their players with snazzy effects which look like they ought to take more horsepower than your hardware has…
What we are getting today is developers targeting last generation’s Mid to high end models as a target for what the next generation low to mid range will do.
Then the GPU manufacturer of note, Nvidia, paywalling more than 8GB of RAM at a $800+ price point.
There is not a steady supply of 12 and 16GB cards from years past to fill in the low end, so you are stuck, and very soon games will refuse to run on 8GB because the textures are all 4K and have to load into VRAM.
It’s poor optimization really.
I’m glad increasing RAM capacity puts the card into a higher power state.
Actually that sentence made no sense, as the RAM only accounts for 10s of watts while the core that does the computation is not meaningfully impacted by more RAM unless a lack of RAM was causing idle periods. This can happen in some newer games.