[Adam Conway] wanted to store files in the cloud. However, if you haven’t noticed, unlimited free storage is hard to find. We aren’t sure if he wants to use the tool he built seriously, but he decided that if he could encode data in a video format, he could store his files on YouTube. Does it work? It does, and you can find the code on GitHub.
Of course, the efficiency isn’t very good. A 7 K image, for example, yielded a 9-megabyte video. If we were going to store files on YouTube, we’d encrypt them, too, making it even worse.
The first attempt was to break the file into pieces and encode them as QR codes. Makes sense, but it didn’t work out. To get enough data into each frame, the modules (think pixels) in the QR code were small. Combined with video compression, the system was unreliable.
Simplicity rules. Each frame is 1920×1080 and uses a black pixel as a one and a white pixel as a zero. In theory, this gives about 259 kbytes per frame. However, to avoid decoding problems caused by video compression, each real bit is drawn as a 5×5 pixel block, which works out to about 10 kbytes of data per frame.
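To make that concrete, here's a minimal sketch of this style of encoder (our own illustration with numpy and Pillow, not [Adam Conway]'s actual code, and with a made-up input file name):

import numpy as np
from PIL import Image

# A 384x216 grid of 5x5 blocks fills a 1920x1080 frame: 82,944 bits, about 10 kB per frame
GRID_W, GRID_H, BLOCK = 384, 216, 5

def encode_frame(payload: bytes) -> Image.Image:
    # One bit per grid cell: 1 -> black, 0 -> white, MSB-first within each byte
    bits = np.unpackbits(np.frombuffer(payload, dtype=np.uint8))
    bits = np.pad(bits, (0, GRID_W * GRID_H - len(bits)))  # zero-pad a short final frame
    cells = np.where(bits.reshape(GRID_H, GRID_W) == 1, 0, 255).astype(np.uint8)
    pixels = np.kron(cells, np.ones((BLOCK, BLOCK), dtype=np.uint8))  # scale each bit to a 5x5 block
    return Image.fromarray(pixels, mode="L").convert("RGB")

with open("somefile.bin", "rb") as f:  # hypothetical input file
    encode_frame(f.read()[:GRID_W * GRID_H // 8]).save("frame_0000.png")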
The code isn’t perfect. It can add things to the end of a file, for example, but that would be easy to fix. The protocol could use error correction and compression. You might even build encryption into it or store more data — old school cassette-style — using the audio channel. Still, as a proof of concept, it is pretty neat.
This might sound like a new idea, but people way back in the early home computer days could back up data to VCRs. This isn’t even the first time we’ve seen it done with YouTube.
Though I appreciate the hack, this seems like destructive tech to my mind; if widely used, it would help precipitate the sixth mass extinction event on planet Earth. Data centers are consuming ever larger amounts of energy, in some cases as much as small countries, and it ain't gonna get better. Data storage needs to be more efficient, not less.
I guess you’re not a fan of cryptocurrencies either :)
(neither am I)
I've nothing against crypto-currencies per se, it's just the energy they waste in creating the mainstream coins, e.g. Bitcoin. I don't know how much energy is used for storing the coins, though; maybe not so much. No doubt we will start to see coin ledgers stored as video files :(
Crypto is insane with energy consumption; see this 2022 Columbia study:
https://news.climate.columbia.edu/2022/05/04/cryptocurrency-energy/
Only with the proof-of-work method. Proof of stake is much more efficient (over 99% more, apparently).
The traditional banking system uses a lot more energy to do things with much worse efficiency.
The problem isn't really the tech; the tech only exists due to the perverse incentives created by offering a free video platform yet charging for storage.
Well we could tack ads onto storage. Think it would fly about as far as it does on that “free” video platform.
Heh, get an ad in the middle of every file?
Reminds me of a time when mobile telecom service providers first started offering internet access. The per-megabyte service fees, along with the fixed added monthly expense just to have this service, were at the time exorbitant.
Compared to the cheapest “voice only” plans of the era, some basic math revealed that if voice calls were imagined as landline 300 bps modem calls, one could hypothetically transfer more bytes per unit cost at the voice-call rate, than what the “data plans” offered at their per-megabyte rate (plus fixed monthly fees).
Sure enough, I had to try this theory. So I built a circuit to connect a landline modem to the headphone jack of my mobile phone, and called my dial-up ISP. It worked! (At 300 baud, the GSM audio codec didn’t mess up the tones too badly. The modem also had to be config’d to not wait for a dial tone, and the ISP had to be manually dialed from the phone keypad.) With this hack I had mobile internet cheaper than everyone else!
…But it was painfully slow, taking almost a whole minute just for the most basic bare-bones hand-coded-in-Notepad HTML web page of the era to appear. Web pages with useful content took several minutes… and still that was with images disabled in Netscape!
Nah…
Unlikely as this is to ever become THAT popular, if it did I think it would be the end of free video uploads long before it was a world ending data center bloom. Or at least there would be a lot more verification, checking and banning.
I’m not sure if that would be a good or a bad thing…
The problem is not so much the use of energy as its storage in our atmosphere. And as far as extinction goes, it won't matter much in the grand scheme of things. One could argue that it could even improve things; the last five extinctions brought about more evolved life forms.
Why would this see mass adoption? Even if masses of people were competent enough to use it this way, it's not really a concern; if it got anything near mass adoption, YouTube would put a stop to it.
Video and audio streams are delivered as UDP data streams and are therefore inherently unreliable, since no error control is performed. For critical data this is a hare-brained idea at best. I wouldn't dismiss it for photo storage, though. Give a few frames to each picture to avoid video artefacts and upload all your pics in one massive video to be viewed privately. After all, this was meant to use YouTube for free storage. Who cares about efficiency…
Using UDP doesn’t mean that error control isn’t performed at all. Just that it isn’t at the protocol level. The application level can still perform error correction to handle the unreliable underlying stream, so not really an issue.
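Even a toy scheme at the application layer illustrates the point; a rough sketch (a 3× repetition code, nothing like what a real system would use, which would be closer to Reed-Solomon):

# Toy application-level error correction: store three copies of every byte and
# majority-vote on read-back. Purely illustrative.
def encode(data):
    return bytes(b for b in data for _ in range(3))

def decode(coded):
    out = bytearray()
    for i in range(0, len(coded), 3):
        a, b, c = coded[i:i + 3]
        out.append(a if a in (b, c) else b)  # per-byte majority vote
    return bytes(out)

damaged = bytearray(encode(b"hi"))
damaged[1] ^= 0xFF                           # corrupt one of the three copies
assert decode(bytes(damaged)) == b"hi"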
“Snow Crash” Here We Come!
If each 'pixel' (5×5 block) was one of 16 colours (including black) instead of just black/white, that would be the equivalent of 4 bits rather than 1, thereby increasing efficiency.
Obviously 256 colours would increase that to 8 bits per block, but would YouTube compression make that more error-prone? It's still fairly inefficient, compressing 8 bits into a 25-pixel block.
I'd also like to know whether it uses 30 or 60 fps, as 60 fps would move data twice as fast (see the rough numbers below).
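Assuming the article's 1920×1080 frames and 5×5 blocks, the back-of-the-envelope numbers look like this:

# Rough capacity math for 1920x1080 frames with 5x5 blocks (384 x 216 = 82,944 blocks)
blocks = (1920 // 5) * (1080 // 5)
for bits_per_block in (1, 4, 8):          # black/white, 16 colours, 256 colours
    kb_per_frame = blocks * bits_per_block / 8 / 1024
    for fps in (30, 60):
        print(f"{bits_per_block} bit(s)/block @ {fps} fps: "
              f"{kb_per_frame:.1f} kB/frame, {kb_per_frame * fps / 1024:.2f} MB/s")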
This has the potential to make youtube much worse, and feels like an abuse that would possibly make it harder to upload videos to in the future.
It may give rise to competition.
You’ve not spent much time on gootube lately, eh?
Most of the thousands of hours of videos uploaded every minute are now algorithmically-generated trash with click-bait titles designed to get accidental clicks for the 10ths-of-a-penny ad revenue. This hack isn’t going to make any difference at all even if vigorously adopted by a million users.
It would be interesting to see how much of Youtube’s content is just copies of popular videos cut up and re-narrated by a speech generator.
Sometimes when you get to that end of Youtube, the algorithm starts pushing you an endless stream of “Watch these incredible workers do X!” from some eastern Asian country, and they’re just the same clips re-arranged in a different order.
From what I have clicked on, a LOT. Mirrored video and clip collection of stuff that almost relates to the topic. The Macedonian click-farmers were very quick to take advantage of free or nearly free AI resources.
Wait until fully generated movies and videos that are tuned by sending them to hundreds or millions of people and using the feedback; maybe an hour to do that iteratively. The videos will be perfect for the target demographic, and it will just get better and faster.
Also, while Youtube officially de-monetizes automatically generated content, there's a bit of a goat-guarding-the-cabbage-patch effect in play: they get a cut of the money from advertisers, so there's an incentive to look the other way to inflate the ad views.
FBook is that way for scam advertisers and scam lawyers who want to know if you have cancer and ever went near a hardware store with Roundup on sale.
Or US stamps for pennies on the dollar. WhoIs the URL and you get something created last week and registered in Iceland. Report it and FBook says they checked it out and found no violation of policy. Or the $800 excavator toy for $35.99. If there is ad money, FBook sees nothing.
> they get a cut of the money from advertisers, so there’s an incentive to look the other way to inflate the ad views.
THANK YOU. I have been ranting about this for years. Youtube (and facebook as mentioned below) have zero incentive to stop their platforms being so terrible. Every ad impression is more money for them. In fact, they have a perverse incentive to NOT show the user what they want. That way the user will spend more time on the platform, continuing to look for what they want, all the time being served more ads.
I don’t think I’ve ever really come across any algorithmically generated videos yet and I use YouTube a fair amount. Are you maybe seeing them because you are looking for them?
The science documentary space is flooded with them, and they very quickly take over the entire ‘recommended’ section.
I suppose that might count a little as looking for them, but my searches start out with names like ‘neil degrasse’ or ‘brian greene’.
Next you know they recommend videos on similar subjects with actual interesting sounding titles and thumbnails. There shouldn’t be anything wrong with wanting to expand your horizons…
It doesn’t take long to see the video is the same few clips from other videos over and over, with a voice that sounds almost real, up until it tries to pronounce abbreviations as new words :P
Search for MoneyPrinter on github. You can run a little app, enter a keyword of your choice and it will run off to various services and knock up a youtube short for you. Soon the 18 pence paydays will be yours!
Yes, YouTube is now mostly filled with bullshit from stupid but greedy people, mostly Westerners and Indians, as well as crazy people who post nonsense like "receiving all TV channels for free with a used SIM card and a capacitor". A lot of shitty "gurus" are there who understand nothing but try to teach others, not to mention how much fake and scam content there is. Because people are stupid and misuse the platform, strict measures should be taken to restrict individual access to YouTube and allow only organizations: public media, archives, schools, universities, and research institutions. If you don't know how to use a service provided to you with good intentions, it should be forbidden to you. No, I'm not totalitarian; I just understand that freedom should NOT be given to people, because they misuse it. And no, most videos posted by individuals are nonsense or abuse copyright. Why should some famous song be posted by an unknown shitty guy and not by the original owner? It is already happening: many authors and producers post their songs and videos on YouTube themselves.
This is wrong on so many levels… I love it..
Perhaps a slightly more efficient method would be to hexdump the data, and record the output of this scrolling up your screen. To restore it, simply play the YT video and use OCR to recover the data.
Actually, in all probability, printing the data out on listing paper, and shipping it to YT to use as wall paper would be a more efficient storage strategy, but who am I to judge?
How would two alphanumerical characters per byte made of hundreds of pixels be “slightly more efficient” than just 8 pixels per byte?
It is well known that, for a given pixel size, text printed in normal 5×7 pixel font is more space-efficient than QR codes storing the same amount of information.
Welcome to the Internet, where “Well Known” = “Not True”. The efficiency of a QR code gets better as it gets larger. But even the tiny 21×21 QR code can encode 25 alphanumeric characters, where a 5×7 font would only allow for 12 characters in the same area. And the QR code includes redundancy and orientation information.
Now include the (required by the standard) buffer white space around the QR code and do that calculation again.
Yep, we're definitely on the internet, where "do that calculation again" means "I didn't do it, and am hoping you won't either". But the 15% margin for 21 pixels is 3 pixels and doesn't make room for another character.
Before asking me to do more calculations, actually do them yourself and include them in your reply, and do it for a QR code of an appropriate size to optimize the resolution available in a Youtube video.
QR code requires a margin of at least 4 pixels (modules). A 21×21 code therefore takes up 841 pixels, and can store between 7 and 17 ASCII characters, depending on the amount of error correction employed. A 5×7 human-readable matrix, plus a 1-pixel buffer, is 48 pixels per character, yielding 17 ASCII characters (plus change). If you want to use an "alphanumeric" subset of the full 256-character ASCII set, QR code can store more. But then 4×5 fonts are perfectly readable too, yielding at least 28 alphanumeric characters in that space, and more if using proportional spacing. Human-readable matrix glyphs are at least as information-dense as QR codes, and have the advantage that you don't need an app to see the URL. They're also just as readable to computer eyeballs.
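For anyone following along, the areas work out like this (treating one QR module as one pixel):

qr_area = (21 + 4 + 4) ** 2            # version 1 QR plus a 4-module quiet zone per side: 841 "pixels"
print(qr_area // ((5 + 1) * (7 + 1)))  # 5x7 font with 1-pixel gaps: 17 characters in that area
print(qr_area // ((4 + 1) * (5 + 1)))  # 4x5 font with 1-pixel gaps: 28 characters in that area
# A version 1 QR code holds roughly 7-17 bytes depending on error-correction level.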
Did you read the article? He didn’t end up using QR codes but 1 (scaled) pixel per bit.
Right. I should be fired for my incompetence. Thanks for catching that.
Currently it only uses black and white pixels; you could probably make it more efficient by using colour, so that you could store multiple bits per cell, similar to flash storage.
If you had only a few colours and they were different enough, it would probably be quite reliable: even if the colours are altered slightly, it is still possible to tell them apart.
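The decode side then just snaps each sampled cell to the nearest palette entry; a quick sketch, assuming a small made-up eight-colour palette:

# Decode a palette-coded cell by snapping its sampled colour to the nearest
# palette entry, which tolerates mild shifts from video compression.
import numpy as np

PALETTE = np.array([[0, 0, 0], [255, 255, 255], [255, 0, 0], [0, 255, 0],
                    [0, 0, 255], [255, 255, 0], [0, 255, 255], [255, 0, 255]])

def nearest_palette_index(rgb):
    # Euclidean distance in RGB is fine for a handful of well-separated colours
    return int(np.argmin(np.sum((PALETTE - np.asarray(rgb)) ** 2, axis=1)))

print(nearest_palette_index((12, 9, 241)))  # a slightly-off blue still decodes as index 4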
Reminds me of the old Amiga days. There we had Video Backup System, which also got a DOS/Windows port with a dedicated ISA card.
More info from the guy that wrote the software: http://hugolyppens.com/VBS.html or a youtube series about it: https://www.youtube.com/watch?v=yeFfn9LYlhQ
A whopping 60 MB (roughly 60 floppies' worth) per hour with consumer hardware at a low price!
I was thinking of the same. Made a lot of sense when PC storage was precious and VHS tapes were so common that homeless people wore them as shoes and used the tape to add insulation to jackets.
Tape storage really could come down in price, especially hardware.
Tape is relatively cheap, but not very durable, slow, and otherwise inconvenient. If you spread the cost of the tape drive over the number of tapes you can record, the price can be competitive. I just looked up the price of fairly current tape media (LTO-8): it can be had for as low as $50 USD for 12 TB of uncompressed storage. A quick look tells me the drive is about $3k USD new. You'd probably need a SAS controller too. I don't personally have enough data to justify buying them. If anything, I would go for a second-hand legacy setup at a fraction of the price.
I still have the VBS system my Dad used when I was a kid. I recall us making backups with the best VCR in the house and the best quality tapes he could find. His GVP hard drive on the A500 was 100 MB (I still have it too), and the few times we actually tried to restore files, it failed most of the time. I've been itching to try it out now by doing a direct capture to PC instead of tape, but it is such a goofy time waste I just haven't bothered to set it up.
Should be subtitled “How to get your Google account permanently banned”
… and then set up another account for free….
I create a new one for every android phone I own.
Generally, losing access to your data storage is more meaningful than being able to create an account to store new data.
Seen this before on Hackaday
https://hackaday.com/2023/02/21/youtube-as-infinite-file-storage/
Thank you! Someone remembered this. I thought I was going crazy. It’s literally the same idea a year later nearly to the day. I have no idea why this person didn’t just google the idea and find an exact copy already in existence.
I also remembered. But since I waste a lot of time on YT/the web looking for interesting stuff, it has happened a lot of times that I see something again on HaD, and many times I searched for that subject on HaD but didn't find it. So now when I see something on HaD I wonder whether I saw it here or elsewhere. Better stop worrying.
To extend capacity per frame, don't simply encode ones and zeros as black and white. Encode 3 bits, or even nibbles, using a colour palette. Keep the colour code simple enough that even with poor reproduction you can still read it. The CGA colour palette is perfect for this; the bit patterns are already defined.
It would be smart to prepend each file with a calibration-chart "header".
The audio channels are also usable for data, at "fax machine/modem" rates like 28.8 kbps.
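Generating such a chart is trivial; a rough sketch (one frame of 16 vertical stripes in the CGA colours, which a decoder could sample to learn how re-encoding shifted each colour):

# Calibration header frame: 16 vertical stripes, one per CGA palette colour.
from PIL import Image, ImageDraw

CGA16 = [(0, 0, 0), (0, 0, 170), (0, 170, 0), (0, 170, 170),
         (170, 0, 0), (170, 0, 170), (170, 85, 0), (170, 170, 170),
         (85, 85, 85), (85, 85, 255), (85, 255, 85), (85, 255, 255),
         (255, 85, 85), (255, 85, 255), (255, 255, 85), (255, 255, 255)]

def calibration_frame(width=1920, height=1080):
    img = Image.new("RGB", (width, height))
    draw = ImageDraw.Draw(img)
    stripe = width // len(CGA16)
    for i, colour in enumerate(CGA16):
        draw.rectangle([i * stripe, 0, (i + 1) * stripe - 1, height - 1], fill=colour)
    return img

calibration_frame().save("calibration.png")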
Selective procedural generation might help.
While you’re at it use the soundtrack too.
Some basic knowledge of video encoding algorithms would be quite helpful here:
1. The video will be encoded as YUV (technically YCbCr or whatever). Take advantage of the three, mostly independent channels. Note that U & V are subsampled so offer only a quarter of the full resolution.
2. 5×5 is not a good block size; video macroblocks are power-of-two sized, the exact size depending on the codec. I would go for 4×4, which should reduce edge blurring significantly.
Just these two changes would increase the density to about 24KB per frame on 1080p.
(yeah, I know this was meant to be a joke project)
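For the curious, the ~24 KB figure comes from one bit per 4×4 block in the luma plane plus one bit per 4×4 block in each quarter-resolution chroma plane:

luma_bits = (1920 // 4) * (1080 // 4)        # 480 * 270 = 129,600 blocks in Y
chroma_bits = 2 * (960 // 4) * (540 // 4)    # 2 * 240 * 135 = 64,800 blocks across U and V
print((luma_bits + chroma_bits) / 8 / 1024)  # ~23.7 kB per 1080p frame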
Yep, my first thought was “color, you’re not using it!”.
As for “joke project”, my second thought was that it could be used for steganography as additive “noise”, but I don’t know how much it would be mangled by the codec.
How long before someone writes a FUSE wrapper for this? Remember GMailFS???
I was trying to remember the name of GMailFS… I read about it on Slashdot back when it came out. Interesting to see so many similar responses in the comments for this Youtube version :-)
Using a hybrid resource attack (storing 10 kB of your data within 259 kB of someone else's free storage) is pure evil.
It’s youtube, I’m not losing sleep over it.
This is hardly among the things which people do that are pure evil, by any sane definition
I’ve always been curious about data stored via audio stream. It works via fax. What if we scale that up to multichannel audio? I believe YT music allows free music uploads. Just don’t know how it handles multichannel audio.
Youtube also re-encodes old videos for higher compression and tosses away the original file. There will be slow “data rot” over time.
https://www.reddit.com/r/DataHoarder/comments/ize8xs/is_youtube_reducing_the_quality_of_old_videos/
Has anyone tried to store a video like this? (I will see my self out)
Nice!
FYI: encryption doesn't necessarily increase the size of the data. It completely depends on the cipher and algorithm used.
For example, full-disk encryption requires something for which encrypted and unencrypted data take up exactly the same amount of space.
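A counter or stream mode produces ciphertext exactly as long as the plaintext; a quick sketch using AES-CTR from the Python cryptography package (illustrative only, not a recommendation for key handling):

# Counter/stream modes add no padding, so len(ciphertext) == len(plaintext).
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key, nonce = os.urandom(32), os.urandom(16)
plaintext = b"exactly as long as the ciphertext"
encryptor = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
ciphertext = encryptor.update(plaintext) + encryptor.finalize()
assert len(ciphertext) == len(plaintext)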
Some 5 years ago I found an Android app that allowed data transfer via several QR codes. On the phones I used to test it it failed miserably. Forgot the app’s name.
“[Adam Conway] wanted to store files in the cloud.”
—> I sure don’t want to store anything in the cloud… Unless it’s my own cloud of course. But then It’s obviously unlimited.
Now just use this technique for a swapfile.
Setting aside any ethical questions (largely a moot point since this is youtube we are talking about), this is what would be required in order to store a decent amount of data securely in a YouTube video:
1. Compression: use something with a high compression ratio (such as xz) to compress the data.
2. Encryption: any decent algorithm/key size should work. The real issue here is key management and distribution.
3. Encoding: assume we have 3×8 bit color channels. We can't rely on every color being perfectly reproducible, but it should be fairly easy to decode 4 bits per channel. That gives us a total of 12 usable bits per pixel (or "superpixel" if we choose to set multiple pixels to the same color). This is convenient, as we can group 6 (super)pixels together to form 72-bit symbols, each encoding 64 bits of data combined with 8 bits of error correction codes (a sketch of this step follows below).
4. The audio channels don’t have enough bandwidth to encode a useful amount of data. However, it could potentially be used for synchronization or to encode metadata useful to a decoder.
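Here's a rough sketch of the step-3 packing, with a simple XOR check byte standing in for a real error-correction code (so purely illustrative):

# Pack 8 data bytes + 1 check byte into six pixels of 12 bits each,
# i.e. 4 bits per R/G/B channel, spread over 0-255 in steps of 17.
def bytes_to_pixels(data8):
    assert len(data8) == 8
    check = 0
    for byte in data8:
        check ^= byte                        # toy check byte; use real ECC in practice
    value = int.from_bytes(data8 + bytes([check]), "big")  # 72-bit symbol
    pixels = []
    for i in range(6):                       # six 12-bit chunks, MSB first
        chunk = (value >> (12 * (5 - i))) & 0xFFF
        r, g, b = (chunk >> 8) & 0xF, (chunk >> 4) & 0xF, chunk & 0xF
        pixels.append((r * 17, g * 17, b * 17))
    return pixels

print(bytes_to_pixels(b"YTstore!"))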
Encode the file using Base64, break it up into 5 MB chunks, and upload them to Pastebin.
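Roughly like this (file names made up, and Pastebin's actual size limits not checked):

# Base64-encode a file and split it into ~5 MB text chunks for pasting.
import base64

CHUNK = 5 * 1024 * 1024
with open("archive.7z", "rb") as f:          # hypothetical input file
    encoded = base64.b64encode(f.read()).decode()
for n, i in enumerate(range(0, len(encoded), CHUNK)):
    with open(f"paste_{n:03}.txt", "w") as out:
        out.write(encoded[i:i + CHUNK])      # upload each chunk as its own paste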
Would something like this work?
import numpy as np
from zipfile import ZipFile
from PIL import Image, ImageDraw
from moviepy.editor import VideoClip
from moviepy.audio.AudioClip import AudioArrayClip, concatenate_audioclips

# Step 1: File Compression
def compress_file(input_file, output_zip):
    with ZipFile(output_zip, 'w') as zipf:
        zipf.write(input_file)

# Step 2: Data Segmentation
def segment_data(compressed_file, segment_size):
    with open(compressed_file, 'rb') as f:
        data = f.read()
    return [data[i:i + segment_size] for i in range(0, len(data), segment_size)]

# Step 3: Grid Image Generation (18x9 grid, 3 bytes per cell encoded as an RGB colour)
def generate_grid_image(data_segment, index):
    img = Image.new('RGB', (1920, 1080), color='black')
    draw = ImageDraw.Draw(img)
    cell_width = 1920 / 18
    cell_height = 1080 / 9
    for cell in range((len(data_segment) + 2) // 3):
        x = (cell % 18) * cell_width
        y = (cell // 18) * cell_height
        rgb = tuple(data_segment[cell * 3:cell * 3 + 3].ljust(3, b'\x00'))
        draw.rectangle([x, y, x + cell_width, y + cell_height], fill=rgb)
    img.save(f'frame_{index}.png')

# Step 4: Auditory Cue Addition (one short noise burst per frame, as a sync/debug aid)
def add_auditory_cues(num_frames):
    click_sound = generate_click_sound(duration=0.25)
    concatenated_audio = concatenate_audioclips([click_sound] * num_frames)
    concatenated_audio.write_audiofile("auditory_cues.mp3")

def generate_click_sound(duration=0.25, sample_rate=44100):
    num_samples = int(duration * sample_rate)
    samples = np.random.uniform(-1, 1, size=(num_samples, 1))  # mono noise burst
    return AudioArrayClip(samples, fps=sample_rate)

# Step 5: Video Compilation (24 fps, one generated PNG per frame)
def compile_video(num_frames):
    images = [np.array(Image.open(f'frame_{i}.png')) for i in range(num_frames)]
    video = VideoClip(lambda t: images[min(int(t * 24), num_frames - 1)],
                      duration=num_frames / 24)
    video.write_videofile("output_video.mp4", fps=24)

# Step 6: Video Export (write_videofile above already produces the final file)
def export_video():
    pass

if __name__ == "__main__":
    # Initial assumptions and requirements
    input_file = "input_file.txt"
    output_zip = "compressed_file.zip"
    compress_file(input_file, output_zip)

    segment_size = 18 * 9 * 3  # 18x9 grid cells, 3 bytes per cell
    segments = segment_data(output_zip, segment_size)
    for i, segment in enumerate(segments):
        generate_grid_image(segment, i)

    add_auditory_cues(len(segments))
    compile_video(len(segments))
    export_video()
If you need it, bro, let me know. I don't want you to be storing files on YouTube.
Or use the unlim cloud app on the Play Store to dump data from a PC or phone to Telegram.
Have a system to store data in video, say a live stream that gets uploaded to YouTube afterwards: a legitimate stream with hidden data, encrypted and unreadable by anyone else because of dropped bits, with small par2 files to recover them.
Yeah, I also had this thought a while back, only I also decided to use a color palette to increase the data density. Trouble is that YouTube will never like this and will actively implement some guard against it pretty quickly.
bros getting mad on a proof of concept code smh
Great work on storage! If you need to download things back later, I made a simple webapp with pytube and pydub to download to MP3 or MP4: https://www.slyautomation.com/blog/youtube-to-mp3-code/ (all deployed with Streamlit!)