High Quality 3D Scene Generation From 2D Source, In Realtime

September 2, 2023

Here’s some fascinating work presented at SIGGRAPH 2023 of a method for radiance field rendering using a novel technique called Gaussian Splatting. What’s that mean? It means synthesizing a 3D scene from 2D images, in high quality and in real time, as the short animation shown above shows.

Neural Radiance Fields (NeRFs) are a method of leveraging machine learning to, in a way, do what photogrammetry does: synthesize complex scenes and views based on input images. But NeRFs work in a fraction of the time, and require only a fraction of the source material. There are different ways to go about this and unsurprisingly, there tends to be a clear speed vs. quality tradeoff. But as the video accompanying this new work seems to show, clever techniques mean the best of both worlds.

A short video summary is embedded just below the page break. Interested in deeper details? The research PDF is here. The amount of development this field has seen is nothing short of staggering, and certainly higher in quality than what was state-of-the-art for NeRFs only a year ago.

28 thoughts on “High Quality 3D Scene Generation From 2D Source, In Realtime”

The Commenter Formerly Known As Ren says:

September 2, 2023 at 8:44 pm

I still don’t get what the “original” is…
Is it a series of photographs that are rendered 3D?
Is It a single photograph?
Is it a computer image?

Report comment

Reply
1. scott_tx says:
  
  September 2, 2023 at 8:54 pm
  
  They render transparent anisotropic gaussians and back propagate their gradients to their properties.. I dont know how they could explain it any simpler than that.
  
  Report comment
  
  Reply
  1. The Commenter Formerly Known As Ren says:
    
    September 2, 2023 at 9:04 pm
    
    Which is way over my head,
    But I am willing to accept that!
    B^)
    
    Report comment
    
    Reply
  2. tadpole says:
    
    September 3, 2023 at 3:11 am
    
    Yes, but what makes this really interesting is how they also applied a fourrier-transformation filter (NOT! fast-fourrier) to the individual vector components to re-align the main focal points of each frame, allowing for a smoother render in the end pass. Wait, what?!
    
    Report comment
    
    Reply
    1. itsalltoosimple says:
      
      September 3, 2023 at 6:38 am
      
      https://m.youtube.com/watch?v=Ac7G7xOG2Ag
      
      Report comment
      
      Reply
      1. Ostracus says:
        
        September 3, 2023 at 10:44 am
        
        Ah, so that’s who inspired Star Trek technobabble.
        
        Report comment
      2. tadpole says:
        
        September 4, 2023 at 9:46 am
        
        https://www.youtube.com/watch?v=G7Do2tlYLhs
        my inspiration
        
        Report comment
  3. craig says:
    
    September 4, 2023 at 1:26 pm
    
    “They render transparent anisotropic gaussians and back propagate their gradients to their properties.. I dont know how they could explain it any simpler than that.”
    
    Love
    This.
    
    Report comment
    
    Reply
  4. Morgantao says:
    
    September 4, 2023 at 10:22 pm
    
    A simpler way to describe the process is… Magic!
    
    Report comment
    
    Reply
2. Odel says:
  
  September 2, 2023 at 9:01 pm
  
  You feed it a Structure-from-Motion scene, these can be generated by colmap from a series of images
  
  Report comment
  
  Reply
3. PWalsh says:
  
  September 2, 2023 at 9:04 pm
  
  I haven’t grokked the paper and the code yet, but here’s my understanding so far.
  
  They take a set of still images and interpolate the objects, conceptually like photogrammetry interpolates objects.
  
  They then encode the objects as 3D gaussians: a 1D gaussian is a bell curve, a 2D is like a sand dune, and a 3D gaussian is like a sausage or gel-cap shape.
  
  Just as you can encode any 1D signal as a sum of sin and cos, with higher frequencies making for higher resolution, you can encode objects as a sum of 3D gaussians with smaller gaussians making for higher resolution of the 3D objects.
  
  Once you have the 3D objects encoded as gaussians, synthesizing a camera viewpoint from anywhere in the scene is easy: rotate and scale the gaussians and add them together.
  
  Once you have trained on some images, you can move a virtual camera around the scene in real time.
  
  I think that’s the thrust of the paper, but note that I haven’t looked deeply into it yet.
  
  Report comment
  
  Reply
  1. The Commenter Formerly Known As Ren says:
    
    September 2, 2023 at 9:07 pm
    
    Thanks, that helps me understand better (I think! ).
    
    Report comment
    
    Reply
    1. Michael says:
      
      September 2, 2023 at 10:20 pm
      
      Right, so the way I’d explain it to you, to sum it up is, they’ve taken a lot of photos, and basically exploded everything into little tiny particles—those particles (Gaussians) are basically the dna/bones of the photo set, and using mathematics, and machine learning-photo/video editing fly the program connects all the “dots” and synthesizes a movable, explorable video or 3D experience.
      
      It’s an interesting way to achieve some form of photogrammetry. I’ve never even thought of this before.
      
      Report comment
      
      Reply
      1. YesMaybe says:
        
        September 3, 2023 at 6:43 am
        
        There’s the magicians trick revealed: they take a lot of photos.
        Now do all this gaussian explosion using just *one* photo and with AI filling in the rest.
        
        Report comment
    2. Ostracus says:
      
      September 2, 2023 at 10:59 pm
      
      Just wait till we start throwing quantum into things. :-D
      
      Report comment
      
      Reply
      1. tadpole says:
        
        September 3, 2023 at 3:13 am
        
        When you add the Quantum, the colors really pop.
        
        Report comment
4. come2 says:
  
  September 2, 2023 at 9:14 pm
  
  It uses multiples images, the technology used is presented in this article https://hackaday.com/2022/11/30/nerf-neural-radiance-fields/ but they add the idea is to approximate the thing with a 3D gaussian surface (like the standard probability curve but in 3D) that is not isotropic (right and steady). I still don’t understand it, but the video I found that explains the entire paper lasts 3 HOURS.
  
  Report comment
  
  Reply
  1. SilentMunk says:
    
    September 2, 2023 at 10:00 pm
    
    Would you mind sharing said video?
    
    Report comment
    
    Reply
    1. Ostracus says:
      
      September 2, 2023 at 11:10 pm
      
      Two hours and forty six minutes.
      
      https://youtu.be/xgwvU7S0K-k
      
      Report comment
      
      Reply
5. TG says:
  
  September 2, 2023 at 9:36 pm
  
  You were right the first time. The “original” is a series of photographs from different angles, processed to become a 3d scene you can interact with and explore. Kind of like photogrammetry, except with fewer photos and better results. And it does not involve meshing a point cloud (which is a big source of dog shit models).
  
  Report comment
  
  Reply
  1. tadpole says:
    
    September 3, 2023 at 3:15 am
    
    So a series of “photoshop transform” actions? /sarcasm (or not, based on if my blind comment is at all correct)
    
    Report comment
    
    Reply
TG says:

September 2, 2023 at 9:39 pm

Been following nerfs a good while, mostly due to a coworker who is much more knowledgeable about the guts n gears involved than I am. The way it can render things like transparency and reflection, or lighting effects such as bloom.. it’s pretty astounding. You have to see it to believe it (and you have to know a bit about how nasty these things looked just a few years ago).

Report comment

Reply
Demon says:

September 3, 2023 at 1:18 pm

It’s not realtime, it can show something but you have to wait like progressive rendering, ofcause it requires 4090, and quality worse than professional software in photogrametry, best is Reality Capture it can run on low end pc. So this is all ads from nvidia.

Report comment

Reply
1. S O says:
  
  September 3, 2023 at 3:02 pm
  
  This is not *just* an ad. Yes it is not realtime, that’s referring to after data processing, and yes it requires high end hardware. I’m not sure where you are getting that it’s worse than existing photogrammetry software though, that’s just incorrect. It’s also a lot faster.
  
  Report comment
  
  Reply
2. shod says:
  
  September 5, 2023 at 3:22 am
  
  It uses CUDA? I think it’s a bit odd that universities allow such propriety tie-in like CUDA. Especially when there are alternatives.
  Regardless if they had no payments from or contact with Nvidia.
  
  Report comment
  
  Reply
3. shod says:
  
  September 5, 2023 at 3:30 am
  
  Oh and you are advertising software owned by Epicgames and requiring ‘activation’ and all kinds of agreements. While complaining of ads…
  So do you have a list of companies that you think ads are allowed from and which ones not?
  
  Report comment
  
  Reply
craig says:

September 4, 2023 at 1:24 pm

I’m so powerfully ignorant. Taking a bunch of photos of something then making a “3d scene”. Isn’t that just…. making a video? I thought the trick would be taking a single still image, hand waving, AI etc and generate a 3d scene like in the title. But comments lead me to believe otherwise.

Report comment

Reply
1. Curt says:
  
  September 9, 2023 at 1:49 pm
  
  Imagine that you can create a video flyby of a scene, moving at angles that you do not have pictures for, without a 3d, mesh-based scene being generated. Imagine that the “scene” is being created as a collections of clouds of light, that don’t care about “solid” “transparent” or “reflective” just the way light moves and from that you can render out arbitrary videos of in-scene motion. And you create the “scene” from just a handful of 2d images
  
  Report comment
  
  Reply

Hackaday

High Quality 3D Scene Generation From 2D Source, In Realtime

28 thoughts on “High Quality 3D Scene Generation From 2D Source, In Realtime”

Leave a ReplyCancel reply

Search

Never miss a hack

If you missed it

Back To The Future, 40 Years Old, Looks Like The Past

Why The Latest Linux Kernel Won’t Run On Your 486 And 586 Anymore

One Laptop Manufacturer Had To Stop Janet Jackson Crashing Laptops

The 2025 Iberian Peninsula Blackout: From Solar Wobbles To Cascade Failures

Field Guide To The North American Weigh Station

Our Columns

Hackaday Podcast Episode 327: A Ploopy Knob, Rube-Goldberg Book Scanner, Hard Drives And Power Grids Oscillating Out Of Control

Last Chance: 2025 Hackaday Supercon Still Wants You!

FLOSS Weekly Episode 839: I Want To Get Paid Twice

South Korea Brought High-Rise Fire Escape Solutions To The Masses

C++ Encounters Of The Rusty Zig Kind

28 thoughts on “High Quality 3D Scene Generation From 2D Source, In Realtime”

Leave a ReplyCancel reply

Search

Never miss a hack

Subscribe

If you missed it

Our Columns