Gaussian splatting is a new technique for representing real-life scenes in 3D. It can model the world with amazing detail and fidelity, accurately capturing not only the geometry of the scene, but also lighting and reflections. Best of all, Gaussian splats can be created automatically from just a few photos captured with your smartphone by leveraging the latest advances in machine learning.
Traditionally, meshes have been the standard way of representing objects in 3D. They use triangles to represent the surface of an object.
Splats, on the other hand, are based on volume. They're composed of 3D Gaussians, each with its own position, size, orientation, and color. You can think of these Gaussians as fuzzy blobs in space, and with enough of these blobs, you can accurately model any shape or scene you want. (They're named after Carl Friedrich Gauss, the mathematician who introduced the concept of the normal distribution, also known as the bell curve. Gaussian splatting essentially uses 3D bell curves!)
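To make that a bit more concrete, here's a rough sketch in Python of the properties a single Gaussian typically carries. The `Gaussian` class, the opacity field, and the exact parameterization are illustrative assumptions about a typical setup, not Scaniverse's actual data format.

```python
# A rough sketch of the properties each Gaussian carries, assuming a typical
# 3D Gaussian splatting setup. The exact layout in any real implementation
# (including Scaniverse's) will differ.
from dataclasses import dataclass

@dataclass
class Gaussian:
    position: tuple[float, float, float]          # where the blob sits in 3D space
    scale: tuple[float, float, float]             # its size along each axis
    rotation: tuple[float, float, float, float]   # its orientation, as a quaternion
    color: tuple[float, float, float]             # RGB color
    opacity: float                                # how solid or see-through it is

# A scene is simply a very large collection of these blobs.
scene: list[Gaussian] = [
    Gaussian(position=(0.0, 1.0, 2.0),
             scale=(0.05, 0.05, 0.02),
             rotation=(1.0, 0.0, 0.0, 0.0),       # identity quaternion: no rotation
             color=(0.8, 0.2, 0.2),
             opacity=0.9),
]
```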
While Gaussians have been used for rendering since the 1990s, it wasn't until 2023 that techniques for automatically fitting Gaussians to scenes began to emerge. Optimizations and improvements quickly followed, and today you can capture a series of images with your smartphone and turn them into an incredibly lifelike 3D model in minutes.
Those are the basics. Let's get into a bit more detail.
From 2D to 3D
An ordinary 2D photo captures a scene from one particular viewpoint. It might give you an idea of what something looks like from that angle, but it won't let you see how it looks from the other side. There might also be objects blocking your view, or things you'd like to see that lie outside the frame.
Adding a second photo gives you more information. Perhaps you can see the back of the object or areas that weren’t visible in the first photo. With a few more photos in the right locations, you might be able to piece together what the scene looks like in 3D.
Similarly, we can use multiple photos of a scene to construct a 3D Gaussian splat. The Gaussians start out as a rough guess at the geometry and color of the scene. We render that guess from the same vantage point as one of our photos and compare the two, then look at the differences between the images and adjust the properties of the Gaussians to bring the rendered version a little closer to the actual photo. We repeat the same process from another viewpoint, and over time these small adjustments get us closer and closer to the actual scene. With enough steps, we end up with a collection of Gaussians that match all of the input images when rendered; in other words, an accurate 3D model of the scene. This process is called gradient descent, and it's used not only in 3D Gaussian splatting but also to train neural networks and large language models.
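If you're curious what that loop looks like in practice, here's a minimal toy version in Python using PyTorch. It fits a handful of 2D Gaussians to a single synthetic target image rather than 3D Gaussians to real photos, and the `render` function is a simplified stand-in (it just sums contributions instead of doing depth-ordered alpha blending), but the structure of the gradient-descent loop is the same: render, compare, adjust, repeat.

```python
# A minimal 2D toy of the fitting loop: a handful of Gaussians are adjusted by
# gradient descent so their rendering matches a target image. Real 3D Gaussian
# splatting works the same way, just with 3D Gaussians, real cameras, and many
# photos. All names here are illustrative, not Scaniverse's actual code.
import torch

H = W = 64   # image size
N = 50       # number of Gaussians

# Learnable properties of each Gaussian: position, size, color, opacity.
pos     = torch.rand(N, 2, requires_grad=True)          # centers in [0, 1]^2
log_sig = torch.full((N,), -3.0, requires_grad=True)    # log of radius
color   = torch.rand(N, 3, requires_grad=True)          # pre-sigmoid RGB
logit_a = torch.zeros(N, requires_grad=True)            # pre-sigmoid opacity

ys, xs = torch.meshgrid(torch.linspace(0, 1, H),
                        torch.linspace(0, 1, W), indexing="ij")
grid = torch.stack([xs, ys], dim=-1)                    # (H, W, 2) pixel centers

def render():
    """Sum each Gaussian's colored, opacity-weighted footprint over all pixels.
    (A real splatting renderer uses depth-ordered alpha blending instead.)"""
    d2 = ((grid[None] - pos[:, None, None]) ** 2).sum(-1)           # (N, H, W)
    sigma = torch.exp(log_sig)[:, None, None]
    weight = torch.sigmoid(logit_a)[:, None, None] * torch.exp(-d2 / (2 * sigma ** 2))
    img = (weight[..., None] * torch.sigmoid(color)[:, None, None, :]).sum(0)
    return img.clamp(0, 1)                                           # (H, W, 3)

# Stand-in for a photo: an orange square on a black background.
target = torch.zeros(H, W, 3)
target[16:48, 16:48] = torch.tensor([0.9, 0.4, 0.1])

opt = torch.optim.Adam([pos, log_sig, color, logit_a], lr=0.02)
for step in range(500):
    opt.zero_grad()
    loss = ((render() - target) ** 2).mean()   # compare rendering to the "photo"
    loss.backward()                            # which way to nudge each property
    opt.step()                                 # small adjustment toward the photo
```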
This is a splat, embedded in an iframe; you can drag, zoom, and look around. A few things to notice: how the perspective shifts properly through the windows, the smooth curves of the lamps and the round part of the sign, and how you can move behind the pillar.
A better representation
While meshes are well understood and can be rendered efficiently, the real world isn't made up of flat, opaque triangles. Real objects have volume and can interact with light in complex ways throughout that volume, not just on the surface. And most objects have curves, which are difficult to represent accurately with triangles but easy to model with a handful of Gaussians.
While their ability to better model the world is great, the real strength of Gaussians lies in their fuzziness. You'll recall from above that splats are created through a series of small steps, each reducing the difference between the rendered and real images, in a process called gradient descent. To do this, we need to be able to compute the gradient, the direction in which to make our small adjustment, and that requires our representation to be continuous in space. Gaussians, by virtue of their fuzziness, fit the bill, while meshes don't: a point in space is either inside a triangle or it isn't. This key difference means that splats can be learned automatically from a collection of photos using gradient descent, while meshes can't.
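Here's a tiny, hypothetical illustration of that difference, again in Python with PyTorch. A Gaussian's contribution to a pixel changes smoothly as it moves, so automatic differentiation produces a useful gradient; a hard inside/outside test, like a triangle edge, gives no such signal.

```python
# Why fuzziness matters for gradient descent: the Gaussian's contribution to a
# pixel changes smoothly as you move it, so autograd gives a useful, nonzero
# gradient. A hard inside/outside test is a step function: it provides no
# signal telling the optimizer which way to move.
import torch

x = torch.tensor(0.7, requires_grad=True)   # a 1D "position" near a pixel at 0.0

gaussian = torch.exp(-x**2 / (2 * 0.5**2))  # soft falloff with distance
gaussian.backward()
print(x.grad)   # nonzero: it points the way (moving toward the pixel increases coverage)

hard = (x.abs() < 0.5).float()              # hard inside/outside test, like a triangle edge
# hard.backward() would fail outright: the comparison is not differentiable,
# and even a smoothed step would have zero gradient almost everywhere.
```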
In the mesh, the water looks odd and there's no background because that would add far too much complexity. The splat looks smoother, accurately portrays how water reflects and distorts light, and handles the background with no issue.
It's just the beginning for splats
Gaussian splatting is a very new technique, and computer scientists and engineers are only beginning to figure out what can be done with it. At Scaniverse, we've worked out how to train a splat entirely on your phone, so within a minute or two you can have a ready-to-use splat.
We are also applying a geographic perspective to splatting, including working out how to combine many splats into a single, cohesive 3D map of the world. It's an essential component of Niantic's efforts to make global spatial computing a reality.
If you've got more thoughts or questions about splatting in general or Scaniverse in particular, please hop on over to our Community. (You'll need to download Scaniverse and sign in to participate.)