> The triangles are well aligned with the underlying geometry. All triangles share a consistent orientation and lie flat on the surface.
When I first read "triangle splatting," I assumed Gaussians were just replaced by triangles, but triangles being aligned with geometry changes everything. Looking forward to seeing this in action in traditional rendering pipelines.
Normal aligned 3DGS was already proposed earlier[1].
This seems more like an iteration on performance: moving from the theoretical principle closer to what the hardware already does well, and finding a sweet spot between the sheer number of primitives, a fast rasterization method, and perceived image quality.
It's already noticeable that there doesn't seem to be a one-size-fits-all approach. Volumetric, feathered features like clouds won't profit much from a triangle representation, unlike high-visual-frequency features.
There are various avenues for speeding up rendering and improving the 3D performance of 3DGS:
https://arxiv.org/pdf/2410.20593
https://speedysplat.github.io/
Another avenue is increasing the complexity of the gradient function, like applying Gabor filters:
https://arxiv.org/abs/2504.11003
So many ways to adapt and extend the 3DGS principles. It's surely a very interesting research space to watch.
I think the major computational task is sorting the primitives, which works great on GPUs but not so much on CPUs. I'm sure there is some research happening on sort-free primitives.
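For intuition, a minimal sketch of that per-frame sort (back-to-front by camera-space depth) in NumPy; real renderers do this on the GPU, usually as a radix sort over per-tile depth keys, and the view matrix and depth convention here are illustrative assumptions:

    import numpy as np

    def sort_back_to_front(centers, view_matrix):
        """Return draw order for alpha blending: farthest primitive first.

        centers: (N, 3) world-space splat/triangle centers.
        view_matrix: (4, 4) world-to-camera transform (camera looks down -z).
        """
        homo = np.concatenate([centers, np.ones((len(centers), 1))], axis=1)
        cam = homo @ view_matrix.T          # camera-space positions
        depth = cam[:, 2]                   # more negative z = farther from the camera
        return np.argsort(depth)            # ascending: farthest first, nearest last

    # usage: for i in sort_back_to_front(splat_centers, V): blend splat i over the image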
After an email exchange with the lead author, I just rendered one of their demo datasets using my Datoviz GPU rendering library [1]. It looks nice and it's quite fast. I'm just rendering uniform triangles using standard 3D rasterization.
https://github.com/user-attachments/assets/6008d5ee-c539-451...
(or https://github.com/datoviz/data/blob/main/gallery/showcase/s...)
I'll add it to the official gallery soon, with proper credits.
[1] https://datoviz.org/
Strange that it works for me in various browsers while I'm logged out of GitHub. Might be a caching thing. Anyway, I reuploaded it elsewhere: https://imgur.com/Qc8r15I
This looks really nice, but I can't help but think this is a stop gap solution like other splatting techniques. It's certainly better than NeRFs, where the whole scene is contained in a black box, but reality is not made up of a triangle soup or gaussian blobs. Most of the real world is made up of volumes, but can often be thought of as surfaces. It makes sense to represent the ground, a table, walls, etc with planes, not a cloud of semi-translucent triangles. This is like pouring LEGO on the floor and moving the pieces around until you get something that looks OK from a distance, instead of putting them together. Obviously looking good is often all that's needed, but it doesn't feel very elegant.
The normals do look pretty good in their example images, though, so maybe you can get good geometry from this using some post-processing? But then is a triangle soup really the best way of doing that? My impression is that this is chosen specifically to get a final representation that is efficient to render on GPUs. I haven't done any graphics programming in years, but I thought you'd want to keep the number of draw calls down; do you need to cluster these triangles into fewer draw calls?
Is there any work being done to optimize a volumetric representation of scenes and from that create a set of surfaces with realistic looking shaders or similar? I know one of the big benefits of these splatting techniques is that they capture reflections, opacity, anisotropy, etc., so "old school" photogrammetry with marching cubes and textured meshes has a hard time competing with the visual quality.
> I haven't done any graphics programming in years, but I thought you'd want to keep the number of draw calls down; do you need to cluster these triangles into fewer draw calls?
GPUs can draw tens of thousands of vertices per draw call, whether they are connected together into logical objects or are "triangle soup" like this. There is some benefit to having triangles connected together so they can "share" a vertex, but not as much as you might think. Since GPUs are massively parallel, it does not matter much where on the screen or where in the buffer your data is.
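To put rough numbers on the vertex-sharing point, a back-of-the-envelope sketch comparing an indexed grid mesh against the same triangles stored as an unconnected soup (illustrative arithmetic, not figures from the paper):

    def grid_mesh_costs(w, h):
        """Vertex/index counts for a w-by-h quad grid split into triangles."""
        tris = 2 * w * h
        soup_vertices = 3 * tris               # every triangle carries its own 3 vertices
        shared_vertices = (w + 1) * (h + 1)    # indexed mesh: vertices are reused...
        indices = 3 * tris                     # ...at the cost of an index buffer
        return tris, soup_vertices, shared_vertices, indices

    print(grid_mesh_costs(100, 100))  # (20000, 60000, 10201, 60000)

So sharing cuts vertex-shading work roughly 6x here, but either way a single draw call handles all 20,000 triangles; the soup mostly costs memory bandwidth and cache reuse, not extra draw calls.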
> Is there any work being done to optimize a volumetric representation of scenes and from that create a set of surfaces with realistic looking shaders or similar?
This is basically where the field was going until nerfs and splats. But then nerfs and splats were such HUGE steps in fidelity that they inspired a ton of new research in that direction, and I think rightfully so! Truth is that reality is really messy, so trying to reconstruct logically separated meshes for everything you see is a very hard way to try to recreate reality. Nerfs and splats recreate reality much more easily.
A digital image is a soup of RGB dots of various sizes.
Gaussian Splatting radically changed the approach to photogrammetry. Prior approaches, which generated surface models and mapped the captures to materials that a renderer would rasterize with more or less physical accuracy, were hitting the ceiling of the technique.
NeRF was also a revolution, but it is very compute intensive.
Even in a browser, on a mid-range GPU, you can render millions of splats at 60 frames per second. That's how fast it is, and a dense scene of less than a million splats can already completely fool the eye from most possible angles.
Splatting is the most advanced and promising technique for photogrammetry, and it has already delivered on that promise. The limit is that you can't do as much in terms of modification with point clouds as you can with surfaces carrying good PBR attributes.
No, an image is a well ordered grid of pixels. The 3D variant would be voxels, and Nvidia recently released a project to do scene reconstruction with sparse voxels [0].
If you take these triangles, make them share vertices, and order them in a certain way, you have a mesh. You can then combine some of them into larger flat surfaces when that makes sense, draw thousands of them in one draw call, calculate intersections, volumes, physics, LODs, use textures with image compression instead of millions of colored objects, etc with them. Splatting is one way of answering the question "how do we reproduce these images in a way that lets us generate novel views of the same scene", not "what is the best representation of this 3D scene".
The aim is to find the light field that describes the scene, and if you have solid objects that function can be described on the surface of those objects. That seems like a much more elegant end result than a cloud of separate objects, no matter what shape they have, since it's much closer to how reality works. Obviously we need to handle volumetrics and translucency as well, but if we model the real surfaces as virtual surfaces I think things like reflections and shadow removal will be easier. At least gaussian splats have a hard time with reflections: they look good from some viewing angles, but the reflections are often handled as geometry [1].
I'm not arguing that it doesn't look good or that it doesn't serve a purpose, sometimes a photorealistic novel view of a real scene is all you want. But I still don't think it's the best representation of scenes.
[0] https://svraster.github.io/
[1] https://www.youtube.com/watch?v=yq6gtdpLUCo
It made so much sense to me: voxels with view-dependent color, using e.g. spherical gaussians.
I don't know how it compares to newer techniques, probably badly since nobody seems to be talking about it.
https://svraster.github.io/images/teaser.jpg
I'm unsure whether it is me, but one of us is confusing the representation of a 2D image with that of a 3D scene. It's absolutely correct that a digital (2D) image is a grid of pixels. We can call it a soup if you want. An audio file or a text document are soups too.
A 3D scene (the digital representation) is a structure that can't be reduced to a simple grid. At least it had better not be, or it wouldn't look great from almost all angles.
Back to the splats...
Gaussian Splatting is a technique designed to tackle what seemed impossible, or at least extremely challenging, with 3D scene reconstruction. The authors took a radically different approach and demonstrated feasibility. I haven't met anybody who would look at a gaussian splatting reconstruction of a scene and claim another method would look better. Or even could look better. Maybe some day, but as of 2025 there isn't.
On the voxel definition: I don't see voxels as encompassing any (as in all) 3D structure representation. I mean by that that a voxel is a definition, and a good one, but not every representation fits into that definition, so we had better watch what we infer about anything that doesn't fit in.
Imo gaussian splats do not form a voxel.
But, let's say they do. So what?
No idea how that's relevant to my point, which is that gaussian splatting (voxel or not) is a superior technology to any other for 3D reconstruction to date. I even caveated the cases where this method would be totally unhelpful. As of now, editing splats is barely a thing (SuperSplat is super, but all we can do is remove splats). Some software can fuse or even adjust properties of splats, or areas of splats, but these editing solutions are in their infancy, so they just don't count.
Your point that triangles can be combined into larger flat surfaces and optimized for rendering is valid, but it doesn't change the fact that non-gaussian-splatting methods, including the much slower NeRF approach, are inferior in the quality (let's say fidelity) they produce.
Your argument doesn't discuss, compare, or even mention limitations faced by all the traditional mesh-based approaches.
3DGS (I'm not selling the thing, just making it clear) is able, at the current state of the method, to rasterize so efficiently that a scene of millions of splats can render at 60 FPS on a mid-range GPU. (At least it does on my laptop.)
All that with the most accurate representation of lighting and reflection, of whatever the camera was able to capture, really. Novel view inference is just an approximation; it doesn't invent anything unless some generative ML is plugged in, faked in, plastered all over so that the word AI gets mentioned.
I don't think that's me. I think you are confusing the method and parts of the method. Gaussian Splatting is not a technique for generating novel views off some captured data.
Here is the situation: most clickbait articles and even GitHub repos splash that aspect as if gaussian splatting were about generating novel views. I should read the paper again, but that isn't what I see as the discovery.
But still, let's say it is. On that front it may not outperform NeRF, I don't know; NeRF may still be the state of the art there, but it's very slow, almost impractical for most workstations, and it doesn't outperform 3DGS on just about any other front.
Your argument that it is not close to how reality is, is totally irrelevant again, and even contrary to what CG in general has demonstrated many times. We don't need the concept, what is captured, how things are represented, or how things are displayed to match reality more closely. Attempting to compute reality as we believe it to be is usually the best way to fail. Some would even argue with you: what reality are you talking about? We still don't have a clue what reality is.
All we know is that we seem to perceive things a certain way. Our brain may play a movie in there based on that. It doesn't matter what's actually there; perception, and tricking the eyes or our neurons, is all we have to focus on to make a reconstruction valid.
But it's funny, actually, that the gaussians are based on optical functions. The blending of multiple layers of light waves is also a natural phenomenon, for all we know.
Anyhow, there is a lot of confusion out there about gaussian splats. I suspect not many people understand this tech but many are talking loudly about it, confusing everyone else.
I hope you don't see my response as arguing for argument's sake; your reply was a good read, elegant, with a tone of authority on the question, but I invite you to check 3DGS again (yourself, not in the news).
Edit: I have not considered the voxel method you've shared, so I'm not claiming gaussian splatting is superior to that. I will check the claims though; that wouldn't be the first time.
> I'm unsure whether it is me, but one of us is confusing the representation of a 2D image with that of a 3D scene. It's absolutely correct that a digital (2D) image is a grid of pixels. We can call it a soup if you want. An audio file or a text document are soups too.
No, I don't agree that images or audio files or text documents are soups; they're ordered grids or lists of equidistant samples. To be clear, I didn't make up the description "triangle soup"; the authors did, it's right there in the article.
> A 3D scene (the digital representation) is a structure that can't be reduced to a simple grid. At least it had better not be, or it wouldn't look great from almost all angles.
Yes it can, it's just a matter of resolution. Gaussian splats, Nerfs, and other similar techniques aim to represent the radiance field that the input images are samples of. The radiance field, like the EM field or most other fields, can be quantized to grids, just like how we represent 2d samples of scenes with grids of pixels.
> Gaussian Splatting is a technique designed to tackle what seemed impossible, or at least extremely challenging, with 3D scene reconstruction.
Gaussian splatting is not used for scene reconstruction in the general sense, it's used for novel view synthesis. It doesn't claim to reconstruct the 3d scene, it only tries to make an estimate of the function that takes a position and direction and returns a color. I think the original Nerf paper does a good job explaining what the radiance field is, and how using images of a scene to estimate it works. 3dgs is a more efficient and intuitive way of doing the same thing.
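In code terms, the thing being estimated is roughly this 5D function; a conceptual signature only, not any particular implementation:

    def radiance_field(position, direction):
        """Radiance field: (x, y, z, viewing direction) -> RGB (plus a density term in NeRF).

        NeRF approximates it with an MLP queried along camera rays; 3DGS approximates
        it with a soup of anisotropic gaussians whose view-dependent color comes from
        spherical harmonics. Both are fit so renders match the posed input images.
        """
        raise NotImplementedError("estimated from the input images")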
> I haven't met anybody who would look at a gaussian splatting reconstruction of a scene and claim another method would look better. Or even could look better. Maybe some day, but as of 2025 there isn't.
Like I mentioned previously, they look good and sometimes that's all you need, but as a representation of a 3d scene they're chaotic and not very elegant.
> Imo gaussian splats do not form a voxel.
No they don't.
> Your argument doesn't discuss, compare, or even mention limitations faced by all the traditional mesh-based approaches.
I didn't argue that there are any mesh based methods that look better at the moment.
> All that with the most accurate representation of lighting and reflection, of whatever the camera was able to capture, really. Novel view inference is just an approximation; it doesn't invent anything unless some generative ML is plugged in, faked in, plastered all over so that the word AI gets mentioned.
3dgs comes up with fake representations of reflections, it pretends that reflective surfaces aren't there and puts splats representing the reflections behind where the reflective surface should be. It does this because it has no concept of scene geometry, all it knows and cares about is optimizing the splats' positions and color so they look like the input images when rendered from the input images' positions.
> I don't think that's me. I think you are confusing the method and parts of the method. Gaussian Splatting is not a technique for generating novel views off some captured data.
Yes, it literally is. From the original paper from 2023 [0]:
We introduce three key elements that allow us to achieve state-of-the-art visual quality while maintaining competitive training times and importantly allow high-quality real-time (≥ 30 fps) novel-view synthesis at 1080p resolution.
Additional Key Words and Phrases: novel view synthesis, radiance fields, 3D gaussians, real-time rendering
Our goal is to optimize a scene representation that allows high quality novel view synthesis, starting from a sparse set of (SfM) points without normals.
Those SfM (Structure from Motion) points are from a previous step, often COLMAP, where the input images are used to find the camera poses and a sparse point cloud from features in the input images, the captured data. The images are samples of the scene's radiance field.
> Some would even argue with you: what reality are you talking about? We still don't have a clue what reality is.
It isn't made up of disjoint triangles, I think most people agree with that. And without getting philosophical or diving into atoms or quantum fields, most of reality can be represented as continuous volumes or surfaces at a macro scale.
> All we know is that we seem to perceive things a certain way. Our brain may play a movie in there based on that. It doesn't matter what's actually there; perception, and tricking the eyes or our neurons, is all we have to focus on to make a reconstruction valid.
Yes, if all we care about is novel view synthesis, which is a valid use case.
> Anyhow, there is a lot of confusion out there about gaussian splats. I suspect not many people understand this tech but many are talking loudly about it, confusing everyone else.
It's really not that complicated, the original and follow up papers are easy to follow.
[0] https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/
This seems like the natural next step after Gaussian splatting. After all, triangles are pretty much the most "native" rendering that GPUs can do. And as long as you figure out a way to make it differentiable (e.g. with their windowing function), it should be possible to just throw your triangles into a big optimizer.
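As a rough illustration of the "make it differentiable" part: instead of a hard inside/outside test, each triangle can contribute a smooth coverage that falls off near its edges, so gradients flow back to the vertex positions. A generic soft-triangle sketch, not the paper's actual windowing function:

    import numpy as np

    def soft_triangle_coverage(p, a, b, c, sharpness=50.0):
        """Smooth, differentiable stand-in for "is 2D point p inside triangle abc?"."""
        def edge(e0, e1):
            # z-component of the cross product: positive when p is left of edge e0->e1 (CCW)
            return (e1[0] - e0[0]) * (p[1] - e0[1]) - (e1[1] - e0[1]) * (p[0] - e0[0])

        sig = lambda x: 1.0 / (1.0 + np.exp(-sharpness * x))
        # product of three soft half-plane tests: ~1 deep inside, ~0 outside,
        # approaching a hard rasterized triangle as sharpness grows
        return sig(edge(a, b)) * sig(edge(b, c)) * sig(edge(c, a))

    alpha = soft_triangle_coverage((0.3, 0.3), (0.0, 0.0), (1.0, 0.0), (0.0, 1.0))  # ~1.0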
Gaussian splatting models the scene as a bunch of normal distributions (fuzzy squished spheres) instead of triangles, then renders those with billboarded triangles. It has advantages (simpler representation, easy to automatically capture from a scan) and disadvantages (not what the hardware is designed for, not watertight). The biggest disadvantage is that most graphics techniques need to be reinvented for it, and it's not clear what the full list of advantages and disadvantages will be until people have done all of those. But that big disadvantage is also a great reason to make tons of papers.
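Concretely, when one of those fuzzy squished spheres is rasterized on its billboard, each covered pixel typically gets an alpha that falls off with the projected 2D Mahalanobis distance from the splat center; a simplified sketch of that footprint evaluation:

    import numpy as np

    def splat_alpha(pixel, center2d, cov2d, opacity):
        """Alpha contribution of one projected gaussian at one pixel.

        center2d: (2,) projected splat center in screen space.
        cov2d:    (2, 2) projected covariance of the 3D gaussian.
        opacity:  the splat's base opacity.
        """
        d = np.asarray(pixel, dtype=float) - center2d
        power = -0.5 * d @ np.linalg.inv(cov2d) @ d   # gaussian falloff exponent
        return opacity * np.exp(power)

    # these alphas are then composited in depth-sorted order, which is where the
    # sorting cost mentioned elsewhere in the thread comes from
    a = splat_alpha((10.0, 12.0), np.array([10.5, 11.0]), np.array([[4.0, 0.0], [0.0, 2.0]]), 0.8)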
They don't create contiguous surfaces, and GPUs are optimized to deal with sets of triangles that share vertices (a vertex typically being shared by four to six triangles), rather than vertices that aren't shared at all, as with this.
"Watertight" is a actially a stronger criterion, which requires not only a contigous surface, but one which encloses a volume without any gaps, but "not watertight" suffices for this.
AFAIK Gaussian Splatting is somehow connected to NeRFs (neural radiance fields), so the job of turning multiple 2D images into a 3D scene. I actually tried doing something like this recently for drone navigation (using older point cloud methods) but no luck so far.
Can anyone who read this suggest something to use to scan room geometry using camera only in real-time (with access to beefy NVIDIA computer if needed) for drone navigation purposes?
I get the impression the goal is to save 3D environments with baked lighting without having to run raytracing, at a level above explicitly defined meshes with faces covered by 2D textures, which can't represent fog, translucency, reflection glints, etc without a separate lighting pass. Basically trying to get raytracing without doing raytracing.
Autoencoders should output these kinds of splats instead of pixel outputs and likely obtain better representations of the world at the bottleneck. These features can be used for downstream tasks.
Can someone explain what a splat is? I did graphics programming 25 years ago, but haven't touched it since. I don't think I've ever heard this word before.
A splat is basically a point in a point cloud. But a gaussian splat isn't infinitesimally small like a point; it's a 3D gaussian, where the point represents the mean. It also has color and opacity. You can also stretch it like an ellipsoid instead of having it keep perfect radial symmetry.
So a gaussian splat scene is not a pointcloud but rather a cloudcloud.
A good way of putting it.
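In data terms, a single splat is commonly parameterized along these lines (a simplified sketch; real implementations store the rotation as a quaternion and keep the color as spherical-harmonic coefficients, and exact layouts vary):

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class GaussianSplat:
        mean: np.ndarray       # (3,) center of the gaussian in world space
        rotation: np.ndarray   # (3, 3) rotation matrix (usually stored as a quaternion)
        scale: np.ndarray      # (3,) per-axis standard deviations -- the ellipsoid "stretch"
        opacity: float         # base alpha in [0, 1]
        sh_coeffs: np.ndarray  # (K, 3) spherical-harmonic coefficients for view-dependent RGB

        def covariance(self) -> np.ndarray:
            """Sigma = R S S^T R^T: the anisotropic 3D covariance of the ellipsoid."""
            S = np.diag(self.scale)
            return self.rotation @ S @ S.T @ self.rotation.T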
And in case it helps further in the context of the article: traditional rendering pipelines for games don't render fuzzy Gaussian points, but triangles instead.
Having the model trained on how to construct triangles (rather than blobby points) means that we're closer to a "take photos of a scene, process them automatically, and walk around them in a game engine" style pipeline.
Are triangles cheaper for the rasterizer, antialiasing, or something similar?
A triangle by definition is guaranteed to be co-planar; three vertices must describe a single flat plane. This means every triangle has a single normal vector across it, which is useful for calculating angles to lighting or the camera.
It's also very easy to interpolate points on the surface of a triangle, which is good for texture mapping (and many other things).
It's also easy to work out if a line or volume intersects a triangle or not.
Because they're the simplest possible representation of a surface in 3D, the individual calculations per triangle are small (and more parallelisable as a result).
Triangles are the simplest polygons, and simple is good for speed and correctness.
Older GPUs natively supported quadrilaterals (four sided polygons), but these have fundamental problems because they're typically specified using the vertices at the four corners... but these may not be co-planar! Similarly, interpolating texture coordinates smoothly across a quad is more complicated than with triangles.
Similarly, older GPUs had good support for "double-sided" polygons where both sides were rendered. It turned out that 99% of the time you only want one side, because you can only see the outside of a solid object. Rendering the inside back-face is a pointless waste of computer power. This actually simplified rendering algorithms by removing some conditionals in the mathematics.
Eventually, support for anything but single-sided triangles was in practice emulated with a bunch of triangles anyway, so these days we just stopped pretending and use only triangles.
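For concreteness, the properties above (one normal per triangle, easy interpolation, easy intersection tests) all fall out of a few lines of vector math; a minimal sketch with NumPy arrays for the vertices:

    import numpy as np

    def triangle_normal(a, b, c):
        """Any non-degenerate triangle lies in one plane, so it has one normal."""
        n = np.cross(b - a, c - a)
        return n / np.linalg.norm(n)

    def barycentric(p, a, b, c):
        """Weights (u, v, w) with p = u*a + v*b + w*c; used to interpolate UVs, colors, normals."""
        v0, v1, v2 = b - a, c - a, p - a
        d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
        d20, d21 = v2 @ v0, v2 @ v1
        denom = d00 * d11 - d01 * d01
        v = (d11 * d20 - d01 * d21) / denom
        w = (d00 * d21 - d01 * d20) / denom
        return 1.0 - v - w, v, w

    def ray_hits_triangle(origin, direction, a, b, c, eps=1e-9):
        """Intersect the ray with the triangle's plane, then check the barycentrics."""
        n = triangle_normal(a, b, c)
        denom = n @ direction
        if abs(denom) < eps:              # ray is parallel to the plane
            return False
        t = (n @ (a - origin)) / denom
        if t < 0:                         # intersection is behind the ray origin
            return False
        u, v, w = barycentric(origin + t * direction, a, b, c)
        return u >= 0 and v >= 0 and w >= 0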
As an aside, a few early 90s games did experiment with spheroid sprites to approximate 3D rendering, including the DOS game Ecstatica [1] and the (unfortunately named) SNES/Genesis game Ballz 3D [2].
[1] https://www.youtube.com/watch?v=nVNxnlgYOyk
[2] https://www.youtube.com/watch?v=JfhiGHM0AoE
Yes, using triangles simplifies a lot of math, and GPUs were created to be really good at doing the math related to triangle rasterization (affine transformations).
In fact, I believe that under the hood all 3D models are triangulated.
I always assumed "gaussian splatting" was a reference to old school texture splatting, where textures are alpha-blended together. AFAIK the graphics terminology of splats as objects (in addition to splatting as an operation) is new.
Practically, what differentiates a splat from standard photogrammetry is that it can capture things like reflections, transparency and skies. A standard photogram of (for example) a mirror would confuse the reflection in the mirror for a space behind the mirror. A photogram of a sheet of glass would likewise suffer.
The problem is that any tool or process that converts splats into regular geometry produces plain old geometry and RGB textures, thus losing that advantage. For this reason splats are (in my opinion) a tool in search of an application. Doubtless some here will disagree.
I've never been quite clear on how Splats encode specular (directional) effects. Are they made to only be visible from a narrow field of view (so you see a different splat for different view angles?) or do they encode the specular stuff internally somehow?
This is a good question. As I understand it, the only material parameters a splat can recognize are color and transparency. Therefore the first of your two options would be the correct one.
You can use spherical harmonics to encode a few coefficients in addition to the base RGB for each splat such that the rendertime view direction can be used to compute an output RGB. A "reflection" in 3DGS isn't a light ray being traced off the surface, but instead a way of saying "when viewed from this angle, the splat may take an object's base color, while from that angle, the splat may be white because the input image had glare"
This ends up being very effective with interpolation between known viewpoints, and hit-or-miss extrapolation beyond known viewpoints.
Because you have source imagery and colors (and therefore specular and reflective details) from different angles, you can add a view-angle and location based component to the material/color function; so the material is not just f(point in 3d space), it's f(pt, view loc, view direction). That's made differentiable, and so you get viewpoint-dependent colors for ‘free’.
To add to the rest of the replies: color comes from spherical harmonics, which I'm sure you've come across (traditionally used for diffuse light or shadows; SuperTuxKart uses them).
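A minimal sketch of that evaluation for degree-1 real spherical harmonics (four coefficients per color channel; sign and scaling conventions differ between implementations, so treat this as illustrative rather than the exact 3DGS formulation):

    import numpy as np

    C0 = 0.28209479177387814   # Y_0^0 basis constant
    C1 = 0.4886025119029199    # shared factor for the three Y_1 basis functions

    def sh_color(coeffs, view_dir):
        """View-dependent RGB from degree-1 SH coefficients.

        coeffs:   (4, 3) array -- one constant term plus three linear terms, per channel.
        view_dir: direction from the splat toward the camera (conventions vary).
        """
        x, y, z = view_dir / np.linalg.norm(view_dir)
        basis = np.array([C0, C1 * y, C1 * z, C1 * x])   # band-0 and band-1 basis values
        return basis @ coeffs                            # (3,) RGB, typically clamped to [0, 1]

    # the constant term is the base color; the linear terms let the same splat look
    # brighter or tinted from some directions, e.g. approximating glare on a surface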
I wonder how hard it would be to implement an extra processing step to turn this into a more 'stylized low poly' look. Basically, the triangle count would be drastically smaller, but the topology would have to be crisp.
The 3D analogue of the triangle you're referring to is, I think, the tetrahedron; one classic algorithm for creating 3D surface representations of volume data is called "marching tetrahedra" (a more correct, and at the time patent-free, variation of the marching cubes algorithm).
A pyramid is unnecessarily constrained; a triangle performs better if it is free-flowing. I understand that this performs better because there is less IO but slightly more processing, and IO is the biggest cost when it comes to GPUs.
This figure is sort of an overclaim imho. If you look inside the paper, the reported figure is actually 97 FPS (vs 135 FPS for 3DGS on their device). The 2400 FPS they advertise is for a degraded version that completely ignores the transparency... but the transparency is both what makes these representations support interesting volumetric effects and what makes rendering challenging (because it requires sorting things). Drawing 1M triangles at 2400 FPS on their hardware is probably just quite normal.
author = {Held, Jan and Vandeghen, Renaud and Deliege, Adrien and Hamdi, Abdullah and Cioppa, Anthony and Giancola, Silvio and Vedaldi, Andrea and Ghanem, Bernard and Tagliasacchi, Andrea and Van Droogenbroeck, Marc},
while on arxiv and the top of the page
Jan Held, Renaud Vandeghen, Adrien Deliege, Abdullah Hamdi, Silvio Giancola, Anthony Cioppa, Andrea Vedaldi, Bernard Ghanem, Andrea Tagliasacchi, Marc Van Droogenbroeck
it's the same list of authors, written in different formats
the first one is SURNAME, NAME separated by "and"
the second one is NAME SURNAME separated by commas
The second one is easier to read by humans, but the first one makes it clearer what is the surname (which would be ambiguous otherwise, when there are composite names). But then again, the first format breaks when someone has "and" in their name, which is not unheard of.
Why do they use "and"? Why not use an unambiguous joining token like `/`? This just feels like an abuse of informal language to produce fundamentally formal data.
As it stands, it certainly does not resemble readable or parseable english.
The person who designed it was solving primarily for lexical sorting of the author field, thought maybe having more than two authors was an edge case, and wanted the two-author case to be a logical extension of the single-author one?
‘Better’ formats have been proposed but none have stuck nearly as well. It works, and there's tooling for it.
https://bibtex.eu/fields/author/
Names are ordered from general/family to specific/given.
author = {surname 1, first name 1 and surname 2, first name 2 and ...}
"and" is the separator.
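The split between the two forms is mechanical; a small sketch going from the BibTeX form to the human-readable one (ignoring corner cases like braces protecting a literal "and" inside a name):

    def parse_bibtex_authors(field):
        """Turn "Surname, Given and Surname, Given and ..." into (given, surname) pairs."""
        people = []
        for entry in field.split(" and "):         # "and" separates authors
            if "," in entry:
                surname, given = [part.strip() for part in entry.split(",", 1)]
            else:
                # "Given Surname" form: guess that the last word is the surname, which is
                # ambiguous for composite surnames -- exactly why BibTeX prefers the comma form
                given, _, surname = entry.strip().rpartition(" ")
            people.append((given, surname))
        return people

    names = parse_bibtex_authors("Held, Jan and Vandeghen, Renaud and Van Droogenbroeck, Marc")
    print(", ".join(f"{g} {s}" for g, s in names))
    # Jan Held, Renaud Vandeghen, Marc Van Droogenbroeck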
I'm more familiar with traditional 3D graphics, so this new wave of papers around gaussian splatting lies outside my wheelhouse.
"Watertight" is a actially a stronger criterion, which requires not only a contigous surface, but one which encloses a volume without any gaps, but "not watertight" suffices for this.
Can anyone who read this suggest something to use to scan room geometry using camera only in real-time (with access to beefy NVIDIA computer if needed) for drone navigation purposes?
Having the model trained on how to construct triangles (rather than blobbly points) means that we're closer to a "take photos of a scene, process them automatically, and walk around them in a game engine" style pipeline.
Are triangles cheaper for the rasterizer, antialiasing, or something similar?
A triangle by definition is guaranteed to be co-planer; three vertices must describe a single flat plane. This means every triangle has a single normal vector across it, which is useful for calculating angles to lighting or the camera.
It's also very easy to interpolate points on the surface of a triangle, which is good for texture mapping (and many other things).
It's also easy to work out if a line or volume intersects a triangle or not.
Because they're the simplest possible representation of a surface in 3D, the individual calculations per triangle are small (and more parallelisable as a result).
Older GPUs natively supported quadrilaterals (four sided polygons), but these have fundamental problems because they're typically specified using the vertices at the four corners... but these may not be co-planar! Similarly, interpolating texture coordinates smoothly across a quad is more complicated than with triangles.
Similarly, older GPUs had good support for "double-sided" polygons where both sides were rendered. It turned out that 99% of the time you only want one side, because you can only see the outside of a solid object. Rendering the inside back-face is a pointless waste of computer power. This actually simplified rendering algorithms by removing some conditionals in the mathematics.
Eventually, support for anything but single-sided triangles was in practice emulated with a bunch of triangles anyway, so these days we just stopped pretending and use only triangles.
[1] https://www.youtube.com/watch?v=nVNxnlgYOyk
[2] https://www.youtube.com/watch?v=JfhiGHM0AoE
Yes, using triangles simplifies a lot of math, and GPUs were created to be really good at doing the math related to triangles rasterization (affine transformations).
In fact, I belive that under the hood all 3d models are triangulated.
So a gaussian splat scene is not a pointcloud but rather a cloudcloud.
A good way of putting it.
The problem is that any tool or process that converts splats into regular geometry produces plain old geometry and RGB textures, thus loosing its advantage. For this reason splats are (in my opinion) a tool in search of an application. Doubtless some here will disagree.
This ends up being very effective with interpolation between known viewpoints, and hit-or-miss extrapolation beyond known viewpoints.
https://convexsplatting.github.io/
the seminal paper is still this one:
https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/
Wouldn't this lead to the full 3D representation?
Go team triangles!