This is crazy impressive, and the fact they have the whole thing running with a PBR texturing pipeline is really cool.
That being said, I wonder if the use of signed distance fields (SDFs) results in bad topology.
Earlier this week I saw a recently released paper that seems to build "game-ready" topology --- stuff that might actually be riggable for animation.
https://github.com/buaacyw/MeshAnything
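To make the SDF concern concrete: marching-cubes-style extraction emits triangles for every grid cell the surface crosses, so the triangle count tracks grid resolution rather than the shape's structure. A toy sketch of my own (not from the paper), just counting surface-crossing cells on a sphere SDF:

```python
import numpy as np

def sphere_sdf(pts, radius=1.0):
    """Signed distance to a sphere: negative inside, positive outside."""
    return np.linalg.norm(pts, axis=-1) - radius

def count_surface_cells(res):
    """Count grid cells the zero level set passes through at a given resolution.
    Marching cubes would emit a few triangles per such cell, so triangle count
    scales with grid resolution, not with the shape's actual complexity."""
    xs = np.linspace(-1.5, 1.5, res)
    grid = np.stack(np.meshgrid(xs, xs, xs, indexing="ij"), axis=-1)
    sdf = sphere_sdf(grid)
    # A cell crosses the surface if its 8 corners don't all share the same sign.
    corners = np.stack([
        sdf[i:res - 1 + i, j:res - 1 + j, k:res - 1 + k]
        for i in (0, 1) for j in (0, 1) for k in (0, 1)
    ], axis=-1)
    crossing = (corners.min(axis=-1) < 0) & (corners.max(axis=-1) > 0)
    return int(crossing.sum())

# Doubling resolution roughly quadruples the surface cells (hence triangles),
# even though the shape itself is as simple as it gets.
print(count_surface_cells(32), count_surface_cells(64))
```

That uniform, resolution-driven tessellation is exactly what makes SDF-extracted meshes awkward to rig compared to hand-made edge loops.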
I can't wait for this to become usable. I love VR, but the content generation is just sooooo labour intensive. Help creating 3D models would go a long way and be the #1 enabler for the metaverse, IMO.
VR is especially unforgiving of "fake" detailing; you need as much detail as possible in the actual geometry to really sell it. That's the opposite of how these models currently work: they output goopy low-res geometry and approximate most of the detailing with textures, which would be immediately obvious with stereoscopic depth perception.
They seem to admit as much in Table 1 which indicates this model is not capable of "clean topology". Somewhat annoyingly, they do not discuss topology anywhere else in the paper (at least, I could not find the word "topology" via Ctrl+F).
Credit where it's due, unlike most of these papers they do at least show some of their models sans textures on page 11, so you can see how undefined the actual geometry is (e.g. none of the characters have eyes until they are painted on).
Such a silly argument. Fixing topology is a nearly solved problem in geometry processing. (Or just start with a good topology and 'paste' a texture onto it like they develop techniques for here.)
Depends what you're talking about and what your criteria are. In gamedev, studios typically use a retopology tool like Topogun (https://www.topogun.com/) to aid in the creation of efficient topologies, but it's still a manual task, as different topologies have different tradeoffs in terms of poly count, texture detail, options for how the model deforms when animated, etc.

For example, you may know that you're working on a model of a player character in a 3rd-person game where the camera is typically behind you, so you want to spend more of your budget on the _back_ of the model than the _front_, because the player is usually looking at their character's back.

If your criterion is "find the minimum number of polygons", sure, it's solved. But that's just one of many different goals, and not the one typically used in gamedev, which I assume to be a primary audience of this research.
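To make the budgeting point concrete, here's a toy sketch (the function and all the numbers are made up for illustration) of splitting a fixed triangle budget across mesh regions in proportion to how often the player actually sees each one:

```python
# Toy view-weighted polygon budgeting (all numbers hypothetical).
def allocate_budget(total_tris: int, view_time: dict) -> dict:
    """Split a triangle budget across regions proportionally to screen time."""
    total_time = sum(view_time.values())
    return {region: round(total_tris * t / total_time)
            for region, t in view_time.items()}

# Third-person camera: the back of the character dominates screen time,
# so it gets the bulk of the budget.
budget = allocate_budget(
    total_tris=20_000,
    view_time={"back": 0.70, "front": 0.15, "sides": 0.15},
)
print(budget)
```

A minimal-polygon solver has no notion of this kind of context, which is the point: "good topology" depends on how the asset will be used.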
It's an essential skill for reading scientific papers to notice what isn't there. It's as important as what is there.
In my field, analog IC design, when we hit a wall we often do a literature review with a colleague, and more often than not the results are not relevant for commercial application. Forget about Monte Carlo; sometimes there aren't even full PVT corners.
I tried all the recent wave of text/image-to-3D-model services, some touting $100MM+ valuations and tens of millions raised, and found them all to produce unusable garbage.
I have too, and you’re quite right. Also the various 2D-to-3D face generators are mostly awful. I’ve done a deep dive on that and nearly all of them seem to only create slight perturbations on some base model, regardless of the input.
I’m puzzled by the poor texture quality in these. The colours are just bad: it looks like the textures are blown out (the detail at the bright end clips to white) and much too contrasty (the turkey does that transition from red to white via a band of yellow). I wonder why that is. Was the training data just done on the cheap?
In the comparison between the models, only Rodin seems to produce clean topology. Hopefully in the future we will see a model with the strengths of both, ideally from Meta, as Rodin is a commercial model.
Can somebody please, please integrate SAM with 3D-primitive RAGging? That's the holy grail solution as a 3D modeler; the "blobs" generated by Luma and the like aren't very useful.
I think what they did here was go text prompt -> generate multiple 2d views -> reconstruction network to go multiple 2d images to 3d representation -> mesh extraction from 3d representation.
That's a long way of saying, no, I don't think that this introduces a component that specifically goes 2d -> 3d from a single 2d image.
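If I've read it right, the rough shape of that pipeline is something like the sketch below. Stub stages only: every name and internal here is made up for illustration, not taken from the paper.

```python
import numpy as np

# Hypothetical stage names; the paper's actual components will differ.
def generate_views(prompt: str, n_views: int = 4, size: int = 64) -> np.ndarray:
    """Stand-in for a text-to-multiview diffusion stage: returns n RGB views."""
    rng = np.random.default_rng(abs(hash(prompt)) % 2**32)
    return rng.random((n_views, size, size, 3))

def reconstruct_volume(views: np.ndarray, res: int = 32) -> np.ndarray:
    """Stand-in for the reconstruction network: multiview images -> an SDF grid."""
    mean_intensity = views.mean()
    xs = np.linspace(-1, 1, res)
    grid = np.stack(np.meshgrid(xs, xs, xs, indexing="ij"), axis=-1)
    # Toy "volume": a sphere SDF whose radius depends on the input images.
    return np.linalg.norm(grid, axis=-1) - (0.5 + 0.4 * mean_intensity)

def extract_mesh(volume: np.ndarray) -> int:
    """Stand-in for mesh extraction (e.g. marching cubes): here we just count
    sign changes along one axis, a proxy for how many triangles get emitted."""
    crossings = np.diff(np.sign(volume), axis=0) != 0
    return int(crossings.sum())

views = generate_views("a turkey")    # text -> multiple 2D views
volume = reconstruct_volume(views)    # views -> 3D representation
tris = extract_mesh(volume)           # 3D representation -> mesh
```

Note there's no single-image 2D-to-3D stage anywhere in that chain; the 3D representation is only ever fit to the generated multiview set.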
Every time I see text-to-3D, it’s ALWAYS textured. That is the obvious give-away that it is still garbage.
Show me text to wireframe that looks good and I’ll get excited.
Or, just throw a PS1 filter on top and make some retro games
The wireframe is going to be unrecognizably bad.
Still a ways to go.
Expectation vs. reality: https://i.imgur.com/82R5DAc.png
It absolutely does. But great, let's look forward to Printables being ruined by off-model nonsense.
https://arstechnica.com/information-technology/2024/06/ridic...
Question: What is the current state of the art commercially available product in that niche?
But for 3D gen it's using a more flexible model:
https://assetgen.github.io/
It can be conditioned on text or image.
He still needs a moat with its own ecosystem, like the iPhone.
[0] https://hyperhuman.deemos.com/rodin