Perceptually lossless (talking head) video compression at 22kbit/s

(mlumiste.com)

225 points | by skandium 194 days ago

20 comments

gwd 194 days ago
This reminds me of a scene in "A Fire Upon the Deep" (1992) where they're on a video call with someone on another spaceship; but something seems a bit "off". Then someone notices that the actual bitrate they're getting from the other vessel is tiny -- far lower than they should be getting given the conditions -- and so most of what they're seeing on their own screens isn't actual video feed, but their local computer's reconstruction.
[-]
- Rebelgecko 194 days ago
  Was that the same book that had the concept of (paraphrasing using modern terminology) doing interstellar communications by sending back and forth LLMs trained on the people who wanted to talk, prompted to try and get a good business deal or whatever?
  [-]
  - alex-robbins 194 days ago
    That happened in Redemption Ark by Alastair Reynolds (2002), though of course the idea may also have been used before or since.
  - DoneWithAllThat 193 days ago
    This idea was also used in The Algebraist.
- miohtama 194 days ago
  And also it was a deep fake.
  BTW This is the best sci-fi book ever.
  [-]
  - Retric 194 days ago
    Might be better if you like space opera style really soft science fiction. I really didn’t enjoy it.
    [-]
    - mercutio2 194 days ago
      Fascinating. Vinge is about the furthest from “soft” sci-fi I can think of. We must have very different definitions of what makes something soft.
      It’s certainly true that Vinge doesn’t spend much time on the engineering details, but I find him unusually clear on “imagine if we had this kind of impossible-now technology, but the rest of what we know about physics remained, how would people behave?”
      He was, after all, a physics professor.
      Rainbow’s End is much clearer on this than his distant future stuff, of course.
      [-]
      - com2kid 194 days ago
        > Fascinating. Vinge is about the furthest from “soft” sci-fi I can think of. We must have very different definitions of what makes something soft.
        That award goes to Greg Egan who has full list of citations on his website for each of his novels, as well as a list of mathematicians and physicists he requested help from.
        If you want to read books that occasionally delve into pages of equations, Greg Egan is the author for you! (Seriously though, really good books, and the implications of his "what-ifs" are pretty damn cool)
        [-]
        loxias 194 days ago
        Seconding this, Greg Egan is one of the best of all time.
        The short stories "Luminous" and "Dark Integers", the novels "Diaspora" and "Schild's Ladder". So good.
        qntm (another author) hits somewhat similarly.
        exe34 194 days ago
        i might have to have another go at dichronauts. that one broke my mind a few pages in and I had to stop.
      - opo 194 days ago
        >He was, after all, a physics professor.
        Actually, he was a mathematics and computer science teacher at San Diego State University.
        https://en.wikipedia.org/wiki/Vernor_Vinge
        [-]
        mercutio2 192 days ago
        You’re right, I was wrong!
      - Retric 194 days ago
        Soft vs hard is based on how closely the world tracks with modern physics/science. As such even just FTL is soft, let alone everything else that doesn’t fit.
        [-]
        jrussino 194 days ago
        > Soft vs hard is based on how closely the world tracks with modern physics/science
        Maybe it's not productive to quibble about definitions like this, but FWIW I don't agree with this criteria. I would argue Greg Egan's work, for example, is just about the "hardest" sci-fi there is, and yet much of that work takes place in universes that are entirely unlike our own.
        Personally, I think what makes for "hard" sci-fi is that the rules of the universe are well-laid-out and consistent, and that the story springs (at least in some significant part) out of the consequences of those rules. That may mean a story set in the "future", where we have new technology or discover new physics, or "alternate universe" sci-fi like Egan's.
        [-]
        Retric 194 days ago
        If changing the laws of the universe is fine, then nothing gets excluded even Harry Potter. It’s one of those definitions that allows anything and ultimately only feels fine because you’re adding some other criteria.
        In defense of hard science fiction, it’s a meaningful category to talk about even if it’s not something you personally care about. People often want to weaken it but that just opens a door for a new category say “scientific science fiction” and we are back to square one.
        Asking questions like what does AGI look like when they can’t just magically solve all issues can be fun. Hand waving the singularly as some religious event can also make interesting stories but so is considering how chaos theory limits what computation can actually achieve.
        [-]
        com2kid 194 days ago
        > If changing the laws of the universe is fine, then nothing gets excluded even Harry Potter.
        Greg Egan's law changes are on the level of "I consulted with a bunch of theoretical physics professors and asked them what the implication of tweaking this one fundamental constant would be, then I spent years meticulously crafting a world that takes into account those implications, and I had others physics professors check over my work to make sure it was within the bounds of actuality, and then I wrote a story about characters in this new world."
        > Asking questions like what does AGI look like when they can’t just magically solve all issues can be fun.
        Greg Egan actually has a great book about this! Permutation City. CPU cycles aren't unlimited, and there are tons of ethical problems being confronted with the entire "simulate a person" thing.
        burnished 194 days ago
        Harry Potter isnt typically considered scifi because it doesn't critically examine its own premise and because the rules of the universe are yoked to the needs of the plot.
        [-]
        Retric 194 days ago
        > the rules of the universe are yoked to the needs of the plot
        It’s common for the rules of the universe to be adapted to fit the plot of random Star Trek episodes.
        HP is not considered science fiction because of the trappings of the story. People use spells and enchanted objects for telekinesis, teleportation, and time travel not psychic abilities and technology to do the same things.
        > critically examine its own premise
        A great deal of science fiction doesn’t do that while plenty of fantasy does.
        exe34 194 days ago
        > If changing the laws of the universe is fine, then nothing gets excluded even Harry Potter
        the laws of the universe in Harry Potter are so fickle and ever changing with the plot line that to me, it can only be considered soft. compare with Egan who takes a given cosmology and then works 100% within that world. that's hard.
        [-]
        Retric 194 days ago
        That’s not a question about the underlying rules of a fictional work but your perception of how they are created. It’s possible to have a completely well defined fantasy setting with exact rules without the reader being aware of what those rules are or even knowing it’s using well defined rules.
        Consider The Martian, early versions where posted online and the author changed what resources the character had to work with at the beginning. So what feels like a creative solution to limited resources was really giving the character exactly what they needed after a solution was found. Only examining a work we can’t distinguish ‘soft’ physics updated as the plot demands from a story based around fixed rules.
        [-]
        exe34 193 days ago
        You seem to confuse the creative process with the final product. The rules can change during the creative process. It's the final product that I judge as a reader - I won't bother going over the inconsistencies in Harry Potter here, it's been done ad nauseam elsewhere. The physics doesn't change over the course of the story of the Martian.
        [-]
        Retric 193 days ago
        What you view as inconsistencies are based around assumptions for how the underlying rules work and what happened that don’t necessarily apply.
        One of the more interesting science fiction short stories I read seemed to have very inconsistent time travel, but on closer reading you find the two different methods involved had two different sets of rules. It’s easy to say something is inconsistent, but any possible story has a corresponding set of rules that work.
        It’s rather similar to considering what characters may have been lying in a story.
        [-]
        exe34 193 days ago
        When it comes to Harry Potter,
        > but on closer reading
        does not make the inconsistencies go away, but they multiply.
        [-]
        Retric 193 days ago
        > they multiply
        Again based on specific assumptions. The universe of possibilities includes very strange places.
        [-]
        exe34 193 days ago
        > universe of possibilities includes very strange places.
        that's an unjustified assumption.
        [-]
        Retric 193 days ago
        Why? I mean if we and the characters are unsure what the underlying physics is then the possibilities are literally endless.
        If nothing else pure randomness is a unsatisfying possibility as is a full branching search of every possible state for a universe.
        jamiek88 194 days ago
        That is simply your personal definition, right?
        You don’t claim to be definitive?
        [-]
        Retric 194 days ago
        It’s a classic definition. Soft/hard science fiction has two meanings either the topic is focused on hard sciences (physics) vs soft sciences (sociology) or “It can also refer to science fiction which prioritizes human emotions over scientific accuracy or plausibility.[1]”
        So it’s not universal but is an accepted definition that any deviation from the possible or probable (for example, including faster-than-light travel or paranormal powers) to be a mark of "softness."
        https://en.wikipedia.org/wiki/Soft_science_fiction
        Popular science fiction is generally extremely soft, but occasionally you get stuff like The Cold Equations where the plot is driven by real world constraints. Even then it included FTL so a purest would call it soft.
    - gwd 194 days ago
      A friend of mine and I both read it about the same time and discussed it afterwards. I thought it was pretty good, he thought it was not that great. What we agreed on was that in spite of there being many fantastic aspects to the book, on the whole it failed to be an awesome novel.
      Definitely worth giving it a try if you're a programmer, just for the fact that it's written by another programmer: the opening scene where they find a bunch of rules written down and just follow them reminds me of ACPI; the discussion of public-key cryptography and shipping drives full of one-time-pad around the galaxy; the "compression scheme" with the video.
      [-]
      - Boxxed 194 days ago
        I agree that it was good but not particularly great. A Deepness in the Sky, however, is fantastic -- similar in many aspects but just flat out better all around.
    - lern_too_spel 194 days ago
      The softness is deceptive. Hard concepts about communication and different types of brains are essential to the plot.
    - aaronblohowiak 194 days ago
      It uses technological differences as key plot and setting components not just space as sea, so it is sci fi but it is improbable in many ways so yea “soft” sci fi or more speculative fiction
  - jf 194 days ago
    I beg to differ. A Deepness In The Sky is the best sci-fi book ever.
    [-]
    - space_fountain 194 days ago
      I think I agree both books were good and "A Deepness In The Sky" was better, but I would warn everyone that I thought both books used dramatic irony (showing us that characters were evil while hiding this from main characters) to hold attention to a degree that I kind of hated. And in "A Deepness In The Sky" sexual violence was used repeatedly to illustrate how evil the main characters were. I found it unnecessarily and a bit in poor taste.
      On the other hand I think both books developed ideas wonderfully and there are bits of them I keep coming back to, even if I'll probably never reread them
- _kb 193 days ago
  At least for audio, that dystopia is already shipping in end-user product: https://blog.webex.com/collaboration/hybrid-work/next-level-...
- janandonly 194 days ago
  I came here to reply just this exactly and found a fellow geek beat me to it. Indeed a brilliant book.
zbobet2012 194 days ago
These sorts of models pop here quite a bit, and they ignore fundamental facts of video codecs (video specific lossy compression technologies).
Traditional codecs have always focused on trade offs among encode complexity, decode complexity, and latency. Where complexity = compute. If every target device ran a 4090 at full power, we could go far below 22kbps with a traditional codec techniques for content like this. 22kbps isn't particularly impressive given these compute constraints.
This is my field, and trust me we (MPEG committees, AOM) look at "AI" based models, including GANs constantly. They don't yet look promising compared to traditional methods.
Oh and benchmarking against a video compression standard that's over twenty years old isn't doing a lot either for the plausibility of these methods.
[-]
- skandium 194 days ago
  This is my field as well, although I come from the neural network angle.
  Learned video codecs definitely do look promising: Microsoft's DCVC-FM (https://github.com/microsoft/DCVC) beats H.267 in BD-rate. Another benefit of the learned approach is being able to run on soon commodity NPUs, without special hardware accommodation requirements.
  In the CLIC challenge, hybrid codecs (traditional + learned components) are so far the best, so that has been a letdown for pure end to end learned codecs, agree. But something like H.267 is currently not cheap to run either.
  [-]
  - zbobet2012 194 days ago
    Winning in bd rate though isn't hard. You need to win in bd rate and have a hardware implementable, power efficient, cheap decoder.
    Agreed hybrid presents real opportunity.
  - AzzyHN 194 days ago
    Did you mean H.266? Or is there some secret H.267 that hasn't been agreed upon yet
- smokel 194 days ago
  Why so sour? This particular article doesn't seem to ignore a lot, it even references the Nvidia work that inspired it, as well as a recent benchmark.
  Someone was just having fun here, it's not as if they present it as a general codec.
- AzzyHN 194 days ago
  Really? Would there be a way to replicate this with currently available encoders? I'd like to try it
LeoPanthera 194 days ago
This is very impressive, but “perceptually lossless” isn’t a thing and doesn’t make sense. It means “lossy”.
[-]
- Bjartr 194 days ago
  It may sound like marketing wank, but it does a appear to be an established term of art in academia as far back as 1997 [1]
  It just means that a person can't readily distinguish between the compressed image and the uncompressed image. Usually because it takes some aspect(s) of the human visual system into account.
  [1] https://scholar.google.com/scholar?hl=en&as_sdt=0%2C22&q=per...
- tatersolid 194 days ago
  I read “perceptually lossless” to be equivalent to “transparent”, a more common phrase used in the audio/video codec world. It’s the bitrate/quality at which some large fraction of human viewers can’t distinguish a losslessly-encoded sample and the lossy-encoded sample, for some large fraction of content (constants vary in research papers).
  As an example, crf=18 in libx264 is considered “perceptually lossless” for most video content.
- Ladsko 194 days ago
  Can you propose a better term for the concept then? Perceiving something as lossless is a real world metric that has a proper use case. "Perceptually lossless" does not try to imply that it is not lossy.
  [-]
  - ComplexSystems 194 days ago
    The term for this is "transparency." A codec is "transparent" if people can't tell the difference between the original and the compressed version.
    [-]
    - edflsafoiewq 194 days ago
      "Transparency" is a fairly annoying term for this in image/video because of the obvious polysemy.
    - unshavedyak 194 days ago
      So it would be `transparent lossy compression`? To this layman `perceptually lossless` sounds more clear, but i understand the issue with the name.
    - esperent 193 days ago
      I work in graphics. Calling this transparency would be a terrible idea and make a lot of discussions around compression of videos and images with actual transparency very confusing.
      Does the compression algorithm work well for transparency? Yes, it's effect on transparency is totally transparent! In fact the transparency is fully transparently compressed by our codec.
      Yeah, don't do this please. Perceptually lossless is a term I've heard lots of times before and companies developing codecs usually have a fairly strong technical basis for making the claim. As in, it's not like they just glance at the results and say "yep, looks good to me". Rather, they'll be looking and spectral curves and image diffs - probably also motion diffs for videos - and checking whether they the losses are small enough to be undetectable to human eyes.
- high_byte 194 days ago
  why not? if you change one pixel by one pixel brightness unit it is perceptually the same.
  for the record, I found liveportrait to be well within the uncanny valley. it looks great for ai generated avatars, but the difference is very perceptually noticeable on familiar faces. still it's great.
  [-]
  - LegionMammal978 194 days ago
    For one, it doesn't obey the transitive property like a truly lossless process should: unless it settles into a fixed point, a perceptually lossless copy of a copy of a copy, etc., will eventually become perceptually different. E.g., screenshot-of-screenshot chains, each of which visually resembles the previous one, but which altogether make the original content unreadable.
    [-]
    - pizza 193 days ago
      Perceptual closure under repeated iterations is just a stronger form of perceptual losslessness, then, after k generations instead of the usual k=1. What you’re describing is called generation loss, and there are in fact perceptually lossy image codecs that have essentially no generation loss; jpeg xl is one https://m.youtube.com/watch?v=FtSWpw7zNkI
  - codeflo 194 days ago
    GP is correct, that’s the definition of “lossy”. We don’t need to invent ever new marketing buzzwords for well-established technical concepts.
    [-]
    - AndrewDucker 194 days ago
      GP is incorrect.
      There is "Is identical", "looks identical" and "has lost sufficient detail to clearly not be the original." - being able to differentiate between these three states is useful.
      [-]
      - tialaramex 194 days ago
        Importantly the first one is parameterless, but the second and third are parameterized by the audience. For example humans don't see colour very well, some animals have much better colour gamut, while some can't distinguish colour at all.
        [-]
        travisjungroth 194 days ago
        Perceptually lossless (nature for dogs) video compression at 15bit/s.
      - Rygian 194 days ago
        Lossless means "is identical".
        The other two are variations of lossy.
        Calling one of them "perceptually lossless" is cheating, to the disadvantage of algorithms that honestly advertise themselves as lossy while still achieving "looks identical" compression.
        [-]
        protimewaster 194 days ago
        It's a well established term, though. It's been used in academic works for a long time (since at least 1970), and it's basically another term for the notion of "transparency" as it relates to data compression.
        TeMPOraL 194 days ago
        I honestly don't notice this anymore. Advertisers have been using such language since time immemorial, to the point it's pretty much a rule that an adjective with a qualifier means "not actually ${adjective}, but kind of like it in ${specific circumstances}". So "perceptually lossless" just means "not actually lossless, except you couldn't tell it from truly lossless just by looking".
    - protimewaster 194 days ago
      But this marketing term has been regularly used in academic papers for nearly 50 years (or probably more), so it seems like it should get a pass IMO.
      It's also used in the first paragraph of the Wikipedia article on the term "transparency" as it relates to data compression.
    - Dylan16807 194 days ago
      It is in no way the definition of lossy. It is a subset of lossy. Most lossy image/video compression has visible artifacting, putting it outside the subset.
- Brian_K_White 194 days ago
  It means what it already says for itself, and does not need correcting into incorrectness.
  "no perceived loss" is a perfectly internally consistent and sensible concept and is actually orthogonal to whether it's actually lossless or lossy.
  For instance an actually lossless block of data could be perceptually lossy if displayed the wrong way.
  In fact, even actual lossless data is always actually lossy, and only ever "perceptually lossless", and there is no such thing as actually lossless, because anything digital is always only a lossy approximation of anything analog. There is loss both at the ADC and at the DAC stage.
  If you want to criticize a term for being nonsense misleading dishonest bullshit, then I guess "lossless" is that term, since it never existed and never can exist.
  [-]
  - unshavedyak 194 days ago
    Similar to your points, i also expect `perceptually lossless` to be a valid term in the future with respect to AI. Ie i can imagine a compression which destroys detail, but on the opposite end it uses "AI" to reconstruct detail. Of course though, the AI is hallucinating the detail, so objectively it is lossy but perceptibly it is lossless because you cannot know which detail is incorrect if the ML is doing a good job.
    In that scenario it certainly would not be `transparent` ie visually without any lossy artifacts. But your perception of it would look lossless.
    The future is going to be weird.
- rowanG077 194 days ago
  Why don't you think it's a thing? A trivial example is audio. A ton of audio speakers can produce frequencies people cannot hear. If you have an unprocessed audio recording from a high end microphone one of the first compressions things you can do is clip of imperceptible frequencies. A form of compression.
- ranger_danger 194 days ago
  As there are several patents, published studies, IEEE papers and thousands of google results for the term, I think it's safe to say that many people do not agree with your interpretation of the term.
  "As a rule, strong feelings about issues do not emerge from deep understanding." -Sloman and Fernbach
- lifthrasiir 194 days ago
  It is definitely a thing given a good perceptual metric. The metric even doesn't have to be very accurate if the distortion is highly bounded, like only altering the lowermost bit. It is unfortunate that most commonly used distortion metrics like PSNR are not really that, though.
  [-]
  - rini17 194 days ago
    But that's mathematically impossible, to restore signal from extremely low bitrate stream with any highly bounded distortion. Perhaps only if you have highly restricted set of posible input, which online meetings aren't.
    [-]
    - lifthrasiir 194 days ago
      > Perhaps only if you have highly restricted set of posible input, which online meetings aren't.
      Are you sure? After all, you can effectively summarize meetings in a plain text which is extremely restricted in comparison to the original input. Guaranteed, exact manner of speech and motions and all subtleties should be also included to be fair, but that information is still far limited to fill the 20 kbps bandwidth.
      We need far more bandwidth only because we don't yet have an efficient way to reconstruct the input faithfully from such highly condensed information. Whenever we actually could, we ended up having a very efficient lossy algorithm that still preserves enough information for us human. Unless you are strictly talking about the lossless compression---which is however very irrelevant in this particular topic---, we should expect much more compression in the future even though that might not be feasible today.
      [-]
      - rini17 194 days ago
        Okay, did not know you measure distortion that way.
        [-]
        lifthrasiir 194 days ago
        If you wonder, that definition is pretty much universal in the lossy compression (e.g. rate-distortion optimization or RDO).
- _ZeD_ 194 days ago
  also are .mp3, yet they are hardly discernible from the originals
  [-]
  - bityard 194 days ago
    Ability to tell MP3 from the original source was always dependent on encoder quality, bitrate, and the source material. In the mid 2000's, I tried to encode all of my music as MP3. Most of it sounded just fine because pop/rock/alt/etc are busy and "noisy" by design. But some songs (particularly with few instruments, high dynamic range, and female vocals) were just awful no matter how high I cranked the bitrate. And I'm not even an "audiophile," whatever that means these days.
    No doubt encoders and the codecs themselves have improved vastly since then. It would be interesting to see if I could tell the difference in a double-blind test today.
    [-]
    - comboy 194 days ago
      These are always fun
      https://abx.digitalfeed.net/
      https://www.npr.org/sections/therecord/2015/06/02/411473508/...
      [-]
      - jmb99 194 days ago
        I find generic ABX tests not great, personally, because I generally don’t know what to be listening for. However, with songs I’ve listened to lossless my whole life, it’s much easier to spot encoding failures - an intuitive “wait, that cymbal crash sounded different” or “that multi-instrument harmonic should be cleaner/dirtier.”
        That being said, 320Kbps AAC encoded by Core Audio I’ve found to be pretty much transparent with anything I’ve thrown at it. Anything less than that (256Kbps AAC, 320Kbps MP3, etc) I can ABX sometimes, as long as I’m familiar with the source material, and usually only with quality headphones. Although no streaming services provide that, so I’m stuck with ALAC through Apple Music for streaming (which is more convenient than my old solution, which was transcoding and transferring to an iPod ~20k songs selected from ~90k in my library based on a variety of rules than never gave me the song I’m looking for). And really, ~900Kbps lossless is pretty easy to justify these days with 5G data speeds and generally much higher data transfer limits.
        The other downside to storing losing encodings these days is the fact that almost everyone uses Bluetooth for their listening, which is an additional lossy encoding. While 256Kbps AAC/320Kbps MP3 might be transparent in some cases, when it’s re-encoded it very rarely is (in my experience)
    - unshavedyak 194 days ago
      iirc there's "easy" (though i don't know them) tests to validate if the signal is lossless or not. When played over speakers for humans, at least.
      I always intend to figure out how that works, because i don't feel a lot of audiophiles are actually speaking truth in many cases lol. Still, i don't know - i can't remember my sources to figure it out for myself :/
  - Dwedit 194 days ago
    Lossy audio formats suddenly become very discernible once you subtract the left channel from the right channel. Try that with Lossless audio vs MP3, Vorbis, Opus, AAC, etc. You're listening to only the errors at that point.
  - rini17 194 days ago
    not at 22kbit :)
- rob74 194 days ago
  Yeah, all lossy compression could be called "perceptually lossless" if the perception is bad enough...
  [-]
  - bux93 194 days ago
    A family member of mine didn't see the point of 1080p. Turned out they needed cataract surgery and got fancy replacement lenses in their eyes. After that, they saw the point.
  - Dylan16807 194 days ago
    Needing to define "perception" is a much weaker criticism than "isn't a thing and doesn't make sense".
    It's easy enough to specify an average person looking very closely, or a 99th percentile person, or something like that, and show the statistics backing it up.
- k__ 194 days ago
  Is this the real-time discussion all over again?
Vecr 194 days ago
Fire Upon the Deep had more or less this. Story important, so I won't say more. That series in general had absolutely brutal bandwidth limitations.
MayeulC 194 days ago
I like how the saddle in the background moves with the reconstructed head; it probably works better with uncluttered backgrounds.
This is interesting tech, and the considerations in the introduction are particularly noteworthy. I never considered the possibility of animating 2D avatars with no 3D pipeline at all.
antiquark 194 days ago
Not quite lossless... look at the bicycle seat behind him. When he tilts his head, the seat moves with his hair.
[-]
- manmal 194 days ago
  His gaze also doesn’t quite match.
  [-]
  - hinkley 194 days ago
    Why is nobody noticing the eyes?? This is important!
    I feel like I’m taking crazy pills.
    [-]
    - olddustytrail 194 days ago
      Read the text underneath the image and you'll understand.
      [-]
      - hinkley 194 days ago
        No, I really don’t. He acknowledges it’s not in keeping with the title or the thesis and then just sort of waves it off.
        Smells like rationalization to me.
        [-]
        skandium 194 days ago
        Well, this isn't probably a problem with the model, but the source frame having wrong eye gaze. Besides, perceptually lossless need not be defined in a side-by-side comparison context. If you were only viewing the right hand side video, how could you tell the eye gaze is off? The point was more on that the movement looks natural, unlike almost all neural avatars up to this year.
        [-]
        manmal 194 days ago
        Your argumentation does make sense to me; but it also makes the term lossless pull a lot of weight. Lossless in video encoding is usually defined by zero difference between source and target.
- metaphor 194 days ago
  Very noticeable jitter in bicycle front tire too.
red0point 194 days ago
> But one overlooked use case of the technology is (talking head) video compression.
> On a spectrum of model architectures, it achieves higher compression efficiency at the cost of model complexity. Indeed, the full LivePortrait model has 130m parameters compared to DCVC’s 20 million. While that’s tiny compared to LLMs, it currently requires an Nvidia RTX 4090 to run it in real time (in addition to parameters, a large culprit is using expensive warping operations). That means deploying to edge runtimes such as Apple Neural Engine is still quite a ways ahead.
It’s very cool that this is possible, but the compression use case is indeed .. a bit far fetched. A insanely large model requiring the most expensive consumer GPU to run on both ends and at the same time being limited in bandwidth so much (22kbps) is a _very_ limited scenario.
[-]
- gambiting 194 days ago
  One cool use would be communication in space - where it's feasible that both sides would have access to high-end compute units but have a very limited bandwidth between each other.
  [-]
  - bliteben 194 days ago
    Wonder if its better than a single color channel hologram though
  - bityard 194 days ago
    Bandwidth is not the limitation in space comms, latency is.
    [-]
    - cogman10 194 days ago
      Underwater communications, on the other hand, could use this.
      Though, I somewhat doubt even 22kbps is available generally.
  - JamesLeonis 194 days ago
    Increasingly mobile networks are like this. There are all kinds of bandwidth issues, especially when customers are subject to metered pricing for data.
- loa_in_ 194 days ago
  Staying in contact with someone for hours on metered mobile internet connection comes to mind. Low bandwidth translates to low total data volume over time. If I could be video chatting on one of those free internet SIM cards that's a breakthrough.
- omh 194 days ago
  One use case might be if you have limited bandwidth, perhaps only a voice call, and want to join a video conference. I could imagine dialling in to a conference with a virtual face as an improvement over no video at all.
  [-]
  - helf 194 days ago
    [dead]
- jl6 194 days ago
  130m parameters isn’t insanely large, even for smartphone memory. The high GPU usage is a barrier at the moment, but I wouldn’t put it past Apple to have 4090-level GPU performance in an iPhone before 2030.
- loudmax 194 days ago
  The trade-off may not be worth it today, but the processing power we can expect in the coming years will make this accessible to ordinary consumers. When your laptop or phone or AR headset has the processing power to run these models, it will make more efficient use of limited bandwidth, even if more bandwidth is available. I don't think available bandwidth will scale at the same rate as processing power, but even if it does, the picture be that much more realistic.
hinkley 194 days ago
The second example shown is not perceptually lossless, unless you’re so far on the spectrum you won’t make eye contact even with a picture of a person. The reconstructed head doesn’t look in the same direction as the original.
However is does raise an interesting property in that if you are on the spectrum or have ADHD, you only need one headshot of yourself staring directly at the camera and then the capture software can stop you from looking at your taskbar or off into space.
[-]
- DCH3416 194 days ago
  > unless you’re so far on the spectrum you won’t make eye contact even with a picture of a person.
  I don't know. I think you'd be surprised.
  That's already kind of an issue with vloggers. Often they're looking just left or right of the camera at a monitor or something.
AndrewVos 194 days ago
Elon weirdly looks more human than usual in the AI version!
initramfs 194 days ago
nice feature for low bandwidth 4G cell systems.
Reminds me of the video chat in Metal Gear Solid 1 https://youtu.be/59ialBNj4lE?t=21
[-]
- dormento 194 days ago
  Now that you mention it, it never occurred to me that Snake's radio transmitted video as well. "Did you like my new sunglasses?"
  If you could reserve a small portion of the radio bandwidth to broadcast a thumbnail + low bandwidth compressed representation of the face movements, you could technically have something similar without encoding any video (think low res, eye + mouth movements).
- hinkley 194 days ago
  Nice feature for many to one video conferencing as well. Though I don’t know if the organizers will agree.
pastelsky 194 days ago
Did not expect to see Emraan Hashmi in this post!
[-]
- shaan7 194 days ago
  Indeed! Bollywood makes it to HN xD
stuaxo 193 days ago
Bit off putting that it's Musk for some reason, maybe it's just overexposure to his bullshit, I could quite happily never see him again.
Maybe there is a custom web filter in there somewhere that could block particular people and images of them.
[-]
- Separo 193 days ago
  We could run something like TensorFlow.js in a Chrome extension to identify the person in the image and replace it in the dom. A little resource intensive for inference on every image in but probably worth it in this case.
JimDabell 194 days ago
I got some interesting replies when I suggested this technique here:
https://news.ycombinator.com/item?id=22907718
jacobgorm 194 days ago
Related Show HN https://news.ycombinator.com/item?id=31516108
userbinator 193 days ago
the only information that needs to be transmitted is the change in expression, pose and facial keypoints
Does anyone else remember the weirder (for lack of a better term) features of MPEG-4 part 2, like face and body animation? It did something like that, but as far as I know nearly no one used that feature for anything.
https://en.wikipedia.org/wiki/Face_Animation_Parameter
and in the worst, trust on the internet will be heavily undermined
...as long as the model doesn't include data to put a shoe on one's head.
accra4rx 193 days ago
why does he has to deep fake Imran Hashmi the serial kisser
tommiegannert 194 days ago
Now that we're moving towards context-specific compression algorithms, can we please use WASM as the file header for these media files, instead of inventing something new. :)
vtodekl 194 days ago
[dead]
up2isomorphism 194 days ago
“Perceptually lossless” is an oxymoron.
[-]
- ranger_danger 194 days ago
  As there are several patents, published studies, IEEE papers and thousands of google results for the term, I think it's safe to say that many people do not agree with your interpretation of the term.
- hinkley 194 days ago
  You’re still listening to vinyl, arntcha?
  Lossiness definitely matters when you’re doing forensics. But not for consumers.
  If you just want to bop to Taylor who the fuck cares. The iPod ended that argument. Yes I can be a perfectionist, or I can have one thousand songs in my pocket. That was more than half of your collection for many people at the time.
  [-]
  - up2isomorphism 193 days ago
    Calm down dude. It is just a marketing term for something lossy.
- esafak 193 days ago
  It means you don't perceive the loss. What are you arguing; that you can perceive any loss?
- Brian_K_White 194 days ago
  There is no oxymoron in "no perceived loss".
andrewstuart 194 days ago
The more magic AI makes, the less magical the world becomes.
[-]
- EarlKing 194 days ago
  Clearly Sauron is a jealous ringmaker and doesn't like hobbits using his ring to shitpost.
  [-]
  - Joel_Mckay 194 days ago
    Probably just disappointed at the wasted bandwidth:
    24fps * 52 facial 3D marker * 16bit packed delta planar projected offsets (x,y) = 19.968 kbps
    And this is done in Unreal games on a potato graphics card all the time:
    https://apps.apple.com/us/app/live-link-face/id1495370836
    I am sure calling modern heuristics "AI" gets people excited, but it doesn't seem "Magical" when trivial implementations are functionally equivalent. =3
    [-]
    - scotty79 194 days ago
      I think the point here is to make it photorealistic which everything apart from AI still fails at superhard.
      [-]
      - Joel_Mckay 194 days ago
        Take a minute to look something up first, and then formulate a more interesting opinion for us to discuss:
        https://www.unrealengine.com/en-US/metahuman
        The artifacts in raster image data is nowhere near what a reasonable model can achieve even at low resolutions. =3
        [-]
        scotty79 194 days ago
        I know metahuman. As impressive as it is, when you judge by the standards of game graphics, if you are ever mislead into thinking metahumans are real humans or even real physically existing things it's time to see your eye doctor (and/or do MRI head scan).
        On the other hand AI videos can be easily mistaken for people or hyper realistic physical sculptures.
        https://img-9gag-fun.9cache.com/photo/aYQ776w_460svvp9.webm
        There's something basic about how light works that traditional computer graphics still fails to grasp. Looking at its productions and comparing it to what AI generates is like looking at output of amateur and an artist. Sure, maybe artist doesn't always draw all 5 fingers but somehow captures the essence of the image in seemingly random arrangement of light and dark strokes, while amateur just tries to do their best but fails in some very significant ways.
        [-]
        Joel_Mckay 194 days ago
        "AI" videos make many errors all the time, but most people are not aware of what to look for... Undetectable CGI is done in film/games all the time, and indeed it takes talent to hide the fact it is fake.
        One could rely on the media encoder to garble output enough to look more plausible (people on potato devices are used to looking at garbage content.) However, at the end of the day the "uncanny valley" effect takes over every-time even for live action data in a auto-generated asset, as the missing data can't be "Magically" recovered with 100% certainty.
        Bye =3
        [-]
        scotty79 194 days ago
        Undetectable CGI in games ... right. I don't think you are a gamer.
        In movies it can be done with enough of manual tweaking by artists and a lot of photographic content around to borrow sense of reality from it.
        "Potato" devices by which I assume you mean average phones, currently have better resolutions than PCs had very recently and a lot still do (1080p).
        And a photo on 480p still looks more real than anything CGI (not AI).
        Your signature is hilarious. I won't comment about the reasons because I don't want this whole thread to get flagged.
        [-]
        Joel_Mckay 194 days ago
        I think most "AI" slop content falls under this phenomena:
        https://www.youtube.com/watch?v=vJG698U2Mvo
        Several 8bit games had their own aesthetic charm, but were at least fun...
        Cheers, =3
- psychoslave 194 days ago
  The greatest feat ever: let magic disappear before wonder of understanding.
- xyzsparetimexyz 194 days ago
  Oh shut up. There's plenty of awful uses for ai but this isn't one of them
- HPsquared 194 days ago
  This is the power of numerical methods.
  [-]
  - andrewstuart 194 days ago
    There’s a finite amount of magic and if AI borrows it here then it must be repaid there.
- andai 194 days ago
  What did you mean by this?
- satvikpendem 194 days ago
  > Any sufficiently advanced technology is indistinguishable from magic.
  - Arthur C. Clarke
- andai 194 days ago
  ?
  [-]
  - andai 193 days ago
    Why am I downvoted for asking parent to clarify? Was I impolite for not using a full sentence?