
  • habitue 1 day ago
    Does Apple actually name their research papers "Pro" too? Like, is there an iLearning paper out there?
  • isoprophlex 2 days ago
    The example images look convincing, but the sharp hairs of the llama and the cat are pictured against an out-of-focus background...

    In real life, you'd use these models for synthetic depth-of-field, adding fake bokeh to a very sharp image that's in focus everywhere. So this seems too easy?

    Impressive latency tho.

    • amluto 1 day ago
      I’m not convinced that this type of model is the right solution for fake bokeh, at least not if you use it as a black box. Imagine you have the letter A in the background behind some hair. You should end up with a blurry A and mostly in-focus hair. Instead you end up with an erratic mess, because a fuzzy depth map doesn’t capture the relevant information.
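
      For concreteness, here's a minimal sketch of that naive black-box approach (hypothetical code, not from the article), assuming the image and an HxW metric depth map are float NumPy arrays; the function name, blur sigma, and focus parameters are all made up:

          import numpy as np
          import cv2

          def fake_bokeh(image, depth, focus_m=1.0, dof_m=0.5):
              """Blend a sharp and a blurred copy per pixel, weighted by distance from the focal plane."""
              blurred = cv2.GaussianBlur(image, (0, 0), sigmaX=8)
              # Blur weight grows with distance (in meters) from the in-focus depth.
              weight = np.clip(np.abs(depth - focus_m) / dof_m, 0.0, 1.0)[..., None]
              return (1.0 - weight) * image + weight * blurred

      Every pixel gets a single sharp-vs-blurred weight from its depth value, so wherever the depth map is soft or wrong (hair over a background letter, say) the blend switches erratically instead of leaving a cleanly blurred A behind sharp hair.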

      Of course, lots of text-to-image models generate a mess, because their training sets are highly contaminated by the messes produced by “Portrait mode”.

    • dagmx 1 day ago
      On visionOS 2, there’s functionality to convert 2D images to 3D images for stereo viewing.

      https://youtu.be/pLfCdI0mjkI?si=8K7rPHu558P-Hf-Z

      I assume the first pass is the depth inference here.

      • netruk44 1 day ago
        If this is the model they're using, then, speaking as someone who owns a Vision Pro: it works really well, but there are definitely still edge cases where it mis-estimates depth.

        In particular, distant things bisected by foreground objects (such as water behind a fence or telephone wires behind a utility pole) can still sometimes trip it up. Not always, but it happens.
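
        To make that concrete, here's a rough sketch of the underlying idea (hypothetical code, not Apple's actual pipeline): shift each pixel horizontally by a disparity proportional to inverse depth to synthesize the second eye's view. Thin foreground structures like fences or wires over a distant background are exactly where a slightly wrong depth value sends a pixel to the wrong place or leaves a hole:

            import numpy as np

            def synthesize_right_eye(image, depth, baseline_px=12.0):
                """Forward-warp each row by a per-pixel disparity ~ 1/depth (simplistic; ignores occlusion order)."""
                h, w = depth.shape
                disparity = baseline_px / np.maximum(depth, 1e-3)  # nearer pixels shift more
                right = np.zeros_like(image)
                xs = np.arange(w)
                for y in range(h):
                    new_x = np.clip(np.round(xs - disparity[y]).astype(int), 0, w - 1)
                    right[y, new_x] = image[y, xs]  # gaps left unfilled are disocclusions
                return right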

    • JBorrow 1 day ago
      I don't think the only utility of a depth model is to provide synthetic blurring of backgrounds. There are many things you'd want to use one for, including feeding the depth maps into object detection pipelines.