Show HN: Text-to-video model from scratch (2 brothers, 2 years, 2B params)

(huggingface.co)

51 points | by schopra909 9 hours ago

3 comments

WhitneyLand 2 hours ago
Great work. How many GPU hours to train?
E-Reverance 4 hours ago
Post it on r/StableDiffusion
streamer45 8 hours ago
Rad! huggingface link gives 404 on my side though.
- schopra909 8 hours ago
  Oh damn! Thanks for catching that -- going to ping the HF folks to see what they can do to fix the collection link.
  In the meantime, here are the individual links to the models:
  https://huggingface.co/Linum-AI/linum-v2-720p
  https://huggingface.co/Linum-AI/linum-v2-360p
  - schopra909 8 hours ago
    Should be fixed now! Thanks again for the heads up
    - streamer45 8 hours ago
      All good, cheers!
      - schopra909 8 hours ago
        Per the RAM comment, you may be able to get it to run locally with two tweaks:
        https://github.com/Linum-AI/linum-v2/blob/298b1bb9186b5b9ff6...
        1) Free the T5 as soon as the text is encoded, so you reclaim GPU RAM
        2) Manual layer offloading: move layers off the GPU once they're done being used, to free up space for the remaining layers + activations
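The two tweaks above can be sketched roughly as follows in PyTorch. This is a minimal illustration and not the code from the linked repo: `encode_then_free`, `forward_with_offload`, and the toy `nn.Embedding` / `nn.Linear` stand-ins are hypothetical names chosen for the example, and it assumes the model can be run as a plain sequence of layers. It falls back to CPU when no GPU is present.

```python
import gc

import torch
from torch import nn

# Use the GPU when available; otherwise the sketch still runs on CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def encode_then_free(text_encoder: nn.Module, token_ids: torch.Tensor) -> torch.Tensor:
    """Tweak 1 (sketch): encode the prompt once, then move the encoder off the GPU."""
    text_encoder.to(device)
    with torch.no_grad():
        embeddings = text_encoder(token_ids.to(device))
    text_encoder.to("cpu")  # weights no longer occupy GPU memory
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # return cached blocks to the driver
    return embeddings


def forward_with_offload(layers: nn.ModuleList, x: torch.Tensor) -> torch.Tensor:
    """Tweak 2 (sketch): keep only the currently running layer on the GPU."""
    x = x.to(device)
    with torch.no_grad():
        for layer in layers:
            layer.to(device)  # load this layer's weights
            x = layer(x)
            layer.to("cpu")   # evict it before loading the next one
    return x


# Toy stand-ins (hypothetical): a tiny "text encoder" and a 4-layer "model".
toy_encoder = nn.Embedding(100, 16)
toy_layers = nn.ModuleList([nn.Linear(16, 16) for _ in range(4)])

emb = encode_then_free(toy_encoder, torch.tensor([[1, 2, 3]]))
out = forward_with_offload(toy_layers, emb)
print(out.shape)
```

Moving each layer back and forth trades speed for memory: peak GPU usage is roughly one layer plus activations, at the cost of a host-to-device copy per layer per step.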
        dsrtslnd23 2 hours ago
        Any idea on the minimum VRAM footprint with those tweaks? 20GB seems high for a 2B model. I guess the T5 encoder is responsible for that.
        schopra909 34 minutes ago
        The T5 encoder is ~5B parameters, so a back-of-the-envelope estimate is ~10 GB of VRAM (it's in bfloat16). So 360p should take ~15 GB of VRAM (+/- a few GB depending on the duration of the generated video).
        We can update the code over the next day or two to provide an option to delete the T5 after the text encoding is computed (to save on VRAM). We'll then report the GB consumed for 360p and 720p at 2-5 seconds on GitHub so there are more accurate numbers.
        Beyond the 10 GB from the T5, there's just a lot of VRAM taken up by the context window of 720p video (even though the model itself is 2B parameters).
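The back-of-the-envelope arithmetic above can be written out as a small sketch. It counts weights only (no activations or video context) and uses decimal GB. The helper name `weight_memory_gb` is made up for illustration, and treating the 2B video model as bfloat16 is an assumption; the thread only states the dtype for the T5.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory for the weights alone, in decimal GB (1 GB = 1e9 bytes).

    bfloat16 stores each parameter in 2 bytes, hence the default.
    """
    return num_params * bytes_per_param / 1e9


t5_gb = weight_memory_gb(5e9)     # ~5B-parameter T5 encoder in bfloat16
model_gb = weight_memory_gb(2e9)  # 2B-parameter video model (bfloat16 assumed)

print(f"T5 weights:    ~{t5_gb:.0f} GB")
print(f"Model weights: ~{model_gb:.0f} GB")
print(f"Weights total: ~{t5_gb + model_gb:.0f} GB before activations")
```

Whatever is left between this weights-only total and the observed footprint goes to activations, which grow with resolution and clip duration.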
  - streamer45 8 hours ago
    Looks like 20GB VRAM isn't enough for the 360p demo :( need to bump my specs :sweat_smile: