13 comments

  • timlod 53 days ago
    The title is a bit confusing as open-source separation of ... reads like source separation, which this is not. Rather, it is a pitch detection algorithm which also classifies the instrument the pitch originated with.

    I think it's really neat, but the results look like it could take more time to fix the output than using a manual approach (if really accurate results are required).

    • earthnail 53 days ago
      Thanks for clarifying.

      In fairness to the author, he is still at high school: https://matthew-bird.com/about.html

      Amazing work for that age.

      • veunes 53 days ago
        He's definitely a talent to watch!
      • timlod 53 days ago
        Wow, I didn't see that. Great to see this level of interest early on!
    • TazeTSchnitzel 53 days ago
      Is “source separation” better known as “stem separation” or is that something else? I think the latter term is the one I usually hear from musicians who are interested in taking a single audio file and recovering (something approximating) the original tracks prior to mixing (i.e. the “stems”).
      • timlod 53 days ago
        Audio Source Separation I think is the general term used in research. It is often applied to musical audio though, where you want to do stem separation - that's source separation where you want to isolate audio stems, a term referring to audio from related groups of signals, e.g. drums (which can contain multiple individual signals, like one for each drum/cymbal).
      • Earw0rm 53 days ago
        Stem separation refers to doing it with audio playback fidelity (or an attempt at that). So it should pull the bass part out at high enough fidelity to be reused as a bass part.

        This is a partly solved problem right now. Some tracks and signal types can be unmixed more easily than others; it depends on what the sources are and how much post-processing (reverb, side-chaining, heavy brick-wall limiting and so on) was applied.

        • dylan604 53 days ago
          > This is a partly solved problem right now.

          I'd agree with the partly. I have yet to find one that either isolates an instrument as a separate file or removes one from the rest of the mix that does not negatively impact the sound. The common issues I hear are similar to the early internet low bit rate compression. The new "AI" versions are really bad at this, but even the ones available before the AI craze were still susceptible

          • mh- 52 days ago
            I'm far (far) from an expert in this field, but when you think about how audio is quantized into digital form, I'm really not sure how one solves this with the current approaches.

            That is: frequencies from one instrument will virtually always overlap with another one (including vocals), especially considering harmonics.

            Any kind of separation will require some pretty sophisticated "reconstruction" it seems to me, because the operation is inherently destructive. And then the problem becomes one of how faithful the "reproduction" is.
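
            A rough sketch of the overlap, for illustration only. The two notes (A2 on a bass, E4 in a vocal) and the 2 Hz tolerance are assumptions picked to make the point, not anything taken from the article:

```python
# Harmonic overlap between two notes (illustrative values, not from the article).
A2 = 110.0      # Hz, fundamental of a low bass note
E4 = 329.63     # Hz, fundamental of a sung note (equal temperament, A4 = 440 Hz)

def harmonics(f0, count):
    """Return the first `count` harmonics (integer multiples) of f0."""
    return [f0 * k for k in range(1, count + 1)]

def overlaps(f0_a, f0_b, count=8, tolerance_hz=2.0):
    """List (harmonic_a, harmonic_b, freq_a, freq_b) pairs closer than tolerance_hz."""
    pairs = []
    for i, fa in enumerate(harmonics(f0_a, count), start=1):
        for j, fb in enumerate(harmonics(f0_b, count), start=1):
            if abs(fa - fb) <= tolerance_hz:
                pairs.append((i, j, fa, fb))
    return pairs

shared = overlaps(A2, E4)
for i, j, fa, fb in shared:
    print(f"harmonic {i} of A2 ({fa:.2f} Hz) ~ harmonic {j} of E4 ({fb:.2f} Hz)")
```

            Wherever two partials land this close together, the mix only contains their sum, so any separator has to guess how much energy belonged to each source.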

            This feels pretty similar to the inpainting/outpainting stuff being done in generative image editing (a la Photoshop) nowadays, but I don't think anywhere near the investment is being made in this field.

            Very interested to hear anyone with expertise weigh in!

            • nineteen999 51 days ago
              I won't say expertise, but what I've done recently:

              1) used PixBim AI to extract "stems" (drums, bass, piano, all guitars, vocals). Obviously a lossless source like FLAC works better than MP3 here

              2) imported the stems to ProTools.

              3) from there, I will usually re-record the bass, guitars, pianos and vocals myself. Occasionally the drums as well.

              This is a pretty good way I found to record covers of tracks at home, re-using the original drums if I want to, keeping the tempo of the original track intact etc. I can embellish/replace/modify/simplify parts that I re-record obviously.

              It's a bit like drawing using tracing paper, you're creating a copy to the best of your ability, but you have a guide underneath to help you with placement.

            • Earw0rm 48 days ago
              It's not really digital quantisation that's the problem, but everything else that happens during mixing - which is a much more complicated process, especially for pop/rock/electronic etc., than just "sum all the signals together".

              There's a bunch of other stuff that happens during and after summing which makes it much harder to reliably 100% reverse that process.
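
              A toy sketch of why that matters, assuming two sine "tracks" and a hard clipper as a very crude stand-in for a brick-wall limiter (real limiters and mix chains are far more involved):

```python
import math

def sine(freq, n, sr=8000, amp=0.8):
    """Generate n samples of a sine wave at `freq` Hz."""
    return [amp * math.sin(2 * math.pi * freq * t / sr) for t in range(n)]

def hard_limit(samples, ceiling=1.0):
    """Crude stand-in for a brick-wall limiter: clip to [-ceiling, ceiling]."""
    return [max(-ceiling, min(ceiling, s)) for s in samples]

n = 400
bass = sine(100, n)
lead = sine(440, n)

plain_sum = [b + l for b, l in zip(bass, lead)]
mastered = hard_limit(plain_sum)

# With a pure sum, subtracting a known source recovers the other one
# (up to floating-point rounding).
recovered_from_sum = [m - b for m, b in zip(plain_sum, bass)]
err_sum = max(abs(r - l) for r, l in zip(recovered_from_sum, lead))

# After limiting, the same subtraction leaves an error wherever clipping occurred.
recovered_from_master = [m - b for m, b in zip(mastered, bass)]
err_master = max(abs(r - l) for r, l in zip(recovered_from_master, lead))

print(f"max error, plain sum: {err_sum:.6f}")
print(f"max error, limited:   {err_master:.6f}")
```

              Even with one source known perfectly, the non-linear stage leaves a residual error, and in practice none of the sources are known.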

              • mh- 48 days ago
                I didn't mean to say that quantization was the problem, just that you're basically trying to pick apart a "pixel" (to continue my image-based analogy) that is a composite of multiple sounds (or partially-transparent image layers).

                I was sincere when I said:

                > I'm really not sure how one solves this with the current approaches.

                I was hoping someone would come along and say it is, in fact, possible. :)

      • popalchemist 52 days ago
        Source separation is a general term, stem separation is a specific instance of source separation.
    • emptiestplace 53 days ago
      No, it doesn't read like that. The hyphen completely eliminates any possible ambiguity.
      • ipsum2 53 days ago
        The title of the submission was modified. If you read the article, it says:

        Audio Decomposition [Blind Source Seperation]

      • croes 53 days ago
        Maybe added later by OP? Because there is no hyphen in the article’s subtitle.

        >Open source seperation of music into constituent instruments.

        • emptiestplace 53 days ago
          The complaint:

          > The title is a bit confusing as open-source separation of ... reads like source separation, which this is not.

  • loubbrad 53 days ago
    I didn't see it referenced directly anywhere in this post. However, for those interested, automatic music transcription (i.e., audio->MIDI) is actually a decently sized subfield of deep learning and music information retrieval.

    There have been several successful models for multi-track music transcription - see Google's MT3 project (https://research.google/pubs/mt3-multi-task-multitrack-music...). In the case of piano transcription, accuracy is nearly flawless at this point, even for very low-quality audio:

    https://github.com/EleutherAI/aria-amt

    Full disclaimer: I am the author of the above repo.

    • Earw0rm 53 days ago
      He's trying to solve a second (also hard-ish) problem as well: deriving an accurate musical score from MIDI data. It's a "sounds easy but isn't" problem, especially when audio-to-MIDI transcribers are great at pitch and onset times, but rather less reliable at duration and velocity.
      • loubbrad 53 days ago
        I agree that the audio->score and MIDI->score problems are quite hard. There has been research in this area too, however it is far less developed than audio->MIDI.
        • Earw0rm 53 days ago
          That's because MIDI doesn't contain all the information that was in a score.

          Scores are interpreted by musicians to create a performance, and MIDI is a capture of (some of) the data about that performance. Music engraving is full of implicit and explicit cultural rules, and getting it _right_ has parallels with handwritten kanji script in terms of both the importance of correctness to the reader and the number of traps for the unwary or uncultured.

          All of which can be taken to mean "classical musicians are incredibly picky and anal about this stuff", or, "well-formed music notation conveys all sorts of useful contextual information beyond simply 'what note to play when'".

          • pclmulqdq 53 days ago
            A lot of modern scores are written with MIDI in mind (whether or not the composer knows it - that's how they hear it the first 50 or so times). That should make it somewhat easier to go MIDI -> score for similar pieces. Current attempts I have seen still make a lot of stupid errors like making note durations too precise and spelling accidentals badly. There's probably still a lot of low-hanging fruit.

            This is absolutely not easy, though, given all the cultural context. Things like picking up a "legato" or "cantabile" marking and choosing an accent vs a dagger or a marcato mark are going to be very difficult no matter what.
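
            On the "durations too precise" point, the naive fix is to snap performed durations to a notation grid. A minimal sketch, assuming durations are measured in beats (quarter note = 1) and a sixteenth-note grid; the `quantize` helper and the sample values are made up for illustration:

```python
from fractions import Fraction

def quantize(beats, grid=Fraction(1, 4)):
    """Snap a duration in beats to the nearest multiple of `grid` (at least one grid step)."""
    exact = Fraction(beats).limit_denominator(960)
    steps = max(1, round(exact / grid))
    return steps * grid

# Durations as a performer might actually play them (hypothetical values).
performed = [0.97, 0.52, 0.24, 1.49, 2.03]
notated = [quantize(d) for d in performed]
print([str(d) for d in notated])
```

            This only handles the easy part. Choosing the grid per passage (triplets, swing, rubato) and deciding how to spell the result is where the cultural context comes in.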

    • bravura 53 days ago
      I know the reported scores of MT3 are very good, but have you had success with using it yourself?

      https://replicate.com/turian/multi-task-music-transcription

      I ported their colab to runtime so I could use it more easily.

      The MIDI output is... puzzling?

      I've tried feeding it even simple stems and found the output unusable for some tracks, i.e. the MIDI output and audio were not well aligned and there were timing issues. On other audio it seemed to work fine.

      • loubbrad 53 days ago
        Multi-track transcription has a long way to go before it is seriously useful for real-world applications. Ultimately I think that converting audio into MIDI makes a lot more sense for piano/guitar transcription than it does for complex multi-instrument works with sound effects etc.

        Luckily for me, audio-to-seq approaches do work very well for piano, which turns out to be an amazing way of getting expressive MIDI data for training generative models.

      • air217 53 days ago
        I developed https://pyaar.ai, it uses MT3 under the hood. I realized that continuous string instruments (guitar) that have things like slides, bends are quite difficult to capture in MIDI. Piano works much better because it's more discrete (the keys abstract away the strings) and so the MIDI file has better representation
        • duped 53 days ago
          > I realized that continuous string instruments (guitar) that have things like slides, bends are quite difficult to capture in MIDI.

          It's just pitch bend?
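
          For reference, a minimal sketch of what that looks like at the message level. MIDI pitch bend is a 14-bit value with 8192 meaning "no bend"; the ±2 semitone range below is a common default but is set by the receiving instrument, so treat it as an assumption. The helper names are made up:

```python
def pitch_bend_value(semitones, bend_range=2.0):
    """Map a bend in semitones to a 14-bit MIDI pitch-bend value (8192 = no bend)."""
    semitones = max(-bend_range, min(bend_range, semitones))
    value = 8192 + round(semitones / bend_range * 8192)
    return max(0, min(16383, value))  # the top of the range is 16383, not 16384

def to_data_bytes(value):
    """Split the 14-bit value into the (LSB, MSB) 7-bit data bytes of a pitch-bend message."""
    return value & 0x7F, (value >> 7) & 0x7F

# A slide sampled at a few points, in semitones relative to the fretted note.
slide = [pitch_bend_value(s) for s in (0.0, 0.5, 1.0, 2.0, -2.0)]
print(slide)
print(to_data_bytes(8192))
```

          One caveat: a pitch-bend message applies to the whole channel, so bending a single string inside a chord needs one channel per string (or MPE).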

          I think trying to transcribe as MIDI is just a fundamentally flawed approach that has too many (well known) pitfalls to be useful.

          A trained human can listen to a piece and transcribe it in seconds, but programming it as MIDI could take minutes/hours. If you're not trying to replicate how humans learn by ear, you're probably approaching this wrong.

    • WiSaGaN 53 days ago
      How does the problem simplify when it's restricted to piano?
      • loubbrad 53 days ago
        Essentially, the leading way to do automatic music transcription is to train a neural network on supervised data, i.e., paired audio-MIDI data. In the case of piano recordings, there is a very good dataset for this task which was released by Google in 2018:

        https://magenta.tensorflow.org/datasets/maestro

        Most current research involves refining deep learning based approaches to this task. When I worked on this problem earlier this year, I was interested in adding robustness to these models by training a sort of musical awareness into them. You can see a good example of it in this tweet:

        https://x.com/loubbrad/status/1794747652191777049

  • fxj 53 days ago
    If you are interested in audio (or stem) separation have a look at RipX

    https://hitnmix.com/ripx-daw-pro/

    It can even export the separated tracks as MIDI files. It still has some problems but works very well. Stem separation is now standard in music software and almost every DAW provides it.

    • tasty_freeze 53 days ago
      RipX can do stem separation and allows repitching notes in the mix. If that is what you want to do it is great.

      I find moises (https://moises.ai/) to be easy to use for the tasks I need to do. It allows transposing or time scaling the entire song. It does stem separation and has a simple interface for muting and changing the volume on a per-track basis. It auto-detects the beat and chords.

      I'm not affiliated, just a happy nearly-daily user for learning and practicing songs. I boost the original bass part and put everything else at < 10% volume to hear the bass part clearly (which often shows how bad online transcriptions are, even paid ones). Once I know the part, I mute the bass part and play along with the original song as if I was the bass player.

      • alok-g 52 days ago
        Moises looks promising.

        I wonder why pricing information is so hard to find these days. Would like to get an idea of the same.

    • sbarre 53 days ago
      Stemroller[0] has been around for a while too, it's free and based on Meta's models:

      0: https://www.stemroller.com/

      • cloudking 53 days ago
        I've heard Meta's Demucs is SOTA, has anything else better come out since?
        • adzm 52 days ago
          It's still pretty much the best, though there are fine tunings and tweaks on top of that and the runner-up MDX that work well for specific scenarios.
    • oidar 53 days ago
      > almost every DAW provides it.

      It's an up and coming feature that nearly every DAW should have, but most don't yet.

      Ableton Live - No

      Bitwig - No

      Cubase - No

      FL - Yes

      Logic - Yes

      Pro Tools - No

      Reason - No

      Reaper - No

      Studio One - Yes

      • fxj 52 days ago
        MPC3 - Yes

        Mixcraft - Yes

        Maschine3 - Yes

    • antback 53 days ago
      It appears to be related to Polymath.

      https://github.com/samim23/polymath

      Polymath is effective at isolating and extracting individual instrument tracks from MP3s. It works very well.

    • makz 53 days ago
      Thanks for the information. I’m a long time Logic Pro user and I wasn’t aware of this feature.
      • Sporktacular 53 days ago
        On an M1/2/3/4 processor. Not Intel.
  • bottom999mottob 53 days ago
    This is really cool, but there's real-world instrument physics that might not be captured by simple Fourier transform templates, like a trumpet playing softly can have a significantly different harmonic spectrum than the same trumpet playing loudly, even at the same pitch

    Trumpets produce a rich harmonic series with strong overtones, meaning their Fourier transform would show prominent peaks at integer multiples of the fundamental frequency. Instruments like flutes have more pure tones, but brass instruments typically have stronger higher harmonics, which would lead to more complex partial derivatives in the matrix equation shown in the article

    So this script uses bandpass filtering and cross-correlation of attack/release envelopes to identify note timing. Given that brass instruments can exhibit non-linear behavior where the harmonic content changes significantly with playing intensity (think of the brightness difference between pp and ff passages), I'm not sure how this algorithm could handle intensity-dependent timbral variations. I'd consider adding intensity-dependent Fourier templates for each instrument to improve accuracy.
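
    A minimal sketch of what intensity-dependent templates could look like. This is not the article's code: the harmonic amplitudes, template names, sample rate, and window length are all invented for illustration, and the "spectrum" is just the DFT magnitude at the first five harmonics of a known fundamental:

```python
import math

SR = 8000          # sample rate in Hz (assumed)
N = 800            # 0.1 s window, so DFT bins fall every 10 Hz
F0 = 200.0         # fundamental in Hz (assumed known)

def synth(harmonic_amps):
    """Sum of sine harmonics of F0 with the given relative amplitudes."""
    return [
        sum(a * math.sin(2 * math.pi * F0 * (k + 1) * t / SR)
            for k, a in enumerate(harmonic_amps))
        for t in range(N)
    ]

def harmonic_profile(samples, count):
    """DFT magnitude at the first `count` harmonics of F0, normalized to unit length."""
    mags = []
    for k in range(1, count + 1):
        f = F0 * k
        re = sum(s * math.cos(2 * math.pi * f * t / SR) for t, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * f * t / SR) for t, s in enumerate(samples))
        mags.append(math.hypot(re, im))
    norm = math.sqrt(sum(m * m for m in mags))
    return [m / norm for m in mags]

def cosine(a, b):
    """Cosine similarity; both inputs are already unit length."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical templates: a soft note dominated by the fundamental,
# and a loud one with strong upper harmonics.
templates = {
    "trumpet_pp": harmonic_profile(synth([1.0, 0.3, 0.1, 0.03, 0.01]), 5),
    "trumpet_ff": harmonic_profile(synth([1.0, 0.9, 0.8, 0.6, 0.4]), 5),
}

# An "observed" loud note that matches neither template exactly.
observed = harmonic_profile(synth([1.0, 0.8, 0.7, 0.5, 0.3]), 5)
scores = {name: cosine(observed, tpl) for name, tpl in templates.items()}
best = max(scores, key=scores.get)
print(best, {name: round(score, 3) for name, score in scores.items()})
```

    With a single template per instrument, the loud note would be scored against the soft profile and could easily lose out to another instrument's template. Keeping a few templates per dynamic level and taking the best match is one way around that.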

    • atoav 53 days ago
      As someone who uses source separation twice a week for mixing purposes, I can say the number of other instruments that can produce sounds of "vocal" quality is high. These models all stop functioning well when you have bands where the instruments don't sound typical and aren't played and/or mixed in a way that achieves maximum separation between them, e.g. an electric guitar with a distorted harmonic hitting the same note as your singer while the drummer plays only shrieking noises on their cymbals and the bass player simulates a punching kick drum on their instrument.

      In these situations (experimental music) source separation will produce completely unpredictable results that may or may not be useful for musical rebalancing.

      • fnordlord 53 days ago
        What tool do you use for the source separation? Everything I've used so far is great for learning or transcribing to MIDI but the separated tracks always have a strange phasing sound to them. Are you doing something to clean that up before mixing back in or are the results already good enough?
        • atoav 53 days ago
          iZotope RX with musical rebalance, great to reduce drum spill from vocal mics
  • ekianjo 53 days ago
    Looks like this may be the work of Joshua Bird's little brother (?). Joshua Bird has already done some impressive projects that were featured on HN before: https://www.youtube.com/@joshuabird333
    • njb99 46 days ago
      Yes, Matt is Josh's little brother. I'm impressed - and very pleased - you noticed this.
  • generalizations 53 days ago
    No one else is going to mention that "separation" was misspelled four times?
  • baq 53 days ago
    Got a flashback of playing audiosurf 15 or so years ago. Time flies.

    https://en.wikipedia.org/wiki/Audiosurf

  • ipsum2 53 days ago
    I must be dumb, but none of the YouTube video demos are demonstrating source separation?

    Edit: to clarify, source separation in audio research means separating out the audio into separate clips.

    • atoav 53 days ago
      I think decomposition is the word; source separation in this case (misleadingly) refers to the fact that the decomposed notes can be separated into different sources.
    • wkjagt 53 days ago
      The "source" here goes with "open source".
  • fonema 52 days ago
    I'm a long-time fan of Ultrastar Deluxe, which is an open-source clone of Singstar. This is a karaoke game where people compete by singing along to the tune. It recognizes the notes you are singing and compares them to a vocals-timings mapping file for that particular song. The better you sing to the tune (getting the words correct doesn't matter), the higher your score.

    While there are extensive libraries of fan-made song mappings, it's never enough, and there are very few mapped songs in languages other than English or Spanish (if you or your friends prefer your native language). Doing the entire mapping manually is time-consuming, not to mention that I am almost tone-deaf myself, which would make it even more difficult. I have been wondering for a long time what software I could use to make this process easier to automate. This seems like a great tool for capturing vocal timings and notes from original songs.

    I have it on my bucket list to create a Singstar playlist in my native language and host a singing party with friends.

    Does anyone have suggestions for other similar tools?

  • DidYaWipe 53 days ago
    Some of those videos don't have audio, as far as I can tell...
    • tjoff 53 days ago
      The YouTube links explain why: "No audio as a result of copyright." They also have a link to the audio that you can play alongside.
      • DidYaWipe 52 days ago
        Of course, we can't expect Google to respect the obvious fair-use nature of these demonstrations.
  • bastloing 53 days ago
    I can't find the source code, but the project looks interesting.
  • kasajian 53 days ago
    dude can't spell
    • berbec 53 days ago
      He's in high school and pulls off a project like this. I thought I was slick convincing the 7-11 guy to give me my Twist-a-Pepper soda without charging me bottle deposit or tax.
  • testoveride 53 days ago
    Ff