TinyLoRA – Learning to Reason in 13 Parameters

(arxiv.org)

131 points | by sorenjan 4 days ago

6 comments

  • matt123456789 1 hour ago
    Such low dimensionality of the LoRA vector must surely result in a close-to-linear modification to the KV calculation. This seems to me to imply that what we call "reasoning" is latent within the model. To be clear, I didn't read the paper; I'm sure the authors address this.
    • a-t-c-g 1 hour ago
      Yes - some degree of reasoning appears to be latent in the structure of language itself. But models trained explicitly on reasoning-focused data still perform better than models trained only on general corpora.*

      *At least up to 300B parameters, based on the models we’ve tested.

  • a-t-c-g 3 hours ago
    The quality of custom models trained with proper reasoning datasets[0], even at small parameter counts (3-7B is the sweet spot), is incredible now

    [0]: cartesien.io or Salesforce's WebscaleRL

    • objektif 2 hours ago
      What are you basing how good they are on? Personal experience or some benchmarks?
      • a-t-c-g 1 hour ago
        Benchmarks; we have internal ones testing reasoning fine-tuned models vs. frontier models + prompts

        For some use cases it can reach parity performance at 1/20th the cost, and exceed it at 1/10th the cost. The trade-off is, of course, narrow applicability

  • measurablefunc 4 hours ago
    With four parameters I can fit an elephant, and with five I can make him wiggle his trunk so there is still room for improvement.
    • esafak 4 hours ago
      Except learning to reason is a far cry from curve fitting. Our brains have more than five parameters.
      • sdenton4 1 hour ago
        It's the statistics equivalent of 'no one needs more than 640kb of RAM'
      • voxelghost 3 hours ago
        After a quick skim, my understanding is that this is more like: with a very compressed diff vector applied to a multi-billion-parameter model, the model can be 'retrained' to reason (score) better on a specific topic (e.g. math, which was used in the paper)
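
        The "compressed diff vector" idea above can be sketched as a standard low-rank (LoRA-style) update: the frozen weight matrix stays untouched, and only a tiny pair of projection vectors is trained. This is a minimal illustration, not the paper's exact parameterization; the shapes, rank, and scaling here are assumptions.

        ```python
        import numpy as np

        rng = np.random.default_rng(0)
        d = 1024  # hidden size of the (hypothetical) frozen model

        # Frozen pretrained weight: never updated during fine-tuning.
        W = rng.standard_normal((d, d)) / np.sqrt(d)

        # Rank-1 "diff": only A and B would be trained. With weight tying
        # or shared vectors, the trainable count can shrink far below 2*d.
        A = rng.standard_normal((d, 1)) * 0.01  # up-projection
        B = rng.standard_normal((1, d)) * 0.01  # down-projection

        x = rng.standard_normal(d)

        # Forward pass: the low-rank delta (A @ B) perturbs W's output.
        y = W @ x + A @ (B @ x)
        print(y.shape)
        ```

        The key point is that the delta `A @ B` is a full d-by-d matrix in effect, but is parameterized by far fewer trainable numbers than the frozen base.
        
        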
      • ekuck 3 hours ago
        speak for yourself!
      • est 3 hours ago
        reasoning capability might just be some specific combinations of mirror neurons.

        even some advanced math usually involves applying patterns found elsewhere to new topics

      • measurablefunc 3 hours ago
        I agree, I don't think gradient descent is going to work in the long run for the kind of luxurious & automated communist utopia the technocrats are promising everyone.