This is interesting. The blog post links several papers, and I recommend reading them.
Responses here, however, seem not commensurate with the evidence presented. Two of the papers[0][1] that provide the sources for the illustration in the blog post describe research conducted on a very small group of subjects. They measure neural activity while participants listen to a 30-minute podcast (about 5,000 words) and try to guess upcoming words. All the talk about "brain embeddings" is derived from interpreting neuronal activity and sensor data geometrically. It is all very contrived.
Very interesting stuff from a neuroscience, linguistics, and machine learning perspective. But I will quote from the conclusion of one of the papers[1]: "Unlike humans, DLMs (deep language models) cannot think, understand or generate new meaningful ideas by integrating prior knowledge. They simply echo the statistics of their input".

[0] Alignment of brain embeddings and artificial contextual embeddings in natural language points to common geometric patterns (https://www.nature.com/articles/s41467-024-46631-y)

[1] Shared computational principles for language processing in humans and deep language models (https://www.nature.com/articles/s41593-022-01026-4)
I find the OP very difficult to comprehend, to the point that I question whether it has content at all. One difficulty is in understanding their use of the word "embedding", defined (so to speak) as "internal representations (embeddings)", and their free use of the word to relate, and even equate, LLM internal structure to brain internal structure. They are simply assuming that there is a brain "embedding" that can be directly compared to the matrix of numerical weights that comprise an LLM's training. That seems a highly dubious assumption, to the point of being hand-waving.
They mention a profound difference in the opening paragraph, "Large language models do not depend on symbolic parts of speech or syntactic rules". Human language models very obviously and evidently do. On that basis alone, it can't be valid to just assume that a human "embedding" is equivalent to an LLM "embedding", for input or output.
> They are simply assuming that there is a brain "embedding" that can be directly compared to the matrix of numerical weights that comprise an LLM's training.
If there were no such structure, then their methods based on aligning neural embeddings with brain "embeddings" (really just vectors of electrode values or voxel activations) would not work.
> They mention a profound difference in the opening paragraph, "Large language models do not depend on symbolic parts of speech or syntactic rules". Human language models very obviously and evidently do. On that basis alone, it can't be valid to just assume that a human "embedding" is equivalent to an LLM "embedding", for input or output.
This feels like "it doesn't work the way I thought it would, so it must be wrong."
Actually, I think their point here is mistaken for another reason: there's good reason to think that LLMs do end up implicitly representing abstract parts of speech and syntactic rules in their embedding spaces.
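For what it's worth, here is a minimal probing sketch of what "implicitly representing parts of speech" could look like. Everything in it is an assumption chosen for illustration (GPT-2 via the Hugging Face `transformers` library, a handful of hand-labeled words, training accuracy only); a real probing study would use a large tagged corpus and a held-out test set.

```python
# Hypothetical probing sketch: can a linear probe recover part-of-speech
# categories from a frozen LLM's hidden states? Illustrative only.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

# Tiny hand-labeled (word, POS) examples, purely for illustration.
samples = [("dog", "NOUN"), ("cat", "NOUN"), ("idea", "NOUN"),
           ("run", "VERB"), ("eat", "VERB"), ("think", "VERB"),
           ("red", "ADJ"), ("small", "ADJ"), ("happy", "ADJ")]

feats, labels = [], []
for word, pos in samples:
    ids = tok(" " + word, return_tensors="pt")   # leading space: common words map to one GPT-2 token
    with torch.no_grad():
        hidden = model(**ids).last_hidden_state  # shape (1, seq_len, 768)
    feats.append(hidden[0, -1].numpy())          # embedding of the word's last token
    labels.append(pos)

# A purely linear classifier on top of the frozen embeddings.
probe = LogisticRegression(max_iter=1000).fit(feats, labels)
print("training accuracy of the linear POS probe:", probe.score(feats, labels))
```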
> They mention a profound difference in the opening paragraph, "Large language models do not depend on symbolic parts of speech or syntactic rules". Human language models very obviously and evidently do.
Honestly, do they? To me, they clearly don't. Grammar is not how language works. It's a useful fiction. Language, even in humans, seems to be a very statistical process.
Yes! As somebody who speaks 2 languages, and sort of reads/understands 2 more, I cannot agree more. Human spoken languages do not follow any grammars. Grammars are just simplified representations of a reality that is probabilistic in nature.
This is something that Chomsky got very wrong, and the statistical/ML crowd got very right.
Languages definitely follow grammars. They don't follow the grammars that were written by observing them, but you can discover unwritten grammatical structures that are nevertheless followed by everyone who speaks a language, and who if asked wouldn't even be able to articulate the rules that they are following. It's the following that defines the grammar, not the articulation of the rules.
Statistics are just another way to record a grammar, all the way down to the detail of how one talks about bicycles, or the Dirty War in Argentina.
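A toy way to see this (completely made-up mini-corpus, nothing linguistic about the choice of sentences): a bigram count table "records" word-order constraints without a single rule being written down anywhere.

```python
# Toy sketch: a bigram table as a crude statistical record of word order.
from collections import Counter, defaultdict

corpus = ("the dog chased the cat . the cat saw the dog . "
          "a dog chased a cat . the cat chased a dog .").split()

following = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    following[w1][w2] += 1

def attestation_score(sentence):
    """Product of bigram counts; zero means 'never attested' in the corpus."""
    words = sentence.split()
    total = 1
    for w1, w2 in zip(words, words[1:]):
        total *= following[w1][w2]
    return total

print(attestation_score("the dog chased the cat"))   # > 0: an order the statistics have recorded
print(attestation_score("dog the cat the chased"))   # 0: never-seen order, 'ungrammatical' to the model
```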
If a grammar is defined as a book that enumerates the rules of a language, then of course language doesn't require following a grammar. If a grammar is defined as a set of rules for communicating reasonably well with another person who knows those same rules, then language follows grammars.
But it's the other way around! Grammars follow languages. Or, more precisely, grammars are (very lossy) language models.
They describe typical expectations of an average language speaker. Grammars try to provide a generalized system describing an average case.
I prefer to think of languages as a set of typical idioms used by most language users. A given grammar is an attempt to catch similarities between idioms within the set and turn 'em into a formal description.
A grammar might help with studying a language, and speed up the process of internalizing idioms, but the final learning stage is a set of things students use in certain situations aka idioms. And that's it.
> Statistics are just another way to record a grammar
I almost agree.
But it should be "record a language". These are two approaches to the problem of modeling human languages.
Grammars are an OK model. Statistical models are less useful to us humans, but given the right amount of compute they perform much better (see LLMs).
Linguists, however, know that grammar is indeed important for linguistic comprehension. For example, the German "Ich sehe die Frau mit dem Fernglas" (I see the woman with the binoculars) is _unambiguous_ because "die Frau" and "mit dem Fernglas" match in both gender and case. If this weren't the case, it could be either "I see (the woman with the binoculars)" or "I see (the woman) with [using] the binoculars". Even in German you might encounter this ambiguity, e.g. if you instead had to say "Ich sehe das Mädchen mit dem Fernglas", as das Mädchen (the girl) is neuter rather than feminine in gender.
Die Frau and dem Fernglas don't bind tightly, though.
In my view, this phrase is only unambiguous to those who have internalized the prepositional conventions, and all the heavy lifting is done here by “mit” (and “durch” in the opposite case, if one wants to make it clear). The articles are irrelevant and are dictated by the verb and the preposition, whose requirements are sort of arbitrary (sehen Akk., mit Dat.) and fixed. There's no article-controlled variation that could change the meaning; to my knowledge it would simply be incorrect.
I'm also quite rusty on German, but I haven't completely forgotten it, it seems.
> For example, the German "Ich sehe die Frau mit dem Fernglas" (I see the woman with the binoculars) is _unambiguous_ because "die Frau" and "mit dem Fernglas" match in both gender and case. If this weren't the case, it could be either "I see (the woman with the binoculars)" or "I see (the woman) with [using] the binoculars".
My German is pretty rusty; why exactly is it unambiguous?
I don't see how changing the noun would make a difference. "Ich sehe" followed by any of these: "den Mann mit dem Fernglas", "die Frau mit dem Fernglas", "das Mädchen mit dem Fernglas" sounds equally ambiguous to me.
My point is that grammar is to language what Newtonian theory is to gravity, i.e. a useful fiction that works well enough for most scenarios, not that language has no structure.
How do you explain syntactic islands, binding rules, or any number of arcane linguistic rules that humans universally follow? Children can generalise outside of their training set in a way that LLMs simply cannot (e.g. Nicaraguan Sign Language or creolization).
I don't disagree with any of your particular points, but I think you're missing the forest here: their argument is primarily based on empirical results, not a theoretical framework or logical deduction. In other words, they're trying to explain why LLMs work so well for decoding human neural content, not arguing that they do!
I think any reasonable scientist would a priori react the same way to these claims as to claims that neural networks alone could possibly crack human intuition: “that sounds like sci-fi speculation at best”. But that's the crazy world we live in…
Is there some theorem stating something like random few-hot vectors can always be combined linearly to match any signal with a low p-value?
I think I've encountered something like it in my experiments, and I wonder whether it might be happening in this LLM x neuroscience trend of matching LLM internals to brain signals.
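I don't know of a named theorem for exactly that, but the worry is essentially high-dimensional overfitting, and it's easy to reproduce with purely synthetic numbers (toy sketch below, nothing to do with the actual studies): when there are more random features than samples, a linear fit can match any signal in-sample, and only held-out/cross-validated correlations reveal that there was nothing there.

```python
# Toy sketch: random "embeddings" can be linearly combined to match a random
# "brain signal" in-sample; cross-validation exposes the overfit.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n_samples, n_features = 200, 300                 # more random features than samples
X = rng.normal(size=(n_samples, n_features))     # "embeddings": pure noise
y = rng.normal(size=n_samples)                   # "brain signal": pure noise

model = LinearRegression()
in_sample_r = np.corrcoef(model.fit(X, y).predict(X), y)[0, 1]
cv_r = np.corrcoef(cross_val_predict(model, X, y, cv=5), y)[0, 1]

print(f"in-sample r       = {in_sample_r:.2f}")  # ~1.0: any signal can be matched
print(f"cross-validated r = {cv_r:.2f}")         # ~0.0: the fit was an artifact
```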
I view this as compelling evidence that current models are more than "stochastic parrots," because, as the OP shows, they are learning to model the world in ways that are similar (up to a linear transformation) to those exhibited by the human brain. The OP's findings, in short (a rough sketch of this linear-mapping recipe follows the list):
* A linear transformation of a speech encoder's embeddings closely aligns them with patterns of neural activity in the brain's speech areas in response to the same speech sample.
* A linear transformation of a language decoder's embeddings closely aligns them with patterns of neural activity in the brain's language areas in response to the same language sample.
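A rough sketch of what that recipe looks like in code (all shapes and data here are assumptions for illustration; the actual studies use ECoG/fMRI recordings and GPT-style embeddings, not this synthetic setup): fit a regularized linear map from embeddings to neural activity, then score it by correlation on held-out samples.

```python
# Minimal "linear alignment" sketch: ridge regression from embedding vectors
# to (synthetic) neural activity, scored by held-out correlation per electrode.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_words, emb_dim, n_electrodes = 5000, 768, 100

embeddings = rng.normal(size=(n_words, emb_dim))   # one embedding per word
weights = rng.normal(size=(emb_dim, n_electrodes))
# Weak linear relationship plus noise stands in for "brain data".
neural = embeddings @ weights * 0.02 + rng.normal(size=(n_words, n_electrodes))

X_tr, X_te, Y_tr, Y_te = train_test_split(embeddings, neural, test_size=0.2, random_state=0)
pred = Ridge(alpha=10.0).fit(X_tr, Y_tr).predict(X_te)

# One correlation per electrode between predicted and "recorded" activity.
r = [np.corrcoef(pred[:, i], Y_te[:, i])[0, 1] for i in range(n_electrodes)]
print(f"median held-out encoding correlation: {np.median(r):.2f}")
```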
Yeah, I have always firmly maintained that there is less fundamental difference between LLMs and human brains than most people seem to assume.
Going a bit further, I'll speculate that the actions made by a human brain are simply a function of the "input" from our ~5 senses combined with our memory (obviously there are complications such as spinal reflexes, but I don't think those affect my main point). Neural nets are universal function approximators, so can't a sufficiently large neural net approximate a full human brain? In that case, is there any merit to saying that a human "understands" something in a way that a neural net doesn't? There's obviously a huge gap between the two right now, but I don't see any fundamental difference besides "consciousness" which is not well defined to begin with.
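The textbook illustration of that UAT point, at a completely toy scale (nothing brain-like about it, and the function and network sizes are arbitrary choices): a single hidden layer can approximate a smooth one-dimensional function reasonably well.

```python
# Toy universal-approximation demo: one hidden layer fitting sin(x).
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-np.pi, np.pi, size=(2000, 1))
y = np.sin(X[:, 0])

net = MLPRegressor(hidden_layer_sizes=(100,), activation="tanh",
                   solver="lbfgs", max_iter=5000, random_state=0).fit(X, y)

X_test = np.linspace(-np.pi, np.pi, 7).reshape(-1, 1)
for xi, yi in zip(X_test[:, 0], net.predict(X_test)):
    print(f"x = {xi:+.2f}   net(x) = {yi:+.3f}   sin(x) = {np.sin(xi):+.3f}")
```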
The UAT is a pretty weak result in practice. A lot of systems have the same property, and most of them are pretty poor approximators in practice. It may very well be that no reasonable amount of computing power allows approximating the "function of consciousness". Plus, if you're a certain kind of dualist the entire idea of a compact, smooth "consciousness" function may be something you reject philosophically.
I agree there are issues with the UAT, but I feel like my conclusion is still valid: a neural net, given the memories and senses that a human has, is capable of approximating a human's response accurately enough to be indistinguishable from another human, at least to another human.
I philosophically reject the notion that consciousness is an important factor here. The question of whether or not you have a consciousness doesn't affect what I take away from this conversation, and similarly the question of whether an AI has a consciousness doesn't affect what I take away from my actions with it. If the (non-)existence of others' consciousnesses doesn't materially affect my life—and we assume that it's a fundamentally unanswerable question—why should I care other than curiosity?
> a neural net, given the memories and senses that a human has, is capable of approximating a human's response accurately enough to be indistinguishable from another human, at least to another human.
That doesn't remotely follow from the UAT and is also almost certainly false.
> current models are more than "stochastic parrots"
I believe the same, and I'm also willing to accept that the human brain can intentionally operate in a stochastic-parrot mode.
Some people have the ability to speak fluently non-stop, completely impromptu. I wonder if it's similar to an LLM pipeline, where there's a constant stream of thoughts being generated based on very recent context, which are then passed through various output filters.
> I view this as compelling evidence that current models are more than "stochastic parrots,"
More evidence against "stochastic parrots":
- zero shot translation, where LLMs can translate between unseen pairs of languages
- repeated sampling of responses from the same prompt - which shows diversity of expression with convergence of semantics
- reasoning models - solving problems
But my main critique is that they are better seen as pianos, not parrots. Pianos don't make music, but we do. And we play the LLMs on the keyboard like regular pianos.
How human brains process thoughts is non-uniform across the population. There's imagery, written language, sound, speech, tactile, etc. Not everything that you think about is readily expressible in your language. There are definitely people with and without an "internal screen", and probably a few more types with/without X, where X is a set of things we've never talked about, either assuming everyone has it or not realizing that it's a non-mandatory part of how you think.
That's not really what I'm saying. What I'm asking is: how does the brain look when you do both? Is there a clear difference? There's no 'thinking mode' followed by a 'language processing mode'.
Language processing is thinking as far as the brain is concerned, and there's no evidence that these are 2 cleanly separated processes, whether you 'think' in words or not.
That's 100% false: dogs and pigeons can obviously think, and it is childish to suppose that their thoughts are a sequence of woofs or coos. Trying to make an AI that thinks like a human without being able to think like a chimpanzee gives you reasoning LLMs that can spit out proofs in algebraic topology, yet still struggle with out-of-distribution counting problems that frogs and fish can solve.
The correlations are 0.25-0.5, which is quite poor (scatter plots of Gaussian data with those correlations look like noise). That's before analyzing the methodology and assumptions.
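If anyone wants to eyeball that claim, here's a quick synthetic scatter (my own toy numbers, not the paper's data) at r = 0.3 and r = 0.5; note that r = 0.3 explains only about 9% of the variance.

```python
# Toy visual: bivariate Gaussian samples at r = 0.3 and r = 0.5.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
fig, axes = plt.subplots(1, 2, figsize=(8, 4))
for ax, r in zip(axes, (0.3, 0.5)):
    cov = [[1.0, r], [r, 1.0]]
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=1000).T
    ax.scatter(x, y, s=5, alpha=0.5)
    ax.set_title(f"r = {r}  (r^2 = {r * r:.2f} of variance explained)")
plt.show()
```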
Whether a correlation of 0.25-0.5 is poor is very problem-dependent.
For example, in difficult perceptual tasks ("can you taste which of these three biscuits is different" [one biscuit is made with slightly less sugar]), a correlation of 0.3 is commonplace and considered an appropriate amount of annotator agreement to make decisions.
Yes, for certain things like statistical trading (assuming some kind of "nice" Gaussian-like distribution), where you have lots of trades and just need to be right more often than wrong, it's probably useful.
Not here though, where you are trying to prove a (near) equivalence.
I do not understand what you find convincing about this that changes your mind.
We have a closed system that we designed to operate in a way that is similar to our limited understanding of how a portion of the brain works, based on how we would model that part of the brain if it had to traverse an nth-dimensional array. We have loosely observed it working in a way that could roughly be defined as similar to our limited understanding of how a portion of the brain works given that limitation that we know is not true of the human brain, with a fairly low confidence level.
Even if you put an extreme level of faith into those very subpar conclusions and take them to be rigid... That does not make it actually similar to the human brain, or any kind of brain at all.
My mildly grumpy opinion: this is not the first paper to show correlation between brain activity and the layers of a transformer. I know that Wang et al. (2024) did it last year[1], but I doubt they're the only ones - I just have them in my head because I was reading their paper last week. Bonus fact: Wang et al.'s paper also shows that test scores are a relevant factor in said correlation.
The point that always comes to mind is: correlation does not imply causation. I guess the main contribution would be a better mapping of the areas of the brain associated with speech production, but jumping from "these two things correlate" to "these two things are essentially the same" seems to me a bit of a stretch.

[1] https://arxiv.org/pdf/2407.10376
It is somewhat ironic that they had to use an OpenAI model for this research. At the same time, this gives nice continuity from earlier works that demonstrated similar, smaller scale, results using GPT-2.
So it's neuronal activity from intracranial electrodes, during an active conversation. And they found causal-chain-like patterns in the neuronal activity that produces the speech (and presumed thought) in the conversation, which compare "favourably" with the LLM.
Ok, I buy it. The sequencing necessary to translate thought to words necessarily imposes a serialisation, which in consequence marshals activity into a sequence, which in turn matches the observed statistically derived LLM sequences.
I tend to say the same things. I often say "this AGI is bullshit", and the occurrence of "bullshit" after the acronym AGI is high. I would be totally unsurprised if the linear sequence of neuronal signalling needed both to think "AGI is bullshit" and to emote it as speech or even "charades"-style physical movements in some way mimics that of an LLM, or vice versa.
OK, that's pretty cool research from Google; I hope this leads to even more discoveries about the brain. Hopefully it's time we get a better understanding of our brains and how to hack them.
Due to my current condition, I feel that I could do more both for myself and the world, but unfortunately motivation plays a big role; otherwise I have to trick myself into feeling stressed, or feeling observed, in order to do things like work that might be boring.
So many reasons: absorb information faster; improve spatial visualization; intrinsic motivation hacking; simulations; etc.
Give me the code to my brain and let me edit it, with version control please :D
Meditation, if you want to try a drug-free approach.
Make it simple. Stare at a clock with a big second hand. Take one breath every 15 seconds. Then, after a minute or so, push it out to 20 seconds, then one every 30 seconds.
For the 30, my pattern tends to stabilize on inhale for 5-7, hold for 5-7, and then a slow exhale. I find that after the first exhale, if I give a little push I can get more air out of my lungs.
Do this once a day, in a 7-10 minute session, for a week, and see if things aren't a little different.
Brains are already hackable in multiple senses. Namely through exogenous chemicals, electrical stimulation, hypnosis, and combinations of these. They aren’t necessarily reverse engineerable, which is what computational models like LLM-tied ones would enable, but they are hackable. We have the Cold War to thank for that.
You're mostly driven by bodily conditions and hormones. A computer recording of you isn't going to behave the same because it has no particular motivation to behave any specific way in the first place.
To be noted, if you accept a brain upload made gradually you should also accept a brain upload made discontinuously. If the same brain state comes out, the process can't actually matter.
> This is something that Chomsky got very wrong, and the statistical/ML crowd got very right.

But still, grammars are a very useful model.
The first 5 minutes of this video do a good job of explaining what I'm getting at: https://www.youtube.com/watch?v=YNJDH0eogAw
You would also think emphasizing grammar's usefulness would make it plain that I do not think it is a waste of time.
> Going a bit further, I'll speculate that the actions made by a human brain are simply a function of the "input" from our ~5 senses combined with our memory [...]

What is your basis for this? Do you have any evidence or expertise in neuroscience to be able to make this claim?
> Neural nets are universal function approximators, so can't a sufficiently large neural net approximate a full human brain?
We do not understand the brain well enough to make this claim.
> but I don't see any fundamental difference besides "consciousness" which is not well defined to begin with.
Yeah, besides the gaping hole in our current understanding of neuroscience, you have some good points, I guess.
> I view this as compelling evidence that current models are more than "stochastic parrots"

(My utterly uninformed knee-jerk reaction here, but even if I were a true believer I don't think I'd reach for "compelling".)
If you add random rolls, you get a Gaussian, thanks to the central limit theorem.
If you multiply them, you get a lognormal distribution, which approximates a power law up to a cutoff.
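A quick numerical check of both statements (toy numbers, and note the sum-vs-product distinction): sums of i.i.d. rolls come out symmetric and Gaussian-looking, while products of positive i.i.d. factors come out heavily right-skewed with a roughly Gaussian logarithm, i.e. lognormal.

```python
# Toy check: sums go Gaussian (CLT), products of positive factors go lognormal.
import numpy as np

def skew(x):
    """Sample skewness: roughly 0 for symmetric, Gaussian-like data."""
    return float(((x - x.mean()) ** 3).mean() / x.std() ** 3)

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=(100_000, 50))        # 50 dice per trial
factors = rng.uniform(0.5, 1.5, size=(100_000, 50))   # 50 positive factors per trial

sums = rolls.sum(axis=1)
products = factors.prod(axis=1)

print(f"skew(sums)          = {skew(sums):+.2f}")              # ~0: Gaussian-ish
print(f"skew(products)      = {skew(products):+.2f}")          # clearly positive: heavy right tail
print(f"skew(log(products)) = {skew(np.log(products)):+.2f}")  # ~0: lognormal on the original scale
```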
I see this as: maybe it's not a statistical parrot, but it's still only some kind of parrot. Maybe a sleep-deprived one.
Nootropics may help with stimulation, as well as memory and cognition in general. There is a whole community dedicated to them, with effective stacks.