Ways LLMs Do and Don't Seem Like the Human Brain

This is largely a repost, with a hopefully more compelling introduction.

For about a year, I've thought a lot about ways LLMs do and don't seem comparable to the human brain. It's obvious that LLMs aren't exactly like the human brain, because there's no way to just hook one up to a spinal cord and get something that competently pilots a human body. But it's also very suspicious that training LLMs on language corpora eventually causes them to respond to language in ways deeply similar to how human beings do. Between this and other obvious comparisons, such as those between biological and digital neurons, it's hard not to wonder if brains and LLMs tap into some of the same basic cognitive strategies.

I'm going to explore some much more fine-grained similarities and differences I believe I've noticed between the two systems over the course of this post. The reason I'm doing this is that I think it's potentially quite helpful for thinking about human psychology, AI research in general, and alignment research in particular. For example, there turn out to be connections between the concept of "agency" as in agent foundations and agency as in "you can just do things," which can inform our intuitions about developing human and AI agency alike.

Other comparisons can be clarified through this frame as well, such as the connection between RLHF and human socialization processes, or our notion of what it means for humans and LLMs alike to be "deceptive," either to themselves or to each other.

But a lot of those juicy details will have to wait for other posts. This time, I'm just going to go through a systematic rundown of the architectural similarities and differences between the two systems. Hopefully we can get into more behavioral comparisons in later posts. That said, let's get into it.

Similarities

Probably the most important claim to establish up front is that I think LLMs are specifically a lot like the human cortex. One reason to suspect this is that both seem to be "trained" using highly general learning algorithms. This is a hypothesis some LessWrong users have been entertaining for nearly a decade now, on the basis of observations like the simple, uniform architecture that pervades the human cortex. This post additionally points out that the cortex seems to adapt to process whatever information is actually being fed into it; e.g., blind people don't develop visual regions of their cortices, and when one region of the cortex is killed off, its functions are taken over by other regions instead.

This observation resonates with how the algorithms we use to train artificial neural networks, namely backpropagation and gradient descent, are also capable of teaching systems an extremely wide variety of behaviors, given the right loss function and training data. Indeed, systems like LLMs are effectively massive virtual webs of neurons which are trained by a single, highly general learning algorithm to appropriately process whatever data is fed into them. So that's one set of reasons to suspect important similarities with the human cortex.
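
To make the "general learning algorithm" point concrete, here's a minimal PyTorch-style sketch (the model shape, data, and hyperparameters are arbitrary placeholders). The backprop-plus-gradient-descent loop itself is completely task-agnostic; only the loss function and the data determine what gets learned.

```python
# Minimal sketch: the update rule is the same no matter what's being learned;
# only loss_fn and the (inputs, targets) data encode the task.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

def train_step(inputs, targets, loss_fn):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)  # task-specific part
    loss.backward()                         # backpropagation: task-agnostic
    optimizer.step()                        # gradient descent: task-agnostic
    return loss.item()

# The identical loop trains a classifier or a regressor; only the loss and data change.
train_step(torch.randn(8, 32), torch.randint(0, 10, (8,)), nn.CrossEntropyLoss())
train_step(torch.randn(8, 32), torch.randn(8, 10), nn.MSELoss())
```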

Another relevant datapoint is that, in both humans and LLMs, merely increasing the scale of the neural network seems to yield significantly improved performance. Gwern made this observation in his classic post The Scaling Hypothesis. It's now empirically obvious that simply scaling up LLMs makes them capable of much higher performance, and intriguingly, this at least superficially aligns with how human brains are in significant part just scaled up monkey brains. The human cortex is in fact significantly larger than that of any other primate, and practical constraints on head size are often considered a primary bottleneck that kept human intelligence from scaling any further over the course of evolution.

One can't rule out more basic algorithmic improvements as potential contributors to humanity's comparative intelligence. Indeed, humans do seem vastly more capable of combining words into complex sentences than most (though not all) species of monkeys, although monkeys are known to be able to understand and correctly use individual sign language gestures just fine. It's possible that, as we were evolving from our ancestors, our brains made an algorithmic innovation similar to that made by transformers, one which enabled the different words in a sentence to efficiently contextualize each other. And of course, besides that, humans also developed tongues and mouths which were better for articulating verbal language as well.

Nevertheless, the relative scale of the human cortex is at least evidence that it's running a universal learning algorithm that makes effective use of whatever neural resources it's allocated, which would be another similarity with modern deep learning systems.

There's also some evidence that the specifics of the learning algorithms each system runs are quite similar. For example, LLM chat models are trained using reinforcement learning, as a way of converting them from next-word predictors into chatbots. Reinforcement learning also seems to be a major component of how humans acquire their values. Indeed, the concept of reinforcement actually had its origins in human psychology, where scientists were trying to pin down the details of the intuitively obvious process whereby humans learn to pursue what they find pleasurable and avoid what they find aversive.

(The fact that we learn to avoid thinking certain painful thoughts is evidence that this RL process applies to our higher-level cognition as well, which suggests involvement from the cortex.)

Somewhat more speculatively, it's possible that concurrently with this RL process, humans also implement something like predictive learning; this process would in some ways be analogous to the learning method used for pretraining an LLM base model. This is basically the thesis of the predictive processing faction in cognitive science: the way humans update their models of the world is by trying to predict their own incoming sense data, and improving those models based on the failures of those predictions. Here's a Scott Alexander post where he goes over some evidence for this hypothesis from cognitive science.[1]
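
For the LLM half of the analogy, the pretraining objective really is just "predict the next token, then update on the error." Here's a minimal sketch of that loss, assuming a `model` that maps token ids to next-token logits:

```python
# Next-token prediction: the "predictive learning" that trains a base model.
import torch
import torch.nn.functional as F

def pretraining_loss(model, token_ids):
    # token_ids: (batch, seq_len) chunk of the training corpus
    inputs, targets = token_ids[:, :-1], token_ids[:, 1:]
    logits = model(inputs)                    # (batch, seq_len - 1, vocab_size)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # the model's prediction at each position
        targets.reshape(-1),                  # the token that actually came next
    )
```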

It's at the very least intuitively obvious that humans have some kind of mechanism for automatically and appropriately updating their beliefs when they make surprising observations. If it were to turn out that there was a barrier around the Earth preventing anyone from leaving, like in Unsong, people would be shocked by this observation, but they'd also pretty quickly adjust their expectations for space travel, as well as their cosmological theories, in light of the new evidence.

My layman's impression is that the neurological evidence on predictive processing theory is still fairly modest. But either way, the simple fact that LLMs can use very similar algorithms to power predictive learning and reinforcement learning is evidence as well, since it suggests that if the cortex has the capacity to learn from one kind of feedback (RL), it might not be hard to teach it using another (predictive learning).

The last and most speculative comparison I want to make is that humans may have something computationally analogous to a "context window", a record of recently experienced sensations which effectively gets fed into the "hidden layers" of the cortex to produce thoughts and actions. Some everyday observations support this: people talk about "context-switching" when moving from one task to another,[2] and often stumble through a complex task until they've loaded enough of its details back into their working memory.

This would effectively mean that humans, like LLMs, actually have two main ways their outputs can be conditioned: the information they've been trained on historically, and the information they're currently being "prompted with". Notably, this also maps onto an existing intuition people have about human psychology: the contrast between crystallized and fluid intelligence. Crystallized intelligence has to do with competencies one could access just by prompting humans with the right situations or sense data; they don't need to "learn anything new" to access it. Fluid intelligence is more like updates to the weights of the neural network itself: the sample efficiency of a human's "training process" when learning some new skill or piece of information from scratch. Again, this points to humans having a "context window" that's algorithmically distinct from the synaptic "weights" in their cortex.
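
Here's a rough sketch of that distinction in code, reusing the hypothetical `model` and `pretraining_loss` interfaces from the earlier sketch: the "crystallized" channel changes behavior by changing only the context, while the "fluid" channel changes the weights themselves.

```python
import torch

# "Crystallized" channel: condition on what's in the context window; weights untouched.
def respond(model, prompt_ids):
    with torch.no_grad():
        return model(prompt_ids).argmax(dim=-1)  # behavior varies with the prompt alone

# "Fluid" channel: condition by updating the network itself (one SGD step here).
def learn(model, token_ids, loss_fn, lr=1e-4):
    loss = loss_fn(model, token_ids)
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                p -= lr * p.grad                 # the "synapses" are now different
                p.grad = None
```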

Anyway, so much for architectural similarities.

Differences

I'll start by quickly noting that I doubt the specific "universal learning algorithm" the cortex uses involves backprop in particular (backprop being an algorithm used to train neural networks). I can't think of a clear theoretical reason backprop would lead to the kind of regional specialization we see in the cortex, and I'm not aware of empirical evidence that multimodal ANNs develop different "lobes" for, e.g., processing images vs. text. If the universal learning hypothesis is true, the learning algorithm in question needs to somehow give rise to the local specialties we see in the human cortex in practice. I only have vague guesses about what the algorithm in question could be, and they don't merit attention in a summary post like this.

Anyway, let's move on to more concrete differences between current LLMs and the human cortex. One such difference is that humans continue to learn over the course of their lifetimes. I've heard this termed neuroplasticity, which current LLMs lack insofar as their weights are frozen during deployment. You can still in some sense "teach" a deployed LLM new things by feeding information it wasn't previously aware of into its context window, but this is transient, and the model may be able to glean somewhat different information that way than it would if it were actually trained on that information. By contrast, the (synaptic) "weights" in the "hidden layers" (the cortex) of the human brain can be edited in deployment; something similar for LLMs would probably help them deal better with problem domains that weren't present in their training data, thereby giving the LLM-plus-learning system more generalizable capabilities than the LLM alone.

Another learning-related difference is that, whereas LLMs are generally trained via predictive learning and then reinforcement learning, the cortex apparently undergoes RL and its equivalent of PL (whatever that turns out to be) simultaneously. If the PL process is implemented by means of literal next-sensation prediction, this means the brain probably needs at least two distinct streams of outputs coming out of the cortex: predictions of incoming sense data, and commands for action. I don't see any reason you couldn't build an artificial neural network like this; indeed, it might even let you teach a model new information as effectively as predictive learning does, while it nevertheless maintains its chat output capabilities. But this predictive-output/chatbot-output duality isn't how most LLMs are currently built, as far as I'm aware.
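
Here's a hypothetical sketch of what such a dual-output network could look like: a shared trunk feeding a predictive head (scored on next-sensation prediction) and an action head (scored by a simple REINFORCE-style reinforcement term), trained on a single combined loss. The architecture and losses here are illustrative guesses, not a description of any existing system.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualHeadNet(nn.Module):
    """One shared trunk, two output streams: sensory predictions and actions."""
    def __init__(self, obs_dim=64, hidden=128, n_actions=8):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.predict_head = nn.Linear(hidden, obs_dim)   # "what will I sense next?"
        self.action_head = nn.Linear(hidden, n_actions)  # "what should I do?"

    def forward(self, obs):
        h = self.trunk(obs)
        return self.predict_head(h), self.action_head(h)

def combined_loss(net, obs, next_obs, action, reward):
    # obs, next_obs: (batch, obs_dim); action: (batch,) previously taken; reward: (batch,)
    pred, action_logits = net(obs)
    pl_term = F.mse_loss(pred, next_obs)           # predictive-learning signal
    log_prob = torch.distributions.Categorical(logits=action_logits).log_prob(action)
    rl_term = -(reward * log_prob).mean()          # reinforcement signal (REINFORCE-style)
    return pl_term + rl_term                       # both streams shape the same trunk
```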

Even if the cortex doesn't truly "output predictions" to compare against real sense data, though, there's another output channel it likely has besides raw commands for motor outputs. Namely, imagined sense data can be fed back into the equivalent of our context windows and used to prompt further thoughts. This can include the inner monologue, mental images, imagined tactile sensations, or whatever else.
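
Mechanically, the loop being described here is simple: the system's own outputs get appended to its context and become part of its next input. A minimal sketch, assuming a `model` that returns a single next output id for a given context:

```python
# "Imagination" as a feedback loop: outputs re-enter the context window as inputs.
def think(model, context, n_steps=16):
    for _ in range(n_steps):
        imagined = model(context)        # an inner-monologue word, a mental image, etc.
        context = context + [imagined]   # ...which immediately becomes new "sense data"
    return context
```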

Modern reasoning models do have something akin to this in the form of the chain of thought. However, unlike the chain of thought, the human imagination runs fully concurrently with our non-CoT outputs. Additionally, even to the extent that our imagined thoughts are verbal, they don't always seem to exhibit the linear, grammatically coherent properties we've imposed on LLM chains of thought. It's more like a rapid stream of associative impressions, likely still trained in large part by a mix of RL and PL, but not precisely akin to the chains of thought in modern reasoning models.

(I imagine this has some effect on how humans vs. LLMs describe their inner states. Humans are constantly being "reminded" of things they associate with the contents of their context window, loading those associated sensations into context. I would guess this is part of where we get our sense that any given token of experience has a unique "flavor", qualia, or "something-that-it's-like" associated with it. But that's a huge conversation for another post.)

On the topic of the structure of our respective context windows, it's worth noting that the human cortex is obviously much more multi-modal than a pure LLM. Our experiences are "tokenized" in terms of different types of inputs corresponding to each of our senses, none of which are uniquely specialized for taking in language in the way that LLM tokenization schemes are. (This is related to how we can learn to interpret information from basically any sensory modality as language, from writing to speech to something like braille.) Still, many frontier models have been trained to make sense of both text inputs and visuals, which at least confirms that this algorithmic difference between LLMs and the cortex doesn't strictly imply more fundamental differences elsewhere.

Another unique property of the human context window is that we seem to have dedicated "tokens" which get placed into context whenever we're undergoing positive or negative reinforcement: tokens we call pleasure and pain. It's not that current LLMs can't behaviorally simulate people who have pleasure or pain tokens in their context, but humans seem to have dedicated sensations for them. These pleasure/pain tokens make us aware of when we're being positively or negatively reinforced, and prompt us to react accordingly.
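
To illustrate the claim with a purely hypothetical sketch (the special token ids and the interface are made up for illustration), the key point is that the reinforcement event itself shows up in the input stream:

```python
PLEASURE_TOKEN = 50001  # hypothetical dedicated token id
PAIN_TOKEN = 50002      # hypothetical dedicated token id

def experience_step(context, sensation_ids, reward):
    """Append new sensations, plus a dedicated token whenever reinforcement occurs."""
    context = context + sensation_ids
    if reward > 0:
        context.append(PLEASURE_TOKEN)   # the system "notices" the positive reinforcement
    elif reward < 0:
        context.append(PAIN_TOKEN)       # ...or the negative reinforcement
    return context
```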

This model of pleasure and pain has implications for questions like the ethics of training models with gradient descent. Gradient descent is a kind of negative feedback. Unlike with pain in humans, though, it doesn't make the model "aware" that it's being negatively reinforced. I think this fact probably makes training models less inherently immoral, although the act of altering a system's values at all might still be considered questionable under certain frameworks, regardless of whether pain is inflicted.[3] But once again we're brushing up against the scope of this post...

There are several other algorithmic differences I'm tempted to speculate about, but I'll settle for just one more. Currently, LLMs are trained with just predictive learning and reinforcement learning. However, humans seem to have one more learning paradigm on top of these, which I call reflective learning. It seems like we have the ability to update our conceptual webs, or the configuration of the "hidden layers" of our cortices, just by thinking. For example, consider the act of turning a math problem over in your mind until you make the critical insight (a la grokking), or building up a concept of a fictional setting without ever writing any of your ideas down.

I don't know what kind of training objective the brain would use to implement this kind of learning. I don't even know if a "training objective" is the right way to think about the process by which it makes these updates; it could be something more Hebbian. But this does seem important for "reaching true AGI": it seems like a method that enables humans to continue learning without actually encountering any new training data, albeit more slowly, per the usual empiricism > theorizing dynamics. This could even be a core way humans develop novel insights via research over time, in contrast to LLMs, which have made almost no big, original insights whatsoever despite having been trained on the entire corpus of human knowledge. So I wish more people were working on this problem.

The last point I'll make is that the cortex is obviously just one part of the human brain, and it's embedded in something akin to an "agent framework" that makes the cortex's cognitive prowess actually useful. For example, external to the cortex is the cerebellum, which seems to run a different algorithm for learning muscle memory in particular; probably it takes something like "action suggestions" or short-term goals from other parts of the brain and implements them in detail, before actually commanding the body to do its thing.[4] There are also regions like the medulla which regulate things like heart rate and salivation more or less unconsciously.

I wonder if LLMs have basically cracked a large chunk of the algorithm behind abstract human cognition, but many other insights are needed for actually piloting something like a human body effectively; hence robotics not having been totally revolutionized by LLMs...

Conclusion

I'm aware that that was probably A Lot. Indeed, I once wrote a post that was literally over six times longer than this one to explain these thoughts in more detail, but it was unreadable and a waste of my time, and as a result I'm kind of averse to trying to explain these ideas at a lower level of density. But if you couldn't tell, I've been kind of obsessed with this compare-contrast project for a long time at this point; it's the primary frame I use when trying to generate insights into human psychology, "AGI" development, and AI alignment.

My instinct is that it might enable cross-pollination of insights between fields like self-improvement and agency development for human beings, and recursive self-improvement and agent foundations for AI research. For another angle: the recent fixation of big AI labs[5] on the "personalities" of their models also seems striking, since "personality" is an extremely anthropomorphic concept.

But again, I'll save my thoughts on these topics for later posts. I hope you've found this one insightful.

  1. ^

    Scott himself has discussed the connection between base models and predictive processing theory here.

  2. ^

    "Context-switching" actually hails from operating system jargon, not ML jargon, but the latter metaphor works even better.

  3. ^
  4. ^

    This is probably why muscle memory feels so much less "conscious" than other kinds of memory. Most of our intelligence routes through a single, highly general algorithm in the cortex, and that's what we're conscious of. Not so for muscle memory in particular.

  5. ^

    Not to mention Janus, who provided the game-changing simulators framework for base models, and who has been tending to use the personality framework for chat models more and more often ever since the Claude 3 series dropped...
