Does GPT-4 Really Understand What We’re Saying?

One question for David Krakauer, president of the Santa Fe Institute for complexity science, where he explores the evolution of intelligence and stupidity on Earth.

*Photo courtesy of David Krakauer*

Does GPT-4 really understand what we’re saying?

“Yes and no” is the answer to that. In my new paper with computer scientist Melanie Mitchell, we surveyed AI researchers on the idea that large pretrained language models, like GPT-4, can understand language. When they say these models understand us, or that they don’t, it’s not clear that we’re agreeing on our concept of understanding. When Claude Shannon was inventing information theory, he made it very clear that the part of information he was interested in was communication, not meaning: You can have two messages that are equally informative, with one having loads of meaning and the other none.
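Shannon's point can be checked with a small sketch. It assumes a deliberately crude model in which a message's information content is estimated from its character frequencies alone; that model, the function name `entropy_bits_per_char`, and the sample sentence are illustrative assumptions, not something taken from the paper. Under that assumption, a sentence and a scrambled copy of it carry the same measured information, though only one of them means anything.

```python
import math
import random
from collections import Counter


def entropy_bits_per_char(message: str) -> float:
    """Empirical Shannon entropy of the character frequencies, in bits per character.

    Assumption: characters are treated as independent draws, which ignores
    word order entirely. That is exactly why meaning drops out of the measure.
    """
    counts = Counter(message)
    total = len(message)
    # Sort the counts so the sum is accumulated in the same order for any
    # permutation of the same characters.
    return -sum((n / total) * math.log2(n / total) for n in sorted(counts.values()))


meaningful = "can you pass me the spoon"

# Scramble the same characters with a fixed seed so the run is repeatable.
chars = list(meaningful)
random.Random(0).shuffle(chars)
scrambled = "".join(chars)

h_meaningful = entropy_bits_per_char(meaningful)
h_scrambled = entropy_bits_per_char(scrambled)

print(f"{meaningful!r}: {h_meaningful:.3f} bits/char")
print(f"{scrambled!r}: {h_scrambled:.3f} bits/char")
```

The two printed figures match because the measure only sees how often each symbol occurs, not what the symbols are arranged to say.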

There’s a kind of understanding which is just coordination. For example, I could say, “Can you pass me the spoon?” And you’d say, “Here it is.” And I say, “Well, you understood me!” Because we coordinated. That’s not the same as generative or constructive understanding, where I say to you, “I’m going to teach you some calculus, and you get to use that knowledge on a problem that I haven’t yet told you about.” That goes beyond coordination. It’s like: Here’s the math—now apply it in your life.

So understanding, like information, has several meanings—more or less demanding. Do these language models coordinate on a shared meaning with us? Yes. Do they understand in this constructive sense? Probably not.

I’d make a big distinction between super-functional and intelligent. Let me use the following analogy: No one would say that a car runs faster than a human. They would say that a car can move faster on an even surface than a human. So it can complete a function more effectively, but it’s not running faster than a human. One of the questions here is whether we’re using “intelligence” in a consistent fashion. I don’t think we are.

The other dimension to this for me is what I call the standard model of science, which is parsimony, versus the machine learning model of science, which is megamony. Parsimony says, “I’m going to try and explain as much as I can, with as few resources as possible.” That means few parameters, or a small number of laws or initial conditions. The root of parsimony, by the way, is frugal action. And these language models are exactly the opposite of that. They’re massive action. They’re trained on a huge but very restricted domain: text. Most human understanding maps textual experience onto somatosensory experience. So if I say to you, “That’s painful,” you don’t map that onto another word. You map that onto a sensation. Most human insight is conveying, through tags or words, experience that has an emotional or physical aspect. And GPT-4 is just finding correlations across words.

Another dimension of difference, which is very important, is that our cognitive apparatus evolved. The environment that created the attributes that we describe as intelligence and understanding is very rich. Now look at these algorithms. What is their selective context? Us. We are cultural selection on algorithms. They don’t have to survive in the world. They have to survive by convincing us they’re interesting to read. That evolutionary process that’s taken place in the training of the algorithm is so radically different from what it means to survive in the physical world, and that’s another clue. In order to reach even remotely plausible levels of human competence, the training set that these algorithms have been presented with exceeds what any human being in a nearly infinite number of lifetimes would ever experience. So we know we can’t be doing the same thing. It’s just not possible. These things are Babel algorithms. They live in the land of Borges’ Library of Babel. They have the complete experience. They have access to all knowledge in the library. We do not.

The other fact to point out, apart from all the stuff that we adduce in the paper—brittle errors, mistakes the language models make that are telltale signs that we would never make—is humans do apply a mechanical reasoning to things. If I said to you, “There’s a trolley rolling down the hill, and there’s a cat in its path. What happens next?” You’re not just trying to estimate the next word, like GPT-4. You’re forming in your mind’s eye a little mental, physical model of that reality. And you’re going to turn around to me and say, “Well, is it a smooth surface? Is there lots of friction? What kind of wheels does it have?” Your language sits on top of a physics model. That’s how you reason through that narrative. This thing does not. It doesn’t have a physics model.

Now, the interesting point is perhaps in all of that richness of data, if we were ingenious enough, we could find a physics model. It’s tacit, implicit in its vast language database, but it doesn’t access it. You could say to it, “What physics model is behind the decision you’re now making?” And it would now confabulate. So the narrative that says we’ve rediscovered human reasoning is so misguided in so many ways. Just demonstrably false. That can’t be the way to go.

Lead image: studiostoks / Shutterstock

Brian Gallagher

Posted on March 27, 2023

Brian Gallagher is an associate editor at Nautilus. Follow him on X (formerly known as Twitter) @bsgallagher.