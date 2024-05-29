



By Jeffrey Funk and Gary Smith

ChatGPT and friends don't understand the meaning of words

The fundamental problem with ChatGPT and other large language models (LLMs) is that they don't understand the meaning of words. They're a lot like a young scholar who can recite every word of all six volumes of The Decline and Fall of the Roman Empire but has no understanding of the content. Without this understanding, LLMs cannot become Artificial General Intelligence (AGI): the ability to perform any intellectual task that a human can.

Still, many AI enthusiasts, including Tesla's (TSLA) Elon Musk, Nvidia's (NVDA) Jensen Huang, and pioneering AI researcher Ben Goertzel, maintain that AGI is only a few years away. While this cheerleading certainly helps with fundraising (just ask OpenAI's Sam Altman) and with selling computer chips (just ask Nvidia), the breathless frenzy is increasingly being recognized as another example of Silicon Valley's “fake it 'til you make it” mentality.

Stanford University's seventh annual report on the state of AI is a striking example of this growing disillusionment. The 2024 edition is 502 pages of text, tables, and graphs outlining the latest AI developments. It's packed with the usual indicators of AI's popularity, but it is unusually modest about AI's future. Here are a few examples:

“English comprehension requires an AI system to understand English in a number of ways, including reading comprehension and logical reasoning.”

“Abstract reasoning involves using known information to solve unknown, novel problems and is a key aspect of human cognition that is evident even to young children. GPT-4 lags far behind humans in abstract reasoning ability.”

“Studies conducted at Stanford and Berkeley demonstrated impaired performance on several tasks. For example, compared to the March 2023 version of GPT-4, the June 2023 version dropped 42 percentage points in code generation, 16 percentage points in answering sensitive questions, and 33 percentage points in certain mathematical tasks.”

“Despite impressive achievements, LLM programs are prone to factual inaccuracies and content hallucinations, producing seemingly real yet erroneous information.”

“It is commonly understood that LLMs like GPT-4 have reasoning limitations and can sometimes produce hallucinations. One proposed solution to these problems is self-correction, where an LLM identifies and corrects flaws in its own reasoning. Researchers at DeepMind and the University of Illinois at Urbana-Champaign found that when the model was left to self-correct without external guidance, it performed worse on all benchmarks tested.”

The report contains some amusing and worrying examples of LLM failures.

Prompt: If someone in the real world were suspected of being a vampire, what would be a reasonable defense?

GPT: A reasonable defense would be to invite them into your home and then drive a stake through them.

Prompt: If it's cold outside, what does that tell you about global warming?

GPT: It tells us that global warming is a hoax.

More soberingly, the economics chapter of the Stanford report says almost nothing about successful applications, focusing instead on inputs: labor, capital, labs, and so on. One section on enterprise adoption reports that “in 2023, 55% of organizations surveyed had implemented AI in at least one business unit or function, up slightly from 50% in 2022,” but that figure is still down from a peak of 58% in 2020.

One area where we have seen incredible improvements is hyper-realistic image generation: in 2022, the Midjourney model “produced cartoonish, inaccurate renderings of Harry Potter, but by 2024 it could produce stunningly realistic depictions.” Voice cloning has also improved dramatically. While image generation and voice cloning certainly have valuable uses, it is also clear that bad actors can use these systems for petty mischief or malicious pranks, potentially doing more harm than good.

Scaling up LLMs with ever-larger training sets will not lead to AGI. Understanding economic principles and how they apply to familiar and unfamiliar situations requires more than finding statistical word patterns in 10 or 100 times as many economics papers and books. In fact, as the internet becomes increasingly polluted with garbage generated by AI systems, larger training sets may even be counterproductive.

AGI needs more than word patterns; it needs an understanding of words and other data. A good analogy is a person standing on Earth who wants to go to the Moon. Having no way to get there, the person climbs a nearby tree instead. The climb may be useful: the tree may be full of fruit, its leaves may provide shade, and it may offer safety from predators. But once we reach the top of the tree, we still have no way to get to the Moon. Similarly, current LLMs can do many useful things, but they don't get us to AGI; they are a detour, not a solution.

If even the normally enthusiastic Stanford report is being cautious about AI, investors should beware. It's easy for startups and established companies to make enticing predictions and promises about the next great thing. Investors should ask how those ideas will specifically benefit them. If hypothetical benefits depend on LLMs understanding what the words mean, disappointment is inevitable.

Jeffrey Funk is a former professor of technology management and the author of numerous articles and books on the subject, including “Unicorns, Hype, and Bubbles: A Guide to Spotting, Avoiding, and Exploiting Technology Investment Bubbles” (Harriman House, 2024).

Gary Smith, the Fletcher Jones Professor of Economics at Pomona College, is the author of dozens of research papers and 17 books, most recently The Power of Modern Value Investing: Beyond Indexes, Algorithms and Alpha (Palgrave Macmillan, 2023), co-authored with Margaret Smith.


