OpenAI, ChatGPT, LLM, AGI: Paradox Of Understanding Language

Exploring the meaning of language, understanding, and intelligence.

Mandar Karhade, MD, PhD
Towards AI

--

Everyone has been riding the hype around the utility of large language models. So many outrageous claims have been made, such as “ChatGPT will kill these jobs” or “ChatGPT will change the way we interact.” On the flip side, there has also been a lot of criticism of the text generated by these models, mostly from the point of view that they are mere probabilistic predictors of the next word or sentence. These models have produced outputs that sound good but are complete gibberish, essentially stochastic walks through a probability weight matrix. LLMs like GPT-3, PaLM, and BERT were created to explore and recreate the general structure of language while maintaining general knowledge as context. I must say that these models have achieved that goal damn well.

Photo by 愚木混株 cdd20 on Unsplash

Does that mean the model has understood an input?

Short answer: No. However, what it does mean is that the model has recognized the general context of the input sentence, relative to the previously observed distribution of sentences around it, and has created an output general enough to fit that historical context. This is likely what the G (General) in AGI (Artificial General Intelligence) stands for. Still, I am not comfortable calling it Intelligence. To call an entity, process, or flow intelligent, we probably need to go on a philosophical journey of answering “what is intelligence”.

According to Oxford, Artificial intelligence is “the theory and development of computer systems able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages.” Human intelligence is defined as “the ability to learn, understand and think in a logical way about things; the ability to do this well”. This definition, however, is not very operational. To make it operational, let’s build it on top of the less complex construct of “Understanding”. We could say that the process of understanding is learning. By defining both “understanding” and “learning”, we get good coverage of the complexity behind the definition of human intelligence.

The Logical Construct of Understanding

My hypothesis is that understanding must be more than the meaning of words. Walk through this logic with me:

A person has the intent to describe or express their thoughts. Words express that intent of describing the expanse of the thought. However, there is limited time and ability to capture every detail in the chosen words. Mathematically speaking, one could say that words are a lower-dimensional representation of the intent to express everything that needs to be expressed. In a good conversation, this lower-dimensional representation is sufficient to convey the thought. However, by definition, a lower-dimensional representation comes at the cost of information loss. Therefore, spoken, written, or otherwise expressed language inherently loses information in a two-way conversation.
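To make the dimensionality argument concrete, here is a minimal numpy sketch of my own (not anything from GPT-3 or any real model): it treats a batch of “thoughts” as high-dimensional vectors, squeezes them through a much smaller number of dimensions standing in for the chosen words, reconstructs them, and measures how much information was lost.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend each "thought" lives in a 512-dimensional space.
thoughts = rng.normal(size=(100, 512))

# Squeeze the thoughts through a much smaller "word" space via truncated SVD.
k = 16  # number of dimensions the words can carry
U, S, Vt = np.linalg.svd(thoughts, full_matrices=False)
compressed = U[:, :k] * S[:k]           # the lower-dimensional representation
reconstructed = compressed @ Vt[:k, :]  # best effort to recover the thought

# The reconstruction error is the information lost in the compression.
loss = np.linalg.norm(thoughts - reconstructed) / np.linalg.norm(thoughts)
print(f"relative information loss at k={k}: {loss:.2%}")
```

With unstructured random “thoughts” like these, most of the information is lost; a good conversation works only because real thoughts are structured enough that a few shared dimensions carry most of what matters.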

Further, these expressed thoughts are interpreted by the person listening to or reading the words (the lower-dimensional representation) in the context of the past and current situations that they have experienced. Let’s call the situation of the interpreter the situation of the receiver. That means the interpretation of the lower-dimensional representation is like a Bayesian posterior, where the condition is the situation of the receiver.
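That Bayesian framing can be written out as a toy example. The meanings, priors, and likelihoods below are invented numbers purely for illustration: the same words get read differently because the receiver’s situation acts as the prior.

```python
# Toy Bayesian reading of a message: P(meaning | words, receiver) is
# proportional to P(words | meaning) * P(meaning | receiver's situation).
# All numbers are invented for illustration only.

words = "it's freezing in here"

likelihood = {                      # P(words | meaning)
    "literal complaint about the temperature": 0.7,
    "hint to close the window": 0.6,
    "sarcastic remark": 0.2,
}

priors = {                          # P(meaning | situation of the receiver)
    "receiver sitting next to an open window": {
        "literal complaint about the temperature": 0.2,
        "hint to close the window": 0.7,
        "sarcastic remark": 0.1,
    },
    "receiver in the middle of a heated argument": {
        "literal complaint about the temperature": 0.2,
        "hint to close the window": 0.1,
        "sarcastic remark": 0.7,
    },
}

for situation, prior in priors.items():
    unnormalized = {m: likelihood[m] * prior[m] for m in likelihood}
    z = sum(unnormalized.values())
    posterior = {m: round(p / z, 2) for m, p in unnormalized.items()}
    print(f"{situation}: {posterior}")
```

The likelihood of the words never changes; only the receiver’s prior does, and that alone is enough to flip which interpretation wins.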

For understanding to be complete, or at least comparable to human communication, the lower-dimensional expression of the thought needs to retain enough information, and the situations of the emitter and the receiver need to be similar enough to avoid excessive distortion. In other words, they need to be similar enough not to increase the information loss. This means that true understanding requires input from the receiver, so that answers can be produced using the right context.

Intelligence

Coming back to the concept of Intelligence, it is an even higher-order task: a generalized ability to understand. In an applied sense, it is a function of the appropriateness of the output of a general language model in a given situation. The way I would describe it, there is not enough reason to test the intelligence of a model until we sort out the question of integrating the context of the receiver, which is much closer to the construct of Understanding.

Photo by Greg Rakozy on Unsplash

Is the Understanding Universal?

LLMs are able to encode and decode language in a way that preserves its generalized context and structure. As we discussed earlier, the context is borrowed from the textual representation of the surrounding data, where multi-head attention is used to approximate the meaning of the prompt. We also discussed that the representation of the “real-life environment” (truth) is only partially captured in the textual context, which limits the ability of a text-only LLM to resolve a good response to a prompt.
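For reference, this is roughly what “multi-head attention approximates the meaning of the prompt” looks like in code. The sketch below is a bare-bones numpy version of multi-head self-attention with random projection weights and no output projection; real LLMs learn these weights and stack many such layers.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(tokens, num_heads=4, seed=0):
    """Bare-bones multi-head self-attention over a (seq_len, d_model) array.
    The projection weights are random here; a trained LLM learns them."""
    rng = np.random.default_rng(seed)
    seq_len, d_model = tokens.shape
    d_head = d_model // num_heads
    outputs = []
    for _ in range(num_heads):
        # Random matrices stand in for the learned Q, K, V projections.
        w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
        q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
        # Each token attends to every token, weighted by similarity.
        scores = softmax(q @ k.T / np.sqrt(d_head))
        outputs.append(scores @ v)
    # Concatenating the heads gives each token a context-aware representation.
    return np.concatenate(outputs, axis=-1)

# A fake 6-token prompt embedded in 64 dimensions.
prompt = np.random.default_rng(1).normal(size=(6, 64))
contextualized = multi_head_attention(prompt)
print(contextualized.shape)  # (6, 64): each token is now a blend of the others
```

All the “context” this mechanism can ever use has to be present in the tokens themselves, which is exactly the limitation the rest of this section is about.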

Going one step further, we know that every person has a unique set of environmental representations. Those who live in cities respond to the same question differently than those who live in rural areas, yet both answers are appropriate from their own contextual points of view. Another example: those who are compassionate and those who lack compassion or empathy react to the same situations differently, yet from their own points of view, both are correct. This is where we start discussing the boundaries of subjective and objective AGI. Without non-textual context, an AGI is likely to be more objectively appropriate than subjectively appropriate.

I can say that the “process” of understanding is universal; the “outcome” of understanding, however, is subjective. There is a fine boundary between subjective and objective inference. Training on too much personalized environmental (non-textual) information may increase subjective accuracy but lose objective generalizability. This raises the onus of being correct while staying within the bounds of objectivity. Otherwise, the result will likely be an echo chamber where the model produces the outputs a user wants to receive.

Photo by Pawel Czerwinski on Unsplash

The Paradox of Understanding Language

So, here we are. The “language” provides the means to express the output of understanding. The “understanding” is the process (encoding what is read or heard, plus adding environmental context) of generating a thought, an encoded representation of the output. The “language” is then used to decode the output of understanding into another representation that fits the constructs of the language. These words are spoken or written to express the thought, the understanding.
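Read back as a pipeline, the argument looks roughly like the sketch below. The function names and the receiver_context argument are my own placeholder labels, purely to mark where I argue today’s text-only LLMs fall short: the middle step.

```python
# A schematic of "understanding" as described above. These functions are
# placeholders for the argument in this article, not a real implementation.

def encode(words: str) -> dict:
    """Reading/listening: words -> a (lossy) internal representation."""
    return {"gist": words.lower()}

def add_context(thought: dict, receiver_context: dict | None) -> dict:
    """The step text-only LLMs mostly lack: conditioning the thought on the
    receiver's real-life situation rather than on surrounding text alone."""
    return {**thought, "situation": receiver_context or "generic, text-only"}

def decode(thought: dict) -> str:
    """Understanding -> language: express the conditioned thought as words."""
    return f"Response shaped by: {thought['situation']}"

# Without the receiver's situation, the reply can only be generic.
print(decode(add_context(encode("Is it cold outside?"), None)))
print(decode(add_context(encode("Is it cold outside?"),
                         {"location": "Helsinki", "month": "January"})))
```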

Although the LLM may be able to complete the task of encoding and decoding the outputs of understanding, and may even inject the limited context carried over from the historical use of similar text into the decoding process, it will likely just produce a generalized output that sounds good but is not operational. To complete the definition of Intelligence, the LLM needs the ability to explore outside the bounds of its training space. Until then, although the LLM can regurgitate the definition of “Understanding”, it won’t be able to define it.
