Large Language Models Keep Getting Larger: What Surprises Will GPT-4 Bring?

Language models have grown markedly more sophisticated over the last two years. A language model is an AI system trained on text to predict the next word in a sentence or to fill in an occluded phrase. Given an initial input, such models can generate compelling prose or even computer code, giving rise to a type of natural language programming called prompt engineering. The best-known architecture for Large Language Models (LLMs) is the transformer, whose performance follows scaling laws. Scaling, in this context, means that performance improves predictably as model size, training data, and computing power increase, so a larger language model given enough data and compute will generally produce more valuable results.
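To make "predicting the next word" concrete, here is a minimal sketch of a toy next-word predictor. It is not how GPT-3 or any transformer works internally; it simply counts which word most often follows another in a tiny, made-up corpus. The function names and the corpus are hypothetical and exist only for illustration.

```python
from collections import Counter, defaultdict

def train_bigram_model(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    follower_counts = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        follower_counts[current_word][next_word] += 1
    return follower_counts

def predict_next_word(model, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Tiny illustrative corpus (an assumption, not real training data).
corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigram_model(corpus)
print(predict_next_word(model, "the"))  # "cat" follows "the" most often here
```

A large language model replaces these raw counts with billions of learned parameters and conditions on far more than one preceding word, but the training objective is the same in spirit: guess what comes next.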


The Next Generation of Language Models: GPT-3 and GPT-4

In 2020, OpenAI released GPT-3, a language model that was easily the largest ever created at the time. Like autocomplete in text messages, GPT-3 is trained to predict the next word in a sentence. However, early users realized that it had surprising capabilities: it could write persuasive papers, create websites from just text descriptions, generate computer code, and more, all with limited and sometimes no supervision.

GPT-3 is not without flaws, however. It can produce inaccurate or bigoted text, as well as content that appears reasonable on the surface but is misleading. This pattern is thought to generalize: as language models grow, their capabilities and behaviors change in surprising, unanticipated, and sometimes shocking ways.

GPT-3 has over 100 times the parameters of GPT-2 and was trained on 570 gigabytes of text. This increase in scale drastically altered the qualitative behavior of the model: GPT-3 can perform tasks it was not explicitly trained to do, like translating sentences from English to Spanish, given only a few examples or none at all. This behavior was largely absent in GPT-2. Likewise, on some tasks GPT-3 outperforms models that were explicitly trained to solve them. On others, however, it falls short.
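The "few to no examples" idea can be sketched as plain prompt construction. The snippet below only builds the text that would be sent to a model; it does not call any model or API. The helper name, the prompt layout, and the example sentences are assumptions chosen for illustration, not a format prescribed by OpenAI.

```python
def build_few_shot_prompt(examples, query):
    """Format worked examples followed by an unanswered query as one prompt string."""
    lines = ["Translate English to Spanish."]
    for english, spanish in examples:
        lines.append(f"English: {english}")
        lines.append(f"Spanish: {spanish}")
    # The final entry is left open so the model's next-word prediction completes it.
    lines.append(f"English: {query}")
    lines.append("Spanish:")
    return "\n".join(lines)

# Hypothetical demonstration pairs; pass an empty list for a zero-shot prompt.
examples = [("Good morning.", "Buenos días."), ("Thank you.", "Gracias.")]
prompt = build_few_shot_prompt(examples, "Where is the library?")
print(prompt)
```

Because the model is only ever predicting the next word, the examples act as context rather than as training: no parameters are updated, and the same model can be pointed at a different task by changing the prompt.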

The emergence of these behaviors from nothing more than scaling up data and computing power raises the question: what other capabilities will emerge as we continue to scale, and what will GPT-4 do (and what should we do about it)? GPT-4 has been rumored to have 100 trillion parameters, an unconfirmed figure that would make it more than 500 times the size of GPT-3. Considering the qualitative changes that came with a 100x leap in parameter count, a further 500x jump is hard to picture and raises questions about the very nature of large language models.
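The two ratios quoted above can be checked with a few lines of arithmetic. The GPT-2 and GPT-3 parameter counts below are the publicly reported sizes of the largest versions of each model and are not stated in this article; the GPT-4 number is the 100 trillion figure quoted above, which is a projection rather than a confirmed specification.

```python
# Publicly reported parameter counts (assumptions brought in from outside the article).
GPT2_PARAMS = 1.5e9   # largest GPT-2 model
GPT3_PARAMS = 175e9   # GPT-3

# Figure quoted in the article; a projection, not a confirmed number.
GPT4_PARAMS_PROJECTED = 100e12

gpt3_vs_gpt2 = GPT3_PARAMS / GPT2_PARAMS
gpt4_vs_gpt3 = GPT4_PARAMS_PROJECTED / GPT3_PARAMS

print(f"GPT-3 is about {gpt3_vs_gpt2:.0f}x the size of GPT-2")
print(f"A 100-trillion-parameter model would be about {gpt4_vs_gpt3:.0f}x the size of GPT-3")
```

Under these assumptions the ratios come out to roughly 117x and 571x, consistent with the "over 100 times" and "more than 500 times" comparisons in the text.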

Figure: Language models like GPT-4 predicting natural language



Are language models intelligent?

A chess program solves a specific problem, but humans are “generally” intelligent: they can do anything from writing novels to playing basketball to running a venture capital firm. By contrast, AI systems exhibit “narrow” intelligence, meaning that they do specific tasks very well or operate competently within a defined scope of problems. GPT-3 is far more sophisticated than its predecessors and represents a giant leap forward. Moreover, future models won’t be limited to learning from text alone. Images, audio recordings, and videos will provide a richer learning signal and speed up learning.


What is the economic impact of Large Language Models?

GPT-3 has extensive conversational, search, and code generation capabilities, among many others, and users will probably discover even more in the future. However, the behavior of a large language model, and the ways it can be misused, are hard to forecast, which makes the impact of models like GPT-4 difficult to predict. The effect of a highly capable model on the job market is likewise uncertain. Is it possible (or appropriate) to automate specific jobs with large language models?


The models of the future will not just reflect data — they will also reflect the values we choose to impart.

GPT-3 can show unwanted behaviors, including racial and gender biases. Mitigating such behavior universally is hard, because appropriate language use varies across contexts and cultures, so the problem is not easy to define in terms of either the training data or the trained model. Training data might be filtered, outputs might be adjusted, and training methods could be modified, but it is unclear which of these, if any, solves the problem. More interdisciplinary research is needed on how to imbue values into these models.


We should develop a framework of sensible principles for deploying language models.

Who should build and deploy these language models, and how should they be held accountable for dangerous or poor performance? There are various ways the community could address this, including legally requiring disclosure when media has been generated by artificial intelligence.


Where do we go from here?

Organizations that create large language models will have only a brief window of opportunity before others develop similar products or improve on them. Those at the vanguard of the field therefore have a unique responsibility to establish procedures and standards that others can adopt, although it is uncertain how these will fare. Large Language Models will change radically as they scale up further, with GPT-4 and beyond, and will undoubtedly exhibit more surprising behaviors. Society and businesses will be profoundly changed by this, mainly for the better, but it would be shrewd to plan for how we can limit the potential downside.


Read an introduction to large language models and use cases here.

Read more from Omega Venture Partners here.