Large Language Models Explained Simply

A language model can write essays, explain code and answer questions. It seems like real thinking. Yet at its core it always solves the same, astonishingly simple task.

2017: the transformer is introduced

2022: ChatGPT appears

1 word at a time: the model predicts the next one

What a language model is

A large language model is an especially large neural network. It was trained on vast amounts of text from books, websites and code.

From this training it developed a fine sense for language. It knows typical word sequences, style and context, without anyone feeding it grammar rules.

The idea: the next word

The core is surprisingly simple. The model receives a text, assigns a probability to every word that could come next, and picks a likely one.

Then it appends that word and repeats the step. Word by word, a whole text emerges. From this simple loop grow answers, stories and translations.
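
The loop described above can be sketched in a few lines of Python. This is only a toy, under loud assumptions: instead of a neural network it counts which word followed which in a tiny made-up text, and the names CORPUS, train_bigram_counts, predict_next and generate are invented for this illustration. The loop itself (predict, append, repeat) is the part that carries over.

```python
from collections import Counter, defaultdict

# Tiny made-up training text (an assumption for illustration only).
CORPUS = "the cat sat on the mat . the cat sat on the sofa ."

def train_bigram_counts(text):
    """Count, for every word, which words followed it in the text."""
    words = text.split()
    followers = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers

def predict_next(followers, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    if word not in followers:
        return None
    return followers[word].most_common(1)[0][0]

def generate(followers, start, max_words=4):
    """Repeat the loop: predict the next word, append it, continue."""
    text = [start]
    for _ in range(max_words):
        nxt = predict_next(followers, text[-1])
        if nxt is None:  # nothing ever followed this word in training
            break
        text.append(nxt)
    return " ".join(text)

followers = train_bigram_counts(CORPUS)
print(generate(followers, "the", max_words=4))  # the cat sat on the
```

A real language model replaces the counting table with a neural network that looks at the whole text so far, not just the last word, but it is used in the same predict-and-append fashion.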

The transformer and attention

This was made possible by the transformer, a neural network architecture introduced in 2017. Its strength is so-called attention. For each word, the model weighs how relevant every part of the text so far is and draws mostly on the parts that matter.

This way it keeps the context even over long passages. This technique sits behind almost all large language models today.
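
As a rough sketch of how attention weighs earlier words, the following Python computes scaled dot-product attention for a single word. The vectors are made-up two-dimensional numbers and the function names are chosen for this example. Real models learn these vectors during training, use far more dimensions and run many such attention steps in parallel.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product: large when two vectors point in a similar direction."""
    return sum(x * y for x, y in zip(a, b))

def attend(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    # One relevance score per earlier word.
    scores = [dot(query, key) / scale for key in keys]
    weights = softmax(scores)
    # Blend the value vectors according to the attention weights.
    dim = len(values[0])
    mixed = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, mixed

# Hypothetical vectors for three earlier words (illustration only).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [2.0, 0.0]  # the word currently being processed

weights, mixed = attend(query, keys, values)
print([round(w, 2) for w in weights])
```

The first and third words match the query best, so they receive the largest weights, while the second word contributes little to the blended result.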

Training and fine-tuning

First, in pretraining, the model learns only to predict the next word across huge amounts of text. Then comes fine-tuning, in which humans judge answers as good or bad and the model is adjusted toward the good ones.

This way the model learns to answer helpfully and politely rather than just continuing some text. This second step turns a text generator into a useful assistant.
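
To make the first step a little more concrete: a commonly used training signal for word prediction is the cross-entropy loss, the negative logarithm of the probability the model gave to the word that actually came next. The sketch below uses invented probabilities and a hypothetical helper next_word_loss. It covers only word prediction, not the fine-tuning step with human judgments.

```python
import math

def next_word_loss(predicted_probs, actual_word):
    """Cross-entropy for one prediction: -log(probability of the true word)."""
    return -math.log(predicted_probs[actual_word])

# Hypothetical model outputs for "the cat sat on the ...", true next word: "mat".
confident_and_right = {"mat": 0.90, "moon": 0.05, "fish": 0.05}
unsure = {"mat": 0.34, "moon": 0.33, "fish": 0.33}
confident_and_wrong = {"mat": 0.05, "moon": 0.90, "fish": 0.05}

losses = [
    next_word_loss(probs, "mat")
    for probs in (confident_and_right, unsure, confident_and_wrong)
]
print([round(loss, 2) for loss in losses])
```

The loss is small when the model put high probability on the right word and large when it was confidently wrong. Training nudges the network's parameters so that this number shrinks across the training text.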

What they do well and where they fail

Language models are strong at language, summaries and first drafts. They save time and bridge language barriers.

But they can confidently produce nonsense, so-called hallucinations. They have no real understanding and no knowledge of the world in the human sense. Why this is also a safety topic is covered in the article on AI risks and alignment.

Frequently asked questions

Does a language model understand what it writes?

No, not like a human. It computes which word most likely follows. This often seems clever but rests on patterns from the training data, not real understanding.

Why do language models sometimes make things up?

Because they always produce a plausible continuation, even when they do not know something. These invented but convincing answers are called hallucinations.
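
One way to see why a continuation always comes out: the final step turns scores into probabilities and picks a word, and that step has no built-in option for "I do not know". The sketch below is a simplified illustration with invented scores and a hypothetical pick_word helper, not a description of any particular model.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def pick_word(scores_by_word):
    """Return the highest-probability word together with its probability."""
    words = list(scores_by_word)
    probs = softmax([scores_by_word[w] for w in words])
    best = max(range(len(words)), key=lambda i: probs[i])
    return words[best], probs[best]

# Hypothetical scores: one clear case, one where the model barely has a preference.
well_known = {"Paris": 9.0, "Lyon": 2.0, "Rome": 1.0}
barely_known = {"1987": 0.2, "1992": 0.1, "2001": 0.0}

print(pick_word(well_known))
print(pick_word(barely_known))
```

In the second case the winning word has only a little more than a third of the probability, yet it is emitted just like the confident answer in the first case. The reader sees fluent text either way.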

What does the word "large" mean in large language models?

It refers to the sheer size of the network and the training data. Modern models often have hundreds of billions of parameters and were trained on trillions of words.

What is the difference between a language model and a search engine?

A search engine finds existing texts and points to sources. A language model does not search but generates each answer anew word by word, which is why it does not provide reliable sources on its own.

What is a token in a language model?

A token is the smallest unit of text the model computes with, often a word or part of a word. The model always processes and produces text token by token, not letter by letter.
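
A small sketch of how a word can break into tokens: the code below splits words by greedily matching the longest piece found in a vocabulary. Both the vocabulary and the greedy strategy are simplifying assumptions for this example. Real tokenizers learn their vocabulary from data and often use other schemes such as byte-pair encoding.

```python
# Hypothetical vocabulary; real tokenizers learn tens of thousands of pieces.
VOCAB = {"un", "believ", "able", "token", "s", "the"}

def tokenize(word, vocab=VOCAB):
    """Split one word into the longest known pieces, left to right."""
    tokens = []
    position = 0
    while position < len(word):
        # Try the longest possible piece first, then shrink.
        for end in range(len(word), position, -1):
            piece = word[position:end]
            if piece in vocab:
                tokens.append(piece)
                position = end
                break
        else:
            # No known piece starts here: fall back to a single character.
            tokens.append(word[position])
            position += 1
    return tokens

print(tokenize("unbelievable"))  # ['un', 'believ', 'able']
print(tokenize("tokens"))        # ['token', 's']
print(tokenize("the"))           # ['the']
```

Common words end up as a single token, while rarer words are assembled from several pieces, which is why token counts and word counts usually differ.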

Can a language model do math or reason logically?

Only to a limited extent. It imitates calculation and reasoning steps from its training data rather than truly executing them. On multi-step tasks it therefore makes mistakes easily, even when the answer sounds confident.

Update note (as of: 06/05/2026)
