Large Language Models Explained Simply
A language model can write essays, explain code and answer questions. It seems like real thinking. Yet at its core it always solves the same, astonishingly simple task.
What a language model is
A large language model is an especially large neural network. It was trained on vast amounts of text from books, websites and code.
From this training it developed a fine sense for language. It knows typical word sequences, style and context, without anyone feeding it grammar rules.
The idea: the next word
The core is surprisingly simple. The model receives a text and predicts which word most likely comes next.
Then it appends that word and repeats the step. Word by word, a whole text emerges. From this simple loop grow answers, stories and translations.
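The loop described above can be sketched in a few lines of Python. This is only a toy under stated assumptions: the "model" here merely counts which word followed which in a tiny made-up text, whereas a real language model uses a neural network and usually picks among several likely words rather than always the top one. The names `train_bigram`, `generate` and the sample `corpus` are hypothetical and chosen for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count which word follows which in the training text."""
    words = text.split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def generate(follows, prompt, max_new_words=5):
    """Repeatedly append the most frequent next word (greedy choice)."""
    words = prompt.split()
    for _ in range(max_new_words):
        candidates = follows.get(words[-1])
        if not candidates:
            break  # the toy model never saw anything follow this word
        next_word, _count = candidates.most_common(1)[0]
        words.append(next_word)  # append, then repeat with the longer text
    return " ".join(words)

# Made-up training text, for illustration only.
corpus = "the cat sat on the mat . the cat sat on the sofa ."
model = train_bigram(corpus)
print(generate(model, "the cat", max_new_words=4))
# prints: the cat sat on the cat
```

The output already hints at a weakness: the toy looks only at the last word, so it loses the thread quickly. Real models take the whole text so far into account.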
The transformer and attention
This was made possible by the transformer, introduced in 2017. Its strength is so-called attention. For each word, the model decides which parts of the text so far it pays special attention to.
This way it keeps the context even over long passages. This technique sits behind almost all large language models today.
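To make "paying attention" a little more concrete, here is a minimal sketch of the core calculation, assuming each word is already represented as a short list of numbers. The current word (the query) is compared with every earlier word (the keys), the comparison scores are turned into weights that sum to 1, and the result is a weighted mix of the earlier words' contents (the values). The vectors below are invented, and a real transformer adds learned projections, several attention heads in parallel and many stacked layers, all of which are left out here.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    # One similarity score per earlier word.
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # Weighted average of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Hypothetical 2-dimensional vectors for three earlier words.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [2.0, 0.0]

weights, output = attend(query, keys, values)
print(weights)  # the first and third word receive more weight than the second
print(output)
```

The weights are the "attention": words whose key resembles the query contribute more to the result.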
Training and fine-tuning
First, the model learns only one thing: to predict the next word, across huge amounts of text. Then comes fine-tuning, in which humans rate answers as good or bad.
This way the model learns to answer helpfully and politely rather than just continuing some text. This second step turns a text generator into a useful assistant.
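How does the first step measure whether a prediction was good? A common choice is the cross-entropy loss: the lower the probability the model gave to the word that actually came next, the higher the penalty. The sketch below illustrates only this first training step, not the fine-tuning with human feedback, and the probabilities in it are invented for illustration.

```python
import math

def next_word_loss(predicted_probs, actual_next_word):
    """Cross-entropy for one prediction: -log(probability of the true word)."""
    return -math.log(predicted_probs[actual_next_word])

# Hypothetical predictions for the text "the cat sat on the ...".
confident_and_right = {"mat": 0.9, "sofa": 0.05, "moon": 0.05}
unsure = {"mat": 0.4, "sofa": 0.3, "moon": 0.3}

loss_confident = next_word_loss(confident_and_right, "mat")
loss_unsure = next_word_loss(unsure, "mat")
print(loss_confident, loss_unsure)  # the confident, correct guess is penalized less
```

During training, the network's parameters are adjusted in small steps so that this penalty shrinks on average over the whole training text.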
What they do well and where they fail
Language models are strong at language, summaries and first drafts. They save time and bridge language barriers.
But they can confidently produce nonsense, so-called hallucinations. They have no real understanding and no knowledge of the world in the human sense. Why this is also a safety issue is covered in the spoke on AI risks and alignment.
Frequently asked questions
Does a language model understand what it writes?
No, not like a human. It computes which word most likely follows. This often seems clever but rests on patterns from the training data, not real understanding.
Why do language models sometimes make things up?
Because they always produce a plausible continuation, even when they do not know something. These invented but convincing answers are called hallucinations.
What does the word "large" mean in large language models?
It refers to the sheer size of the network and the training data. The largest models have hundreds of billions of parameters and are trained on trillions of words.
What is the difference between a language model and a search engine?
A search engine finds existing texts and points to sources. A language model does not search but generates each answer anew word by word, which is why it does not provide reliable sources on its own.
What is a token in a language model?
A token is the smallest unit of text the model computes with, often a word or part of a word. The model always processes and produces text token by token, not letter by letter.
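A small sketch shows what splitting into tokens can look like. It assumes a tiny hand-written vocabulary and always takes the longest matching piece from left to right. Real tokenizers learn their vocabulary from data and use more refined rules, so this is an illustration of the idea, not any particular model's method. The function `tokenize` and the `vocab` set are hypothetical.

```python
def tokenize(text, vocab):
    """Split text into the longest vocabulary pieces, left to right."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, then shorter ones.
        for end in range(len(text), i, -1):
            piece = text[i:end]
            if piece in vocab:
                tokens.append(piece)
                i = end
                break
        else:
            # Nothing matched: fall back to a single-character token.
            tokens.append(text[i])
            i += 1
    return tokens

# Hypothetical mini-vocabulary, for illustration only.
vocab = {"un", "believ", "able", "token", "s", " "}
print(tokenize("unbelievable tokens", vocab))
# prints: ['un', 'believ', 'able', ' ', 'token', 's']
```

A long or rare word is thus built from several familiar pieces, which is how a model can handle words it has never seen as a whole.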
Can a language model do math or reason logically?
Only to a limited extent. It imitates calculation and reasoning steps from its training data rather than truly executing them. On multi-step tasks it therefore makes mistakes easily, even when the answer sounds confident.
Sources and further reading
- Large Language Models — IBM
- Attention Is All You Need — arXiv
Update note (as of: 06/05/2026)
First publication of the large language models spoke.