All the YouTube videos about how AI works, with the emphasis on chatbots based on LLMs, are superficial. It can’t be otherwise, for two reasons: any such presentation is popularized science (what the French call “science vulgarisée”), so it can’t be too specific; and it’s not made by the professionals who work in the field. Not by experts, and not for experts.

❶ But it can help to some extent, and this is why I’m dropping here a few links to YT videos, all by Grant Sanderson from the channel 3Blue1Brown. I’ll start with the most recent one:

🎞️ Large Language Models explained briefly (7m57s)

🎞️ But what is a neural network? | Deep learning chapter 1

🎞️ Gradient descent, how neural networks learn | DL2

🎞️ Backpropagation, step-by-step | DL3

🎞️ Backpropagation calculus | DL4

🎞️ Transformers (how LLMs work) explained visually | DL5

🎞️ Attention in transformers, step-by-step | DL6

🎞️ How might LLMs store facts | DL7

❷ And now, a few comments.

What pisses me off the most is this oversimplification: “A Large Language Model is a sophisticated mathematical function that predicts what word comes next for any piece of text.” That definition only covers the LLM seen as the innermost black box, the model itself. But a chatbot or an AI assistant is much more than that.

If an LLM were used solely according to the above definition, that of simple next-word prediction, it could only work as an autocomplete or an autocorrect function, for grammar checking, for basic fact retrieval through statistical inference à la “Paris is the capital of France” (not of Japan), or for “creative” text generation (made possible by the “temperature” setting).
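To make that narrow definition concrete, here’s a toy sketch of my own (the vocabulary and the scores are invented; a real model scores on the order of a hundred thousand tokens at every step):

```python
import numpy as np

# Toy vocabulary and made-up logits, purely to illustrate the mechanism.
vocab = ["Paris", "Tokyo", "France", "Japan", "the"]
logits = np.array([1.2, 0.3, 4.8, 0.9, 2.1])   # scores for the next token after "Paris is the capital of"

# Softmax turns the raw scores into a probability distribution.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# "Prediction" in the narrowest sense: pick the most probable next token.
next_token = vocab[int(np.argmax(probs))]
print(dict(zip(vocab, probs.round(3))), "->", next_token)   # -> France
```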

But even if LLMs are merely statistical inference machines, they don’t generate answers all by themselves. The entire architecture of a chatbot is much more complicated, and there are ZERO videos on YT that explain it fully and properly! You see, even when you download a “model” that can be run locally on your machine, it’s never just a model. It’s a full chatbot, and you can “talk” to it at a prompt. (Obviously, it can be invoked in various ways, but it also has a prompt.)

So any chatbot must have natural language processing at both ends: at the input, natural language interpretation works out what your question might mean; at the output, natural language generation packages the answer the way you see it in the various chatbots.
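To give an idea of what that wrapping looks like, here is a minimal sketch of my own (the tag names are invented; every chatbot family has its own “chat template,” but the idea is the same): the bare model only ever sees a structured prompt and returns raw text; everything else is the chatbot built around it.

```python
# A minimal sketch of the machinery around the bare model; the tags are invented.
SYSTEM = "You are a helpful assistant. Answer concisely."

def build_prompt(history, user_message):
    """Input side: the raw question is packaged into the structured prompt
    the model was fine-tuned to expect."""
    lines = [f"<|system|>{SYSTEM}"]
    for role, text in history:
        lines.append(f"<|{role}|>{text}")
    lines.append(f"<|user|>{user_message}")
    lines.append("<|assistant|>")           # the model completes from here
    return "\n".join(lines)

def postprocess(raw_completion):
    """Output side: stop at the end-of-turn marker and tidy the answer for display."""
    return raw_completion.split("<|end|>")[0].strip()

print(build_prompt([], "What is the capital of France?"))
print(postprocess("Paris.<|end|><|user|>ignored"))
```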

But how is reasoning mimicked? When I asked DeepSeek, “How was it possible for Kowloon Walled City to never have had a disastrous fire able to burn it down completely?”, and it answered without searching the web and without having any knowledge base outside of its training data sources—it didn’t use Retrieval-Augmented Generation (RAG)—the process was fascinating, once you inspect its apparent chain-of-thought as shown by DeepThink.

It didn’t actually “think” (there’s no such thing as General AI), but it packaged in words, and in a human-like manner, the iterative and surely recursive process that, let me say it again, rests on an architecture nobody bothers to explain to us plebeians!
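My own guess at the mechanics, for what it’s worth: the “thinking” shown by DeepThink is itself just generated text, produced before the visible answer and delimited by special markers that the interface splits out. Something along these lines (the markers and the sample output are purely illustrative):

```python
import re

# Illustrative only: the delimiters and the text are my assumptions,
# not DeepSeek's documented internals.
raw_output = (
    "<think>First list the possible causes, then weigh the construction "
    "materials, then the layout and the escape routes...</think>"
    "The short answer is that several factors combined..."
)

match = re.match(r"<think>(.*?)</think>(.*)", raw_output, re.DOTALL)
reasoning, answer = match.group(1).strip(), match.group(2).strip()
print("shown as the chain-of-thought:", reasoning)
print("shown as the answer:", answer)
```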

Claude’s analysis of DeepSeek’s DeepThink intermediate output was pertinent—read it! Then ask yourself why such a fundamental aspect of a chatbot’s architecture is never explained to the layman!

This being said, the above videos have their utility. It’s like, “Here’s how a transistor works, including the three basic single-stage bipolar-junction-transistor (BJT) amplifier topologies, but you don’t need to know how to design a full amplifier, from line input to speakers.” Because it’s really too complicated.

Or maybe I need to get a better understanding of “attention”!

❸ Just a couple of bookmarks from the above videos.

Transformers are a central concept in LLMs, and the above video on them includes a segment on “temperature,” a crucial parameter you must know about.
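Since “temperature” keeps coming up, here is what the knob actually does, reduced to a few lines (the logits are invented, the mechanism is the standard one): it rescales the scores before the softmax, so a low temperature sharpens the distribution towards the safest token, while a high one flattens it towards “creative” randomness.

```python
import numpy as np

def next_token_distribution(logits, temperature):
    """Temperature-scaled softmax: T < 1 sharpens, T > 1 flattens,
    T -> 0 degenerates into plain argmax."""
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    return probs / probs.sum()

logits = np.array([4.0, 2.0, 1.0, 0.5])        # invented scores for four candidate tokens
for t in (0.2, 1.0, 2.0):
    print(f"T={t}: {next_token_distribution(logits, t).round(3)}")
```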

“Attention” is another fundamental aspect, so make sure you understand the attention pattern. The presentation is overly simplified, which in my book makes it probably even wrong from a purely technical standpoint, but it is what it is.
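For what it’s worth, here is the bare attention pattern reduced to a few lines of my own (a single head, random inputs, the learned query/key/value projections assumed to have been applied already), just so the picture in the video maps onto something concrete:

```python
import numpy as np

def causal_attention(Q, K, V):
    """Scaled dot-product attention for one head.
    Q, K, V: (seq_len, d) matrices produced by the learned projections."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                     # how much each token "looks at" every other token
    mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    scores[mask] = -1e9                               # causal mask: no peeking at future tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # the "attention pattern": each row sums to 1
    return weights @ V                                # each position becomes a weighted mix of values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(5, 8)) for _ in range(3))
print(causal_attention(Q, K, V).shape)                # (5, 8)
```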

Have you noticed how everything is supposed to be exclusively in English? What happens if a model is trained on multilingual data? Is “gatto” translated into “cat,” or is it taken as is, and the model eventually classifies it alongside “cat”? And what happens when a user asks something in French: is the question translated into English, the answer concocted in English and then translated back, or is everything processed in French? Oversimplification is killing me.
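For the record, my understanding (not taken from these videos) is that nothing gets translated: the tokenizer splits whatever text it receives into subword pieces, and a multilingual model simply learns, from its training data, embeddings in which “gatto” and “cat” end up close to each other. A quick way to check the first half of that claim, assuming the Hugging Face transformers library is installed:

```python
# Requires `pip install transformers`; the first run downloads the GPT-2 tokenizer files.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")     # an English-centric BPE, on purpose

for word in ["cat", "gatto", "chat"]:           # English, Italian, French
    # The tokenizer just splits the text into subword pieces it has seen before;
    # nothing is detected as "foreign" and nothing is translated.
    print(word, "->", tok.tokenize(word))
```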

The last video above, “How might LLMs store facts,” includes a number of links in the description (they all do). And I was interested in this one: Towards Monosemanticity: Decomposing Language Models With Dictionary Learning. This is not something to be read; just marvel at it. But let me quote from the beginning:

Unfortunately, the most natural computational unit of the neural network – the neuron itself – turns out not to be a natural unit for human understanding. This is because many neurons are polysemantic: they respond to mixtures of seemingly unrelated inputs. In the vision model Inception v1, a single neuron responds to faces of cats and fronts of cars. In a small language model we discuss in this paper, a single neuron responds to a mixture of academic citations, English dialogue, HTTP requests, and Korean text. Polysemanticity makes it difficult to reason about the behavior of the network in terms of the activity of individual neurons.

One potential cause of polysemanticity is superposition, a hypothesized phenomenon where a neural network represents more independent “features” of the data than it has neurons by assigning each feature its own linear combination of neurons.

Isn’t it funny how they need to hypothesize, because even the designers of such systems have lost their grasp of how a model eventually stores information? Not the data per se, but the concepts. Wunderbar.
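Still, the superposition hypothesis itself is easy to illustrate with a toy example of my own making: give each “feature” its own direction in the activation space of fewer neurons, and the directions inevitably interfere, which is exactly what makes single neurons look polysemantic.

```python
import numpy as np

rng = np.random.default_rng(1)
n_neurons, n_features = 3, 5                    # more features than neurons

# Each feature gets its own direction, i.e. a linear combination of the 3 neurons.
directions = rng.normal(size=(n_features, n_neurons))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)

# An input in which only feature 2 is active.
feature_activity = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
neuron_activations = feature_activity @ directions      # what the 3 neurons actually show

# Reading the features back out is only approximate: the directions interfere,
# so every neuron "responds" a bit to unrelated features.
recovered = directions @ neuron_activations
print(neuron_activations.round(2))
print(recovered.round(2))      # feature 2 dominates, the rest is spurious cross-talk
```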

But hey, even A Mathematical Framework for Transformer Circuits is unreadable, so let’s keep it magic.

❹ So I decided to add a few more videos, this time from the channel vcubingx:

🎞️ What does it mean for computers to understand language? | LM1

🎞️ Why Recurrent Neural Networks are cursed | LM2

🎞️ How did the Attention Mechanism start an AI frenzy? | LM3

This guy talks too slowly; I had to play the videos at increased speed. Either way, modern LLMs don’t use RNNs (Recurrent Neural Networks). They are based on the transformer architecture which, as you should already know by now, was introduced in the “Attention Is All You Need” paper in 2017. NLP (Natural Language Processing) now definitely relies on the transformer architecture for large-scale language modeling.
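To see why RNNs are “cursed” while transformers took over, here is the core of the difference in a toy sketch of my own: the RNN has to crunch tokens strictly one after another, squeezing everything through a single hidden state, whereas attention lets every token look at every other token in one parallel matrix operation.

```python
import numpy as np

rng = np.random.default_rng(2)
seq_len, d = 6, 4
x = rng.normal(size=(seq_len, d))                 # a toy sequence of 6 token vectors

# RNN: a strictly sequential loop; the last token only sees the earlier ones
# through whatever survived the repeated squashing of one hidden state.
W_h, W_x = rng.normal(size=(d, d)), rng.normal(size=(d, d))
h = np.zeros(d)
for t in range(seq_len):
    h = np.tanh(h @ W_h + x[t] @ W_x)             # cannot be parallelized over t

# Attention: every token attends to every other token in one shot.
scores = x @ x.T / np.sqrt(d)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
out = weights @ x                                 # one parallel matrix product, no loop

print(h.round(2), out.shape)
```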

❺ I couldn’t end without something that’s not strictly related, but which will be the next bubble. It already is, but still in its infancy.

Once a few more trillions of dollars have been spent on AI, with a very poor return, quantum computing will demand its own trillions!

For now, quantum computing is pathetic. Qubits are, so to speak, few and far between, and they cannot be used for actual, accurate computing. But the power of numbers makes them useful in a probabilistic way. Unlike anything derived from deep learning, where a lot of computing power is used to derive weights and biases from big data, a few qubits would do something different, yet in a way similar. Operating on quantum-mechanical principles such as superposition and entanglement (hard to grasp, because a qubit’s state is not discrete but probabilistic), quantum computers (still a misnomer) could perform certain probabilistic calculations exponentially faster than classical computers.
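To make the “probabilistic” part concrete, a single qubit is trivial to simulate classically, which also tells you how far we are from useful machines: it is just two complex amplitudes, superposition means both are non-zero, and measurement collapses it with probabilities given by the squared amplitudes. A toy sketch of my own:

```python
import numpy as np

rng = np.random.default_rng(3)

# One qubit = two complex amplitudes (for |0> and |1>), normalized.
state = np.array([1.0, 0.0], dtype=complex)             # starts in |0>

# The Hadamard gate puts it into an equal superposition of |0> and |1>.
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
state = H @ state

# Measurement is probabilistic: the probabilities are the squared amplitudes.
probs = np.abs(state) ** 2
samples = rng.choice([0, 1], size=1000, p=probs)
print(probs.round(3), "->", np.bincount(samples))        # roughly 500/500
```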

So quantum computing might be the next step in AI. It might eventually enhance some specific AI tasks, without replacing classical AI methods. But there goes your money. Goodbye, trillions of dollars!

This planet is unable to solve more fundamental issues (health care, education, social issues, public safety, poverty, wars, environmental issues) because “there is no money,” yet AI and quantum computing will engulf an enormous amount of money in the decades to come.

And here comes Microsoft with some vaporware: 🎞️ Majorana 1 Explained: The Path to a Million Qubits.

This is 100% fraud, and there’s nothing explained. One million qubits on that thing? Not in a million years!

Microsoft, a company that couldn’t create a solid OS after Windows 7. A company that managed to kill Windows Phone out of stupidity and mismanagement (no apps!). One of the most hated corporations on Earth, producing low-quality software, is now pioneering quantum computing hardware?!

No problemo, this is a sure path to Idiocracy. Already, the exact functioning of generative AI is unknown to 99.998% of the population. The math is complex, the architecture is complex, and the functioning seems magic. But CEOs, bankers, and politicians don’t need to understand anything; they just spend money and look for profits. And power.

Just think of how little Trump knows and understands about anything. How many billions of people are even dumber? We’re fucking doomed, and AI won’t help; if anything, it will make things worse.