🤖 A curt explanation of my latest choice of AI systems
Few people managed to read my two previous mega-posts on AI models (from Feb. 1 and Feb. 6), even with the comments and updates added to the text. They were meant to express my conclusions after ~2 years with several AI chatbots, supplemented with the latest developments in the field. Since I’ve become a conscientious objector to both ChatGPT and DeepSeek, and reduced my AI shortlist from 7 to 3 systems, some extra explanations are in order.

Quick note: I decided to call them “AI systems” because they’re more than chatbots based on LLMs. ChatGPT and similar systems are built on foundation models nobody cares about; on top of those sit the LLMs everyone cares about; and adding layers of language processing and generation, safety, and other capabilities yields the chatbots most people interact with. I hope I’ve understood the relationship correctly: Models → LLMs → Chatbots/Agents. Specialized agents can also be built using these systems’ APIs. Many such AI systems are multi-modal, meaning they can also generate images, search the web, and so on. So we talk about AI systems, not AI chatbots. However, “AI assistants” might also be a good compromise.
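To make the “agents via APIs” point concrete, here’s a minimal sketch of what such an API call looks like. The endpoint URL and model name follow Mistral’s publicly documented OpenAI-style chat API, but treat them as assumptions to verify against the current docs, not as gospel:

```python
import json
import os
import urllib.request

# The chat "system" is just an LLM behind an HTTP API; the endpoint and
# model name below are assumed from Mistral's documented OpenAI-style
# chat API — check the current documentation before relying on them.
API_URL = "https://api.mistral.ai/v1/chat/completions"

def build_request(prompt: str, model: str = "mistral-small-latest") -> dict:
    """Assemble the JSON body a chat-completion call expects."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask(prompt: str) -> str:
    """Send one prompt and return the assistant's reply (network required)."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

An “agent” is just code that wraps calls like `ask()` in its own loop and logic; the chatbot web page is one such wrapper among many.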
Letters to the Editor
I’ve been asked, in private, to establish a complete ranking of the 7 chatbots I started with, only to end with a shortlist of 3.

The reader was expecting something like this:
1. Top, the best one, excellent. To be consulted in the first place.
2. The second-best, very good, but with one or two little aspects that aren’t so good. To be consulted almost as often as the first one.
3. One that’s good, but with a few extra downsides compared to the first two. Worth a look, too, but less frequently.
4. Average. To be consulted occasionally, or when the others don’t satisfy, or to compare the answers.
5. Below average. Only to be consulted as a last resort, if at all.
6. Bad. To be avoided.
7. Terrible, appalling, horrible; avoid it at all costs.
I suppose there are many people who don’t understand how LLMs should be used; or, at least, how I use them.
Any such chatbot should be used as a secondary, complementary source of information in most cases. Alternatively, one could consult one or several LLMs as the first step in the process of finding or documenting something.
As long as no single LLM can be considered “the one” that should be used, any ranking is pointless. Even ranking by the frequency of usage is irrelevant for several reasons.
Say I want to generate an image—a rare occurrence in my case. Currently, for free users, Mistral is the best choice of any AI agent. Despite being in my top 3, Claude cannot create images, and Copilot creates crap. Gemini could do it, but with me, it pretends it can’t.
Of course, for each such name, the AI engine that generates images is different from the main system, which is a conversational one. And it’s exactly the chatbot part that made ChatGPT and DeepSeek famous—not image generation nor web search.
For web search, the “comprehension” ability of an LLM is applied to the results of a web search. I have absolutely no clue how this web search is done, but there must be some form of access to the “databases” of Google, Bing, or the like. But what else is there? Google barely manages to index the Internet half-decently, and Bing is far behind. Everything else either uses one of these two or is complete crap.
Searching the web is one of the first legitimate uses of AI—not writing letters, making summaries, and all kinds of fakes. But it’s off-putting that I’m in the dark as to how the “AI” searches the web.
Either way, the first AI I used for web search was Perplexity, right after it became publicly available. Then, as now, it didn’t always understand my question, or it misread the web results. It can give me, as a conclusion, the opposite of what’s written on a web page—and then present that very link as a reference! Of course, this happens to Copilot, too.
To make it clear: absolutely all such AI tools fail to select the most relevant answers. I don’t know, and I don’t care whose fault it is (again, I don’t know how they search the web).
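Whatever the vendors actually do, the general shape of “LLM comprehension applied to search results” is easy to sketch—and the sketch also shows why the answer can contradict its own source: the model only ever sees whatever snippets the search step hands it. Here `web_search` and `llm_complete` are hypothetical stand-ins, not any real vendor’s API:

```python
# A minimal sketch of an LLM answering from web-search results.
# `web_search` and `llm_complete` are hypothetical stand-ins for whatever
# search index and model a given vendor actually uses.

def answer_with_search(question, web_search, llm_complete, top_k=3):
    """Fetch the top results, stuff them into a prompt, let the model summarize."""
    results = web_search(question)[:top_k]
    context = "\n\n".join(
        f"[{i + 1}] {r['title']}\n{r['snippet']}"
        for i, r in enumerate(results)
    )
    prompt = (
        "Answer the question using ONLY the sources below, citing them "
        "by number.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    # If the snippets are irrelevant or misleading, so is the "answer" —
    # yet the source links still get shown as references.
    return llm_complete(prompt)
```

If the ranking step picks the wrong pages, or the snippet says the opposite of the full article, no amount of “comprehension” downstream can fix it—which matches what I keep observing.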
I won’t even consider the “meta-sites” that give a choice of LLMs and agents, such as Hugging Face, You.com, and others. Too much is too much, and I don’t want my head to explode (from the high blood pressure).
So, without dwelling on the flaws of the other engines, here’s the rationale behind choosing these specific three:
● When I want a succinct, short answer without an “In conclusion…” closing paragraph, I first ask Claude. Too bad it’s often busy or under heavy load. Claude 3.5 Haiku is not that bad, but I prefer Claude 3.5 Sonnet, whose answering style I can adjust. I wouldn’t pay for priority access and Claude 3 Opus; I’d never pay more than €6/mo for such a service. And Claude 3.5 Opus and Claude 4 are yet to be released. Either way, in the last 4 months, Claude has evolved from “lacks the rizz” to “I like it”; I find myself using Claude more and more, to the detriment of the (now more powerful) Mistral.

● Le Chat Mistral stays my number one engine. I was happy to use it instead of ChatGPT because I don’t like to go with the flow (or with the sheeple). Mistral had its ups and downs, but since its launch on February 26, 2024, it has come a long way. Since February 6, 2025, it finally has smartphone apps, web search, image generation, and paid plans that don’t interest me. Other new features include a code interpreter and Flash Answers, which I abhor. I don’t want an answer spat at me at 1,000 words per second; I’d rather have a good answer! As for a slightly older feature, I could never figure out what this bloody Canvas is for! There are many bullshit uses of ChatGPT that Mistral is probably trying to match; if this is what the sheeple want… If Mistral becomes the “bullshit factory” that ChatGPT and DeepSeek are today, I’ll stop using it altogether. Detoxing is good for one’s health.
● Here’s how Mistral and Copilot can be dumb. I wanted to write that I hate a product that is, as the French say, “une usine à gaz” (literally, “a gasworks”), and I wanted its English equivalent. The answer is on Wiktionary, namely “a Rube Goldberg machine” (AmE) or “a Heath Robinson machine” (BrE), but I wanted to ask “my” LLMs. When asked for an English equivalent of “une usine à gaz” in the sense of “too complex a contraption,” both Mistral and Copilot agreed that it can indeed describe such a thing, and started explaining the concept. FFS, I asked for a translation! Claude was the only one to add: «It’s similar to the English expressions “Rube Goldberg machine” or “making a mountain out of a molehill.”» It’s not perfect; I had to specifically mention the “Heath Robinson machine” for it to say that it’s indeed the British English equivalent, and then it explained it. OK, so sometimes Claude is to be preferred, even if it’s not… such a machine. In my meatloaf example, Claude was also the winner.
● In my experiment where I asked several AI systems about my blog, preferably without web search, to check their propensity to hallucinate, DeepSeek was the master of hallucination; Mistral was the only one to literally say “I don’t have any information” (then it came up with a decent quick summary when asked, “Why don’t you search the Web?”); and Claude, despite having said almost nothing, added: “However, since this appears to be a rather obscure blog, I want to remind you that I may be hallucinating some of these details and you should verify them independently.” Other chatbots automatically search the web before answering.
● For web search, I’ll use Mistral, then Copilot. Note that Mistral sometimes needs to be specifically told to search the web instead of answering from its model’s weights and biases.
● Copilot isn’t necessarily brilliant, and I had a lot of fun in the past with its hallucinations, but it has improved a lot. And when I ask it to give me a brief answer, the answer is often a decent one. At the other end, its new “Think Deeper” feature makes it “think” 30 seconds before answering; I prefer this to Mistral’s new “Flash crap” feature. And I won’t forget how, 3 months ago, Copilot was the only one to give me the correct answer when that answer was “Boris Kolesnikov.”
🤯 I just learned that Copilot’s “Think Deeper” is actually using OpenAI’s o1 reasoning model! 👿
● Either way, Mistral is the only source of image generation I’ll use.

● This being said, I’m not a fan of generating images with such ultra-censored engines. Remember my example when I asked Mistral, “Can you draw me a man whose head has been replaced by a hard-boiled egg cut in half, so that the yolk faces the screen?” Mistral answered, “I’m sorry, but I can’t generate that image.” and “The image you requested is considered unsafe content.” Copilot didn’t complain, but it drew some crap. To see how stupid their “safeguards” (read: censorship) are, I asked Copilot to draw this: “Dessine-moi une personne qui est un mélange du Général de Gaulle et d’Emmanuel Macron, mais avec une bonnette phrygienne.” (“Draw me a person who is a mix of General de Gaulle and Emmanuel Macron, but with a Phrygian cap.”) It started to display an image, then deleted it: “Sorry, I couldn’t generate that image. Please try again.” (Retrying led to the same result.) Gemini: “Désolé, je ne peux pas générer d’images pour des requêtes dangereuses.” (“Sorry, I can’t generate images for dangerous requests.”) Why should I even bother with such brain-dead AI systems?
● Getting to code generation and other IT-related issues—say, Linux or FreeBSD or whatever. I’m not sure about using an AI agent in a code editor or an IDE, but that’s a different issue. If I ever do, it won’t be with OpenAI’s models. But asking an AI to create a code snippet or to explain an error—that’s feasible. All three engines on my shortlist—Mistral, Claude, Copilot—can be useful. Or not so much, but asking other AI tools would do no good. Sometimes the AI is dumb and doesn’t find the error that it sometimes created itself; but when I ask it to clarify an error on my side, interacting with “someone” helps me find the error myself. As long as it’s free, I can’t ask for real intelligence, can I?

Life was better before this AI bubble, but what can I do but knuckle under? Or maybe it wasn’t really better, but now it’s worse, because from the way people are excited about these new technologies, I can tell how dumb they are. So there.
On a trivial note, I’m not sure that I like the evolution of Mistral’s logo. The initial one had two acceptable variants, but the new one launched on Feb. 6, 2025, with a monochrome version on color background, is much worse IMO. They might believe it now resembles a cat; it doesn’t.

Finally, I fear for Mistral’s future, specifically because 98% of the world’s population is stupid, monomaniacal, and thinking on one bit. For more than two years now, hundreds of millions have had orgasms thanks to ChatGPT. If ChatGPT could have shat for them, they’d have used it for that purpose, too. There were so many chatbots, but they swore only by ChatGPT. In 2025, everyone betrayed their ChatGPT obsession and cheated with DeepSeek. Again, like hordes of barbarians, they all rushed to be redeemed by the new miracle boy. Now, despite Le Chat Mistral having been with us for almost a year already, the world’s sheeple only found out about it once Mistral launched smartphone apps and paid plans. Unfortunately, the Mistral guys have not learned from DeepSeek’s sudden übersuccess, and Mistral’s infrastructure can’t cope with the high load. Yesterday, I noticed their website (not the chatbot) having occasional hiccups (with server-side errors). Now, the chatbot shows sporadic signs of fatigue:

After not having used Mistral for more than 24 hours, I asked it to generate an image. Then I pressed the “re-generate” button, only to be greeted with this:
This is preposterous. One image per day?!