Reassessing the usefulness of major LLMs
I keep being disappointed by major chatbots. You see, it’s not enough for an LLM to know about the newfangled frameworks or programming languages; it also has to know, or be able to find on the Internet, relevant information about technologies that were current 15–20 years ago. And, to my surprise, too many of them fail at such tasks!
I just needed to be reminded of specifics about the format of the page-template.xpgt files that were popular in the days of EPUB 2.0 and ADE 1.7.2 or 2.0, and which were massively used by Barnes & Noble. Oh, and I didn’t even remember the file extension, so each chatbot’s task was first to identify the right file type and only then to help me with the format.
If a chatbot insisted that CSS was the answer, one more follow-up was reasonable: “No, not CSS.” If it still didn’t know the answer… Oh, but then, even if it did, it had to know or be able to find out what’s inside such a file! No guesswork, and no stuff that “might be there because it’s sort of plausible”!
Finally, I asked each chatbot to comment on and to express an opinion about such a file extracted from an old ePub. If the chatbot didn’t accept the upload of the file, I renamed it to page-template.xpgt.txt.
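For anyone who wants to repeat the last step, here is a minimal sketch of how such a file could be pulled out of an ePub and renamed for upload. It relies only on the fact that an ePub is a ZIP archive; the function name `extract_xpgt_as_txt` and its parameters are hypothetical, and this is not necessarily how the file was extracted for the test above.

```python
import zipfile
from pathlib import Path


def extract_xpgt_as_txt(epub_path, out_dir):
    """Copy every .xpgt member of an EPUB to out_dir, appending '.txt'.

    Hypothetical helper: an EPUB is a ZIP archive, so the standard
    zipfile module is enough to read its members.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(epub_path) as epub:
        for name in epub.namelist():
            # Match on the extension only; the folder inside the archive varies.
            if name.lower().endswith(".xpgt"):
                # e.g. OEBPS/page-template.xpgt -> page-template.xpgt.txt
                target = out / (Path(name).name + ".txt")
                target.write_bytes(epub.read(name))
                written.append(target)
    return written
```

Calling `extract_xpgt_as_txt("old-book.epub", "upload")` (both names are placeholders) would leave a `page-template.xpgt.txt` in the `upload` folder if the book contains such a template, which gets around upload filters that reject the `.xpgt` extension.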
The results should surprise you:
| Chatbot | Rating | Notes |
|---|---|---|
| Kimi | 🟢 4/5 | Most of its knowledge is retrieved and synthesized from web searches. |
| Grok | 🟢 3.5/5 | Still solid. |
| Gemini | 🟡 3/5 | It seems to have improved. |
| Copilot | 🟡 2.5/5 | Only supports the uploading of a limited number of file extensions. Meh. |
| DeepSeek | 🟠 1.5/5 | Only supports the uploading of a limited number of file extensions. Meh. |
| Mistral | 🔴 1/5 | Only supports the uploading of a limited number of file extensions. Dumb. |
| ChatGPT | 🔴 0/5 | Strange. It should have scored like Copilot. Crap. |
| Claude | 🔴 0/5 | Sonnet 4.5 Extended. Too much guesswork in all kinds of knowledge fields. |
Some major chatbots are getting worse.
As a result, I’m further restricting my use of AI. Four weeks ago, I went from 12 down to 8 LLMs. Now I’m going down to 6.
I keep Le Chat Mistral because it’s OK for “embedded knowledge and reasoning,” the same way DeepSeek produces nice wording for its “stochastic probabilities reasoning.”

Yes, but also because you are using the free version rather than their more advanced paid versions, which are supposed to be more intelligent. 😉
Supposed to, but not necessarily so. But how could I pay for a service/product that does not impress?!
No more Claude? Interesting.
I think ChatGPT is still somewhat relevant because it sets the trends. And just the other day, ChatGPT rolled out advertising in the US.
On a different note, I’ve started a new blog focused on LLMs.
Every single idiot on Earth is using ChatGPT. Most of them have never heard of any other chatbot.
You have too many blogs, really.
And this is exactly one of the reasons why ChatGPT stays relevant: everyone uses it. That’s how you see the overall direction/trend. Besides that, I don’t think it’s actually worse than the others – just tuned more poorly, so it gives you the most moderate and politically correct answer possible (from what I’ve seen).
It might not be a bad idea to occasionally test the ones you’ve excluded too, since they’re changing as well. Some improve, others get worse, and it also depends on the prompt. With a different question, the ranking could easily have been completely reversed.
Maybe, but this blog is going to be dedicated specifically to LLMs and AI. There aren’t many that cover the topic consistently, aside from yours.
I’ve read a lot of testimonials saying that ChatGPT is getting dumber and dumber. I’m still dumbfounded as to why GPT-5.1 performs so much better in Copilot regarding the topics I was interested in.
OTOH, Claude has a long history of disappointing me, but lately it really tends to agree with me on everything that seems logical, even bad grammar. I cannot access anything beyond Sonnet 4.5 Extended for free, but Sonnet 3.5 was definitely better; after that, everything went downhill.