Reassessing the usefulness of major LLMs
I keep being disappointed by major chatbots. You see, it’s not enough for an LLM to know about the latest newfangled frameworks or programming languages; it also has to know, or be able to find on the Internet, relevant information about technologies that were current 15–20 years ago. And, to my surprise, too many of them fail at such tasks!
I just needed a reminder of the specifics of the page-template.xpgt file format, which was popular in the days of EPUB 2.0 and of ADE 1.7.2 or 2.0, and which was used massively by Barnes & Noble. Oh, and I didn’t even remember the file extension, so each chatbot first had to identify the right kind of file and only then help me with the format.
If a chatbot insisted that CSS was the answer, one more follow-up was reasonable: “No, not CSS.” If it still didn’t know the answer… Oh, but then, even if it did, it also had to know or find out what’s actually inside such a file! No guesswork, and no stuff that “might be there because it’s sort of plausible”!
Finally, I asked each chatbot to comment on, and give its opinion of, such a file extracted from an old EPUB. If the chatbot didn’t accept the upload, I renamed the file to page-template.xpgt.txt.
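For anyone who wants to script the renaming workaround rather than do it by hand, here is a minimal Python sketch. It assumes the template has already been extracted from the EPUB into the current directory; the helper name `copy_with_txt_suffix` is made up for illustration, and the sketch copies the file instead of renaming it so that the original stays untouched.

```python
import shutil
from pathlib import Path


def copy_with_txt_suffix(source: Path) -> Path:
    """Copy `source` next to itself with ".txt" appended to its full name."""
    # Hypothetical helper: "page-template.xpgt" becomes "page-template.xpgt.txt".
    target = source.with_name(source.name + ".txt")
    shutil.copyfile(source, target)  # the original file is left in place
    return target


if __name__ == "__main__":
    # Assumption: the extracted template sits in the current directory.
    template = Path("page-template.xpgt")
    if template.exists():
        print(copy_with_txt_suffix(template))
```

The copy has exactly the same bytes as the original; only the name changes, which is all that an extension-based upload filter looks at.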
The results should surprise you:
| Chatbot | Rating | Notes |
|---|---|---|
| Kimi | 🟢 4/5 | Most of its knowledge is retrieved and synthesized from web searches. |
| Grok | 🟢 3.5/5 | Still solid. |
| Gemini | 🟡 3/5 | It seems to have improved. |
| Copilot | 🟡 2.5/5 | Only supports the uploading of a limited number of file extensions. Meh. |
| DeepSeek | 🟠 1.5/5 | Only supports the uploading of a limited number of file extensions. Meh. |
| Mistral | 🔴 1/5 | Only supports the uploading of a limited number of file extensions. Dumb. |
| ChatGPT | 🔴 0/5 | Strange. It should have scored like Copilot. Crap. |
| Claude | 🔴 0/5 | Sonnet 4.5 Extended. Too much guesswork in all kinds of knowledge fields. |
Some major chatbots are getting worse.
As a result, I’m further restricting my use of AI. Four weeks ago, I went from 12 LLMs down to 8. Now I’m going down to 6.

I keep Le Chat Mistral because it’s OK for “embedded knowledge and reasoning,” the same way DeepSeek produces nice wording for its “stochastic probabilities reasoning.”

Yes, but also because you are using the free version rather than their more advanced paid versions, which are supposed to be more intelligent. 😉
Supposed to, but not necessarily so. But how could I pay for a service/product that does not impress?!