Reassessing the usefulness of major LLMs

February 12, 2026 | 7 Comments | AI, ChatGPT, Claude, Copilot, DeepSeek, epub, Gemini, Grok, Kimi, Mistral

I keep being disappointed by major chatbots. You see, it’s not enough if such an LLM knows about those newfangled frameworks or programming languages; it has to know, or to be able to find on the Internet, relevant information about technologies that were current 15–20 years ago. And, to my surprise, too many of them fail at such tasks!

I just needed a reminder of the specifics of the page-template.xpgt format, which was popular in the days of EPUB 2.0 and of ADE 1.7.2 or 2.0 and was used massively by Barnes & Noble. Oh, and I didn’t even remember the file extension, so each chatbot first had to identify the right file type and only then help me with the format.

If a chatbot insisted that CSS was the answer, one more follow-up was reasonable: “No, not CSS.” If it still didn’t know the answer… Oh, but then, even if it did, it needed to know or to find out what’s inside such a file! No guesswork, and no stuff that “might be there because it’s sort of plausible”!

Finally, I asked each chatbot to comment on and to express an opinion about such a file extracted from an old ePub. If the chatbot didn’t accept the upload of the file, I renamed it to page-template.xpgt.txt.
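For anyone who wants to repeat the last step, here is a minimal Python sketch of it, not the exact procedure used for this post. It relies only on the fact that an EPUB is a ZIP container; the function name extract_xpgt_as_txt and the file names in the usage note are made up for illustration.

```python
import zipfile
from pathlib import Path


def extract_xpgt_as_txt(epub_path, out_dir):
    """Copy every .xpgt member of an EPUB into out_dir, appending '.txt'."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # An EPUB is a ZIP archive, so the standard zipfile module can read it.
    with zipfile.ZipFile(epub_path) as epub:
        for name in epub.namelist():
            if name.lower().endswith(".xpgt"):
                # Keep only the base name, e.g. page-template.xpgt.txt
                target = out_dir / (Path(name).name + ".txt")
                target.write_bytes(epub.read(name))
                written.append(target)
    return written
```

Calling extract_xpgt_as_txt("old-book.epub", "upload") would leave upload/page-template.xpgt.txt ready for chatbots that reject the .xpgt extension. The file content is copied byte for byte; only the name changes.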

The results should surprise you:

Chatbot	Rating	Notes
Kimi	🟢 4/5	Most of its knowledge is retrieved and synthesized from web searches.
Grok	🟢 3.5/5	Still solid.
Gemini	🟡 3/5	It seems to have improved.
Copilot	🟡 2.5/5	Only supports the uploading of a limited number of file extensions. Meh.
DeepSeek	🟠 1.5/5	Only supports the uploading of a limited number of file extensions. Meh.
Mistral	🔴 1/5	Only supports the uploading of a limited number of file extensions. Dumb.
ChatGPT	🔴 0/5	Strange. It should have scored like Copilot. Crap.
Claude	🔴 0/5	Sonnet 4.5 Extended. Too much guesswork in all kinds of knowledge fields.

Some major chatbots are getting worse.

As a result, I’m further restricting my use of AI. Four weeks ago, I went from 12 LLMs down to 8. Now I’m down to 6.

I keep Le Chat Mistral because it’s OK for “embedded knowledge and reasoning,” the same way DeepSeek produces nice wording for its “stochastic probabilities reasoning.”



7 Comments Already

HAL - February 12th, 2026 at 2:43 PM (comment #116522 on Reassessing the usefulness of major LLMs by Homo Ludditus)

“Some major chatbots are getting worse.”

Yes, but also because you are using the free version rather than their more advanced paid versions, which are supposed to be more intelligent. 😉

- Béranger - February 12th, 2026 at 2:45 PM
  
  Supposed to, but not necessarily so. But how could I pay for a service/product that does not impress?!
  
Aldus - February 13th, 2026 at 12:02 AM

No more Claude? Interesting.

I think ChatGPT is still somewhat relevant because it sets the trends. And just the other day, ChatGPT rolled out advertising in the US.

On a different note, I’ve started a new blog focused on LLMs.

- Béranger - February 13th, 2026 at 12:03 AM
  
  Every single idiot on Earth is using ChatGPT. Most of them never heard of any other chatbot.
  
  You have too many blogs, really.
  
  - Aldus - February 13th, 2026 at 12:20 AM
    
    And this is exactly one of the reasons why ChatGPT stays relevant: everyone uses it. That’s how you see the overall direction/trend. Besides that, I don’t think it’s actually worse than the others – just tuned more poorly, so it gives you the most moderate and politically correct answer possible (from what I’ve seen).
    
    It might not be a bad idea to occasionally test the ones you’ve excluded too, since they’re changing as well. Some improve, others get worse, and it also depends on the prompt. With a different question, the ranking could easily have been completely reversed.
    
    Maybe, but this blog is going to be dedicated specifically to LLMs and AI. There aren’t many that cover the topic consistently, aside from yours.
    
    - Béranger - February 13th, 2026 at 12:22 AM
      
      I’ve read a lot of testimonials saying that ChatGPT is getting dumber and dumber. I’m still dumbfounded as to why GPT-5.1 performs so much better in Copilot regarding the topics I was interested in.
      
      OTOH, Claude has a long history of disappointing me, and lately it really tends to agree with me on everything that seems logical, even bad grammar. I cannot access anything beyond Sonnet 4.5 Extended for free, but Sonnet 3.5 was definitely better; after that, everything went downhill.
      
Béranger - March 19th, 2026 at 3:44 PM

Le Chat Mistral disappointed me repeatedly and annoyingly, so I stopped using it once again. The free tier never uses Mistral Large 3, so I suppose this is why it feels dumber than Qwen, which is itself so unreliable that I stopped using it.

Mistral out, Claude back in.

