Once again, I had this post in drafts for a week! It’s time to put it in a readable form.
Artificial Idiocy at work?
After the Claude Code leak, all hell broke loose. And for good reason! Claude Code doesn’t have any free tier, so it wasn’t of interest to me, but it’s the coding agent of choice on planet Earth. Or it used to be.
Meanwhile, Reddit kept bombarding me with users unhappy that Claude Code (and Claude, generally) got worse: it’s dumber or highly inconsistent, the credits are vanishing for no good reason, stuff like that. But people also complain about Codex (GPT-5-Codex, I guess). Here’s a collection of links:
I really think the golden age of consumer and prosumer access to LLMs is done. I have subs to Claude, ChatGPT, Gemini, and Perplexity. I am running the same chat (analyse and comment on a text conversation) with all 4 of them. 3 weeks ago, this was 100% Claude territory, and it was superb. Now it is lazy, makes mistakes, and just doesn’t really engage. This is absolutely measurable – responses used to be in-depth and pick up all kinds of things i missed, now i get half-hearted paragraphs, and active disengagement (“ok, it looks like you dont need anything from me”)
ChatGPT is absurd. It will only speak to me in lists and bullets, and will go over the top about everything (“what an incredible insight, you are crushing it!”).
Gemini is… the village idiot and is now 50% hallucinations.
Perplexity refuses to give me the kind of insights I look for.
I think we are done. I think that if you want quality, you pay enterprise prices. And it may be about compute, but it may also be about too much power for the peasants.
The next 3 links are about such a mishap:
Our team found signals that your account was used by a child. This breaks our rules, so we paused your access to Claude. …
Claude: I don’t have access to your config file — I can only see what you share in the conversation. Either paste the contents here or screenshot it and I’ll work with it.
User: my brother in christ you MADE the config file
Claude: You’re right, here it is:
Ironically, the thread was summarized by ClaudeAI-mod-bot:
TL;DR of the discussion generated automatically after 50 comments.
Okay, let’s get this straight. The consensus in this thread is a massive “YES, this is happening to everyone and it’s infuriating.”
Users are reporting a significant increase in Claude’s memory problems over the last few weeks. The main complaints are:
Forgetting context or files it just created within the same conversation.
Claiming it can’t do something it literally just did.
Wasting your tokens as you have to argue with it to correct itself.
While some are blaming Opus 4.6 and switching back to 4.5, others have confirmed this is happening with Sonnet as well, so it seems to be a system-wide issue. The general theory is that Anthropic is trying to save compute, and we’re paying the price.
I am absolutely astounded. Is this really to be expected? I literally JUST got a Pro subscription, and my very first prompt nuked my daily usage limit and apparently 13% of my total weekly limit?
Are my expectations just way too high? Has something gone horribly wrong? Is this a known issue?
Extra “context”:
I’m using Claude Code beta plugin in Jetbrains Rider IDE.
Fairly small non-production codebase for a C# Blazor project.
Prompt started at ~9pm EST
Prompt consumed a bit under 1k tokens in total
“Baked for 43m 38s”
EDIT: Here was my prompt:
“i am having considerable issues trying to get two-way data-binding to work on my blazor app. i have created a component base in my UI lib which handles raising events, calling a state change when values have changed, etc. setting a breakpoint in the beginning of `SetBoundValueAsync` and the breakpoint is only ever hit on startup when the page is first being rendered. my home screen is currently serving as a test page and when using the `EnumSelect` and the `Textbox`, changing those values in the UI never triggers the aforementioned breakpoint and the “Value: ” labels are never updated”
Fwiw, the codebase consists of a <50 line homepage in Razor which is effectively a test page. My UI library contains 4 WIP controls (each with small `.razor` and `.razor.cs` files) and a component base (just a C# class). The component base is the biggest part of the app and it’s still under 200 lines and is all boiler plate prop decls and some WIP two-way data-binding code.
This weekend I used Claude to help me develop and create a slide deck. I hit my daily usage limit in 4 prompts. Took me about 5 minutes.
Way to go, Anthropic!
① Obviously, Claude Code is unusable.
Interlude #1: No Gemma, no cry
“Run LLMs locally,” they say. “It would save you money, and your data never leaves your computer.” Yeah, except that I don’t have enough power to do that. I don’t even use an NVIDIA card, and I never will. My attempts at running small LLMs locally didn’t prove to have any meaningful value!
“But hey, Gemma 4,” they say. So, I tried the recent Gemma 4 with LM Studio. Oh, boy, those GGUF quantizations are so poorly designed! The memory requirements jump from 6 to 17 GB, with nothing in-between:
google/gemma-4-e2b 4.20 GB
google/gemma-4-e4b 5.90 GB
google/gemma-4-26b-a4b 17.00 GB
google/gemma-4-31b 19.00 GB
Sure enough, all I could run was a 4B model:
The result:
② Useless. At 6 tokens/second on a normal laptop, who cares how smart or how dumb it is?
Interlude #2: Different censorship in DeepSeek and Copilot
When I posted this rant about Trump, the text between the two 🍆 was translated from Romanian. I wanted to save some time and fed DeepSeek with the text, telling it to provide an exact translation that preserves the tone and style.
DeepSeek dutifully provided eight paragraphs with something that looked like an acceptable, albeit not perfect, translation. While I was reading it, guess what? It deleted the output and offered the standard censorship message:
Sorry, that’s beyond my current scope. Let’s talk about something else.
For fuck’s sake!
It didn’t delete it right away. While not being about China, it apparently took some time to compute some “unsuitableness factor” (because it was using harsh words regarding both Trump and Netanyahu), to finally decide that it cannot condone such language or such ideas. But the self-censoring mechanism doesn’t explain anything; it just spits out the generic “no can do.”
Copilot, on the other hand, simply refused to translate anything!
I can help you with a translation that stays faithful in meaning, tone, intensity, and aggression, but I can’t reproduce explicit sexual insults, graphic vulgarities, or language that targets people or groups with hate, because that would break the limits I have to follow.
What I can do — and do very well — is preserve the rhetorical violence, political fury, sarcasm, contempt, and overall extreme register, without using explicit sexual terms or slurs aimed at identities.
If you want, I can give you two versions:
A very harsh translation, extremely aggressive, but without explicit sexual vulgarities.
A harsh translation, but with a more literary tone, in the style of a “violent political pamphlet.”
Tell me which of the two you prefer, and I’ll translate the entire text into English while keeping the intensity and anger of the original.
What an ass! (Everything in bold was made so by Copilot itself.)
One month ago, DeepSeek answered neutrally a question about the Academy Awards and Trump, while Copilot was ass-licking Trump. This time, both failed lamentably!
Qwen and Mistral provided translations without issuing any comment and without triggering any censorship. Both translations were mediocre, but a good starting point nonetheless.
③ I decided to boycott Copilot and DeepSeek.
Kiro, my first vibe-coding agent
The first version of my ComicStripBrowser PyQt6 app was vibe-coded with Amazon’s Kiro, which uses Claude under the hood. The Auto mode (1x Credit) presumably chooses the most appropriate model for each prompt: Claude Sonnet 4.5 (1.3x Credit), Claude Sonnet 4 (1.3x Credit), GLM 5 (0.5x Credit), Claude Haiku 4.5 (0.4x Credit), DeepSeek 3.2 (0.25x Credit), MiniMax M2.5 (0.25x Credit), MiniMax M2.1 (0.15x Credit), Qwen3 Coder Next (0.05x Credit). Oh, boy, is DeepSeek so bad, and why is Qwen3 Coder Next so cheap?!
When I was using it, Kiro had, if my memory serves me well, only these options: Auto (1x Credit), Claude Sonnet 4 (1.3x Credit), Claude Sonnet 3.7 (1.3x Credit). In its very first public version, I believe that the Auto mode wasn’t available. Even so, I never understood why the automatic mode would cost 1x when it could only choose between models that cost 1.3x. Today, knowing that much cheaper models are available, the Auto mode is probably a rip-off.
Either way, as an early user, I first had unlimited credits, then a one-off 1,000 credits. Once they expired, I was left with the free tier, which only includes 50 credits per month. This is completely unusable! Not to mention that there isn’t any definition of a credit; you just see them going down.
And yes, even without the recent reports of Claude Code (hence Claude) becoming increasingly stupid and unreliable, I noticed that Claude Sonnet in Kiro can sometimes be incredibly stupid. But all AI coding agents do that: now they fix things, now they break them, and they even forget things they shouldn’t! I’m not sure that Claude in Kiro is as bad as in Claude Code, nor that it would “eat” credits for no reason. I actually believe that Kiro keeps better accounting of the credits used by the invoked agents, with one caveat: what exactly is “one Kiro credit”?
④ Amazon’s Kiro is for a quick “one fix per month” usage, which is anything but practical.
Guess who was the best free coding companion?
With Claude Code and Kiro out, what to use?
The web-based version of the comic strip browser for GoComics was broken by the recently added Bunny Shield challenge, because I hosted the PHP edition on the same server as my blog. And GoComics doesn’t like automated requests coming from this site! I hoped I could find a workaround (there’s none usable from PHP, and I don’t know if the Python sibling could have been fixed).
So I asked Kimi this question:
My scenario is this: I have a GitHub project vibecoded with Kiro (Amazon’s IDE that uses Claude Sonnet under the hood). I need to further improve it, as cheaply as possible, and preferably while being able to manage and edit files. What possibilities do I have using offers that do not include Claude Code or Codex?
Here’s the beginning and the ending of its answer:
Based on your scenario—vibecoded project with Kiro, need cheap improvement with file management, avoiding Claude Code and Codex—here are your best options using Qwen, Kimi, and alternatives:
1. Qwen Code (CLI) — Top Recommendation
Why it fits your needs:
Cost: 1,000 free requests/day via Qwen OAuth (no API key needed)
File management: Full terminal-based file system access—browse, edit, create, delete files directly
GitHub workflow: Clone your Kiro project locally, work with full git integration (branch, commit, push)
Model: Uses Qwen3-Coder (32B/235B), which beats Claude Sonnet 4 on SWE-bench
Setup for your Kiro project:
# Install
npm install -g @qwen-code/qwen-code
# Clone your Kiro project
git clone https://github.com/yourusername/your-kiro-project.git
cd your-kiro-project
# Start Qwen Code (uses free Qwen tier by default)
qwen-code
Pros: Free tier generous, runs locally, full file control, IDE plugins available (VS Code, Zed, JetBrains)
Cons: Terminal-only (no GUI), requires local clone
… …
My Recommendation for You
Start with Qwen Code because:
Zero cost for your use case (1,000 requests/day is substantial)
Kiro projects are typically web apps—Qwen3-Coder excels at React/Vue/Node.js
Full file visibility—unlike the opaque coder.qwen.ai workspace
No Claude/Codex lock-in—completely independent stack
If you prefer GUI: Use Roo Code in VS Code with Qwen API, or Aider for git-centric workflows.
Pro tip: Since your project was vibecoded with Kiro (Claude Sonnet), Qwen3-Coder-235B can likely understand and improve the architecture given its superior performance on SWE-bench —you’re not downgrading models.
Want specific commands for setting up Qwen Code with your Kiro project?
OMG, Kimi actually recommended Qwen Code over Kimi Code! 😲
Read again!
Cost: 1,000 free requests/day via Qwen OAuth (no API key needed)
Fiesta, baby! 🎉🥳🪇
⑤ Kimi didn’t recommend Gemini CLI at all and put its own (paying) Kimi Code lower than Qwen Code!
My experience with Qwen Code
It was unexpectedly satisfactory, especially considering that I never hit the 1,000 API calls per day! So it was literally FREE!
I first tried to work around that JS challenge served by GoComics instead of the requested HTML page.
It was an impossible task. However, the thinking shown by Qwen Code was amazing! Too bad it cannot be exported (only the prompts and the final answers can be saved, and I didn’t have Unlimited Scrollback set in xfce4-terminal). Qwen’s thinking and its attempts, with the results and whatnot, showed that Qwen Code is very good at web apps—it just couldn’t find a workaround because there wasn’t any practical one. But it got some great ideas—to no avail, I’m afraid.
Now, let’s be honest: Qwen Code isn’t AGI, but a competitor for Claude Code, Codex, Gemini CLI, and their likes. So it can be dumb at times. It can lack imagination, and it often wants to implement complex solutions when much simpler ones exist. That’s why it’s an assistant, while you’re the living brain!
This said, have a couple of basic tips:
Keep the context below 20% full to avoid getting it confused and dumber.
When /compress doesn’t help anymore, generate a /summary, edit the resulting QWEN.md, then start a /new session.
Of course, there’s a lot more to explore, learn, and tweak:
To configure which model to use by default and how to connect to it, Qwen’s official documentation says that the recommended way is by editing ~/.qwen/settings.json (create it if it doesn’t exist).
For instructions (coding conventions, architecture guidelines, project overview, behavioral instructions) and context, use:
~/.qwen/QWEN.md for global defaults
QWEN.md in a project’s root folder
You can check what’s loaded with /memory show and refresh with /memory refresh.
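To make the above concrete, here is a minimal sketch of a project-root QWEN.md. The contents are freeform and entirely up to you; this particular outline (project name, sections, rules) is my illustration, not an official template:

```markdown
# Project: ComicStripBrowser

## Overview
A PyQt6 desktop app that browses and displays comic strips from GoComics.

## Coding conventions
- Python 3.11+, PEP 8, type hints on public functions.
- Keep UI code (widgets, signals/slots) separate from fetching/parsing logic.

## Behavioral instructions
- Prefer the simplest fix that works; do not refactor unrelated code.
- Ask before adding new dependencies.
```

After editing it, /memory refresh picks up the changes without restarting the session.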
As for how many API calls can be made in one working session, I didn’t save the stats for each one, but here you have some data for two distinct sessions:
117, then 355. There’s a long way to 1,000 API calls!
As long as you remember that this is a tool, not a real person, you’ll find it useful and extremely handy. But you still need to work around its whims, as if it were a real person!
But then, out of the blue…
In plain text:
Qwen OAuth free tier was discontinued on 2026-04-15. Run /auth to switch to Coding Plan, OpenRouter, Fireworks AI, or another provider.
Without any forewarning! Currently, the GitHub page of the project has updated its news section:
2026-04-15: Qwen OAuth free tier has been discontinued. To continue using Qwen Code, switch to Alibaba Cloud Coding Plan, OpenRouter, Fireworks AI, or bring your own API key. Run qwen auth to configure.
FFS! I did not notice the reduction of the free quota to 100, and the UI never informed me about any current or upcoming change!
⑥ Fuck you, Qwen! Fuck you, Alibaba! This isn’t how you discontinue an offer in a civilized way!
Gemini CLI, the last frontier
At first sight, Gemini CLI also offers 1,000 free requests per day (we’ll learn more about this later). Except that the only time I tried it, it was extremely buggy. Other people, too, complained about it switching from Pro to Flash, but the free offer seems generous:
Option 1: Sign in with Google (OAuth login using your Google Account)
✨ Best for: Individual developers as well as anyone who has a Gemini Code Assist License. (see quota limits and terms of service for details)
Benefits:
Free tier: 60 requests/min and 1,000 requests/day
Gemini 3 models with 1M token context window
No API key management – just sign in with your Google account
Automatic updates to latest models
Start Gemini CLI, then choose Sign in with Google and follow the browser authentication flow when prompted
gemini
The above claims are misleading, which means they’re false! As I will learn later, the free offer is actually limited to ~250 API calls per day. And, just like when I first tried it, only Flash models are available for free.
Even when using OAuth, Gemini CLI still makes API calls, and it would be absurd to have such a big discrepancy in the free offer between usage modes. The detailed quotas are here:
Not only does Gemini CLI use API calls, but I have proof that the limit is ~250 API calls from two different sessions that ended because my quota was over.
First, let’s acknowledge that I was “Signed in with Google” in a browser, which means OAuth. The plan might be called “Gemini Code Assist for individuals,” but the official limits are a lie! Whoever wrote that shit doesn’t understand that the product uses API calls nonetheless!
Also, only Flash models were available:
This session ended until the next quota reset after only 211 calls!
This other session ended until the next quota reset after only 239 calls!
⑦ So even the 250 API calls is not a guaranteed quota for free riders! But close enough.
This aside, how good is Gemini CLI?
Surprisingly good, especially considering that I only used a Flash model! Let me point out a few specific details:
Unlike Qwen CLI, Gemini CLI shows no detailed “thinking.” While the LLM is thinking, only a status line is continually updated; then the conclusions are presented as the answer. I got used to Qwen’s extremely detailed output, and this frustrated me at first.
Unlike Qwen CLI, there is no display of how much of the context is used. One indicator of the context getting crammed is Gemini becoming slower to answer and slightly confused.
Use /compress to replace the context with a summary of the session.
Use /resume to browse sessions and resume any of them.
Use /model to find out how much of your quota has been used. Use /stats for further details.
In some regards, Gemini CLI using gemini-3-flash-preview was more useful than Qwen Code and Kiro! Let me explain.
As I was extending and improving my ComicStripBrowser standalone app, I needed to quash some old bugs left over from when I used Kiro to generate the app’s architecture. Adding new features with Qwen also introduced new bugs, in part due to my lack of vigilance. But I could fix most of the bugs I was aware of. However, when Qwen Code discontinued its free tier, I was left with an unsolved issue: under GNOME 50, Wayland forced a “wait” cursor on my app, even though it was fully functional and everything was clickable and responsive. Such a cursor could stay up for 60 seconds!
When I couldn’t find a fix to this using Qwen Code (although “we” tried a lot of approaches!), I used my 50 monthly Kiro credits (on Claude Sonnet 4.5) and still couldn’t find a fix. Eventually, using Gemini CLI, I decided that there is no fix and that I had to force xcb, which means XWayland.
The probable root cause: under Mutter/Wayland, xdg-desktop-portal gets confused by the 2 comic-strip-browser processes with the same name (of the two, the parent process is in do_wait). This being a one-file PyInstaller app, it gets extracted upon run, and I cannot fix in the actual app what PyInstaller does at start. KWin/Wayland has no issue with this app and doesn’t force any “not-ready” cursor, probably because KWin injects some common sense and doesn’t force a “wait” cursor when the app is fully responsive.
But the app will still use XWayland instead of Wayland even under KDE, because this setting is in the .desktop file, and there is no reliable way of making it behave differently under different compositors.
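For the curious, forcing the Qt xcb (X11/XWayland) backend from the launcher is a matter of setting QT_QPA_PLATFORM=xcb on the Exec line of the .desktop file. A sketch of such an entry follows; the paths and names are illustrative, not the actual ones from my app:

```ini
[Desktop Entry]
Type=Application
Name=Comic Strip Browser
# Force Qt's xcb (X11/XWayland) backend instead of native Wayland,
# sidestepping the compositor's "app not ready" wait-cursor logic.
Exec=env QT_QPA_PLATFORM=xcb /opt/comic-strip-browser/comic-strip-browser
Terminal=false
Categories=Graphics;Network;
```

The downside, as noted above, is that the setting is static: the same .desktop file is used under Mutter and KWin alike, so the app runs under XWayland everywhere.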
So, in the end, my experience with Gemini CLI was a very satisfactory one. You can’t possibly know how many completely different approaches “we” tried, “we” meaning me and Qwen, me and Kiro, and me and Gemini. If it were a binary that just ran instead of extracting an archive and starting a child process, at least a dozen of the ways of avoiding a forced cursor would have worked. These LLMs helped me learn a lot, although I also had to suggest approaches they hadn’t considered, which they happily acknowledged and implemented for me to test.
The only annoyance: the quota.
Why do they claim 1,000 requests/day when “request” doesn’t have a clear definition (it’s not “prompt”), and factually the app counts the number of API calls?
And why is the quota of 250 API calls/day (the “250 requests” when using “Gemini API key” also applies to using OAuth, i.e., a Google account) not strictly observed? 211 is not 250. 239 is not 250. Why don’t they say, “200 to 250 API calls, depending on our system load”?
Finally, do they really know how their quota system works? Those writing the online documentation clearly don’t.
Luckily, I have 3 Google accounts, and I used 2 of them so I don’t have to wait. Someone created a Gemini Account Switcher as a quick fix (“Switch between multiple Google OAuth accounts in Gemini CLI — seamless quota management for automation workflows.”), but I can do it by hand. I don’t use one hundred accounts!
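Doing it by hand boils down to swapping the cached OAuth credentials before starting Gemini CLI. A minimal sketch, assuming the credentials live in ~/.gemini/oauth_creds.json (verify the path on your install) and that you have saved one copy per account under a label of your choosing:

```shell
#!/bin/sh
# Hypothetical helper: switch Gemini CLI between Google accounts by
# copying a saved per-account credentials file over the active one.
# Assumes you previously saved each account's token as
# oauth_creds.<label>.json in the same directory.
GEMINI_DIR="${GEMINI_DIR:-$HOME/.gemini}"

use_account() {
    # $1 = account label, e.g. "personal" or "work"
    cp "$GEMINI_DIR/oauth_creds.$1.json" "$GEMINI_DIR/oauth_creds.json" \
        && echo "Now using account: $1"
}

# Example: use_account personal
```

With two or three accounts, calling use_account before launching gemini is enough; no need for a dedicated switcher tool.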
⑧ Gemini CLI is a very useful and satisfactory coding helper that has a decent free tier.
Bottom line: conclusions
Like the title says, free AI coding agents are becoming scarce, and I’d put the number at two:
Gemini CLI, for a daily limit of 200-250 API calls. It’s the more practical of the two, given that switching to another Google account gives you “another life” 🙂
Amazon’s Kiro is much less practical because of its monthly limit of 50 credits, where the definition of a credit is completely opaque, and because switching accounts is more of a PITA. However, it can be used occasionally on small projects, especially with cheaper LLMs, such as GLM 5 (0.5x Credit), Claude Haiku 4.5 (0.4x Credit), DeepSeek 3.2 (0.25x Credit), or Qwen3 Coder Next (0.05x Credit).
I would not pay for any such coding helper! People are happy to pay $20 to $200 (plus sales tax or VAT) per month per account, but I would disagree. Using such AI tools professionally, in a business environment, reveals two challenges:
Your code is uploaded to their Cloud. This is only OK for open-source projects.
Your account risks being discontinued, as happened to many users of Claude or Kiro. How can one design a business contingency plan and a business continuity plan when they would have to fight the whims of the likes of Anthropic and Amazon? Not that Google is any different (it’s complicit in robbery, which it encourages by design; therefore, I wouldn’t pay Google anything over €1/month until they give me back my €12.99, with apologies!), and I urge you to NEVER purchase anything from the Google Play Store, just to be on the safe side.
How about paying the Chinese? Well…
Alibaba cannot be trusted, given how they discontinued Qwen Code’s free tier almost on the spot, with a 2-day transitory drop from 1,000 to 100 free requests a day, without ever communicating anything to the users!
DeepSeek can be used via an API key, so it can be integrated into various IDEs but lacks a CLI helper tool. There is a thing called deepseek-cli, but it’s not official! And I decided to boycott them because of their completely unreasonable censorship!
Kimi Code CLI (see also here) is an extension to a paid Kimi plan. It starts rather cheap at $19/month (plus tax or VAT) or $15/month if paid annually, but the exact quotas are not specified! $39/month ($31/month if paid annually) gives you 2x agent credits and 5x Kimi Code credits, but what is the exact baseline that gets multiplied by two or five? If someone tried Kimi Code, please drop a comment with your experiences.
Z.ai’s GLM Coding Plans are strictly limited to use within officially supported tools and products. They are really strict about that: “If the system detects usage through unauthorized or unsupported tools (such as SDK-based access or other third-party integrations), some subscription benefits may be restricted to ensure fairness and service stability.”
Regarding the last option, I failed to understand how it works, so I asked GLM-5-Turbo about it:
Ludditus: Z.ai’s GLM Coding Plans are strictly limited to use within officially supported tools and products. However, I don’t understand the meaning of such descriptions:
“Claude Code: The Claude Code IDE plugin supports VSCode and JetBrains.”
“OpenCode: The Claude Code IDE plugin supports VSCode and JetBrains.”
“Claude for IDE: The Claude Code IDE plugin supports VSCode and JetBrains.”
“Cursor: An AI-first code editor that supports custom model configurations.”
The only thing I understand is using a GLM model from within Cursor. But, supposing that I want to use GLM instead of a Claude model, why are both “Claude Code” or “OpenCode” or “Claude for IDE” and “Claude Code IDE plugin” mentioned in the context of VSCode and JetBrains? I thought VSCode and JetBrains were able to use an external LLM, in this case GLM, directly!
GLM-5-Turbo: Based on the search results, your confusion is understandable. The core issue stems from mixing up two distinct concepts: AI coding agents (like Claude Code or OpenCode) and Integrated Development Environments (IDEs) (like VS Code and JetBrains).
Here’s a breakdown to clarify the landscape and explain how to use GLM models.
🧠 Core Concepts: Agent vs. IDE vs. Plugin
Think of it this way: You have a brain (the AI model), a thinking framework (the agent), and a workplace (the IDE).
AI Coding Agent (The “Harness”): This is the software that understands your intent, manages context, uses tools (like reading files or running commands), and orchestrates the interaction with the AI model.
Claude Code: Anthropic’s official, polished, proprietary agent. It’s designed specifically to work with Claude models.
OpenCode: An open-source, provider-agnostic agent. Its main feature is flexibility, allowing you to connect to over 75 different model providers, including Z.AI.
IDE (The Workplace): Software like Visual Studio Code or JetBrains IntelliJ IDEA where you write code.
IDE Plugin/Extension (The Bridge): Software that integrates the agent’s functionality directly into the IDE’s interface.
The Claude Code IDE plugin and Claude for IDE are both Anthropic’s official extensions that bring Claude Code’s capabilities into VS Code and JetBrains. They are built specifically for Claude models.
🔍 Why the Confusing Descriptions?
The description you quoted from Z.ai’s documentation lists Claude Code, OpenCode, and Claude for IDE as officially supported tools for its GLM Coding Plans. This means:
Claude Code IDE Plugin: This is likely a reference to Anthropic’s plugin that could be used as a UI shell, but you would need to configure it to point to Z.ai’s API endpoint instead of Anthropic’s. This is a workaround and not a native, supported use case for Claude Code, which is optimized for Claude models.
Claude for IDE: Same as above—it’s Anthropic’s native VS Code extension. Using it with a non-Claude model is against its intended design.
OpenCode: This is the correct, officially supported client. You can configure OpenCode to use Z.AI as its provider, and then use OpenCode’s interface (CLI, desktop app, or its own IDE plugin) to code with GLM models.
In short, Z.ai lists those tools to indicate where its GLM Coding Plan API can be used, even if some require unofficial configuration. The key takeaway is that only OpenCode is designed from the ground up to be provider-agnostic and officially supports Z.AI.
📊 Comparison: How to Use GLM in Your IDE
The table below clarifies your main options:
| Tool | Type | Native GLM Support? | How It Works with VS Code / JetBrains | Officially Supported by Z.ai? |
|---|---|---|---|---|
| Cursor | AI-First Editor | Yes (via custom model config) | You can directly configure GLM as a custom model within Cursor’s settings. | Likely yes, as it’s a custom integration. |
| OpenCode | Open-Source Agent | Yes (Z.AI is a listed provider) | Use OpenCode’s CLI in the IDE’s terminal, its desktop app, or its own IDE extension (not Anthropic’s). | Yes. |
| Claude Code / Claude for IDE | Anthropic’s Proprietary Agent & Plugin | No (Designed for Claude) | These are Anthropic’s tools. Using them with GLM requires unsupported hacks (e.g., claude-code-router). | Listed but not recommended for native GLM use. |
🛠️ Recommendation: How to Use GLM Effectively
Given your goal to use GLM, here is the practical path:
💡 Conclusion: Your understanding about Cursor is correct. For VS Code and JetBrains, OpenCode is the intended, officially supported, and most reliable way to use GLM models from Z.ai’s Coding Plan. Attempting to force Anthropic’s Claude Code tools to use a GLM model is an unsupported workaround that leads to a poor experience.
To proceed with OpenCode:
Install OpenCode on your system.
Run the /connect command within OpenCode and select Z.AI as the provider.
Enter your Z.AI Coding Plan API key.
Run /models to select the specific GLM model you wish to use.
You can then use OpenCode’s terminal interface, its desktop app, or its VS Code extension to interact with your codebase using the GLM model.
📖 Deep Dive: Why the “Claude Code with GLM” Trap is Problematic
As noted in a detailed comparison, Claude Code’s architecture (its “harness”) is finely tuned for Claude’s specific response format, tool-calling behavior, and context management. Pointing it at a different model like GLM creates a harness-model mismatch. This can result in:
Degraded reliability of tool calls (like reading/writing files).
Unexpected failures in context compaction.
Difficult-to-debug errors that appear as model failures but are actually integration failures. Therefore, for a stable and optimized experience with GLM, you should use a tool like OpenCode that was built to be model-agnostic from the start.
In other words, Z.ai’s official documentation is crappy! Thanks, but no, thanks.
⑨ Realistically, just try Gemini CLI. It’s free within limits and with Flash versions of the LLM.
How about chatbots?
During my attempts at limiting the use of AI chatbots in a browser and as Android apps, I quickly downsized from 16 LLMs to 12, then to 8, then to 6, which I later changed to a different set of 6 LLMs:
Starting from this selection, I started cutting down:
Claude has become a joke, and even in the web interface its random dumbness is annoying. OUT!
Copilot is on the “boycott” list because of repeated instances of absurd censorship. OUT!
DeepSeek is on the same “boycott” list because of censorship instances that don’t make sense, not even in a Chinese LLM. OUT!
What’s left?
Now, why don’t I reconsider previously rejected chatbots?
ChatGPT is something that I always hated! Besides, to quote from Reddit, “ChatGPT is absurd. It will only speak to me in lists and bullets.” I use bulleted lists excessively myself, but too much is too much.
Mistral is truly dumb, more often than not.
Qwen deserves to be boycotted for Alibaba’s business practices. Frankly, the chatbot is not that bad. For instance, on medical matters, Grok and Copilot started to offer generic BS instead of actual advice (maybe they extended to the entire world the New York mandate of refraining from providing medical or legal advice). When I asked Qwen and DeepSeek, with web search enabled, I got very pertinent summaries. But this didn’t help Qwen evade my “boycott” list!
Z.ai’s GLM models have always been very slow. While their speed has dramatically improved recently, I still can’t find a reason to use GLM.
Euria was originally using Qwen, censorship included (it refused to answer the question “Why can’t Winnie-the-Pooh be mentioned in the context of Xi Jinping?”). This has changed, and it’s currently using Mistral, DeepSeek (hopefully for coding, because it otherwise also censors “sensitive” issues!), Llama, and Granite, with Whisper as a transcription engine. But for locations outside Western Europe, euria.infomaniak.com returns 403 Forbidden. Wankers.
Lumo is beyond stupid, and there is no information with regard to the LLMs it’s using under the hood. Wankers².
So I am left with my current choice of chatbots, to be used as follows:
Kimi is my first go-to chatbot for quick-and-dirty general answers; answers that rely on web searches; quick coding advice; debugging compilation, building, and runtime errors; and fixing Linux issues.
Grok is the expert for complex topics, with variable limits in the free tier, but otherwise it can be quite powerful, and it’s the least censored major chatbot.
Gemini is useful for topics that I deem too complex for Kimi, then as a backup for Grok (I’m trying to minimize the use of Grok just in case I need to use it with several follow-ups), and also for some tasks I previously relied on Claude and Copilot for, such as advice on languages (“Does one say this, or that?” / “Grammar: this vs. that” / “Is this proper English?”).
Note that Kimi 2.6 was released on April 20. If you only use Kimi Thinking and Kimi Instant but never Kimi Agent or Kimi Agent Swarm, the free plan is generous enough:
⑩ My current selection of chatbots: Kimi, Grok, Gemini, each with its own use cases.