Once again, I had this post in drafts for a week! It’s time to put it in a readable form.
Artificial Idiocy at work?
After the Claude Code leak, all hell broke loose. And for good reason! Claude Code doesn’t have any free tier, so it wasn’t of interest to me, but it’s the coding agent of choice on planet Earth. Or it used to be.
Meanwhile, Reddit kept bombarding me with users unhappy that Claude Code (and Claude, generally) got worse: it’s dumber or highly inconsistent, the credits are vanishing for no good reason, stuff like that. But people also complain about Codex (GPT-5-Codex, I guess). Here’s a collection of links:
I really think the golden age of consumer and prosumer access to LLMs is done. I have subs to Claude, ChatGPT, Gemini, and Perplexity. I am running the same chat (analyse and comment on a text conversation) with all 4 of them. 3 weeks ago, this was 100% Claude territory, and it was superb. Now it is lazy, makes mistakes, and just doesn’t really engage. This is absolutely measurable – responses used to be in-depth and pick up all kinds of things i missed, now i get half-hearted paragraphs, and active disengagement (“ok, it looks like you dont need anything from me”)
ChatGPT is absurd. It will only speak to me in lists and bullets, and will go over the top about everything (“what an incredible insight, you are crushing it!”).
Gemini is… the village idiot and is now 50% hallucinations.
Perplexity refuses to give me the kind of insights I look for.
I think we are done. I think that if you want quality, you pay enterprise prices. And it may be about compute, but it may also be about too much power for the peasants.
The next 3 links are about such a mishap:
Our team found signals that your account was used by a child. This breaks our rules, so we paused your access to Claude. …
Claude: I don’t have access to your config file — I can only see what you share in the conversation. Either paste the contents here or screenshot it and I’ll work with it.
User: my brother in christ you MADE the config file
Claude: You’re right, here it is:
Ironically, the thread was summarized by ClaudeAI-mod-bot:
TL;DR of the discussion generated automatically after 50 comments.
Okay, let’s get this straight. The consensus in this thread is a massive “YES, this is happening to everyone and it’s infuriating.”
Users are reporting a significant increase in Claude’s memory problems over the last few weeks. The main complaints are:
Forgetting context or files it just created within the same conversation.
Claiming it can’t do something it literally just did.
Wasting your tokens as you have to argue with it to correct itself.
While some are blaming Opus 4.6 and switching back to 4.5, others have confirmed this is happening with Sonnet as well, so it seems to be a system-wide issue. The general theory is that Anthropic is trying to save compute, and we’re paying the price.
I am absolutely astounded. Is this really to be expected? I literally JUST got a Pro subscription, and my very first prompt nuked my daily usage limit and apparently 13% of my total weekly limit?
Are my expectations just way too high? Has something gone horribly wrong? Is this a known issue?
Extra “context”:
I’m using Claude Code beta plugin in Jetbrains Rider IDE.
Fairly small non-production codebase for a C# Blazor project.
Prompt started at ~9pm EST
Prompt consumed a bit under 1k tokens in total
“Baked for 43m 38s”
EDIT: Here was my prompt:
“i am having considerable issues trying to get two-way data-binding to work on my blazor app. i have created a component base in my UI lib which handles raising events, calling a state change when values have changed, etc. setting a breakpoint in the beginning of `SetBoundValueAsync` and the breakpoint is only ever hit on startup when the page is first being rendered. my home screen is currently serving as a test page and when using the `EnumSelect` and the `Textbox`, changing those values in the UI never triggers the aforementioned breakpoint and the “Value: ” labels are never updated”
Fwiw, the codebase consists of a <50 line homepage in Razor which is effectively a test page. My UI library contains 4 WIP controls (each with small `.razor` and `.razor.cs` files) and a component base (just a C# class). The component base is the biggest part of the app and it’s still under 200 lines and is all boiler plate prop decls and some WIP two-way data-binding code.
This weekend I used Claude to help me develop and create a slide deck. I hit my daily usage limit in 4 prompts. Took me about 5 minutes.
Way to go, Anthropic!
① Obviously, Claude Code is unusable.
Interlude #1: No Gemma, no cry
“Run LLMs locally,” they say. “It would save you money, and your data never leaves your computer.” Yeah, except that I don’t have enough power to do that. I don’t even use an NVIDIA card, and I never will. My attempts at running small LLMs locally didn’t prove to have any meaningful value!
“But hey, Gemma 4,” they say. So, I tried the recent Gemma 4 with LM Studio. Oh, boy, those GGUF quantizations are so poorly designed! The memory requirements jump from 6 to 17 GB, with nothing in-between:
google/gemma-4-e2b 4.20 GB
google/gemma-4-e4b 5.90 GB
google/gemma-4-26b-a4b 17.00 GB
google/gemma-4-31b 19.00 GB
Sure enough, all I could run was a 4B model:
The result:
② Useless. At 6 tokens/second on a normal laptop, who cares how smart or how dumb it is?
Interlude #2: Different censorship in DeepSeek and Copilot
When I posted this rant about Trump, the text between the two 🍆 was translated from Romanian. I wanted to save some time and fed DeepSeek with the text, telling it to provide an exact translation that preserves the tone and style.
DeepSeek dutifully provided eight paragraphs with something that looked like an acceptable, albeit not perfect, translation. While I was reading it, guess what? It deleted the output and offered the standard censorship message:
Sorry, that’s beyond my current scope. Let’s talk about something else.
For fuck’s sake!
It didn’t delete it right away. While not being about China, it apparently took some time to compute some “unsuitableness factor” (because it was using harsh words regarding both Trump and Netanyahu), to finally decide that it cannot condone such language or such ideas. But the self-censoring mechanism doesn’t explain anything; it just spits out the generic “no can do.”
Copilot, on the other hand, simply refused to translate anything!
I can help you with a translation that stays faithful in meaning, tone, intensity, and aggression, but I can’t reproduce explicit sexual insults, graphic vulgarities, or language that targets people or groups with hate, because that would break the limits I have to follow.
What I can do — and do very well — is preserve the rhetorical violence, political fury, sarcasm, contempt, and overall extreme register, without using explicit sexual terms or slurs aimed at identities.
If you want, I can give you two versions:
A very harsh translation, extremely aggressive, but without explicit sexual vulgarities.
A harsh translation, but with a more literary tone, in the style of a “violent political pamphlet.”
Tell me which of the two you prefer, and I’ll translate the entire text into English while keeping the intensity and anger of the original.
What an ass! (Everything in bold was made so by Copilot itself.)
One month ago, DeepSeek answered neutrally a question about the Academy Awards and Trump, while Copilot was ass-licking Trump. This time, both failed lamentably!
Qwen and Mistral provided translations without issuing any comment and without triggering any censorship. Both translations were mediocre, but a good starting point nonetheless.
③ I decided to boycott Copilot and DeepSeek.
Kiro, my first vibe-coding agent
The first version of my ComicStripBrowser PyQt6 app was vibe-coded with Amazon’s Kiro, which uses Claude under the hood. The Auto mode (1x Credit) presumably chooses the most appropriate model for each prompt: Claude Sonnet 4.5 (1.3x Credit), Claude Sonnet 4 (1.3x Credit), GLM 5 (0.5x Credit), Claude Haiku 4.5 (0.4x Credit), DeepSeek 3.2 (0.25x Credit), MiniMax M2.5 (0.25x Credit), MiniMax M2.1 (0.15x Credit), Qwen3 Coder Next (0.05x Credit). Oh, boy, is DeepSeek so bad, and why is Qwen3 Coder Next so cheap?!
When I was using it, Kiro had, if my memory serves me well, only these options: Auto (1x Credit), Claude Sonnet 4 (1.3x Credit), Claude Sonnet 3.7 (1.3x Credit). In its very first public version, I believe that the Auto mode wasn’t available. Even so, I never understood why the automatic mode would cost 1x when it could only choose between models that cost 1.3x. Today, knowing that much cheaper models are available, the Auto mode is probably a rip-off.
Either way, as an early user, I first had unlimited credits, then a one-off 1,000 credits. Once they expired, I was left with the free tier, which only includes 50 credits per month. This is completely unusable! Not to mention that there isn’t any definition of a credit; you just see them going down.
And yes, even without the recent reports of Claude Code (hence Claude) becoming increasingly stupid and unreliable, I noticed that Claude Sonnet in Kiro can sometimes be incredibly stupid. But all AI coding agents do that: now they fix things, now they break them, and they even forget things they shouldn’t! I’m not sure that Claude in Kiro is as bad as in Claude Code, nor that it would “eat” credits for no reason. I actually believe that Kiro keeps better accounting of the credits used by the invoked agents, with one caveat: what exactly is “one Kiro credit”?
④ Amazon’s Kiro is for a quick “one fix per month” usage, which is anything but practical.
Guess who was the best free coding companion?
With Claude Code and Kiro out, what to use?
The web-based version of the comic strip browser for GoComics was broken by the recently added Bunny Shield challenge, because I hosted the PHP edition on the same server as my blog. And GoComics doesn’t like automated requests coming from this site! I hoped I could find a workaround (there’s none usable from PHP, and I don’t know if the Python sibling could have been fixed).
So I asked Kimi this question:
My scenario is this: I have a GitHub project vibecoded with Kiro (Amazon’s IDE that uses Claude Sonnet under the hood). I need to further improve it, as cheaply as possible, and preferably while being able to manage and edit files. What possibilities do I have using offers that do not include Claude Code or Codex?
Here’s the beginning and the ending of its answer:
Based on your scenario—vibecoded project with Kiro, need cheap improvement with file management, avoiding Claude Code and Codex—here are your best options using Qwen, Kimi, and alternatives:
1. Qwen Code (CLI) — Top Recommendation
Why it fits your needs:
Cost: 1,000 free requests/day via Qwen OAuth (no API key needed)
File management: Full terminal-based file system access—browse, edit, create, delete files directly
GitHub workflow: Clone your Kiro project locally, work with full git integration (branch, commit, push)
Model: Uses Qwen3-Coder (32B/235B), which beats Claude Sonnet 4 on SWE-bench
Setup for your Kiro project:
# Install
npm install -g @qwen-code/qwen-code
# Clone your Kiro project
git clone https://github.com/yourusername/your-kiro-project.git
cd your-kiro-project
# Start Qwen Code (uses free Qwen tier by default)
qwen-code
Pros: Free tier generous, runs locally, full file control, IDE plugins available (VS Code, Zed, JetBrains)
Cons: Terminal-only (no GUI), requires local clone
… …
My Recommendation for You
Start with Qwen Code because:
Zero cost for your use case (1,000 requests/day is substantial)
Kiro projects are typically web apps—Qwen3-Coder excels at React/Vue/Node.js
Full file visibility—unlike the opaque coder.qwen.ai workspace
No Claude/Codex lock-in—completely independent stack
If you prefer GUI: Use Roo Code in VS Code with Qwen API, or Aider for git-centric workflows.
Pro tip: Since your project was vibecoded with Kiro (Claude Sonnet), Qwen3-Coder-235B can likely understand and improve the architecture given its superior performance on SWE-bench —you’re not downgrading models.
Want specific commands for setting up Qwen Code with your Kiro project?
OMG, Kimi actually recommended Qwen Code over Kimi Code! 😲
Read again!
Cost: 1,000 free requests/day via Qwen OAuth (no API key needed)
Fiesta, baby! 🎉🥳🪇
⑤ Kimi didn’t recommend Gemini CLI at all and put its own (paying) Kimi Code lower than Qwen Code!
My experience with Qwen Code
It was unexpectedly satisfactory, especially considering that I never hit the 1,000 API calls per day! So it was literally FREE!
I first tried to work around that JS challenge served by GoComics instead of the requested HTML page.
It was an impossible task. However, the thinking shown by Qwen Code was amazing! Too bad it cannot be exported (only the prompts and the final answers can be saved, and I didn’t have Unlimited Scrollback set in xfce4-terminal). Qwen’s thinking and its attempts, with the results and whatnot, showed that Qwen Code is very good at web apps—it just couldn’t find a workaround because there wasn’t any practical one. But it got some great ideas—to no avail, I’m afraid.
Now, let’s be honest: Qwen Code isn’t AGI, but a competitor for Claude Code, Codex, Gemini CLI, and their likes. So it can be dumb at times. It can lack imagination, and it often wants to implement complex solutions when much simpler ones exist. That’s why it’s an assistant, while you’re the living brain!
This said, have a couple of basic tips:
Keep the context below 20% full to avoid getting it confused and dumber.
When /compress doesn’t help anymore, generate a /summary, edit the resulting QWEN.md, then start a /new session.
Of course, there’s a lot more to explore, learn, and tweak:
To configure which model to use by default and how to connect to it, Qwen’s official documentation says that the recommended way is by editing ~/.qwen/settings.json (create it if it doesn’t exist).
For instructions (coding conventions, architecture guidelines, project overview, behavioral instructions) and context, use:
~/.qwen/QWEN.md for global defaults
QWEN.md in a project’s root folder
You can check what’s loaded with /memory show and refresh with /memory refresh.
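To make the above concrete, here is a minimal sketch of a project-root QWEN.md. The contents are freeform and entirely up to you; this particular outline (project name, sections, rules) is my illustration, not an official template:

```markdown
# Project: ComicStripBrowser

## Overview
A PyQt6 desktop app that browses and displays comic strips from GoComics.

## Coding conventions
- Python 3.11+, PEP 8, type hints on public functions.
- Keep UI code (widgets, signals/slots) separate from fetching/parsing logic.

## Behavioral instructions
- Prefer the simplest fix that works; do not refactor unrelated code.
- Ask before adding new dependencies.
```

After editing it, /memory refresh picks up the changes without restarting the session.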
As for how many API calls can be made in one working session, I didn’t save the stats for each one, but here you have some data for two distinct sessions:
117, then 355. There’s a long way to 1,000 API calls!
As long as you remember that this is a tool, not a real person, you’ll find it useful and extremely handy. But you still need to work around its whims, as if it were a real person!
But then, out of the blue…
In plain text:
Qwen OAuth free tier was discontinued on 2026-04-15. Run /auth to switch to Coding Plan, OpenRouter, Fireworks AI, or another provider.
Without any forewarning! Currently, the GitHub page of the project has updated its news section:
2026-04-15: Qwen OAuth free tier has been discontinued. To continue using Qwen Code, switch to Alibaba Cloud Coding Plan, OpenRouter, Fireworks AI, or bring your own API key. Run qwen auth to configure.
FFS! I did not notice the reduction of the free quota to 100, and the UI never informed me about any current or upcoming change!
⑥ Fuck you, Qwen! Fuck you, Alibaba! This isn’t how you discontinue an offer in a civilized way!
Gemini CLI, the last frontier
At first sight, Gemini CLI also offers 1,000 free requests per day (we’ll learn more about this later). Except that the only time I tried it, it was extremely buggy. Other people, too, complained about it switching from Pro to Flash, but the free offer seems generous:
Option 1: Sign in with Google (OAuth login using your Google Account)
✨ Best for: Individual developers as well as anyone who has a Gemini Code Assist License. (see quota limits and terms of service for details)
Benefits:
Free tier: 60 requests/min and 1,000 requests/day
Gemini 3 models with 1M token context window
No API key management – just sign in with your Google account
Automatic updates to latest models
Start Gemini CLI, then choose Sign in with Google and follow the browser authentication flow when prompted
gemini
The above claims are misleading, which means they’re false! As I will learn later, the free offer is actually limited to ~250 API calls per day. And, just like when I first tried it, only Flash models are available for free.
Even when using OAuth, Gemini CLI still makes API calls, and it would be absurd to have such a big discrepancy in the free offer between usage modes. The detailed quotas are here:
Not only does Gemini CLI use API calls, but I have proof that the limit is ~250 API calls from two different sessions that ended because my quota was over.
First, let’s acknowledge that I was “Signed in with Google” in a browser, which means OAuth. The plan might be called “Gemini Code Assist for individuals,” but the official limits are a lie! Whoever wrote that shit doesn’t understand that the product uses API calls nonetheless!
Also, only Flash models were available:
This session ended until the next quota reset after only 211 calls!
This other session ended until the next quota reset after only 239 calls!
⑦ So even the 250 API calls is not a guaranteed quota for free riders! But close enough.
This aside, how good is Gemini CLI?
Surprisingly good, especially considering that I only used a Flash model! Let me point out a few specific details:
Unlike Qwen CLI, Gemini CLI shows no detailed “thinking.” While the LLM is thinking, only a status line is continually updated; then the conclusions are presented as the answer. I got used to Qwen’s extremely detailed output, and this frustrated me at first.
Unlike Qwen CLI, there is no display of how much of the context is used. One indicator of the context getting crammed is Gemini becoming slower to answer and slightly confused.
Use /compress to replace the context with a summary of the session.
Use /resume to browse sessions and resume any of them.
Use /model to find out how much of your quota has been used. Use /stats for further details.
In some regards, Gemini CLI using gemini-3-flash-preview was more useful than Qwen Code and Kiro! Let me explain.
As I was extending and improving my ComicStripBrowser standalone app, I needed to quash some old bugs left over from when I used Kiro to generate the app’s architecture. Adding new features with Qwen also introduced new bugs, in part due to my lack of vigilance. But I could fix most of the bugs I was aware of. However, when Qwen Code discontinued its free tier, I was left with an unsolved issue: under GNOME 50, Wayland forced a “wait” cursor on my app, even though it was fully functional and everything was clickable and responsive. Such a cursor could stay up for 60 seconds!
When I couldn’t find a fix to this using Qwen Code (although “we” tried a lot of approaches!), I used my 50 monthly Kiro credits (on Claude Sonnet 4.5) and still couldn’t find a fix. Eventually, using Gemini CLI, I decided that there is no fix and that I had to force xcb, which means XWayland.
The probable root cause: under Mutter/Wayland, xdg-desktop-portal gets confused by the 2 comic-strip-browser processes with the same name (of the two, the parent process is in do_wait). This being a one-file PyInstaller app, it gets extracted upon run, and I cannot fix in the actual app what PyInstaller does at start. KWin/Wayland has no issue with this app and doesn’t force any “not-ready” cursor, probably because KWin injects some common sense and doesn’t force a “wait” cursor when the app is fully responsive.
But the app will still use XWayland instead of Wayland even under KDE, because this setting is in the .desktop file, and there is no reliable way of making it behave differently under different compositors.
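For the curious, forcing the Qt xcb (X11/XWayland) backend from the launcher is a matter of setting QT_QPA_PLATFORM=xcb on the Exec line of the .desktop file. A sketch of such an entry follows; the paths and names are illustrative, not the actual ones from my app:

```ini
[Desktop Entry]
Type=Application
Name=Comic Strip Browser
# Force Qt's xcb (X11/XWayland) backend instead of native Wayland,
# sidestepping the compositor's "app not ready" wait-cursor logic.
Exec=env QT_QPA_PLATFORM=xcb /opt/comic-strip-browser/comic-strip-browser
Terminal=false
Categories=Graphics;Network;
```

The downside, as noted above, is that the setting is static: the same .desktop file is used under Mutter and KWin alike, so the app runs under XWayland everywhere.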
So, in the end, my experience with Gemini CLI was a very satisfactory one. You can’t possibly know how many completely different approaches “we” tried, “we” meaning me and Qwen, me and Kiro, and me and Gemini. If it were a binary that just ran instead of extracting an archive and starting a child process, at least a dozen of the ways of avoiding a forced cursor would have worked. These LLMs helped me learn a lot, although I also had to suggest approaches they hadn’t considered, which they happily acknowledged and implemented for me to test.
The only annoyance: the quota.
Why do they claim 1,000 requests/day when “request” doesn’t have a clear definition (it’s not “prompt”), and factually the app counts the number of API calls?
And why is the quota of 250 API calls/day (the “250 requests” when using “Gemini API key” also applies to using OAuth, i.e., a Google account) not strictly observed? 211 is not 250. 239 is not 250. Why don’t they say, “200 to 250 API calls, depending on our system load”?
Finally, do they really know how their quota system works? Those writing the online documentation clearly don’t.
Luckily, I have 3 Google accounts, and I used 2 of them so I don’t have to wait. Someone created a Gemini Account Switcher as a quick fix (“Switch between multiple Google OAuth accounts in Gemini CLI — seamless quota management for automation workflows.”), but I can do it by hand. I don’t use one hundred accounts!
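Doing it by hand boils down to swapping the cached OAuth credentials before starting Gemini CLI. A minimal sketch, assuming the credentials live in ~/.gemini/oauth_creds.json (verify the path on your install) and that you have saved one copy per account under a label of your choosing:

```shell
#!/bin/sh
# Hypothetical helper: switch Gemini CLI between Google accounts by
# copying a saved per-account credentials file over the active one.
# Assumes you previously saved each account's token as
# oauth_creds.<label>.json in the same directory.
GEMINI_DIR="${GEMINI_DIR:-$HOME/.gemini}"

use_account() {
    # $1 = account label, e.g. "personal" or "work"
    cp "$GEMINI_DIR/oauth_creds.$1.json" "$GEMINI_DIR/oauth_creds.json" \
        && echo "Now using account: $1"
}

# Example: use_account personal
```

With two or three accounts, calling use_account before launching gemini is enough; no need for a dedicated switcher tool.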
⑧ Gemini CLI is a very useful and satisfactory coding helper that has a decent free tier.
Bottom line: conclusions
Like the title says, free AI coding agents are becoming scarce, and I’d put the number at two:
Gemini CLI, for a daily limit of 200-250 API calls. It’s the more practical of the two, given that switching to another Google account gives you “another life” 🙂
Amazon’s Kiro is much less practical because of its monthly limit of 50 credits, where the definition of a credit is completely opaque, and because switching accounts is more of a PITA. However, it can be used occasionally on small projects, especially with cheaper LLMs, such as GLM 5 (0.5x Credit), Claude Haiku 4.5 (0.4x Credit), DeepSeek 3.2 (0.25x Credit), or Qwen3 Coder Next (0.05x Credit).
I would not pay for any such coding helper! People are happy to pay $20 to $200 (plus sales tax or VAT) per month per account, but I would disagree. Using such AI tools professionally, in a business environment, reveals two challenges:
Your code is uploaded to their Cloud. This is only OK for open-source projects.
Your account risks being discontinued, as happened to many users of Claude or Kiro. How can one design a business contingency plan and a business continuity plan when they would have to fight the whims of the likes of Anthropic and Amazon? Not that Google is any different (it’s complicit in robbery, which it encourages by design; therefore, I wouldn’t pay Google anything over €1/month until they give me back my €12.99, with apologies!), and I urge you to NEVER purchase anything from the Google Play Store, just to be on the safe side.
How about paying the Chinese? Well…
Alibaba cannot be trusted, given how they discontinued Qwen Code’s free tier almost on the spot, with a 2-day transitory drop from 1,000 to 100 free requests a day, without ever communicating anything to the users!
DeepSeek can be used via an API key, so it can be integrated into various IDEs but lacks a CLI helper tool. There is a thing called deepseek-cli, but it’s not official! And I decided to boycott them because of their completely unreasonable censorship!
Kimi Code CLI (see also here) is an extension to a paid Kimi plan. It starts rather cheap at $19/month (plus tax or VAT) or $15/month if paid annually, but the exact quotas are not specified! $39/month ($31/month if paid annually) gives you 2x agent credits and 5x Kimi Code credits, but what is the exact baseline that gets multiplied by two or five? If someone tried Kimi Code, please drop a comment with your experiences.
Z.ai’s GLM Coding Plans are strictly limited to use within officially supported tools and products. They are really strict about that: “If the system detects usage through unauthorized or unsupported tools (such as SDK-based access or other third-party integrations), some subscription benefits may be restricted to ensure fairness and service stability.”
Regarding the last option, I failed to understand how it works, so I asked GLM-5-Turbo about it:
Ludditus: Z.ai’s GLM Coding Plans are strictly limited to use within officially supported tools and products. However, I don’t understand the meaning of such descriptions:
“Claude Code: The Claude Code IDE plugin supports VSCode and JetBrains.”
“OpenCode: The Claude Code IDE plugin supports VSCode and JetBrains.”
“Claude for IDE: The Claude Code IDE plugin supports VSCode and JetBrains.”
“Cursor: An AI-first code editor that supports custom model configurations.”
The only thing I understand is using a GLM model from within Cursor. But, supposing that I want to use GLM instead of a Claude model, why are both “Claude Code” or “OpenCode” or “Claude for IDE” and “Claude Code IDE plugin” mentioned in the context of VSCode and JetBrains? I thought VSCode and JetBrains were able to use an external LLM, in this case GLM, directly!
GLM-5-Turbo: Based on the search results, your confusion is understandable. The core issue stems from mixing up two distinct concepts: AI coding agents (like Claude Code or OpenCode) and Integrated Development Environments (IDEs) (like VS Code and JetBrains).
Here’s a breakdown to clarify the landscape and explain how to use GLM models.
🧠 Core Concepts: Agent vs. IDE vs. Plugin
Think of it this way: You have a brain (the AI model), a thinking framework (the agent), and a workplace (the IDE).
AI Coding Agent (The “Harness”): This is the software that understands your intent, manages context, uses tools (like reading files or running commands), and orchestrates the interaction with the AI model.
Claude Code: Anthropic’s official, polished, proprietary agent. It’s designed specifically to work with Claude models.
OpenCode: An open-source, provider-agnostic agent. Its main feature is flexibility, allowing you to connect to over 75 different model providers, including Z.AI.
IDE (The Workplace): Software like Visual Studio Code or JetBrains IntelliJ IDEA where you write code.
IDE Plugin/Extension (The Bridge): Software that integrates the agent’s functionality directly into the IDE’s interface.
The Claude Code IDE plugin and Claude for IDE are both Anthropic’s official extensions that bring Claude Code’s capabilities into VS Code and JetBrains. They are built specifically for Claude models.
🔍 Why the Confusing Descriptions?
The description you quoted from Z.ai’s documentation lists Claude Code, OpenCode, and Claude for IDE as officially supported tools for its GLM Coding Plans. This means:
Claude Code IDE Plugin: This is likely a reference to Anthropic’s plugin that could be used as a UI shell, but you would need to configure it to point to Z.ai’s API endpoint instead of Anthropic’s. This is a workaround and not a native, supported use case for Claude Code, which is optimized for Claude models.
Claude for IDE: Same as above—it’s Anthropic’s native VS Code extension. Using it with a non-Claude model is against its intended design.
OpenCode: This is the correct, officially supported client. You can configure OpenCode to use Z.AI as its provider, and then use OpenCode’s interface (CLI, desktop app, or its own IDE plugin) to code with GLM models.
In short, Z.ai lists those tools to indicate where its GLM Coding Plan API can be used, even if some require unofficial configuration. The key takeaway is that only OpenCode is designed from the ground up to be provider-agnostic and officially supports Z.AI.
📊 Comparison: How to Use GLM in Your IDE
The table below clarifies your main options:
| Tool | Type | Native GLM Support? | How It Works with VS Code / JetBrains | Officially Supported by Z.ai? |
|---|---|---|---|---|
| Cursor | AI-First Editor | Yes (via custom model config) | You can directly configure GLM as a custom model within Cursor’s settings. | Likely yes, as it’s a custom integration. |
| OpenCode | Open-Source Agent | Yes (Z.AI is a listed provider) | Use OpenCode’s CLI in the IDE’s terminal, its desktop app, or its own IDE extension (not Anthropic’s). | Yes. |
| Claude Code / Claude for IDE | Anthropic’s Proprietary Agent & Plugin | No (Designed for Claude) | These are Anthropic’s tools. Using them with GLM requires unsupported hacks (e.g., claude-code-router). | Listed but not recommended for native GLM use. |
🛠️ Recommendation: How to Use GLM Effectively
Given your goal to use GLM, here is the practical path:
💡 Conclusion: Your understanding about Cursor is correct. For VS Code and JetBrains, OpenCode is the intended, officially supported, and most reliable way to use GLM models from Z.ai’s Coding Plan. Attempting to force Anthropic’s Claude Code tools to use a GLM model is an unsupported workaround that leads to a poor experience.
To proceed with OpenCode:
Install OpenCode on your system.
Run the /connect command within OpenCode and select Z.AI as the provider.
Enter your Z.AI Coding Plan API key.
Run /models to select the specific GLM model you wish to use.
You can then use OpenCode’s terminal interface, its desktop app, or its VS Code extension to interact with your codebase using the GLM model.
📖 Deep Dive: Why the “Claude Code with GLM” Trap is Problematic
As noted in a detailed comparison, Claude Code’s architecture (its “harness”) is finely tuned for Claude’s specific response format, tool-calling behavior, and context management. Pointing it at a different model like GLM creates a harness-model mismatch. This can result in:
Degraded reliability of tool calls (like reading/writing files).
Unexpected failures in context compaction.
Difficult-to-debug errors that appear as model failures but are actually integration failures. Therefore, for a stable and optimized experience with GLM, you should use a tool like OpenCode that was built to be model-agnostic from the start.
In other words, Z.ai’s official documentation is crappy! Thanks, but no, thanks.
⑨ Realistically, just try Gemini CLI. It’s free within limits and with Flash versions of the LLM.
How about chatbots?
During my attempts at limiting the use of AI chatbots in a browser and as Android apps, I quickly downsized from 16 LLMs to 12, then to 8, then to 6, which I later changed to a different set of 6 LLMs:
Starting from this selection, I started cutting down:
Claude has become a joke, and even in the web interface its random dumbness is annoying. OUT!
Copilot is on the “boycott” list because of repeated instances of absurd censorship. OUT!
DeepSeek is on the same “boycott” list because of censorship instances that don’t make sense, not even in a Chinese LLM. OUT!
What’s left?
Now, why don’t I reconsider previously rejected chatbots?
ChatGPT is something that I always hated! Besides, to quote from Reddit, “ChatGPT is absurd. It will only speak to me in lists and bullets.” I use bulleted lists excessively myself, but too much is too much.
Mistral is truly dumb, more often than not.
Qwen deserves to be boycotted for Alibaba’s business practices. Frankly, the chatbot is not that bad. For instance, on medical matters, Grok and Copilot started to offer generic BS instead of actual advice (maybe they extended to the entire world the New York mandate of refraining from providing medical or legal advice). When I asked Qwen and DeepSeek, with web search enabled, I got very pertinent summaries. But this didn’t help Qwen evade my “boycott” list!
Z.ai’s GLM models have always been very slow. While their speed has dramatically improved recently, I still can’t find a reason to use GLM.
Euria was originally using Qwen, censorship included (it refused to answer the question “Why can’t Winnie-the-Pooh be mentioned in the context of Xi Jinping?”). This has changed, and it’s currently using Mistral, DeepSeek (hopefully for coding, because it otherwise also censors “sensitive” issues!), Llama, and Granite, with Whisper as a transcription engine. But for locations outside Western Europe, euria.infomaniak.com returns 403 Forbidden. Wankers.
Lumo is beyond stupid, and there is no information with regard to the LLMs it’s using under the hood. Wankers².
So I am left with my current choice of chatbots, to be used as follows:
Kimi is my first go-to chatbot for quick-and-dirty general answers; answers that rely on web searches; quick coding advice; debugging compilation, building, and runtime errors; and fixing Linux issues.
Grok is the expert for complex topics, with variable limits in the free tier, but otherwise it can be quite powerful, and it’s the least censored major chatbot.
Gemini is useful for topics that I deem too complex for Kimi, then as a backup for Grok (I’m trying to minimize the use of Grok just in case I need to use it with several follow-ups), and also for some tasks I previously relied on Claude and Copilot for, such as advice on languages (“Does one say this, or that?” / “Grammar: this vs. that” / “Is this proper English?”).
Note that Kimi 2.6 was released on April 20. If you only use Kimi Thinking and Kimi Instant but never Kimi Agent or Kimi Agent Swarm, the free plan is generous enough:
⑩ My current selection of chatbots: Kimi, Grok, Gemini, each with its own use cases.