I’m trying to collect my thoughts on the current situation regarding the public availability of LLMs. This post comes as a quick follow-up to “Free AI coding agents are becoming scarce” and was triggered by the news that Google is working on a credit-based system for Gemini:

Google appears to be preparing a major shift in how consumers interact with the Gemini app, with new strings referencing usage limits surfacing in the latest build. The signals point toward a credit-based system coming to the core chat surface, where users would receive a monthly allowance to spend across models and features, with the option to top up when they run out. Currently, Gemini relies on fixed prompt quotas and time-bound caps tied to each subscription tier, while Google’s credit mechanics have been confined to Flow, Whisk, and Antigravity, plus top-ups available to AI Pro and AI Ultra members.

Extending credits into the main Gemini app would bring Google closer to the flexible consumption model already in place at OpenAI, Anthropic, and Notion, and xAI is expected to follow suit with the Grok Build rollout. For power users, the change would mean more predictable budgeting for heavy workloads, particularly those involving agentic tasks, Deep Research, Deep Think, or long multimodal sessions. It would also give Google a cleaner lever to introduce premium features without forcing users to make a steep jump from AI Pro at $19.99 to AI Ultra at $249.99.

My take on the current state and prospects of using LLMs

To simplify the discussion, let me categorize the main ways people use LLMs:

  1. Browser chatbots. This is how most people use AI, usually for free.
    • Niche case: ChatGPT Go might have made some people pay for a subscription merely because of its affordability.
    • Niche case: Copilot can also be used embedded in Microsoft 365 apps.
    • Niche case: Gemini paid plans are part of bundles that include other Google services.
    • Niche case: Qwen’s web interface has no paid plan; higher limits require an API key.
  2. API-based access. Intended to be used directly with your own key, but much more popular through a coding assistant, or through an aggregator. No free tier, except for Gemini CLI (which uses OAuth rather than an API key in the free tier), and Kiro (whose free credits are opaque as to what they represent).
    • Coding assistants as full IDEs (Cursor, Windsurf, Kiro, etc.): In most cases, VS Code forks or heavy extensions with their own subscription or free tier.
    • Coding assistants as CLIs (Gemini CLI, Claude Code, Codex CLI, Qwen Code, etc.): terminal-native tools, closer to raw API usage, sometimes with a free tier (Gemini CLI via OAuth), but typically billed per token.
    • Niche case: OpenRouter lets you access many providers through a single API key.
  3. Hosted open-source models. You pay for inference, not hardware. HuggingFace is the central reference here (by popularity), and it’s worth noting it plays three distinct roles: a repository of freely downloadable models (including quantized versions for modest hardware, see below), a hosted inference service (Inference Endpoints, which belongs to this category), and a GPU rental platform (see below). In engineering/startup environments, Together.ai, Groq, Replicate, and Fireworks are more popular.
  4. Self-hosted open-source models. Free, but requires adequate hardware. Alternatively, you can rent GPU time. Common helpers for running models locally: Ollama, LM Studio, AnythingLLM, Msty.
    • Niche case: Renting a GPU on HuggingFace, or on more specialized platforms like Lambda Labs, Vast.ai, or RunPod.
    • Overlap: can be used as a local coding assistant in various IDEs.
  5. Generation of images, videos, and music. I’m going to totally ignore this topic. My focus is on common usage: general knowledge and reasoning, web search, and coding. However, I’d note that this use case is available in two main ways: integrated into regular chatbot subscriptions (ChatGPT uses DALL·E, Gemini uses Nano Banana, etc.) or through dedicated platforms (Midjourney, Stable Diffusion, etc.). Going beyond images: Suno, Udio, or ElevenLabs for music; Runway, Kling, Luma, or Google’s Veo for video (OpenAI just killed Sora). I couldn’t care less about any of them.
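Incidentally, categories 2–4 mostly converge on the same wire format: OpenRouter, most vendors, and local runners such as Ollama all expose OpenAI-compatible chat endpoints, so the same request body works everywhere and only the base URL and key change. A minimal sketch (the endpoints are the documented ones; the model id is illustrative):

```python
import json
import urllib.request

def build_chat_request(base_url: str, api_key: str,
                       model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request.

    The same payload works against OpenRouter
    (https://openrouter.ai/api/v1), a local Ollama server
    (http://localhost:11434/v1), or most vendor APIs.
    """
    payload = {
        "model": model,  # model id depends on the provider
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# Send with: urllib.request.urlopen(build_chat_request(...))
req = build_chat_request("http://localhost:11434/v1", "ollama",
                         "some-local-model", "Explain quantization briefly.")
```

This interchangeability is exactly why a single OpenRouter key, or a single local Ollama install, can stand in for half a dozen separate subscriptions.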

What’s the problem with LLMs right now?

For months now, if not longer, people have been complaining about several forms of enshittification:

  • Models becoming occasionally stupid or increasingly stupid, even for paying customers. Claude and GPT are the most criticized ones, with Gemini coming close.
  • Certain vendors, Anthropic especially, have been gaming subscriptions to the point that one’s daily quota could be exhausted by a single prompt. Anthropic eventually acknowledged a bug, but didn’t rush to fix it and never reimbursed its customers.
  • Anthropic’s Claude and Amazon’s Kiro blocking or discontinuing accounts or subscriptions, also with no reimbursement.
  • Each tier’s limits keep getting lower, usually in a non-transparent way: the actual limits are vague (unless you pay per token), and their decrease isn’t even documented (people only notice when they hit them).
  • Access to certain models (especially for image or video generation) or to agentic modes is severely limited in all subscription plans, and even the most expensive plans cannot satisfy advanced needs. I’m not sure that third-party subscriptions or pay-per-token are available as solutions for all cases.
  • Several free tiers have been removed (e.g., in Qwen Code) or severely reduced, and access to advanced models is no longer available for free (Grok, Gemini CLI). ChatGPT and Copilot don’t even show the version of the model you’re using in the free tier!

It’s true that all vendors are still losing money. Even the most expensive plans cannot compensate for the overall costs, considering how many free riders there are.

That said, what’s the future of using LLMs by the public? I’m thinking of normal individuals or freelancers who aren’t willing to spend north of $200/mo only to still have their needs unmet. Private individuals and occasional coders wouldn’t normally pay more than $20/mo!

Using a chatbot for free is increasingly limited (because of lower limits and dumber models) and frustrating (because newer models seem to become stupider or less reliable), but even paying doesn’t solve much!

The subsidized “honeymoon phase” of virtually unlimited AI has come to an end. AI companies are now aggressively trying to recover their compute costs.

The $20/mo subscriptions (which, in euros and with VAT, usually come to noticeably more) will most likely become even more heavily throttled. They will only be fine for casual use, not for heavy coding or deep reasoning.

Running models locally should become more and more popular, but it requires investing in hardware, which most people aren’t willing to do. After all, this is what the Cloud was invented for, right? Renting GPU time is probably the more sensible workaround.

Paying per token (API use) will continue to offer the best cost control, and it’s the most transparent option. However, users of chatbots in a browser or in an app typically can’t do that. Regular users would accept an “AI tax” to be added to their “Netflix/HBO/Disney+/Amazon Prime, etc. tax” as a regular subscription, but where to insert an API key in one’s browser or official smartphone app for a chatbot? They wouldn’t even know what an API key is!
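To illustrate just how transparent pay-per-token is, here’s the entire cost model in one function (the per-million prices below are invented for the example, not any vendor’s real rates):

```python
def prompt_cost(tokens_in: int, tokens_out: int,
                price_in_per_m: float, price_out_per_m: float) -> float:
    """Dollar cost of one API call, given per-million-token prices."""
    return (tokens_in * price_in_per_m + tokens_out * price_out_per_m) / 1e6

# Hypothetical rates: $0.50 per 1M input tokens, $1.50 per 1M output tokens.
# A hefty exchange (8,000 tokens in, 2,000 out) costs well under a cent:
cost = prompt_cost(8_000, 2_000, 0.50, 1.50)
print(f"${cost:.4f}")  # $0.0070
```

At rates like these, a $20 budget buys thousands of heavy prompts, which is exactly why per-token billing is so much harder to obfuscate than “increased access.”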

I asked Gemini about this conundrum, and here’s what it answered:

While pay-per-token is the most transparent and cost-effective method, companies like OpenAI and Anthropic will never allow you to input an API key into their official web interfaces or smartphone apps. Why? Because the flat $20/month subscription model relies heavily on “breakage”—the industry term for users who pay the monthly fee but rarely use the service, thus subsidizing the heavy power users. If everyone paid exactly for what they used via the official app, the vendors’ predictable Monthly Recurring Revenue (MRR) would collapse.

It delicately avoided mentioning Google 🙂

It also offered this argument:

We are moving away from needing a massive $2,000 desktop GPU to run AI. Instead, we are in the era of the NPU (Neural Processing Unit). Apple’s M-series chips, Intel’s Core Ultra, and Qualcomm’s Snapdragon X are baking AI acceleration directly into the laptops and smartphones people are already buying. For the average user, the “hardware investment” is just upgrading their phone or laptop as they normally would every few years. The OS handles routing basic tasks (like summarizing an email) to the local, on-device model for free, saving the heavy lifting for the cloud.

Well, no, I am not buying. I’m from the crowd that wants very affordable, quiet laptops that are not by Apple, that are still x86_64, and that do not include any NPU-capable CPU or any NVIDIA shit!

So the world seems to have evolved in circles:

  • Mainframes for computing and dumb displays for displaying the results and entering commands.
  • The PC revolution. The computing power moves on one’s desk.
  • The laptop revolution. The computing power is even more portable.
  • The smartphone and tablet revolution. Some even claimed they would replace the home computer. They didn’t.
  • The Cloud. Most computing power moves back to large servers (the modern equivalent of the mainframes), with local computing power generally used for a browser and an office suite. Gaming is a different use case.
  • AI arrives for the public. Now the “mainframe” is a cluster of H100s in a warehouse in Iowa, and the “terminal” is a $600 laptop running a browser tab.
  • Enshittification happens, so they want us to make our laptops and desktops “AI data centers”! But the huge AI data centers will continue to exist and expand. Note that only open-source LLMs can be run locally, and they still need to be limited in size and quantized to run on affordable hardware. Dumber and slower: what’s not to like?

A suggestion for Luddites like me, from Gemini:

Since you aren’t buying the hardware hype, your path forward is actually the most “honest” one left:

1. The “Wholesale” Browser
Since you refuse to let your hardware be the “AI Data Center,” you use the browser for what it was meant for: a window into someone else’s expensive computer. But skip the official apps.
Use a self-hosted frontend (like a Docker container of Open WebUI) running on a cheap home server or even just a lightweight tab. Connect it to an API like OpenRouter. This gives you a “clean” interface where you can swap between the smartest models (Claude 3.7, GPT-5-mini) and only pay the $0.05 it actually cost to generate that specific answer. No $20/month tax, no hidden “stupidity” tiers.

2. Renting “Spot” GPUs
If you ever do need to run a massive open-source model (like a 400B Llama) for a specific project, don’t buy hardware. Use RunPod or Lambda Labs. You can rent a high-end GPU for about $0.40/hour, do your heavy lifting, and then “shut down the mainframe.” It’s the ultimate “anti-subscription” move.

You are successfully avoiding the “AI Hardware Tax,” but you will likely have to pay the “Complexity Tax” instead. The world is being built for people who either:

  • A: Pay $20/month for a lobotomized, “enshittified” official app.
  • B: Buy a $1,500 “AI PC” that runs local models to save the vendors money.

By choosing C (Affordable, quiet x86), you’re taking the path of the “Digital Sovereign.” You’ll have the cleanest, most predictable machine, but you’ll have to be your own sysadmin to get the high-end AI performance you want without the vendor games.

Digital shithole, rather!

It also suggested third-party fixes for the general case:

If official apps won’t take API keys and local models aren’t smart enough for complex tasks, the market is evolving to offer middle-ground solutions for the “Netflix-tax” crowd:

  1. The “Cable TV” Aggregators:
    Platforms like Quora’s Poe or Perplexity are leading this charge. Instead of paying $20 to OpenAI and being subject to their hidden limits, users pay a flat fee to an aggregator that offers a unified, polished app interface. The subscription grants you a monthly allowance of “compute points.” You can spend those points on Claude, GPT, Gemini, or open-source models seamlessly. It gamifies the pay-per-token model into something a regular consumer understands.
  2. Polished “Bring Your Own Key” (BYOK) Apps:
    While official apps lock you out of APIs, a booming ecosystem of third-party frontends (like TypingMind, Chatbox, or MSTY) is stepping in. They are designed to look and feel exactly like the ChatGPT app, complete with smartphone versions and cloud syncing. You paste your API key in once during setup, and from then on, you get a premium, app-like experience while paying wholesale token prices in the background.

OMG, Poe and Perplexity! 🤮 Then, I know Msty, but… What a shitty world!

Subscriptions and their vague limits

Subscriptions are still the most straightforward and easy upgrade to the free tier. Here’s a partial landscape of some major vendors, priced for Europe. When the VAT is not mentioned, the price is VAT-inclusive.

Model     Plan                                  Price           If paid annually
Gemini    Google AI Plus                        €7.99/mo        €79.99/yr
          Google AI Pro                         €21.99/mo       €219.99/yr
          Google AI Ultra                       €139.99/mo
ChatGPT   Go (core model)                       €8/mo
          Business                              €21/mo
          Plus                                  €23/mo
          Pro (5x Plus)                         €103/mo
          Pro (20x Plus)                        €229/mo
Copilot   Personal                              €10/mo          €99/yr
          Premium                               €22/mo          €219/yr
Claude    Pro                                   €18+VAT/mo      €15+VAT/mo
          Max (5x Pro)                          €90+VAT/mo
          Max (20x Pro)                         €180+VAT/mo
Grok      SuperGrok Lite (2x longer, 1 agent)   $10/mo          $100/yr
          SuperGrok (5x longer, 4 agents)       $30/mo          $300/yr
Kimi      Moderato                              $19+VAT/mo      $15+VAT/mo
          Allegretto                            $39+VAT/mo      $31+VAT/mo
          Allegro                               $99+VAT/mo      $79+VAT/mo
          Vivace                                $199+VAT/mo     $159+VAT/mo
Mistral   Pro                                   €14.99+VAT/mo   €11.99+VAT/mo

Outside the Eurozone, special prices might apply. E.g., for Romania:

Model     Plan              Price
ChatGPT   Go (core model)   29.99 lei/mo
          Business          86 lei/mo
          Plus              99.99 lei/mo
          Pro (5x Plus)     520 lei/mo
          Pro (20x Plus)    999.99 lei/mo

The way I see it, these prices aren’t exactly enticing people to subscribe, once we consider the following:

  • Not a single vendor specifies the exact usage limits! All paid plans offer “increased access” to the models, “access to advanced models,” or “twice as much” as the free plan, but there’s no clue as to how many prompts per day (or per any other interval) one may get!
  • More expensive plans offer “5x as much” or “20x as much” as a cheaper plan, but what’s the base reference? There is no indication whatsoever!
  • Google does offer a number of monthly AI credits, but they’re also opaque. What exactly can you do with one credit?
  • In most cases, the user finds that their quota expired when they hit it. AFAIK, only in Kimi can one find how much of their quota they’ve used. As a percentage of an unknown quantity, but this is still better than nothing.

I understand that different prompts and contexts lead to different token usage, but they could specify a number of tokens included in the subscription and a count of how many are used, updated in real time. They don’t, and this is dishonest.
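None of this would be hard to build; a real-time token meter is a toy exercise. A sketch of what honest quota reporting could look like (the allowance figure is invented):

```python
class TokenQuota:
    """Toy monthly token allowance with real-time usage reporting."""

    def __init__(self, monthly_tokens: int):
        self.allowance = monthly_tokens
        self.used = 0

    def record(self, tokens: int) -> None:
        """Charge one prompt/response pair against the allowance."""
        self.used += tokens

    def remaining(self) -> int:
        return max(self.allowance - self.used, 0)

    def status(self) -> str:
        """What every vendor could show, but doesn't."""
        pct = 100 * self.used / self.allowance
        return f"{self.used:,}/{self.allowance:,} tokens used ({pct:.1f}%)"

# Invented allowance: 2M tokens per month.
q = TokenQuota(2_000_000)
q.record(150_000)
print(q.status())  # 150,000/2,000,000 tokens used (7.5%)
```

If a hobbyist can write this in twenty lines, the opacity of the real plans is clearly a business decision, not a technical limitation.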

Given the aforementioned disincentives, why would anyone pay for a subscription, or even purchase tokens to use with an API key?

  • Models are getting dumber, confused, or unreliable, or they’re “eating” tokens like there’s no tomorrow.
  • Your account could be blocked.
  • When you hit the wall, there’s often no way to top up and continue working.

I’m even more disenchanted than people who pay

Even without paying (which limits my access to the most advanced models), I’ve had plenty of time to notice things such as:

  • ChatGPT, which at some point became quite good, seems to have decreased in quality with each update, at least for free users. It also no longer shows explicit model numbers in the free tier.
  • I always hated ChatGPT, and I hate it even more since it only answers in bulleted lists excessively adorned with emojis. It’s also designed to overpraise every single prompt that’s not stupid. I only use it once in a (very long) while to check that it’s still alive.
  • Claude, which was also very decent in the Sonnet 3.5 and Haiku 3.5 days, seemed to me to become less reliable with every single update. I used to love it for its more natural answers. It still gives acceptable answers to general questions, but developers complain a lot about its inconsistent coding abilities.
  • Copilot, which used to be, in my experience, better than ChatGPT until not so long ago (when it still let you know it uses GPT-5.1), can sometimes be very stupid. It also applies absurd levels of censorship.
  • I never liked Perplexity, full stop.
  • Grok, once extremely generous in the free tier (even in Heavy mode, which uses a “Team of Experts”!), first started offering only a variable number of Experts prompts, usually between 2 and 4, and now only offers “Free, quick responses”!
  • Every single chatbot has random high-load periods during which it refuses to answer free-tier users or cuts their quota far below the usual value.
  • At its beginnings, Le Chat Mistral had its moments when it felt smarter than ChatGPT. Unfortunately, it became dumber and dumber, and no matter how many times I gave it a second chance, I had to admit that the model available to free riders is completely retarded. Here’s its level of mental retardation. The prompt never mentioned Amazon Bedrock! Here’s a correct answer.
  • DeepSeek, once the proud open-source Chinese competitor to ChatGPT, failed to impress me beyond its initial heydays.
  • Qwen also had its better days and its worse days. While Qwen Code was quite decent as long as it had a free tier (which they abruptly discontinued), I’m not particularly convinced by its models for general use. I keep it as a backup solution, though.
  • For some reason, I love Kimi, with all its limitations. I’ve become used to steering it. But its paid plans are quite vague.
  • Gemini, which was terribly dumb for a long time, seems to have become competent in the meantime. However, paying customers complain that Gemini 3.1 Pro is dumber than version 3, and that version 2.5 was even better. Still, my limited use of Gemini 3 Flash in Gemini CLI was quite satisfactory.

So, what does the future have in store for us?!

More uncertainty and harsher limits, I’m afraid.

For the time being, Gemini CLI still has a free tier. But for how long?

Then, in times when any chatbot can refuse to answer free users at random moments, my ever-changing short list of chatbots might prove too short to cover my ass when I need the help of AI. I might need to add some backups to my latest iteration:

Preferred chatbots:

  • Kimi as my go-to chatbot for quick-and-dirty general answers; answers that rely on web searches; quick coding advice; debugging compilation, building, and runtime errors; and fixing Linux issues.
  • Gemini as the “top” model useful for topics that I deem too complex for Kimi or other chatbots, and for language-related topics.

Backup models:

  • Qwen as a backup chatbot for some IT-related issues and for some niche topics where I was satisfied by its answers. Its (unspecified) daily limits in the browser or in the sideloaded Android app are still quite generous.
  • Claude as an old friend used for quick answers that are sometimes very satisfactory. Using it occasionally would keep me updated on the evolution (or regression) of Sonnet and Haiku (Opus is not available to free users).
  • Copilot only for some answers that involve web searches (it recently surprised me in one case!) or for general answers as a second or third opinion. Oh, and I’ll never ask it to translate anything, because it hates “politically incorrect” language! I need to be very cautious and never fully trust a chatbot, so I’ll apply this principle to Copilot, too.

Out of habit, I sometimes feel the need to use Claude or Copilot. 🤷

As the enshittification advances, the dilemma grows more difficult:

Should I ever really need more access to some LLMs, I’d be in trouble, because none of the existing solutions would satisfy me in the slightest!