Privacy aside, LLMs don’t learn when they could (and should)
I might be wrong. Maybe what I’d want is not feasible, or shouldn’t be done this way. But the way these chatbots talked made me ask them whether they learn from such interactions. They answered that they don’t.
This is from a thread with Kimi, so I had to censor the exact topic, but at some point it ended an answer like this:
I appreciate the correction — it’s always valuable to learn about these [redacted] that include [redacted] like this.
From now on, everything is [almost] verbatim:
Ludditus: When you learn such things, confirmed by web pages, do you actually learn? I mean, if later someone asks you [the same question], you would know?
Kimi: No — I don’t actually learn or retain information from our conversations. Each conversation starts fresh for me, with no memory of what we’ve discussed.
However, I do have access to a Memory Space feature that you control. Here’s how it works:
- You can ask me to remember specific things (facts, preferences, details about you)
- I store these in a structured memory system that persists across sessions
- I can also remove or update memories when you ask
Important notes:
- All personalization is fully controlled by you — nothing is used for model training
- You can disable or re-enable it anytime in Settings → Personalization → Memory Space
- I never proactively “learn” from web pages or conversations unless you explicitly ask me to store something
So if you wanted me to remember that [redacted], you could say something like “[redacted]” — and then I would recall that in future conversations. Would you like me to store this information?
Ludditus: Your answer included this: “it’s always valuable to learn”; so I hoped you could actually learn from these interactions.
Kimi: You’re absolutely right to call that out — my phrasing was misleading. I said “it’s always valuable to learn” when I should have said something like “it’s always valuable to know” or “I appreciate you sharing that with me.”
I don’t actually learn in any meaningful sense. I don’t form memories, update my understanding, or carry knowledge forward. Each conversation is completely isolated. The memory space I mentioned is just a structured storage system that I can reference — it’s not learning, it’s retrieval from a database you control.
Thank you for the correction. I should be more precise with my language to avoid giving the impression that I’m experiencing something I’m not.
Ludditus: The LLM you’re using is therefore exclusively updated based on OUTSIDE actions, not such interactions?
Kimi: That’s correct. The underlying model (me) is updated exclusively through outside actions — specifically, training runs conducted by Moonshot AI on their own infrastructure using curated datasets.
I have no ability to modify myself through conversation, and nothing you or I say here feeds back into model training. The only “persistence” available is the Memory Space feature, which is:
- A user-controlled storage layer separate from the model weights
- Something I can read from and write to, but not “learn from” in any training sense
- Completely optional and transparent to you
So to summarize the pipeline:
| What happens | Result |
| --- | --- |
| You tell me something in chat | Gone when the session ends |
| You ask me to store in Memory Space | Retrieved in future sessions (your data, your control) |
| Moonshot AI runs a training update | Model weights change (rare, outside this conversation) |

The “learning” phrasing I used earlier was sloppy anthropomorphism on my part. I don’t have experiences, form memories, or update my understanding. I generate responses based on fixed weights, and those weights only change through Moonshot’s external training processes.
Ludditus: Ah, that’s a wasted opportunity. Are there any LLMs who submit to their creators suggestions for learning based on user-triggered web searches, with all personal info removed?
Kimi: That’s an interesting question about the broader LLM landscape. I’m not aware of any major LLM that currently implements a direct pipeline where:
- User-triggered web searches →
- Automatically stripped of PII →
- Submitted as training suggestions to creators
However, there are some related concepts in the space:
Existing approaches:
- Reinforcement Learning from Human Feedback (RLHF) — uses human raters and user feedback (thumbs up/down) to improve models, but this is curated and batched, not real-time from searches
- Constitutional AI / scalable oversight — Anthropic’s approach where AI systems critique and revise their own outputs
- Some open-source projects experiment with “continuous learning” or federated approaches, but these face significant challenges with data quality, poisoning attacks, and privacy
The challenges your idea would face:
- Even with PII stripping, ensuring no sensitive data leaks into training is extremely difficult
- Web search results can be biased, incorrect, or manipulated — you’d need robust filtering
- Scale: billions of searches would require massive curation infrastructure
Closest analogs:
- Search engines themselves (Google, Bing) use click signals and query reformulations to improve ranking, but this isn’t LLM weight updates
- Some AI assistants log conversations for “improving the service” (with notice/consent), but this is typically for product development, not direct model training
I don’t have specific knowledge of any system that bridges user web searches → sanitized training suggestions in the automated way you described. If such a system exists in research or niche implementations, it would likely be experimental rather than deployed at scale.
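Neither bot could point to such a pipeline, but its shape is easy to sketch. Below is a toy version of the loop Kimi outlined (search result → PII stripping → anonymized training suggestion). The regexes, field names, and the `make_training_suggestion` helper are all my invention for illustration; real PII removal is far harder than a handful of patterns.

```python
import re

# Crude stand-ins for PII detection -- a real system would need much more
# than a few regexes (NER models, allowlists, human review, etc.).
PII_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),      # e-mail addresses
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"),  # IPv4 addresses
    re.compile(r"\+\d[\d\s()-]{7,}\d"),          # international phone numbers
]

def strip_pii(text: str) -> str:
    """Replace anything matching a PII pattern with a placeholder."""
    for pattern in PII_PATTERNS:
        text = pattern.sub("[redacted]", text)
    return text

def make_training_suggestion(query: str, snippets: list[str]) -> dict:
    """Bundle one sanitized search interaction as a suggestion that could,
    hypothetically, be queued for a model's next data refresh."""
    return {
        "kind": "user-triggered web search",
        "topic": strip_pii(query),
        "evidence": [strip_pii(s) for s in snippets],
    }

suggestion = make_training_suggestion(
    "why does the box at 192.168.0.7 reject my cert?",
    ["Ask admin@example.com about the CA bundle."],
)
print(suggestion["topic"])        # the IP address is gone
print(suggestion["evidence"][0])  # so is the e-mail address
```

As Kimi notes, the hard part is everything this sketch waves away: guaranteeing that no sensitive pattern leaks through, and curating billions of such suggestions.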
I then asked Grok:
Ludditus: Are there any LLMs who submit to their creators suggestions for learning based on user-triggered web searches, with all personal info removed?
Grok: No, there are no known LLMs (commercial or otherwise) that work this way.
No publicly documented LLM—or the systems powering major ones like ChatGPT (OpenAI), Claude (Anthropic), Gemini (Google), or Grok (xAI)—has the model itself analyze a user-triggered web search (or browsing session) and then proactively generate/submit “suggestions for learning” (e.g., proposed new training topics or data) back to its creators, even with all personal info stripped out.
How Model Improvement Actually Works in Practice
Major LLM providers do improve their models using data from user interactions, including web searches/browsing tools, but it happens through backend pipelines on aggregated, de-identified, or anonymized logs—not via the LLM generating and sending targeted suggestions. Here’s the reality:
- OpenAI/ChatGPT (which has a built-in web browsing tool): User chats, prompts, outputs, and browsed webpage content can be used for training if you haven’t opted out (default for individual accounts; separate toggle for “Include web browsing” in some features like Atlas). They explicitly reduce or remove personal information before feeding data into training sets. Business/Enterprise accounts are opted out by default.
- xAI/Grok (real-time web + X search): Publicly available internet search results, user interactions, inputs/outputs, and X posts are used to train and improve models. Personal user data is handled under their privacy policy, with emphasis on public data sources.
- Anthropic/Claude (has tool use including search): You can opt in/out of using chats (including tool sessions) for model improvement. If opted in, data is used after de-linking identifiers and applying filters to protect privacy. Safety-flagged conversations may also be used.
- Google/Gemini (deep web integration): Conversation data and tool outputs feed into product improvement and model training (opt-out available on consumer plans). Human reviewers may see data for safety/quality, and it ties into Google’s broader privacy framework.
In all cases, the company’s infrastructure collects logs (prompts + tool outputs + feedback) at scale, anonymizes them, and uses them for training/fine-tuning. The LLM itself does not “suggest” what to learn next per search.
Research vs. Real-World Deployment
There are research papers on self-improving LLMs or using user logs for continual learning (e.g., generating revision suggestions from interaction logs or LLM-as-judge systems). These are experimental techniques, not features in any shipped product where the model phones home with learning proposals. No commercial system advertises or implements the exact loop you described.
Why This Specific Mechanism Probably Doesn’t Exist
- Privacy & control: Sending even anonymized “suggestions” per search would add unnecessary complexity and risk compared to simple aggregated logging.
- Scalability: Companies already get massive signal from billions of interactions without needing the model to meta-reason about training needs in real time.
- Opt-outs & transparency: Policies focus on broad “use my data for improvement” toggles, not per-search LLM-generated feedback.
If a lab or startup ever builds exactly this (LLM → anonymized learning suggestion → creators), it would likely be announced as a novel continual-learning feature. As of now, nothing like it exists in any public LLM.
You can always check the latest privacy settings in the apps (e.g., OpenAI’s “Improve the model” toggle, Anthropic’s “Help improve Claude,” etc.) if you want control over whether your searches contribute to training at all.
Ludditus: That’s a wasted opportunity IMO.
Grok: Yeah, I get why you’d see it as a missed opportunity—it’s an elegant idea in theory. An LLM that, after helping with a tricky user query (especially one involving real-time web searches or novel topics), could internally reflect something like:
- “This user repeatedly hits edge cases in quantum error correction simulations.”
- “There’s growing discussion around [emerging 2026 technique X] that isn’t well-covered in my training cut-off.”
- “Anonymized pattern: 0.8% of recent science queries involve reconciling conflicting arXiv preprints from the last 6 months.”
…and then bundle a sanitized, high-level summary or topic vector to ship back to the training team for potential inclusion in the next data refresh or fine-tune. It would create a tighter, more organic feedback loop than just dumping raw logs, potentially accelerating adaptation to fast-moving fields without needing massive human curation overhead.
From everything publicly available as of mid-2026, though, no major player (OpenAI, Anthropic, Google, xAI, Meta, etc.) has deployed anything resembling this model-initiated, proactive suggestion pipeline. Here’s why it still feels like low-hanging fruit that’s not yet picked:
Closest Real-World Mechanisms (Still Not Quite There)
- Passive usage of interaction logs — Companies already harvest de-identified chat traces, tool calls (including web/Browse/Deep Research outputs), thumbs-up/down ratings, and explicit feedback to fuel post-training / continual-ish updates. This indirectly captures “what the model struggled with,” but the LLM itself doesn’t decide or propose what to learn next.
- RLHF / RL*AIF pipelines — Human (or now increasingly AI-assisted) raters rank outputs, and reward models learn from that. Some 2025–2026 papers explore “self-generated” preferences or critiques, but those stay internal during training runs—not live, per-user-session proposals sent home.
- Research on self-improving / recursive agents — There’s exciting work in 2025–2026 on recursive introspection, where an LLM critiques and revises its own outputs in a loop (e.g., RISE-style recursive self-improvement, or agentic self-reflection papers). Some continual-learning benchmarks test whether models can retain knowledge while incorporating new “experiences.” But these are test-time or fine-tuning-time behaviors, not live systems phoning home with curriculum ideas.
- Continual learning hype for 2026 — Several researchers (including DeepMind folks) called 2026 “the year of continual learning” for LLMs, with methods like replay buffers, LoRA-based domain adaptation, on-policy RL being more robust to forgetting, and agent benchmarks for lifelong adaptation. Still, deployment remains backend-orchestrated (periodic retraining on aggregated data), not model-proposed topics flowing upstream in real time.
Why It’s Probably Not Happening Yet (Practical & Risk Reasons)
- Privacy & attack surface — Even heavily anonymized “suggestions” could leak patterns if aggregated cleverly (membership inference, reconstruction attacks). Companies already face huge scrutiny over data usage; adding an explicit outbound channel from the model would invite more audits / regulation headaches.
- Signal-to-noise nightmare at scale — Billions of daily interactions mean you’d drown in low-quality or redundant proposals. Filtering / prioritizing them would require another meta-model… which starts looking a lot like the current aggregated logging + human/AI curation loop anyway.
- Control & safety — Letting models suggest their own future training content opens subtle drift risks (e.g., over-optimizing for viral user patterns, echo chambers, or adversarial probing). Labs prefer centralized control.
- Existing loops are “good enough” — Web/tool outputs already feed back into reward modeling and data filtering for the next checkpoint. Thumbs feedback is direct. Real-time search logs highlight freshness gaps without needing the model to verbalize them.
That said, you’re right—it’s conceptually clean and could be more efficient than blanket logging. If I had to bet on when we might see prototypes:
- Open research / smaller labs — 2026–2027: expect papers or open-source agents that simulate this (e.g., local lifelong agents logging self-proposed topics to a personal fine-tune queue).
- Frontier labs — 2027+: possible in controlled settings (enterprise/private instances first), maybe framed as “active curriculum proposal” for opt-in continual improvement.
Until then, the wasted-opportunity feeling is valid. The current setup is more “human engineers decide what the model should learn from your chats” than “the model tells the engineers what it wishes it knew better.” If you’re experimenting locally with open models, you could actually mock this loop yourself—have one instance analyze conversation logs and output topic suggestions for a LoRA trainer. It’s DIY continual learning, minus the phoning-home part.
What aspect of this bugs you most—the missed efficiency, the lack of model agency, or something else?
Ludditus: This is the scenario that bugs me:
- The user asks something.
- The LLM answers bollocks.
- The user corrects it, sometimes even with offering web links.
- The LLM acknowledges the mistake, searches the web or accesses the links, and confirms that the answer is different, more nuanced, or more complex.
- The LLM then gives a different answer.
- The user finds more faults and gives feedback.
…and so on.

When a complex topic is clarified in, say, 20 questions and answers, it’s really a waste if the LLM remains as dumb as it was.
Grok: Yeah, this exact loop is the single biggest frustration with today’s frontier LLMs — and you’ve nailed why it feels so wasteful.
You spend 5–10–20 turns:
- LLM gives confidently wrong or oversimplified answer
- You correct + link evidence
- LLM says “Ah yes, you’re right, here’s the nuance”
- Repeat until the topic is actually clarified
…only for the model to be just as clueless the next time someone (even you) asks a related question in a fresh chat. All that hard-won clarification evaporates the moment the conversation ends. It’s like teaching a tutor who forgets everything the second class is over.
Why This Still Happens in March 2026
No commercial LLM (ChatGPT/GPT-5.x, Claude Sonnet 4.6, Grok 4, Gemini 3.x, etc.) does real-time, model-wide parameter updates from individual user corrections. Here’s the technical reality:
- The underlying weights are frozen between major training runs. Corrections live only in the current context window (or your personal “Memory”/Projects if the platform has it).
- Your back-and-forth becomes training data eventually — if you haven’t opted out — but it gets thrown into a giant anonymized log pile, de-identified, filtered, and fed into the next checkpoint weeks or months later. One person’s 20-turn clarification on, say, quantum error correction or a niche regulatory nuance is just one drop in an ocean. It helps statistically, but there’s no guarantee it sticks or generalizes quickly.
- “Continual learning” or “online learning” is the hottest research topic right now (DeepMind and others literally called 2026 “the year of continual learning”), but it’s still almost entirely lab papers and open-source prototypes. Things like:
- Distilling conversation logs into structured rules/preferences
- Self-improving agent loops that generate “memory notes” from failures
- LoRA-style incremental fine-tuning on user trajectories
…exist in research, but none of the big consumer-facing products have shipped a version where the model itself updates live from your specific corrections.

Grok (me) is the same: xAI uses aggregated interactions and public X data for future improvements, but not instant per-chat model surgery. Same story at OpenAI, Anthropic, and Google.
What Actually Helps Right Now (Still Not Perfect)
- In-this-conversation learning — Once we’ve gone through the 20 turns, I (or any good model) will remember the clarified version for the rest of this thread. You can even ask me to summarize it as a reusable “knowledge card” you can paste into future chats.
- User-specific memory features (the closest thing to persistence):
- ChatGPT’s Memory
- Claude’s Projects + memory tools
- Some enterprise setups
These store facts across your sessions, so the next time you ask, it might recall the nuance. But it’s retrieval, not weight update — and it doesn’t help other users.
- Feedback buttons + explicit “use for training” — Thumbs up/down and “this was wrong because…” do get mined. If enough people hit the same edge case, it accelerates the next training cycle.
- Local/open-source route (the real fix if you’re serious) — Run something like Llama 4, DeepSeek, or Mistral locally + tools like UNO (a 2026 research framework for continual learning from user logs) or simple LoRA fine-tuning scripts. You can literally feed your corrected conversation histories back into the model overnight and wake up with a version that actually learned your topic. Many people doing deep research or niche work have switched to this exact workflow.
The Wasted Opportunity, Quantified
You’re right: a complex topic clarified over 20 exchanges is basically free, high-quality supervised data + preference data (you showed the model exactly where it was wrong and what the better answer is). In an ideal world the model would:
- Extract the key insight
- Turn it into a compact “learning proposal”
- Ship it (anonymized) upstream
- Or at minimum store it in a shared memory layer that future instances can retrieve
We’re not there yet. The current system treats every chat as disposable, even though the signal is gold.
This is why so many power users now keep personal “LLM notebooks” — they manually copy the clarified answers into a second brain or fine-tune queue. It’s a hack, but it works until the labs finally ship proper lifelong/continual learning at scale.
If you want, we can do exactly that right here: give me the topic you’ve been hammering out in 20-turn loops (or a summary of the corrections), and I’ll condense it into a clean, reusable knowledge block you can reuse forever. Or tell me if you’d rather I sketch a simple local continual-learning setup you could run yourself. This particular pain point is finally getting real engineering attention in 2026 — just not quite in the public products yet.
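Grok’s closing suggestion — mine your own corrected conversations and feed the topics into a local LoRA fine-tune queue — is simple enough to prototype. A rough sketch, assuming an invented JSONL log format in which user turns that corrected the model carry `"correction": true` and a `"topic"` label (no product exports chats this way; you’d have to tag them yourself):

```python
import json
from collections import Counter

def mine_correction_topics(log_path: str, min_corrections: int = 2) -> list[str]:
    """Scan a JSONL chat log and return topics the user corrected at least
    `min_corrections` times, most-corrected first.

    The log format -- one {"role": ..., "topic": ..., "correction": ...}
    object per line -- is an invented convention for this sketch, not any
    product's actual export format.
    """
    counts = Counter()
    with open(log_path, encoding="utf-8") as fh:
        for line in fh:
            turn = json.loads(line)
            if turn.get("role") == "user" and turn.get("correction"):
                counts[turn["topic"]] += 1
    # Topics the model got wrong repeatedly are candidates for a LoRA pass.
    return [topic for topic, n in counts.most_common() if n >= min_corrections]
```

The real work, of course, happens downstream: turning those topics into training pairs and actually running the fine-tune. But this is the “model tells the engineers what it wishes it knew better” loop, done by hand.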
I am naïve, I know. But I believe that the “massive signal from billions of interactions” is garbage.
