The Information Machine

Guardian Angels: LLM Personalization for Productivity and Security

LessWrong (Curated) · gwern · 2026-06-17

Gwern proposes 'Guardian Angels,' personalized LLM digital twins that emulate a single user's personality and values to solve the principal-agent alignment problem and defend against AI-powered cyberattacks at scale.


Extraction

Topics: llm-personalization, principal-agent-problem, ai-security, online-learning, ai-agents

Claims

  • Powerful LLMs will dominate the internet and ordinary life within a few years, yet no coherent vision exists for maximizing productivity or security at that scale.
  • Current prompt-programming and in-context learning approaches are insufficient to create genuinely useful personalized AI due to limitations in frozen model weights, context windows, and passive data collection.
  • Guardian Angels would weakly solve the principal-agent problem by making the agent emulate the principal's own values and preferences, effectively unifying principal and agent.
  • Hardwiring a GA to a single, specific user neutralizes many prompt injection and spearphishing attacks because following an external prompt instruction would be absurd by the agent's own definition.
  • Building effective GAs requires online learning via dynamic evaluation, active learning with DAgger-style bounds, and a local CLI-first logging UI, not standard fine-tuning alone.
  • The GA concept is likely better pursued as a startup targeting power users and knowledge workers than as an open-source community project, because of its high security requirements.
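
The "online learning via dynamic evaluation" claim can be made concrete with a toy sketch. Dynamic evaluation here means that the model scores each new observation first and then takes a gradient step on it, so its weights keep adapting to one user's stream instead of staying frozen. The example below is an illustrative assumption, not the post's implementation: the context-free softmax model, the `stream_loss` helper, the vocabulary, and the command stream are all hypothetical stand-ins for an LLM and a user's logged activity.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def stream_loss(tokens, vocab, lr=0.0):
    """Average next-token cross-entropy over a stream.

    lr == 0.0 keeps the weights frozen; lr > 0.0 enables dynamic
    evaluation (score each token, then update on it).
    """
    index = {tok: i for i, tok in enumerate(vocab)}
    # Toy assumption: the "pretrained" model starts with a uniform prior.
    logits = [0.0] * len(vocab)
    total = 0.0
    for tok in tokens:
        probs = softmax(logits)
        target = index[tok]
        # Score the prediction before learning from the token.
        total += -math.log(probs[target])
        # Cross-entropy gradient w.r.t. the logits is (probs - one_hot).
        for i in range(len(logits)):
            grad = probs[i] - (1.0 if i == target else 0.0)
            logits[i] -= lr * grad
    return total / len(tokens)


# Hypothetical user activity: one person's skewed command habits.
vocab = ["git", "vim", "make", "ls"]
user_stream = ["git", "git", "vim", "git", "make", "git", "git", "vim"] * 5

frozen = stream_loss(user_stream, vocab, lr=0.0)
dynamic = stream_loss(user_stream, vocab, lr=0.5)
print(f"frozen loss:  {frozen:.3f}")
print(f"dynamic loss: {dynamic:.3f}")
```

The frozen model pays the same uniform-prior loss on every token, while the dynamically evaluated one gets cheaper to run on this user as it absorbs their habits. A real GA would apply the same score-then-update loop to an LLM's weights, which is where the summary's points about active learning and local logging come in.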

Key quotes

I propose a goal of creating Guardian Angels (GA): digital twin LLMs which are personalized with the goal of providing not the stereotypical 'assistant chatbot agent' persona, but emulating a single user's personality, values, and preferences.
This weakly solves the principal-agent problem by unifying the principal and agent as much as possible.
Standard techniques like prompt programming of in-context-learning for 'frozen' models will not create useful GAs due to the limitations of post-training, context windows and self-attention with frozen weights in compute-efficient-but-under-parameterized models.