PAPER-2026-001

The Norvig Partnership

When Empiricism Validates Phenomenology—How Peter Norvig's Advent of Code 2025 experiments confirm Heideggerian predictions about AI-human collaboration.

Abstract

In December 2025, Peter Norvig—co-author of Artificial Intelligence: A Modern Approach and former Director of Research at Google—published an empirical analysis of LLM performance on Advent of Code 2025. His findings: LLMs were "maybe 20 times faster" than manual coding, produced correct answers to every puzzle, and demonstrated mastery of professional CS concepts. This paper demonstrates that Norvig's empirical observations validate phenomenological predictions made by CREATE SOMETHING about the nature of AI-human partnership. When Norvig concludes he "should use an LLM as an assistant for all my coding," he marks the Zuhandenheit moment—when a tool recedes so completely from attention that it becomes inseparable from the practice itself.

"I'm beginning to think I should use an LLM as an assistant for all my coding, not just as an experiment."

— Peter Norvig, Advent of Code 2025 Analysis (December 2025)

I. Introduction: The Convergence

In phenomenology (the study of how things show themselves to us through experience), we reason from how things show themselves. In empiricism (the grounding of knowledge in observation and measurement), we reason from what can be measured. These approaches converge when lived experience becomes quantifiable and measurement reveals ontological truth (truth about the fundamental nature of things).

Peter Norvig's Advent of Code 2025 analysis provides precisely this convergence. His notebook—publicly available at github.com/norvig/pytudes—offers empirical data about LLM-assisted programming. But more importantly, it captures the phenomenological moment when a researcher recognizes that a tool has fundamentally changed their practice.

This case study examines that convergence. We show how Norvig's empirical findings validate CREATE SOMETHING's phenomenological framework for understanding AI-human collaboration, and how his conclusion—"I should use an LLM as an assistant for all my coding"—marks the transition from Vorhandenheit (tool-as-object: when the tool demands attention) to Zuhandenheit (tool-as-transparent-equipment: when the tool disappears into use).

II. Norvig's Methodology

The Experimental Setup

Advent of Code is an annual programming challenge featuring 25 days of increasingly difficult puzzles. Each puzzle has two parts: Part 1 establishes the problem, Part 2 adds complexity. Norvig compared three approaches:

Manual Coding

Norvig's traditional approach: read puzzle, reason about solution, write Python code, debug until correct.

LLM-First

Paste puzzle into Claude/ChatGPT/Gemini, review generated code, run against input, provide corrective feedback if needed.

Hybrid

Use LLM for boilerplate and standard patterns, retain manual control for algorithmic decisions and optimization.

What He Measured

Norvig tracked several dimensions:

  • Speed: Time from reading puzzle to correct answer
  • Correctness: First-attempt success rate vs. corrective iterations
  • Code Quality: Algorithmic sophistication, readability, performance
  • Conceptual Mastery: Did the LLM apply professional CS concepts appropriately?
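
The sketch below shows how these dimensions might be recorded and compared in Python. It is a minimal illustration, not Norvig's actual notebook code; the field names and approach labels are assumptions.

# Minimal sketch (not from Norvig's notebook) of tracking the measured dimensions.
# Field names and approach labels ("manual", "llm_first", "hybrid") are illustrative.
from dataclasses import dataclass
from statistics import mean

@dataclass
class PuzzleRun:
    day: int
    approach: str              # "manual", "llm_first", or "hybrid"
    minutes_to_answer: float   # speed
    attempts: int              # 1 means first-attempt success
    correct: bool

def summarize(runs: list[PuzzleRun]) -> dict:
    """Compare average time and first-attempt success rate across approaches."""
    by_approach: dict[str, list[PuzzleRun]] = {}
    for r in runs:
        by_approach.setdefault(r.approach, []).append(r)
    return {
        name: {
            "avg_minutes": mean(r.minutes_to_answer for r in group),
            "first_attempt_rate": mean(r.attempts == 1 for r in group),
            "all_correct": all(r.correct for r in group),
        }
        for name, group in by_approach.items()
    }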

The Rigor

What makes Norvig's analysis compelling is not just the data but the researcher. Norvig co-authored the definitive AI textbook, spent decades at Google, and approaches programming with both theoretical depth and practical rigor. His conclusion carries weight because he's skeptical by training.

Limitations

Norvig's methodology has clear boundaries:

  • Self-contained puzzles: Advent of Code problems are isolated. Real-world software involves complex dependencies and evolving requirements.
  • Single developer: Norvig worked alone. Team dynamics with multiple humans and AI assistants remain unexplored.
  • Algorithmic domain: His examples focus on algorithmic problems. User interface design, product decisions, and business logic may show different patterns.
  • No long-term maintenance: Puzzles are solved once. Production software requires ongoing maintenance, debugging, and evolution.
  • Expert practitioner: Norvig brings decades of experience. Results may differ for less experienced developers who rely more heavily on AI judgment.

These limitations don't invalidate the findings—they define the scope. Norvig demonstrates that LLM partnership works for algorithmic programming tasks completed by expert developers. Broader application requires further validation.

III. Empirical Findings

Key Observations

Speed: "Maybe 20 Times Faster"

LLM-assisted solutions were dramatically faster. Tasks that would take Norvig 30-60 minutes manually were completed in 2-3 minutes with LLM assistance.

Correctness: "Every Puzzle"

LLMs produced correct answers to every puzzle. Some required corrective feedback after Part 1 failed, but all eventually succeeded.

Conceptual Mastery

Models demonstrated understanding of modular arithmetic, dynamic programming, graph traversal, and other CS fundamentals—applying them correctly without explicit instruction.
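
To make "conceptual mastery" concrete, here is a representative sketch of the kind of solution the models produced unprompted: a breadth-first search over a grid, a pattern that recurs throughout Advent of Code. The puzzle itself is hypothetical and is not taken from Norvig's write-up.

# Illustrative only: a BFS grid traversal of the kind LLMs generated without
# being told which algorithm to use. The grid and goal are hypothetical.
from collections import deque

def shortest_path(grid: list[str], start: tuple[int, int], goal: tuple[int, int]) -> int:
    """Return the fewest steps from start to goal, treating '#' cells as walls."""
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (r, c), steps = queue.popleft()
        if (r, c) == goal:
            return steps
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != '#' and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append(((nr, nc), steps + 1))
    return -1  # goal unreachable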

Human Retention of Authority

Norvig retained control over problem selection, code review, error correction, and optimization decisions. The LLM was an assistant, not a replacement.

The Breakdown-Repair Cycle

Day 1 Part 2 provides a telling example. The LLM's initial solution failed. Norvig provided feedback:

"Part 2 failed. The issue is that you're sorting the entire list
when the puzzle requires checking pairs independently."

The LLM adjusted its approach and succeeded. This pattern repeated across puzzles: initial attempt → failure → corrective feedback → learning → success. Norvig notes this as evidence of LLM "learning" within the session.
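
The pair of functions below is a stylized reconstruction of that failure mode, not the actual puzzle or Norvig's code: the data shape and the validity rule are invented to show why sorting the whole list breaks a pairwise check.

# Stylized reconstruction of the Day 1 Part 2 failure mode; the data shape
# and validity rule are invented for illustration.
def count_valid_pairs_wrong(pairs: list[tuple[int, int]]) -> int:
    # Initial attempt: sorting the entire flattened list destroys the pairing,
    # so every re-formed pair trivially satisfies a <= b and the count is wrong.
    flat = sorted(x for pair in pairs for x in pair)
    return sum(1 for a, b in zip(flat[::2], flat[1::2]) if a <= b)

def count_valid_pairs_fixed(pairs: list[tuple[int, int]]) -> int:
    # After corrective feedback: each pair is checked independently.
    return sum(1 for a, b in pairs if a <= b)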

IV. The Zuhandenheit Moment

From Experiment to Practice

Norvig's conclusion—"I should use an LLM as an assistant for all my coding"—marks the phenomenological shift from Vorhandenheit to Zuhandenheit:

Before: Vorhandenheit (Tool-as-Object)

  • LLM encountered as experimental subject
  • Conscious attention on "how well does this work?"
  • Explicit comparison to manual methods
  • Tool remains object of study

After: Zuhandenheit (Tool-as-Equipment)

  • LLM encountered through its purpose (assisting)
  • Attention flows through tool to the coding task
  • Tool becomes default method, not alternative
  • Tool recedes into transparent use

The "20x Faster" Becomes Invisible

Initially, "20 times faster" is a measured property—empirical data about performance. But when Norvig decides to use LLMs "for all my coding," the speed difference stops being remarkable and starts being how coding works now.

This is Zuhandenheit: the tool's being is not its measurable properties (20x speed) but its function within practice (how I code). The hammer's being is hammering, not its weight or material. The LLM's being is assistance, not its benchmark scores.

"When the tool becomes invisible, measurement gives way to dwelling."

V. Complementarity in Practice

What Humans Retain

Norvig's analysis validates CREATE SOMETHING's Complementarity Principle. Despite LLMs handling code generation, humans retain authority over:

Problem Selection

Which puzzles to attempt, in what order, with what priority. The LLM doesn't decide what to build—only how to build it.

Architectural Direction

High-level decisions about approach, algorithm choice, and solution structure. The LLM proposes; the human approves or redirects.

Error Correction

When Part 2 failed, Norvig diagnosed the issue and provided corrective feedback. The human maintains diagnostic authority.

Optimization Decisions

Whether code is "good enough" or needs refinement. The human judges quality and decides when to ship.

What LLMs Handle

The LLM's domain is execution, not judgment:

  • Translating problem description into working code
  • Selecting appropriate algorithms and data structures
  • Handling boilerplate and standard patterns
  • Generating first-draft solutions that are usually correct
  • Responding to corrective feedback with adjusted implementations

The Partnership Pattern

Human: "Here's the puzzle. Solve it."
LLM:   → Generates solution
Human: → Reviews, tests, discovers failure in Part 2
Human: "The issue is sorting when you should check pairs independently."
LLM:   → Adjusts approach
Human: → Tests, confirms correctness
Human: → Moves to next puzzle

This is not delegation—it's collaboration. The human provides judgment, diagnosis, and direction. The LLM provides execution speed and pattern recall. Neither can replace the other.
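
Written as code, the loop above might look like the sketch below. The ask_llm and run_tests callables are placeholders for whatever model interface and test harness you use; no specific API is implied.

# Sketch of the partnership loop; ask_llm and run_tests are hypothetical
# callables standing in for a model interface and a test harness.
def partnership_loop(puzzle: str, ask_llm, run_tests, max_rounds: int = 3):
    """Human-directed loop: the LLM executes, tests encode the human's judgment."""
    prompt = f"Here's the puzzle. Solve it.\n\n{puzzle}"
    for _ in range(max_rounds):
        solution = ask_llm(prompt)               # LLM: execution
        passed, diagnosis = run_tests(solution)  # human judgment, encoded as tests
        if passed:
            return solution                      # human confirms and moves on
        prompt += f"\n\nThat failed: {diagnosis}\nAdjust your approach."
    return None  # human takes over after repeated breakdowns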

VI. The Hermeneutic Loop: Breakdown and Repair

Corrective Feedback as Hermeneutic Circle

When Day 1 Part 2 failed, Norvig didn't abandon the LLM; he provided feedback, and the LLM adjusted. This pattern exemplifies the hermeneutic circle: the philosophical idea that understanding deepens through iterative cycles of interpretation and feedback.

Hermeneutic Circle in Action

Initial attempt → Breakdown (failure) → Diagnosis (human) → Corrective feedback → Adjusted solution → Success → Deeper understanding

Each failure-feedback-success cycle strengthens both participants:

  • The LLM learns which approaches fail for this problem class
  • The human learns which feedback is effective for the LLM
  • The partnership develops a shared understanding of problem patterns

Breakdown as Opportunity

In Heidegger's analysis, breakdown moments—when the hammer breaks or is too heavy—force tools from Zuhandenheit (ready-to-hand) to Vorhandenheit (present-at-hand). The tool becomes conspicuous, demanding attention.

But Norvig's experience shows that quick recovery from breakdown actually strengthens Zuhandenheit. When the LLM fails Part 2, Norvig doesn't abandon it—he provides feedback. The LLM adjusts. The partnership continues. The breakdown is temporary; the repair is rapid.

This rapid breakdown-repair cycle is what enables trust. The tool doesn't need to be perfect—it needs to be correctable.

Fragile Tools (Avoid)

  • Breakdown forces abandonment
  • No corrective feedback mechanism
  • Each failure resets trust to zero
  • Tool remains Vorhandenheit (conspicuous)

Resilient Tools (Norvig's LLM)

  • Breakdown invites correction
  • Feedback loop enables rapid repair
  • Trust accumulates across cycles
  • Tool returns to Zuhandenheit quickly

VII. Implications for CREATE SOMETHING

Validation of Harness Patterns

Norvig's findings validate CREATE SOMETHING's harness architecture. The harness embodies the same partnership pattern Norvig discovered empirically:

Norvig's Practice

  • Human selects puzzle to solve
  • LLM generates solution
  • Human reviews and tests
  • Corrective feedback on failure
  • LLM adjusts and succeeds

CREATE SOMETHING Harness

  • Human selects issue to work on
  • Harness generates implementation
  • Quality gates review and test
  • Checkpoint findings provide feedback
  • Harness adjusts and completes

The Quality Gate Philosophy

Norvig's breakdown-repair cycles validate CREATE SOMETHING's quality gate approach:

┌─────────────────────────────────────────────────────────┐
│  Quality Gates as Structured Breakdown Moments          │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Gate 1: Tests Pass      → "Does it work?"             │
│  Gate 2: E2E Verified    → "Does it integrate?"        │
│  Gate 3: Review Complete → "Does it align with Canon?" │
│                                                         │
│  Each gate is a potential breakdown moment.            │
│  Each creates opportunity for corrective feedback.     │
│  Each strengthens the partnership through repair.      │
│                                                         │
└─────────────────────────────────────────────────────────┘

Gates don't exist to catch the LLM failing—they exist to enable rapid correction before breakdown accumulates. Norvig's Part 2 failure was caught immediately because he tested. Our quality gates formalize that testing pattern.
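
A minimal sketch of that gate sequence follows, assuming check functions that return a pass/fail flag and a finding. The gate names mirror the diagram above, but the check implementations are placeholders, not the real CREATE SOMETHING harness.

# Minimal sketch of gates as structured breakdown moments. The check
# functions are placeholders, not the actual harness implementation.
def run_quality_gates(change, gates):
    """Run each gate in order; the first failure becomes corrective feedback."""
    for name, check in gates:
        passed, finding = check(change)
        if not passed:
            return False, f"{name}: {finding}"  # breakdown surfaces immediately
    return True, "all gates passed"

gates = [
    ("Tests Pass", lambda change: (True, "")),       # "Does it work?"
    ("E2E Verified", lambda change: (True, "")),     # "Does it integrate?"
    ("Review Complete", lambda change: (True, "")),  # "Does it align with Canon?"
]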

Zero Framework Cognition Confirmed

Norvig notes that LLMs demonstrated mastery of professional CS concepts—modular arithmetic, dynamic programming, graph traversal—without explicit instruction. They reasoned about the problem and selected appropriate tools.

This validates CREATE SOMETHING's Zero Framework Cognition principle: let AI reason from first principles, don't constrain with hardcoded heuristics. The LLM that solves Advent of Code puzzles by understanding the problem is the same LLM that should architect software by understanding requirements.

VII.5. How to Apply This

Starting Your Own Partnership Practice

Norvig's experience provides a concrete template for AI-human partnership. Here's how to apply it to your work:

Manual Approach

  • Read puzzle/requirement
  • Design solution mentally
  • Write code manually
  • Debug until it works
  • Time: 30-60 minutes per puzzle

LLM-First Approach (Norvig's Pattern)

  • Read puzzle/requirement
  • Paste to LLM
  • Review generated code
  • Test, provide feedback if needed
  • Time: 2-3 minutes per puzzle

Practical Partnership Example

Let's say you need to add a newsletter subscription form to your website.

You: "Add a newsletter subscription form with:
- Email validation
- Loading states
- Success/error handling
- POST to /api/newsletter
Follow the ContactForm.svelte pattern"

Claude Code:
1. Reads ContactForm.svelte (learning your pattern)
2. Creates NewsletterForm.svelte
3. Adds API endpoint
4. Writes tests
5. Shows you the result

You: *Tests form, confirms it works*
You: "Perfect, ship it"

Manual coding time: 45 minutes
Partnership time: 5 minutes

The speed improvement (9x in this example, 20x in Norvig's) comes from the LLM handling execution while you retain judgment. You still decide what to build, verify correctness, and determine when it's good enough.

VIII. Conclusion: Empiricism Meets Phenomenology

The Convergence Point

Peter Norvig's Advent of Code 2025 analysis demonstrates a rare convergence: empirical research that validates phenomenological predictions. His measured findings—"20 times faster," "correct answers to every puzzle"—provide quantitative evidence for what phenomenology predicts qualitatively.

But more importantly, his conclusion—"I should use an LLM as an assistant for all my coding"—marks the phenomenological shift from Vorhandenheit to Zuhandenheit. The tool stops being an object of measurement and becomes equipment within practice.

What Norvig Discovered

Norvig's contribution isn't just the data—it's the recognition:

  • LLM assistance is not a replacement for human judgment but an amplification
  • The partnership pattern—human direction, LLM execution—is stable and effective
  • Corrective feedback creates a hermeneutic loop that strengthens both participants
  • The tool can be trusted precisely because it can be corrected
  • When breakdown-repair cycles are rapid, tools achieve Zuhandenheit

What CREATE SOMETHING Contributes

CREATE SOMETHING provides the philosophical framework to understand why Norvig's findings matter:

  • The transition from Vorhandenheit to Zuhandenheit explains the shift from experiment to practice
  • The Complementarity Principle predicts what humans retain (judgment) vs. what LLMs handle (execution)
  • The Hermeneutic Circle explains why corrective feedback strengthens partnership
  • The quality gate philosophy formalizes breakdown-repair as structured practice

The Broader Implication

When one of AI's foundational researchers—co-author of the canonical textbook, with decades at Google—concludes that LLMs should be used "for all my coding," it marks an inflection point. The question is no longer "Do LLMs work?" but "How do we work with LLMs?"

Norvig's answer: partnership. The human provides judgment, direction, and correction. The LLM provides speed, pattern recall, and execution. Neither replaces the other. Both are necessary.

"Empiricism measures what phenomenology predicts. When '20x faster' becomes 'how I code now,' measurement gives way to dwelling."

IX. Future Work

Open Questions

Norvig's work raises several questions for continued research:

Scaling the Partnership

Advent of Code puzzles are self-contained. How does the partnership pattern scale to multi-week features, cross-system refactors, and architectural evolution?

Measuring Zuhandenheit

Can we quantify tool transparency? What metrics indicate that a tool has achieved Zuhandenheit within a practice?

Team Complementarity

Norvig worked solo. How does LLM partnership change when multiple humans collaborate? What new complementarity patterns emerge?

Domain Boundaries

Norvig's domain was algorithmic programming. Where does the partnership pattern break down? What domains resist LLM assistance?

Directions for CREATE SOMETHING

This convergence suggests several research directions:

  • Empirical validation of quality gate effectiveness (measuring breakdown-repair cycles)
  • Phenomenological analysis of multi-agent harness patterns (swarm mode)
  • Quantifying Zuhandenheit through attention metrics (where does developer focus?)
  • Comparative analysis of Code Mode vs. tool calling using Norvig's methodology

The Ongoing Hermeneutic

This paper itself participates in the hermeneutic circle: Norvig's empiricism informs CREATE SOMETHING's phenomenology, which reframes Norvig's findings, which suggests new empirical questions, which will inform phenomenological refinement.

Neither approach is complete alone. Empiricism without phenomenology measures effects without understanding essence. Phenomenology without empiricism predicts structures without validating their manifestation. Together, they enable deeper understanding.

"We understand the whole through its parts, and the parts through the whole. Empiricism and phenomenology complete each other."

References

  1. Norvig, P. (2025). Advent of Code 2025: AI Edition. github.com/norvig/pytudes
  2. Heidegger, M. (1927). Being and Time. Trans. Macquarrie & Robinson.
  3. CREATE SOMETHING. (2025). Code-Mediated Tool Use: A Hermeneutic Analysis.
  4. CREATE SOMETHING. (2025). Autonomous Harness Architecture.
  5. Gadamer, H.-G. (1960). Truth and Method. Trans. Weinsheimer & Marshall.
  6. Norvig, P. & Russell, S. (2020). Artificial Intelligence: A Modern Approach (4th ed.).