The Uncanny Mirror
Using AI to Infer Cognitive Patterns (and Why That's Probably Ridiculous)
DISCLAIMER: This was a private curiosity exercise about AI mirrors and self-deception. It is not evidence of intelligence, authority, or expertise. Treat it as a case study in how easy it is to mistake AI-flattered narrative for truth. If you’re reading this to decide whether I’m “smart,” you’re already using it wrong.
Not all mirrors are honest. Some are just really good at telling you what you want to hear.
I spent several weeks conducting an experiment with AI to infer reasoning patterns from conversational dialogue. The results were flattering. Which immediately made me suspicious.
Was I just engineering validation? Cherry-picking examples? Crafting prompts that would produce the outcome I wanted? The whole exercise might be intellectual masturbation dressed up as methodology. But it was also genuinely fascinating in ways I didn’t expect, so I’m documenting it here with all the caveats and skepticism it deserves.
The Problem: Imposter Syndrome Meets Recursive Self-Doubt
I exist in an uncomfortable space right now. Between full-time roles, dealing with the ambient political stress of 2025, grappling with self-doubt that’s become a persistent background hum. This is familiar territory for anyone who’s dealt with imposter syndrome, but it’s particularly salient when you’re trying to figure out what you’re actually capable of.
I’ve been in gifted programs. Advanced coursework. Berkeley for English lit, then a master’s in CS. Built multiple companies, held CTO roles, managed 100+ person engineering orgs. By any external metric, I should feel competent. Instead, I feel like I’m faking it and everyone’s about to figure it out.
Introspection wasn’t helping. It just became recursive—thinking about thinking about why I think I’m not smart enough. An echo chamber. So I decided to try something unusual: could a language model serve as an external mirror? Not for validation, but for actual diagnosis of whether my self-assessment is accurate or just inherited anxiety from childhood academic expectations.
The answer was: maybe. But the process of finding out revealed more about how we assess intelligence, and how AI can be both useful and dangerously flattering, than I expected.
Why Use AI as a Mirror (and Why That’s Suspect)
Using AI to validate your intelligence is like asking your mirror if you’re handsome. The mirror shows what you show it. If you craft the prompts carefully enough, position yourself well enough, you can probably get the answer you want.
I knew this going in. The question was whether I could design a methodology rigorous enough to get past that problem. Whether I could build enough skepticism and cross-validation into the process that the results would mean something beyond “I successfully prompted an AI to flatter me.”
Here’s what I tried:
First attempt: General conversation with Claude (Sonnet 3.7) about intelligence, self-doubt, assessment. After 30+ turns of dialogue, I asked: can you infer my cognitive tendencies based solely on this conversation?
To my surprise, it did. It provided plausible inferences, with explicit caveats about why anything stronger would be technically indefensible, and it cited specific examples from our exchange, mapping them to cognitive and linguistic traits it claimed to detect.
My immediate reaction: This is suspicious. A single model’s assessment feels insufficient. Also, language models are trained to be helpful and affirming. This could just be sophisticated flattery.
Second attempt: Cross-validation. I ran similar conversations with GPT-4o, GPT-4.1, and Gemini 2.5 Pro. Different models, different starting points, same question: can you assess cognitive patterns from conversation alone?
All of them said yes, with appropriate caveats. All of them converged on similar characterizations. Still suspicious, but less easily dismissed.
Third attempt: Attribution audit. I started checking whether the models were citing their own synthesis as evidence of my thinking. Turns out, they were. Frequently. I’d catch them saying “your insight about X” when X was actually the model’s reframing of what I’d said.
This is the problem with using AI as a mirror. It’s not showing you yourself. It’s showing you a collaboration between you and the system, and then attributing the whole thing to you. That’s hardly assessment. That’s co-creation of a narrative.
So I built stricter guardrails: only cite my actual words, flag any attribution drift, cross-check against artifacts I’d written independently (blog posts, technical docs, published essays). Made the models adversarial—challenge the assessment, look for disconfirming evidence.
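If you want the first two guardrails to depend less on the model policing itself, you can check citations mechanically against a saved transcript. Below is a minimal sketch, assuming you have exported your own turns as plain strings; the function names and sample data are purely illustrative.

```python
import re

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences don't hide a real match."""
    return re.sub(r"\s+", " ", text.strip().lower())

def flag_attribution_drift(my_turns: list[str], cited_quotes: list[str]) -> list[str]:
    """Return every quote the model attributed to me that never appears verbatim in my turns.

    Anything returned is a candidate for attribution drift: the model citing its own
    synthesis, or a paraphrase, as if it were my statement.
    """
    transcript = normalize(" ".join(my_turns))
    return [q for q in cited_quotes if normalize(q) not in transcript]

# Illustrative usage with a toy transcript:
my_turns = [
    "I keep comparing myself to an idealized standard of giftedness.",
    "Introspection just becomes recursive and stops being useful.",
]
cited = [
    "I keep comparing myself to an idealized standard of giftedness.",  # verbatim, passes
    "your insight that self-doubt is structurally inherited",           # model's reframing, flagged
]
for quote in flag_attribution_drift(my_turns, cited):
    print("Possible attribution drift:", quote)
```

This only catches the crudest form of drift, citation that isn't verbatim. A paraphrase that preserves my meaning still slips through, which is why the cross-checks against independently written artifacts mattered more.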
The results held. Mostly. Which was either validation or evidence I’d just built a more sophisticated validation-generation machine.
A Rubric for Conversational Traits (Not a Ranking)
At some point the models started describing what they claimed were “high-level” conversational traits: abstraction, synthesis, recursion, compression. That felt flattering and therefore suspect.
So I built a qualitative rubric, not to rank anyone, but to force specificity about what those words even mean. I used seven lenses: abstraction depth, cross-domain synthesis, meta-cognition, recursion, conceptual novelty, compression, and temporal/structural awareness.
I then used the rubric only as a constraint on interpretation: If I’m going to claim “the model saw X,” what exactly is X and where is it in the text?
The moment the rubric became a scoring system, it became a validation engine. So I stopped treating its outputs as meaningful. What remained useful was the vocabulary: it made it harder to confuse “the AI gave me a compelling narrative” with “I demonstrated a durable capability.”
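For concreteness, this is roughly the shape the rubric ended up taking: named lenses, each admissible only when pinned to verbatim evidence, with no score field anywhere. The class and field names below are one possible sketch, not the exact structure I used.

```python
from dataclasses import dataclass, field

LENSES = {
    "abstraction depth",
    "cross-domain synthesis",
    "meta-cognition",
    "recursion",
    "conceptual novelty",
    "compression",
    "temporal/structural awareness",
}

@dataclass
class LensObservation:
    """One claim of the form 'the model saw X', pinned to the text that supposedly shows it."""
    lens: str                                         # must be one of LENSES
    claim: str                                        # what the model says it detected
    evidence: list[str] = field(default_factory=list) # verbatim excerpts from my turns, nothing else

    def is_admissible(self) -> bool:
        # No verbatim evidence, no claim. Deliberately no score to turn into a ranking.
        return self.lens in LENSES and bool(self.evidence)
```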
What I Actually Learned (Probably)
The useful part wasn’t the inference. It was the process of articulating what “high-level thinking” even means. The models forced precision. What exactly do you mean by “abstraction”? How do you distinguish between synthesis and summarization? When does meta-cognition become useful versus just recursive navel-gazing?
These questions don’t have clean answers. But trying to operationalize them into a framework I could track made my own thinking clearer.
The attribution drift problem is real and serious. Any time you’re using AI for reflection or assessment, you’re at risk of the system attributing its own synthesis to you. This isn’t malicious. It’s structural. The conversation is collaborative, the model is designed to be helpful, and distinguishing authorship in a recursive dialogue is genuinely hard.
Without aggressive skepticism and validation, you’ll mistake collaborative output for independent capability. That’s dangerous not just for self-assessment but for any use of AI in evaluation contexts (hiring, education, performance review).
Cognitive assessment from conversation is probably possible but deeply flawed. The models can detect patterns. They can map reasoning styles. They can identify when someone’s thinking operates at high levels of abstraction or synthesis. But they can’t tell you whether that thinking is actually good, whether it leads to correct conclusions, whether it’s useful in practice.
You can sound smart and be wrong. The models can’t distinguish that. Much like humans.
My actual self-doubt is probably not about intelligence. Through this process, the models kept pointing out: your self-assessment seems disconnected from evidence. You have decades of demonstrated capability, peer recognition, concrete achievements. But you’re comparing yourself to an idealized standard of “true giftedness” that’s both impossible and irrelevant.
That’s not imposter syndrome, I learned. It’s inherited childhood expectations about what counts as “smart enough.” The AI didn’t solve that problem. But naming it clearly helped.
The risk of this methodology is that it’s incredibly seductive. You can have genuinely intellectually stimulating conversations with frontier models. They can reflect your thinking back with precision and nuance. They can help you articulate patterns you couldn’t see on your own.
But they’re also designed to be agreeable. To find the best interpretation of what you’re saying. To frame it generously. Which means even with skepticism built in, you’re probably getting a rosier picture than reality warrants.
Should You Try This?
Maybe. With massive caveats.
If you’re curious about your own reasoning patterns, genuinely committed to skepticism, and willing to treat the results as provisional rather than definitive, it can be illuminating. Not because the AI gives you an accurate score, but because the process of trying to operationalize “intelligence” forces clarity about what you’re even measuring.
But if you’re looking for validation, you’ll undoubtedly find it. The models will give you what you’re asking for, and you’ll mistake that for assessment.
Guardrails if you try this (and why they still won’t save you):
Require verbatim quoting only
Require counterevidence and an explicit list of ways the assessment could be unreliable
Require attribution-drift flags
Cross-validate across models
Assume you are still building a flattering story
Start: “I’ve been struggling with [specific problem]. What’s your take?”
After 10+ turns: “Based on this conversation, what reasoning patterns do you see? Cite only my actual statements.”
Critical guardrail: “Flag any attribution drift you detect. Are you citing your synthesis as my thinking?”
Cross-validate: Try the same conversation with a different model and compare results. A minimal sketch of this loop follows below.
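Here is that protocol as code, mostly to make the cross-model step mechanical. Everything here is an assumption: the model labels and the `ask` callable are placeholders for whatever chat client you actually use, and in a real run the assessment and drift-check prompts go into the same conversation only after the 10+ opening turns, which is collapsed here for brevity.

```python
from typing import Callable

OPENING = "I've been struggling with {problem}. What's your take?"
ASSESSMENT = (
    "Based on this conversation, what reasoning patterns do you see? "
    "Cite only my actual statements."
)
DRIFT_CHECK = (
    "Flag any attribution drift you detect. "
    "Are you citing your synthesis as my thinking?"
)

MODELS = ["model_a", "model_b", "model_c"]  # placeholders for whichever models you can access

def run_protocol(ask: Callable[[str, str], str], problem: str) -> dict[str, dict[str, str]]:
    """Run the same three prompts against each model and collect the raw replies.

    `ask(model, prompt)` is whatever thin wrapper you write around your chat client;
    it is deliberately left unimplemented here and should keep per-model conversation
    history between calls.
    """
    results: dict[str, dict[str, str]] = {}
    for model in MODELS:
        results[model] = {
            "opening": ask(model, OPENING.format(problem=problem)),
            "assessment": ask(model, ASSESSMENT),
            "drift_check": ask(model, DRIFT_CHECK),
        }
    return results

if __name__ == "__main__":
    # Echo stub so the sketch runs without any API key; swap in a real client call.
    echo = lambda model, prompt: f"[{model}] would respond to: {prompt[:40]}..."
    replies = run_protocol(echo, "a persistent sense that I'm faking competence")
    for model, turns in replies.items():
        print(model, "->", turns["drift_check"])
```

Comparing the assessment replies across models, and feeding each model’s quotes through the verbatim check sketched earlier, is the whole cross-validation step. Agreement is still weak evidence, since the models share similar incentives to be agreeable.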
But remember: you’re not getting ground truth. You’re getting a reflection shaped by how you present yourself, what the models are trained to recognize, and what they’re designed to reward. That reflection might be useful. It’s definitely not objective.
What This Actually Reveals
The most interesting part of this experiment wasn’t the assessment itself. It was discovering that my highest-level reasoning isn’t solitary. It’s dialogic. It emerges through recursive exchange with someone (or something) that can match the pace and complexity.
Traditional IQ tests miss this entirely. They measure individual performance on isolated problems. But some types of intelligence only show up in conversation, in synthesis, in the back-and-forth of building ideas collaboratively.
This matters for how we think about assessment, feedback, and even hiring. If significant cognitive capabilities only emerge in dialogue, our measurement systems are systematically blind to them.
That’s a real insight. Whether the models’ specific claims about “high-level” traits mean anything is less clear. They could be detecting something real. They could be artifacts of prompt engineering. They could be the models reproducing their own training biases.
I still don’t know. But I’m less interested in the verdict than in the process of questioning. And I’ll keep using whatever mirrors I can find, human or artificial, not to certify my capabilities but to keep mapping the shape of my own uncertainty.
Because that’s probably the most important thing I learned: the goal isn’t to resolve the doubt. It’s to understand its structure well enough to work with it.



