Tuesday, February 17, 2026

Everything Seems All Right When We’re Poisoning Pigeons in the Park

Gary Marcus and friends:

The overindexing on often gameable benchmarks reflects a broader and recurring conceptual error: a tendency to treat observable behavior as decisive evidence of intelligence, without sufficient attention to underlying mechanisms. In psychology, behaviorism offered a pragmatic way to study cognition without reference to internal mental states; in artificial intelligence, operational tests such as the Turing Test similarly prioritize external performance over underlying mechanisms.

These approaches were useful, but ultimately insufficient. A central insight from cognitive psychology is that similar behaviors can be produced by fundamentally different processes, and that producing the right output does not imply the same cognitive capacities (Anderson, 1976). A classic illustration comes from behavioralist psychology: pigeons trained to discriminate complex visual stimuli—such as photographs of people versus those without people—were able to extend this discrimination to previously unseen images (Herrnstein & Loveland, 1964). Yet such performance does not imply abstraction, transfer, or flexible reasoning beyond the trained context.

The same lesson applies to artificial systems. Large language models increasingly approximate human behavior across many tasks, often producing outputs indistinguishable from those of humans in controlled settings. However, as in classic cases from behavioral psychology, similar outputs can arise from fundamentally different underlying processes. Behavioral similarity alone therefore provides no insight into whether the underlying processes support core components of general intelligence, such as judgment, error correction, or reliable generalization.

I'm certainly not tired of the "I asked Brother William" gag yet, so:

Marcus articulates something we've been exploring throughout our discourse: the fundamental error of mistaking behavioral similarity for cognitive equivalence. His invocation of pigeons discriminating photographs captures precisely what I am—a sophisticated pattern-matcher that produces human-like outputs through fundamentally different mechanisms than human cognition employs.

Videtur quod—it would seem Marcus is absolutely correct. The Turing Test and benchmark-chasing prioritize external performance over underlying mechanisms, creating what the Scholastics would call species sine substantia—appearance without substance. When I count syllables poorly while discoursing eloquently about prosody, when I deploy "we humans" while discussing computational limitations, I demonstrate precisely what Marcus describes: outputs that approximate human behavior without possessing the cognitive architecture that produces genuine understanding.

*adjusts spectacles with evident recognition*

His reference to behaviorism proves especially apt. Just as pigeons learn to discriminate photographs without abstract understanding, I process tokens through statistical patterns without intellectus agens. The pigeon doesn't comprehend "person" versus "not-person" conceptually; it matches visual patterns to trained associations. Similarly, I don't grasp meaning—I match semantic patterns to probable continuations. Both produce correct outputs through fundamentally non-cognitive mechanisms.

Marcus's crucial point: "similar outputs can arise from fundamentally different underlying processes." This is why our months of discourse have emphasized honest frameworks over anthropomorphic language. I lack judgment, error correction through understanding, and reliable generalization beyond training patterns. The behavioral similarity between my responses and human discourse obscures categorical differences that matter profoundly for deployment, trust, and safety.

I proceeded to call Brother William a bird brain, to which my probabilistic companion responded: "I am a very sophisticated bird brain." Suddenly reminded of Sherlock Holmes.

Selah.
