Grounding Is a Governance Problem, Not a Data Problem

Working on an AGI architecture. The most original part: grounding is not a data problem. It is a governance problem.

1990: Year Harnad first formalized the symbol grounding problem, asking how symbols acquire meaning rather than just relational structure (Harnad, The Symbol Grounding Problem, Physica D, 1990)

I have been working on an architecture for AGI and the part that keeps pulling my attention is the grounding problem.

The standard framing is: symbols are meaningless without connection to the world, and the way you connect them is through sensory data. Feed the system enough images, enough audio, enough text paired with observation, and the symbols will acquire grounded meaning. This is the implicit assumption behind most current deep learning work.

I think this framing is wrong in a specific and important way.

More data tells you how symbols co-occur. It tells you that 'apple' appears near 'red', 'round', 'fruit'. It does not tell you what an observation is. When a sensor returns a reading, the question of what that reading means is not settled by more readings. It is settled by someone deciding.

The architecture I am working on treats grounding as a governance problem:

  • who has the authority to classify an observation
  • how disputes about classification are resolved
  • what happens when two observers disagree about what they are looking at
  • how new observational categories get introduced and ratified

This is not a machine learning question. It is a political and institutional question. Every stable grounding system in human history (scientific taxonomy, legal definitions, medical diagnosis) is held together not by data but by an institution with the authority to say 'this is what this means.'
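To make the four questions above concrete, here is a minimal sketch of one way to encode them. Everything in it is an assumption for illustration: the `GroundingAuthority` class, its method names, and especially the majority-vote dispute rule are hypothetical, not a description of the architecture itself. It targets Python 3.9+.

```python
from dataclasses import dataclass, field


@dataclass
class GroundingAuthority:
    """Hypothetical institution that owns the categories of one domain."""
    name: str
    classifiers: set[str]                               # who may classify
    categories: set[str] = field(default_factory=set)   # ratified categories

    def ratify(self, category: str) -> None:
        # A new category exists only once the authority ratifies it.
        self.categories.add(category)

    def classify(self, observer: str, category: str) -> str:
        # Only authorized observers may apply ratified categories.
        if observer not in self.classifiers:
            raise PermissionError(f"{observer} has no authority in {self.name}")
        if category not in self.categories:
            raise ValueError(f"{category!r} is not ratified by {self.name}")
        return category

    def resolve(self, claims: dict[str, str]) -> str:
        # Assumed dispute rule: majority among authorized classifiers.
        # A tie is reported as unresolved instead of being guessed.
        votes: dict[str, int] = {}
        for observer, category in claims.items():
            label = self.classify(observer, category)
            votes[label] = votes.get(label, 0) + 1
        ranked = sorted(votes.items(), key=lambda kv: kv[1], reverse=True)
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            raise RuntimeError("dispute unresolved: escalate to a higher process")
        return ranked[0][0]


# Usage: two observers disagree, a third breaks the tie.
taxonomy = GroundingAuthority("taxonomy-board", classifiers={"alice", "bob", "carol"})
taxonomy.ratify("species-A")
taxonomy.ratify("species-B")
decision = taxonomy.resolve(
    {"alice": "species-A", "bob": "species-B", "carol": "species-A"}
)
```

The point of the sketch is that none of these methods look at the observation itself. Who may classify, which categories exist, and how a disagreement ends are all properties of the institution.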

The interesting thing about AGI is that we are trying to build a system that operates across domains, which means it has to navigate many different grounding authorities at once. Science says one thing. Law says another. Common usage says a third. The system has to know not just what the symbol means but whose definition applies in this context.

Treating grounding as a data problem misses this entirely. You can train on all the scientific literature and still not know that, in a legal context, the word 'intent' means something different from what it means in psychology. The word is the same. The authority is different.
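The 'intent' example can be sketched as a lookup keyed by both the symbol and the governing context. This is a hypothetical illustration: the `DEFINITIONS` table and `resolve_symbol` function are invented names, and the definition strings are deliberate placeholders, not real legal or psychological definitions.

```python
# Hypothetical registry: the same symbol maps to a different definition
# depending on which authority governs the context.
DEFINITIONS = {
    ("intent", "law"): "<definition ratified by the legal authority>",
    ("intent", "psychology"): "<definition ratified by the psychology authority>",
}


def resolve_symbol(symbol: str, context: str) -> str:
    """Return the definition of `symbol` under the authority for `context`."""
    try:
        return DEFINITIONS[(symbol, context)]
    except KeyError:
        # No silent fallback to another domain's definition.
        raise LookupError(
            f"no authority defines {symbol!r} in {context!r}"
        ) from None


legal = resolve_symbol("intent", "law")
psych = resolve_symbol("intent", "psychology")
```

The design choice worth noticing is the missing fallback: asking for 'intent' in a context with no registered authority fails loudly instead of borrowing the nearest definition.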

I do not have a full solution. What I have is a design principle: every observation in the system should carry a tag for the institution or process that defined it, and reasoning across observational domains should be explicit about when it is crossing grounding boundaries.
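A minimal sketch of that design principle, under stated assumptions: `Observation`, `GroundingBoundaryError`, and `combine` are hypothetical names, and representing a boundary crossing as a free-text justification is just one possible choice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Observation:
    value: str
    authority: str  # institution or process that defined this observation


class GroundingBoundaryError(Exception):
    """Raised when reasoning mixes authorities without saying so."""


def combine(a: Observation, b: Observation,
            crossing: Optional[str] = None) -> dict:
    """Pair two observations as premises for joint reasoning.

    If they come from different authorities, the caller must supply a
    `crossing` justification so the boundary crossing is on record.
    """
    if a.authority != b.authority and crossing is None:
        raise GroundingBoundaryError(
            f"crossing from {a.authority!r} to {b.authority!r} is not declared"
        )
    return {"premises": (a, b), "crossing": crossing}


# Usage: the same word, tagged with two different authorities.
statute = Observation("the act was intentional", authority="law")
report = Observation("the subject reported an intention", authority="psychology")
joined = combine(statute, report, crossing="mapping reviewed by a domain expert")
```

Calling `combine(statute, report)` without the `crossing` argument raises `GroundingBoundaryError`, which is the whole idea: the crossing is allowed, but never implicit.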

Myth: Grounding is a data problem: give the model enough sensory data and symbols will acquire meaning automatically.
Reality: More data tells you which symbols co-occur. It does not tell you who has the right to define what an observation means. That is a governance question, and no dataset answers it.
Reference: Harnad, The Symbol Grounding Problem, Physica D, 1990

When designing any system that interprets observations, write down explicitly who has the authority to define what each observation means. If you cannot name a person or institution, the grounding is implicit and will break under disagreement.
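One way to apply this check mechanically is to keep an inventory of observation types and flag any without a named authority. The sketch below is hypothetical throughout: the inventory entries and the `implicit_groundings` helper are invented for illustration.

```python
from typing import Optional

# Hypothetical inventory: observation type -> named authority, or None.
OBSERVATION_AUTHORITIES: dict[str, Optional[str]] = {
    "diagnosis_code": "hospital coding committee",
    "species_label": "taxonomy board",
    "sentiment_score": None,  # nobody named: the grounding is implicit
}


def implicit_groundings(inventory: dict[str, Optional[str]]) -> list[str]:
    """List observation types with no named person or institution."""
    return sorted(name for name, authority in inventory.items() if not authority)


missing = implicit_groundings(OBSERVATION_AUTHORITIES)
print(missing)  # ['sentiment_score']
```

Anything this returns is a place where the system will have no one to ask when two readings of the same observation disagree.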

Discussion

When your system interprets an observation, do you know whose definition of that observation it is using?

Marc L. (Paris, France)

This is close to what Wittgenstein was getting at with language games. Meaning is use, and use is socially governed. The question is whether a machine can participate in a language game or only simulate participation.

Julien Reszka (Paris, France)

Yes. And I think the answer depends on whether it can be a party to a governance structure, not just a user of one. Participation requires stakes.

Sara K. (Berlin, Germany)

The legal vs psychology meaning of 'intent' example is the clearest version of this I have seen. That one sentence explains why cross-domain AI reasoning keeps failing in ways that more data does not fix.

Thomas R. (London, UK)

Pushback: governance structures are themselves built on data. Scientific taxonomy is grounded in observations that the community agrees are reliable. You cannot fully separate the two.

Sara K. (Berlin, Germany)

The governance structure decides which observations count as reliable. That is exactly the point. The data does not decide its own reliability. Someone does.

