Who's on First?

Abbott and Costello (twm1340 CC Attribution-ShareAlike 2.0 Generic)
November 16th, 2012

TL;DR: Named entity detection can't be done perfectly, even in principle, but that doesn't stop us from trying.

"Who's on First?" is a classic stand-up routine by the comedy duo Abbott and Costello. If you're a "Who's on First?" virgin, I'd suggest you take a few minutes and rectify that.

Beyond its comedic twist, this routine is a wonderful illustration of how difficult named entity detection can be, even for humans. The ballplayers Who, What, I Don't Know, Why, and the rest all have names that, in the right context, lead to unresolvable ambiguities.

Take the title, "Who is on first". There are at least two different readings of it. Most naturally, one reads it as a question meaning "Which person is on first base?". However, it also admits a second reading, as a statement, once we realize that "Who" could be the name of a person. This may seem contrived, but surprisingly it's not.

For example, Thiers is the name of a town in France.

Thiers, France

Spoken aloud, sentences from the Beatitudes like "Blessed are the poor in spirit; Theirs is the kingdom of heaven" take on a totally new spin.

Also, ordinary words are often repurposed as named entities. For example, near where we're located there's a chain of hostels called Wombat.


So a sentence like "Wombats are in Germany" is true or false depending on whether "Wombat" refers to the animal or to the hostel chain. Thus something seemingly so simple, identifying the named entities in a sentence, is actually impossible to do reliably. The question is: what should be done?

Our philosophy on this matter owes a lot to Mao's "Let a hundred flowers bloom" campaign; however, we don't stomp on the flowers just after they've bloomed. What we do is track, in parallel, all possible interpretations of a word as a named entity, gathering evidence for and against each interpretation.

So, if we were asked "Wombats are in Germany?", Wombat would be interpreted both as a hostel chain and as an animal, and evidence would be gathered for each interpretation in parallel. It's a lot of work for our servers, but there's really no alternative. Language is slipperier than a greased pig!
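We don't describe here how our system actually represents or scores the competing readings, so what follows is only a toy sketch of the idea. It assumes a tiny hand-made sense inventory (`SENSES`) and hand-picked context cue words (`CUES`), both hypothetical. Every candidate sense of a word is kept alive and scored by how many of its cue words appear in the sentence; nothing is pruned. "In parallel" here just means side by side in one list, not concurrent execution.

```python
import re
from dataclasses import dataclass, field

# Hypothetical sense inventory: surface form -> candidate interpretations.
SENSES = {
    "wombat": ["animal", "hostel chain"],
    "who": ["question word", "person's name"],
}

# Hypothetical context cues: seeing a cue word in the sentence counts
# as one piece of evidence for the corresponding sense.
CUES = {
    "animal": {"marsupial", "zoo", "burrow", "australia"},
    "hostel chain": {"hostel", "germany", "berlin", "beds", "booking"},
    "question word": {"which", "what"},
    "person's name": {"player", "mr", "named"},
}


@dataclass
class Interpretation:
    sense: str
    score: float = 0.0
    evidence: list = field(default_factory=list)


def normalize(word):
    """Crude normalization: lowercase, and drop a plural 's' if that gives a known form."""
    w = word.lower()
    if w not in SENSES and w.endswith("s") and w[:-1] in SENSES:
        return w[:-1]
    return w


def interpret(word, sentence):
    """Return every candidate sense of `word`, each scored against `sentence`.

    No candidate is discarded; a low score only means little supporting
    evidence has been found so far.
    """
    tokens = set(re.findall(r"\w+", sentence.lower()))
    hypotheses = [Interpretation(sense) for sense in SENSES.get(normalize(word), [])]
    for h in hypotheses:
        # Each matching cue word adds one unit of evidence for this sense.
        for cue in sorted(CUES.get(h.sense, set()) & tokens):
            h.score += 1.0
            h.evidence.append(cue)
    # Rank by evidence, but return all hypotheses.
    return sorted(hypotheses, key=lambda h: h.score, reverse=True)


for h in interpret("Wombats", "Wombats are in Germany?"):
    print(h.sense, h.score, h.evidence)
```

For "Wombats are in Germany?", this ranks the hostel reading above the animal reading but keeps both, so later evidence could still reverse the order. A real system would need far richer evidence than single cue words.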


Chasing the Greased Pig by Richard Doyle
