Entity Types
What are Entity Types?
Entity types are a way of describing the things and concepts in the world by using categories.
In most NLP solutions, entity types are described via a small set of categories, like:
- Person, Location, Organization, Number, Money, Percent, Time, Date
Although these are good descriptions, they are limited in what they can describe. Named Entity Recognizers (NER) typically focus on only 3 to 10 categories; hence, the granularity of the categories is considered “coarse grained”.
Legendary AI uses two kinds of entity types:
- fine grained: these describe the common nouns (car, skunk, pepper, etc.)
- core types: these are used for entities that often have multiple elements (names, addresses, dates, times, etc.)
Fine Grain Entity Types
Fine grained entity types provide significantly greater detail on what an entity is. Legendary uses the WordNet hierarchy as the typing system for all common words. For example, a “giraffe” is described as:
giraffe
- ruminant
-- even-toed ungulate
--- ungulate
---- placental
----- mammal
------ vertebrate
------- chordate
-------- animal
--------- organism
---------- living thing
----------- whole
------------ object
------------- physical entity
-------------- entity
The hierarchy provides additional flexibility. In the previous example, we see that a giraffe ‘is-a’ ruminant, but we can also see that it is a mammal, an animal, an object, etc.
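As a minimal sketch of how such a hierarchy can be queried, the Python snippet below hardcodes the giraffe chain from the listing above and checks ‘is-a’ relationships against it. The names GIRAFFE_CHAIN, is_a, and depth_of are hypothetical and are not part of any Legendary AI API; a real system would presumably look the chain up in WordNet rather than hardcode it.

```python
# Hypothetical illustration: the giraffe chain copied from the listing above,
# ordered from the most specific type to the most general one.
GIRAFFE_CHAIN = [
    "giraffe", "ruminant", "even-toed ungulate", "ungulate", "placental",
    "mammal", "vertebrate", "chordate", "animal", "organism",
    "living thing", "whole", "object", "physical entity", "entity",
]


def is_a(chain: list[str], type_name: str) -> bool:
    """Return True if type_name appears anywhere in the chain."""
    return type_name in chain


def depth_of(chain: list[str], type_name: str) -> int:
    """Return how many steps above the entity the given type sits."""
    return chain.index(type_name)


giraffe_is_mammal = is_a(GIRAFFE_CHAIN, "mammal")
giraffe_is_plant = is_a(GIRAFFE_CHAIN, "plant")
mammal_depth = depth_of(GIRAFFE_CHAIN, "mammal")
```

Because every ancestor is available, a caller can match at whatever level of detail it needs, for example treating anything that is-a “animal” the same way regardless of species.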
Since each word can have multiple meanings (or senses), the typing system uses fully qualified sense keys to disambiguate the words. A “giraffe” would be fully qualified as giraffe%1:05:00, and the other types are also fully qualified, such as animal%1:03:00. See the Dictionary for more information on senses and sense keys.
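To make the key format concrete, here is a small Python sketch that splits a sense key into its parts. It assumes the three-field form shown above (lemma%ss_type:lex_filenum:lex_id); full WordNet sense keys carry two further fields (head_word and head_id), which this sketch simply ignores. The names SenseKey and parse_sense_key are hypothetical and not part of any Legendary AI API.

```python
from typing import NamedTuple

# WordNet ss_type codes, the first numeric field of a sense key.
SS_TYPES = {
    1: "noun",
    2: "verb",
    3: "adjective",
    4: "adverb",
    5: "adjective satellite",
}


class SenseKey(NamedTuple):
    lemma: str
    pos: str
    lex_filenum: int
    lex_id: int


def parse_sense_key(key: str) -> SenseKey:
    """Split 'lemma%ss_type:lex_filenum:lex_id' into named fields."""
    lemma, sep, rest = key.partition("%")
    fields = rest.split(":")
    if not lemma or not sep or len(fields) < 3:
        raise ValueError(f"not a sense key: {key!r}")
    # Only the first three numeric fields are read; extras are ignored.
    ss_type, lex_filenum, lex_id = (int(f) for f in fields[:3])
    return SenseKey(lemma, SS_TYPES.get(ss_type, "unknown"), lex_filenum, lex_id)


giraffe_key = parse_sense_key("giraffe%1:05:00")
```

In WordNet, lexicographer file 05 is noun.animal and 03 is noun.Tops, which is why giraffe and animal carry different middle fields.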
Core Types
Many nouns have multiple elements or fields that make up the whole. For example, “Mr. Bob Smith” has three fields: “Mr.” (an honorific), “Bob” (a first name), and “Smith” (a last name). These fields are broken out for convenience.
Person
A person is an instance-of person%1:03:00, or one of its children. The name is then subdivided:
"type": "PersonProper", "PersonProper": { "firstName": "Bob", "lastName": "Smith", "fullName": "Mr. Bob Smith", "honorific": "Mr." }
Location / Address
A physical address is divided into the following elements:
"type": "Address", "Address": { "streetAddress": "Congress Avenue", "city": "Springfield", "houseNumber": "1201", "postCode": "62704", "fullDescr": "1201 Congress Avenue , Springfield , Illinois , 62704", "state": "Illinois" }
Date
A date is divided into the following elements:
"type": "Date", "Date": { "monthWord": "January", "dayDigits": "2", "dayOfWeek": "Saturday", "year4Digits": "1999", "monthDigits": "1" }
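As a hedged illustration of consuming these fields, the sketch below converts a Date entity into a Python datetime.date. It assumes the fragment above arrives wrapped in a JSON object and that year4Digits, monthDigits, and dayDigits are all populated, which the source does not guarantee. The function name to_date is hypothetical.

```python
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def to_date(entity: dict) -> date:
    """Build a date from the numeric fields of a Date entity."""
    fields = entity["Date"]
    return date(
        int(fields["year4Digits"]),
        int(fields["monthDigits"]),
        int(fields["dayDigits"]),
    )


example = {
    "type": "Date",
    "Date": {
        "monthWord": "January",
        "dayDigits": "2",
        "dayOfWeek": "Saturday",
        "year4Digits": "1999",
        "monthDigits": "1",
    },
}

parsed = to_date(example)
# Cross-check the reported dayOfWeek against the calendar.
weekday_matches = WEEKDAYS[parsed.weekday()] == example["Date"]["dayOfWeek"]
```

Working from the numeric fields rather than monthWord avoids any dependence on language or spelling of month names.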
Time
A time label is divided into the following elements:
"type": "Time", "Time": { "fullDescr": "4:40 AM EST", "hour": "4", "minute": "40", "second": "", "amOrPm": "AM", "timeZone": "EST" }
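The example shows that a field such as second can be an empty string. A minimal sketch of turning a Time entity into a 24-hour datetime.time follows; it treats empty minute and second fields as zero and ignores timeZone, since abbreviations like EST do not map cleanly onto a fixed offset. The function name to_time is hypothetical, and the handling of empty fields is an assumption rather than documented behavior.

```python
from datetime import time


def to_time(entity: dict) -> time:
    """Convert a Time entity with a 12-hour clock into a 24-hour time."""
    fields = entity["Time"]
    hour = int(fields["hour"])
    # Assumption: an empty string means the field was not stated.
    minute = int(fields.get("minute") or 0)
    second = int(fields.get("second") or 0)
    marker = (fields.get("amOrPm") or "").upper()
    if marker == "PM" and hour < 12:
        hour += 12          # 1 PM..11 PM -> 13..23
    elif marker == "AM" and hour == 12:
        hour = 0            # 12 AM is midnight
    return time(hour, minute, second)


example = {
    "type": "Time",
    "Time": {
        "fullDescr": "4:40 AM EST",
        "hour": "4",
        "minute": "40",
        "second": "",
        "amOrPm": "AM",
        "timeZone": "EST",
    },
}

parsed = to_time(example)
```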
Duration or Age
Duration or Age is an expression of a time period. It typically answers the question, “How long did X last?”
"type": "DurationOrAge", "DurationOrAge": { "fullDescr": "9 years, 8 months, 1 week, 2 days, 3 hours, 4 minutes and 22 seconds", "years": "9", "months": "8", "weeks": "1", "days": "2", "hours": "3", "minutes": "4", "seconds": "22" }
Email
An email address is identified as:
"type": "Email", "phraseDescr": { "phrase": "Bob@example.com", "tokens": [ 4 ] }
URL
A web URL is identified as:
"type": "Url", "phraseDescr": { "phrase": "http://www.example.com", "tokens": [ 4 ] }
Organization
An organization is an instance-of organization%1:14:00, or one of its children. It is not broken down into a multi-field description.