Abstracting Syntactic Variations
A Million Ways to Say the Same Thing!
One of the best features of natural languages is that there are a million ways to say the same thing. Or, there are a ton of ways to indicate something similar. Or, you can say the same thing in many different ways. You get the point. Language allows us to express the same meaning (or semantics) using different syntactic variations. Humans love the variety, but machines get confused by it. Hence, it’s often advantageous to ‘abstract away’ the syntactic variations and represent only the meaning. Unfortunately, this is easier said than done!
Patterns for Abstracting Syntactic Variations
This section reviews many of the common syntactic variation problems and indicates how the software resolves the issues.
Pronouns
Pronouns are a convenient way to identify a person or thing that has previously been mentioned. For example, in:
- Bob ate pizza because he was hungry.
we know that “he” refers to Bob. In NLP, linking a pronoun to the item it refers to is known as “coreference resolution”. It’s called “coreference” because ‘Bob’ and ‘he’ both refer to the same person, so the two mentions are considered equivalent. Hence, you could rewrite the sentence as:
- Bob ate pizza because Bob was hungry.
Or, just note that:
‘Bob is equivalent to he’, or (Bob, equiv, he)
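The substitution above can be sketched in a few lines of Python. This is a minimal illustration, not the software’s implementation: it assumes a coreference resolver (not shown, and hypothetical here) has already grouped token positions into clusters that refer to the same entity, and it treats every mention as a single token.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def resolve_pronouns(
    tokens: List[str], clusters: List[List[int]]
) -> Tuple[List[str], List[Triple]]:
    """Replace later mentions in each cluster with the cluster's first mention.

    `clusters` is assumed to come from a (hypothetical) coreference resolver;
    each cluster lists token indices, with the antecedent first.
    """
    resolved = list(tokens)
    triples: List[Triple] = []
    for cluster in clusters:
        antecedent = tokens[cluster[0]]
        for idx in cluster[1:]:
            # Record the equivalence, then substitute the antecedent.
            triples.append((antecedent, "equiv", tokens[idx]))
            resolved[idx] = antecedent
    return resolved, triples


tokens = ["Bob", "ate", "pizza", "because", "he", "was", "hungry"]
resolved, triples = resolve_pronouns(tokens, clusters=[[0, 4]])
print(" ".join(resolved))  # Bob ate pizza because Bob was hungry
print(triples)             # [('Bob', 'equiv', 'he')]
```

Keeping both the rewritten tokens and the triple mirrors the two options in the text: rewrite the sentence, or just record the equivalence.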
Summary: Pronouns are mapped back to the item that they reference.
Synonyms
Synonyms are words that have an equivalent meaning. For example, ‘auto’, ‘automobile’ and ‘car’ are considered synonymous. Since our dictionary is built on WordNet, we leverage WordNet’s prebuilt ‘synsets’ (synonym sets), in which any member of a synset is considered interchangeable with any other member.
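A minimal sketch of synset-based normalization, assuming a tiny hand-built table in place of the real WordNet-backed dictionary. The table, the `car.n.01` ID (which only follows WordNet/NLTK naming style) and the helper names are illustrative. The sketch also assumes words are already sense-disambiguated, since in WordNet a single word can belong to several synsets.

```python
from typing import Dict, List

# Hypothetical miniature synset table standing in for WordNet.
SYNSETS: Dict[str, List[str]] = {
    "car.n.01": ["car", "auto", "automobile"],
}

# Invert the table so every member word points at its synset ID.
# Assumes each (already disambiguated) word belongs to exactly one synset.
WORD_TO_SYNSET: Dict[str, str] = {
    word: synset_id for synset_id, words in SYNSETS.items() for word in words
}


def normalize(word: str) -> str:
    """Return the synset ID for a known word, else the lowercased word."""
    return WORD_TO_SYNSET.get(word.lower(), word.lower())


def same_meaning(a: str, b: str) -> bool:
    """Two words are interchangeable if they normalize to the same synset."""
    return normalize(a) == normalize(b)


print(same_meaning("Auto", "car"))     # True
print(same_meaning("car", "bicycle"))  # False
```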
Summary: Synonyms are stored as ‘synsets’ and are 100% interchangeable.
Word Scales
Some words often seem synonymous or antonymous, but only to an extent. Consider the words ‘large’ and ‘gigantic’. Both describe something of substantial size or amount, but ‘gigantic’ implies a greater degree than ‘large’. Many adjectives and adverbs can easily be placed on such a word scale. By abstracting a word down to a scale, you can capture the semantic intent without worrying about the exact word. Each scale entry looks like a small formula of the form ‘word, Category = Level’, where the word is identified by a WordNet-style sense key and Level is one of {Absolute High, Very High, High, Medium, Low, Very Low, Absolute Low}. This allows us to reduce hundreds of words down to a single, weighted category.
Consider the following examples:
- old%3:00:01, Age = High
- aged%5:00:01:old:02, Age = High
- childlike%5:00:00:young:00, Age = Low
- fledgling%5:00:00:inexperienced:00, Age = Low
The Word Scale categories were created by Legendary AI and can be extended by users.
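The scale entries can be modeled as a lookup from sense key to a (category, level) pair. In this sketch the seven levels are encoded as integers from -3 to 3 so they can be compared; that numeric encoding and the helper name `scale_of` are assumptions, not part of the source. The four entries are the examples listed above.

```python
from enum import IntEnum
from typing import Dict, Tuple


class Level(IntEnum):
    """The seven scale levels; the integer values are an assumed encoding."""
    ABSOLUTE_LOW = -3
    VERY_LOW = -2
    LOW = -1
    MEDIUM = 0
    HIGH = 1
    VERY_HIGH = 2
    ABSOLUTE_HIGH = 3


# Entries copied from the examples in the text, keyed by sense key.
WORD_SCALES: Dict[str, Tuple[str, Level]] = {
    "old%3:00:01": ("Age", Level.HIGH),
    "aged%5:00:01:old:02": ("Age", Level.HIGH),
    "childlike%5:00:00:young:00": ("Age", Level.LOW),
    "fledgling%5:00:00:inexperienced:00": ("Age", Level.LOW),
}


def scale_of(sense_key: str) -> Tuple[str, Level]:
    """Look up the (category, level) pair for a sense key."""
    return WORD_SCALES[sense_key]


# 'old' and 'aged' collapse to the same weighted category.
print(scale_of("old%3:00:01") == scale_of("aged%5:00:01:old:02"))  # True
```

Because `Level` is an `IntEnum`, comparisons such as `Level.HIGH > Level.LOW` work directly, which is what makes the scale "weighted" rather than just a label.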
Summary: Word scales are categorically weighted descriptions of words and are used to reduce ‘word variety’.
Synonymous Paraphrases & Semantic Composition
In addition to single-word synonyms, it’s common for a combination of words to have the same meaning as a single word or as another combination of words. These ‘synonymous paraphrases’ often cross part-of-speech boundaries. Consider the following:
- “Bob arrived late.” vs.
- “Bob was tardy.”
In the first sentence, ‘arrived’ is the verb and ‘late’ is an adverb. In the second sentence, ‘was’ is a copular verb and ‘tardy’ is an adjective. Despite the different parts of speech, we’d like to be able to note that the two sentences have equivalent meanings.
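The source does not describe how the software detects synonymous paraphrases, so the following is only a hypothetical sketch: a paraphrase table maps lemma sequences to a shared meaning key (here the made-up label `TARDY`), and a greedy longest-match pass rewrites each sentence. It assumes the input has already been lemmatized, so ‘arrived’ is ‘arrive’ and ‘was’ is ‘be’.

```python
from typing import Dict, List, Tuple

# Hypothetical paraphrase table: lemma sequences that share one meaning.
PARAPHRASES: Dict[Tuple[str, ...], str] = {
    ("arrive", "late"): "TARDY",
    ("be", "tardy"): "TARDY",
}
MAX_LEN = max(len(pattern) for pattern in PARAPHRASES)


def canonicalize(lemmas: List[str]) -> List[str]:
    """Replace known paraphrases with their meaning key (greedy, longest first)."""
    out: List[str] = []
    i = 0
    while i < len(lemmas):
        # Try the longest pattern that still fits, down to a single lemma.
        for n in range(min(MAX_LEN, len(lemmas) - i), 0, -1):
            key = tuple(lemmas[i:i + n])
            if key in PARAPHRASES:
                out.append(PARAPHRASES[key])
                i += n
                break
        else:
            # No pattern starts here; keep the lemma unchanged.
            out.append(lemmas[i])
            i += 1
    return out


print(canonicalize(["Bob", "arrive", "late"]))  # ['Bob', 'TARDY']
print(canonicalize(["Bob", "be", "tardy"]))     # ['Bob', 'TARDY']
```

Both sentences reduce to the same canonical form even though one uses a verb plus adverb and the other a copula plus adjective.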
Passive and Active Voice
When people write and speak, they typically use the active voice, starting the sentence with the agent that is doing the action. For example:
- “Bob ate the pizza.” (the subject is ‘Bob’, who is the person doing the action)
However, you can restate this by moving ‘pizza’ into the subject slot:
- “The pizza was eaten by Bob.” (this is passive voice)
Although the two sentences have equivalent meanings, the syntax used to express them is quite different. To abstract away the difference, the software uses paraphrasing techniques to translate passive voice into active voice.
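A minimal sketch of the passive-to-active rewrite, assuming a parser has already identified each clause’s subject, verb lemma, object, voice and ‘by’ agent. The `Clause` type and `to_active` function are hypothetical. Agentless passives (“The pizza was eaten.”) are left unchanged here because there is no agent to promote to subject; how the real software handles them is not stated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Clause:
    subject: str
    verb: str                    # verb lemma, e.g. "eat"
    obj: Optional[str]
    passive: bool = False
    agent: Optional[str] = None  # the "by ..." phrase of a passive clause


def to_active(c: Clause) -> Clause:
    """Promote the agent to subject and demote the subject to object."""
    if not c.passive or c.agent is None:
        return c  # already active, or no agent to promote
    return Clause(subject=c.agent, verb=c.verb, obj=c.subject)


active = Clause(subject="Bob", verb="eat", obj="the pizza")
passive = Clause(subject="the pizza", verb="eat", obj=None,
                 passive=True, agent="Bob")
print(to_active(passive) == active)  # True
```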
Summary: The software automatically converts text in passive voice to active voice.
Numbers & Symbols (digits & symbols vs. words)
People often want to communicate quickly. For this reason, many common language units have a simple symbol that represents a more verbose word or set of words. Consider the following:
- ‘one’ vs. ‘1’
- ‘twenty’ vs. ‘20’
- ‘dollar’ vs. ‘$’
- ‘percent’ vs. ‘%’
The software treats each pair as equivalent. In fact, both forms are captured in the same synset, so they’re logically treated as equals. When language is recorded in a knowledge base, the verbose words are converted to symbols.
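One way to realize ‘same synset, symbol preferred’ is a table that maps each verbose word to its preferred symbol. This is a sketch with a hypothetical table standing in for the shared synsets; symbols also map to themselves, so normalizing text twice gives the same result.

```python
from typing import Dict, List

# Hypothetical table: verbose word -> preferred symbol (same synset).
PREFERRED_SYMBOL: Dict[str, str] = {
    "one": "1",
    "twenty": "20",
    "dollar": "$",
    "percent": "%",
}

# Symbols map to themselves so normalization is idempotent.
CANONICAL: Dict[str, str] = {
    **PREFERRED_SYMBOL,
    **{symbol: symbol for symbol in PREFERRED_SYMBOL.values()},
}


def to_symbol(token: str) -> str:
    """Return the preferred symbol for a known word; other tokens pass through."""
    return CANONICAL.get(token.lower(), token)


def normalize(tokens: List[str]) -> List[str]:
    return [to_symbol(t) for t in tokens]


print(normalize(["twenty", "percent"]))  # ['20', '%']
```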
Summary: Digits and symbols share a synset with their word forms and are equivalent. When displaying, symbols are preferred over words.
Abbreviations & Acronyms
Light Verbs vs. Heavy Verbs
Light verbs are verbs that don’t carry much ‘semantic content’. They are used with a direct object (noun) and put the burden of semantic meaning on the direct object. Consider the following:
- We took a ride. (light verb = took, which doesn’t tell you much)
- We had a sandwich. (light verb = had)
Our approach is to replace light verbs with heavy verbs. Revisiting the examples above:
- We rode. (heavy verb = rode)
- We ate a sandwich. (heavy verb = ate; this verb was captured in the dictionary)
For a full discussion on light verbs, see https://legendaryai.github.io/docs/light_verbs/
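A hypothetical sketch of light-verb replacement over lemmas. The table pairs a light verb and its object with a heavy verb, plus a flag for whether the object survives: ‘take a ride’ folds the object into ‘ride’, while ‘have a sandwich’ keeps it. The table and function names are illustrative, not taken from the software’s dictionary.

```python
from typing import Dict, Optional, Tuple

# Hypothetical table: (light verb, object noun) -> (heavy verb, keep object?)
LIGHT_VERB_TABLE: Dict[Tuple[str, str], Tuple[str, bool]] = {
    ("take", "ride"): ("ride", False),    # "took a ride" -> "rode"
    ("have", "sandwich"): ("eat", True),  # "had a sandwich" -> "ate a sandwich"
}


def to_heavy(verb: str, obj: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (verb, object) with a known light verb replaced by its heavy verb."""
    if obj is None:
        return verb, obj
    entry = LIGHT_VERB_TABLE.get((verb, obj))
    if entry is None:
        return verb, obj  # not a known light-verb construction
    heavy, keep_obj = entry
    return heavy, (obj if keep_obj else None)


print(to_heavy("take", "ride"))      # ('ride', None)
print(to_heavy("have", "sandwich"))  # ('eat', 'sandwich')
```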
Summary: Light verbs are switched to heavy verbs.
Prepositional Disambiguation & Argument Types
Sentences generally take the format of ‘subject verb object’, followed by a number of phrases that bring extra information. Prepositions are the bridge to this extra content. They are words like ‘with’, ‘in’, ‘above’, ‘for’, ‘of’ and so on. These words take on different meanings based on the context in which they’re used. Each preposition has several word senses, yet, oddly, the senses of many different prepositions map back to a single semantic concept. We call these concepts the ‘argument types’.
By using argument types, we are able to abstract away the specific preposition that is used and instead deliver a semantic category for it. In doing so, the prepositions are fully disambiguated.
For a full discussion on the argument types, see: https://legendaryai.github.io/docs/argument_types/
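To illustrate the idea, here is a toy preposition disambiguator. The argument-type labels (`LOCATION`, `INSTRUMENT`, `ACCOMPANIMENT`) and the object classes are hypothetical placeholders, not the software’s actual inventory, and the sketch assumes the object’s semantic class is already known. It shows both effects described above: one preposition (‘with’) can map to different types, and different prepositions (‘in’, ‘at’) can map to the same type.

```python
from typing import Dict, Tuple

# Hypothetical rules: (preposition, semantic class of its object) -> argument type.
RULES: Dict[Tuple[str, str], str] = {
    ("with", "person"): "ACCOMPANIMENT",  # "ate pizza with Alice"
    ("with", "tool"): "INSTRUMENT",       # "cut the pizza with a knife"
    ("in", "place"): "LOCATION",          # "ate pizza in the kitchen"
    ("at", "place"): "LOCATION",          # "ate pizza at the cafe"
}


def argument_type(preposition: str, object_class: str) -> str:
    """Map a preposition in context to an argument type, or UNKNOWN."""
    return RULES.get((preposition.lower(), object_class), "UNKNOWN")


print(argument_type("with", "person"))  # ACCOMPANIMENT
print(argument_type("with", "tool"))    # INSTRUMENT
print(argument_type("at", "place"))     # LOCATION
```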
Summary: Argument types are a set of semantic categories used to replace/represent prepositions.
Spray / Load Alternation
Existentials & Dummy “It”
(“there is…”)