An experimental, potentially multilingual, highly configurable part-of-speech tagger.
This is still somewhat experimental.. Verb identification, in particular, is still limited. Please monitor the issuer tracker for progress on feature development.
There is a class per part of speech. All parts of speech inherit from the word
class:
Member | Type | Description |
---|---|---|
partOfSpeech |
string | one of the eight parts of speech |
word |
string | the word (token) |
type |
string | a high-level classification for this partOfSpeech |
types |
array | all possible types for this partOfSpeech |
Noun
Pronoun
Verb
Adjective
Adverb
Conjunction
Interjection
Preposition
These may all have additional members.
There is a dictionary class per part-of-speech. All dictionaries inherit from the Dictionary
class.
Any rules about how words can vary (inflections) are stored in the Dictionary
for that part of speech (e.g. NounDictionary
has rules about how to recognize plurals and possessives);
This is so that Isidore could have potential to scale into other languages without requiring massive rewrites.
Member | Type | Description |
---|---|---|
list |
array | sorted list of words |
language |
string | Language for the dictionary |
findWord() |
method | finds a word in the dictionary |
partOfSpeech |
string | the part of speech that dictionary has |
GrammarModel |
class | The type that the dictionary contains |
Member | Type | Description |
---|---|---|
list |
array | Nouns with types and wordCategories |
language |
string | two-letter abbreviation of language |
inflections |
object | inflections that can be applied to all nouns |
findWord(word) |
method | Searches for word in dictionary (returns Noun if successful) |
getInflections(word) |
method | word (string ), returns all possible inflections for the word |
guessInflection(word) |
method | (string ), returns a single inflection (object ) |
removeInflection(word, inflection) |
method | word(string ), inflection (object ). returns string |
All languages within the utility.
Right now we just have En
, which is a Language
;
Member | Type | Description |
---|---|---|
grammarDictionaries |
Object | NounDictionary , PronounDictionary , VerbDictionary , AdjectiveDictionary , AdverbDictionary , ConjunctionDictionary , InterjectionDictionary , PrepositionDictionary |
language |
string | two-letter description of language |
findWord() |
method | accepts a string, returns an array containing partOfSpeech or a word if no word is found |
The sentence is where the parsing magic starts.
Member | Type | Description |
---|---|---|
text |
string | raw text of the sentence |
type |
string | declarative, interrogative, imperative, exclamatory |
language |
string | Language (En ) is default |
rawWordList |
array | only the words in the sentence |
wordList |
array | each word in the sentence classified as either a or word |
types |
array | the possible types that a sentence could have |
getSentenceType() |
method | returns string, the type of sentence. Guesses what the sentence type is based on punctuation |
const { Sentence } = isidore
const mySentence = new Sentence('He gives her a car.');
const { wordList } = mySentence;
console.log(mySentence);
/*
Sentence {
text: 'He gives him a car.',
language: 'En',
rawWordList: [ 'he', 'gives', 'him', 'a', 'car' ],
wordList:
[
Pronoun {
partOfSpeech: 'pronoun',
word: 'he',
referent: 'animate',
gender: 'masculine',
type: 'subject',
person: 3,
quantity: 'singular'
},
Verb {
partOfSpeech: 'verb',
word: 'give',
type: 'transitive',
valence: 2
},
Pronoun {
partOfSpeech: 'pronoun',
word: 'him',
referent: 'animate',
gender: 'masculine',
type: 'object',
person: 3,
quantity: 'singular',
},
Adjective {
partOfSpeech: 'adjective',
word: 'a',
type: 'article',
degree: undefined
},
Noun {
partOfSpeech: 'noun',
word: 'car',
type: 'entityClass',
subType: 'common',
inflection: undefined
}
],
type: 'declarative'
}
*/