Incorrect similar words for words with 'vbz' pos #177

dhowe · 2022-05-21T07:16:33Z

For example, the lists returned for 'spreads' include many incorrect verb forms:

    let word = 'spreads', pos = 'vbz';

    let rhymes = RiTa.rhymes(word, { pos });
    let sounds = RiTa.soundsLike(word, { pos });
    let spells = RiTa.spellsLike(word, { pos });

KarlieZhao · 2022-06-01T09:08:49Z

I think this problem appears because some words in the dict have an incorrect pos, for example:

"computerized":["k-ah-m p-y-uw1 t-er ay-z-d","jj nn vb vbn"],
"discriminated":["d-ih s-k-r-ih1 m-ah n-ey t-ah-d","vbd jj nn vb"],
"expected":["ih-k s-p-eh1-k t-ah-d","vbn vbd jj vb"]

Words like 'computerized' are treated as base-form verbs (because their pos includes 'vb'), so when the target pos is 'vbz' the conjugator directly returns 'computerizeds'. The easiest fix might be just to correct the pos of these words in the dict?
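To illustrate the failure mode, here is a minimal sketch. It is not RiTa's actual conjugator: `dict`, `isBaseFormVerb` and `naiveThirdPerson` are hypothetical names, and the "append 's'" rule is a deliberate simplification. The entries are copied from the excerpt above.

```javascript
// Hypothetical miniature dict, same [phones, pos] layout as the excerpt above.
const dict = {
  "computerized": ["k-ah-m p-y-uw1 t-er ay-z-d", "jj nn vb vbn"],
  "expected": ["ih-k s-p-eh1-k t-ah-d", "vbn vbd jj vb"]
};

// Assumption: a word counts as a base-form verb if its pos list contains 'vb'.
function isBaseFormVerb(word, lexicon) {
  const entry = lexicon[word];
  return Boolean(entry) && entry[1].split(" ").includes("vb");
}

// Deliberately naive 'vbz' rule: append 's' to anything treated as a base form.
function naiveThirdPerson(word, lexicon) {
  return isBaseFormVerb(word, lexicon) ? word + "s" : null;
}

// With the stray 'vb' tag, the past participle is conjugated as if it were a base form.
const before = naiveThirdPerson("computerized", dict);
console.log(before); // computerizeds

// Same lookup after dropping the incorrect tags from the entry.
const fixed = { ...dict, "computerized": [dict["computerized"][0], "jj vbn"] };
const after = naiveThirdPerson("computerized", fixed);
console.log(after); // null
```

Once the entry no longer carries 'vb', the word is not picked up as a base form, so no bogus 'vbz' form is generated from it.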

KarlieZhao · 2022-06-01T12:24:27Z

#179 might be for the same reason

dhowe · 2022-06-01T16:11:55Z

good notice -- I wonder if we might be able to remove all the 'vbn' from the dictionary, since we can compute them from the base form

dhowe · 2022-06-04T13:52:56Z

So we have done this before with verb tenses (see earlier tickets from @cqx931 below). Once we find a pos that we want to remove from the dict, we need to find all the places in the code that would need updates to deal with that pos (soundsLike, spellsLike, search, pos, conjugate, hasWord, tag etc.), then add tests (which will fail), then add the code to handle these cases, then remove the words with a script... then re-try the tests and adjust until they pass...

See:
dhowe/RiTaV1#536
dhowe/RiTaV1#366
dhowe/RiTaV1#357
dhowe/RiTaV1#365

dhowe/RiTaJSv1#37
#80

KarlieZhao · 2022-06-08T09:29:43Z

Here's a list of verbs with incorrect pos in the current dict; the pos tags I think should be removed (-) or added (+) are in the trailing comments:

"beat": ["b-iy1-t", "vb jj nn vbd vbn vbp"], //-vbn
"become": ["b-ih k-ah1-m", "vb vbd vbn vbp"], //-vbd
"bit": ["b-ih1-t", "nn vbd vbn jj rb vb"], //-vb, -vbn
"bore": ["b-ao1-r", "vbd vbp jj nn vb"], //-vbd
"broke": ["b-r-ow1-k", "vbd vbn jj rb vb"], //-vb, -vbn
"build": ["b-ih1-l-d", "vb vbn vbp nn"], //-vbn
"called": ["k-ao1-l-d", "vbn vbd vb"], //-vb
"come": ["k-ah1-m", "vb vbd vbn vbp vbz jj"], //-vbd, -vbz
"committed": ["k-ah m-ih1 t-ah-d", "vbn jj vb vbd"], //-vb
"computerized": ["k-ah-m p-y-uw1 t-er ay-z-d", "jj nn vb vbn"], //-vb, -nn
"concerned": ["k-ah-n s-er1-n-d", "vbn jj vb vbd"], //-vb
"discriminated": ["d-ih s-k-r-ih1 m-ah n-ey t-ah-d", "vbd jj nn vb"], //-vb, -nn
"ended": ["eh1-n d-ah-d", "vbd jj vb vbn"], //-vb
"enter": ["eh1-n t-er", "vb vbn vbp"], //-vbn
"expected": ["ih-k s-p-eh1-k t-ah-d", "vbn vbd jj vb"], //-vb
"finished": ["f-ih1 n-ih-sh-t", "vbd jj vb vbn"], //-vb
"gained": ["g-ey1-n-d", "vbd vbn vb"], //-vb
"got": ["g-aa1-t", "vbd vbn vbp vb"], //-vb, -vbn
"have": ["hh-ae1-v", "vbp jj nn vb vbn"], //-vbn
"include": ["ih-n k-l-uw1-d", "vbp vbn vb"], //-vbn
"increased": ["ih-n k-r-iy1-s-t", "vbn jj vb vbd"], //-vb
"involved": ["ih-n v-aa1-l-v-d", "vbn vbd jj vb"], //-vb
"knit": ["n-ih1-t", "vbn jj nn vb"], //+vbd
"launched": ["l-ao1-n-ch-t", "vbn vbd vb"], //-vb
"lead": ["l-eh1-d", "vb vbn vbp jj nn"], //-vbn
"led": ["l-eh1-d", "vbn vbd vb"], //-vb
"lived": ["l-ay1-v-d", "vbd vbn vb"], //-vb
"outpaced": ["aw1-t p-ey-s-t", "vbd nn vb vbn vbp"], //-vb
"oversaw": ["ow1 v-er s-ao", "vbd vb"], //-vb
"oversold": ["ow1 v-er s-ow1-l-d", "vbn jj vb"], //-vb
"own": ["ow1-n", "jj vbn vbp vb"], //-vbn
"paled": ["p-ey1-l-d", "vbd vb vbn"], //-vb
"pay": ["p-ey1", "vb vbd vbp nn"], //-vbd
"plan": ["p-l-ae1-n", "nn vb vbn vbp"], //-vbn
"post": ["p-ow1-s-t", "nn in jj vb vbd vbp"], //-vbd
"prepaid": ["p-r-iy p-ey1-d", "jj vbn vb"], //-vb
"pressured": ["p-r-eh1 sh-er-d", "vbn jj nn vb vbd"], //-vb
"proliferated": ["p-r-ah l-ih1 f-er ey t-ih-d", "vbn vb vbd"], //-vb
"remade": ["r-iy m-ey1-d", "vbn nn vb"], //-vb, +vbd
"rent": ["r-eh1-n-t", "nn vb vbn vbp"], //-vbn
"reopened": ["r-iy ow1 p-ah-n-d", "vbd vbn vb"], //-vb
"reported": ["r-iy p-ao1-r t-ah-d", "vbd jj vb vbn vbp"], //-vb
"repurchase": ["r-iy p-er1 ch-ah-s", "nn vbd vbn jj vb"], //-vbd, -vbn
"resold": ["r-iy s-ow1-l-d", "vbn vbd vbp vb"], //-vb
"roast": ["r-ow1-s-t", "nn vb vbn"], //-vbn
"settled": ["s-eh1 t-ah-l-d", "vbd vbn jj vb"], //-vb
"spit": ["s-p-ih1-t", "vb nn vbd"], //+vbn
"started": ["s-t-aa1-r t-ah-d", "vbd jj vbn vb"], //-vb
"sublet": ["s-ah1 b-l-eh-t", "vb vbn"], //+vbd
"trouble": ["t-r-ah1 b-ah-l", "nn vbd vbp jj vb"], //-vbd
"wed": ["w-eh1-d", "vbn vb"], //+vbd
"were": ["w-er", "vbd vb"], //-vb
"weren't": ["w-er-ah-n-t", "vbd vb"], //-vb
"wet": ["w-eh1-t", "jj nn vbd vb vbp"], //+vbn

I suggest that the first step is to remove the 'vb' tags from words that are not in base form, which should fix the problem in this ticket. Then we can consider removing those verbs with only vb* tags and no other tags, as suggested in dhowe/RiTaV1#357.
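As a rough sketch of how the "-vb, +vbd" annotations in the list could be applied by a script: `applyTagEdits` is a hypothetical helper (not part of RiTa) and it assumes the pos field is a space-separated string, as in the entries above.

```javascript
// Hypothetical helper: apply edits such as "-vb, +vbd" to a space-separated pos string.
function applyTagEdits(pos, edits) {
  let tags = pos.split(" ");
  const ops = edits.split(",").map(e => e.trim()).filter(Boolean);
  for (const edit of ops) {
    const op = edit[0];
    const tag = edit.slice(1);
    if (op === "-") {
      // Remove the tag wherever it occurs.
      tags = tags.filter(t => t !== tag);
    } else if (op === "+") {
      // Append the tag only if it is not already present.
      if (!tags.includes(tag)) tags.push(tag);
    } else {
      throw new Error(`Unrecognized edit: ${edit}`);
    }
  }
  return tags.join(" ");
}

// Two entries from the list above, with their annotated edits.
console.log(applyTagEdits("vbn jj vb vbd", "-vb"));   // vbn jj vbd   ("concerned")
console.log(applyTagEdits("vbn nn vb", "-vb, +vbd")); // vbn nn vbd   ("remade")
```

Added tags are appended at the end, so the result may need reordering if the order of tags in the dict matters.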

For step 1, below are the corresponding tests to be added, taking 'concern' ('concerned') as an example:

//hasWord
expect(RiTa.hasWord("concerned")).to.be.true;
expect(RiTa.hasWord("concerneds")).to.be.false;
expect(RiTa.hasWord("concerneded")).to.be.false;

//pos
eql(RiTa.pos("concerned"), ["vbd"]);
eql(RiTa.pos("concerned", { simple: 1 }), ["v"]);

//search
expect(RiTa.search({ pos: "vb", limit: -1 }).includes("concerned")).to.be.false;
expect(RiTa.search({ pos: "vbn", limit: -1 }).includes("concerned")).to.be.true;
expect(RiTa.search('concern', { pos: "vbd", limit: -1 })).eql(['concerned']);
expect(RiTa.search('concern', { pos: "vbn", limit: -1 })).eql(['concerned']);

//conjugate
let opt = {
        number: RiTa.SINGULAR,
        person: RiTa.FIRST,
        tense: RiTa.PAST
};
expect(RiTa.conjugate("concern", opt)).eq("concerned");

//unconjugate
expect(RiTa.conjugator.unconjugate("concerned")).eq("concern");

//allTags
expect(RiTa.tagger.allTags("concerned")).eql(['vbd','jj','vbn']);

//tag
eq(RiTa.tagger.tag(["I", "am", "concerned", "about", "this", "."], { inline: true }), "I/prp am/vbp concerned/jj about/in this/dt .");

//soundsLike
expect(RiTa.soundsLike("concern", { pos: 'vb' }).includes("concerned")).to.be.false;

//spellsLike
expect(RiTa.spellsLike("concern", { pos: 'vb' }).includes("concerned")).to.be.false;

please let me know if any part of the list/tests has problems.

dhowe · 2022-06-08T12:28:06Z

This looks really good -- I think the ultimate goal is to only have 'vb' for each of the regular verbs (plus all needed forms for irregular verbs) and compute all the other forms when needed... But this is a great first step -- do you want to do a PR in ritajs to start?

KarlieZhao · 2022-06-09T02:02:10Z

yes, I'll make the tests pass and create a PR

dhowe · 2022-06-09T05:15:12Z

great -- also needs to handle:

RiTa.analyze('concerned')
RiTa.analyze('concerns')

dhowe · 2022-06-18T23:31:59Z

@KarlieZhao status ?

KarlieZhao · 2022-06-19T12:47:56Z

> @KarlieZhao status ?

The issue in this ticket should now be fixed. However, I think we can go ahead and try to remove the words with only vb* tags from the lexicon...
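As a starting point for that plan, here is a minimal sketch of how the candidate words could be listed. `verbOnlyWords` and `sample` are hypothetical; the sample entries are taken from the list above with the suggested edits already applied, and the real lexicon format is only assumed to match.

```javascript
// Hypothetical sample, using entries from the list above after the suggested edits.
const sample = {
  "called": ["k-ao1-l-d", "vbn vbd"],
  "concerned": ["k-ah-n s-er1-n-d", "vbn jj vbd"],
  "oversaw": ["ow1 v-er s-ao", "vbd"],
  "plan": ["p-l-ae1-n", "nn vb vbp"]
};

// Return the words whose every pos tag starts with 'vb' (i.e. only vb* tags).
function verbOnlyWords(lexicon) {
  return Object.keys(lexicon).filter(word =>
    lexicon[word][1].split(" ").every(tag => tag.startsWith("vb")));
}

console.log(verbOnlyWords(sample)); // [ 'called', 'oversaw' ]
```

This only produces candidates: the filter is literal, so it would also match base forms tagged only 'vb', and irregular forms such as 'oversaw' cannot be recomputed from regular rules, so the list would still need a second pass before anything is removed.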

dhowe · 2022-06-19T14:03:55Z

good - this will take some thought, so first come up with a plan... then we can discuss

dhowe assigned Real-John-Cheung May 21, 2022

dhowe assigned KarlieZhao and unassigned Real-John-Cheung May 30, 2022

KarlieZhao mentioned this issue Jun 3, 2022

fix incorrect similar words dhowe/ritajs-v2#220

Closed

dhowe mentioned this issue Jun 4, 2022

Check that we have all 6 verb tenses handled #89

Open

6 tasks

KarlieZhao mentioned this issue Jun 13, 2022

fix incorrect verb forms in dict dhowe/ritajs-v2#222

Merged

KarlieZhao mentioned this issue Jun 23, 2022

sync: incorrect vb* pos dhowe/rita4j#156

Merged
