Term candidate extraction
jerrygao edited this page Sep 23, 2017
Some assumptions must be made in order to identify multi-word terms. Although no single definition holds across all domains, the following commonly accepted assumptions [HuiZhong 1986] can be used:
- "Multi-word terms are mainly nominals";
- "Multi-word terms cannot go across punctuation marks";
- "Verbs may be terms by themselves but not part of a multi-word term because of 1." (here "1." refers to the first assumption, that multi-word terms are mainly nominals);
- "Function words should be excluded with the exception of prepositions, because prepositions may be part of multi-word terms (e.g. 'speed OF propagation', 'revolution PER minute', etc.)";
- "Adverbs may be part of a multi-word term (e.g. 'VERY high frequency', 'POSITIVELY charged ions', etc.), but adverbs for text cohesion (e.g. 'subsequently', 'naturally', 'usually', etc.) should be excluded";
- "No multi-word terms can end up with an adjective or adverb";
- "S-endings should be removed for the purpose of frequency counting".
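
The assumptions above can be read as a set of filters over part-of-speech-tagged text. The following is only a minimal sketch of that reading, not the implementation used by this project. The coarse tag names (`NOUN`, `ADJ`, `ADV`, `ADP` for prepositions, and so on), the `COHESION_ADVERBS` set, and all function names are hypothetical. Trimming prepositions at the edges of a candidate is an extra assumption that the list does not state, and the s-ending removal is deliberately crude.

```python
from collections import Counter

# Hypothetical coarse tag set; a real tagger's tags would need mapping onto it.
ALLOWED_TAGS = {"NOUN", "ADJ", "ADV", "ADP"}  # ADP = preposition
# Text-cohesion adverbs: only the examples named in the list above.
COHESION_ADVERBS = {"subsequently", "naturally", "usually"}
# A candidate may not end with an adjective or adverb; excluding a final
# preposition as well is an added assumption of this sketch.
NOT_FINAL_TAGS = {"ADJ", "ADV", "ADP"}


def strip_s(word):
    """Crude s-ending removal, used only to build the counting key."""
    w = word.lower()
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w


def candidate_chunks(tagged):
    """Split the token stream at punctuation, verbs, excluded function
    words, and cohesion adverbs; yield the runs in between."""
    chunk = []
    for word, tag in tagged:
        if tag in ALLOWED_TAGS and word.lower() not in COHESION_ADVERBS:
            chunk.append((word, tag))
        else:
            if chunk:
                yield chunk
            chunk = []
    if chunk:
        yield chunk


def trim(chunk):
    """Drop trailing adjectives/adverbs/prepositions and leading prepositions."""
    while chunk and chunk[-1][1] in NOT_FINAL_TAGS:
        chunk = chunk[:-1]
    while chunk and chunk[0][1] == "ADP":
        chunk = chunk[1:]
    return chunk


def extract_candidates(tagged):
    """Count multi-word term candidates, keyed by their s-stripped form."""
    counts = Counter()
    for chunk in candidate_chunks(tagged):
        chunk = trim(chunk)
        if len(chunk) >= 2:  # keep multi-word candidates only
            counts[" ".join(strip_s(w) for w, _ in chunk)] += 1
    return counts


# Hand-tagged sample sentence (the tags are illustrative, not tagger output).
sample = [
    ("Subsequently", "ADV"), (",", "PUNCT"), ("the", "DET"),
    ("positively", "ADV"), ("charged", "ADJ"), ("ions", "NOUN"),
    ("reach", "VERB"), ("the", "DET"), ("speed", "NOUN"),
    ("of", "ADP"), ("propagation", "NOUN"), (".", "PUNCT"),
]

if __name__ == "__main__":
    for term, freq in extract_candidates(sample).items():
        print(freq, term)
```

Breaking the stream at every disallowed token keeps candidates from crossing punctuation marks or verbs, while the trimming step and the minimum length of two words are applied per chunk afterwards.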
- input reader
- term candidate extraction, and pre-filters
<TO_BE_CONTINUED>
[HuiZhong 1986] Y. Huizhong, “A New Technique for Identifying Scientific / Technical Terms and Describing Science Texts,” 1986.