Fix sqrt issue when training data differs by only one character #154

Closed
wants to merge 2 commits into from


hakehuang

This is a fix for the following issue:

require 'classifier-reborn'

lsi = ClassifierReborn::LSI.new
lsi.add_item 'log message Error: 1', :Error
lsi.add_item 'log message Error: 0', :passenger_ship
result = lsi.classify 'log message Error: 1'

When the second add_item call rebuilds the index, Math.sqrt raises an error during the SVD operation:

D:/projects/P_hobbit/AI/log_classifier/lib/classifier-reborn/extensions/vector.rb:58:in `sqrt': Numerical argument is out of domain - "sqrt" (Math::DomainError)
	from D:/projects/P_hobbit/AI/log_classifier/lib/classifier-reborn/extensions/vector.rb:58:in `block in SV_decomp'
	from D:/projects/P_hobbit/AI/log_classifier/lib/classifier-reborn/extensions/vector.rb:57:in `times'
	from D:/projects/P_hobbit/AI/log_classifier/lib/classifier-reborn/extensions/vector.rb:57:in `SV_decomp'
	from D:/projects/P_hobbit/AI/log_classifier/lib/classifier-reborn/lsi.rb:311:in `build_reduced_matrix'
	from D:/projects/P_hobbit/AI/log_classifier/lib/classifier-reborn/lsi.rb:143:in `build_index'
	from D:/projects/P_hobbit/AI/log_classifier/lib/classifier-reborn/lsi.rb:77:in `add_item'
	from D:/projects/P_hobbit/AI/log_classifier/pass_fail.rb:34:in `<main>'
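For context, assuming SV_decomp follows the usual approach of diagonalizing a Gram matrix (A^T A or A A^T) with Jacobi rotations, the diagonal entries it passes to Math.sqrt (vector.rb:58 in the trace) are eigenvalues that are non-negative in exact arithmetic. Floating-point round-off on a nearly rank-deficient matrix can leave one slightly below zero, and Ruby's Math.sqrt then raises Math::DomainError. The sketch below reproduces that error and shows one possible guard that clamps tiny negatives to zero. The helper safe_singular_values and its 1e-10 tolerance are hypothetical; this is not the change proposed in this PR, which alters the token-length filter instead.

```ruby
# Hypothetical helper, not part of classifier-reborn: turn the diagonal of a
# rotated Gram matrix into singular values. In exact arithmetic every entry
# is >= 0, but round-off can leave tiny negative values.
def safe_singular_values(diagonal, tolerance = 1e-10)
  diagonal.map do |value|
    if value >= 0
      Math.sqrt(value)
    elsif value > -tolerance
      0.0 # treat round-off noise as an exact zero
    else
      raise ArgumentError, "diagonal entry #{value} is too negative to be round-off"
    end
  end
end

# Plain Math.sqrt on a tiny negative number fails exactly as in the trace.
begin
  Math.sqrt(-1e-17)
rescue Math::DomainError => e
  puts "Math.sqrt(-1e-17) raised #{e.class}"
end

p safe_singular_values([4.0, 0.0, -1e-17]) # => [2.0, 0.0, 0.0]
```

Clamping hides round-off but would also mask a genuinely broken decomposition, which is why the sketch still raises for values well below the tolerance.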

Signed-off-by: Hake Huang <hakehuang@gmail.com>

Signed-off-by: Hake Huang <hake.huang@nxp.com>
@@ -27,7 +27,7 @@ def clean_word_hash(str, language = 'en', enable_stemmer = true)
 def word_hash_for_words(words, language = 'en', enable_stemmer = true)
   d = Hash.new(0)
   words.each do |word|
-    next unless word.length > 2 && !STOPWORDS[language].include?(word)
+    next unless word.length > 0 && !STOPWORDS[language].include?(word)
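To see why this PR touches the length filter, the sketch below mimics the shape of the check in word_hash_for_words using a simplified tokenizer and a tiny stopword set; both are assumptions, not classifier-reborn's real tokenization or STOPWORDS table. Under the original `> 2` rule the two training strings from the report collapse to identical word counts, because the only tokens that distinguish them, "1" and "0", are one character long. Identical documents make the term-document matrix rank-deficient, which plausibly pushes the SVD into the failing sqrt. With `> 0` the digits survive, but so does every one- and two-character token, which is the reviewer's objection.

```ruby
require 'set'

# Hypothetical stand-in for STOPWORDS[language].
STOP = Set.new(%w[the and for]).freeze

# Simplified tokenizer (assumption): lowercase, strip punctuation, split on whitespace.
def tokens(str)
  str.downcase.gsub(/[^a-z0-9\s]/, '').split
end

# Same filter shape as word_hash_for_words, with the length threshold as a parameter.
def word_counts(str, min_exclusive)
  counts = Hash.new(0)
  tokens(str).each do |word|
    next unless word.length > min_exclusive && !STOP.include?(word)
    counts[word] += 1
  end
  counts
end

a = 'log message Error: 1'
b = 'log message Error: 0'

p word_counts(a, 2) == word_counts(b, 2) # => true: "1" and "0" are filtered out
p word_counts(a, 0) == word_counts(b, 0) # => false: the digits are kept
```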
Contributor

I am afraid this will start whitelisting one- and two-character tokens, which is not the current behavior. This would be better handled by the #131 proposal.

Member

I agree

Ch4s3 closed this on Apr 13, 2017