fix sqrt issue when there is only one char various in traindata #154

Closed
wants to merge 2 commits into from
2 changes: 1 addition & 1 deletion lib/classifier-reborn/extensions/hasher.rb
@@ -27,7 +27,7 @@ def clean_word_hash(str, language = 'en', enable_stemmer = true)
 def word_hash_for_words(words, language = 'en', enable_stemmer = true)
   d = Hash.new(0)
   words.each do |word|
-    next unless word.length > 2 && !STOPWORDS[language].include?(word)
+    next unless word.length > 0 && !STOPWORDS[language].include?(word)
Contributor

I am afraid this will start whitelisting one- and two-character tokens, which is not the current behavior. This would be better handled by the #131 proposal.

Member

I agree

     if enable_stemmer
       d[word.stem.intern] += 1
     else
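
The review concern in this hunk can be illustrated with a minimal sketch. It is not the gem's code: `SAMPLE_STOPWORDS` and `passes?` are hypothetical stand-ins for `STOPWORDS[language]` and the `next unless` guard, and the token list is made up. It only shows which tokens start passing when the length check drops from `> 2` to `> 0`.

```ruby
require 'set'

# Hypothetical stopword set for illustration; not the gem's actual list.
SAMPLE_STOPWORDS = Set.new(%w[the and])

# Mirrors the shape of the guard in word_hash_for_words:
# keep a word only if it is longer than min_length and not a stopword.
def passes?(word, min_length, stopwords = SAMPLE_STOPWORDS)
  word.length > min_length && !stopwords.include?(word)
end

tokens = %w[x is of cool list the]

kept_now       = tokens.select { |t| passes?(t, 2) } # existing check (> 2)
kept_proposed  = tokens.select { |t| passes?(t, 0) } # proposed check (> 0)
newly_admitted = kept_proposed - kept_now            # => ["x", "is", "of"]
```

Under these assumptions, every non-empty token that is not an explicit stopword is counted after the change, which is the behavior shift the reviewer objects to.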
4 changes: 2 additions & 2 deletions test/bayes/bayesian_common_tests.rb
@@ -139,10 +139,10 @@ def test_skip_empty_training_and_classification
   classifier.train('Ruby', '')
   assert classifier.categories.empty?
   classifier.train('Ruby', 'To be or not to be')
-  assert classifier.categories.empty?
+  refute classifier.categories.empty?
   classifier.train('Ruby', 'A really sweet language')
   refute classifier.categories.empty?
-  assert_equal Float::INFINITY, classifier.classify_with_score('To be or not to be')[1]
+  assert_equal Float::INFINITY, classifier.classify_with_score('')[1]
 end

 def test_empty_string_stopwords
2 changes: 1 addition & 1 deletion test/extensions/hasher_test.rb
@@ -56,7 +56,7 @@ def test_add_custom_stopword_path
   temp_stopwords_name = File.basename(temp_stopwords.path)

   Hasher.add_custom_stopword_path(temp_stopwords_path)
-  hash = { list: 1, cool: 1 }
+  hash = {:is=>1, :a=>1, :list=>1, :of=>1, :cool=>1}
   assert_equal hash, Hasher.clean_word_hash("this is a list of cool words!", temp_stopwords_name)
 end
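
Why the expected hash in this test grows from two entries to five can be seen in a simplified sketch. It assumes the temporary stopword file contains only `this` and `words` (the diff does not show its contents), and `simple_word_hash` is a hypothetical stand-in that skips stemming; it is not `Hasher.clean_word_hash`.

```ruby
require 'set'

# Assumed contents of the temporary stopword file; not shown in the diff.
CUSTOM_STOPWORDS = Set.new(%w[this words])

# Simplified stand-in for the hashing step: strip punctuation, downcase,
# split on whitespace, then apply the length and stopword guard.
def simple_word_hash(str, min_length)
  counts = Hash.new(0)
  str.gsub(/[^\w\s]/, '').downcase.split.each do |word|
    next unless word.length > min_length && !CUSTOM_STOPWORDS.include?(word)
    counts[word.to_sym] += 1
  end
  counts
end

sentence = 'this is a list of cool words!'
before = simple_word_hash(sentence, 2) # => { list: 1, cool: 1 }
after  = simple_word_hash(sentence, 0) # => { is: 1, a: 1, list: 1, of: 1, cool: 1 }
```

With the `> 0` check, the short words `is`, `a`, and `of` are no longer dropped by length, so they appear in the result unless the stopword list names them explicitly.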
