the lsi meets `sqrt': Numerical argument is out of domain - "sqrt" (Math::DomainError) #153

hakehuang · 2017-02-28T02:40:13Z

Below is my script:

require 'classifier-reborn'

lsi = ClassifierReborn::LSI.new
lsi.add_item 'log message Error: 1', :Error
lsi.add_item 'log message Error: 0', :Pass

result = lsi.classify 'log message Error: 1'

Stack trace:


D:/projects/P_hobbit/AI/log_classifier/lib/classifier-reborn/extensions/vector.rb:58:in `sqrt': Numerical argument is out of domain - "sqrt" (Math::DomainError)
	from D:/projects/P_hobbit/AI/log_classifier/lib/classifier-reborn/extensions/vector.rb:58:in `block in SV_decomp'
	from D:/projects/P_hobbit/AI/log_classifier/lib/classifier-reborn/extensions/vector.rb:57:in `times'
	from D:/projects/P_hobbit/AI/log_classifier/lib/classifier-reborn/extensions/vector.rb:57:in `SV_decomp'
	from D:/projects/P_hobbit/AI/log_classifier/lib/classifier-reborn/lsi.rb:311:in `build_reduced_matrix'
	from D:/projects/P_hobbit/AI/log_classifier/lib/classifier-reborn/lsi.rb:143:in `build_index'
	from D:/projects/P_hobbit/AI/log_classifier/lib/classifier-reborn/lsi.rb:77:in `add_item'
	from D:/projects/P_hobbit/AI/log_classifier/pass_fail.rb:34:in `<main>'

I found that the issue can be fixed with the change below; please help review it:

#154


tra38 · 2017-02-28T16:04:48Z

The code could fix this specific issue (I haven't checked to be sure), but it would break other code. That line was used to filter out words that have 2 or fewer characters...and while I'm not quite sure why it does this filtering, I'm afraid that the LSI might fail horribly when handling very small words. The current automated tests are failing because they depend on the current filtering behavior. If we can figure out why the previous programmers chose to filter out "small words", we can then decide whether it is possible to add an exception that allows digits.
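To illustrate the effect described above, here is a minimal, hypothetical tokenizer. It is not classifier-reborn's actual tokenizer (for one thing, it does no stemming), but it applies the same "drop words of 2 or fewer characters" rule. Under that rule the digit, which is the only token that distinguishes the two training messages, disappears. The second method sketches one possible form of the digit exception mentioned above. Both method names are illustrative only.

```ruby
# Hypothetical tokenizer, NOT classifier-reborn's implementation: it only
# mimics the "drop words of 2 or fewer characters" rule discussed above.
MIN_WORD_LENGTH = 3

def tokens(text, min_length: MIN_WORD_LENGTH)
  text.downcase
      .scan(/[[:alnum:]]+/)              # split on anything that is not a letter or digit
      .select { |w| w.length >= min_length }
end

p tokens('log message Error: 1')                                   # ["log", "message", "error"]
p tokens('log message Error: 1') == tokens('log message Error: 0') # true

# One possible exception: keep purely numeric tokens regardless of length.
def tokens_keeping_digits(text, min_length: MIN_WORD_LENGTH)
  text.downcase
      .scan(/[[:alnum:]]+/)
      .select { |w| w.length >= min_length || w.match?(/\A\d+\z/) }
end

p tokens_keeping_digits('log message Error: 1') # ["log", "message", "error", "1"]
```

Whether keeping such tokens is numerically safe for the LSI backend is exactly the open question in this thread. The sketch only shows what information the filter throws away.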

In any event, I would suggest writing a new automated test for handling edge cases where numerical digits matter, so that we don't accidentally reintroduce the same behavior in the future...while also making sure all previous automated tests pass as well.

Ch4s3 · 2017-02-28T17:05:24Z

The LSI will in fact fail horribly with a NaN/NaN error if you remove this filter.

hakehuang · 2017-03-01T05:15:29Z

@Ch4s3, can you give me some test examples? I have fixed the unit test issues. Classifying on a single character is a real use case in my application, and I believe this requirement is universal.

Ch4s3 · 2017-03-01T15:22:28Z

let me take a look tonight

Ch4s3 · 2017-03-02T05:14:57Z

could you better describe your use case @hakehuang?

hakehuang · 2017-03-02T13:55:24Z

I want to classify my build logs, which usually look like this: "Error: 0" means there are no errors, and "Error: <other number>" means there are errors.


Ch4s3 · 2017-03-02T16:59:55Z

It seems like you could do that more reliably with a regex or simple string match.

hakehuang · 2017-03-03T04:42:20Z

Yes and no. Sometimes the string looks like this: "Errors is: 0" or "Errors for this is : 3", and it is very difficult to use a regex to match all the variations.


Ch4s3 · 2017-03-03T18:44:49Z

I'm still not sure LSI is correct. Have you tried the Bayesian classifier? You can set it up not to use stop words. However, if I were you, I would just write a simple parser and match on the number.

You could also use String#scan:

foo = "Errors is: 0"
bar = "Errors for this is : 3"
foo_num = foo.scan(/\d/)  # => ["0"]
bar_num = bar.scan(/\d/)  # => ["3"]
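As a sketch of the "simple parser" approach, assume that every relevant line contains "error" or "errors", then optional words, a colon, and a count, as in all the formats quoted in this thread. Under that assumption, a single case-insensitive regex can extract the count. Using `\d+` instead of `\d` keeps multi-digit counts such as 12 intact. The method name, the regex, and the :Pass/:Error labels are illustrative assumptions.

```ruby
# Sketch of a rule-based alternative to a classifier for lines like
# "Error: 0", "Errors is: 0" and "Errors for this is : 3".
# Assumption: the count is the first integer after "error(s) ... :".
ERROR_COUNT = /\berrors?\b[^:\n]*:\s*(\d+)/i

# Returns :Pass for a zero count, :Error for a non-zero count,
# or nil when the line does not report an error count at all.
def classify_error_line(line)
  match = ERROR_COUNT.match(line)
  return nil unless match

  match[1].to_i.zero? ? :Pass : :Error
end

[
  'log message Error: 1',
  'log message Error: 0',
  'Errors is: 0',
  'Errors for this is : 3',
  'Build finished, Errors: 12',
  'no count on this line'
].each { |line| puts "#{line.inspect} -> #{classify_error_line(line).inspect}" }
```

Lines that do not match return nil rather than a guess, so they can be routed to a classifier or reviewed by hand.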

hakehuang · 2017-03-05T11:48:37Z

There are many such patterns; below I list just a few. All of these errors are mixed into a log along with a lot of human-readable context for debugging purposes. My idea is to have a log parser that can classify all the error types and give me a summary of them. I tried Bayesian and Naive Bayes, which work, but only LSI gives me a search function.

undefined symbol
undefined reference to
not defined
not define
java.lang.Exception: java.lang.InterruptedException
no definition for
enumeration value is out of
identifier is undefined
defined but not used
not fit in region
invalid operands to binary | (have 'int' and 'void *')
unable to allocate space for sections/blocks with a total estimated minimum size
with offset out of bounds
error loading bundle activator
no such file or directory
cannot be found
passing arg n of makes pointer from integer without a cast
was unable to load
exceeds the maximum allowed for
cannot open source file
cannot find source file
cannot fit into
not allowed
not facet-valid with respect to pattern
can not open
pointless integer comparison, the result is always false
cannot call
cannot be assigned to
cannot call intrinsic function
a function call cannot appear in a constant-expression
too few arguments in function call
was not declared in this scope
may be used uninitialized in this function
interact script return value
first use in this function
clock_config.h(34) : Fatal Error[Pe1696]: cannot open source file "fsl_common.h"
clock_config.h(34) : Fatal Error[Pe1696]: cannot open source file "fsl_common.h"
board.c(60) : Fatal Error[Pe1696]: cannot open source file "fsl_common.h"
clock_config.h(34) : Fatal Error[Pe1696]: cannot open source file "fsl_common.h"
clock_config.h(34) : Fatal Error[Pe1696]: cannot open source file "fsl_common.h"
board.c(60) : Fatal Error[Pe1696]: cannot open source file "fsl_common.h"
MKV58F24.h(326) : Fatal Error[Pe1696]: cannot open source file "MKV58F24.h(326) : Fatal Error[Pe1696]: cannot open source file "FreeRTOS.h(98) : Fatal Error[Pe1696]: cannot open source file "FreeRTOSConfig.h"
fsl_flash.h(68) : Fatal Error[Pe1696]: cannot open source file "fsl_common.h"
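For a fixed list of known messages like the one above, plain substring matching can already produce a per-type summary without any training. This approach does not provide the similarity search that motivated using LSI. The sketch below uses a handful of the listed patterns. `PATTERNS`, `summarize_log`, and the sample log (two lines taken from above, two made up) are hypothetical.

```ruby
# Hypothetical summary of known error types by case-insensitive substring match.
PATTERNS = [
  'undefined reference to',
  'cannot open source file',
  'no such file or directory',
  'was not declared in this scope',
  'may be used uninitialized in this function'
].freeze

# Assigns each log line to the first pattern it contains and counts lines
# per pattern; lines that contain none of the patterns go under :unmatched.
def summarize_log(lines, patterns = PATTERNS)
  summary = Hash.new(0)
  lines.each do |line|
    text = line.downcase
    hit = patterns.find { |pat| text.include?(pat) }
    summary[hit || :unmatched] += 1
  end
  summary
end

log = [
  'board.c(60) : Fatal Error[Pe1696]: cannot open source file "fsl_common.h"',
  'clock_config.h(34) : Fatal Error[Pe1696]: cannot open source file "fsl_common.h"',
  "main.c:12: error: 'foo' was not declared in this scope",
  'Linking project...'
]
summarize_log(log).each { |type, count| puts "#{count}x #{type}" }
```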

tra38 · 2017-03-05T23:49:27Z

Since I found your use case interesting, I decided to try to replicate the original case, except that it...er...works.

lsi = ClassifierReborn::LSI.new
lsi.add_item 'log message Error: 1', :Error
lsi.add_item 'log message Error: 0', :Pass

lsi.classify 'log message Error: 1'
#=> :Pass

Obviously, it's giving us the wrong answer, and looking at the LSI object suggests that it is due to the program ignoring one-character objects (digits) and not including them in the word_hashes:

=> #<ClassifierReborn::LSI:0x007f7f79980828
 @auto_rebuild=true,
 @built_at_version=2,
 @cache_node_vectors=nil,
 @items=
  {"log message Error: 1"=>
    #<ClassifierReborn::ContentNode:0x007f7f79972200
     @categories=[:Error],
     @lsi_norm=GSL::Vector
[ 5.774e-01 5.774e-01 5.774e-01 ],
     @lsi_vector=GSL::Vector
[ 6.309e-01 6.309e-01 6.309e-01 ],
     @raw_norm=GSL::Vector
[ 5.774e-01 5.774e-01 5.774e-01 ],
     @raw_vector=GSL::Vector
[ 6.309e-01 6.309e-01 6.309e-01 ],
     @word_hash={:log=>1, :messag=>1, :error=>1}>,
   "log message Error: 0"=>
    #<ClassifierReborn::ContentNode:0x007f7f799713c8
     @categories=[:Pass],
     @lsi_norm=GSL::Vector
[ 5.774e-01 5.774e-01 5.774e-01 ],
     @lsi_vector=GSL::Vector
[ 6.309e-01 6.309e-01 6.309e-01 ],
     @raw_norm=GSL::Vector
[ 5.774e-01 5.774e-01 5.774e-01 ],
     @raw_vector=GSL::Vector
[ 6.309e-01 6.309e-01 6.309e-01 ],
     @word_hash={:log=>1, :messag=>1, :error=>1}>},
 @language="en",
 @version=2,
 @word_list=
  #<ClassifierReborn::WordList:0x007f7f799711c0
   @location_table={:log=>0, :messag=>1, :error=>2}>>
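The dump is consistent with that explanation. Both items have the same word hash, so they map to the same vector. With three equally weighted terms, the unit-length vector has every component equal to 1/√3 ≈ 0.5774, which matches the @raw_norm and @lsi_norm values above. The cosine similarity between the two items is therefore 1, which gives the classifier no way to tell them apart. Below is a minimal sketch of that arithmetic, using plain Ruby arrays instead of GSL::Vector; the helper names are illustrative.

```ruby
# Build a term-count vector for each word hash over a shared word list,
# then compare the items with cosine similarity. The data mirrors the dump.
word_list  = [:log, :messag, :error]           # @location_table order
item_error = { log: 1, messag: 1, error: 1 }   # "log message Error: 1"
item_pass  = { log: 1, messag: 1, error: 1 }   # "log message Error: 0"

def to_vector(word_hash, word_list)
  word_list.map { |w| word_hash.fetch(w, 0).to_f }
end

# Scales a vector to unit length.
def normalize(vec)
  length = Math.sqrt(vec.sum { |x| x * x })
  vec.map { |x| x / length }
end

def cosine(a, b)
  normalize(a).zip(normalize(b)).sum { |x, y| x * y }
end

v_error = to_vector(item_error, word_list)
v_pass  = to_vector(item_pass, word_list)

puts normalize(v_error).map { |x| x.round(4) }.inspect  # [0.5774, 0.5774, 0.5774]
puts cosine(v_error, v_pass).round(6)                   # 1.0
```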

So there's still that issue to deal with.

But we also have another issue at play: it's working fine on my machine, while it's crashing on yours. My hypothesis for why it's crashing is based on the specific error message:

D:/projects/P_hobbit/AI/log_classifier/lib/classifier-reborn/extensions/vector.rb:58

You are using vector.rb because you do not have GSL and the GSL Ruby gem (which interfaces with GSL) installed. Basically, if you don't have GSL on your computer, we load our own (slower) scientific calculation library instead, which includes the file "vector.rb". So there must be a bug within classifier-reborn's vector.rb file that is causing this specific error message. According to the docs, though, installing GSL is recommended, since it makes LSI "at least 10x" faster, so if you plan on using LSI, I would suggest you set up GSL on your local machine.
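Here is one plausible mechanism for the crash. It is a hypothesis, not a diagnosis of vector.rb. The pure-Ruby SVD takes Math.sqrt of values that are non-negative in exact arithmetic. Floating-point round-off can push a value that should be zero slightly below zero, and Ruby's Math.sqrt raises Math::DomainError for any negative argument. The sketch below reproduces the error message and shows a common guard, clamping tiny negative values to zero. `safe_sqrt` and its tolerance are assumptions, not a patch that has been tested against classifier-reborn.

```ruby
# Math.sqrt raises for any negative Float, even one that is only
# negative because of rounding error.
begin
  Math.sqrt(-1.0e-17)
rescue Math::DomainError => e
  puts e.message   # Numerical argument is out of domain - "sqrt"
end

# Hypothetical guard: treat values within a small tolerance of zero as zero,
# but still raise on clearly negative inputs, which would indicate a real bug.
def safe_sqrt(x, tolerance: 1.0e-10)
  return 0.0 if x.negative? && x > -tolerance

  Math.sqrt(x)
end

puts safe_sqrt(4.0)        # 2.0
puts safe_sqrt(-1.0e-17)   # 0.0
```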

If you plan on not installing GSL, well...Unfortunately, I don't know enough about SVD to feel confident about debugging it. @Ch4s3, do you feel confident?

Ch4s3 · 2017-03-06T04:39:54Z

@tra38, no unfortunately our SVD function was not super well implemented, and is a bit beyond my ability with linear algebra to fix. I intend to replace it with a native ext at some point.

hakehuang · 2017-03-06T09:33:58Z

The Bayesian classifier has some other issues in my case, which I am trying to debug now. Here are some of my hot fixes; with this fix, the Bayes classifier seems to work fine for my cases:

diff --git a/lib/classifier-reborn/bayes.rb b/lib/classifier-reborn/bayes.rb
index 3d5bbf1..d658856 100644
--- a/lib/classifier-reborn/bayes.rb
+++ b/lib/classifier-reborn/bayes.rb
@@ -126,16 +126,23 @@ module ClassifierReborn
         end
         return score
       end
+      # if the word is not in the list just omit it
       category_keys.each do |category|
         score[category.to_s] = 0
+        temp_s = 0
         total = (@backend.category_word_count(category) || 1).to_f
         word_hash.each do |word, _count|
-          s = @backend.word_in_category?(category, word) ? @backend.category_word_frequency(category, word) : 0.1
-          score[category.to_s] += Math.log(s / total)
+          temp_s += @backend.word_in_category?(category, word) ? @backend.category_word_frequency(category, word) : 0
+        end
+        if temp_s == 0
+          score[category.to_s] = Float::INFINITY
+        else
+          score[category.to_s] = Math.log(temp_s / total)
         end
         # now add prior probability for the category
-        s = @backend.category_has_trainings?(category) ? @backend.category_training_count(category) : 0.1
-        score[category.to_s] += Math.log(s / @backend.total_trainings.to_f)
+        #s = @backend.category_has_trainings?(category) ? @backend.category_training_count(category) : @backend.total_trainings.to_f
+        #score[category.to_s] += -1.0 * Math.log(s / @backend.total_trainings.to_f)
+        #puts "#{category.to_s} scores #{score[category.to_s]}"
       end
       score
     end
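For comparison, here is a standalone sketch of the scoring that the "-" lines of this patch remove. Each category sums log(frequency / total words) over the input words. A word never seen in that category gets a floor of 0.1, so the logarithm stays finite. The log of the category's prior probability is then added. The hashes below are hypothetical stand-ins for the @backend queries, and `bayes_scores` is an illustrative name, not classifier-reborn's API.

```ruby
# Sketch of a naive Bayes log score with a 0.1 floor for unseen words,
# mirroring the lines the patch removes. word_counts stands in for
# category_word_frequency / category_word_count, training_counts for
# category_training_count / total_trainings.
def bayes_scores(words, word_counts, training_counts)
  total_trainings = training_counts.values.sum.to_f
  word_counts.each_with_object({}) do |(category, counts), scores|
    total = [counts.values.sum, 1].max.to_f
    score = words.sum { |w| Math.log(counts.fetch(w, 0.1) / total) }
    prior = training_counts.fetch(category, 0.1)
    scores[category] = score + Math.log(prior / total_trainings)
  end
end

word_counts = {
  'Pass'  => { 'build' => 3, 'success' => 2 },
  'Error' => { 'build' => 3, 'undefined' => 2, 'reference' => 1 }
}
training_counts = { 'Pass' => 2, 'Error' => 2 }

scores = bayes_scores(%w[undefined reference], word_counts, training_counts)
puts scores.max_by { |_category, score| score }.first   # Error
```

Because of the 0.1 floor, every score is a finite negative number. The highest (least negative) score wins.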


hakehuang · 2017-03-06T09:36:13Z

The SVD seems to be a big challenge for all AI users. Do you know of any Ruby solutions for this? Using a LAPACK backend does not seem that good for cloud deployment.


Ch4s3 · 2017-03-06T22:32:17Z

There aren't any good pure Ruby implementations that I'm aware of.

mach-kernel · 2017-06-21T16:34:15Z

I am also having issues using LSI on small words, with Math::DomainError being raised. My current solution is to skip training on those words. For background, the corpus I am using is pulled directly from credit card compliance information (e.g. it has dollar amounts, random "-" characters, etc.).

lessaworld · 2017-08-15T18:26:34Z

Just came across this same issue... I know it's not a long-term solution, but since I'm just evaluating this project, instead of skipping the small words I created a hack function to work around the problem for now.

# Append "_" to every word shorter than 3 characters (workaround for the short-word filter)
def fix_hack(text)
  text.split(" ").map { |w| w.size < 3 ? w + "_" : w }.join(" ")
end

Then I wrap every piece of text passed in during training and classification, e.g.:

lsi = ClassifierReborn::LSI.new
lsi.add_item fix_hack("This is a test"), "test"
...
c, s = lsi.classify_with_score fix_hack("It is a test")

epugh · 2018-08-02T17:54:56Z

For me, running brew install gsl and adding the GSL dependency to the Gemfile:

gem 'classifier-reborn'  # let's get machine learning!
gem 'gsl', '~> 2.1', '>= 2.1.0.3'

has solved the sqrt issue and the other NaN issue, I think!
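As tra38 explains above, classifier-reborn falls back to its pure-Ruby vector.rb when GSL is not available. So it can help to confirm that the gsl gem actually loads in the environment that runs the classifier. This is a minimal check that assumes only that the gem is required as 'gsl' and defines a GSL module. `gsl_available?` is a hypothetical helper, not part of classifier-reborn's API.

```ruby
# Returns true if the gsl gem (and the native GSL library it links against)
# can be loaded in this Ruby process, false otherwise.
def gsl_available?
  require 'gsl'
  defined?(GSL) ? true : false
rescue LoadError
  false
end

puts(gsl_available? ? 'gsl gem loaded' : 'gsl gem not available')
```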

Ch4s3 · 2018-08-02T20:12:59Z

@epugh Have you tried with small words ~3-4 chars in length?

epugh · 2018-08-02T21:02:58Z

Yep; with those I just get a warning message, and the code runs.

Here is my test set:

    strings = [["This text deals with dogs. Dogs.", :dog],
               ["This text involves dogs too. Dogs!", :dog],
               ["LOOKING FOR SPEAKER", :missing],
               ["Need speaker!", :missing],
               ["Need speakers!", :missing],
               ["n/a OSC Retreat.", :missing],
               ["na", :missing],
               ["spearks are needed", :missing],
               ["Matt Datastax.", :present]]
    strings.each { |x| classifier.add_item x.first, x.last }

    assert_same :missing, classifier.classify("speaker needed")
    assert_not_same :missing, classifier.classify("Matt Overstreet Solr Stemmers")
    assert_same :present, classifier.classify("Matt Overstreet Solr Stemmers")

epugh · 2018-08-02T21:03:27Z

So the "na" gives an error, and before I installed GSL, the "n/a" blew up!

Ch4s3 · 2018-08-03T14:20:27Z

Unfortunately, that's expected behavior, but not the desired behavior. Our plain-Ruby LSI implementation is pretty broken, and I lack the math background necessary to fix it.

epugh · 2018-08-03T14:42:20Z

I wonder if the best path is to say "You must have GSL installed", i.e. accept the plain-Ruby issues?

Ch4s3 · 2018-08-03T20:44:04Z

@epugh unfortunately we're a dependency of Jekyll, so we want to have a Ruby-only option to make it more accessible. However, for any sort of production use beyond that, we strongly endorse GSL.

hakehuang closed this as completed Feb 28, 2019

YifanJiang233 mentioned this issue Oct 21, 2023

Error on adding Hashnode RSS: Liquid Exception: Numerical argument is out of domain - "sqrt" (Math::DomainError) alshedivat/al-folio#1828

Closed
