Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite the SVD #30

Open
Ch4s3 opened this issue Dec 3, 2014 · 31 comments
Open

Rewrite the SVD #30

Ch4s3 opened this issue Dec 3, 2014 · 31 comments

Comments

@Ch4s3
Copy link
Member

Ch4s3 commented Dec 3, 2014

As discussed in #27 we need to rewrite the SVD method here. This could also be used as an opportunity to remove the monkey patch on matrix, and provide a method like svd(matrix). This will clear up #5 as well.

Here's a resource I found on SVD. I'll definitely need help on this one.

@Ch4s3
Copy link
Member Author

Ch4s3 commented Jun 14, 2015

@parkr The more I read about this, the more I think a pure Ruby implementation is just going to always give us poor performance. It might make sense to bundle a C solution into the gem since there a lot of great existing solutions. Then just check to make sure it builds correctly on various platforms.

I'm not married to this approach, but it could be the right way forward. Thoughts?

@parkr
Copy link
Member

parkr commented Jun 14, 2015

That seems like a good plan.

@Ch4s3
Copy link
Member Author

Ch4s3 commented Jun 14, 2015

Ok. I'll devise a plan tonight and scout out implementations. It'll be good to wrap this one up.

@parkr
Copy link
Member

parkr commented Jun 15, 2015

Sounds terrific. Thanks, Chase!

@Ch4s3
Copy link
Member Author

Ch4s3 commented Jun 15, 2015

Not a problem. I want to wrap up a few things in a row and get a release out in the next week(ish).

@Ch4s3
Copy link
Member Author

Ch4s3 commented Jun 16, 2015

I figured out how to build and compile C-extensions, and just need to settle on an implementation that has a good license. @parkr do you have a strong OSS license preference?

*edit this implementation looks nice, and I've seen the author's name on some papers. The license is GNU.

@parkr
Copy link
Member

parkr commented Jun 16, 2015

As long as the license is compatible and GitHub can use it for commercial purposes, it's fine. I usually lean for the MIT. Can MIT projects contain GNU code? I'm not sure.

@Ch4s3
Copy link
Member Author

Ch4s3 commented Jun 16, 2015

I'll read up

@Ch4s3
Copy link
Member Author

Ch4s3 commented Jul 23, 2015

Ok, I've tried few things and have it sort of working, but it segfaults on rare occasions. Still a wip.

@parkr
Copy link
Member

parkr commented Jul 23, 2015

Ok, I've tried few things and have it sort of working, but it segfaults on rare occasions. Still a wip.

Sweet! When is it segfaulting? Is the C still optional?

@Ch4s3
Copy link
Member Author

Ch4s3 commented Jul 23, 2015

It segfaults while transforming the matrix for some inputs. It could be optional.

@jayniz
Copy link

jayniz commented Aug 19, 2015

Any news here? :)

@Ch4s3
Copy link
Member Author

Ch4s3 commented Aug 20, 2015

@jayniz I had to take a break from this for a bit, but I'll try to get back to it soon. Basically I found 1 or 2 promising implementations, but they both segfault for some input. I'm not sure why yet so I haven't gotten past that. I'll do my best to figure it out soon, but I'm open to people helping out.

@jayniz
Copy link

jayniz commented Aug 21, 2015

Hey @Ch4s3 no problemo :-) The bayes classifier is working and goes into production today.

I played around with LSI locally, and I got the above errors for my inputs - not with the GLS lib and gem though, that worked. But where training the bayes with 6k inputs took ~8 sec, LSI took a couple of minutes for 600 inputs (and with 6k inputs it's still running after 30h) on a recent 13"mbp with 16G ram.

In my benchmarks the bayes classifier performed quite well to detect comment spam though - trained with 6k comments I had it classify ~80k comments and it classified correctly in 94.5% of the cases, with 4.75% false negatives and 0.75% false positives.

Let's see how it behaves in production!

@Ch4s3
Copy link
Member Author

Ch4s3 commented Aug 21, 2015

@jayniz Awesome to hear that the bayes classifier is working! I'll keep working on the SVD.

@Ch4s3
Copy link
Member Author

Ch4s3 commented Nov 4, 2015

@jayniz If you can provide me with sample data that worked with the GLS but broke with the ruby implementation, that would be super helpful!

@jayniz
Copy link

jayniz commented Nov 4, 2015

@Ch4s3 not straight away, I'd have to run it on some data to see if it crashes first :-) if you let me know which implementation I should try to crash, I can try to find some time this week to make it crash and then send you the data that crashed it?

@Ch4s3
Copy link
Member Author

Ch4s3 commented Nov 4, 2015

@jayniz Awesome, see if you can blow up the native ruby implementation. I found an LGPL implementation of the svd in C and I'm writing a wrapper for a c extension, but it won't be done straight away. Any input that crashes other implementations will be good for testing.

@jayniz
Copy link

jayniz commented Nov 4, 2015 via email

@Ch4s3
Copy link
Member Author

Ch4s3 commented Nov 4, 2015

ahh sorry, the LSI feature uses the SVD under the hood, and the ruby svd is what makes it slow vs using GSL. See this line.

@jayniz
Copy link

jayniz commented Nov 4, 2015 via email

@Ch4s3
Copy link
Member Author

Ch4s3 commented Nov 4, 2015

No worries. I probably won't have the c extension done by the weekend anyway.

@Ch4s3
Copy link
Member Author

Ch4s3 commented May 28, 2016

So after a few tries, I can't get the c extension right. If anyone else wants to try, I can push up what I tried. Maybe it would make sense to use Helix to wrap a rust SVD lib. That way we could distribute the binary and have some minimal guarantees about safety.

@Ch4s3
Copy link
Member Author

Ch4s3 commented Jan 3, 2017

May have found a candidate rust lib here and some workable demo code here... It might actually work

This is the direction I'm thinking of going

@danielpclark
Copy link

danielpclark commented Jan 15, 2017

@Ch4s3 Hey, I love that you brought Rust in on this. I'd like to provide some resources that may be helpful in getting this done. I'm the author of faster_path which rewrites Ruby's Pathname library in Rust for improving the performance of Rails at 30%+

I see you're using Thermite. That's next on my agenda for faster_path as it will allow binary builds of the Rust build of the dynamic library to be served from a host and thereby not require the Ruby users to have Rust installed. My current focus on my project is to have my test suite prove cross-platform Rust compilation works since Mac OS & a few Linux distros skip the Rust library build process.

But as for the helpful resources, the code base for faster_path has plenty of working Rust code running under Ruby. The next two things are an article I wrote Coming to Rust from Ruby and my documentation during my first week learning Rust Getting started in Rust.

Also when looking up Rust methods the internet is actually a more difficult way to find answers; the best answers to be found are directly from documentation provided in the Dash app. Searching for methods in Dash has been the quickest and most accurate tool for the job. An alternative to Dash is Zeal which is the open source version of it.

@Ch4s3
Copy link
Member Author

Ch4s3 commented Jul 31, 2017

@danielpclark Have you moved to thermite yet? I'm looking to pick this back up.

@danielpclark
Copy link

@Ch4s3 No… The author updated his PR to be current with the code base but the CI tests results are intermittent/flaky.

@Ch4s3
Copy link
Member Author

Ch4s3 commented Jul 31, 2017

Yeah, I saw that. I need to get back to the Rust book and try this again.

@eftikharEmad
Copy link

Hello,
can you check my test case
I have strings = [] with 100 element

lsi = ClassifierReborn::LSI.new
strings.each do |x| 
  lsi.add_item x[0], x[1]
end

but after specific number of added Items I am getting comparison of Float with NaN failed
Once I remove the last element from classifier I do the process again and give same error on another string.

I am not sure Why this error raised for string rather than other.
Some strings I got this error Math::DomainError: Numerical argument is out of domain - "sqrt"

Any solution ??

@danielpclark
Copy link

danielpclark commented Mar 29, 2018

@Ch4s3 I've got Thermite integrated now. I've submitted a bunch of PRs to ruru that give you most of Ruby's native features (they've been sitting unnoticed for a while) and I've merged them all into my own fork https://github.com/danielpclark/ruru/tree/playground if you want to try them out. I'm building FasterPath directly from it so that repo branch is here to stay.

I have pretty much mastered Rust to Ruby integration. Ruru doesn't support splat operators for parameters yet so I wrote code that's a little more bare metal here if you'd like to see it working with any number of parameters of input. Other than that the rest of how to do it should be somewhat easy to see from FasterPath.

@Ch4s3
Copy link
Member Author

Ch4s3 commented Apr 16, 2018

@danielpclark I'll take a look as soon as I can, but it probably won't be until early Summer. Thanks for giving me the heads up!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants