Reducing time to first model #448
Comments
I suspect that there's going to be some work needed to get StatsModels to a point where that's possible. I think there's a lot of unnecessary specialization due to, ahem, overzealous use of type parameters and tuples to represent collections of terms, and coercing all tables to [...]
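To illustrate the point about tuples and specialization, here is a minimal sketch. The `ToyTerm`, `ToyContinuous`, and `ToyCategorical` types are hypothetical stand-ins invented for this example, not the actual StatsModels types. It only shows that each distinct tuple of terms has its own concrete type (so a method taking it may be compiled again for every new formula), while vectors with an abstract element type all share one type.

```julia
# Hypothetical stand-ins for formula terms; these are NOT StatsModels types.
abstract type ToyTerm end

struct ToyContinuous <: ToyTerm
    name::Symbol
end

struct ToyCategorical <: ToyTerm
    name::Symbol
    nlevels::Int
end

# Terms stored in tuples: the tuple type records the length and every element type,
# so two different formulas give two different concrete types.
t1 = (ToyContinuous(:X),)
t2 = (ToyContinuous(:X), ToyCategorical(:G, 3))
println(typeof(t1) == typeof(t2))   # false

# Terms stored in a vector with an abstract element type: every formula has the
# same container type, so code compiled for one formula can be reused for another.
v1 = ToyTerm[ToyContinuous(:X)]
v2 = ToyTerm[ToyContinuous(:X), ToyCategorical(:G, 3)]
println(typeof(v1) == typeof(v2))   # true
```

The vector version trades compile time for dynamic dispatch on each element at run time, which is the trade-off being debated in this thread.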
Just tried it. Still 10 seconds.
I think this is largely a type stability problem. There is a lot of type instability both here and in StatsModels.jl. Objects like [...] This comment in StatsModels - statsmodels.jl speaks to at least part of the problem:
Also, this is a better test, since DataFrames.jl uses a Dict for keys, which adds complications and compile time of its own:
using GLM  # loaded outside the timed block, so only the first model fit is measured
@time begin
    data = (X=[1.0,2.0,3.0], Y=[2.0,4.0,7.0])  # a NamedTuple of vectors acts as the table
    ols = lm(@formula(Y ~ X), data)
    show(stdout, "text/plain", ols)
end
This could clearly be type stable, as everything is known at compile time. The instabilities are introduced later by [...] I don't think any of that code will precompile. Additionally, [...]
Just moving [...] @kleinschmidt, I guess I'm suggesting the package could do with more specialisation, rather than less.
I think this is the key problem, as @kleinschmidt noted. Type instability is probably not an issue as stability is only useful for large datasets, while the one in the OP is super small. Anyway the most costly core operations are in GLM and are type-stable.
#339 will get rid of this. But I don't expect it to make a big difference.
I'm talking about type stability as a compilation cost, not a run-time problem. Everything is boxed and unstable here. This is slow to compile and won't be saved in precompilation. It's probably the main reason your precompile attempts don't work. In my experience using [...]
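As a rough illustration of what "unstable" means for the compiler, the sketch below asks inference for the return type of a column lookup on two toy containers. The data and the function names `col_from_namedtuple` and `col_from_dict` are made up for this example. It only demonstrates the inference difference; whether this accounts for the latency discussed in this issue is the commenter's hypothesis, not something the snippet proves.

```julia
# Toy data, shaped like the NamedTuple table used in the timing test in this thread.
nt = (X = [1.0, 2.0, 3.0], Y = [2.0, 4.0, 7.0])
d  = Dict{Symbol,Any}(:X => nt.X, :Y => nt.Y)

# Hypothetical accessors, defined only for this example.
col_from_namedtuple(t) = t.X   # the column type is part of typeof(t)
col_from_dict(t) = t[:X]       # Dict{Symbol,Any} only promises a value of type Any

# Ask the compiler which return type it can infer for each accessor.
nt_inferred = first(Base.return_types(col_from_namedtuple, (typeof(nt),)))
d_inferred  = first(Base.return_types(col_from_dict, (typeof(d),)))

println(nt_inferred)   # Vector{Float64}
println(d_inferred)    # Any
```

Code downstream of the `Any` result cannot be compiled for a concrete type ahead of time, so it is resolved by dynamic dispatch once the actual value is known. Interactively, `@code_warntype col_from_dict(d)` shows the same information.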
I've tried reducing latency ("time to first model") by enabling precompilation. Following the strategy adopted for DataFrames, I used SnoopCompile to extract all methods that are compiled when running the test suite:
Unfortunately, the result is disappointing even for examples that are in the tests:
This is probably due to two reasons:
[...] precompile directives directly in the session, I get slightly better timings (about 5s for the first example).
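For readers unfamiliar with them, a `precompile` directive has the shape sketched below. `toy_slope` is a hypothetical function used only for illustration, and the directives generated by SnoopCompile for this package are not reproduced here. `Base.precompile` takes a function and a tuple of argument types and returns `true` if it could compile a matching method.

```julia
# Hypothetical function standing in for a method one wants compiled ahead of time.
toy_slope(x, y) = sum(x .* y) / sum(abs2, x)   # least-squares slope through the origin

# Request compilation for one concrete signature.
ok = precompile(toy_slope, (Vector{Float64}, Vector{Float64}))
println(ok)   # true if compilation for this signature succeeded

x = [1.0, 2.0]
y = [2.0, 4.0]

# The first call should now find already-compiled code for this signature.
first_call = @elapsed toy_slope(x, y)
println("first call took ", first_call, " seconds")
```

Inside a package, such directives sit at module top level so that their result can be stored when the package is precompiled. A directive only helps for the exact signature it names, so calls reached through dynamic dispatch on other types still compile at first use.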