[OTHER] Question: AR coefficients, sklearn lags understanding #65
Hello, thanks for the question. I think it's a good sign that we could extend the parts of the documentation which don't cover this.
All models have a model attribute which allows accessing the original estimator. In the case of sklearn the situation is a little bit more complicated, because the actual fit happens only in the predict method (the length of X passed to predict is not known at the time of fitting). This is an implementation detail, but the consequence is that you can access coef_ only after calling the predict method for get_sklearn_wrapper:

    model_lr = get_sklearn_wrapper(LinearRegression, lags=10)
    _ = model_lr.fit(X[:-10], y[:-10]).predict(X[-10:])
    model_lr.model.coef_
|
Thank you for the swift answer. It works. Very interesting library!
I have another simple question, if you don't mind.
I'm specifically interested in the possibility of using sklearn-API regressors as autoregressive models for time-series prediction, hence I need to understand the hcrystalball data-parsing model for time series well.
My question refers to the section 'Autoregressive Modelling in Sklearn' (https://hcrystalball.readthedocs.io/en/stable/examples/tutorial/wrappers/02_ar_modelling_in_sklearn.html) and to the figure there (there must be two errors in it: x2 is skipped everywhere, and an x95 should be an x96 - second green line from the top, far right).
If you have, say, 100 datapoints, horizon 1, lags 3, and e is the iid error, is it correct to say that, after calling get_sklearn_wrapper(LinearRegression), the y series is first organized as y3 = f(y0, y1, y2) + e, y4 = f(y1, y2, y3) + e, and so on up to y99 = f(y96, y97, y98) + e, and that this matrix is then passed to sklearn.linear_model.LinearRegression?
Thank you.
|
Hi @lucheroni, You are right, there should be x1, x2, x3, ... and later on x95, x96, x97; x96, x97, x98; etc. Thanks for spotting it. And yes, in this case y0, y1, y2 are passed as X to the LinearRegression and y3 as y. Or one could also formulate it relative to y as follows: yt = f(yt-1, yt-2, yt-3) + e for t = 3, ..., 99.
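For concreteness, here is a minimal sketch of how such a lag matrix could be assembled by hand with pandas on toy data. It is purely illustrative (the variable names and the random series are made up) and is not hcrystalball's internal code, which builds the equivalent matrix for you from the lags parameter.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# toy series: 100 datapoints y0 ... y99
y = pd.Series(np.random.default_rng(0).normal(size=100))

lags = 3
# column "lag_k" holds the series shifted by k steps, i.e. y_{t-k} in row t
lagged = pd.concat({f"lag_{k}": y.shift(k) for k in range(1, lags + 1)}, axis=1)

design = lagged.dropna()        # rows t = 3 ... 99
target = y.loc[design.index]    # y3 ... y99, one target per row

# the first row pairs (y2, y1, y0) with y3, matching y3 = f(y0, y1, y2) + e
print(LinearRegression().fit(design, target).coef_)
```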
|
@lucheroni, @pavelkrizek I updated the picture, but don't want to close it in case there are still some open questions. Can I close it as is? :) |
I asked because I'm trying to check the in-sample predictions,
|
It's all right. The formula to predict an AR(2) variable is yt = c + a1*yt-1 + a2*yt-2 + et. You might have got confused because the x's go back in time, so the array is 'reversed'. I used this code to show the differences:

    for i in range(1, 8):
        print("model_lry.predict(X[-" + str(i) + ":])")
        print(model_lry.model.coef_)
        print(model_lry.predict(X[-i:]))

There you'll also see that the coefficients change. That is indeed another source of confusion, as they are not labeled with the changing xt-1, xt-2. Hope that was what you wanted to understand. |
Thank you for the clarification. My point is related exactly to what you embedded in the code. It seems that the library re-fits the model each time one calls .predict(), and it does it in a different way each time. This is at least what I get when I run the code fragment above - changing coefficients. Usually, on the contrary, once you have prepared your design matrix, you fit the model once and the coefficients stay the same whatever you do with the model after fitting. It seems to me that the library registers model_lry.model.coef_ in the namespace only after calling .predict(). Am I wrong on that? |
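For contrast, the fit-once workflow described in the previous comment can be sketched with plain scikit-learn on a hand-built lag matrix. This is an illustrative sketch on synthetic data, not hcrystalball code: once fit() has been called, the coefficients stay fixed no matter how often predict() is called.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# synthetic series and a hand-built AR(3) design matrix
y = pd.Series(np.sin(np.arange(100) / 5.0))
lags = 3
X_lagged = pd.concat({f"lag_{k}": y.shift(k) for k in range(1, lags + 1)}, axis=1).dropna()
y_target = y.loc[X_lagged.index]

# fit once ...
ols = LinearRegression().fit(X_lagged, y_target)
coef_before = ols.coef_.copy()

# ... then predict on any slice, as often as you like
ols.predict(X_lagged.tail(10))
ols.predict(X_lagged.head(5))

# the coefficients are unchanged by predict()
assert np.allclose(coef_before, ols.coef_)
```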
Finally found the problem. |
@ambader Could you please supply an example that reproduces the problem you are facing? Thank you. |
It's still the same problem this whole thread is about:

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LinearRegression
    from hcrystalball.wrappers import get_sklearn_wrapper

    ar_ord = 2
    hz = 10
    train_me = np.array([0., 1., 0., 1., 0., 1., 0., 1., 0., 1.])
    train_me = np.tile(train_me, 100)
    y = train_me
    date_index = pd.date_range(start="2020-01-01", periods=len(y), freq="d")
    X = pd.DataFrame(index=date_index)

    # the wrapper is re-created and re-fit in every iteration;
    # only the length of the slice passed to predict changes with i
    for i in range(1, 8):
        model_lry = get_sklearn_wrapper(LinearRegression, lags=ar_ord, fit_intercept=False, optimize_for_horizon=False)
        print("model_lry.predict(X[-9:-" + str(i) + "])")
        model_lry.fit(X[:-11], y[:-11])
        print(model_lry.predict(X[-9:-i]))
        print(model_lry.model.coef_)
|
OK, this behavior is expected by the implementation, as the model is a function of the training data, which are defined by the horizon. There are also other ways to implement the regressor as a time-series forecaster, each with its advantages and drawbacks. PRs for new implementations are welcome :) We also initially had the horizon explicitly passed to the |
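To illustrate why the coefficients can depend on the horizon, here is a conceptual sketch only, not hcrystalball's actual internals: in a direct-style formulation the regressors are shifted by the horizon, so the training matrix, and hence the fitted coefficients, change with the horizon. The helper direct_ar_coefs and the sine series are made up for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def direct_ar_coefs(y, lags, horizon):
    # regress y_t on (y_{t-horizon}, ..., y_{t-horizon-lags+1}),
    # i.e. the target sits `horizon` steps past the newest regressor
    lagged = pd.concat(
        {f"lag_{k}": y.shift(horizon + k - 1) for k in range(1, lags + 1)}, axis=1
    ).dropna()
    return LinearRegression().fit(lagged, y.loc[lagged.index]).coef_

y = pd.Series(np.sin(np.arange(200) / 7.0))
for h in (1, 3, 10):
    print(h, direct_ar_coefs(y, lags=2, horizon=h))  # different coefficients per horizon
```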
I hope you'll be able to implement the method in a different way, if you want to push the scikit-compatibility feature of your library towards being used by the time-series community.
As it is, the method is not compatible with the way econometricians work: in econometrics, estimation is made on a given interval and must be separated from prediction/inference on other intervals (which may or may not include the original estimation interval); take statsmodels as the classic example.
Imagine an estimation which requires a very long time (days), after which you need to predict your data on different time segments, at will. You save the fitted model and then you explore its capabilities. You don't want to re-estimate it at each try.
The advantage that I thought I could find in hcrystalball was that it could be a nice interface for doing quick econometrics with regularization (not easy in Python, whereas scikit includes ridge and lasso for static problems) or with static machine-learning methods extended to time series. That could alleviate the need of writing such interfaces oneself, as is usually done.
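As a point of reference for the fit-once workflow mentioned above, here is a minimal statsmodels sketch on synthetic data (the series and parameters are arbitrary): the model is estimated once and can then be queried for predictions on any interval without re-estimation.

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

y = np.sin(np.arange(300) / 5.0) + np.random.default_rng(0).normal(scale=0.1, size=300)

# estimate once on a fixed sample ...
res = AutoReg(y[:250], lags=2).fit()
print(res.params)                       # AR coefficients, fixed after fitting

# ... then predict wherever you like, without re-estimating
print(res.predict(start=10, end=30))    # in-sample
print(res.predict(start=250, end=260))  # out-of-sample
```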
|
I understand why the horizon is set by default, but if I explicitly try to change it, the library should not just ignore it without any comment. And if that behavior is intended, why does predict check for optimize_for_horizon but not _predict? |
Hello.
A very simple question.
Once I've estimated an autoregression as a wrapped linear model, say as

    model_lr = get_sklearn_wrapper(LinearRegression, lags=10)
    mlr = model_lr.fit(X[:-10], y[:-10])

how do I get the autoregression coefficients?
I cannot find the equivalent of scikit's .coef_, like

    mlr = LinearRegression().fit(X, y)
    mlr.coef_

Thanks.