
Model Evaluation, Diagnostics and MLFlow Registry #1026

Open · louismagowan wants to merge 71 commits into main
Conversation

@louismagowan (Contributor) commented Sep 12, 2024

Description

This PR is a recreation of this one, which I was told to close and restart due to git issues on the pymc-marketing repo.

It could be nice to have some standardised model evaluation and diagnostic functions added to pymc-marketing. Ideally they'd be formulated in a way that makes them easy to log to MLflow later on.
It would also be cool to build on top of the MLflow module to create a custom mlflow.pyfunc.PythonModel class, allowing users to register their models in the MLflow registry. This would let people serve and maintain their MMMs more easily, and could help with MMM refreshes too.

Standard model metrics could include:

  • Bayesian R2
  • MAPE, RMSE, MAE
  • Normalised RMSE and MAE (to allow comparisons across datasets and methodologies, in particular with Robyn models, which use NRMSE as one of their two key metrics), etc.
  • Wrapper functions to calculate those across the entire posterior distributions, not just against the means, so we can have an HDI lower and upper for each metric too (see the sketch below)
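
To make that last point concrete, here is a minimal sketch of the kind of wrapper I mean (the function names, the "y" variable name and the posterior_predictive group are placeholders, not a proposed final API):

import arviz as az
import numpy as np


def nrmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    # Normalised RMSE: RMSE divided by the range of the observed data, so the
    # value is comparable across datasets (and with Robyn's NRMSE)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / (y_true.max() - y_true.min())


def summarise_metric_over_posterior(idata, y_true, metric=nrmse, var_name="y", hdi_prob=0.94):
    # Stack chains and draws into one "sample" dimension, then compute the metric
    # once per posterior-predictive draw rather than only against the mean
    draws = az.extract(idata, group="posterior_predictive", var_names=var_name)
    values = np.array(
        [metric(y_true, draws.isel(sample=i).values) for i in range(draws.sizes["sample"])]
    )
    lower, upper = az.hdi(values, hdi_prob=hdi_prob)
    return {"mean": values.mean(), "hdi_lower": lower, "hdi_upper": upper}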

Diagnostic metrics could include:

  • Step size, divergences
  • LOOCV metrics, etc. (see the sketch below)
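
A rough sketch of what the diagnostic side could look like (it assumes a NUTS trace with the usual sample_stats, and a log_likelihood group for the LOO part):

import arviz as az


def sampler_diagnostics(idata) -> dict:
    # Divergences and step size are recorded by NUTS in the sample_stats group
    stats = idata.sample_stats
    return {
        "divergences": int(stats["diverging"].sum()),
        "mean_step_size": float(stats["step_size"].mean()),
    }


def loocv_metrics(idata) -> dict:
    # Needs idata.log_likelihood, e.g. via pm.compute_log_likelihood(idata, model=model)
    loo = az.loo(idata)
    return {"elpd_loo": float(loo.elpd_loo), "p_loo": float(loo.p_loo)}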

Some additional plots (also useful for diagnosing models):

  • Plot prior vs posterior distributions, to see how the data is shifting things
  • Plot HDI forests for a given variable, along with its r-hat value (see the sketch below)
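
And, very roughly, the plots (mostly thin wrappers around ArviZ; the prior group is assumed to come from pm.sample_prior_predictive):

import arviz as az


def plot_prior_vs_posterior(idata, var_name: str):
    # Overlay the prior and posterior for one variable to see how much the data
    # has shifted it
    ax = az.plot_dist(idata.prior[var_name].values.flatten(), color="C0", label="prior")
    az.plot_dist(idata.posterior[var_name].values.flatten(), color="C1", label="posterior", ax=ax)
    ax.set_title(var_name)
    return ax


def plot_hdi_forest(idata, var_name: str):
    # Forest plot of the HDIs for each coordinate of the variable, with r-hat shown
    return az.plot_forest(idata, var_names=[var_name], combined=True, r_hat=True)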

Model Registry / Additional Logging Code:

A wrapper for an MMM model to make it conform to the MLflow API, enabling registration and easier deployment. Also an option to load models from the registry, or to download the idata from MLflow (a rough sketch of the wrapper is below).
I'll open this as a Draft for now, since I'll need advice on where best to put this code, as well as on the overall design.
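
A very rough sketch of the kind of wrapper I mean (the class and method names, and the exact predict behaviour, are placeholders rather than a final design):

import mlflow
import pandas as pd


class MMMWrapper(mlflow.pyfunc.PythonModel):
    # Wraps a fitted MMM so it conforms to MLflow's pyfunc interface and can be
    # logged, registered and served like any other MLflow model
    def __init__(self, mmm):
        self.mmm = mmm

    def predict(self, context, model_input: pd.DataFrame):
        # Placeholder: return mean posterior-predictive estimates for the new data;
        # the exact call into the MMM is still to be decided
        preds = self.mmm.sample_posterior_predictive(model_input, extend_idata=False)
        return preds.mean(dim="sample").to_dataframe()


# Registering would then be something like:
# with mlflow.start_run():
#     mlflow.pyfunc.log_model(
#         artifact_path="mmm", python_model=MMMWrapper(mmm), registered_model_name="my_mmm"
#     )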

Related Issue

Checklist

Modules affected

  • MMM
  • CLV

Type of change

  • New feature / enhancement
  • Bug fix
  • Documentation
  • Maintenance
  • Other (please specify):

📚 Documentation preview 📚: https://pymc-marketing--1026.org.readthedocs.build/en/1026/

@juanitorduz (Collaborator) commented:
@louismagowan, thanks for opening this one up again! Some tests (and the pre-commit) are failing. Do you need some help?

@louismagowan (Contributor, Author) commented:

@juanitorduz - thanks, but I think I'm alright for now (or at least I think I can try to fix it myself later). I probably need to recreate my conda env, as I think it's a bit old now.
Apologies for being slow on this; I have a deadline at work that I've been rushing for, so I've really not had much time to work on this recently.

@wd60622 - no need to review again yet, I'm still working on addressing your earlier feedback (I'll ping you when I think I'm ready again)

Thanks!

@louismagowan louismagowan marked this pull request as draft October 7, 2024 12:20
…metrics as distributions first, before then taking summary stats over the distributions of the metrics

codecov bot commented Oct 7, 2024

Codecov Report

Attention: Patch coverage is 46.74556% with 90 lines in your changes missing coverage. Please review.

Project coverage is 93.69%. Comparing base (1464354) to head (1fc069d).
Report is 1 commit behind head on main.

Files with missing lines | Patch % | Lines
pymc_marketing/mlflow.py | 47.91% | 50 Missing ⚠️
pymc_marketing/mmm/mmm.py | 4.76% | 40 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1026      +/-   ##
==========================================
- Coverage   95.64%   93.69%   -1.96%     
==========================================
  Files          39       40       +1     
  Lines        4089     4250     +161     
==========================================
+ Hits         3911     3982      +71     
- Misses        178      268      +90     

☔ View full report in Codecov by Sentry.

@louismagowan (Contributor, Author) commented Nov 25, 2024

In response to this <- good catch! I think ArviZ does some sort of lazy loading, so I added a block that forces my idata to be loaded into memory before deleting the file.

It seems to have fixed the issue for me @wd60622 ☺️
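
For reference, the block looks roughly like this (a sketch only; the point is just to force each group's possibly lazily-loaded Dataset into memory before the temporary netCDF file is deleted):

import os

import arviz as az

idata = az.from_netcdf(save_file)
# Force every group into memory so nothing still references the file on disk
for group in idata.groups():
    getattr(idata, group).load()
os.remove(save_file)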


@louismagowan louismagowan marked this pull request as ready for review November 26, 2024 08:05
@louismagowan (Contributor, Author) commented:

Any idea why those tests are failing, @wd60622? They aren't in the scope of my PR, I don't think 🤔

@wd60622 (Contributor) commented Nov 26, 2024

Any idea why those tests are failing, @wd60622? They aren't in the scope of my PR, I don't think 🤔

The tests run with fresh installs of the libraries; Pydantic's latest version has changed its error messages.

I pushed a change to Carlo's posterior PR. Feel free to ignore; out of scope.

@wd60622 (Contributor) left a comment:

Looking very close to the finish line. Thanks for making the changes!

pymc_marketing/mlflow.py: several inline review threads (outdated / resolved)
@wd60622 (Contributor) left a comment:
Would like to keep support for all PyMC models, not only for MMM.

mlflow.register_model(model_uri, registered_model_name)


def load_mmmm(
Too many Ms

Comment on lines +934 to +935
We don't need to call this directly again in MMM.fit patch, since that function calls
pm.sample() internally anyway.
sample never calls compute_log_likelihood, to my understanding. CC @juanitorduz
Nor would we want it to.

Do we need to adjust any logic in fit accordingly?

Comment on lines +975 to +976
if log_loocv and "log_likelihood" not in idata.groups():
pm.compute_log_likelihood(idata=idata, model=model)
Don't automatically call this with sample. It can be computationally expensive.


mlflow.log_params(
idata.attrs,
)
mlflow.log_param("nuts_sampler", kwargs.get("nuts_sampler", "pymc"))
Why did this move? This is helpful for all PyMC models.

Comment on lines -535 to -536
mlflow.log_param("pymc_marketing_version", __version__)
mlflow.log_param("pymc_version", pm.__version__)
Can we keep this for all PyMC models?

@@ -577,11 +999,41 @@ def new_fit(*args, **kwargs):
"saturation_name",
json.loads(idata.attrs["saturation"])["lookup_name"],
)

# Align with the default values in pymc.sample
tune = kwargs.get("tune", 1000)
Should be logged because of the patched sample.

if log_metadata_info:
log_metadata(model=model, idata=idata)

# mmm.fit() calls pm.sample() internally, so we need to make sure log-likelihood is only added once
Wrong, to my understanding.

@@ -412,14 +463,359 @@ def log_inference_data(
os.remove(save_file)


def log_summary_metrics(
Can we rename this to include "mmm" and something with "evaluation"?

Comment on lines +418 to +421
Sets
----
idata.log_likelihood : az.InferenceData
The InferenceData object with the log_likelihood group, unless it already exists.
Outdated

Comment on lines +177 to +178
.. code-block:: python
import pandas as pd
Think a blank line is needed here (after the code-block directive).

"cell_type": "markdown",
"metadata": {},
"source": [
"Alternatively, load a model that has been autologged to MLflow via `pymc_marketing.mlflow.autolog(log_mmm=True)`, from the [PyMC-Marketing MLflow module](https://github.com/pymc-labs/pymc-marketing/blob/main/pymc_marketing/mlflow.py)."
"autologged" is wrong based on the logic now

@cetagostini (Contributor) commented:
Hey @louismagowan, great PR. I really like your work, but I have something to ask, haha: could we have a notebook, something like "Evaluating a model", where we show all the new metrics and how we can import and use them? It would be great if each addition had its own notebook!

I see repeated patterns in the logging functions. Could we encapsulate and re-use them? For example:

def log_and_remove_artifact(path: str | Path) -> None:
    # Log the file as an MLflow artifact, then delete the local copy
    mlflow.log_artifact(str(path))
    os.remove(path)

Then we could use it in places like log_arviz_summary, log_model_graph, and log_inference_data.
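
Callers would then shrink to something like this (the file name here is just illustrative, not necessarily what log_arviz_summary actually writes):

summary_path = "summary.html"  # hypothetical artifact name
az.summary(idata).to_html(summary_path)
log_and_remove_artifact(summary_path)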

@@ -528,12 +928,29 @@ def autolog(
"""
arviz_summary_kwargs = arviz_summary_kwargs or {}

def patch_compute_log_likelihood(compute_log_likelihood):
Type hints.

log_inference_data(idata, save_file="idata.nc")

return idata

return new_fit

def patch_mmm_sample_posterior_predictive(sample_posterior_predictive):
Type hints

Labels: Deployment, docs (Improvements or additions to documentation), mlflow, MMM, tests
Projects: None yet
Development

Successfully merging this pull request may close these issues:
  • Add Support for Model Registering to MLFlow Module
  • Evaluation and Model Diagnostic Functions
4 participants