Model Evaluation, Diagnostics and MLFlow Registry #1026

louismagowan · 2024-09-12T20:00:03Z

Description

This PR is a recreation of this one which I was told to close and restart due to git issues on the pymc-marketing repo.

It could be nice to have some standardised model evaluation and diagnostic functions added to pymc-marketing. Ideally they'd be formulated in a way that makes them easy to log in MLFLow later on.
It would also be cool to build on top of the MLFlow module, to create a custom mlflow.pyfunc.PythonModel class to allow users to be able to register their models in the MLFlow registry. This would allow people to serve and maintain their MMMs more easily, and could help with MMM refreshes too.

Standard model metrics could include:

Bayesian R2
MAPE, RMSE, MAE
Normalised RMSE and MAE (to allow comparisons across datasets and methodologies - in particular with Robyn models, who use NRMSE as one of their 2 key metrics)
etc.
Wrapper functions to calculate those across the entire distributions of posteriors, not just against the means (so we can have HDI lower, upper for each metric too)

Diagnostic metrics could include:

Step size, divergences
LOOCV metrics e.g.

Some additional plots (also useful for diagnosing models):

Plot prior vs posterior distributions, see how data is shifting things
Plot HDI forests for a given var, along with r-hat value

Model Registry / Additional Logging Code:

A wrapper for an MMM model to make it conform to the MLFlow api, enabling registering and easier deployment
Also an option to load models from the registry as well / or download the idata from MLflow too.
I'll open this as a Draft for now - since I'll need advice on how where best to put this code, as well as overall design etc.

Related Issue

Closes Evaluation and Model Diagnostic Functions #911
Closes Add Support for Model Registering to MLFlow Module #973
Related to Expose serialized format #891 MLflow deployment example #901

Checklist

Checked that the pre-commit linting/style checks pass
Included tests that prove the fix is effective or that the new feature works
Added necessary documentation (docstrings and/or example notebooks)
If you are a pro: each commit corresponds to a relevant logical change

Modules affected

MMM
CLV

Type of change

📚 Documentation preview 📚: https://pymc-marketing--1026.org.readthedocs.build/en/1026/

…nostics

…e compliant with standards

…nclude in calc_metric

…conform with API guidelines and enable model registering

…ing an MMMEvaluator class and moving functions there as methods. Breaking out pm.compute_log_likelihood into its own method. Moving over diagnostic functions from MLFlow into this module

…del builder

…V), adding remaining methods

juanitorduz · 2024-09-15T20:39:47Z

@louismagowan, thanks for opening this one up again! Some tests (and the pre-commit) are failing. Do you need some help?

louismagowan · 2024-09-16T10:28:58Z

@juanitorduz - thanks but I think I'm alright for now (or at least I think I can try and fix it myself later). I think I probably need to recreate my conda env (as I think it's a bit old now).
Apologies for being slow on this, I have a deadline at work that I've been rushing for - so really not had much time to work on this recently.

@wd60622 - no need to review again yet, I'm still working on addressing your earlier feedback (I'll ping myself when I think I'm ready again)

Thanks!

…Adding log_model to autolog, along with log_loocv()

…reation

…olog test

…metrics as distributions first, before then taking summary stats over the distributions of the metrics

codecov · 2024-10-07T15:10:20Z

Codecov Report

Attention: Patch coverage is 46.74556% with 90 lines in your changes missing coverage. Please review.

Project coverage is 93.69%. Comparing base (1464354) to head (1fc069d).
Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
pymc_marketing/mlflow.py	47.91%	50 Missing ⚠️
pymc_marketing/mmm/mmm.py	4.76%	40 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1026      +/-   ##
==========================================
- Coverage   95.64%   93.69%   -1.96%     
==========================================
  Files          39       40       +1     
  Lines        4089     4250     +161     
==========================================
+ Hits         3911     3982      +71     
- Misses        178      268      +90

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Co-authored-by: Will Dean <57733339+wd60622@users.noreply.github.com>

…ng dst_path an arg in it

louismagowan · 2024-11-25T20:04:57Z

In response to this <- good catch! I think Arviz does some sort of lazy loading perhaps - so I added a block that forces my idata to be loaded into memory, before deleting

It seems to have fixed the issue for me @wd60622 ☺️

…deletion of the idata object

…tor notebook

…ogic

review-notebook-app · 2024-11-25T22:01:08Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

…h, add separate patch to autolog

…_model was dropped from autolog

louismagowan · 2024-11-26T08:44:22Z

Any idea why those tests are failing @wd60622 ? They aren't in the scope of my PR, I don't think 🤔

wd60622 · 2024-11-26T09:23:14Z

Any idea why those tests are failing @wd60622 ? They aren't in the scope of my PR, I don't think 🤔

The tests are with fresh installs of libraries. Pydantic's latest version has changed error messages.

I pushed a changed to Carlo's posterior pr. Feel free to ignore. Out of scope

wd60622

Looking very close to the finish line. Thanks for making the changes!

pymc_marketing/mlflow.py

…name

wd60622

Would like to keep support for all pymc models when not only for mmm

wd60622 · 2024-12-02T01:32:20Z

pymc_marketing/mlflow.py

+        mlflow.register_model(model_uri, registered_model_name)
+
+
+def load_mmmm(


Too many Ms

wd60622 · 2024-12-02T01:35:05Z

pymc_marketing/mlflow.py

+        We don't need to call this directly again in MMM.fit patch, since that function calls
+        pm.sample() internally anyway.


sample never calls compute_log_likelihood to my understand. CC @juanitorduz
Nor would we want it to

Do we need to adjust any logic to fit accordingly?

wd60622 · 2024-12-02T01:37:33Z

pymc_marketing/mlflow.py

+            if log_loocv and "log_likelihood" not in idata.groups():
+                pm.compute_log_likelihood(idata=idata, model=model)


Dont automatically call with sample. It can be high computation

wd60622 · 2024-12-02T01:38:11Z

pymc_marketing/mlflow.py


            mlflow.log_params(
                idata.attrs,
            )
+            mlflow.log_param("nuts_sampler", kwargs.get("nuts_sampler", "pymc"))


Why did this move. This is helpful for all pymc models

wd60622 · 2024-12-02T01:39:21Z

pymc_marketing/mlflow.py

-            mlflow.log_param("pymc_marketing_version", __version__)
-            mlflow.log_param("pymc_version", pm.__version__)


Can we keep for all pymc models

wd60622 · 2024-12-02T01:40:40Z

pymc_marketing/mlflow.py

@@ -577,11 +999,41 @@ def new_fit(*args, **kwargs):
                "saturation_name",
                json.loads(idata.attrs["saturation"])["lookup_name"],
            )
+
+            # Align with the default values in pymc.sample
+            tune = kwargs.get("tune", 1000)


Shoud be logged because of patched sample

wd60622 · 2024-12-02T01:43:33Z

pymc_marketing/mlflow.py

+            if log_metadata_info:
+                log_metadata(model=model, idata=idata)
+
+            # mmm.fit() calls pm.sample() internally, so we need to make sure log-likelihood is only added once


Wrong to my understanding

wd60622 · 2024-12-02T01:45:14Z

pymc_marketing/mlflow.py

@@ -412,14 +463,359 @@ def log_inference_data(
    os.remove(save_file)


+def log_summary_metrics(


Can we rename to include mmm and something with evaluation

wd60622 · 2024-12-02T01:48:09Z

pymc_marketing/mlflow.py

+    Sets
+    ----
+    idata.log_likelihood : az.InferenceData
+        The InferenceData object with the log_likelihood group, unless it already exists.


wd60622 · 2024-12-02T01:55:50Z

pymc_marketing/mmm/evaluation.py

+    .. code-block:: python
+        import pandas as pd


Think a space is needed here

wd60622 · 2024-12-02T01:59:15Z

docs/source/notebooks/mmm/mmm_budget_allocation_example.ipynb

+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Alternatively, load a model that has been autologged to MLflow via `pymc_marketing.mlflow.autolog(log_mmm=True)`, from the [PyMC-Marketing MLflow module](https://github.com/pymc-labs/pymc-marketing/blob/main/pymc_marketing/mlflow.py)."


"autologged" is wrong based on the logic now

cetagostini · 2024-12-02T17:54:56Z

Hey @louismagowan, great PR. I really like your work, but I have something to ask haha, could we have a notebook that is something like "Evaluating model" where we show all the new metrics and how we can import and use them. It would be great if each addition had its own notebook!

cetagostini · 2024-12-02T18:12:46Z

pymc_marketing/mlflow.py

I see the repeated patterns occur in logging functions, Could we encapsulate and re-use?

def log_and_remove_artifact(path: str | Path): mlflow.log_artifact(str(path)) os.remove(path)

Then we replace on places like log_arviz_summary, log_model_graph, and log_inference_data.

cetagostini · 2024-12-02T18:15:52Z

pymc_marketing/mlflow.py

@@ -528,12 +928,29 @@ def autolog(
    """
    arviz_summary_kwargs = arviz_summary_kwargs or {}

+    def patch_compute_log_likelihood(compute_log_likelihood):


Type hints.

cetagostini · 2024-12-02T18:16:19Z

pymc_marketing/mlflow.py

            log_inference_data(idata, save_file="idata.nc")

            return idata

        return new_fit

+    def patch_mmm_sample_posterior_predictive(sample_posterior_predictive):


louismagowan added 14 commits September 12, 2024 21:55

feat(evaluation.py): Adding 1st draft of code for evaluation and diag…

66b081c

…nostics

chore(uml): Delete doc image to avoid merge conflict

aa3ad04

temp: Reopen PR and remove loguru

9e05169

feat(evaluate): Delete unnecessary try except. Rework docstrings to b…

120d0cd

…e compliant with standards

feat(evaluation): Giving user the option to select which metrics to i…

7eaf03e

…nclude in calc_metric

chore: Adding a comment

89b720f

feat(mlflow): Adding 1st draft of MLFlow wrapper for MMM, to make it …

c842305

…conform with API guidelines and enable model registering

chore(mlflow): Adding example to MMMRegistrar docstring

751105b

chore(mlflow): Fix typo in docstring

ac1b8b0

feat(evaluation): Unfinished work. Forcing commit to save work. Creat…

235b434

…ing an MMMEvaluator class and moving functions there as methods. Breaking out pm.compute_log_likelihood into its own method. Moving over diagnostic functions from MLFlow into this module

feat(model_builder): Adding pm.compute_log_likelihood as method to mo…

203b09b

…del builder

feat(mlflow): Adding code for computing LOOCV metrics to MLFlow logging

ef4c05e

temp(evaluation): Deleting code that was moved to mlflow module (LOOC…

e05f04e

…V), adding remaining methods

fix(mlflow): Addressing docstring feedback from Will

ec2d836

louismagowan mentioned this pull request Sep 12, 2024

Model Evaluation, Diagnostics and Registering #912

Closed

13 tasks

juanitorduz added the mlflow label Sep 13, 2024

juanitorduz requested a review from wd60622 September 14, 2024 12:19

louismagowan added 6 commits October 4, 2024 09:50

Merge branch 'pymc-labs:main' into model_evaluation2

fe795cb

feat(mlflow): Tidying MLFLow MMM wrapper, adding log_model function. …

f805cbe

…Adding log_model to autolog, along with log_loocv()

feat(mlflow): Adding load_model function

6e4b508

temp: Comment out evaluation, to remove test failures there

9e76dcc

fix(mlflow): Updating autolog to work with LOOCV and log-likelihood c…

fbcbe45

…reation

fix(tests): Updating mlflow test to include model artifact in mmm_aut…

c5aac57

…olog test

louismagowan marked this pull request as draft October 7, 2024 12:20

louismagowan added 2 commits October 7, 2024 16:43

feat(evaluation): Adding reworked model metric code - calculates the …

e822a43

…metrics as distributions first, before then taking summary stats over the distributions of the metrics

Merge branch 'main' into model_evaluation2

6fba186

fix(mlflow): Fixing LOOCV function to use model on context

128bd24

louismagowan and others added 6 commits November 18, 2024 15:15

Merge branch 'main' into model_evaluation2

b3fc397

chore: Will's nit

aee03a1

Co-authored-by: Will Dean <57733339+wd60622@users.noreply.github.com>

fix: Change scope in eval to function

b73b72c

chore(mlflow): Delete unnecessary indent

40a53eb

Merge branch 'main' into model_evaluation2

1635ff6

fix(mlflow): Correcting arg docstring for load_model, as well as maki…

655b7cf

…ng dst_path an arg in it

louismagowan added 3 commits November 25, 2024 21:09

fix(mlflow): Force idata groups to be loaded into memory, before the …

56fa53a

…deletion of the idata object

feat(docs): Add example of loading model from MLflow in budget alloca…

c7e8e37

…tor notebook

fix(mlflow): Patching pm.compute_log_likelihood to untangle logging l…

6ede5aa

…ogic

louismagowan added 4 commits November 25, 2024 23:04

chore(docs): Removing import statement junk from notebook doc

aa7a254

refactor(mlflow): Break out sample_posterior_predictive from fit patc…

6a13480

…h, add separate patch to autolog

fix(docs): Commenting out mlflow load run

5c923df

fix(test_mlflow): Removing model object from auto_log test, since log…

01f86d9

…_model was dropped from autolog

louismagowan marked this pull request as ready for review November 26, 2024 08:05

louismagowan added 2 commits November 27, 2024 08:08

Merge branch 'main' into model_evaluation2

a478d2b

Merge branch 'main' into model_evaluation2

3f9b22d

wd60622 requested review from juanitorduz and wd60622 November 28, 2024 13:35

wd60622 requested changes Nov 28, 2024

View reviewed changes

louismagowan added 2 commits December 1, 2024 21:16

chore(mlflow): Docstring and naming improvements

821b487

chore(notebooks): Update docs to load MMM using the changed function …

1fc069d

…name

louismagowan requested a review from wd60622 December 1, 2024 20:50

wd60622 requested changes Dec 2, 2024

View reviewed changes

wd60622 reviewed Dec 2, 2024

View reviewed changes

cetagostini reviewed Dec 2, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Model Evaluation, Diagnostics and MLFlow Registry #1026

Model Evaluation, Diagnostics and MLFlow Registry #1026

louismagowan commented Sep 12, 2024 •

edited

Loading

juanitorduz commented Sep 15, 2024

louismagowan commented Sep 16, 2024

codecov bot commented Oct 7, 2024 •

edited

Loading

louismagowan commented Nov 25, 2024 •

edited

Loading

review-notebook-app bot commented Nov 25, 2024

louismagowan commented Nov 26, 2024

wd60622 commented Nov 26, 2024

wd60622 left a comment

wd60622 left a comment

wd60622 Dec 2, 2024

wd60622 Dec 2, 2024

wd60622 Dec 2, 2024

wd60622 Dec 2, 2024

wd60622 Dec 2, 2024

wd60622 Dec 2, 2024

wd60622 Dec 2, 2024

wd60622 Dec 2, 2024

wd60622 Dec 2, 2024

wd60622 Dec 2, 2024

wd60622 Dec 2, 2024

cetagostini commented Dec 2, 2024

cetagostini Dec 2, 2024

cetagostini Dec 2, 2024

cetagostini Dec 2, 2024

		mlflow.register_model(model_uri, registered_model_name)


		def load_mmmm(

		We don't need to call this directly again in MMM.fit patch, since that function calls
		pm.sample() internally anyway.

		if log_loocv and "log_likelihood" not in idata.groups():
		pm.compute_log_likelihood(idata=idata, model=model)

		mlflow.log_param("pymc_marketing_version", __version__)
		mlflow.log_param("pymc_version", pm.__version__)

		@@ -412,14 +463,359 @@ def log_inference_data(
		os.remove(save_file)


		def log_summary_metrics(

Model Evaluation, Diagnostics and MLFlow Registry #1026

Are you sure you want to change the base?

Model Evaluation, Diagnostics and MLFlow Registry #1026

Conversation

louismagowan commented Sep 12, 2024 • edited Loading

Description

Related Issue

Checklist

Modules affected

Type of change

juanitorduz commented Sep 15, 2024

louismagowan commented Sep 16, 2024

codecov bot commented Oct 7, 2024 • edited Loading

Codecov Report

louismagowan commented Nov 25, 2024 • edited Loading

review-notebook-app bot commented Nov 25, 2024

louismagowan commented Nov 26, 2024

wd60622 commented Nov 26, 2024

wd60622 left a comment

Choose a reason for hiding this comment

wd60622 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cetagostini commented Dec 2, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

louismagowan commented Sep 12, 2024 •

edited

Loading

codecov bot commented Oct 7, 2024 •

edited

Loading

louismagowan commented Nov 25, 2024 •

edited

Loading