Project Meeting Minutes
Held on the 1st Friday of each month, 10Eastern/15GMT/16Europe
No meeting
- Chris F., Thomas W., Ricardo V., Oriol A.
- Google Summer of Code 2024
- Submitted projects
- PyData London 2024 Hackathon
- Roadmap
- Development Updates
- Chris, Thomas, Christian, Alex, Fernando
- CZI grant update
- Applications due Dec. 5
- November Docathon (Nov. 17)
- Development Updates
- Remove stuff from Root namespace: https://github.com/pymc-devs/pymc/pull/6973
- PyMCon Updates
No meeting
- POSE grant update
- Development Updates
- Do/Observe (we can discuss as much as you want)
- NumFOCUS resubmission
- Deadline Sept 8
- GSoC Wrapup
- Should we do an internal presentation?
- PyMCon Updates
- POSE grant update
- Development Updates
- Numfocus grants?
- GSoC Project Home Stretch
- PyMCon Updates
- Christian, Ricardo, Thomas, Oriol and Larry
- NSF grant
- Grant for something like Numfocus, but we already have it
- Community building events
- Partnerships with companies and other OSS (scipy/ stan/ arviz)
- Can’t be used for paying developers
- But maybe hackathons / sprints
- Documentation writing / maintenance
- Community manager / social media person
- Production / Commercial applications
- Running on cloud platforms
- Development Updates
- 5.6.0 is out!🎉
- multivariate imputation (includes timeseries)
- do/observe operator in pymc-experimental
- Hopefully merged in 1-2 months
- Gotta fix some other things and make sure the base functionality FunctionGraph is robust and flexible enough
- PyTensor blockwise almost ready → Vectorization of PyTensor graphs and arbitrarily batched multivariate distributions
- GSoC Project Updates
- Numfocus grants for the rejected GSoC projects
- Userbase is growing as seen by PyPI downloads, GH Stars and questions on Discourse
- Interest in JAX
- Should integrate more with BlackJax inference algorithms
- PyMCon Updates
- Big thanks to Purna for generating so much excitement
- August event will be a CFP workshop hosted by Ravin
- September likely causal inference, hosted by Thomas and Ben (who need to confirm)
- Some discussion of an end-of-year review of 2023 PyMCon events
- Development updates
- GSoC Projects
- Support for unselected projects
- PyMCon is going strong
- May event is being planned
- Argentina Bayesian Meetup
- Around August 1st through 3rd in Buenos Aires
- Bayesian Conference in Santiago Del Estero 4th and 5th
- Osvaldo, Tomas, Ravin, Oriol, Alex confirmed attendance. Anyone welcome
- “Friction points” when using PyMC (e.g., installation), particularly in production environments
- PyMCon reviews
- GSoC Projects
- Review
- Slot requests
- Development updates
- Release of PYMC
- Add a new sampler
- https://github.com/symeneses/SBM/blob/main/notebooks/template.ipynb
- Need to market the nutpie sampler
- All contained within pm sample
- PyTensor fixes jax implementation of scan
- Add computational speedup doc
- PyMCon Web Series
- Ricardo
- Advertise GSOC during PyMCon
- Tracebackend
- It happened!
- Replaced multitrace
- PyTensor
- Encourage
- No Pytensor users
- Do as the Aesara devs do on Twitter
- Here’s a new contributor
- We could get more exposure
- Celebrating new PRs
- New core members
- More publicity to the development efforts
- Ask Fernando about marketing
- Communications Repo for social media
- Coordination and a structure have to be set up
- Create repo
- Ravin, Michael, Virgile, Fernando, Reshama, Oriol, Larry
- Move meeting to PyMC Dev
- Delete pymc-devs@gmail.com
- Tracebackend
- McBackend may replace
- Looking for reviewers for the trace backend
- Colab on PyMC v4; need to move to v5
- GSoC
- GSoC 2023 projects · pymc-devs/pymc Wiki · GitHub
- New contributor activity on GitHub: reviewers needed
- Speed up merge time for first time contributors
- Blog on PyMC: Chris F 2020 keynote
- PyMCon Web Series
- Rolling CFP
- https://github.com/pymc-devs/pymcon_web_series/tree/main/docs/events
- First event is live. Go check it out https://discourse.pymc.io/t/pymcon-web-series-the-power-of-bayes-in-industry-your-business-model-is-your-data-generating-process-feb-9-2023/11264
- Discourse announcements
Ravin, Adrian, Reshma, Thomas, Larry, Austin, Ricardo, Danh, Chris
Notes taken by: Chris
- PyMCon
- First set of mentors and mentees paired!
- Events are being developed
- PyTensor
- Development Updates
- Nutpie
- PyMCon
- CFP is underway; first events have dates
- Pairing entries with mentors has begun
- Call next week for mentors
- PyTensor
- Numba
- Missing Ops
- Optimization
- random number generators
- Perhaps have dev branch (“staging”) where nutpie is the default sampler, so that we can find the issues easily
- Numba
- Nutpie
- would be nice to have nutpie interoperate with Python samplers (https://github.com/pymc-devs/nutpie/issues/32)
- one bottleneck is writing to trace backend
- hopefully McBackend will help with this(?)
- use of float32 should be relatively straightforward; requires some knowledge of Rust
- include benchmarks in PyMC paper
Oriol AP, Thomas W, Michael O, Rob Z, Christian L, Rowan S, Reshama S, Ravin K, Larry S, Ricardo V, Alex A
Rowan
- PyMCon
- We got 13 submissions!
- We need mentors
- Next Steps
- Review proposals
- Determine how many mentors we have
- Pair the two
- Start working with each candidate
- Run our first event Late Jan/Early Feb
- In next couple of weeks we’ll review, submit feedback, and pick the accepted talks
- PyTensor
- What should we do with AePPL?
- We can make it a git submodule but this isn’t great
- Advantage is we don’t need to maintain a second codebase
- Disadvantage is that working with git submodules is confusing for first-time maintainers
- Agreed single commit for user and maintainer reasons
- Ravin to talk to Junpeng
- Development Updates
- Vote on Ricardo’s poll for the next session
- Switch to pytensor on dependency, Michael can do find and replace
- NumFOCUS CDR interim report
- PyMCon
- 13 submissions!
- 7 people volunteered to review
- Try to make decisions by Wednesday, Dec 7
- Form is still open - Christian will reach out to late submitters and explain timing
- Eventually move to rolling submissions
- Need mentors
- Hopefully 4-5 mentors to work for next quarter
- Mentors to encourage creativity from speakers
- For this cycle - Mentorship role in development
- Next steps
- Please review!
- Please reach out if you want to mentor!
- In next couple of weeks we’ll review, submit feedback, and pick the accepted talks
- Sponsorship
- Conference will be helpful for sponsors
- Advertise Sponsorship - Fernando
- Reshama & Thomas - Sponsorship Prospectus
- Aesara Fork / PyTensor
- What should we do with AePPL?
- As a submodule
- Pros: independent git histories, no maintenance overhead of making releases, technically most elegant solution
- Cons: confusing to clone and pull repos that contain git submods, high barrier for 1st time contributors
- Agreed on single commit for user and maintainer reasons
- Development updates
- Vote on Ricardo’s poll for the next session
- Switch to pytensor on dependency
- NumFOCUS CDR interim report
- Contributor diversity and retention
- Report is out
Christopher Fonnesbeck, Bill Engels, Christian Luhmann, Fernando Irarrazaval, Junpeng Lao, Larry, Michael Osthege, Ravin Kumar, Ricardo Vieira, Thomas Wiecki, Virgile Andreani, Rowan Schaefer
Rowan
- Aesara Fork
- Communication
- Project organization
- Development Updates
- Getting all devs on board with PyMC-experimental (please read https://smartinference.slack.com/archives/C6AKTNSBS/p1664180846362399)
- Aesara Fork: Discussion
- Ergonomic workflow + faster PRs to benefit PyMC codebase and users
- Taking on development responsibility for this: interest, hours, expertise?
- Adrian, Ricardo familiar with Aesara codebase
- Ricardo: Suggested 1 year outlook and re-evaluation of outcomes
- a. Communication:
- 1. Internal announcement
- 2. Named vote (Monday?)
- 3. External announcement
- b. Project organization
- Organization of the fork within the project
- Michael: Opened discussion on decisions re: apps and dependencies->submodules?
- Development Updates
- Formal processes can be revised to get core developers more involved in decision making
- PyMC Experimental
- Use as crazy ideas repository, or new features that will end up in pymc?
- Can keep solid implementations around without slowing down development
- Consider developing standards for documentation
- PyMCon
- Per Ravin: Review proposals for PyMCon and share social media posts!
- PyMC Development Updates
- Blockers?
- Paper
- PyMCon Web Series
- Everyone can view current progress here https://github.com/pymc-devs/pymcon_2022/
- Please take a look and see if you can help out
- Big shoutout to Purna, Christian and Fernando
- Each have made lots of progress on the Website, Contributor Experience, and Marketing in the last month
- NumFocus Summit
- Norway Summit
- Start thinking about a Spring summit in Norway
- Good spot to talk about samplers and backends
- Google Summer of Code
- PyMC Development Updates
- Summer of Code 2022
- Status updates
- PyMCon 2022
- PyMC Trademarking
No meeting
- PyMC Development
- What are reasons for bumping the major version moving forward? Proposals below 👇
- Core API breaks (e.g. Distribution kwargs)
- Changes in seeding
- Changes in tuning
- …?
- When NOT to bump the major version?
- Breaks that were announced by a DeprecationWarning for >=6 months
- Breaks in submodules (.ode, .timeseries, .gp) that we consider part of the non-core API
- …?
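The proposal about breaks "announced by a DeprecationWarning" can be illustrated with a minimal sketch. The function and argument names (`run`, `old_option`, `new_option`) are hypothetical and not part of PyMC; only the standard-library `warnings` module is used.

```python
import warnings


def run(new_option=None, old_option=None):
    """Hypothetical function whose ``old_option`` argument was renamed."""
    if old_option is not None:
        # Announce the upcoming break instead of removing the argument right away.
        warnings.warn(
            "`old_option` is deprecated and will be removed in a future "
            "release; use `new_option` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        new_option = old_option
    return new_option


# Record warnings so the deprecation path can be inspected or tested.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = run(old_option=4)

print(result)
print([str(w.message) for w in caught])
```

Under a policy like the one proposed, the old argument would keep working with this warning for the agreed period, and its removal afterwards would not by itself require a major version bump.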
- Can we get rid of manual Release Notes?
- pymc-examples status
- Size vs shape… :D
- 👉 keeping the status quo
- Summer of Code 2022
- PyMCon 2022
- PyMC Developers Summit (has the ice melted already?)
- Social media coordination
- PyMC Development Updates
- v4 release
- Should we remove size from the user-facing API? https://github.com/pymc-devs/pymc/pull/5746 (size is only in the `Distribution.dist()` signature, not the `Distribution.__init__` signature.)
- pymc-examples status
- Summer of Code 2022
- PyMCon 2022
- Web series
- PyMC Developers Summit
- Social media coordination
- Are we opening our meetings to outsiders?
- PyMC Development Updates
- Beta release
- No more betas, let’s move to release
- v4 release
- Just GaussianRandomWalk missing? → PR (https://github.com/pymc-devs/pymc/pull/5298) is really close to finish line. Just get it done.
- Deprecate EllipticalSlice step method? → Yeaah
- Installation issues: Upon `conda create -n pmv4 "pymc==4.0.0"` users should get environments that DON’T print warnings upon “import pymc”. At least on Windows this is not yet the case. Needs to be confirmed/investigated for Ubuntu+MacOS. See https://github.com/conda-forge/aesara-feedstock/issues/54
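A check along these lines could be run inside a freshly created environment. This is a minimal sketch, not an existing PyMC tool: the helper name `warnings_on_import` is hypothetical, `json` is used as a stand-in module so the snippet runs anywhere, and it only sees messages issued through Python's `warnings` module (messages emitted through `logging` or printed directly would need a separate check).

```python
import importlib
import sys
import warnings


def warnings_on_import(module_name):
    """Import ``module_name`` and return the warnings raised during the import."""
    # Drop a cached copy, if any, so the top-level module code runs again.
    sys.modules.pop(module_name, None)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        importlib.import_module(module_name)
    return caught


# "json" is a placeholder; in the new environment one would pass "pymc" instead.
caught = warnings_on_import("json")
for w in caught:
    print(f"{w.category.__name__}: {w.message}")
print(f"{len(caught)} warning(s) on import")
```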
- PyMC-nightly broken?
- I’ve just created an issue
- PyMC-Experimental
- Move MLDA → Ask original contributors
- Are we releasing already?
- How are docs / tests / going on?
- Expecting users to be on beta?
- Review of pymc-examples
- Summer of Code 2022
- Candidate students
- Quick summary of the current CD/CI pipeline and what’s been recently optimized
- All new PRs should be created as a draft until they are ready for review and merge
- PyMC Development Updates
- Beta release
- v4 release
- Missing basic time series
- Missing basic transition guide + changelog (see https://github.com/pymc-devs/pymc/projects/3)
- pymc-examples has a progress tracker: https://github.com/pymc-devs/pymc-examples/projects/1
- GPU testing infrastructure
- CZI grant funding
- What do we need? 👉 Tracking Issue to figure out where to buy/configure/run a GitHub Actions GPU runner.
- Season of Docs 2022
- Are we doing it?
- Summer of Code 2022
- Any thoughts on the questions posted in #PyMCon channel?
- How many folks want a conference?
- How many folks will help plan a conference?
- Ravin will read the notes async
- If there’s time: BaseTrace/MultiTrace/McBackend mini-deep dive (it’s not that hard)
- PyMC v4 Codebase Deep-dive
- PyMC v4 Release
- Think we are in pretty good shape, well
- Not too much has happened since first beta
- Release is amazing! Great job everyone
- Tweet has been announced with transition guide
- We could be good with explaining new features and make that full announcement with 4.0 release
- The technical release notes which have everything, we can definitely tweet more about it
- Something observed on twitter is whenever we tweet about version 4 or Aesara, everyone is really curious about JAX. Not being interested in writing about JAX
- JAX is one of the options but not the only option; we are missing an elevator pitch on Aesara
- For the whole graph manipulation part, we do not have a good quick start tutorial
- What do people care about? Answer for users is they’re more excited about JAX
- We’ve been too shy on how cool Aesara is, and by advertising more
- Its more unique, we went down went dynamic graph route, with control etc
- Static graphs have their place, here’s one that does it, and it’s even backend agnostic
- Doesn’t exist elsewhere, and a really mature tested codebase
- A trend is that the tide is going back to before TensorFlow and PyTorch
- Pytorch is overengineered
- Now people like JAX; because we associate with JAX, we ride that wave together
- Get DL crowd and more users flowing in through that
- Maybe need a more typical example for Aesara
- But it can still do a lot of that stuff, one of the popular NN architectures, or transformer
- Aesara is independent, so strictly speaking we still is close enough to support it
- Examples of Numba vs JAX PyMC, show where Numba is better than pymc
- Use c backend, great argument in favor of Aesara, there is not one fastest
- Good to spread marketing across multiple backend ideas
- How far along is Numba backend?
- The point of numba backend is to try and get to as good as C backend,
- C backend is very intertwined with C backend
- Idea with Numba is to get same base performance, but you’re writing python
- Numba is very concerned with not getting backend regression
- JAX Used to be very flexible, but they started making it more and more fixed shape only
- We do a lot of put all variables together, flatten them out, things that don’t work with JAX
- Now good news is Brandon is doing a big push with fixed shape tensor, once you have them with fixed variable sizes then you can compile various models to JAX
- Seems very promising to have JAX more flexible for things to compile to JAX
- More flexibility on Aesara side, to build things JAX can handle, with fixed tensor
- Mindful of discussion of features that are interested for 5 percent
- there’s also 95% of userbase where users don’t care about backend
- More concerned about understanding more models and domain problem where PyMC has a big userbase and JAX not so much
- “Theano died” is a common sentiment
- Switch messaging to “Theano evolved”
- Emphasize flexibility
- What is missing for beta 2?
- PR for symbolic distributions, the way we want to implement time series
- VI at the verge of being mergeable because of some AEPPL dependency took it out of Beta 1, three test failures
- Existing GP stuff is all passing, so we’re all solid
- Examples are still not updated
- Push VI into next release
- We’ll release BETA 2, and push time series and VI into Beta 3
- We have to release using the GUI in GitHub, which creates the correct pattern
- The pipeline kicks off, updating formatting of release notes section
- Bumping version number beforehand manually
- Conda feedstock PR opened by bot, not much manual intervention is needed
- Manual release by pypi is made by pipeline
- Pin Numpy version to fix main branch in numpy compatibility
- Why don’t we have CI failures in 1.22 in main?
- Things that may delay more than 2 days then we don’t include in beta
- Porting of example NBs to v4
- Migration guide that’s a bit more detailed than the migration guide
- Most notebooks can likely work if we change the imports
- We can try mass running them
- This time do one PR notebooks
- Three things to change
- Change aesara
- Change inf data to true (removes)
- Something else (Missed in notes)
- pymc-experimental repository
- Repo now exists; are there things we should move there now?
- Sync meetings going forward: We have been having dedicated documentation meetings, which I think was partially responsible for some lab meetings ending before the allocated hour. We plan on continuing with the dedicated doc meetings; should we try to organize something of the sort for dev and discourse (and pymcon?) teams? One option could be to have dedicated team meetings of 45min-1h every 1-2 months plus a monthly 30 min common meeting to coordinate and talk about non-team things like governance or grants.
- PyMC v4 Release
- Blockers
- Beta Release next week!
- Log p recording is broken but only really needed for az.compare
- Migration guide
- Coordinate release with Meenal for release notes
- PyMC Community Team
- Season of Docs Wrap-up
- Accomplished everything, except for V4
- Martina sticking around which we’re all excited about
- In the new website template there is a section of distributions to check them (link: https://pymc--5232.org.readthedocs.build/en/5232/api/distributions.html)
- Proposal to rewrite quick start to use physical units
- Current quickstart goes into benchmarking, what bayesian computation and evaluation is
- pymc-experimental repository
- There’s functionality in the main repository that is more experimental but isn’t well tested because we don’t want it holding up the main release
- Idea is to have some parallel addon repo if you installed it would put things side by side
- PyMCon 2022
- Mid jan/ early Feb planning kickoff
- Virtual is fallback, but better than no conference
- Sync meetings going forward: We have been having dedicated documentation meetings, which I think was partially responsible for some lab meetings ending before the allocated hour. We plan on continuing with the dedicated doc meetings; should we try to organize something of the sort for dev and discourse (and pymcon?) teams? One option could be to have dedicated team meetings of 45min-1h every 1-2 months plus a monthly 30 min common meeting to coordinate and talk about non-team things like governance or grants.
- Discourse upgrade to paid plan
- PyMC V3 is the historical name
- Fix old PyMC v2 docs to be more clear that it’s a very old version https://pymcmc.readthedocs.io/en/latest/README.html
- Installation issues remain a challenge
- Governance
- PyMC Renaming
- Outstanding issues
- readthedocs still seems to use pymc3
- Docs
- Forwarding to the same server
- Proposal: docs.pymc.io → https://pymc.readthedocs.io/ (plus making stable the default version, not latest), examples.pymc.io → https://pymc-examples.readthedocs.io (default to latest)
- Status update on docs projects
- Add page on history of PyMC → migrate from https://github.com/pymc-devs/pymc/wiki/Timeline and also include the visualization Ravin made!! (and add a forward there once the page is part of the RTD)
- Add a short description of PyMC4.0 (one or two paragraphs with a link to more information) on the documentation front page. Could be based on the work that Thomas and Chris are doing on blog posts and reports.
- PyMC v4
- Blockers for 4.0.0-beta1
- Initval framework: https://github.com/pymc-devs/pymc/pull/4983 (finish line)
- VI: https://github.com/pymc-devs/pymc/pull/4582 (finish line)
- Aeppl: https://github.com/pymc-devs/pymc/pull/4887
- Migration guide / Release Notes https://hackmd.io/7_BJXPrDT1ShtDMxghGfHA
- Updated and exhaustive api
- Blockers for 4.0.0-beta2
- Restoration of moments for most distributions & switching the default to them
- Rework initval/start/jitter mechanisms inside init_nuts
- Restoring support for the GP submodule
- Timeseries distributions
- Mixture distributions
- Blockers for v4.0.0 major
- Docs revamped to Martina’s mockup sample pages
- Update all beginner & quickstart pymc-examples to 4.0.0-beta2
- DeprecationWarnings and backports in v3.11.x release
We'll refer to the new release as PyMC4.0 in the documentation
Documentation goals: Merge all open PRs to pymc-examples this week, then create a v3 release to trigger readthedocs to build the page for v3. Once the beta release is out, start running notebooks with it. Also get the documentation-side changes we want to see in the examples in before updating them to PyMC4.0 code.
Examples need to be ported to PyMC4.0 before the alpha release (maybe 1 or 2 months from now)
Update the upgrade guide, quickstart guide and put deprecation warnings.
Some notebooks will be updated before the PyMC4.0 release; we need to choose which ones to update first. This can coincide with the list of examples the documentation team is planning on moving to the main repo (important/popular examples that we want to update more frequently). Work will then continue for about 2 months (estimated); we can do a sprint/hackathon to speed it up.
- PyMC3 v3 any relevant news?
- Release 3.11.4. Haven’t gotten any complaints so assuming no news is good news
- Considered a success 🎉
- Installation troubles
- Generally trying to push dependencies to Aesara recipes
- Still ongoing (probably good to fix that before v4.0.0)
- 👉 The fixes could be backported for v3/theano-pymc-feedstock, but we’ll discuss that in the next meeting.
- PyMC3 v4
- Blockers for beta
- Start/initial values
- 👉 We agree to revert the default behavior to the moment that was the default in v3, without allowing for choosing between different moments! @Michael O will take this on.
- 👉 Option to draw from the prior & API point of entry is still under discussion.
- Docs updates for v4
- 👉 Need to set up autodocs for Sphinx. @Sayam K agreed to take this on
- Release announcement https://hackmd.io/I5F8t9swRqKPfGQPYsR_cg
- Blockers for v4 major
- Variational inference. No hope for Normalizing flows to work in the near future, pickling is broken
- try cloudpickle to fix pickling issues
- ADVI should work and be done soon
- AEVB is also problematic, probably no one is using it and we might want to deprecate
- DensityDist
- Needs API changes to make purpose of inputs in gives more transparent to the implementation. Ricardo already has a proposal in https://github.com/pymc-devs/pymc3/issues/4831 for the updated API.
- Needs info if it’s even compatible right now, or how difficult it is to get it working.
- Before we forget: Deprecating backends and other things marked as
- 👉 Want to add deprecation package & migrate the existing warnings. Meenal wants to pick that up.
- Coordinated release announcement/event/materials? partially discussed above. Will re-raise later.
- CZI grant
- The EOSS grant awardees are public now: https://chanzuckerberg.com/eoss/proposals/?cycle=4
- What are our computing infrastructure needs for v4? Now that we have money it’s time to think about the specifics.
- Idea: Resources for CI testing installation on various OSes?
- Idea: Benchmarking
- Governance doc discussion: https://raw.githubusercontent.com/pymc-devs/pymc3/governance/GOVERNANCE.md. Key open questions:
- Tiers
- general feedback on adding/formalizing the 4 level structure: recurrent/core/council/bdfl
- (maybe leave for PR or future) how should we match tiers (and maybe teams too) to github permissions. i.e. triaging permissions to recurrent contribs, admin rights based on teams and/or steering council?… Also if teams should be org-wide or project-specific?
- Teams
- long term wise, once the doc team is more established, do we agree with this potential team structure?
- Steering council:
- Voting: currently the old steering council votes on the new steering council, we could change that to core contributors vote on new members/renewals
- Vote of no confidence: I think this is like the code of conduct, we need to have some process for core contributors to remove people from the steering council even if we hope and expect to never have to use it
- council membership constraints. I think we should have a minimum number of members, say 5, with at least one member per team and at most 2 institutional contributors of the same company. Or some other numbers but some kind of constraints on those ends.
- Joining the team
- becoming recurrent/core contributors works on a nomination based approach. Do we want the nominations to be public (i.e. like in ArviZ) or only public within the team/slack?
- Things to not talk about
- documentation team goals (i.e. pymc-examples based or also about resources) → we will discuss that next week in the doc meeting, come if you want to talk about that
- Summer internships updates
- Google Season of Docs
- Outreachy
Agenda
- PyMC3 v3.11.3 release
- https://github.com/pymc-devs/pymc3/milestone/23 👉 We added a new issue #4854 to the milestone, because it’s a one-line fix and Ravin (the 3.11.3 release manager) is on vacation.
- What if we need to backport changes to Theano-pymc in the future? See: https://github.com/aesara-devs/aesara/issues/444#issuecomment-889648944 👉 We decided to NOT make backport fixes for Theano-pymc, because there are no automated release mechanisms. If someone needs such fixes for mission-critical production systems, they should fork, fix & deploy on their own.
- PyMC3 v4
- https://github.com/pymc-devs/pymc3/milestone/19
- Bound distributions: https://github.com/pymc-devs/pymc3/pull/4815/files
- Discussion: The codebase is morphing from OOP to functional programming, with lots of @register dispatching. Do we want to continue in that direction? What can we do to make it easier to understand/extend for new contributors? (dispatching is a rare pattern in the Python world; IDEs have a hard time making suggestions for functional programming styles, …) 👉 We agree that functional programming approaches make a lot of sense when working close to Aesara graphs, but multiple people were also concerned about codebase accessibility. To compromise we decided to attend to functional/OOP style in the developer guide and make the distinction based on “Is this related to Aesara things? Then functional if it makes sense. Otherwise prefer OOP.” or the like.
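For readers unfamiliar with the dispatching pattern discussed above, here is a minimal sketch using the standard library's `functools.singledispatch`. The `Normal` and `Exponential` classes and the `logp` function are simplified, hypothetical stand-ins written for this example; they are not the PyMC or Aesara implementations.

```python
import math
from functools import singledispatch


class Normal:
    def __init__(self, mu, sigma):
        self.mu, self.sigma = mu, sigma


class Exponential:
    def __init__(self, lam):
        self.lam = lam


@singledispatch
def logp(dist, value):
    """Generic entry point; concrete behavior is registered per type."""
    raise NotImplementedError(f"No logp registered for {type(dist).__name__}")


@logp.register(Normal)
def _(dist, value):
    # Log-density of a univariate normal distribution.
    z = (value - dist.mu) / dist.sigma
    return -0.5 * z * z - math.log(dist.sigma) - 0.5 * math.log(2 * math.pi)


@logp.register(Exponential)
def _(dist, value):
    # Log-density of an exponential distribution with rate ``lam``.
    if value < 0:
        return -math.inf
    return math.log(dist.lam) - dist.lam * value


print(logp(Normal(0.0, 1.0), 0.0))
print(logp(Exponential(2.0), 0.0))
```

The behavior lives in registered functions rather than in methods on the classes, which is convenient when the types are owned by another library, but it is also why IDEs have a harder time discovering what is available for a given object.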
- v4 timeline & communication
- 👉 Restoring moment-based initial values is a blocker (Ricardo, Michael)
- 👉 GaussianRandomWalk is a blocker
- 👉 Mark DensityDist, NormalMixture as “not ready, but we’re working on it” (Ricardo?)
- 👉 VI also “not ready, but WIP” (Max)
- 👉 Timeseries “not ready, API changes WIP”: We plan to use aeppl, which enables us to evaluate the log-probability of arbitrary time series. While this is an excellent option, it is still materializing, so for the upcoming release we port three important distributions (ARIMA, AR and GaussianRandomWalk) based on Aesara Random Variables.
- Summer internships updates
- Google Season of Docs
- Google Summer of Code 2021 - ARIMA tests and documentation left
- Outreachy
- going well so far. Unlike gsoc, outreachy internship can be extended if external circumstances delay the work. The project will finish on September 27th
- Next journal club (no papers on stack at the moment)
- PyMC3 v3
- Can we cut a release now? Any last-minute notebooks or things that need to be merged? → https://github.com/pymc-devs/pymc3/milestone/23
- → Test failures on the v3 branch because of Scipy chisquare API change breaking some tests → Chris will fix it
- → https://github.com/pymc-devs/pymc3/pull/3792 Michael will take a look & either fix or close it.
- CI is currently red, might be a flaky test
- Chris to fix the Chi squared value
- PyMC3 v4
- Which blockers remain until a release? → https://github.com/pymc-devs/pymc3/milestone/19
- Aesara → Just mentioning that conda-forge packages are not very reliable. Some activity in the conda-forge pymc3-feedstock may hopefully improve the situation.
- Summer internships updates
- Google Season of Docs
- Google Summer of Code 2021
- Outreachy
- Next journal club
- PyMC3 v3 release
- Can we cut a release in June?
- Don’t need to couple them
- Don’t want to introduce new features to v3
- Cut v3.11.3
- Relevant milestone: https://github.com/pymc-devs/pymc3/milestones
- Maybe add this https://github.com/pymc-devs/pymc3/issues/4658
- Ravin will be release manager
- PyMC3/Aesara discussion
- Priorities
- Michael will rebase PR #4696 one last time (restores `shape` backwards compat & brings more capable `dims`) and then we merge it before other PRs break it again
- Upcoming change of how `size` works in Aesara - doesn’t need to delay PR #4696, but will be incoming shortly after (need to update a few tests in PyMC3)
- Merging `v4` into `master`, so we can finally un-block lots of other PRs
- Rename of `master` → `main`
- …followed by
- Finalizing distribution refactors
- Documentation updates
- Beta Release of PyMC3 v4.0
- Summer internships updates
- Google Season of Docs
- We are going to have versioned docs, if for no other reason than PyMC3 v3 final release
- Martina gave a great overview/plan for organizing the Docs. Could we put a link here to her spreadsheet?
- Google Summer of Code 2021
- Outreachy
- Next journal club (call for papers)
- PyMC3/Aesara discussion
- GLM submodule deprecation
- Status of v4 branch
- Need to resolve "Merge v4 into master” milestone blockers ASAP
- Timeline/roadmap
- Summer internships updates
- Google Season of Docs
- Review of candidates
- Timeline for project
- Switch to automated ReadTheDocs build beforehand?
- Google Summer of Code 2021
- Outreachy
- Aesara milestones
- Merge v4 into master milestone
- V4.1 milestone
- We can add labels to issue tickets to indicate which ones are blocking the above
- v4 update
- Brandon’s been taking a break to work on Numba
- Most of the operators are converted
- What’s been set up recently is taking the python implementations and running them in Numba without needing to map or write anything
- Not necessarily any speed gains, just demonstrating the Numba system meets all our needs
- The C based compilation allows C and python to run together
- We can get rid of all of the function stuff and just use Numba and call that within Numba
- All things that have C implementations are going to have Numba implementations
- MLIR is the basis for XLA, JAX
- Numba works by compiling to LLVM IR, but a specialized version that compiles ML stuff
- Working with Aesara PyMC combo is a bit hard
- v4 merge into Master planning
- Things blocking
- Integer casting issue in observations
- Size Shape PR
- After this we can merge and do an alpha release
- Ricardo’s cool work
- Just like we compose x+y we can compose distributions
- like a GRW: we just take two Gaussians and cumulative-sum them together
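The Gaussian random walk construction mentioned above can be sketched numerically. This uses plain NumPy rather than PyMC or Aesara, and the variable names and scales are illustrative assumptions: one Gaussian supplies the starting point, a second supplies the step innovations, and the walk is their cumulative sum.

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps = 500

# First Gaussian: the (diffuse) starting value of the walk.
init = rng.normal(loc=0.0, scale=10.0)
# Second Gaussian: independent innovations for each step.
innovations = rng.normal(loc=0.0, scale=1.0, size=n_steps)

# The random walk is the cumulative sum of the start and the innovations.
walk = np.cumsum(np.concatenate(([init], innovations)))

print(walk.shape)
print(np.allclose(np.diff(walk), innovations))
```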
- Google Season of Docs
- Deadline is May the 17th
- Go through the dropbox links
- PyMC3/Aesara discussion
- Aesara distributions - are we already feature-complete w.r.t. numpy.random dists?
- → Ravin will take a look
- Status of v4 branch
- New imputation mechanism was recently merged - now realized through masking instead of creating new tensors.
- Subtensor indexing now works for log-likelihoods - this allows, for example to do switching between different priors (like in a model comparison) using indexing. There are other ways to achieve the same mathematically, but the indexing is more elegant. (We should probably write a demo notebook for the gallery! Also this is something few other PPLs can do.)
- Outstanding tasks
- Brandon mentioned something about having to drop jax? → We don’t have to drop JAX. Some new features just won’t work with new JAX versions.
- https://smartinference.slack.com/archives/CCU6WGM5L/p1617310119041300
- Someone needs to dig into JAX and understand what’s going on below omnistaging
- We’re uncertain about how omnistaging affects models, or which ones are affected
- It might just be new dynamic indexing functionality that is affected
- We do want all features to be supported
- In the tests that are currently failing, it seems the value should already have been a constant by that point
- Regardless no tests have been completely disabled at this moment
- Numba backend is looking promising
- Shape/dims/size discussion
- We all agree that “size” is internally the right way to think about RV dimensionality/broadcasting.
- We all agree that “dims” are much more useful than “shape”.
- We’ll implement dedicated backwards compat. functions for the RV so we can keep supporting existing implementations
- DensityDist, Mixture, Bounds, Variational inference are the biggest not-yet-dealt-with things in v4
- For DensityDist we’ll reduce the supported API to something reasonably simple.
- VI internals are still broken
- intX problem (PR by Max) about downcasting
- Mixture - Brandon will probably give inputs here
- Branches: We want to rename master→main and merge v4→main.
- We’ll need a “Guide to v4” → Ricardo will work on this based on API Quickstart notebook
- We strongly favor building the docs with Readthedocs (it can be done) so their build is automated and we can have versioned docs. This is something we should get done before Google Season of Docs already.
- Summer internships updates
- Google Summer of Code 2021
- Outreachy
- Google Season of Docs
- PyMC3/Aesara discussion
- Project status updates
- Documentation, Maintenance & Bugs
- Summer internships updates
- Google Summer of Code 2021
- Outreachy
- Google Season of Docs
- Funding
- CZI “Essential Open Source Software for Science”
- Journal Club
- Next meeting, March 19
We would like to merge v4 into master so that it gets more attention, but there is concern that it would destabilize master.
Perhaps it would be a good idea to create a `v3_stable` branch that people could use if they need stability (and for bugfixes), then go ahead and merge v4 into master.
Do GSoC mentors have to be PyMC project devs?
- PyMC3/Aesara discussion
- Status update
- PyMC3 3.11.1 release
- Upgrade of PyMC3 backend from Theano-PyMC to Aesara
- RandomVariable PR
- We’ll switch to Semantic Versioning starting in 4.0
- Documentation, Maintenance & Bugs
- Status update
- Summer internships
- Google Summer of Code 2021
- Outreachy
- Google Season of Docs?
- Funding
- Update on NASA/ROSES grant
- Journal Club
- Paper suggestions?
- Will have further discussion of “how much” semantic versioning we will use
- Marco G points out that it will mean maintaining branches for each major/minor/bugfix version
- this adds maintainer work, but probably not contributor work
- Also a question of how long we port back bug-fixes to old versions
- Seems like there’s a consensus that it is too much work to maintain things that are not at HEAD
- [editor] I’m a little behind, apologies if I missed the counterpoint here
- Doc/maint/bugs
- Michael O may be able to run all the notebooks and ping on the stuff that is not passing for 3.11.1
- Aesara
- There’s a random variable PR that’s quite big. It is coming along pretty quickly.
- Check out bullet points here and can start making PRs to the 4.0 branch
- Should we make v4 the master branch?
- Brandon says we might want to wait until “majority passes” instead of “majority fails”
- Lots of distributions need random variable operators - pretty straightforward, but a little bit of effort.
- How do we make sure nobody makes PRs that work on old version, but not v4?
- Let’s definitely communicate quickly with new PRs about the changes
- Add notice to pull request template (should @ tag someone here)
- Question: why dispatch pattern rather than OOP?
- [Arguably] cleaner
- Hard to annotate nodes with methods. Can work, but it is a kludge.
- The stuff that is carried around is in the `.tag`
- Should update to use aesara’s printing subsystem for graphviz stuff
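The dispatch-versus-OOP answer above can be sketched with the standard library's `functools.singledispatch`. This only illustrates the pattern and is not Aesara's actual code: the node classes `Normal` and `Exponential` and the function `logp` are made-up stand-ins.

```python
import math
from dataclasses import dataclass
from functools import singledispatch


# Hypothetical stand-ins for graph node types; they carry data but no methods.
@dataclass
class Normal:
    mu: float
    sigma: float


@dataclass
class Exponential:
    rate: float


@singledispatch
def logp(dist, value):
    """Fallback used when no implementation is registered for this node type."""
    raise NotImplementedError(f"No logp registered for {type(dist).__name__}")


@logp.register
def _(dist: Normal, value):
    # Log-density of a normal distribution at `value`.
    z = (value - dist.mu) / dist.sigma
    return -0.5 * z * z - math.log(dist.sigma) - 0.5 * math.log(2 * math.pi)


@logp.register
def _(dist: Exponential, value):
    # Log-density of an exponential distribution; zero density below 0.
    if value < 0:
        return -math.inf
    return math.log(dist.rate) - dist.rate * value
```

New behaviour is added by registering another function for a node type, so the node classes themselves never need to be annotated with methods.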
- GSoC
- List of ideas here
- We should perhaps apply to Outreachy
- GSoC is half the time of coding, so projects are smaller
- Outreachy focuses on underrepresented groups, still 3 months
- Costs $6500 to do Outreachy (price of 1 intern, but gives access to possibly more than 1)
- Check (with @Chris F ) if we have the money
- GSoD is later in the year, but good idea
- Can create a wiki page (@Sayam K?) with ideas to add to until then
- NASA/ROSES
- Stan/ArviZ in on grant
- Takes ~180 days to decide. Let’s see in November/December!
- Is there contract negotiation?
- NumFOCUS gets some money, and will hopefully handle haggling with NASA (in case this comes out well 😄 )
- Ask Ravin if you want to see the grant!
- Papers
- Lots of people have papers almost out, but there are robots to maintain in the meantime
- PyMC3/aesara discussion
- Current Topics: Documentation, Maintenance & Bugs
- Documentation migration: Notebooks are now a submodule and live in /pymc-examples instead. → We should put a pointer into the Readme
- This is now done. README needs a bit of cleanup. Marco G is working on this.
- MacOS performance/installation problems → We should improve our install instructions (in the Readme) and explicitly tell people to install dependencies through conda before using pip install on Theano/PyMC3
- Make these installation instructions clearer and adapt them to the OS: one Wiki page for each OS install and link those in the README
- Release Roadmap
- Theano-PyMC 1.1.0 (imminent)
- Currently, PyMC 3.10.0 only works with Theano-PyMC <= 1.0.12
- `PureOp` and `Op` were renamed to `Op` and `COp` respectively, because you can have an `Op` which doesn’t have a C implementation → every custom `Op` that someone wrote will now have to inherit from `Op` instead of `PureOp`
- The linkers were all moved to their own module
- The `gof` module is going to be renamed to `graph` → easy to migrate to though, not a lot of breaking changes for users → have to continue pinning Theano-PyMC in PyMC3
- PyMC3 3.11.0 (depends on the above)
- Will be pinned to Theano-PyMC 1.1.0, while PyMC 3.10.0 will stay pinned to Theano-PyMC 1.0.12
- Criteria and Timing of Aesara 2.0
- We should define such a date soon, while continuing the refactoring of the non-public code without breaking user-facing stuff.
- We can continue to make major improvements like graphs rewrites in Aesara 2.1.0
- `RandomVariable` is now merged in Theano-PyMC, so we can already migrate PyMC to using it, as well as getting rid of the forward sampling code.
- Open issues on the PyMC3 repo to start this transition
- Pretty big investment, need more than 1 person doing it
- 100% a PyMC3 API refactor
- Need to replace each distribution’s `.random` method by `RandomVariable`, including everything that uses this machinery (i.e. forward sampling methods).
- Can take a look at this blog post, but we probably need a minimal reproducible example of this transition to kickstart developers’ work?
- This is actually a refactor of `pm.Model` and distributions’ `random` at the same time.
- Once this PR is done, it should be quite easy for people to implement the little changes that come out from this big change. → We should make this change as soon as possible, so that PyMC3 4.0.0 is the release directly following 3.11
- See the timeline document for more information
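The pinning rules in the release roadmap above amount to comparing version numbers component by component. A rough standard-library sketch follows; the helper names are made up, it only handles plain `major.minor.patch` strings, and real projects would normally rely on their packaging tool's requirement specifiers instead.

```python
def parse_version(text):
    """Split a plain 'major.minor.patch' string into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def satisfies_upper_bound(installed, upper_bound):
    """Return True if installed <= upper_bound, compared numerically."""
    return parse_version(installed) <= parse_version(upper_bound)


# A pin of the form "Theano-PyMC <= 1.0.12", as mentioned in the roadmap.
ok_old = satisfies_upper_bound("1.0.9", "1.0.12")
ok_new = satisfies_upper_bound("1.1.0", "1.0.12")
```

Comparing tuples of integers rather than raw strings matters here: as strings, `"1.0.9"` sorts after `"1.0.12"`, which would give the wrong answer for the pin.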
- Community Updates
- Collaborative Sampling Library? —> postponed to next meeting, to get people who are able to update us on that
- Apply to Outreachy?
- Summary: Outreachy is a GSoC-like program targeted at underrepresented minorities. Moreover, unlike the next GSoC, which has cut the coding period to half of what it used to be, Outreachy internships are still 3 months.
- Cost: $6500.
- Funding
- Update on NASA/ROSES grant
- Journal Club next Friday (2021-01-15)
- Paper suggestions?
- PyMC3/aesara discussion
- Development updates
- 3.10 release is very close
- need to fix the recursion problem with DensityDist (it’s a regression w.r.t. 3.9.3)
- more?
- Dev speed on Aesara is going up, with more people helping out, but we still need more.
- What happens after 3.10 (roadmap to 4.0.0)
- Shape handling done right!
- Changing defaults in the API
- InferenceData all the way: we have PRs to support it for prior/posterior predictive and so on. 4.0.0 is the chance to make it the new default.
- Documentation overhaul! There’s some structure to be fixed. Getting rid of warnings, maybe moving to a new docs theme? Similarity with Aesara?
- Dependency on Aesara v2.0.0 - this means we should push Theano-PyMC/Aesara on a fast dev cycle to get it ready!
- Collaborative Sampling Library
- Meeting with Remi/Neeraj/Du Phan
- Funding
- NASA/ROSES grant
- Should https://github.com/aseyboldt/sunode be included, especially w.r.t. satellites?
- Journal Club next Friday
- Bayesian Workflow Paper: https://arxiv.org/abs/2011.01808
- Presented by Ravin, who has read all 77 pages, thoroughly.
- PyMC3/Theano-JAX discussion
- Variational inference
- what are the implications for VI?
- Project organization
- Theano-PyMC name
- Two proposals for new name: Jeanet and Aesara
- Gonna be used for next major version (4.0), which will use new RandomVariable op (and all the nice benefits it has on dynamic shape handling and forward sampling functions)
- Renaming PyMC4 work (PyMC-TFP?)
- Roadmap
- Samplers
- Use TFP / Numpyro / MCX jaxified samplers thanks to JAX linker → for 3.10
- Writing pure Theano samplers would be included in next major version, e.g 4.0
- Funding
- NASA ROSES grant
- Use of grant funds
- Writing 3-page proposal
- Joint submission with Stan?
- And Arviz
- Discussions already happening between ArviZ and Stan
- Next step: setup common discussions with ArviZ, Stan and PyMC folks
- Use of NumFOCUS funds for student projects
- Documentation
- Feedback from PyMCon
- docs.pymc.io vs. pymc3.readthedocs.io
- Moving notebooks out of main repository
- Move it to a new dedicated repo, with sub-directories for topics
- Build automation
- Release new version
- PyMCon: What did people think?
- Takeaways for next time (while we still remember them)
- Please fill out this retrospective if you get a chance
- Theano
- Should we rename Theano PYMC?
- Yes
- Do we have people to help?
- Ravin will jump in
- Osvaldo knows some people
- What about the samplers?
- Osvaldo mentioned there may be issues
- Thomas says we just rely on other people’s samplers for the time being
- Those samplers come from MCX, NumPyro, and TFP
- Thomas said the whole thing doesn’t need to be theanofied but just parts of it
- The end goal is the whole sampler is theano, but that is difficult
- The key insight is that we used to think of Theano as a fixed set of ops, like Lego bricks from which you can only build certain things
- If you want to build a NUTS sampler you can't build new pieces
- The key understanding now is that, since we own Theano, we can build our own custom ops in JAX and assemble things that way
- Chris: low-hanging fruit is to update the documentation to make it more user friendly
- Ravin: can we start deleting chunks of Theano?
- Chris: Yes, provided it’s modular
- Max: What are implications for VI?
- Random Ops not working for now
- `NotImplementedError: No JAX conversion for the given Op: RandomFunction{normal}`
- It is a blocker for VI
- Chris: What are the things we’re doing for 3.10?
- Thomas: Just JAX-based samplers that work under the JAX backend
- Chris: Would anything break?
- Max: He thinks so, because Theano relies on buffers and stuff
- Also Max: my concern about shared variables is no longer a concern
- Random Variable explainer
- For now, instead of random variables we have placeholders
- Forward sampling in PyMC3 is ridiculously complicated, RandomOps will mean we can delete the forward sampling code
- Theano-JAX discussion
- Consensus: We will focus our efforts on the new backend (currently “theano-pymc”) which has already become (on master) and will remain the backend for our library (“PyMC3”).
- Codebase roadmap:
- pymc-devs/pymc3 → this is and will remain our library
- current 3.9.3
- next 3.10.0 (will use theano-pymc as the backend)
- 4.0 (will use re-named backend and may have RandomVariable if it works out)
- pymc-devs/pymc4 → this was an experimental project that we’re not going to continue
- Message is that PyMC4 with TFP backend is not developed anymore — although we learned a lot with it and got great ideas — but the future is PyMC3 with the amazing, new Theano-JAX backend.
- Proposal: move to Theano-JAX as next PyMC backend
- General agreement + Theano-PyMC (because JAX doesn’t yet support Windows)
- We’re retaining Theano as the backend and adding JAX as compiler, which is a new feature and allows the use of new, vastly faster samplers. But current compilers (C through classic Theano) and PyMC3 samplers are still there, as well as the PyMC3 modeling API.
TABLED Proposal: Rename Theano-PyMC and move it to its own repo, instead of being a fork of Theano.
1. jeanet? jeano? pymc-backend?
2. Aesara? → daughter of Theano ;)
3. Renaming: gives a clear message that this is a new Theano and that PyMC3 is not dead
4. Not renaming: conveys continuity and avoids confusion for users
5. Consensus for renaming, although we won’t choose the name today.
6. Timing of the renaming? This means a different entry on PyPI and different import names. Renaming will be for the next major version, so that PyMC3 will always work with a backend called Theano. And then PyMC4 will run on Aesara / Jeanet, whatever name we come up with for Theano-PyMC
5. PyMC samplers → Not discussed
1. use exclusively our own
2. use exclusively third-party samplers
3. have both (but with the "commitment" of developing and improving our own)
6. Variational inference → Not discussed
1. what are the implications for VI?
7. Project organization → Not discussed
8. Follow-up tasks
1. Prepare statement about PyMC3/PyMC4 future
2. The roadmap on the Wiki should be updated based on the decisions above
- Developer meeting attendance
- Who is permitted to attend lab meetings?
- Should be open if we can → live-streams?
- We have the Discourse for public discussions, which will be even more important with Discourse. Update the README with link to discourse https://github.com/pymc-devs/pymc3/issues/4154
- The current way of waiting to observe people making consistent, useful PRs works well.
- The monthly meetings will remain private, insofar as they are purely useful for developers of the project and aren’t even a requirement of our project.
- Developing explicit criteria for (1) meetings, (2) Slack, and (3) repository access
- Do we need a public meeting or regular blog post to update user base?
- PyMCon updates
- Speakers announced and registration is open. Tell all your friends
- Really could use volunteers to help out with critical and day to day tasks
- Either direct volunteer help from people here or reaching out in your network for reliable people
- Async tasks prior to conference, day of volunteer help
- An abbreviated task list is here: https://docs.google.com/spreadsheets/d/1g0rQs4yxSE_ILk1d6fPlxNUmtk_sOyL0RUn1vVfjU1U/edit#gid=0
- Join the slack
- Announce, with a talk, pymc4 (the theano-pymc one)
- Ravin will touch on this topic during PyMC Welcome talk
- How will PyMCon income be used?
- Note from Ravin: Can we punt this to next lab meeting? Let’s make the money first before deciding how to spend it
- Hire one (or more) intern(s) to work on PyMC4 via outreachy
- setting up gpu testing for theano-pymc
- Ravin would like to buy tshirts or other swag for organizers
- PyMC3 Development Updates
- Increase in issues with parallel processing on different platforms: is this a coincidence or could this be a bug?
- Adrian suspects a bug in OpenBLAS that causes deadlocks with worker threads
- Models must be pickled on Windows AND OSX starting with Python 3.8 (on OSX the default changed from “fork” to “spawn”, but can be changed back manually via a new kwarg - see PR #3991)
- Adrian did a PR that improves the error message when pickling fails: https://github.com/pymc-devs/pymc3/pull/3991
- Most issues seem to be related to, or fixable by, how users implement things in notebooks. We’ll create a Wiki page with known problems and workarounds
- Should we aim for a virtual hackathon preceded by issue review sometime?
- Issue with list of NBs needing manual style update: https://github.com/pymc-devs/pymc3/issues/3959
- InferenceData/posterior predictive related
- Any big new feature targets?
- MLDA coming soon
- pm.DifferentialEquation move to sunode?
- Theano-JAX / symbolic PyMC discussion → deferred to next meeting
- Communicate collectively around PyMC being in Tidelift subscription now
- Set up dates and process for annual fund-raisers (Bayes Days, Laplace Day, etc.)
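The pickling requirement noted above comes from the `spawn` start method: each worker is a fresh interpreter, so whatever is sent to it must survive `pickle`. The standard-library sketch below assumes nothing about PyMC3 internals (the helpers `is_picklable`, `square` and `map_with_spawn` are made up for illustration). It shows why objects defined inline in a notebook, such as lambdas, are a common source of these failures.

```python
import multiprocessing as mp
import pickle


def is_picklable(obj):
    """Return True if obj survives pickling, which spawn-based workers require."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


def square(x):
    # A module-level function is pickled by reference, so workers can import it.
    return x * x


def map_with_spawn(func, items, processes=2):
    """Run func over items in worker processes started with 'spawn'.

    Requests the start method explicitly instead of relying on the
    platform default.
    """
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes) as pool:
        return pool.map(func, items)
```

In a script, `map_with_spawn(square, [1, 2, 3])` should be called from under an `if __name__ == "__main__":` guard, because spawned workers re-import the main module. `is_picklable(lambda x: x * x)` returns `False`, which is the kind of failure that notebook-local objects can trigger.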
- Theano-PyMC Development Updates
- Adding Theano-PyMC to Tidelift
- PyMCon
- We got a lot of great proposals!!
- Will need volunteers to run the conference
- Please volunteer or help find volunteers
- Can we get 1 or 2 people to help with sprints?
- Will be opening attendee registration this month
- Oriol and Alex got Tidelift to sponsor PyMCon!
- Things will really pick up by next Lab meeting
- ArviZ is now NumFOCUS sponsored
- Journal club
- September paper?
- Google Summer of Code 2020
- General updates
- PyMC3 Development
- Fixing the test env (we can’t merge PRs right now): candidate fixing PR
- Compile docs to 3.9.3 (currently on 3.8)
- Communication of releases
- Theano 1.0.5
- PyMC 3.9.3
- How do we proceed with the MLDA PR?
- Theano-PyMC Development
- PyMCon
- Help recruit volunteers, particularly a technology chair
- Share CFP with 2 people you know
- Refer sponsors (The money will go to PyMC)
- Journal club
- Next meeting Aug. 14: any topic proposals?
- Google Summer of Code 2020
- General updates
- PyMC3 development
- When do we drop py36? (see PR)
- Migration of `from_pymc3` from ArviZ to PyMC code base
- Theano fork releases
- Symjax developments / Symbolic PyMC integration
- PyMC4 development updates
- Journal club
- Next meeting July 10
- https://arxiv.org/pdf/2004.12550.pdf
- Google Summer of Code 2020
- schedule (bi)weekly meetings to sync up
- push for blogposts (blog seems to be now a NumFOCUS requirement: https://github.com/numfocus/gsoc/blob/master/CONTRIBUTING-students.md#during-the-summer)
- Tidelift
- Versioning
- introducing 3.x.x pattern
- Eric & Michael will try to increase release automation
- major ~quarterly
- bugfixes ~monthly & by priority
- PyMC4 will be 4.x.x
- PyMC3 development updates
- 3.9 release
- Lots of `ValueError: Mass matrix contains zeros on the diagonal` on master. Is `jitter+adapt_diag` more unstable than on 3.8? See here for instance, or lots of questions on Discourse.
- Few instances of this error in the last two weeks, either on GitHub or Discourse. Good sign? Maybe it was just a false positive?
- Wait for 3.9 and see if there really is a problem.
- Current workaround (changing initialization method to `adapt_diag`) is not ideal insofar as it makes users lose some of the benefits of running parallel chains.
- New option to return `InferenceData` through `pm.sample` (merged). Set this as the default returned type in 3.1?.0?
- Release the feature silently in 3.9 so that we can all test it internally
- Implement `FutureWarning` in 3.10 and also potentially in 3.11
- Set `InferenceData` as the new default in 3.11 or 3.12 (~end of year)
- Theano fork releases?
- Fix index issue, especially because Numpy will break one day.
- Release with major PyMC version: 3.9? 3.10?
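The staged rollout of the InferenceData default described above (silent opt-in, then a FutureWarning, then a new default) is a common deprecation pattern. Below is a minimal sketch of the warning stage, using a made-up `sample_sketch` function with placeholder return values rather than the real `pm.sample` implementation. The keyword name `return_inferencedata` is an assumption made for this illustration.

```python
import warnings


def sample_sketch(draws=1000, return_inferencedata=None):
    """Stand-in sampler showing how a default can be changed in stages.

    Hypothetical illustration only: the real sampler and its return types
    are replaced by placeholder strings.
    """
    if return_inferencedata is None:
        # Warning stage: the old behaviour is kept, but users who did not
        # choose explicitly are told that the default will change.
        warnings.warn(
            "The default return type will change to InferenceData in a "
            "future release; pass return_inferencedata=True or False to "
            "silence this warning.",
            FutureWarning,
            stacklevel=2,
        )
        return_inferencedata = False
    return "InferenceData" if return_inferencedata else "MultiTrace"
```

Using `None` as the sentinel default lets the function tell "the user did not choose" apart from an explicit `False`, so only undecided callers see the warning. Flipping the default later is then a one-line change.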
- Testing
- do we need to figure out how to further decompose our testing as it grows?
- how can we test to reduce breakage between ArviZ and PyMC3?
- Migrate `from_pymc3` from ArviZ to PyMC code base, to ease bug fixes and new features development. This would also make `InferenceData` native to PyMC3, as it is in PyMC4
- Necessity to test against latest ArviZ release and ArviZ master in CI
- PyMC4 development updates
- From Slack discussion: is TFP a good backend, or SymJAX? (SymJAX meeting was rescheduled)
- PyMC4 logo design
Move to Google Meet for future meetings
- Google Summer of Code 2020
- Waiting for NumFOCUS to make decisions. Nothing to be done until the students have acceptance notes.
- Tidelift
- Waiting for more information from Chris about Tidelift — Would it be for funding?
- Returning `InferenceData` from `sample`
Pain points for deploying model:
- Parallelism: how do we assure ourselves that we are using all the resources of a particular high performance computing environment? Do we want multiple cores? Multiple chains? GPUs?
- Decomposing operations onto a pipeline is difficult. Saving and reloading traces is a problem. Hence interest in replacing `MultiTrace` with `InferenceData`. No decision on that was taken during the meeting → save it for a future lab meeting. See discussion on GitHub.
- PyMC3 dependency issues
- Matplotlib / Seaborn conflicts (issue #3849)
- Big pain point. Include fix in next release. Alex will pin Matplotlib (<= 3.0.0 ?) to the appropriate release in dependencies.
- Updating theano
- Not discussed —> save it for future lab meeting.
- Synchronizing PyMC3 and ArviZ versions
- ArviZ already pinned to 0.7 in dependencies. PyMC stats and diagnostics docs should link to ArviZ’s. Alex will see how to do that with Colin.
- PyMC4 development updates
- Issue and PR review (as time permits)
- Initialization seems more sensitive on the master branch, making `init="adapt_diag"` and using `testvals` very often necessary. Where can it come from? What changed on the master branch? Robert will try to investigate this through the test suite.
- Miscellaneous
- Kevin Systrom (Instagram founder) engaging with PyMC — see https://rt.live. Other interest from health care organizations. Big enthusiasm from PyMC devs to keep this momentum going.
- Cutting 3.9 release soon? Adopting faster release cycles to reduce the gap between master and the latest released version? This would also make the docs and notebooks gallery more up to date with current functionalities. Would a 6-month release cycle be appropriate? Or doing patch releases? No decision on that was taken during the meeting → save it for a future lab meeting.
- Several Wiki pages seem out of date. Michael will review and modify or delete them.
- Google Summer of Code 2020
- Need to start a document identifying students
- Summary of applicants and the level of student interest in each project so we can start a selection
- Variational Inference.
- Students are reaching out to Maxim directly
- Forking Theano
- What are rules for merging things?
- Is there a general roadmap of things we should do?
- Fixing tests/Getting CI running?
- Porting over PRs from original Theano repo that look good
- Not sure of best path forward, everyone contributes to this
- Mentioned that there are things we can do.
- Colin has a big PR on original repository to the fork (https://github.com/Theano/Theano/pull/6729)
- Should merge the ones we opened into our own fork
- Add a github template informing folks where things will go
- There’s a few that Brandon opened that added graph
- Move all of our PRs over to the new repo
- Drop python 2 support and related code path (Ask Colin on this one)
- Hold off on making big code changes until we merge small prs
- Fix tests first? Or pull in Colin's test too much?
- One possible path
- Transfer all our pull requests first
- Comb through issues and other PRs on original theano repository and see what’s interesting
- Get CI working and come up with a plan
- Merging contributor attribution
- Should be fixed today 3/6
- PyMC4 development updates
- GP PR
- Of those on the call, no one has had time to look at the PR
- Still discussing if Gaussian process is necessary
- PR is trying to recreate an API similar to PyMC3’s
- But there’s pros and cons for this
- Chris is saying GP is hard to use because arrays need to be set ahead of time
- If we can come up with a nice API without a lot of code
- What pull requests in TFP main repo would be useful in PyMC4
- For example, compound step is one of those
- Not sure how to proceed with that PR, whether it goes into PyMC4 or Compound Step
- Better to leave this compound step on PyMC4, Junpeng is not confident they’ll add sampler
- Compound step is controversial because it could be construed as inefficient
- Would be a good Google Summer of Code project
- PyMC3 PRs
- Several PRs are coming from us
- Do something like a PR bash in the time coming
- See if we can get those PRs merged
- Go through PRs and label bugfix or feature
- Focus on getting our fork of Theano first before clearing through PyMC3
- There’s a few Theano ops we never merged into Theano so these could be good to push up
- Let’s focus on getting our own fork working. Give some new momentum on PyMC development
- PR for vectorized posterior predictive
- Robert has permission from Chris to just merge it
- Bug review would be good
- Try and do bug bash before GSoC
- Issue and PR review (as time permits)
- Google Summer of Code 2020
- Review of deadlines
- Project list review
- Student candidates
- NumFOCUS Open Source Sprints
- Bloomberg, April 4-5 in NYC
- Robert, maybe
- May 2-3 in London
- June 15 as part of PyData Amsterdam.
- Aug 20 something PyDataLA
- Ravin
- Forking Theano: see Notes.
- Vectorized posterior predictive sampling: clean up the context handling, and then merge in the current form, keeping the separate entry point: fast_sample_posterior_predictive
- PyMC4 development updates
- Proposed roadmap from Luciano +PyMC4 roadmap
- Issue and PR review (as time permits)
- Thomas forked Theano to Theano-PyMC. We should look to prepare a release and make it a PyMC3 dependency.
- Might be good to have a mass matrix adaptation project for GSoC
- PyMC4 development updates
- Issue and PR review
- Observations vs. predictors (explanatory variables)
- PyMC3 default mass matrix adaptation
Observations vs predictors:
- Can we update weight variables in a linear regression example using `pm.Data`?
- Should there be a notion of `pm.Constant`, to distinguish from `pm.Data` that goes into `observed=`?
- If there are deterministics that depend on `pm.Data` or `tt.shared` variables, this breaks the posterior predictive checks: interaction of model and trace is hard for users to reason about.
- ArviZ needs a “blessed” reference to the model instead of using a private attribute, like it does now.
- This also helps in the use case where you have 2 models, one for inference, one for prediction, and want to be explicit about which to “bind to”
Agenda
- PyMC4 development updates
- deterministic remains a blocker
- make a PR to tfp (possibly in `experimental`)
- context vs yield
- yield seems nice
- “Example driven development” can be useful
- translate notebooks, even if they don’t work
- NumFOCUS Summit
- PyMC3 release
- ODE PR (https://github.com/pymc-devs/pymc3/pull/3634)
- Need review, nothing is blocking the PR (Michael is finishing his edits today)
- What to do with notebooks?
- Add big note to RELEASE_NOTES.md
- 3.8, try for Monday morning
- State of PyMC talk at PyData NYC @ 2:30pm Monday
- NIPS presentation
- will be presented by Thomas and Maxim
- comments welcome for the poster
- prepare answers for tough questions (reviewer comments etc)
- FAQ at the poster itself
- other goals:
- meet with other PPL developers (pyro/numpyro)
- Xarray for Bayesians Examples/Challenges
- PyDataLA Sprints
- Help tag issues that are small and beginner friendly when you see them!
- Will be on December 3rd
No meeting held
- GSoC 2019 Wrap-up
- Joseph
- Demetri
- Martin
- Aniruddha
- Oriol
- PyMC4 development updates (Will discuss in next meeting)
- Trademark protection (Will discuss in next meeting)
Worked on Symbolic PyMC project
BaseMetaSymbol class was refactored to include a dispatcher
Why a gibbs sampler?
That was part of the proposal and trying to move towards it
How does graph printing work?
How does it work differently than tensorboard?
Is there an example of where we take a PyMC3 model and get a graph?
Yes, and Joseph has it working against a PyMC4 model as well
Adding Differential Equation to PyMC4
The usual way to fit the parameters is to solve a least-squares problem
Modeled after Stan’s functionality
Potentially would like to add functionality where user can hot swap gradients
Adrian - The solver might not be the cause of the slowness; Theano call overhead might be the issue.
We could put in numba functions instead of python functions
TensorFlow or JAX compile functions but only give us wrapper functions, so we don’t always get the underlying function
Might want to look at the library Adrian is writing
Another thing we can also look at is surrogate models: if we have a relatively small system, we could find different functions that approximate the solution of the ODE, like a polynomial chaos expansion. PyMC would only see the surrogate model
PyMC could work on the surrogate model
Colin - Are there notebooks on how to use this?
Yes, it’s in the PR
Osvaldo, Martin and Austin worked on Bayesian Additive Regression Trees (BART)
Not working fully at the moment but the idea is to grow the tree in a Bayesian way looking at the data
Started out with a non PyMC implementation and then are working with Osvaldo to start implementing it into PyMC3 model
Working on a more flexible way to set priors currently
Thomas - Are the split values probabilistic or discrete? Taken from a uniform distribution.
Do we get posteriors over the split values?
No
Is there a notebook or code example?
The model isn’t quite working yet but there is a notebook showing progress
Junpeng - Any areas of difficulty we can help with?
Talked to Osvaldo and Austin, and in the beginning we had a problem replicating one of the priors, but we do have a notebook
Two problems - One was PyMC codebase onboarding for Martin.
In a month we will likely have a model that works internally
Find slow ArviZ places, and optimize with Numba
Profiled each and every line of code with a line profiler, removed the bottlenecks with Numba, and then benchmarked the improvement
~30% improvement in stats and diagnostics
Next were the plots. Most plots rely on fast_kde, so optimizing it speeds up many plots
The last thing being worked on was ahead-of-time compilation with Numba; should be able to finish it this month or so
Users don’t have to do anything to use this work, it’s enabled automatically. Everything is described in detail in the blog post
Junpeng
Was this tried on a cloud platform like colab?
Aniruddha: did not try anything across platforms, but Numba is designed to work across platforms, so it should work
Project was focused on Information Criteria. Check whether an MCMC run has converged or not
Got that done and additional diagnostic functionality as well
Resampling is holding up a diagnostic that is possible in PyStan. An issue is open
Second assessment is convergence assessment
Implemented additional functionality that Aki published in a recent paper
Third section was Model Checking
Added LOO-PIT to ArviZ along with plotting
- GSoC 2019
- Status reports, if available
- ode assimulo
- PyMC4 Development Updates
- Progress blockers
- PyMC3 Development Issues
- Variable names regexp
- Next journal club
-
PyMC4 Updates
- Can define Logp function
- No samplers
- Other distributions
- Max finds himself not having that much expertise in samplers so he can’t solve core path
- Could use Generalize and modularize the backend API
- Prior predictive samples are correct
- Static graph versus dynamic graph was the issue but it seems to have been resolved
- PyMC will have a small sampler meeting first
- In a week or two weeks
-
PyMC3 Update
- Should the strings for varnames be restricted?
- I [RPG] believe the sense of the group was that arviz dims could be limited to acceptable python variable names, since they are relative newcomers, but that restricting variable names might break too much legacy code. We mentioned that spaces in variable names were commonly used as was unicode, especially for Greek letters. [new note: it might be nice to have something that could be interpretable as superscript or subscript, like TeX’s “^” and “_” — underscore being problematic for python, since it’s so commonly used as a substitute for “ “.]
- Should the SMC portion be removed from sampler because it adds so many new keywords
- SMC is never an automatically chosen method
- The decision to do SMC is a big conscious step
- SMC and SMC/ABC are already combined
- The parallelization is a completely different sample size
- Might have a positive effect of seeing SMC because it’s more visible
- Osvaldo will move SMC (and SMC-ABC) to sample_smc
- No effort was made to check that all the step functions were added
- Possible that some may have been added and not documented
- Any lingering issues that need to be addressed?
- Categorical Memory Error: https://github.com/pymc-devs/pymc3/issues/3566
- Next Journal club,
- We also want to have a sampling meeting so we could skip Journal Club
- Any other non agenda items?
- None
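As a rough illustration of the "acceptable Python variable names" rule discussed above for ArviZ dims, the check below uses only the standard library. `is_valid_dim_name` is a hypothetical helper, not an existing ArviZ or PyMC3 function.

```python
import keyword

def is_valid_dim_name(name: str) -> bool:
    """Return True if `name` could be used as a Python variable name."""
    return name.isidentifier() and not keyword.iskeyword(name)

examples = ["theta", "θ", "group_mean", "group mean", "x^2", "lambda"]
report = {name: is_valid_dim_name(name) for name in examples}
for name, ok in report.items():
    print(f"{name!r}: {'ok' if ok else 'rejected'}")
```

Under this rule, Greek letters remain valid, while names containing spaces or a TeX-style `^` would be rejected, which is why applying it to existing variable names could break legacy code.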
- GSoC 2019
- Status reports
- Symbolic pymc is going great
- Harivallabha said he’s willing to keep contributing but likely in a milder scale.
- PyMC4 Development Updates
- Observed variables
- analog of `pm.Normal.dist`
- NumFOCUS Summit (November)
- Next journal club
- Selecting the Metric in Hamiltonian Monte Carlo
- Exact Gaussian processes on a million data points
Maybe schedule a review of open PyMC3 issues at a later meeting? Possibly this should be just for a subset of people interested in the maintenance of this code base (as opposed to people who want to focus on PyMC4).
- Monthly bug hackathons are nice
- GSoC 2019
- Status reports
- Azure CI
- PyMC4 Development Updates
- Development summit follow-up
- Generator model API
- Next journal club
- Selecting the Metric in Hamiltonian Monte Carlo
- Exact Gaussian processes on a million data points
- Spring Developer Summit
- Followup call May 8
- GSoC 2019
- Final student choices
- Next steps
- PyMC4 Development Updates
- Next journal club
- Colin(?)
Developer Summit:
— Dinner with Google folk first night (28th? Might interfere with Florence and the Machine!)
— Call on May 8
— What is tf2 vs tf1 standard? Write so a graph mode could work, but don’t use session.
— Lots of ways to write a model. Will that settle down?
GSoC 2019
— Four spots instead of five. Don’t tell the students yet.
— Top four projects. Once public, (video-)meet with students
PyMC4:
— Rebump “observed” issue in `pymc4` repo
— Multiple chains is super cool. having 1k chains instead of 4 means you can do statistics instead of visually inspecting. Nice to use instead of MPI. See if stan has thought about this.
—Generators being able to “send” back is flexible and interesting. alternative to inheritance.
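A minimal sketch of the generator "send" idea, using only the standard library. The `(name, sampler)` protocol and the `run_model` driver are assumptions made for illustration; this is not the PyMC4 API.

```python
import random

def model():
    """Toy model: each `yield` hands (name, sampler) to the driver and receives a value back."""
    mu = yield ("mu", lambda rng: rng.gauss(0.0, 1.0))
    # The value sent back for "mu" is used to define the next variable.
    x = yield ("x", lambda rng: rng.gauss(mu, 0.5))
    return {"mu": mu, "x": x}

def run_model(model_fn, observed=None, seed=0):
    """Drive the generator, sending back either an observed value or a fresh draw."""
    observed = observed or {}
    rng = random.Random(seed)
    gen = model_fn()
    values = {}
    try:
        name, sampler = next(gen)  # advance to the first yield
        while True:
            value = observed[name] if name in observed else sampler(rng)
            values[name] = value
            name, sampler = gen.send(value)  # resume the model with that value
    except StopIteration as stop:
        # The generator's return value travels on the StopIteration exception.
        return values, stop.value

prior_draw, _ = run_model(model)
conditioned, returned = run_model(model, observed={"mu": 3.0})
print(prior_draw)
print(conditioned)
```

The same model function serves prior sampling and conditioning because the driver, not the model, decides what gets sent back. That is the flexibility that makes this an alternative to inheritance.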
Journal Club:
— Next Friday! “Neutra-lizing”
Other Meetings:
- Probabilistic and differentiable programming gathering in SF in June (Colin and probably Ravin)
- Scipy: Eric is doing more PyMC3. Check out the work Ravin et al. have been developing on a Bayesian workflow
- Workshop at Quantum Black in London (Thomas + Colin, maybe in the Fall)
- PyData LA cfp coming up
Hey! Let’s cut a release!
- What’s missing? Might need to rerun one last GP notebook to be good with ArviZ.
- @Thomas W has a few he might need to push
- Density regression
- 3.7!
- don’t wait for ArviZ, we’ll just have a >= version.
- Spring Developer Summit
- Confirmed dates May 28-29, Montreal
- Prep calls
- First call agenda
- Review TF2 architecture before Montreal please!
- Look at `Sample` vs `Independent`, in particular
- GSoC 2019
- Potential students (and mentors)
- Application deadline (timeline I think there’s a separate NumFOCUS deadline)
- Evaluation rubric
- PyMC4 Development Updates
- Junpeng and Josh prototype
- Talks and conferences
- Austin 19th? in NYC
- Quansight webinar today
- Eric @ PyCon
- Ravin, Colin and Eric @ SciPy
- Junpeng will be in MTV for 2 weeks early May if there is anybody around
- Next journal club
- Brandon(?)
- Let’s make sure we can still edit proposals after the 9th (when applications are due)
- if so, let’s have a meeting then, and figure out a first pass of students to give thorough feedback to
- We can have a “canned response” until then
- Spring Developer Summit
- Location (if it is in Zurich junpenglao can host)
- Dates
- GSoC 2019
- Potential Students (see notes below)
- Add yourself to the list of mentors https://goo.gl/forms/JUsVA910ED08hiip1
- PyMC4 Development Updates
- Next journal club
- Paying for Slack? Yes!
- Nightly builds for pymc3
- Tensorflow Dev Summit
- Any burning questions?
- Would attendance be helpful?
Notes
- Meeting
- Let’s figure out if London or Montreal is preferred?
- Zurich? Maybe not…
- Should do more video calls with google before then
- New Stan compiler is doing an “open weekly standup”, which google could choose to attend
- Colin might aim at helping with this starting mid-march
- GSoC 2019
- https://github.com/numfocus/gsoc/blob/master/2019/ideas-list.md
- Potential Students
- +GSoC 2019 student list
- PyMC4 progress
- Need to hook the `tfp` inference up to models
- Can also hook `pymc3` samplers up to `pymc4` models, looks at the slowdown
- 2 nice PRs: one to auto-transform variables, another to rename parameters of distributions.
- Eric will make a last change to the parameter renaming, and then we can merge
- Transform has a tiny bit of work, then should go in
- https://github.com/pymc-devs/pymc4/issues/84 Functional API by writing a Class as model
- AST parsing: Agreement in principle to test it out on PyMC4 with it being off by default
Tensorflow Developer Summit
-
No burning questions for Tensorflow devs in general for TF Summit
-
TF devs are all busy porting code from Tensorflow 1 to Tensorflow 2. Guidelines of docs of good Tensorflow 2 code
-
We should probably care about writing for tensorflow 2 instead of tensorflow 1.
-
Maybe something on graph editing?
-
New release?
- Lots of nice shape stuff
- Give it a few days for `arviz` PR
- Maybe next week or two
No meeting
- PyMC4 Development
- tf 2.0
- PyMC3 formatting standards
- Adopting new standards
- Add section to contributor’s guide?
- ArviZ update
- PyMC website update
- Greek symbol kwargs
- GSoC projects
- Next journal club
- Add tf-2.0 to requirements.txt
- where is tf-2.0 being published from? where’s the source code?
- It might be here, which looks like it is internal google code…
- GSoC — should update wiki with new ideas and keep older good ideas
- Create document outlining the case for/against using Black for formatting
- ArviZ should implement
- probably use aliases (`traceplot = az.plot_trace`)
- good that numba is optional in arviz so it isn’t pulled in as a dependency
- Website looks good!
- Greek symbols
- Attending members mostly think nothing is wrong with it
- View that “this is confusing for beginners” was presented
- It can be viewed as “an Easter egg”
- GSOC
- https://github.com/pymc-devs/pymc3/wiki/GSoC-2018-projects
- Start submitting applications to Google Jan 15
- Start thinking about projects (Feb 26 get decisions)
- Brandon Willard had some symbolic computation thoughts for PyMC3/PyMC4
- Next journal club
- https://arxiv.org/abs/1806.07366 ← “Neural Ordinary Differential Equations”
- https://arxiv.org/abs/1812.11592 ← “A Geometric Theory of Higher-Order Automatic Differentiation” 🔥 (two parts? next friday, same time for a ~30min “Differential Geometry 101”)
No meeting
- Cut a new release
- Readme and docs need to be updated
- Deployment stuff on conda-forge
- What to do about random draws https://github.com/pymc-devs/pymc3/pull/3214
- Invite Luciano to dev summit?
- PyMC4 Developers Summit
- agenda
- logistics
- tensorflow materials
- ArviZ update and PyMC3 plotting deprecation road map
- PyMC website
- Next journal club
- Hamiltonian Descent Methods (https://arxiv.org/abs/1809.05042)
- Semi-implicit variational inference (https://arxiv.org/abs/1805.11183)
- cut small release, start adding `.dev` to end of master version
- Invite Luciano to slack at least to discuss the PR
- seems to be a preference to merge soon, since it does fix the problem
- `draw_values` is doing a few jobs, which leads to problems like this and makes `.random` slow
- we might do a refactor to address that after
- some question about adding then removing a requirement
- Check out design docs for PyMC4
- Think about how much time designing vs coding
- We are probably using tensorflow 2 instead of tf1
- ArviZ seems like it could have wrappers mapping or make a strong API change
- Let’s make a PR for `traceplot` after the next release and talk it over there.
- New website prototype will get pushed later today to a branch
- Sphinx is a little hard to work with for “nice” design (need to hack both sphinx and docutils to really do that)
- Possibly have a static site that links into a sphinx-generated site?
- Journal club: a tensorflow/edward paper?
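For the plotting deprecation discussed above, one possible shape for "wrappers mapping" old names onto new ones is sketched here. `plot_trace` is a stand-in defined locally and `deprecated_alias` is a hypothetical helper; neither is the ArviZ or PyMC3 implementation.

```python
import functools
import warnings

def plot_trace(data):
    """Stand-in for the new plotting function (hypothetical placeholder)."""
    return f"trace plot of {data}"

def deprecated_alias(new_func, old_name):
    """Build a wrapper that forwards to `new_func` and warns that `old_name` is deprecated."""
    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated; use {new_func.__name__} instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not the wrapper
        )
        return new_func(*args, **kwargs)
    return wrapper

# Old name keeps working but tells users where to go.
traceplot = deprecated_alias(plot_trace, "traceplot")
```

A plain alias (`traceplot = plot_trace`) is simpler, but the wrapper gives users a migration signal before the old name is removed.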
- NumFOCUS summit debriefing
- roadmap
- governance
- infrastructure
- culture/CoC
- fund-raising
- Tensorflow backend for PyMC4
- Dustin and TFP
- Key question: how tight integration of interceptor in TFP?
- Face-to-face development meeting
- Arviz update and PyMC3 plotting deprecation road map
- Let’s get a proper website around here
- Next journal club
- Hamiltonian Descent Methods (https://arxiv.org/abs/1809.05042)
- Semi-implicit variational inference (https://arxiv.org/abs/1805.11183)
- Next journal club
- Hamiltonian Descent Methods — seems too heavy
- Semi-implicit variational inference — softly planned in 2 weeks
- NumFOCUS summit debrief
- Mo money mo problems
- Should roadmaps have dates?
- Misplaced expectations vs. estimates to work towards
- Start doing quick review at lab meeting
- PyMC4 roadmap should exist? (punt to later)
- Infrastructure desires
- Paying for CI or moving to Azure pipelines would save a lot of build time, but take upfront investment. Perhaps we can advertise (on the roadmap?!) that this is a task we’d like help with, and could support with a subscription to some CI service.
- Upgrade slack? or some other real-time discussion method?
- From Eric Schles at microsoft: “I meant to ask if you wanted funding things too? I know some folks in boston that could be helpful. Feel free to follow up over email - ericschles@gmail.com”
- In person meeting for PyMC4
- Dustin not with TfP team, but still at google.
- going to try to develop edward2 as an independent project with more control
- tfp team is more focused on internal use - distributions, for example. dustin is interested in bayesian inference.
- Sounds more interesting to follow edward2 along?
- Should have more regular meetings with TfP team - can we ask for a roadmap for edward2 and tfp?
- Another approach is to build stuff, using tensorflow probability as an open source library, and not worrying about co-operating with the developing ecosystem (note: we are not obligated to work with google, nor does anyone work for google)
- New documentation
- 99 designs has been good in the past - can get logos, css, color stuff
- don’t want it to look like a template
- squarespace etc might be cool.
- Stan has merch on theirs!
- NumFOCUS summit
- AWS resource needs (infrastructure budget)
- NumFOCUS announcement
- First face-to-face development meeting planning
- Arviz update and PyMC3 plotting deprecation road map
- PyMC4 development updates
- Type annotations
- See https://github.com/pymc-devs/pymc3/pull/3181. The PR author, rpgoldman@, contacted me some time ago on discourse about adding type checking to pymc3. He has a WIP branch (https://github.com/rpgoldman/pymc3/commits/type-stubs) and said: “Note that I’m not putting type hints into pymc3 source code right now. Instead, I am providing type stubs so that programmers who call into PyMC3 can have their code type-checked.”
- Next journal club
AWS needs:
- running notebook suite
- AWS specific or not?
- Enterprise TravisCI
- Benchmarking (add to airspeed velocity suite to directly compare GPU vs CPU on some tasks?)
Face to Face meeting:
- Firstish week of December
- 1 day of “developer sprint”
- 1/2 day of community workshop
- 1/2 day of conference style setting
- “PyMC Developer Summit”, perhaps
- Second in spring in North America (probably Mtn View, maybe Boston)
Release plan
- 3.6 release that is Py3-only and supports Arviz plotting
Type Annotations
- Start in 2019
- Helps new developers get into the code base
- Also helps users who are using the right tooling use pymc3
- Go iteratively, not all at once
Journal Club
- Let’s do a poll to not leave someone alone again 😞
- GSoC Project Updates (as availability permits)
- Agustina
- Sharan
- Bill
- NumFOCUS summit
- ArviZ update
- Google TFP meeting debrief
- First face-to-face development meeting planning (what about vacations next year?🙂)
- Next journal club
- GSoC updates:
a. Bill: cleaning up some (5!) PRs, fixing test suite to be more useful. CO2 example updated with input from climate scientist
b. Agustina: SMC-ABC is working, but brittle/hacky. Could be because SMC is changing underneath (PR is out now). https://agustinaarroyuelo.github.io/jekyll/update/2018/08/08/final_evaluation.html
c. Sharan: PyMC4 meeting happened, `defun` could be very useful, and they probably understand our problem. https://github.com/sharanry/gsoc18
- Google TFP meeting debrief
Engaged, willing to help
Still interested in eager vs. graph mode
- should we support both?
- what do they prefer?
- NUTS is only in eager mode, it seems
- If eager mode, what are the benefits over PyTorch
- Can we do NUTS in graph mode?
High overhead for calling, e.g. `logp` from python
- “should only call it twice”
- NumFOCUS summit
- End of September: either Austin or Colin might go.
- Chris can forward email
- ArviZ update
- Osvaldo is using it in a course this semester
- There will be a presentation on ArviZ at ProbProg
- xarray might turn into a backend for pymc3
- should deprecate plotting in a 3.5.x release
- Face to Face dev planning
- Start planning on the fall - either mountain view or europe
1. Europe: Amsterdam or London might have hosts
- Visa difficulties: amsterdam < london << USA
- Will also have remote call in
b. Vanderbilt post-doc
c. Vacation meeting in Europe
- Journal club
- 17 august to talk about variance networks (Maxim)
- GSoC Project Updates (as availability permits)
- Agustina
- Sharan
- Bill
- 3.4.2 release
- PyMC4 API next steps
a. TensorFlow backend updates
- ArviZ update
- Next journal club
- PyMC4: Prepare for another Tensorflow Probability meeting
- clean up api proposal
- share blog post with api example
- ask graph question (every time the interceptor is used, the tensorflow graph gets larger)
- submit PR for half cauchy?
- make sure at least sharan and ferrine can be on call
- Journal club:
- July 20?
- Yes, But Did it Work? Yao, Vehtari, Simpson, Gelman
- Release in the next ~week
- GSoC Project Updates
- Agustina
- Sharan
- Bill
- Google collaboration updates
- ArviZ
- xarray?
- Docathon
- Next journal club
- PyMC4 TensorFlow backend
- Updates from TensorFlow team
- Questions for TensorFlow team
- Project roadmap (+PyMC Timeline )
- Travis CI housekeeping (someone just needs to do this eventually)
- GSOC 2018
- Project statuses
- Version 3.5 planning
- `Ordered`, `draw_values`, `sample_prior`, `ArviZ` (?)
- Next journal club
- Consensus on TF this week
- Google (Rif) very keen, some funding for face-to-face travel
- 1 North America (Mountain View?), 1 Europe
- 1 Fall, 1 Spring
- Get together roadmap
- We should inquire about NUTS
- Money for postdoc from Google? Plus protected time?
- Should ask some of Maxim’s questions from other doc
- Outside communication w.r.t. Tensorflow
- Public announcement that we’re using Tensorflow?
- Public announcement on Theano support?
- Two roadmaps: PyMC3 is production ready, will be supported and developed for X years. PyMC4 highly experimental and exciting
- One blog post
- Chris will click a button for Travis. Eventually.
- New repo for pymc4 when needed, experimental code in pymc4-prototypes
- delete pymc4-prototypes when it is no longer useful?
- `Ordered` done!
- Rerun examples: add `check_model_point`, `ArviZ`
- Deprecate plotting if arviz is working out
- Journal club: maybe Agustina can present on ABC? Two weeks from now?
- Plan a new release
- Remaining issues/PRs (Sample prior PR and Ordered transform PR could wait)
- Update docs
- PyMC4 backend updates
- Discussion of call with TensorFlow team
- Updates on Theano support
- GSOC 2018
- Candidate selection
- Project roadmaps
- Hackathon
- MAN AHL is organizing a hackathon on April 21-22 (one of the organizers, Slavi, contacted Junpeng, Thomas, Adrian and Peadar). I had a chat with him and said we can provide some online support (I cannot be there, and neither can Thomas or Peadar; Adrian, are you interested?). Does anybody have suggestions for good topics (docs/bugfixes/features)?
- Next journal club
- Rerun thomas’ ADVI documentation/blog
- Like tensorflow - adrian close to working mxnet example
- colin will put together a timeline with new backend, theano, versions, python3, etc.
- timeseries work isn’t a clear enough proposal. would be great to have timeseries support, though
- someone else (adrian/colin?) should sign up to mentor GSOC backend project
- PyMC4 backend updates
- Theano is dead; long live Theano!
- WIP PR that need more discussion
- logp-caching in Metropolis - might not be correct to do after all, especially when compound step is involved
- sample_prior (1 PR or 2?) - to be discussed in slack
- GSOC 2018
- Candidates - Osvaldo has 1 for sure, a couple more have contacted us. For now we wait until we know how many slots we have
- PyMC3 Symposium/Meetup
- date to be decided around April (next meeting)
- Next docathon: schedule
- instead of a docathon, we invest some time to draft the two road maps instead over a period of time.
- Next journal club
- discuss in slack
- Need to develop a project roadmap for both PyMC3 and PyMC4
- PyMC4 backend updates
- Pyro
- Tensorflow
- GSOC 2018
- Next docathon: schedule
- Next journal club
Pyro backend:
- Junpeng’s prototype looks really good!
- Tensorflow data in/out seems slow
documentation:
- Automate notebooks
- “Getting Started” is a little academic
- “Quickstart” is not that quick
Other:
- deprecate `find_MAP`?
- don’t initialize NUTS with a point (Hessian computation is slow)
- 3.3 release
- Recent Travis build problem: TODO Ask Travis about failures, does money fix the problem?
- Theano 1.0
- PyMC4 backend updates
- Next docathon: schedule
- Next journal club: Colin on https://arxiv.org/pdf/1711.09268.pdf
- Invite Michael Osthege to slack
- New backend experiments
- Updates on exploration of MXNet/TensorFlow/PyTorch
- CSG and DEMC PRs
- GSoC ideas
- Next Journal club
- Patreon update
- Debriefing from NumFOCUS conference call
- Ideas for support targets
- Updates for Patreon page
- Next Docathon
- Next Journal club
- New backend experiments
- Updates on exploration of MXNet/TensorFlow/PyTorch
- Release 3.2
- Feature freeze (already done?)
- Making matplotlib optional (nearly ready?)
- Put out release candidate, and then full release
- Release notes: Chris + Thomas
- Patreon update
- possible to support our slack channel upgrade
- Release notes
- Add them to every medium to large sized pull-request
- https://github.com/blog/2111-issue-and-pull-request-templates
- Docathon
- Poll result of docathon
- Travis auto-build update
- Template for doc?
- Journal club
- Friday, same time as lab meeting 10/6
- New backend
- Static graph (Tensorflow)
- Dynamic graph (PyTorch, MXNet, Chainer)
- Others
Create RC today
- need what’s new section
New Backend brain storm (RIP Theano 😞 https://groups.google.com/forum/m/#!topic/theano-dev/gCBAhE3Sb_8)
-
Tensorflow (Backed by Google)
Pros:
- Strong support, large community
- Tensorboard for visualization
- parallel within graph execution that makes things fast
- Convenient deployment
Cons:
- Slow?
- Too many similar packages already (e.g., Edward, ZhuSuan, Greta in R)
- Edward will be somewhat official as part of it?
Compiler for graphs in development: XLA. It is supposed to support memory optimisations and Op fusion when it's done. Custom Ops can be written in python, but C++ ops require us to play games with that strange build system (written in java, and probably not going to end up in any linux packages). Old experiment for using it as backend: https://gist.github.com/aseyboldt/1054cf6d6b871041914c601c1efa11ae
-
PyTorch (Backed by Facebook)
Pros:
- Dynamic graph
- easy GPU acceleration
- seems to be growing fast and well supported.
- use of standard debugging tools
- Numpy-like syntax (probably easier for end-users)
- higher-order derivatives
Cons:
- Not well suited for tricks we did in pymc3 (array_ordering)
If we go for pytorch we should focus on variational inference for deep learning, it’s rather well suited. MCMC can be hard to implement because of pytorch design. If we ever wanted to do RL, seems like a dynamic graph would be the way to go. Would require us to change how models are defined, because we rely on a static graph right now. Not sure automatic graph optimisations are possible. There seems to be some work towards a tracing jit, but that seems to only eliminate runtime overhead right now (if I understand it correctly).
-
MXNet (Backed by Apache and Amazon)
- Static and dynamic graphs.
- Does optimisations for static graphs using nnvm (which?)
- Custom ops in python (similar to theano). C++ ops should be reasonably straightforward as well.
- Good control over memory locations of variables.
- Good support for automatic parallelisation (I heard, haven't checked myself)
- Small user group.
(JLao): I think it is important to have fast compile times (which I felt Theano actually improved over the years), so a nn library with a dynamic graph would probably be faster on that front.
General thoughts are that we already have a good tool for small models and MCMC, we can either try to recreate and extend it somewhere else or focus on things we can’t do with pymc3 (deep learning with large number of parameters). New backend should be deployable (theano is not).
- Documentation
- Schedule next Docathon
- Reviewers of written docs
- poll discourse for help (split time between new docs and review)
- Decided on rst-format for new docs
- PyMC3 gallery
- Colin is going to try a thing in ~2wks if no one else does
- Google Summer of Code updates
- Bill
- Maxim
1. Upcoming Grouped VI API https://github.com/pymc-devs/pymc3/pull/2416
2. Hierarchical VI thoughts https://arxiv.org/abs/1511.02386
- Initialization defaults
- Courses and tutorials
- DataCamp
- NIPS 2018
- PyData NY
- ICML Papers
- There are a lot of good papers(https://sites.google.com/view/implicitmodels/accepted-papers):
- Gradient Estimators for Implicit Models
- Implicit Variational Inference with Kernel Density Ratio Fitting
Potential docathon times
- weekends
- Late-august/early september
New users for doc review
reST vs notebooks for documentation
colin will have a look at a gallery
Continuum for benchmarking models
GP
- restructuring of GP codebase
- accommodating several uses for a GP object
VI
- grouped VI for separating different parts of the model with different complexity
Initialization
- general feeling that ADVI alone is not optimal for initialization
- mass matrix initialization might be the way to go
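As a rough sketch of what mass matrix initialization could mean here: warm-up draws (from an initial tuning phase, or from an ADVI fit) give per-parameter variances, which can serve as the diagonal of the inverse mass matrix for HMC/NUTS. The function name, the variance floor and the synthetic draws are assumptions for illustration; this is not the PyMC3 implementation.

```python
import numpy as np

def diagonal_inverse_mass(warmup_draws, floor=1e-8):
    """Estimate a diagonal inverse mass matrix from draws of shape (n_draws, n_params)."""
    variances = np.var(warmup_draws, axis=0)
    # Guard against zero variance so the sampler never divides by zero.
    return np.maximum(variances, floor)

# Synthetic warm-up draws with very different scales per parameter.
rng = np.random.default_rng(1)
scales = np.array([0.1, 1.0, 10.0])
warmup = rng.normal(scale=scales, size=(2000, 3))

inv_mass_diag = diagonal_inverse_mass(warmup)
print(np.sqrt(inv_mass_diag))  # estimated per-parameter scales
```

Rescaling each parameter by its estimated scale lets a single step size work across dimensions that differ by orders of magnitude.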
- Ongoing Python 2 support
- GSoC updates
- Travel grants
- VI update from Maxim
- RMHMC update from Bhargav
- GP update from Bill: https://bwengals.github.io/
- Setting up of discourse.pymc3.org
- we can set up the discourse channel now - however we need to decide the name (future proofing basically): discourse.pymc3.com/discourse.pymc.com/discourse.pymc-dev.com
- 3.1 release
- Example notebooks and scripts
- Docathon
- Splitting API docs and modeling how-tos
- Open source PyMC3 book? (long term)
- Add sphinx gallery
- NumFOCUS Summit
- Add ourselves to python3statement
- Add warning to PyMC 3.1
- Commit to sunsetting Py2 support by 2020?
- Discourse to be set up as a sub domain of pymc.io
- Implement immediate feature freeze prior to 3.1 release
- The major blocker: ensuring notebooks run in current master
- We will "divide and conquer" to get all the notebooks running as soon as we can
- There was support for continuing the maintenance of notebooks within the repository.
- Recognition that while we would prefer to have notebooks pass unit testing with every commit, this is not realistic
- Current approach: occasionally sweep through and manually test notebooks, with more deliberate testing prior to release
- Colin has a nice tool for testing notebooks
- Strong support for docathon.
- Create doodle poll for a day (maybe a polling tool within Slack?)
- Create an issue with potential doc targets
- Create feature branch for docathon output
- Documentation
- Docathon
- Documentation leader: Austin Rochford
- Separate documentation/book repository
- Separate API vs. math/stats documentation
- API changes
- New VI interface
- Spec out Results object functionality
- GPU support
- Removal of sqlite and text backend (Xray)
- develop a serialization API
- explore xray as solution
- Better defaults for sample() (draws=1000, tune=500, njobs=4)
- Adrian proposal: New results object returned by sample() that contains trace, but also Rhats, ELBOs, convergence report
- Release of 3.1
- Review release checklist
- Deprecation warnings
- Outreach and education
- Conference talks
- Tutorials
- Short courses — share materials?
- Google Summer of Code
- Maxim: Taku, Thomas
- Bill Engels: Chris, Austin
- Bhargav Srinivasa: John, Colin
- Can we ask NumFOCUS to help fund expanded TravisCI testing?
- We need a review of our testing runs
- can some tests be dropped or run more infrequently?
- Separate PP education from PyMC functionality
- Need a simple common return object/structure across all inference methods
- dict?
- create a spec for return object
- float32 throughout PyMC3 coming in GSoC
- size/shape handling could use refactoring
- Place to aggregate resources and tutorials
- PyMC3 course at Harvard: Check it out!
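One way to start the spec for the common return object proposed above (a results object holding the trace plus Rhats, ELBOs and a convergence report) is a small dataclass. Everything here is hypothetical: the class name, the field names and the Rhat threshold of 1.1 are assumptions for discussion, not an existing PyMC3 API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class InferenceResult:
    """Hypothetical common return object for sample(), ADVI and other inference methods."""
    trace: Dict[str, List[float]]        # draws per variable name
    method: str                          # e.g. "nuts" or "advi"
    rhat: Dict[str, float] = field(default_factory=dict)
    elbo: Optional[List[float]] = None   # only filled in by variational methods
    warnings: List[str] = field(default_factory=list)

    def convergence_report(self, rhat_threshold: float = 1.1) -> List[str]:
        """Collect stored warnings plus any variable whose Rhat exceeds the threshold."""
        messages = list(self.warnings)
        for name, value in self.rhat.items():
            if value > rhat_threshold:
                messages.append(f"{name}: Rhat {value:.2f} exceeds {rhat_threshold}")
        return messages

result = InferenceResult(
    trace={"mu": [0.1, 0.2], "sigma": [1.0, 1.1]},
    method="nuts",
    rhat={"mu": 1.3, "sigma": 1.0},
)
report = result.convergence_report()
print(report)
```

Compared with a plain dict, a typed object makes the optional pieces (such as `elbo`) explicit and gives every inference method the same place to attach diagnostics.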