Wie Is De Mol ("Who is the Mole") is a Dutch television show that has enjoyed consistent popularity. A Belgian version also airs, and there is an American version on Netflix. In this repository, I explore methods to predict who the Mole will be.
I would love to receive your contributions to this learning project. You can contribute in multiple ways:
- Help me with my learning goals by sharing what you know on the topic, or by raising questions you want answered yourself. Below, I explain what I already know, so please read that first. You can share by opening an issue.
- Raise pull requests. Make sure to follow community standards: write good descriptions, add tests, and follow the branch naming conventions.
1. Install Poetry locally:

   ```shell
   pip install poetry
   ```

2. It is recommended to install the virtual environment in the repo itself. This is done automatically through the `poetry.toml` file.

3. Navigate to the directory and install the project:

   ```shell
   poetry install
   ```

4. Activate the virtual environment and then activate the pre-commit hooks:

   ```shell
   pre-commit install
   ```

5. Run the tests to see if the setup process was successful:

   ```shell
   poetry run pytest tests/unit finding_the_mole --doctest-modules
   ```
A git branch should follow this convention: `<category>/<description>/dev`.

A git branch should start with a category. Pick one of these: `feature`, `fix`, `hotfix`, or `test`.

- `feature` is for adding, refactoring, or removing a feature.
- `fix` is for fixing a bug.
- `hotfix` is for changing code with a temporary solution and/or without following the usual process (usually because of an emergency).
- `test` is for experimenting outside an issue/ticket.

The description should of course be descriptive; make sure that it reflects what you will be changing.

As such, correct examples are:

- `feature/add-ci-pipeline/dev`
- `fix/failing-unit-tests-for-new-polars-version/dev`
- `test/trying-out-polars/dev`
Inspiration: A Simplified Convention for Naming Branches and Commits in Git.
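The convention above is mechanical enough to check automatically. Here is a hypothetical shell sketch (the regex and the helper are my own, not part of the repo):

```shell
# Hypothetical check (not part of the repo): validate a branch name
# against the <category>/<description>/dev convention.
branch="feature/add-ci-pipeline/dev"
if echo "$branch" | grep -Eq '^(feature|fix|hotfix|test)/[a-z0-9-]+/dev$'; then
  result="valid"
else
  result="invalid"
fi
echo "$result"  # prints "valid"
```

A check like this could run as a pre-commit or CI step, though I have not wired that up here.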
I have set the following learning goals for myself so far:
- I want to get familiar with using Poetry for Python projects.
- I want to get familiar with the polars Python library.
- I want to get familiar with the deptry Python library.
- I want to get familiar with GitHub Actions to set up a CICD process.
- I want to get familiar with using Bayesian modelling in Python (inspiration: CodeErik).
I will document my progress on each of these topics, and possibly more topics. I do this because I feel it is important to share my learning path for others who might be looking into the same topics. Moreover, it is easy to forget where you started once your efforts grow. Sometimes, it is good to look back at the basics and revise instead of only building on existing resources.
If you want to help me along this journey, feel free to reach out! :)
I already knew about `poetry`, but my interest recently spiked due to two events. Firstly, my team started to hire new colleagues, and many of them used `poetry`. If many of them did, surely it was worth looking into. Secondly, an indirect colleague of mine joined the company and started to build upon the existing CICD templates. Previously, these templates relied on a structure with `requirements.txt` files and did not yet support `poetry` projects. His dedication showed me that there was definitely merit in the library.
I started my learning journey by watching YouTube videos. I noticed that many were relatively old, which makes sense since `poetry` has been around since February 2018. The one I mainly used is How to Create and Use Virtual Environments in Python With Poetry by ArjanCodes.
Of course, I had to start with setting up my project with `poetry`. I noticed that it was easier to start locally, create my project folder with `poetry new`, and then run `git init`, rather than cloning an already initialized GitHub repo from the website and calling `poetry init`.
Secondly, I was wondering about the `poetry.lock` file and whether I should commit it. The official website says:

> When Poetry has finished installing, it writes all the packages and their exact versions that it downloaded to the poetry.lock file, locking the project to those specific versions. You should commit the poetry.lock file to your project repo so that all people working on the project are locked to the same versions of dependencies.
However, when reviewing the applications that I alluded to earlier, I had some issues specifically pertaining to the lock file. I also read the following comment by Arne on StackOverflow:

> My personal experience has been that it isn't necessary to commit lockfiles to VCS. The pyproject.toml file is the reference for correct build instructions, and the lockfile is the reference for a single successful deployment. Now, I don't know what the spec for poetry.lock is, but I had them backfire on me often enough during collaboration with colleagues in ways where only deleting them would fix the problem. [...] I help maintain a number of closed and open source projects, and they all commit lockfiles, partly because I advocated in favor of it. By now I regret that choice, because it occurs quite often that someone's build is not working and the solution is to delete and re-build the lockfile, after which all of us end up having merge conflicts.
In the end, I saw in the documentation that it was possible to specify exact requirements in the `pyproject.toml` file. This had my preference, so I removed the caret notation and specified exact versions. The only annoying thing about this is that you have to either:

- look up the version you want to use and run `poetry add library@version`, or
- run `poetry add library`, remove the caret, delete the `poetry.lock` file, and rerun `poetry install` to create a new lock file.
To be fair, this problem also arises with regular `requirements.txt` files, so it is not a dealbreaker for me.
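For reference, exact pins in `pyproject.toml` look roughly like this (the version numbers are illustrative, not the repo's actual pins):

```toml
[tool.poetry.dependencies]
python = "^3.10"   # the interpreter constraint can stay a range
polars = "0.18.2"  # exact version instead of the caret range "^0.18.2"
```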
I learned that there is a `poetry.toml` file which can contain project-specific configuration for Poetry. This is nice since it removes the personal responsibility to set certain Poetry settings, most importantly creating the virtual environment in the project folder.
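Assuming the standard Poetry local-config format, the relevant `poetry.toml` entry is just:

```toml
# Local Poetry configuration: create the .venv inside the project folder.
[virtualenvs]
in-project = true
```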
Polars grabbed my interest when I saw that it has a syntax very similar to Pandas, a popular library for data analysis and data science, while being much faster. Although I like using PySpark for big data, I knew that it would be overkill for this project. As such, I immediately knew that I wanted to try out Polars.
I was first alerted to Deptry when a colleague of mine, the creator of the library, gave a presentation on his package at work. I am keen on keeping my repo clean, so this solution to maintain the proper dependencies and clean them up seemed perfect. Although it may not be as relevant or impactful for such a small project, it seems like a very nice tool for larger projects.
Before I started this project, I had only heard of GitHub Actions through some GitHub templates. Since I wanted to include CI in this repo, GitHub Actions seemed a good fit for the purpose. I already had familiarity with YAML pipelines through Azure DevOps, so the learning curve was not that steep.
What I like about it so far is that you can easily pull actions developed by other people into your pipeline. Moreover, the triggers for the different workflows can be configured quite easily and offer many options. On that last point, I did run into some issues when I adapted a workflow with the "create" trigger (run when a new branch is created): I could not manually (re)run that workflow on a new commit. It would be perfect if you could kick off any workflow manually to test it. Perhaps you can and I simply could not find out how; in that case, the documentation is apparently not clear enough.
Overall, I do like how you can easily set up different files that are automatically configured to run on certain triggers in your repo rather than having to activate the pipeline in a separate section.
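On the manual-run point: as far as I can tell, adding a `workflow_dispatch` trigger gives a workflow a manual run button in the Actions tab. A minimal sketch (the workflow, job, and step names are my own placeholders, not the repo's actual pipeline):

```yaml
name: tests
on:
  push:
  workflow_dispatch:  # enables manual (re)runs from the Actions tab
jobs:
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - run: pip install poetry
      - run: poetry install
      - run: poetry run pytest tests/unit finding_the_mole --doctest-modules
```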
Inspiration: CodeErik
For more information on my Bayesian model, see finding_the_mole/bayesian_model/README.md.
I am already familiar with pre-commit hooks. At work, we use them to maintain code quality in many different ways. However, we had to download the hook libraries from the internal package repository, because our pipelines are not allowed to connect to the internet to download them or check out external repos (a restriction I actually like a lot). In this project, though, I am able to do so, so I wanted to try out pre-commit hooks that check out external repos.
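A minimal `.pre-commit-config.yaml` that pulls hooks from a public repo could look like this (the `rev` value is illustrative; pin whatever version you actually use):

```yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
```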
I stumbled upon ruff a few months ago as a replacement for flake8. While implementing it and reading the documentation, I did not expect to see all the rules that you are able to add to the `ruff` linter. I expected a linter with quite basic checks, assuming I would have to specify other desired checks through other tools. However, as it turned out, I could specify many different options through `ruff` and only needed to add some checks from the official pre-commit-hooks to cover what `ruff` did not have.
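For illustration, enabling extra rule sets happens in `pyproject.toml`; a sketch (the selected rule codes are examples, not my actual configuration, and on older ruff versions these keys live directly under `[tool.ruff]`):

```toml
[tool.ruff]
line-length = 120

[tool.ruff.lint]
# pycodestyle, pyflakes, isort, bugbear, and simplify rule sets
select = ["E", "F", "I", "B", "SIM"]
```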