Add some utilities for pulling repo data (#1691)
The README says it best:

> This directory holds some scripts used to fetch repository data from GitHub
> and extract it in various forms. It's useful for generating a list of PRs
> recently merged, or a list of who has contributed, etc.
>
> Fetching the data requires Python and some Python libraries.

---------

Co-authored-by: Michael Crismali <michael@crismali.com>
jim and crismali authored Oct 10, 2024
1 parent 2c3bce4 commit 7b29c68
Showing 7 changed files with 74 additions and 0 deletions.
1 change: 1 addition & 0 deletions repo_analysis/.gitignore
@@ -0,0 +1 @@
github.db
6 changes: 6 additions & 0 deletions repo_analysis/Gemfile
@@ -0,0 +1,6 @@
source "https://rubygems.org"
git_source(:github) { |repo| "https://github.com/#{repo}.git" }

ruby "3.1.6"

gem "sqlite3"
16 changes: 16 additions & 0 deletions repo_analysis/Gemfile.lock
@@ -0,0 +1,16 @@
GEM
  remote: https://rubygems.org/
  specs:
    sqlite3 (2.0.3-arm64-darwin)

PLATFORMS
  arm64-darwin-23

DEPENDENCIES
  sqlite3

RUBY VERSION
   ruby 3.1.6p260

BUNDLED WITH
   2.4.15
15 changes: 15 additions & 0 deletions repo_analysis/README.md
@@ -0,0 +1,15 @@
# Repo Analysis

This directory holds some scripts used to fetch repository data from GitHub
and extract it in various forms. It's useful for generating a list of PRs
recently merged, or a list of who has contributed, etc.

Fetching the data requires Python and some Python libraries.

To use:

1. Install the dependencies using `pip3 install -r requirements.in`.
2. Set up a GitHub access token [and configure github-to-sqlite to use it](https://github.com/dogsheep/github-to-sqlite?tab=readme-ov-file#authentication).
3. Run `sync.sh` to fetch the data.
4. Run `datasette github.db` to investigate the data in a nice web-based UI (optional but sometimes handy).
5. Edit the `Rakefile` as needed, then run a task such as `rake recent_prs` to extract the data in a specific format.
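
One reason for the editing mentioned in step 5: the `recent_prs` task in the Rakefile hardcodes its cutoff date (`2024-06-01`) in the query, while its description promises PRs closed "in the last two months". A minimal sketch of computing that cutoff at run time is below. It is not part of this commit; `cutoff_date` and its parameters are hypothetical names, and the sketch assumes `closed_at` is stored as an ISO 8601 string so that a plain string comparison orders dates correctly.

```ruby
require "date"

# Hypothetical helper (not in the commit): ISO 8601 date string for
# `months_back` months before `today`. Date#<< shifts back by whole months.
def cutoff_date(months_back: 2, today: Date.today)
  (today << months_back).iso8601
end

# With a fixed "today" so the result is reproducible:
puts cutoff_date(today: Date.new(2024, 10, 10)) # prints 2024-08-10

# Assumed usage in the Rakefile: replace the literal date with a `?`
# placeholder and pass the cutoff as a bind variable, e.g.
#   WHERE (closed_at IS NULL OR closed_at >= ?) AND user != 49699333
#   db.execute(query, [cutoff_date]) { |row| ... }
```

Passing the date as a bind variable also avoids quoting the literal inside the SQL heredoc.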
29 changes: 29 additions & 0 deletions repo_analysis/Rakefile
@@ -0,0 +1,29 @@
require "csv"
require "sqlite3"

db = SQLite3::Database.new("github.db")

desc "Print a CSV list of PRs (either still open or closed in the last two months)"
task :recent_prs do
  # NOTE: the cutoff date below is hardcoded; update it to match the window you want.
  query = <<~SQL
    SELECT pull_requests.*, users.name
    FROM pull_requests
    LEFT JOIN users on users.id == pull_requests.user
    WHERE (closed_at IS NULL OR closed_at >= '2024-06-01') AND user != 49699333
    ORDER BY closed_at DESC NULLS FIRST
  SQL

  # pull_requests schema
  # id, node_id, number, state, locked, title, user, body, created_at, updated_at, closed_at, merged_at, merge_commit_sha, assignee, milestone, draft, head, base, author_association, auto_merge, repo, url, merged_by

  # Output is tab-separated (col_sep is a tab), despite the CSV class name.
  csv = CSV.new($stdout, col_sep: "\t")
  csv << %w[pr author merged_at link]
  db.execute(query) do |row|
    csv << [
      row[5], # title
      row[23], # author (users.name, appended after the 23 pull_requests columns)
      row[11], # merged_at
      "https://github.com/chicago-tool-library/circulate/pull/#{row[2]}" # row[2] is the PR number
    ]
  end
end
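
A note on the positional indexes in `recent_prs`: the query selects `pull_requests.*` followed by `users.name`, so the author name lands right after the `pull_requests` columns. The sketch below is not part of this commit. It derives the indexes from the schema comment so they can be checked by name. `PR_COLUMNS`, `COLUMNS`, `row_to_hash`, and the `author_name` label are hypothetical names introduced for illustration, and the column order is assumed to match the schema comment in the Rakefile.

```ruby
# Column order copied from the "pull_requests schema" comment in the Rakefile.
# Assumption: the real table has exactly these columns in this order.
PR_COLUMNS = %w[
  id node_id number state locked title user body created_at updated_at
  closed_at merged_at merge_commit_sha assignee milestone draft head base
  author_association auto_merge repo url merged_by
].freeze

# The query appends users.name after pull_requests.*; "author_name" is a
# hypothetical label for that extra column.
COLUMNS = (PR_COLUMNS + ["author_name"]).freeze

# Turn a positional result row into a hash keyed by column name.
def row_to_hash(row)
  COLUMNS.zip(row).to_h
end

# Example with a fake row whose values are just their own indexes.
pr = row_to_hash((0...COLUMNS.length).to_a)
puts pr.values_at("title", "author_name", "merged_at", "number").inspect
# prints [5, 23, 11, 2]
```

These match the indexes used in the task (`row[5]`, `row[23]`, `row[11]`, `row[2]`). The sqlite3 gem can also return rows as hashes when `db.results_as_hash = true` is set, which would avoid positional indexes altogether.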
2 changes: 2 additions & 0 deletions repo_analysis/requirements.in
@@ -0,0 +1,2 @@
github-to-sqlite
datasette
5 changes: 5 additions & 0 deletions repo_analysis/sync.sh
@@ -0,0 +1,5 @@
#!/bin/sh
set -e

github-to-sqlite pull-requests github.db chicago-tool-library/circulate
github-to-sqlite contributors github.db chicago-tool-library/circulate
