A project to publish website analytics for the City of Santa Monica.
Other government agencies that have reused this project for their analytics dashboards:
- U.S. Federal Government
- Tennessee Department of Environment & Conservation
- City of Boulder
- City of Philadelphia
- City of Sacramento
These notes represent an evolving effort. Create an issue or send us a pull request if you have questions or suggestions about anything outlined below.
This app uses Jekyll to build the site, and Sass, Bourbon, and Neat for CSS.

Install them all:

```
bundle install
```

To run locally:

```
bundle exec jekyll serve --watch --config _config.yml,_configdev.yml
```
The development settings assume data is available at `/fake-data`. You can change this in `_configdev.yml`.
`analytics-reporter` is the code that powers the dashboard by pulling data from Google Analytics.
The report definitions are specified as JSON objects. In this repository, individual report definitions are stored in the `_reports` folder and aggregated into a single file, `reports/csm.json`, by Jekyll's build process and a custom plugin that JSONifies Jekyll frontmatter.
An individual report definition looks like:
```json
{
  "name": "report-name",
  "frequency": "daily",
  "query": {
    "dimensions": [ "ga:pagePath", "ga:pageTitle" ],
    "metrics": [ "ga:sessions" ],
    "start-date": "yesterday",
    "end-date": "today",
    "sort": "-ga:sessions",
    "max-results": "20"
  },
  "meta": {
    "name": "Dummy Report",
    "description": "Sample report definition to show the structure of a report"
  }
}
```
- `name` - the name of the report; this will be the resulting file name for the report
- `frequency` - corresponds to the `--frequency` command line option (see the example invocation after this list). This option does not automagically create cron jobs; separate cron jobs or WebJobs are required.
- `query`
  - `dimensions` & `metrics` - valid values can be found in the Google Analytics docs
  - `start-date` or `end-date` - the time period for the report:
    - `today`
    - `yesterday`
    - `7daysAgo`
    - `30daysAgo`
    - `90daysAgo`
  - `sort` - valid values can be found in the Google Analytics docs
  - `max-results` - the maximum number of results to return for this report
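For instance, fetching just the daily reports into an agency's data folder might look like this (a hypothetical invocation: the `analytics` binary and its `--output` flag come from `analytics-reporter`'s usage, and the output path is only an example):

```
analytics --output $ANALYTICS_DATA_PATH/smgov --frequency=daily
```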
18F's original analytics dashboard was written with a Linux environment and 18F pages in mind. For this project, we've ported 18F's work into an Azure Web App.
This fork has both the Jekyll website and the node app (`analytics-reporter`) deployed to a single Azure Web App so that everything remains on the same domain. We use TravisCI to kick off Jekyll builds and related pre-deployment tasks, and to publish the end result to Azure.
Travis can automatically deploy to Azure after a successful build using the following environment variables within Travis:
- `AZURE_WA_SITE` - the name of the Azure Web App
- `AZURE_WA_USERNAME` - the git/deployment username, configured in the Azure Web App settings
- `AZURE_WA_PASSWORD` - the password of the above user, also configured in the Azure Web App settings
  - Heads up! Travis sends this password in the remote URL (i.e. `https://user:password@domain.com/repo.git`), so be careful with special characters in your passwords (e.g. spaces don't work and will cause a cryptic error to be thrown).
Here's what our `.travis.yml` file looks like.
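Trimmed down, the relevant pieces look roughly like this (a sketch only: the script names match the sections below, `azure_web_apps` is Travis's Azure deploy provider that reads the environment variables above, and everything else is illustrative):

```yaml
language: ruby
script:
  - bash .travis/build.sh
before_deploy:
  - bash .travis/pre-deploy.sh
deploy:
  provider: azure_web_apps
  skip_cleanup: true
```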
We are calling two separate scripts for Travis to execute. The first, `.travis/build.sh`, actually builds the Jekyll website (into the `_site` folder, as per Jekyll convention). In addition, it creates a Python virtual environment, with Python 3.4 and the dependencies listed in `requirements.txt`, that is committed and deployed to Azure; these dependencies are only committed on the Azure side so we don't flood our own repository with dependencies that can be automatically fetched.

In our case, we have a "fake-data" folder for development, so we remove that before we build the final website.
The second script (`.travis/pre-deploy.sh`) is called before we deploy everything to Azure. Content is deployed to Azure via git, meaning `.gitignore` is respected and the compiled `_site` folder wouldn't be deployed to Azure. To fix this, the pre-deploy script gives Travis an identity for git, forcefully adds the `_site` directory, and amends the commit we were just building:
```
git config user.name "travis-ci"
git config user.email "travis@localhost"
git add -f _site/
git commit --amend --no-edit
```
By amending the commit, the message and author stay intact when viewed from the Azure portal.
18F specifies required environment variables in `.env` files. Instead of placing all of them in `.env` files and worrying about sensitive information or repetition, we store them as Azure Application Settings.
We also opted to make use of Azure WebJobs for background tasks (such as polling Google Analytics and aggregating the results). 18F's `cron` jobs were easily ported over to Azure's syntax.
WebJobs are placed in `App_Data/jobs/<triggered|continuous>`, and each WebJob belongs in its own folder (the name of the folder is arbitrary). By adding a `run.sh` (or `run.py`) and a `settings.job` file containing a `cron` expression, the `run` file is executed on the `cron` schedule.
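For example, a triggered WebJob could be laid out like this (the `fetch-analytics` job name is hypothetical):

```
App_Data/jobs/triggered/fetch-analytics/
├── run.sh
└── settings.job
```

with `settings.job` holding the schedule. Azure's `cron` expressions have six fields (seconds first), so the following runs `run.sh` daily at 8:00 AM:

```json
{
  "schedule": "0 0 8 * * *"
}
```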
All of the scripts have a custom `$HOME`, which is set to `D:\home\site\wwwroot` (the default for Azure Web Apps). All of the paths defined in Azure Application Settings or environment variables should be relative to the custom `$HOME` directory; do not use absolute paths.
Kudu is the Azure build/deploy system tied into git, which is used to move the (compiled) site files into the website root (`$HOME`) after a successful Travis build. Our Kudu configuration file (`.deployment` at the repository root) looks like:
```
[config]
DEPLOYMENT_SOURCE = _site
COMMAND = bash .kudu/deploy.sh
```
This WebJob executes a bash script that reads every `.env` file inside of `$HOME/envs` and fetches the Google Analytics data for each profile. The fetched data is then placed in a subdirectory with the same name as the `.env` file, inside of `ANALYTICS_DATA_PATH`.
For example, data for `smgov.env` will be placed at `$HOME/data/smgov`.
These Azure Application Settings are required for interaction with the Google Analytics API (via `analytics-reporter`); these should be relative to `$HOME` (see above):
- `ANALYTICS_REPORT_EMAIL` - the email used for your Google developer account (e.g. `example@analytics.iam.gserviceaccount.com`); this account is automatically generated for you by Google and should have access to the appropriate profiles in Google Analytics. Note that this email account must have Collaborate, Read & Analyze permissions on the Google Analytics profile(s)/view(s) being queried.
- `ANALYTICS_REPORTS_PATH` - the location of the JSON file that contains all of your reports, e.g. `reports/your-reports.json`
- `ANALYTICS_KEY` - copy the `private_key` value from the Google Analytics JSON file. Keep all of the `\n`s in there and do not expand it; the bash scripts will take care of the expansion. This should be one really long line.
- `ANALYTICS_DATA_PATH` - the folder where all of the Google Analytics data will be stored, e.g. `data`
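Put together, the values might look something like this (all placeholders; the `ANALYTICS_KEY` is abbreviated here but must be pasted whole, `\n`s and all):

```
ANALYTICS_REPORT_EMAIL=example@analytics.iam.gserviceaccount.com
ANALYTICS_REPORTS_PATH=reports/your-reports.json
ANALYTICS_KEY=-----BEGIN PRIVATE KEY-----\nMIIE...\n-----END PRIVATE KEY-----\n
ANALYTICS_DATA_PATH=data
```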
Since we do not have "One Analytics Account to Rule Them All" like the DAP, we aggregate individual websites together. A scheduled WebJob (written in Python) goes through all of the agency directories, `$HOME/$ANALYTICS_DATA_PATH/<agency>`, aggregates all of the data together, and outputs the results to `$HOME/$ANALYTICS_DATA_PATH`.
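In spirit, the aggregation does something like the following (a simplified sketch, not the actual WebJob: it assumes each report file is JSON with a `data` array of rows, as `analytics-reporter` emits, and that concatenating rows is an acceptable merge):

```python
import json
import os

home = os.environ["HOME"]
data_path = os.path.join(home, os.environ["ANALYTICS_DATA_PATH"])

# Gather every per-agency report, keyed by report file name.
merged = {}
for agency in os.listdir(data_path):
    agency_dir = os.path.join(data_path, agency)
    if not os.path.isdir(agency_dir):
        continue
    for report_file in os.listdir(agency_dir):
        if not report_file.endswith(".json"):
            continue
        with open(os.path.join(agency_dir, report_file)) as f:
            report = json.load(f)
        if report_file not in merged:
            merged[report_file] = report
        else:
            # Reports share a definition across agencies, so the rows
            # in their "data" arrays can simply be concatenated.
            merged[report_file]["data"].extend(report.get("data", []))

# Write the combined reports at the top of ANALYTICS_DATA_PATH,
# leaving the per-agency subdirectories untouched.
for report_file, report in merged.items():
    with open(os.path.join(data_path, report_file), "w") as f:
        json.dump(report, f)
```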
Our analytics dashboard then points to the `ANALYTICS_DATA_PATH` folder instead of an individual agency; individual agency data is still available at the subdirectory level.
Because the data files powering the dashboard are constantly being overwritten, we have another Python WebJob that takes the daily analytics reports and snapshots them into our open data portal.
These Azure Application Settings are required for publishing data to the Socrata portal (via `soda-py`):

- `SOCRATA_HOST` - the Socrata host (e.g. `data.smgov.net`)
- `SOCRATA_APPTOKEN` - an app token, which reduces throttling of API calls
- `SOCRATA_USER` & `SOCRATA_PASS` - for basic HTTP authentication
- `SOCRATA_RESOURCEID` - the 4x4 ID of the dataset
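The gist of the publishing job, sketched with the `sodapy` client (an assumption on our part; the column names and rows here are purely illustrative):

```python
import os

from sodapy import Socrata

# Credentials and target dataset come from the Azure Application Settings above.
client = Socrata(
    os.environ["SOCRATA_HOST"],
    os.environ["SOCRATA_APPTOKEN"],
    username=os.environ["SOCRATA_USER"],
    password=os.environ["SOCRATA_PASS"],
)

# Illustrative snapshot rows; the real job reads them from the daily reports.
rows = [
    {"date": "2016-01-01", "visits": 12345},
]

# Upsert adds new rows and updates existing ones by the dataset's row identifier.
client.upsert(os.environ["SOCRATA_RESOURCEID"], rows)
client.close()
```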
This project is in the worldwide public domain. As stated in CONTRIBUTING:

> This project is in the public domain within the United States, and copyright and related rights in the work worldwide are waived through the CC0 1.0 Universal public domain dedication.
>
> All contributions to this project will be released under the CC0 dedication. By submitting a pull request, you are agreeing to comply with this waiver of copyright interest.