ice-air

A project analyzing data from Immigration and Customs Enforcement's Alien Repatriation and Transfer System (ARTS) for the report Hidden in Plain Sight: ICE Air and the Machinery of Mass Deportation by the University of Washington Center for Human Rights.

Report contents

Repository description

This repo uses Git LFS.

This project uses "Principled Data Processing" techniques and tools developed by @HRDAG; see for example "The Task Is A Quantum of Workflow."

Tasks in this project are designed to be executed using the recursive make tool makr.

File structure

Projects

Datasets and high level resources:

  • installment1/ - Dataset released to UWCHR via FOIA in December 2018. Contains ICE Air ARTS passenger data for 2010-10-01 through 2018-12-05.
  • installment2/ - Dataset released to UWCHR via FOIA in August 2019. Contains ICE Air ARTS passenger and mission data for 2010-10-01 through 2019-05-03.
  • installment3/ - Dataset released to UWCHR via FOIA in November 2020. Contains ICE Air ARTS passenger and mission data for 2018-10-01 through 2020-05-08.
  • compare/ - Project for comparing contents of ICE Air ARTS installments 1 and 2.
  • kykm/ - Small task for dataset of community observations of ICE Air flights at Yakima Air Terminal (KYKM).
  • radarbox/ - Task for data on Swift Air (SWQ) and World Atlantic (WAL) flights using commercial flight tracker records from Radarbox.com.
  • share/ - Various hand-written files and resources shared by multiple other tasks.
  • docs/ - HTML documentation published at https://uwchr.github.io/ice-air/

Tasks

Project-level tasks, in order of workflow (not all tasks will be present in all projects):

  • import/ - Convenience task for importing an ICE Air ARTS dataset. Input files in import/input/ have already been renamed to remove spaces from filenames, converted to CSV with a pipe separator (|), and compressed using Gzip. Input files are symlinked to import/output/ and then to the input/ directory of the downstream task for transformation and analysis.
    • Original Excel files as released by ICE can be found on UWCHR's Google Drive. These raw files are excluded from the repository due to their size.
  • optimize/ - Determines optimal Python/Pandas data types for each field in the original dataset and outputs this as a YAML dictionary used and modified in downstream tasks.
  • clean/ - Standardizes selected field values in clean/hand/clean.yaml; fixes missing and bad airport data; removes duplicate passenger records. Outputs full ICE Air ARTS datasets, after cleaning, as Gzipped CSV files.
  • analyze/ - Contains exploratory Jupyter notebooks and R Markdown files. These notebooks and their outputs do not necessarily reflect the findings of UWCHR's report.
    • analyze/output/ contains various versions of figures and data subsets; currently none of these are used in any downstream tasks.
  • write/ - Writes out final reports to HTML using Pweave.
    • All analysis, figure generation, etc. takes place in write/src/.
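
The import/, optimize/ and clean/ steps described above can be illustrated with a minimal Pandas sketch. This is not the repository's actual code: the column names (`PassengerID`, `Status`, `PickupAirport`), the replacement mapping, and the helper functions are hypothetical, and the dtype heuristic is an assumption about what "optimal" might mean. The resulting dictionary is kept as a plain Python dict here; it could be written to YAML with a library such as PyYAML.

```python
import gzip
import os
import tempfile

import pandas as pd


def read_piped_gzip(path):
    """Read a Gzip-compressed, pipe-separated CSV, keeping every field as text."""
    return pd.read_csv(path, sep="|", compression="gzip", dtype=str)


def clean(df, replacements):
    """Standardize selected field values, then drop exact duplicate records.

    `replacements` maps column name -> {old value: new value}, similar in
    spirit to a hand-written cleaning file (hypothetical structure).
    """
    cleaned = df.copy()
    for column, mapping in replacements.items():
        cleaned[column] = cleaned[column].replace(mapping)
    return cleaned.drop_duplicates().reset_index(drop=True)


def suggest_dtypes(df, max_category_ratio=0.5):
    """Suggest a compact Pandas dtype name for each column (simple heuristic)."""
    dtypes = {}
    for column in df.columns:
        values = df[column].dropna()
        numeric = pd.to_numeric(values, errors="coerce")
        if len(values) > 0 and numeric.notna().all():
            if (numeric % 1 == 0).all():
                # Nullable integer type if the column has missing values.
                dtypes[column] = "Int64" if df[column].isna().any() else "int64"
            else:
                dtypes[column] = "float64"
        elif len(values) > 0 and values.nunique() / len(values) <= max_category_ratio:
            # Few distinct values relative to row count: store as category.
            dtypes[column] = "category"
        else:
            dtypes[column] = "object"
    return dtypes


def demo():
    """Run the sketch end to end on a tiny, made-up sample file."""
    sample = (
        "PassengerID|Status|PickupAirport\n"
        "1|REMOVAL|KYKM\n"
        "2|removal|KYKM\n"
        "2|removal|KYKM\n"
        "3|TRANSFER|KBFI\n"
        "4|REMOVAL|KBFI\n"
        "5|Removal|KYKM\n"
        "6|TRANSFER|KBFI\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.csv.gz")
        with gzip.open(path, "wt", encoding="utf-8") as handle:
            handle.write(sample)
        raw = read_piped_gzip(path)

    replacements = {"Status": {"removal": "REMOVAL", "Removal": "REMOVAL"}}
    cleaned = clean(raw, replacements)
    return cleaned, suggest_dtypes(cleaned)


if __name__ == "__main__":
    cleaned, dtypes = demo()
    print(cleaned)
    print(dtypes)
```

Reading every field as text first avoids Pandas guessing types before the values have been standardized; the suggested dtypes can then be applied with `DataFrame.astype` in a later step.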

To-do

  • Update data appendix for installment2/
  • Convert the optimize/ and clean/ tasks in installment1/ and installment2/ to output Feather-format files