Correctional Facilities Were an Early and Lasting Vector for COVID-19 Behind and Beyond Bars

Please check out this project on Medium!

Below is the accompanying explainer behind the narrative:

My project, Correctional Facilities Were an Early and Lasting Vector for COVID-19 Behind and Beyond Bars, is the capstone to my independent study with Politico’s Interactive Team leader, Andrew McGill. I was able to apply everything I learned through this project, and then some. The main goal of the project was to convey that COVID-19 outbreaks in prison have far-reaching ramifications for the general public.

Using Google’s advanced search syntax, I found a report by the Prison Policy Initiative (PPI) that estimated the number of COVID-19 cases linked to correctional facilities. Unfortunately, the data was not in a tidy spreadsheet, so I had to bring it into Excel myself. Then I wrote a Python script to perform a double join on the county and state columns to get a reliable key for each record: the GEOID, a concatenation of the state and county FIPS codes. Once I had the GEOID, I wrote a second Python script to join the data to a county shapefile.
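For readers curious what that keying step can look like, here is a minimal sketch using only the Python standard library. It is not my original script: the column names (`state`, `county`, `state_fips`, `county_fips`, `est_cases`), the helper functions, and the case values are all assumptions made up for illustration. It matches each record on state and county together, since county names repeat across states, and builds a five-digit GEOID.

```python
import csv
import io

# Hypothetical FIPS lookup table: one row per county.
FIPS_CSV = """state,county,state_fips,county_fips
Ohio,Marion,39,101
Indiana,Marion,18,097
Texas,Walker,48,471
"""

# Hypothetical PPI-style rows; the est_cases values are invented.
PPI_CSV = """state,county,est_cases
Ohio,Marion,100
Texas,Walker,50
"""

def normalize(name):
    """Lowercase and trim a name so small formatting differences still match."""
    return name.strip().lower()

def build_geoid_lookup(fips_rows):
    """Map (state, county) to a GEOID: 2-digit state code + 3-digit county code."""
    lookup = {}
    for row in fips_rows:
        key = (normalize(row["state"]), normalize(row["county"]))
        lookup[key] = row["state_fips"].zfill(2) + row["county_fips"].zfill(3)
    return lookup

def add_geoid(data_rows, lookup):
    """Attach a GEOID to each row; unmatched rows get None so they can be reviewed."""
    joined = []
    for row in data_rows:
        key = (normalize(row["state"]), normalize(row["county"]))
        joined.append({**row, "geoid": lookup.get(key)})
    return joined

fips_rows = list(csv.DictReader(io.StringIO(FIPS_CSV)))
ppi_rows = list(csv.DictReader(io.StringIO(PPI_CSV)))
joined = add_geoid(ppi_rows, build_geoid_lookup(fips_rows))
```

Matching on the county name alone would confuse Marion County, Ohio with Marion County, Indiana, which is why both columns go into the key.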

After that, I got to work in ArcGIS Pro and then Adobe Illustrator to craft a compelling map. Above all, I enjoyed making the legend, which doubles as a histogram showing the breakdown of counties per class. Thank you to Tim Meko, whose old “Initial Income Eligibility Thresholds” map was the inspiration! Tanya Andersen and Alicia Iverson were also a huge help as I worked through the map, so thank you both for your feedback and ideas!

The other major dataset I wrangled was the New York Times’ large database of COVID-19 case data. I brought this data into SQLite and ran queries to produce a table of total cases per county from May 1 to August 1. Next, I normalized the cases to a caseload per 100,000 residents, a metric compatible with PPI’s estimated cases linked to incarceration. With these two variables, I created a split bar chart comparing the total cases a county experienced with the number estimated to be linked to incarceration according to PPI.
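The query-and-normalize step can be sketched with Python's built-in `sqlite3` module. This is an illustration under assumptions, not my original queries: the `daily_cases` and `population` tables, their columns, the placeholder FIPS codes, the numbers, and the year in the dates are all hypothetical. It also assumes each row holds new cases for one day; if the source table stores cumulative counts, you would subtract the count at the start of the window from the count at the end instead of summing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE daily_cases (date TEXT, fips TEXT, new_cases INTEGER);
CREATE TABLE population (fips TEXT PRIMARY KEY, residents INTEGER);
""")

# Made-up rows; ISO-formatted date strings compare correctly as text.
conn.executemany("INSERT INTO daily_cases VALUES (?, ?, ?)", [
    ("2020-04-30", "00001", 5),    # before the window, excluded
    ("2020-05-01", "00001", 10),
    ("2020-07-15", "00001", 30),
    ("2020-08-02", "00001", 99),   # after the window, excluded
    ("2020-06-01", "00002", 8),
])
conn.executemany("INSERT INTO population VALUES (?, ?)", [
    ("00001", 20000),
    ("00002", 4000),
])

# Sum cases per county inside the window, then scale to cases per 100,000 residents.
QUERY = """
SELECT c.fips,
       SUM(c.new_cases) AS total_cases,
       SUM(c.new_cases) * 100000.0 / p.residents AS cases_per_100k
FROM daily_cases AS c
JOIN population AS p ON p.fips = c.fips
WHERE c.date BETWEEN ? AND ?
GROUP BY c.fips, p.residents
ORDER BY c.fips
"""
rows = conn.execute(QUERY, ("2020-05-01", "2020-08-01")).fetchall()
```

In this toy data the two counties have very different raw totals (40 and 8) but the same rate of 200 cases per 100,000 residents, which is the reason to normalize before comparing counties.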

Besides these two graphics, I did a variety of data wrangling with pivot tables and simple Excel filters to find the notable figures interspersed throughout the narrative. For instance, I calculated how many times more cases the top 1% of rural counties had compared to the bottom 99%.
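I did this comparison in Excel, but the same idea can be sketched in a few lines of Python. The function name and the sample values below are hypothetical, and the sketch makes one specific assumption: it compares the average caseload of the top 1% of counties against the average of the rest. Comparing totals instead would give a different number.

```python
from statistics import mean

def top_share_ratio(values, top_fraction=0.01):
    """Return mean of the top fraction of values divided by mean of the rest."""
    ranked = sorted(values, reverse=True)
    # Always keep at least one value in the top group.
    cutoff = max(1, round(len(ranked) * top_fraction))
    top, rest = ranked[:cutoff], ranked[cutoff:]
    if not rest:
        raise ValueError("need at least one value outside the top group")
    rest_mean = mean(rest)
    if rest_mean == 0:
        raise ValueError("mean of the remaining values is zero")
    return mean(top) / rest_mean

# Made-up caseloads for 100 counties: one outlier and 99 ordinary counties.
sample = [500] + [10] * 99
ratio = top_share_ratio(sample)
```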

The most rewarding part of the project was the writing. Threading all the key points together into a cohesive narrative was a fun challenge. I also interviewed two experts, Dr. Crystal Watson and Dr. Lauren Brinkley-Rubinstein, who gave my story emotion and character. I am extremely grateful to them for taking time out of their busy lives to talk with me.

Finally, I wish to give a huge thanks to my teacher, Andrew McGill. I learned more about data journalism from him in our one-on-one meetings and his course than I had in two years doing my own reading on the emerging field.

Created using SQLite, ArcGIS, Illustrator