Dealing with Data

Sometimes collecting, cleaning, and organizing data is the most difficult part of a project. I have substantial experience in all of these areas and would love to help you efficiently carry out these tasks for your project!

 

Data Cleaning

  • I believe that the decisions made in transforming ‘raw’ data into ‘clean’ data should be transparent, reproducible, and easy to modify. For that reason, I recommend doing data cleaning in R. I provide reproducible R code with every project so that each cleaning step can be understood, rerun, and adjusted if necessary.
  • I also recommend R because it can handle much larger datasets than is feasible in programs like Excel or Stata. R is especially useful for cleaning messy, text-intensive datasets.
  • My background in ‘text as data’ has helped me become an expert at converting messy, unstructured data into organized, structured, and useful datasets using R.

 

Web Scraping

  • I have substantial experience scraping internet sources to collect and organize data. In past projects, I have scraped resources such as State Department reports, diplomatic cables, and basketball statistics.
  • I am very familiar with using APIs to retrieve large amounts of data. In previous projects, I have used APIs to access international event data and historical embassy data. I have also used APIs to run online machine-learning tools for automated content analysis.
  • I have written R scripts with RSelenium that query online data sources, such as vital statistics databases, from the command line, which replaces hours of manual queries.
  • I have used R to geocode addresses and map the service population of a social service program in the Memphis area.