Developed a MongoDB/Python-based pipeline for automated extraction of political conflict events from news text using natural language processing
Contributed to a team effort that published several million political conflict events extracted from over 14 million news articles spanning 1945–2015, drawn from BBC Monitoring’s Summary of World Broadcasts, the CIA’s Foreign Broadcast Information Service, and the New York Times
Textual Location Extraction and Focus Identification
Created software that extracts locations from text and identifies the “focus” country of a news article
Improved accuracy over the industry geolocation standard by up to 56.3%; initial investigation suggests a significant performance increase over comparable tools developed by labs at Penn State and MIT
Relevant Political Actor Extraction
Developed intrastate relationship extraction software that identifies relevant political actors in corpora surrounding the onset of hostilities
International Collaboration
Participated in the multinational conference Collaborative Research on Extreme Scale Text Analytics (CRESTA) at the Cline Center for Advanced Social Research, furthering research on international political conflict
Produced large volumes of data used by research teams in computational political science
Software Developer Effort Metric
Developed a methodology for defining the ‘effort required for upkeep and the addition of new components’ in an existing software pipeline