Join David Tussey, formerly of the NYC Department of Information Technology and Telecommunications, and Dr. Jan Yun, a professor at UConn, for a presentation that explores data cleansing, a necessary first step in any data science or analysis effort.
During the event we will explore six areas of data cleanliness: structural issues, missing/blank data, validating data types, identifying invalid values, identifying logical inconsistencies, and identifying redundant data elements. We’ll use 311 service request data from 2022-2023 that was analyzed using custom software written in R. Even with only two calendar years, this is still approximately 6.4 million records!
Our goal is for attendees to come away with an understanding of real-world data cleanliness issues and some approaches to account for them.
This presentation is aimed at any analyst engaged in data science efforts. It is intended to illustrate the kinds of challenges data scientists face when analyzing large datasets. After the presentation there will be ample opportunity for Q&A.