Virtual Event
  • This event has passed.

Is your data really that dirty? An in-depth look at NYC 311 data to determine its “cleanliness”

March 18 @ 2:00 pm - 3:00 pm

Free
A graphic image with a yellow background and 3 icons indicating stacks of data, with a hand dusting them off with a broom and the words 'data cleansing'

Join David Tussey, formerly of the NYC Department of Information Technology and Telecommunications, and Dr. Jan Yun, a professor at UConn, for a presentation on data cleansing, a necessary first step in any data science or analysis effort.

During the event we will explore six areas of data cleanliness: structural issues, missing/blank data, validating data types, identifying invalid values, identifying logical inconsistencies, and identifying redundant data elements. We’ll use 311 service request data from 2022-2023 that was analyzed using custom software written in R. Even with only two calendar years, this is still approximately 6.4 million records!
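To give a feel for what these six kinds of checks can look like in practice, here is a minimal sketch in Python. It is not the presenters' software (theirs is written in R), and everything in it is an assumption made for illustration: the field names (`id`, `created`, `closed`, `zip`, `borough`, `borough_copy`), the date format, and the three toy records, which are made up and are not real 311 data.

```python
from datetime import datetime

# Toy records with hypothetical field names; these are NOT real 311 data.
records = [
    {"id": "1", "created": "2022-01-05", "closed": "2022-01-07",
     "zip": "10001", "borough": "MANHATTAN", "borough_copy": "MANHATTAN"},
    {"id": "2", "created": "2022-03-10", "closed": "2022-03-01",
     "zip": "1000A", "borough": "", "borough_copy": ""},
    {"id": "3", "created": "2023-02-30", "closed": "",
     "zip": "11201", "borough": "BROOKLYN", "borough_copy": "BROOKLYN"},
]


def parse_date(text):
    """Return a datetime, or None if the text is blank or not a real date."""
    try:
        return datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return None


def audit(rows):
    """Run one simple check per cleanliness area and list offending ids."""
    expected_fields = set(rows[0])
    report = {
        # Structural: every row should carry the same set of fields.
        "structural": [r["id"] for r in rows if set(r) != expected_fields],
        # Missing/blank data: borough left empty.
        "missing_borough": [
            r["id"] for r in rows if not r.get("borough", "").strip()
        ],
        # Data types: created date must parse as a calendar date.
        "bad_created_date": [
            r["id"] for r in rows if parse_date(r.get("created", "")) is None
        ],
        # Invalid values: zip must be exactly five digits.
        "invalid_zip": [
            r["id"] for r in rows
            if not (len(r.get("zip", "")) == 5 and r.get("zip", "").isdigit())
        ],
        # Logical inconsistencies: filled in by the loop below.
        "closed_before_created": [],
        # Redundancy: a field that always equals another adds no information.
        "borough_copy_redundant": all(
            r.get("borough") == r.get("borough_copy") for r in rows
        ),
    }
    for r in rows:
        created = parse_date(r.get("created", ""))
        closed = parse_date(r.get("closed", ""))
        if created and closed and closed < created:
            report["closed_before_created"].append(r["id"])
    return report


report = audit(records)
for check, result in report.items():
    print(check, result)
```

On these toy records, the sketch flags record 2 for a blank borough, a non-numeric zip, and a closing date earlier than its creation date, flags record 3 for an impossible creation date, and reports the copied borough field as redundant. Checks on a dataset of several million rows would more likely be vectorized, but the logic of each test is the same.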

Our goal is for attendees to come away with an understanding of real-world data cleanliness issues and some approaches to account for them.

This presentation is aimed at any analyst engaged in data science efforts. It is intended to illustrate the kinds of challenges data scientists face when analyzing large datasets. After the presentation there will be ample opportunity for Q&A.

 

Details

Date:
March 18
Time:
2:00 pm - 3:00 pm
Cost:
Free

Other

Pre-requisites
It is helpful, but not required, to have a basic understanding of descriptive statistics such as mean, median, and standard deviation. Beginners and data science novices are welcome!
Public Dataset(s)
311 Service Requests from 2010 to Present
Event Materials
https://drive.google.com/drive/folders/1l-pTyUtAU6RolstIy2-bWHXFqbp_8kK6?usp=sharing