Raiders of the Lost Query: Learning Best Practices for Exploratory Data Analysis in R Programming
Presented by: Pierre DeBois
Indiana Jones had his hat, his whip, and his wits to save the day. But developers and managers need a lot more to program data models.
With so many data science tools available, managers and developers can create statistical programming models, but are often overwhelmed when deciding how best to explore a dataset. Most professionals conducting data science spend a majority of their time exploring and cleaning data. Databases increasingly contain semi-structured data, thanks to varied sources such as social media, mobile devices, geolocation, and attributes describing real-world structures. Being able to blend data from a range of sources and create useful correlations requires knowing when to apply exploratory steps effectively.
This brief talk will show attendees how to plan for speedier analysis of datasets so that developer/manager teams can develop better regression and machine learning models. The session will cover the querying features in popular data repositories (Kaggle, data.world), data exploration techniques using libraries and functions in R Programming, and ideas for communicating systematically with team members about the data exploration process.
The end result is a faster way to establish a higher-quality dataset, leading to better analysis for regressions, machine learning models, and other data science projects.