Data Wrangling

A beginner’s guide

Anne Sophie Gill (https://www.skemagloballab.io/gillAnneSophie.html), SKEMA Global Lab in AI (https://skemagloballab.io)
Thierry Warin (https://www.nuance-r.com/principalInvestigator.html), SKEMA Business School, Raleigh, NC (https://www.skemagloballab.io)
03-10-2020

Source: R for Data Science (Hadley Wickham & Garrett Grolemund)

Without clean and robust data, there is no Data Science

“Data you find ‘in the wild’ will rarely be in a format necessary for analysis, and you will need to manipulate it before exploring the questions you are interested in. This may take more time than doing the analysis itself!” (Nabeel Siddiqui, 2017)

What is Data Wrangling?

Data wrangling is the process of cleaning and unifying messy and complex data sets for easy access and analysis.

The importance of Data Wrangling

As the volume of data and the number of data sources grow rapidly, large amounts of available data must be organized before they can be analyzed.


This blog post will show you different ways of manipulating data within the SKEMA Quantum Studio (Warin 2019).


How to clean and manipulate data with the help of the dplyr and tidyr packages


SPREAD

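The spread function, from the tidyr package, transforms your data from a long format to a wide format: the distinct values of one column (the key) become new column names, and the values of another column fill those new columns.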
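To make the long-to-wide idea concrete, here is a minimal sketch. The data frame `long` and its columns (`country`, `indicator`, `value`) are invented for illustration and do not come from the original post; only `spread()` and its `key` and `value` arguments are part of tidyr.

```r
library(tidyr)

# Hypothetical long-format data: one row per country and indicator.
long <- data.frame(
  country   = c("A", "A", "B", "B"),
  indicator = c("gdp", "pop", "gdp", "pop"),
  value     = c(10, 1, 20, 2),
  stringsAsFactors = FALSE
)

# Each distinct entry in `indicator` becomes its own column,
# filled with the matching entries from `value`.
wide <- spread(long, key = indicator, value = value)

print(wide)
```

The result should have one row per country, with `gdp` and `pop` as separate columns. Note that recent versions of tidyr mark `spread()` as superseded by `pivot_wider()`; `spread()` still works, but new code may prefer the newer function.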