Guest blog by George Psistakis.
Working with data sets means you first need a way to get them. Once you have them, you have to clean them.
A commonly cited estimate holds that data scientists spend 80% of their time cleaning and manipulating data and only 20% actually analyzing it.
Sure enough, you find yourself spending 80% of your time cleaning data, while deadlines and management demands keep you up at night.
This is one reason data analysts and data scientists regularly scour the web for anything that could help: tools, tutorials, resources.
I have stumbled upon many posts about general Data Science MOOCs and tutorials, but never one that lists resources on one of the most time-consuming steps in the data pipeline: data cleaning.
In this post, I have done my best to gather every resource I could find online. If you find one that I missed, please let me know in the comments below.
Let's start with the basics...
What is data cleaning?
Note: Some of the courses below belong to specializations or series of courses, such as Coursera's Data Science Specialization or Udacity's Nanodegree Program, but you may also take each course individually. If you want a certificate, there is usually a fee; if not, you may (on Coursera at least) "audit" the course. Other courses are free, and still others are offered through subscription-based services.
Data Cleaning in R
Getting and Cleaning Data (Coursera)
- Course Name: Getting and Cleaning Data
- Institution: Johns Hopkins University
- Coursera Specialization: Data Science Specialization
- Price: Free
- Part of Johns Hopkins University's Data Science Specialization on Coursera, and one of the best data cleaning courses out there. The course covers the basics needed for collecting, cleaning, and sharing data.
Data Science and Machine Learning Essentials (Microsoft)
- Course Name: Data Science and Machine Learning Essentials
- Institution: Microsoft
- Price: Free, paid for certificate
- Another of the best Data Science MOOCs. It covers tools such as R, Python, and SQL, and its topics include data acquisition, ingestion, sampling, quantization, cleaning, and transformation.
Data Science with R (O'Reilly)
- Course Name: Data Science with R
- Price: Paid
- Part of one of O'Reilly's Learning Paths. It runs from the basics to more advanced techniques, including R graphics and machine learning. It contains an intro to Data Science with R, a section on manipulating data sets, and expert data wrangling with R.
Cleaning Data in R (DataCamp)
- Course Name: Learn How to Clean Your Data Using R
- Price: Free (some chapters), subscription-based
- Provides a basic intro to cleaning data in R.
Foundations of Data Science (Springboard)
- Course Name: Foundations of Data Science
- Price: Free (some chapters), subscription-based or one-time payment
- It has a unit on data wrangling and data cleaning with R.
You may also want to browse Udemy's courses on data cleaning and R. There are plenty to choose from, but it may take some searching to find the ones that are valuable to you.
There are more courses covering Python, SQL, and other tools; you can read the rest of the updated, curated list here. If you know of one that I missed, please let me know!