Why Learn Data Science?

When I started out in data science back in 2015, I had no idea what I wanted. All I knew was that I had a dissertation to write for my degree in Economics, that data science was the next big thing, and that I liked programming and wanted to develop my skills in it. I neither knew nor expected that I would dive deep into data science. Many blog posts and one data science competition later (in 2019), I considered pursuing an online Masters in Data Science to get formal certification of my skills. After much thought, I realised that I would be paying for the following (in order of importance to me):

  1. Knowledge. All Masters programmes equip their students with knowledge and practical experience.
  2. Proof of Skills and Effort. The certificate is a declaration by a third party (a university) that I did attend certain courses and did well in them.
  3. Quality Control. All of the above, under a formal, structured programme.
  4. Brand. A Masters degree from a reputable school is supposed to signal that the programme is respectable. This is not entirely true, because data science is a young field, and Masters programmes in it are younger still.

Having gone through this thought process, I would recommend the following:

  1. If it is just knowledge you seek, go for Massive Open Online Courses (MOOCs).
  2. If it is knowledge and proof of skills and effort that you seek, go for paid MOOCs.
  3. If it is all of the above, go for a Masters programme.

Your Very Own Masters in Data Science

Regardless of which option you choose, know that structure is important (and people love structure). You should therefore structure your learning, using Masters programmes as a reference. From my research on the programmes at Columbia University, UC Berkeley, Johns Hopkins, Carnegie Mellon, Cornell, and Harvard, I found that a Masters in Data Science typically comprises 7 key focus areas:

  1. Statistics
  2. Data Exploration
  3. Computing
  4. Databases
  5. Machine Learning
  6. Deep Learning
  7. Capstone Project

Using this framework, here are my recommendations. These are courses I have completed and found useful. The less useful courses have been left out of this list.

Statistics
  Statistical Inference
  Linear Models
  Probability Models
  Bayesian Statistics

Data Exploration
  R:
    Exploratory Data Analysis (Coursera)

Computing
  Python:
    Python for Data Science (edX)
    Programming for Everybody - Getting Started with Python (Coursera)
    Programming with Python for Data Science (edX)
    Python Programmer Track (DataCamp)
  R:
    R Programming (Coursera)

Data & Databases
  SQL:
    SQLBolt
    Learn SQL (Codecademy)
    DataCamp
  Python:
    Python Data Structures (Coursera)
  R:
    Getting and Cleaning Data (Coursera)

Machine Learning
  Python:
    Principles of Machine Learning: Python Edition (edX, work in progress)
  R:
    The Analytics Edge (edX)

Deep Learning
  General:
    Neural Networks and Deep Learning
    Improving Deep Neural Networks: Hyperparameter Tuning, Regularisation and Optimisation
    Convolutional Neural Networks
    Sequence Models
  Python:
    Deep Learning Fundamentals with Keras

Capstone Project
  Create one! Some places to find data:
    Kaggle
    UCI Machine Learning Repository
    Quandl
    And more!

Others
  R:
    Reproducible Research (Coursera)

Recommended Reading

  1. Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking
  2. Data Smart: Using Data Science to Transform Information into Insight
  3. Data Science from Scratch: First Principles with Python
  4. Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management