The NUMBER ONE question I get asked by incoming Data Science students is “What materials or courses do you recommend to prepare for classes?”
I got this question so frequently last year that I wrote up a list of all the MOOCs (Massive Open Online Courses) I had personally completed in the last 3 years and gave it to the course director to send out. Considering that the number of Data Science students in Australia has doubled since I did mine, I figure there is a fair appetite for lists like these and so have updated mine just for you.
Now before you jump in I want you to keep the following in mind:
- There is no substitute for practice. You cannot read your way to becoming a decent programmer. You have to code.
- Both Python and R are great languages. If you are reading this, I’m guessing you haven’t started to specialise. Try and get across both languages. I used to be 100% R and now I hardly ever use it, preferring Python instead. You will need to get adaptive. Not everyone uses just 1, 2 or even 3 languages. I get caught out all the time with janky code that requires Java, Python, Matlab and C++ all for one installation.
- Cost doesn’t mean quality. Some of what I have put up is free but some have a price tag (no we do not get sponsorship). This also goes for the certificates. You don’t need them, especially if you are doing a university degree. Look out for price drops as well. Udemy has heaps of discounts throughout the year (hello Black Friday) but it can be a bit hit and miss.
Now before I get to the best MOOCs for Data Science students 2019, I need to talk about Kaggle. Kaggle is a platform where you can go and retrieve datasets to play with, and also see what other people have done with them. I am not ashamed to say that Kaggle is responsible for getting me through my first semester of programming.
Some lecturers I work with now incorporate ‘Kaggle comps’ into their teaching syllabus and they are often used by Data Science clubs for mini Datathons (think Hackathon for Data Scientists). Kaggle was initially a platform to make some good money from businesses who needed insights and were willing to pay for it. When you’re ready you can join competitions and compete for real-life dollarydoos (the official currency of Australia, in which all prices on this site are expressed).
Now for the best MOOCs for Data Science students 2019
Top course when you don’t know how to code (yet!)
DataCamp – Introduction to Python and Introduction to R ($40 dollarydoos per month)
DataCamp is Data Science orientated and runs on a subscription model that can be a bit exxy. They do have free materials so you can try before you buy, and they use a browser environment so you don’t need to download R or RStudio before you start. What I love about these courses is that they cover the very very basics then build you up through practice. DataCamp has gamified their courses: you earn experience points for correct lines of code and can take hints. This gives you a sense of achievement, something you definitely need in your early programming education.
Udemy – Python for Data Science and Machine Learning Bootcamp ($14.99 dollarydoos)
This is a fairly cheap course run by Jose Portilla. Jose is a well-regarded instructor with consistently high reviews. This course covers the very very basics including how to install Python and my preferred environment, Jupyter. I would recommend this course for anyone who is feeling a bit overwhelmed about learning to program, so pretty much everyone.
Coursera – Programming for Everybody (Getting Started with Python) ($0 dollarydoos if you audit)
Coursera is free if you don’t get the certificate. This is called auditing. I took this course a few years ago and enjoyed it. There is a good online community and they send you reminders that your ‘homework’ is due. The instructor is Associate Professor Charles Severance from the University of Michigan who is adorable. He even does ‘class outside’. Recommended for budding Python programmers.
Top courses when ‘I can code(ish), just give me the Data Science’
Coursera – Johns Hopkins University Data Science Specialisation course ($0 dollarydoos if you audit)
This is a well-known program in Python that gets consistently high reviews. Made up of 5 separate courses, the specialisation takes around 5-6 months to complete at a normal speed of 6-8 hours per week. What I appreciate the most about this course is the breadth. By the end, you will be proficient with machine learning libraries like scikit-learn, NLP libraries (NLTK), visualisation libraries (matplotlib) and network construction (networkx) for social network analysis. The course is fairly lecture heavy, so if you aren’t great at watching lectures, hop on a treadmill and kill two birds with one stone.
Udacity – Intro to Data Analysis ($0 dollarydoos)
A brief but clear dive into the cognitive mechanics behind Data Science through the actual doing of Data Science. The instructor emphasises communication and careful decision making. The course is in Python and part of the Data Analyst Nanodegree which I haven’t done any more of.
Udemy – Data Science and Machine Learning Bootcamp with R ($14.99 dollarydoos)
This course was awesome because at the time I did it, I had never used Python and was seriously doubting my ability to understand algorithms that I had only just begun to hear of. Jose Portilla takes this course and does a great job guiding you through different classification algorithms, clustering algorithms, basic NLP and Neural Nets. Definitely a confidence booster.
Top courses for when you want to step it up and commit
These courses can be done in combination with your degree or work. But they are a bit heavier.
Coursera – Machine Learning ($0 dollarydoos if you audit)
I cannot recommend this course enough. You will get an amazing experience with 60 hours of jam-packed ML goodness. Offered by Stanford (so you know it’s legit), the course boasts 4.9/5 stars from 90k ratings. I did find I needed to brush up on a few things I was rusty on, like linear algebra. But it was so worth it. I finished feeling like I was finally becoming a real Data Scientist. Oh and the instructor Andrew Ng is literally the best, he doesn’t dress things up unnecessarily which I have found some (younger) instructors like to do. While you’re at it, do yourself a favour and listen to Andrew’s advice for building a career in ML.
Cognitive Class – Learning Paths for Data science ($0 dollarydoos, thanks to IBM)
Historically IBM has a lot to answer for but I’ll give them this, they have certainly figured out how to tailor education for industry. Here you can choose what you are interested in according to the ‘learning path’ (Scala for Data Science, Hadoop programming, deep learning, blockchain etc) or your experience level. The platform is very on trend, with ‘Containers, microservices, Kubernetes, and Istio on the cloud’ being one such course (lol, wot?). They work on a badge system with optional competitions. Equal parts Pokémon and Kaggle. Definitely one for those who are keen to go corporate.
Dataquest – Interactive coding challenges (~$35 dollarydoos per month)
Quite similar to Datacamp but I feel it’s a bit more …ahh….polished. Dataquest uses a hands-on learning approach where you are essentially given a project with problems to solve. Hands-on learning is the quickest way to get better and Dataquest steps it up with actual projects, not just isolated problems. It is subscription based and billed yearly.
Top courses for when you just want me to shut up and take your money
Literally, any reputable University degree where you get a Masters or Bachelors in Data Science.
We aren’t bashing tertiary qualification in Data Science. Many are amazing and produce extremely talented students. Realistically you do need some form of tertiary qualification to break into Data Science, particularly in Australia. Here is a list of tertiary institutions offering Data Science qualifications in Australia, USA and the UK.
But if a $64–75k price tag isn’t in your budget, there are some alternative qualifications increasingly being recognised in the industry. I haven’t taken the next two courses but I have heard that they both cover a similar syllabus that is said to get you up to speed.
edX – MicroMasters Program in Data Science ($1,746 dollarydoos)
Not having taken this course I can’t comment too much about it. However, the instructors have emphasised the statistical and mathematical aspects of machine learning which is always a winner with me. The breadth looks reasonable and UC San Diego has a great reputation. If you are planning on doing further study you could be eligible for course credit. Ironically you couldn’t gain credit at UC San Diego, but you could at Curtin University in Perth, Western Australia (#hometown). I’ll let you make up your own mind about this course.
edX – Microsoft Professional Program in Data Science ($1,635 dollarydoos)
This one is aimed at people who are maybe managers or Business analysts using primarily Excel or similar. The course includes SQL, R, Python and foundational mathematics before moving into Machine learning and predictive analytics with Spark. The final quarter of the course is a capstone project resulting in a certificate which I’ve seen pop up a few times on LinkedIn.
For when you don’t want to be a ‘fake Data Scientist’
Ok, brace yourselves. To be a real Data Scientist, you will want to get some theoretical understanding of the mathematics and statistics behind you. Horrifying I know.
I have so many problems with people who have never taken a stats course calling themselves Data Scientists. Don’t leave yourself open to some bitch like me calling you out on your lack of understanding of the fundamentals. I’m not the world’s best and never will be, but my understanding of the fundamentals informs the choices I make when creating workflows, choosing models and interpreting my results.
Unfortunately, I can’t recommend any courses online. I learnt my stats the old-fashioned way: Introduction to Statistical Learning, Introduction to Probability and Statistics for Engineers and Scientists, and countless bottles of wine. If you are feeling particularly masochistic try The Elements of Statistical Learning.
Accumulating this knowledge is an ongoing battle. These days I continue to destroy my own sense of wellbeing by reading multiple research papers concerning Bayesian inference, crying over what I call ‘scary math’ and spending the hour after every PhD supervision deeply questioning my life choices.
But since you are just starting out on your Data Science journey, this is very unlikely to be your story. Take the statistics and probability portion of whatever education you end up doing very seriously.
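To give you a taste of why the fundamentals matter, here is a minimal sketch in Python. The numbers are made up purely for illustration and the variable names are my own. It uses the built-in statistics module to show how one outlier drags the mean well away from the median, which is the kind of thing that should change how you summarise and interpret your data.

```python
import statistics

# Hypothetical weekly study hours for seven students (made-up numbers).
# Six are fairly typical; one student is an extreme outlier.
hours = [6, 7, 7, 8, 8, 9, 60]

mean_hours = statistics.mean(hours)      # sensitive to the outlier
median_hours = statistics.median(hours)  # robust to the outlier

print(f"mean:   {mean_hours:.1f}")
print(f"median: {median_hours:.1f}")
```

If you only reported the mean here, you would be telling people the ‘typical’ student studies about twice as much as most of them actually do. Knowing which summary to reach for, and why, is exactly what a stats course gives you.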
If you have any recommendations or proclamations of love for our site, leave a comment.
Good luck champs!
I’ve got a copy of Introduction to Statistical Learning on my desk. I know I found it useful as a way of getting my head around a few processes, though obviously thin on data wrangling. But it’s heaps easier to digest than Elements.
Honestly, the hardest thing is just straight up computer programming. There’s no way around it, it’s time intensive, error prone, and getting good at it just requires stacks of time and mistakes. And while you can kind of fumble your way through Excel, Python and R can just refuse to work if your code is wrong, which can be extremely concerning until you find the error.
But at the end of it all, you can do things that just blow people’s minds.
Wow Ryan. Such a positive and realistic outlook. Yes, computer programming is such a steep learning curve for newcomers. Thanks for the comments.