Massive open online courses (MOOCs) have brought the very best teachers in the world to anyone with an internet connection. Over the last two years, I have explored many of these classes and fully completed several.
Computer science / data science :
CS231n – Convolutional Neural Networks for Visual Recognition. Audited via Stanford University.
This course is fantastic. Hands down, the best neural network resource I have used (and I’ve tried most of them).
It covers the actual details of neural net implementation thoroughly, and Andrej is very good at covering the intuitions that make sense of each method or trick used.
The course is focused on convolutional nets (as the title suggests) but there is a huge amount here that is useful beyond that. It covers everything from backprop algorithms to initialisation tricks to visualisation methods.
The best thing about this course (I have been watching the Winter 2016 lecture set almost contemporaneously) is how up to date it is. Andrej and Justin often mention papers that hit arXiv within the last few weeks, and there is good coverage of state-of-the-art models (for example, residual networks get a decent segment).
I can thoroughly recommend this for anyone who has come through a Machine Learning 101 level class and wants to move on to neural networks, or for anyone who is coming from the top down and wants to deepen their understanding of neural nets beyond the metaphor level to the implementation level.
Neural Networks for Machine Learning. Audited via Coursera, offering from the faculty of the University of Toronto.
A good class from the always amazing Geoffrey Hinton, although a little dated by today’s standards. A lot of time is spent on the history of the field and the breakthroughs of decades past, but there is also more focus on the underlying maths than in CS231n. I personally prefer the Stanford offering, but YMMV.
Exploratory Data Analysis using R. Completed via Udacity, offering from the data analytics team at Facebook.
This is a great second course in R. It is a good mix of short videos, multiple-choice questions and practice assignments that explore the “workhorse” libraries that most data scientists use. Much of this course is done with ggplot2 for the visualisations, so if you want to add this to your skillset, it is a good way to get some practice.
You will need to know the basics already, from something like DataCamp or similar. If you know your types and structures, subsetting and the basics of functions, you should be good to go.
The presenters from Facebook are all engaging as well, which helps.
Microsoft Professional Program Certificate in Data Science. Reviewed via edX, offering from Microsoft and a variety of collaborators.
I reviewed many of these classes when they came out in late 2016, and this specialisation program is really nice and comprehensive for beginners getting into data science and machine learning. The sub-courses are from a variety of sources, which means quality varies a bit and there is some overlap between courses, but this has become my go-to resource for newcomers to the field. It has the special distinction of offering the same content in separate streams for Python and R, both of which are languages I regularly use. In fact, it is probably the best R course for a beginner (with Statistical Learning from Stanford being a great intermediate course).
Obviously going through the lot is a good idea, but for someone with a bit of programming experience, or who can pick it up reasonably quickly, the shortened program I like to recommend is just courses 5, 6, 7 and 8. If you can get through these, you are ready to start doing your own data science projects. The expected time demand for this is 2-3 hours a week for a bit over 20 weeks, and a dedicated learner could certainly get through it all much more quickly.
The Data Scientist’s Toolbox. Completed with distinction via Coursera, offering from the faculty of Johns Hopkins University.
This is part one of the data science specialisation in R by Johns Hopkins. Jeff Leek seems fine, but it is hard to even call this a course. It really just gets you to install R and make a repo on GitHub.
The reason I have this here, but nothing from the rest of the specialisation, is that I started the specialisation about a year ago as a complete R newbie, and breezed through this class. Then the second class started (R Programming) and I hit a wall at assignment one. Complete stop.
If you start this course, don’t expect the rest of the specialisation to follow on at the same difficulty. There is a huge jump from here to part 2.
I still intend to go back and do the rest of the specialisation, now that I have a year and a lot of R work under my belt. From what I remember, the second course isn’t actually horrendously difficult, but it is far beyond a true beginner.
Programming for Everybody (Python). Completed via Coursera, offering from the faculty of the University of Michigan.
This is a nice introduction to Python. It isn’t a CS101-type class though: you aren’t going to learn much in the way of the underlying principles of computer science. It is more of a first-steps-with-the-language class. That said, it does focus a little on data science, and few classes do that at the introductory level.
I am going to try out the Rice specialisation when I get the time, which is introductory CS in Python, but using games as the medium rather than data. I expect it will be a bit more comprehensive on the theory side.
If you just want to learn how to get started with data in Python, this is a good place to begin. You won’t learn any skills with scikit-learn or NLTK or anything, just the basics of the language, but I found it enjoyable. The lecturer is also very personable, and clearly enjoys teaching.
This class was previously offered in one block, but has now been broken into a specialisation with multiple smaller blocks.
Statistical Reasoning in Public Health 1. Completed via Coursera, offering from the faculty of Johns Hopkins University.
Both part one and part two of this course are nice traditional stats courses. John McGready elevates the course by being very personable and damn near adorable. By completing part one and part two you should be able to cope with most biostatistics in medical research, at least unless you want to do -omic stuff, and even then you should have the tools to upskill pretty quickly.
Data Analysis and Statistical Inference. Audited via Coursera, offering from the faculty of Duke University.
Another course that has been split up into multiple parts to form a Coursera Specialisation. I really liked this course, which is a great mix of stats 101 and introductory R programming. Mine Çetinkaya-Rundel is the main lecturer and was fantastic. This is a major contender for my most recommended intro data science course, particularly if my recommendee is more focused on biostats (vs machine learning) and wants to use R.
The course is more statistically rigorous than the Microsoft offering up the page, so anyone seriously looking at public health as a long-term research field might find this a better fit. It will take longer to get up to speed though, so the applied science folks and hackers would probably prefer the Microsoft courses.
Biology and Medicine :
Introduction to Systems Biology. Audited via Coursera, offering from the faculty of the Icahn School of Medicine at Mount Sinai.
A good introduction to a worthwhile topic, in a field where there are few MOOCs available. I personally found the style very dry, but there is a good depth to the mathematics involved. A nice place to start for a budding bioinformaticist, particularly as many other MOOCs in this field seem to focus almost entirely on genomics and applications of genomics.
Ethical and Social Challenges of Genomic and Precision Medicine. Completed via Coursera, offering from the faculty of the University of California, San Francisco.
Unfortunately this class has been taken down. It was a decent overview of the ethical issues around genomics, although to say it touched on other areas of precision medicine is probably stretching things. That is not unexpected, because almost all of precision medicine is genomics, and the same concepts could easily be extrapolated to any other precision methods.
The Science of Everyday Thinking. Audited via edX, offering from the faculty of the University of Queensland.
This is a very good course for anyone interested in a broad overview of how human psychology affects us all, and subconsciously biases our ‘decisions’. I strongly recommend anyone going on a MOOC binge to at least go to YouTube and watch Episode 5, Learning to Learn. It gives some really nice insights into optimising your study practices. I personally found it more useful than the much longer MOOC Learning How to Learn from the University of California, San Diego. If you want the cliff notes: there are no real shortcuts, but try to avoid lecturers who explain things so well you don’t feel like you have to work at understanding the content. That ease is an illusion, where we mistake the feeling of “wow, that makes sense” for actually learning. They call the productive kind of struggle “desirable difficulty”, and it is one of several gems in the short video series.