So, would you believe that in over 3 years of blogging, I have never done a low-effort, low-information, clickbait post? Bizarre, I know, but that changes today.
Honestly though, I’ve wanted to put together a list like this for a while. I’ve had a lot of Twitter discussions around these topics, and these are all things I believe strongly but that other clever people disagree with. So here you are: a list of opinions I hold that are more or less outside the common consensus.
Non-standard disclaimer: this post is all my personal opinion, with very little evidence. Take with salt for best results.
- Open data is not necessarily good. Data is the main competitive advantage a company has when bringing a product to market (which costs millions of dollars). If they don’t have this advantage, they have a much less certain return on investment. Why spend millions on a product that anyone can build? Open data could actually slow the pace of progress, leaving us with lots of research papers but no products. Open data is also terrible for generalisability, as everyone overfits massively to “be the best” on public datasets.
I’m not giving up my first mover advantage.
- “Normal vs abnormal” is a terrible task to train a model for. The abnormal class is so broad and diverse that your data will never cover it well, and your ability to notice rare subgroup errors will be very low, since you won’t have any cases (see the sketch below). I expect a huge spike in the rate of missed bone tumours if anyone brings a “normal chest x-ray” detector to market.
Exactly Ford. Without the darkness, how would we recognise the light?
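To make the rare-subgroup point concrete, here’s a back-of-envelope sketch (all numbers invented). With only a handful of, say, bone tumour cases in your test set, the confidence interval on sensitivity is so wide that a model missing them looks indistinguishable from one catching them:

```python
# Hypothetical numbers: why rare-subgroup errors hide inside a
# "normal vs abnormal" test set.
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial proportion (e.g. sensitivity)."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (centre - half, centre + half)

# Common abnormality: plenty of cases, a usefully tight interval.
print(wilson_interval(successes=270, n=300))  # roughly (0.86, 0.93)

# Rare subgroup (e.g. bone tumours): 2 of 3 detected *looks* fine,
# but the interval spans almost everything. The failure is invisible.
print(wilson_interval(successes=2, n=3))      # roughly (0.21, 0.94)
```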
- “Artificial intelligence” is a great term. We all know what it means, it brings in interest and money to the field, and frankly what we do is magic* so let’s just run with it.
Which is more magical, magitech or technomagic?
- Deep learning is pretty useless for EHR data. Not only is deep learning meh with unstructured** data like EHR records, but I don’t see any reason to expect breakthroughs. Deep learning works on images, text, sound, and so on because it looks for a very narrow subset of possible features (i.e., those with spatial relationships). EHR data has no internal structure^, so DL is no better than simpler ML models (a quick sketch of the comparison is below).
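A minimal sketch of that comparison, using synthetic stand-in data (the features and numbers here are placeholders, not real EHR results):

```python
# On flat, structure-free tabular data, a plain gradient boosting model
# is typically at least as good as a neural net.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

# Stand-in for a flat EHR feature table (labs, vitals, demographics).
X, y = make_classification(n_samples=2000, n_features=30, n_informative=10,
                           random_state=0)

for name, model in [
    ("gradient boosting", GradientBoostingClassifier(random_state=0)),
    ("neural net (MLP)", MLPClassifier(max_iter=500, random_state=0)),
]:
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: AUC = {auc:.3f}")
```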
- End-user interpretability is overrated. If your model works, most doctors will gleefully and immediately cede all related decision making to the AI, without the need for interpretability tools. At best, interpretability methods will provide a (false?) sense of security to clinicians^^. That said, faux-interpretable systems will probably sell better to CIOs trying to look “safety first”, so the current practice of adding heatmaps to everything (sketched below) makes a certain amount of cynical sense.
I can bill for it as a separate item.
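For the record, the heatmaps in question are rarely more than a few lines of code. Here’s a minimal sketch using plain gradient saliency; the untrained network and random image are stand-ins for “your model” and “your x-ray”:

```python
import torch
from torchvision.models import resnet18

model = resnet18(weights=None).eval()        # stand-in for a trained model
image = torch.randn(1, 3, 224, 224, requires_grad=True)  # stand-in x-ray

score = model(image)[0].max()                # score of the top class
score.backward()

# "Heatmap": per-pixel magnitude of the gradient of the score
# with respect to the input image.
heatmap = image.grad.abs().max(dim=1).values.squeeze()   # shape (224, 224)
print(heatmap.shape)
```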
- No medical advance is going to be achieved by a team that has designed a fancy new model for the task. Anyone using some home-spun model instead of an off-the-shelf dense/res/u-/inception network etc. is doing machine learning research, not medical research. The very process of building and tuning your own model means you will almost certainly overfit to your particular data, which is anathema to good medical systems. I’m actively skeptical of results on medical data where a novel architecture is used (the off-the-shelf alternative is sketched below).
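To be clear about what the off-the-shelf alternative looks like: take a standard pretrained backbone and swap the head for your task. A sketch (the class count and task are placeholders):

```python
import torch.nn as nn
from torchvision.models import densenet121, DenseNet121_Weights

# A standard, well-tested backbone; no architecture tinkering required.
model = densenet121(weights=DenseNet121_Weights.DEFAULT)
model.classifier = nn.Linear(model.classifier.in_features, 2)  # e.g. two findings
# ...then fine-tune on your medical data with an ordinary training loop.
```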
- Releasing public code is not particularly relevant in medical AI research. It doesn’t improve reproducibility for high-performance systems, because without an equally good (but different) dataset we can’t actually validate the results. Even with shared data, running the same code on the same data only proves the authors didn’t make the results up.
I mean, I’d sell my soul for an AUC like that.
- Vision is done and dusted. Computer vision models aren’t going to get a lot better in terms of performance. We will slowly see improvements in data efficiency and semi-supervised learning, but pretty much any visual task can be performed at human or superhuman level given enough effort and data. We are at Bayes error.
The end of computer vision. So sad.
- Unsupervised learning isn’t clinically relevant. Currently, all AI that seems likely to add clinical value is supervised, because human performance is very close to the best achievable given the inputs. Unsupervised learning is getting better, but you will always take a performance hit relative to supervised training, which will leave you worse than human. There are undoubtedly some situations where unsupervised learning can play a supplementary role to supervised learning, but we won’t be solving medicine with our huge stores of unlabelled data anytime soon.
- Distrust any system with an AUC below 0.8, because this is roughly the level medical AI systems reach when they overfit on non-pathological image features, like the model of x-ray scanner used or which technician took the image (all of which can be identified in the image to some degree). These systems will mostly fail as clinical AI because they can’t generalise. Obviously the cut-off of 0.8 is a huge oversimplification, but it tends to be a good rule of thumb for many common medical tasks (a toy demonstration is below).
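As a toy demonstration of that failure mode (entirely synthetic numbers), here’s how a model that has learned nothing but the scanner can still post a plausible-looking AUC:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 10_000
disease = rng.random(n) < 0.3

# Confounding: diseased patients are imaged on scanner B 80% of the time,
# healthy patients only 30% of the time (invented rates).
scanner_b = np.where(disease, rng.random(n) < 0.8, rng.random(n) < 0.3)

# A "model" whose score is just the scanner identity, plus a little noise.
score = scanner_b + 0.1 * rng.standard_normal(n)
print("AUC from the scanner alone:", roc_auc_score(disease, score))  # ~0.75
```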
There you go. Definitely not a format I will make regular use of, but holy smokes, I’ve written a blog post with less than 1000 words! My only regret is I couldn’t find more gifs.
I’ll try to respond to any comments here or on social media, so disagree away (or suggest other controversial opinions). I’ll probably do a follow-up post with the best responses, especially if anyone can convince me my opinions are wrong 🙂