I started writing about AIM when the Royal College of Physicians, London, invited Babylon's team (Babylon is a healthcare chatbot) to present at its 2018 conference. This shows that Babylon is to be taken seriously.
Babylon claims to the UK NHS that it can triage patients who seek a GP appointment: identifying who should be seen without delay and who can wait. This is the basic work a triaging nurse or GP does when a patient calls for an appointment.
Babylon published a paper about the chatbot, and Enrico Coiera has commented on why Babylon may not yet be ready to triage real patients.
What is AIM?
Medicine is defined by Merriam-Webster as 'the science and art dealing with the maintenance of health and the prevention, alleviation, or cure of disease'.
From the time computers came into use in the 1950s, scientists aimed to create programs that were intelligent (thinking and reasoning) like human beings. In 1956 John McCarthy, who coined the term 'Artificial Intelligence' (AI) (Ref), proposed at the Dartmouth Conference that "An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer."
This summer has extended into more than half a century. Physicians were also captivated by the potential AI could have in medicine. With computers' capacity to store vast amounts of data and their processing power, it was thought that computers would become 'doctors in a box', assisting and surpassing clinicians in tasks like diagnosis (Ref). Against this background, computer scientists and healthcare professionals, mainly from the USA, worked together to form the new discipline of 'Artificial Intelligence in Medicine' (AIM).
In reviewing the emerging field of AI in medicine, Clancey and Shortliffe in 1984 provided the following definition: ‘Medical artificial intelligence is primarily concerned with the construction of AI programs that perform diagnosis and make therapy recommendations. Unlike medical applications based on other programming methods, such as purely statistical and probabilistic methods, medical AI programs are based on symbolic models of disease entities and their relationship to patient factors and clinical manifestations.’ (Ref)
The field of AI had two schools of thought. Proponents of so-called 'strong' AI were interested in creating computer systems whose behaviour is at some level indistinguishable from that of humans (Interlude Box IC2.1 – Ref Turing Test). Success in strong AI would result in computer minds that could reside in autonomous physical beings such as robots, or perhaps live in 'virtual' worlds such as the information space created by something like the Internet. (Ref – Coiera, Enrico. Guide to Health Informatics, Third Edition. CRC Press.) 'Weak' AI, in contrast, looked at human cognition and asked how it could be supported in complex or difficult tasks. For example, a fighter pilot may need the help of intelligent systems to assist in flying an aircraft that is too complex for humans to operate on their own. These 'weak' AI systems are not intended to have an independent existence, but instead are a form of 'cognitive prosthesis' that supports a human in a variety of tasks.
What is Machine Learning? AI is a branch of computer science that tries to make computers more intelligent. A basic requirement for intelligent behaviour is learning; most experts believe that without learning there can be no intelligence. Machine learning is a major and rapidly developing branch of AI (Ref). (This is a key paper for understanding ML and its three branches: Bayesian classifiers, neural networks, and decision trees.) From the very beginning, three major branches of machine learning emerged: classical work in symbolic learning is described by Hunt et al., in statistical methods by Nilsson, and in neural networks by Rosenblatt. A Bayesian classifier example and explanation: http://www.statsoft.com/textbook/naive-bayes-classifier
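To make the Bayesian-classifier branch concrete, here is a minimal naive Bayes sketch in plain Python. The symptom data and labels are entirely made up for illustration; the point is only the mechanics: estimate P(class) and P(feature | class) from counts, then pick the class with the highest product.

```python
from collections import Counter, defaultdict

# Toy training data (hypothetical, for illustration only):
# each record is (binary symptoms, diagnosis label).
data = [
    ({"fever": 1, "cough": 1}, "flu"),
    ({"fever": 1, "cough": 0}, "flu"),
    ({"fever": 0, "cough": 1}, "cold"),
    ({"fever": 0, "cough": 0}, "cold"),
    ({"fever": 1, "cough": 1}, "flu"),
]

def train(records):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(label for _, label in records)
    feature_counts = defaultdict(Counter)  # (class, feature) -> value counts
    for features, label in records:
        for name, value in features.items():
            feature_counts[(label, name)][value] += 1
    return class_counts, feature_counts

def predict(class_counts, feature_counts, features):
    """Return the class maximising P(class) * prod P(feature=value | class)."""
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / total  # the prior P(class)
        for name, value in features.items():
            counts = feature_counts[(label, name)]
            # Laplace smoothing over the two possible binary values,
            # so an unseen combination never zeroes out the product.
            score *= (counts[value] + 1) / (count + 2)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(data)
print(predict(*model, {"fever": 1, "cough": 1}))  # prints "flu"
```

The "naive" assumption is that symptoms are conditionally independent given the diagnosis; that is rarely true clinically, but the classifier is often a useful baseline anyway.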
Reviews re AIM, in historical order:
- Computer Programs to Support Clinical Decision Making – 1987 – Shortliffe
- Coming of Age in AI – 2008 – Patel, Shortliffe
- Thirty Years of AIM: Review of Research Themes – 2015 – AIM – Peek
- Artificial Intelligence in Medicine – 2017 – Hamet
- Artificial Intelligence in Medical Practice: The Question to the Answer? – AJM – 2018 – Miller, Topol
- The Medscape editor Eric Topol's articles about AIM

The image below lists the papers Topol thinks are methodologically good.
What AI has been used for, historically and at present:
- Jeremy Howard – TED Talk: "The wonderful and terrifying implications of computers that can learn". (From his AMA: "I'm Jeremy Howard, Enlitic CEO, Kaggle Past President, Singularity U Faculty. Ask me anything about machine learning, future of medicine, technological unemployment, startups, VC, or programming" – LINK)
- Garry Kasparov – TED Talk: "Don't fear intelligent machines. Work with them"
- Fei-Fei Li – TED Talk: "How we're teaching computers to understand pictures"
2018-12-27
On algorithms, machines, and medicine [Ref] – the Lancet Oncology piece by Coiera on a thyroid cancer detection study.
As we move into a world dominated by algorithms and machine-learned clinical approaches, we must deeply understand the difference between what a machine says and what we must do. Deep learning techniques in particular are transforming our ability to interpret imaging data.
The results of a retrospective preclinical [Ref] study applying deep learning and statistical methods to diagnose thyroid cancer using sonographic images are impressive. When compared with six radiologists on unseen data in an internal validation dataset, the system correctly detected about the same number of cancers as the radiologists. How generalisable are these results? Training only on patients from one health service or region runs the risk of overfitting to the training data, resulting in brittle, degraded performance in other settings. In this study, although similar machine specificity was achieved on populations from different hospitals, sensitivity dropped to 84·3%. One might anticipate the system to have weaker performance in non-Chinese populations. One remedy is to retrain the system on patients from new target populations. The problem of biases in training data is, however, foundational (Ref), and clinicians must always consider if a machine recommendation is based on data from a population different to their patient. For example, in the study, cancer-free images from patients with thyroid cancer were excluded from training. In real-world settings, such images are included, and their presence might distort algorithm performance.
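The sensitivity and specificity figures discussed above come from simple confusion-matrix arithmetic. A minimal sketch, using hypothetical counts chosen only to reproduce an 84·3% sensitivity (the real study's cohort sizes are not given here):

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP/(TP+FN): fraction of true cancers detected.
    Specificity = TN/(TN+FP): fraction of cancer-free cases cleared."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical external-validation counts, for illustration only:
# 843 of 1000 cancers flagged, 860 of 1000 negatives correctly cleared.
sens, spec = sensitivity_specificity(tp=843, fn=157, tn=860, fp=140)
print(f"sensitivity={sens:.1%} specificity={spec:.1%}")
# prints "sensitivity=84.3% specificity=86.0%"
```

The asymmetry matters clinically: a drop in sensitivity means missed cancers (false negatives), while a drop in specificity means unnecessary follow-up for healthy patients.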
The authors make commendable efforts to ensure the results are as clinically meaningful as possible. Image augmentation was used to artificially distort the training data: randomly cropping, scaling, and otherwise distorting images to mimic variations in real-world image quality. Deep learning systems are often criticised because their recommendations come without an explanation, the logic underpinning a diagnosis hidden. In this study, the pixels in an image that most contributed to a diagnosis were highlighted, so a clinician could use these salient parts of an image to help check the computer's interpretation.
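The crop-and-scale augmentation described above can be sketched in a few lines. This is not the study's actual pipeline (which would operate on real sonograms, typically via a library); it is a stdlib-only illustration of the idea: take a random sub-window of the image, then resize it back to the original shape so the model sees a slightly different view of the same case.

```python
import random

def random_crop_resize(image, crop_frac=0.9, seed=None):
    """Randomly crop a crop_frac-sized window from a 2D list-of-lists
    'image', then resize it back to the original shape with
    nearest-neighbour sampling. Illustrates crop/scale augmentation."""
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    ch = max(1, int(h * crop_frac))
    cw = max(1, int(w * crop_frac))
    top = rng.randrange(h - ch + 1)   # random crop position
    left = rng.randrange(w - cw + 1)
    crop = [row[left:left + cw] for row in image[top:top + ch]]
    # Nearest-neighbour resize back to (h, w): map each output pixel
    # to its proportional source pixel inside the crop.
    return [
        [crop[int(i * ch / h)][int(j * cw / w)] for j in range(w)]
        for i in range(h)
    ]

# Toy 10x10 "image" of pixel intensities.
img = [[r * 10 + c for c in range(10)] for r in range(10)]
aug = random_crop_resize(img, crop_frac=0.8, seed=0)
```

Each call with a different seed yields a different crop, so one labelled image becomes many slightly varied training examples without changing its label.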
Coiera states that 'Decision support must be embedded in a clinical workflow and is but one part of a web of actions and decisions that lead to patients' care. In the case of thyroid cancer, ultrasound is one step in a sequence that can lead to biopsy and treatment. In view of concerns that thyroid cancer is both overdiagnosed and overtreated, improved ultrasound detection might deliver little benefit in terms of patients' outcomes. For example, South Korea has seen a 15-fold increase in thyroid cancer, attributable largely to overdiagnosis, and any diagnostic method that detects more indolent than consequential disease would most likely exacerbate this situation. Certainly, precise automated identification of true negative sonograms might improve a clinician's confidence to do nothing. For this reason, rather than only comparing human to machine, it is more clinically meaningful to measure the performance of human beings assisted by machine. Such measurements must ultimately take place in clinical trials, recording false-negative identifications and undertreatment as well as overtreatment. Indeed, there is a case that the most pressing decision-support need in thyroid cancer is not in diagnosis but in making the decision to treat.'
Thus, excellence in algorithmic performance is essential in our quest for automation, but ultimately we are interested in what a human being decides when using automation in the messy reality of health care. Until our machines are fully embedded in that reality, and see it better than us, our role as clinicians is to be the bridge between machine and decision. At least for now, algorithms do not treat patients; health systems do.
Published Online December 21, 2018 http://dx.doi.org/10.1016/S1470-2045(18)30835-0
See Online/Articles http://dx.doi.org/10.1016/S1470-2045(18)30762-9