Artificial Intelligence fails qualifying exam for radiologists — not ready for major surgery

LONDON — Artificial Intelligence (AI) is not ready to take over for doctors and perform major surgery or examinations, a new study reveals. Researchers in the United Kingdom report that an AI program failed a major radiology test that serves as a qualifying benchmark for medical trainees.

AI is increasingly being used for some tasks that doctors carry out, such as interpreting radiographs (x-rays) and scans to help diagnose a range of conditions. However, the program in this study was unable to pass one of the qualifying radiology examinations, suggesting the technology is not yet ready to replace human medics in more serious situations.

Researchers compared the performance of a commercially available AI tool with that of 26 radiologists, mostly between 31 and 40 years old. Sixty-two percent of the human participants were female. All of the human candidates had passed the Fellowship of the Royal College of Radiologists (FRCR) exam the previous year. The test is taken by U.K. trainees in order to qualify as radiology consultants.

What was on the test?

Study authors developed 10 “mock” rapid reporting exams, based on one of the three modules that make up the qualifying FRCR paper and designed to test candidates for speed and accuracy. Each mock exam featured 30 radiographs at a level of difficulty and breadth of knowledge equal to or greater than that expected for the real FRCR exam.

To pass, candidates had to correctly interpret at least 27 (90%) of the 30 images within 35 minutes. Researchers trained the AI candidate to assess chest and bone (musculoskeletal) radiographs for several conditions including fractures, swollen and dislocated joints, and collapsed lungs.

Allowances were made for images relating to body parts that the AI had not been trained on, which were deemed “uninterpretable.” When uninterpretable images were excluded from the analysis, the AI achieved an average overall accuracy of 79.5 percent and passed two of the 10 mock FRCR exams. Meanwhile, the average radiologist achieved an average accuracy of 84.8 percent and passed four of the 10 mock examinations.
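To illustrate how that dispensation matters, here is a minimal sketch (not from the study, and using hypothetical values) of how a 30-image mock exam might be scored both strictly and with "uninterpretable" images excluded:

```python
# Illustrative sketch only: all values and names below are hypothetical,
# not the study's data. Shows how excluding "uninterpretable" images
# can turn a failing score into a passing one under a 90% pass mark.

def exam_score(results, exclude_uninterpretable=False):
    """results: list of 'correct', 'incorrect', or 'uninterpretable' per image."""
    if exclude_uninterpretable:
        results = [r for r in results if r != "uninterpretable"]
    correct = sum(r == "correct" for r in results)
    accuracy = correct / len(results)
    return round(accuracy, 3), accuracy >= 0.90  # FRCR-style 90% pass threshold

# Hypothetical 30-image mock exam: 3 images the AI was never trained on
mock_exam = ["correct"] * 25 + ["incorrect"] * 2 + ["uninterpretable"] * 3

print(exam_score(mock_exam))                                # (0.833, False) - strict marking
print(exam_score(mock_exam, exclude_uninterpretable=True))  # (0.926, True)  - with dispensation
```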

The sensitivity — or the ability to correctly identify patients with a particular condition — for the AI candidate was 83.6 percent, compared with 84.1 percent for the radiologists tested. The specificity — or the ability to correctly pick out patients without a certain illness — was 75.2 percent for the AI and 87.3 percent across all the humans who took the exams.
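For readers unfamiliar with the two metrics, these are the standard definitions; the brief sketch below uses hypothetical counts, not figures from the study:

```python
# Standard definitions of the metrics quoted above (illustrative counts only).

def sensitivity(tp, fn):
    # Share of radiographs WITH an abnormality that were correctly flagged as abnormal
    return tp / (tp + fn)

def specificity(tn, fp):
    # Share of radiographs WITHOUT an abnormality that were correctly read as normal
    return tn / (tn + fp)

print(sensitivity(tp=84, fn=16))  # 0.84 -> 84 percent sensitivity
print(specificity(tn=75, fp=25))  # 0.75 -> 75 percent specificity
```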

Of the 148 out of 300 radiographs that were correctly interpreted by more than 90 percent of radiologists, the AI candidate was correct on 134 (91%) and incorrect on the remaining 14 (9%). Of the 20 out of 300 radiographs that over half of the radiologists interpreted incorrectly, the AI candidate was incorrect on 10 (50%) and correct on the remaining half.

Doctors overestimate AI’s ability to help them

Scientists found that radiologists slightly overestimated the likely performance of the AI, assuming it would perform almost as well as themselves on average and outperform them in at least three of the 10 mock exams. This was not the case, the researchers report.

“On this occasion, the artificial intelligence candidate was unable to pass any of the 10 mock examinations when marked against similarly strict criteria to its human counterparts, but it could pass two of the mock examinations if special dispensation was made by the RCR to exclude images that it had not been trained on,” researchers write in a media release.

More training and revision were “strongly recommended” by the researchers, particularly for cases the AI considered “non-interpretable,” such as abdominal radiographs and those of the axial skeleton — the bones of the head and trunk of a vertebrate.

AI may help ease medics’ workflows, but human input is still crucial at this stage of the technology, scientists say. Researchers add that using artificial intelligence “has untapped potential to further facilitate efficiency and diagnostic accuracy to meet an array of healthcare demands.”

However, the experts say that doing so appropriately “implies educating physicians and the public better about the limitations of artificial intelligence and making these more transparent.”

The research in the AI field is “buzzing,” the team explains, and this study reveals that one major aspect of radiology — passing the FRCR exam necessary for the license to practice — still benefits from a human touch. Study authors note that these are observational findings and that the study only looked at one AI tool, limiting the scope of the results.

The team also only used mock exams that were not timed or supervised, so radiologists may not have felt as much pressure to do their best as they would in a real exam, the experts add. However, this study is one of the more comprehensive cross-comparisons between radiologists and AI, giving a wide range of scores and results to be analyzed.

The study is published in the Christmas issue of The BMJ.

South West News Service writer Chris Dyer contributed to this report.