Report Generation from X-Ray imaging by Retrieval-Augmented Generation and improved Image-Text Matching
Bernardi M. L.
2024-01-01
Abstract
Creating radiology reports is a vital but time-intensive task that involves analyzing images, consulting documents, and evaluating data. This process, heavily reliant on human effort, is prone to errors that can vary with the radiologist's experience. Consequently, automating the generation of radiology reports is a key research goal due to its potential impact on medical procedures and patient care. This work proposes a multimodal approach specifically designed for generating radiological reports from chest X-rays (CXRs). Our method integrates a LLaMA large language model with Retrieval-Augmented Generation (RAG), enhanced by a modified ALBEF embedding model (called EALBEF) that exploits efficient organ semantic segmentation and a triple contrastive loss. The combination of these two components enables radiological report generation that surpasses current state-of-the-art methods in quality and accuracy. Our approach demonstrates a significant improvement in radiologist-specific metrics (e.g., RadCliQ), as well as across various generic lexical metrics (e.g., GLEU). Quantitative analyses of the model's outputs reveal a notable increase in fluency and accuracy, with a marked reduction in issues such as hallucinations and source-reference divergences in the generated reports.
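To make the retrieve-then-generate pipeline described in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a store of precomputed EALBEF-style image-text embeddings, uses cosine similarity for retrieval, and represents the LLaMA generation step only by the assembled grounding prompt; all names here (REPORT_DB, retrieve_reports, build_prompt) are hypothetical placeholders.

import numpy as np

# Hypothetical corpus: precomputed embeddings of past reports paired with
# their text. In practice these would come from the trained image-text
# embedding model rather than random vectors.
REPORT_DB = [
    (np.random.rand(256), "No acute cardiopulmonary abnormality."),
    (np.random.rand(256), "Mild cardiomegaly without pulmonary edema."),
]

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Standard cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve_reports(image_embedding: np.ndarray, k: int = 2) -> list[str]:
    """Rank stored reports by similarity to the CXR embedding; return top-k."""
    scored = sorted(
        REPORT_DB,
        key=lambda pair: cosine_similarity(image_embedding, pair[0]),
        reverse=True,
    )
    return [text for _, text in scored[:k]]

def build_prompt(retrieved: list[str]) -> str:
    """Assemble retrieved reports into a grounding context for the LLM."""
    context = "\n".join(f"- {r}" for r in retrieved)
    return (
        "Using the following similar reports as reference:\n"
        f"{context}\n"
        "Write a radiology report for the current chest X-ray."
    )

if __name__ == "__main__":
    query_embedding = np.random.rand(256)  # stand-in for the CXR embedding
    prompt = build_prompt(retrieve_reports(query_embedding))
    print(prompt)  # this prompt would then be passed to the LLaMA generator

In this sketch the retrieval quality is entirely determined by the embedding model, which is why the abstract's improvements to ALBEF (organ semantic segmentation and a triple contrastive loss) matter: better image-text alignment yields more relevant retrieved reports, and therefore better-grounded generation.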