Optimizing breast lesions diagnosis and decision-making with a deep learning fusion model integrating ultrasound and mammography: a dual-center retrospective study
Breast Cancer Research volume 27, Article number: 80 (2025)
Abstract
Background
This study aimed to develop a BI-RADS network (DL-UM) via integrating ultrasound (US) and mammography (MG) images and explore its performance in improving breast lesion diagnosis and management when collaborating with radiologists, particularly in cases with discordant US and MG Breast Imaging Reporting and Data System (BI-RADS) classifications.
Methods
We retrospectively collected image data from 1283 women with breast lesions who underwent both US and MG within one month at two medical centres and categorised them into concordant and discordant BI-RADS classification subgroups. We developed a DL-UM network via integrating US and MG images, and DL networks using US (DL-U) or MG (DL-M) alone, respectively. The performance of DL-UM network for breast lesion diagnosis was evaluated using ROC curves and compared to DL-U and DL-M networks in the external testing dataset. The diagnostic performance of radiologists with different levels of experience under the assistance of DL-UM network was also evaluated.
Results
In the external testing dataset, DL-UM outperformed DL-M in sensitivity (0.962 vs. 0.833, P = 0.016) and DL-U in specificity (0.667 vs. 0.526, P = 0.030), respectively. In the discordant BI-RADS classification subgroup, DL-UM achieved an AUC of 0.910. The diagnostic performance of four radiologists improved when collaborating with the DL-UM network, with AUCs increasing from 0.674–0.772 to 0.889–0.910 and specificities from 52.1–75.0% to 81.3–87.5%, and unnecessary biopsies reduced by 16.1–24.6%, particularly for junior radiologists. Meanwhile, DL-UM outputs and heatmaps enhanced radiologists’ trust and improved interobserver agreement between US and MG, with the weighted kappa increasing from 0.048 to 0.713 (P < 0.05).
Conclusions
The DL-UM network, integrating complementary US and MG features, assisted radiologists in improving breast lesion diagnosis and management, potentially reducing unnecessary biopsies.
Background
Mammography (MG) is recommended for breast cancer screening, but its sensitivity is limited in women with dense breasts [1]. Ultrasound (US), as a supplementary screening tool for dense breasts, however, falls short in detecting microcalcifications, a crucial indicator of early breast cancer [2]. MRI, although effective in detecting early breast cancer [3], is currently only recommended for high-risk women due to its high cost and lengthy scans [4]. Therefore, combining US and MG could potentially mitigate these limitations and improve cancer detection, particularly for women with dense breasts [5].
However, discordances of Breast Imaging Reporting and Data System (BI-RADS) classifications between US and MG are inevitable, potentially causing unnecessary anxiety and biopsies [6]. Previous attempts using shear wave elastography and contrast-enhanced US to improve diagnosis in discordant BI-RADS cases [7, 8] have encountered controversy due to high operator variability and discrepancies in diagnostic criteria. A recent study [6] proposed a nomogram integrating visual analysis of US and MG, but it relies on subjective radiological observation, posing challenges for less experienced radiologists. Therefore, developing a robust and objective method for optimizing the diagnosis and management of breast lesions with discordant US and MG classifications is imperative.
Deep learning (DL), which enables automatic analysis of medical images, has shown promise in breast cancer detection and management [9, 10]. Emerging evidence [11,12,13,14] suggests that extracting multimodal radiomics features through DL approaches could overcome unimodal imaging limitations, offering comprehensive and complementary diagnostic insights. Numerous studies [15, 16] indicated that integrating artificial intelligence (AI) with radiologists could improve diagnostic accuracy, especially for junior radiologists, bolstering the role of AI in clinical decision-making. However, how to effectively integrate DL models and radiologists of varying experience levels in cases of discordant MG and US BI-RADS classifications remains unclear. Moreover, radiologists’ perceptions of DL outputs may raise uncertainty regarding its clinical applicability, warranting further investigation.
Hence, we aimed to develop a DL network integrating US and MG images (DL-UM) and investigate its performance in improving breast lesion diagnosis and management when collaborating with radiologists, particularly in cases of discordant US and MG BI-RADS classifications. Additionally, we explored the potential of DL-UM outputs and heatmaps to foster trust among radiologists with varying experience in a simulated clinical workflow.
Methods
Statement of ethics
This study was approved by the Institutional Ethics Committee of the hospital (NFEC-202012-K8) and the requirement for informed consent was waived owing to the retrospective design and use of anonymized data.
Study population
Women undergoing breast imaging were consecutively and retrospectively collected from Medical Centre 1 between June 2019 and June 2021 to form the development dataset for establishing the DL networks. An external testing dataset was consecutively collected from Medical Centre 2 between January 2021 and June 2021. Figure 1 shows the patient selection flowchart.
Inclusion criteria were as follows: (1) women aged > 18 years with paired MG and US imaging conducted within 1 month; (2) consistent targeted lesion findings on both MG and US; and (3) in cases of multiple lesions, inclusion of only the most suspicious lesion in the imaging data. Exclusion criteria were as follows: (1) no pathological results; (2) incomplete or low-quality imaging data; (3) radiotherapy or neoadjuvant chemotherapy before examination; (4) incomplete BI-RADS assessment (to exclude BI-RADS 0 cases); and (5) previous biopsy or surgery before examination (to exclude BI-RADS 6 cases).
After selection, 1126 patients from Medical Centre 1 were included for analysis and randomised into the training, validation, and internal testing cohorts in a 7:1:2 allocation ratio. The external testing cohort comprised 157 patients from Medical Centre 2.
Imaging acquisition and interpretation
MG images were acquired using Mammomat Novation DR (Siemens AG Medical Solutions, Erlangen, Germany) and Selenia Dimensions (Hologic, Bedford, Mass, USA) digital systems, encompassing craniocaudal and mediolateral-oblique views. US images were obtained using different devices, including Aixplorer (SuperSonic Imagine, Aix-en-Provence, France) and Logiq E9 (GE Healthcare, Wauwatosa, WI, USA) systems, as well as other systems with 7.5–15 MHz linear high-frequency transducers. Two-directional (transverse and longitudinal) static images were recorded, focusing on the region of interest in each patient’s image data that exhibited the most suspicious lesion. For reliable and reproducible BI-RADS classifications, six senior radiologists (R1–R3 with ≥ 5 years of experience in breast US, R4–R6 with ≥ 8 years of experience in MG) independently reviewed all images. If results diverged, the radiologists resolved discrepancies through discussion to reach a consensus for the final diagnosis. The radiologists were blinded to the pathological results but had access to clinical information and prior imaging.
According to the 2013 American College of Radiology BI-RADS criteria, lesions rated as 2 or 3 were considered benign or probably benign, while those classified as 4 or 5 were considered suspicious, warranting tissue diagnosis. A discordant BI-RADS classification between US and MG was defined when a lesion was classified as 4 or 5 on one modality but as 2 or 3 on the other. Based on these standards, all lesions were categorised into subgroups with discordant or concordant BI-RADS classifications.
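The dichotomisation rule above can be expressed as a small predicate (an illustrative sketch, not code from the study):

```python
def is_discordant(us_birads: int, mg_birads: int) -> bool:
    """A lesion is discordant when one modality rates it suspicious
    (BI-RADS 4 or 5) while the other rates it benign or probably
    benign (BI-RADS 2 or 3)."""
    suspicious = {4, 5}
    benign = {2, 3}
    return (us_birads in suspicious and mg_birads in benign) or \
           (us_birads in benign and mg_birads in suspicious)
```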
Data pre-processing
Data pre-processing involved cropping irrelevant regions in US and MG images to minimise their negative effects on network performance, conducted by experienced radiologists (R1–R6) using ITK-SNAP software (http://www.itksnap.org). To account for the diagnostic significance of adjacent tissue, regions of interest (ROIs) in both MG and US images were expanded by 30% of their shortest side lengths. All images were then resized to a standard size (224 × 224 pixels), and intensity values were normalised to the 0–1 range via min–max scaling.
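The pre-processing steps can be sketched roughly as follows; the nearest-neighbour resize and the exact clipping behaviour at image borders are our assumptions, since the study does not specify its implementation:

```python
import numpy as np

def expand_roi(x0, y0, x1, y1, img_w, img_h, frac=0.30):
    """Expand an ROI box by `frac` of its shortest side on every edge,
    clipped to the image bounds (the per-edge margin is an assumption)."""
    margin = int(round(frac * min(x1 - x0, y1 - y0)))
    return (max(0, x0 - margin), max(0, y0 - margin),
            min(img_w, x1 + margin), min(img_h, y1 + margin))

def preprocess(patch, size=224):
    """Nearest-neighbour resize to size x size, then min-max
    normalise intensities to the [0, 1] range."""
    h, w = patch.shape[:2]
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    resized = patch[rows][:, cols].astype(np.float32)
    lo, hi = resized.min(), resized.max()
    return (resized - lo) / (hi - lo + 1e-8)
```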
Model architecture
Figure 2 depicts the study design and the architecture of the DL-UM network. The network included two feature extraction branches, one each for US and MG images, with a shared feature identification scheme developed using VGG19 [17]. Inputs for each feature extraction branch were 224 × 224-pixel paired patches from US and MG images after lesion segmentation and image preprocessing. Each branch comprised five convolution blocks with 2, 2, 4, 4, and 4 convolution layers and 64, 128, 256, 512, and 512 filters with a 3 × 3 kernel size, respectively, for efficiently extracting and propagating coarse-to-fine representations. To prevent gradient vanishing and enhance network sparsity, the last convolutional layer of each convolution block was followed by batch normalization and a Rectified Linear Unit (ReLU) activation. For the first four blocks, the activated features passed through a maximum pooling layer with a 2 × 2 pooling window to reduce feature dimensions and mitigate overfitting. Meanwhile, they were also fed into an additional classification head, comprising a global average pooling (GAP) layer, two fully connected layers with 64 and 1 neurons, and a sigmoid function, to encourage the network to capture more discriminative information via a deep supervision strategy [18]. In the final convolution block, the maximum pooling layer was removed and features from the GAP layer were used for subsequent supervision and final classification. A concatenation operation at the end of the network received and integrated the final representations derived from the US and MG branches. Finally, the integrated feature representations were used to perform breast tumour classification via the same classification head. A focal loss [19] was used as the supervision function to focus the network’s attention on hard-to-classify samples during training.
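A minimal structural sketch of the two-branch network, assuming the TensorFlow/Keras stack the study reports; the deep-supervision heads attached to the intermediate blocks are omitted for brevity, and all layer names and the single-channel input are illustrative assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# VGG19-style block configuration: (number of conv layers, filters)
BLOCKS = [(2, 64), (2, 128), (4, 256), (4, 512), (4, 512)]

def feature_branch(name):
    """One modality branch: five conv blocks, each followed by BN + ReLU;
    2 x 2 max pooling after the first four blocks, GAP after the last."""
    inp = layers.Input(shape=(224, 224, 1), name=f"{name}_input")
    x = inp
    for i, (n_convs, filters) in enumerate(BLOCKS):
        for j in range(n_convs):
            x = layers.Conv2D(filters, 3, padding="same",
                              name=f"{name}_b{i+1}_conv{j+1}")(x)
        x = layers.BatchNormalization(name=f"{name}_b{i+1}_bn")(x)
        x = layers.ReLU(name=f"{name}_b{i+1}_relu")(x)
        if i < 4:  # pooling is removed in the final block
            x = layers.MaxPooling2D(2, name=f"{name}_b{i+1}_pool")(x)
    x = layers.GlobalAveragePooling2D(name=f"{name}_gap")(x)
    return inp, x

def classification_head(x, name):
    """Two fully connected layers (64 and 1 neurons) with sigmoid output."""
    x = layers.Dense(64, activation="relu", name=f"{name}_fc1")(x)
    return layers.Dense(1, activation="sigmoid", name=f"{name}_out")(x)

us_in, us_feat = feature_branch("us")
mg_in, mg_feat = feature_branch("mg")
fused = layers.Concatenate(name="fusion")([us_feat, mg_feat])
output = classification_head(fused, "fused")
model = Model([us_in, mg_in], output)
```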
Meanwhile, a mean absolute error (MAE) loss was applied to force the final predictions of the US and MG branches to be consistent. During network training, the Adam optimizer was used with a global learning rate of 1 × 10⁻⁴, a momentum of 0.9, and a batch size of 16. After every epoch, model parameters were saved, and the model with the lowest average loss on the validation set was chosen for evaluation on the test dataset. The DL-UM models, developed in Python (3.6.13; Python Software Foundation, Wilmington, DE, USA), were trained and evaluated using five-fold cross-validation, with samples from different classes randomly partitioned by patient. The DL models were trained on a server with eight 12 GB NVIDIA GeForce RTX 2080 Ti GPUs, an Intel Xeon E5-2650 v4 CPU @ 2.20 GHz, and 192 GB RAM. Key Python packages and versions include TensorFlow (v2.1.0), Keras (v2.3.1), NumPy (v1.19.2), Pandas (v1.1.5), scikit-learn (v0.20.3), Matplotlib (v3.3.4), and Pillow (v8.4.0).
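The two training losses can be sketched in NumPy as follows; this is a hedged illustration, and since the study does not report its focal-loss hyperparameters, the α = 0.25 and γ = 2 defaults from Lin et al. [19] are assumed:

```python
import numpy as np

def focal_loss(y_true, y_pred, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss: down-weights easy examples so training
    concentrates on hard-to-classify lesions."""
    y_true = np.asarray(y_true)
    p = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    p_t = np.where(y_true == 1, p, 1 - p)          # prob of the true class
    a_t = np.where(y_true == 1, alpha, 1 - alpha)  # class balancing weight
    return float(np.mean(-a_t * (1 - p_t) ** gamma * np.log(p_t)))

def consistency_loss(p_us, p_mg):
    """MAE between the US- and MG-branch predictions, encouraging
    the two branches to agree on each lesion."""
    return float(np.mean(np.abs(np.asarray(p_us) - np.asarray(p_mg))))
```

A confidently correct easy prediction incurs a much smaller focal loss than a confidently wrong one, which is the mechanism that steers the network toward difficult samples.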
Design of the study. A. Patients underwent paired ultrasonic and mammographic imaging within 1 month and were dichotomised into subgroups with concordant and discordant ultrasonic and mammographic BI-RADS classifications. B. Summary of development and external testing datasets from two medical centres. C. Architecture of the deep learning (DL-UM) network. The DL-UM network includes two feature extraction branches and a classification head. Each branch comprises five convolution blocks (i.e., \(\{CB_i\}_{i=1}^{5}\)) with 2, 2, 4, 4, and 4 convolution layers and 64, 128, 256, 512, and 512 filters with a 3 × 3 kernel size, respectively. A focal loss function is used for deep supervision and classification loss, with a mean absolute error loss function used to force the final outputs of the two branches to be consistent. * In the VGG19 model used in this study, Block 2, Block 4, and Block 5 share the same architectural structure as Block 3, while Block 1 and Block 3 represent two distinct structural designs. D. Diagnostic performances of the three DL models were compared with ROC curves. E. Overview of the DL-UM-assisted radiologist workflow. In the first step, four radiologists independently reviewed and analysed US and MG images, dichotomising each lesion as “possibly benign” or “possibly malignant”. After two months, the radiologists re-analysed the US and MG images in random order and referred to the DL-UM outputs (“possibly benign” or “possibly malignant”). Meanwhile, radiologists could accept or reject DL-UM suggestions or request AI explanations with heatmaps before making the final diagnosis.
Network interpretability
The Class Activation Map (CAM) [20] technique visualizes the image regions a DL model focused on during classification, pinpointing areas that significantly influence decision-making. CAMs help illustrate which image areas contribute most to the model’s decisions, supporting radiologists in evaluating and interpreting the model’s performance. In our model, CAMs were generated from the last convolutional layer in each block to highlight areas of network focus. The GAP layer produced a feature vector representing the average significance of each feature map; the corresponding classification weights were then applied to the feature maps for visualization. The resulting heatmap effectively emphasized the critical tumour regions identified by the network.
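The CAM computation can be sketched as follows (an illustrative NumPy version; the variable shapes are assumptions, not the study’s code):

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights):
    """CAM (Zhou et al.): weight each final-layer feature map by the
    classification weight for the target class, sum over channels,
    keep positive evidence, and rescale to [0, 1] for display.
    feature_maps: (H, W, C) activations from the last conv layer.
    fc_weights:   (C,) classification-head weights for the target class."""
    cam = np.tensordot(feature_maps, fc_weights, axes=([2], [0]))  # (H, W)
    cam = np.maximum(cam, 0)      # ReLU: keep only positive contributions
    cam -= cam.min()
    if cam.max() > 0:
        cam /= cam.max()          # normalise for overlay as a heatmap
    return cam
```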
DL-UM-assisted radiologist
Breast images from Medical Centre 2 were independently evaluated by two junior radiologists (3 years of experience in breast US [R7] and MG [R8]) and two senior radiologists (8 years of experience in breast US [R9] and 10 years of experience in MG [R10]). After analysing the US and MG images independently, the four radiologists dichotomised all lesions as “possibly benign” or “possibly malignant”. After two months, the radiologists (R7–R10) re-analysed the US and MG images in random order and referred to the DL-UM outputs (based on both US and MG images) for a dichotomous classification diagnosis. Meanwhile, radiologists could accept or reject DL-UM suggestions or request AI explanations with heatmaps (for both US and MG images) (see Fig. 2E). Additionally, before and after reviewing the DL-UM outputs, each radiologist had access to clinical information and both US and MG images for each patient but remained blinded to pathological results. To evaluate the contribution of DL-UM-assisted radiologists to decision-making in a simulated clinical setting, we quantified the potential reduction in recommended biopsies and unnecessary biopsies based on DL-UM outputs and radiologist interpretations. Following previous studies [21, 22], recommended biopsies included all cases predicted as malignant, while unnecessary biopsies were defined as cases predicted as malignant but pathologically confirmed as benign. Meanwhile, missed malignancies were defined as cases predicted as benign but pathologically confirmed as malignant.
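The biopsy-related counts defined above can be computed with a small helper (illustrative only; inputs are binary predicted and pathological malignancy labels):

```python
def biopsy_metrics(pred_malignant, path_malignant):
    """Recommended biopsies: all lesions predicted malignant.
    Unnecessary biopsies: predicted malignant, pathologically benign.
    Missed malignancies: predicted benign, pathologically malignant."""
    recommended = unnecessary = missed = 0
    for pred, path in zip(pred_malignant, path_malignant):
        if pred:
            recommended += 1
            if not path:
                unnecessary += 1
        elif path:
            missed += 1
    return recommended, unnecessary, missed
```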
Statistical analysis
Continuous variables were analysed using independent t-tests, while categorical variables were compared using chi-squared (χ²) or Fisher’s exact tests. Predictive performances of the three DL networks and the radiologists were assessed with receiver operating characteristic (ROC) curve analyses. Diagnostic accuracy was compared using DeLong’s test, and sensitivity and specificity were compared with the chi-squared test. Interobserver agreement was analysed using weighted kappa values. Statistical analyses were performed using SPSS Statistics (version 24.0) and R software (version 3.3.0), with a two-sided P-value < 0.05 considered significant.
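For the binary benign/malignant calls used in this workflow, the weighted kappa reduces to Cohen’s kappa, which can be sketched as follows (an illustration, not the study’s SPSS/R implementation):

```python
import numpy as np

def cohens_kappa(r1, r2):
    """Chance-corrected agreement between two raters. With only two
    categories, weighted and unweighted kappa coincide."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    cats = np.union1d(r1, r2)
    p_observed = np.mean(r1 == r2)                 # raw agreement
    p_expected = sum(np.mean(r1 == c) * np.mean(r2 == c) for c in cats)
    return (p_observed - p_expected) / (1 - p_expected)
```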
Results
Patient demographics
Overall, we included 1126 patients from Medical Centre 1 (mean age, 47.1 ± 10.0 years; 573 benign and 553 malignant lesions) and 157 patients from Medical Centre 2 (mean age, 51.9 ± 11.1 years; 79 benign and 78 malignant lesions). Demographics are summarised in Table 1. Original US and MG BI-RADS categories with histopathologic results are presented in Supplementary Table 1.
Diagnostic performance of individual and fusion models
In the external testing dataset (Fig. 3), DL-UM exhibited significant superiority over DL-U in specificity (0.667 [95% CI, 0.624–0.709] vs. 0.526 [95% CI, 0.487–0.564], P = 0.030) and over DL-M in sensitivity (0.962 [95% CI, 0.945–0.978] vs. 0.833 [95% CI, 0.802–0.865], P = 0.016). This difference was particularly notable in the discordant classification subgroup (Fig. 3 and Supplementary Table 2). Results for development and external test datasets are found in Supplementary Tables 3–8.
Diagnostic performance and management improvement through DL-UM-assisted radiologists
In the external testing dataset, the diagnostic performance of radiologists with DL-UM assistance improved significantly compared to radiologists alone, with area under the ROC curve (AUC) values increasing from 0.734–0.835 to 0.898–0.918 (all P < 0.05) and specificities from 57.0–76.0% to 84.8–86.1% (all P < 0.05) (Table 2; Fig. 4). Meanwhile, DL-UM-assisted radiologists achieved a significant reduction of 8.3–18.7% in unnecessary biopsies, regardless of radiologists’ experience (all P < 0.05) (Table 2). Such improvements were more pronounced in the subgroup of discordant classification cases, with AUCs increasing from 0.674–0.772 to 0.889–0.910 (all P < 0.05), specificities from 52.1–75.0% to 81.3–87.5% (all P < 0.05), and 16.1–24.6% of cases avoiding unnecessary biopsies (all P < 0.001).
The diagnostic performance of radiologists with and without the assistance of DL-UM. a: all cases combined; b: subgroup of concordant cases; c: subgroup of discordant cases. R7: junior radiologist in breast ultrasound; R8: junior radiologist in breast mammography; R9: senior radiologist in breast ultrasound; R10: senior radiologist in breast mammography
Clinical implications of radiologists’ trust for DL-UM diagnostic support
During the DL-UM-assisted workflows, radiologists showed positive acceptance of DL-UM outputs in 73.1% of cases (Supplementary Table 9 and Fig. 5). Within the discordant subgroup, however, there was a notable increase in the demand for explanations (26.0% of cases) (Table 3). Meanwhile, DL-UM explanations significantly reduced unnecessary biopsies by 19.2% and missed malignancies by 10.8%. Additionally, compared to senior radiologists, the use of heatmaps allowed junior radiologists to achieve a greater reduction in unnecessary biopsies (by 19.6%) and missed diagnoses (by 11.5%) (Table 3).
Interobserver agreement of radiologists with and without the assistance of DL-UM
Without DL-UM, agreement between US and MG was significantly lower in the discordant subgroup (kappa = 0.048) than in the concordant subgroup (kappa = 0.618, P < 0.05). With DL-UM assistance, interobserver agreement in the discordant classification subgroup improved significantly, with the weighted kappa increasing from 0.048 to 0.713 (P < 0.05). Moreover, observer agreement between R7 and R9 and between R8 and R10 significantly increased to 0.739 and 0.687, respectively (both P < 0.05) (Fig. 6; Table 4).
Discussion
In this study, the bimodal DL-UM network, integrating US and MG complementary features, significantly improved specificity compared to DL-U and sensitivity compared to DL-M, particularly in the discordant classification subgroup. With the aid of DL-UM, radiologists’ diagnostic accuracy and specificity were significantly enhanced, resulting in a notable reduction in unnecessary biopsies, especially for junior radiologists, and improving consistency between US and MG. The potential of DL-UM in building radiologists’ trust in AI was further emphasised, with heatmaps aiding in preventing unnecessary biopsies and missed malignancies. These findings highlight the value of DL-UM as a complementary tool to assist radiologists in optimizing breast lesion diagnosis and management.
In alignment with prior DL studies on MG [23,24,25], this study found that DL-M demonstrated high specificity but reduced sensitivity, which may be attributed to the obscuring effect of dense breast parenchyma in two-dimensional MG imaging [1]. Conversely, as noted in previous AI studies [11, 26, 27], DL-U showed high sensitivity but low specificity, which may result from overlapping ultrasonic features between benign and malignant lesions, leading to potential misdiagnoses [28]. While DL-UM did not significantly exceed DL-M in specificity or DL-U in sensitivity, it simultaneously maintained a sensitivity comparable to DL-U and a specificity comparable to DL-M. This result underscored DL-UM’s capability to effectively integrate the complementary diagnostic information of the two modalities, mitigating their individual limitations. Such integration enhanced the performance of DL-UM, yielding a higher AUC in breast cancer detection and improved adaptability across complex clinical scenarios.
Most previously reported multimodal radiomics studies [29, 30] have shown promise in breast cancer diagnosis but often stopped at merging outputs from individual modalities [25, 31,32,33], neglecting valuable complementary diagnostic information across modalities. By contrast, DL-UM comprehensively extracted features across multiple modalities, incorporating both intramodal features from the same imaging data and intermodal features from different imaging types. This approach focused on integrating diagnostic features from both US and MG, recognising their complementary and correlated diagnostic roles [34]. When applied to complex scenarios such as discordant MG and US BI-RADS classifications, DL-UM was optimised through focal loss supervision for difficult samples and by minimising prediction disparities between the US and MG classifiers with the MAE loss. Our results underscored the effectiveness of DL-UM, particularly in cases with discordant MG and US classifications.
In simulated clinical workflows without the involvement of DL-UM, even experienced radiologists exhibited reduced diagnostic accuracy in cases of discordant BI-RADS classification. However, the incorporation of DL-UM significantly enhanced diagnostic performance, particularly benefiting less experienced radiologists, who achieved results comparable to their senior counterparts. In this study, 65% (392 of 603) of discordant cases underwent unnecessary biopsies, a concern often associated with high recall and biopsy rates [7, 8]. However, the addition of DL-UM resulted in a notable reduction in unnecessary biopsies. Prior research [35, 36] has reported significant improvements in diagnostic agreement among radiologists with varying levels of experience when AI is introduced, consistent with our findings. Notably, DL-UM assistance improved interobserver consistency between US and MG, especially in cases with divergent MG and US BI-RADS classifications. Such improvement highlighted AI’s potential to support radiologists with diverse backgrounds in breast imaging, reducing subjective bias and addressing uncertainties in areas such as image interpretation, result communication, and treatment decisions.
Understanding radiologists’ trust in AI is crucial for its integration into clinical practice [37]. Overall, radiologists expressed positive feedback regarding DL-UM outputs, which could enhance their confidence in image interpretation and patient management [38], particularly for junior radiologists. However, trust in DL results diminished when radiologists hesitated, particularly when US and MG classifications diverged, leading to a surge in demand for AI explanations. Heatmaps played a vital role in gaining radiologists’ trust in DL-UM by highlighting lesion boundaries, peritumoral areas, and calcifications in both US and MG images, aligning closely with visual diagnoses. Adjusting the initial diagnosis based on heatmaps improved breast cancer detection and reduced unnecessary biopsies. Our findings emphasized the necessity of providing explanations in AI implementation, especially for inconclusive diagnoses or when there is skepticism regarding DL-UM outputs, particularly among less experienced radiologists.
There are still some limitations in this study. First, excluding patients with follow-ups may introduce selection bias. Second, 6.5% (83/1283) of the patients underwent biopsy despite having BI-RADS classifications of 2 or 3 on both MG and US, influenced by factors beyond BI-RADS, such as palpation findings and patient preferences in routine clinical settings. This could potentially affect the practical utility of DL-UM in clinical decision-making. Third, as this study is retrospective and involves only two medical centres, further prospective studies with larger sample sizes from multiple centres are necessary to improve model performance and generalisability. Finally, in this study, ROIs were manually outlined to ensure the consistent targeting of lesions on both US and MG images. However, there are ongoing efforts to develop automated segmentation software for multimodalities to address the requirements of large-scale datasets and integrate them into clinical workflows in the future.
Conclusion
The DL-UM bimodal fusion network, integrating complementary US and MG features, showed good performance for breast lesion diagnosis, particularly in cases of discordant US and MG BI-RADS classification. The DL-UM network showed great potential to support radiologists in breast lesion diagnosis and management, reducing unnecessary biopsies. Following prospective multicentre clinical trials, the DL-UM network may evolve into an advanced software module, seamlessly integrated into clinical practice to aid decision-making and advance precision healthcare.
Data availability
Due to the privacy of patients, the data and materials related to patients cannot be available for public access but can be obtained from the corresponding authors on reasonable request approved by the institutional review board of all enrolled centers.
Abbreviations
- US: Ultrasound
- MG: Mammography
- BI-RADS: Breast Imaging Reporting and Data System
- DL: Deep Learning
- AI: Artificial Intelligence
- GAP: Global Average Pooling
- MAE: Mean Absolute Error
- CAMs: Class Activation Maps
- ROC: Receiver Operating Characteristic
References
Hussein H, Abbas E, Keshavarzi S, Fazelzad R, Bukhanov K, Kulkarni S, Au F, Ghai S, Alabousi A, Freitas V. Supplemental breast cancer screening in women with dense breasts and negative mammography: a systematic review and meta-analysis. Radiology. 2023;306(3).
Kunitake J, Sudilovsky D, Johnson LM, Loh HC, Choi S, Morris PG, Jochelson MS, Iyengar NM, Morrow M, Masic A, et al. Biomineralogical signatures of breast microcalcifications. Sci Adv. 2023;9(8).
Mann RM, Athanasiou A, Baltzer P, Camps-Herrero J, Clauser P, Fallenberg EM, Forrai G, Fuchsjager MH, Helbich TH, Killburn-Toppin F, et al. Breast cancer screening in women with extremely dense breasts: recommendations of the European society of breast imaging (EUSOBI). Eur Radiol. 2022;32(6):4036–45.
Barba D, Leon-Sosa A, Lugo P, Suquillo D, Torres F, Surre F, Trojman L, Caicedo A. Breast cancer, screening and diagnostic tools: all you need to know. Crit Rev Oncol Hematol. 2021;157:103174.
Glechner A, Wagner G, Mitus JW, Teufer B, Klerings I, Bock N, Grillich L, Berzaczy D, Helbich TH, Gartlehner G. Mammography in combination with breast ultrasonography versus mammography for breast cancer screening in women at average risk. Cochrane Database Syst Rev. 2023;3(3).
Xu Z, Lin Y, Huo J, Gao Y, Lu J, Liang Y, Li L, Jiang Z, Du L, Lang T, et al. A bimodal nomogram as an adjunct tool to reduce unnecessary breast biopsy following discordant ultrasonic and mammographic BI-RADS assessment. Eur Radiol. 2024;34(4):2608–18.
Pu H, Zhang XL, Xiang LH, Zhang JL, Xu G, Liu H, Tang GY, Zhao BH, Wu R. The efficacy of added shear wave elastography (SWE) in breast screening for women with inconsistent mammography and conventional ultrasounds (US). Clin Hemorheol Microcirc. 2019;71(1):83–94.
Shao SH, Li CX, Yao MH, Li G, Li X, Wu R. Incorporation of contrast-enhanced ultrasound in the differential diagnosis for breast lesions with inconsistent results on mammography and conventional ultrasound. Clin Hemorheol Microcirc. 2020;74(4):463–73.
Dhar T, Dey N, Borra S, Sherratt RS. Challenges of deep learning in medical image analysis—improving explainability and trust. IEEE Trans Technol Soc. 2023;4(1):68–75.
Yang Y, Guan S, Ou Z, Li W, Yan L, Situ B. Advances in AI-based cancer cytopathology. Interdiscip Med. 2023;1(3).
Yang Y, Zhong Y, Li J, Feng J, Gong C, Yu Y, Hu Y, Gu R, Wang H, Liu F, et al. Deep learning combining mammography and ultrasound images to predict the malignancy of BI-RADS US 4A lesions in women with dense breasts: a diagnostic study. Int J Surg. 2024;110(5):2604–13.
Assari Z, Mahloojifar A, Ahmadinejad N. A bimodal BI-RADS-guided GoogLeNet-based CAD system for solid breast masses discrimination using transfer learning. Comput Biol Med. 2022;142:105160.
Antropova N, Huynh BQ, Giger ML. A deep feature fusion methodology for breast cancer diagnosis demonstrated on three imaging modality datasets. Med Phys. 2017;44(10):5162–71.
Jiang M, Lei S, Zhang J, Hou L, Zhang M, Luo Y. Multimodal imaging of target detection algorithm under artificial intelligence in the diagnosis of early breast cancer. J Healthc Eng. 2022;2022:9322937.
Drozdov I, Dixon R, Szubert B, Dunn J, Green D, Hall N, Shirandami A, Rosas S, Grech R, Puttagunta S, et al. An artificial neural network for nasogastric tube position decision support. Radiol Artif Intell. 2023;5(2).
Yu Q, Ning Y, Wang A, Li S, Gu J, Li Q, Chen X, Lv F, Zhang X, Yue Q, et al. Deep learning-assisted diagnosis of benign and malignant Parotid tumors based on contrast-enhanced CT: a multicenter study. Eur Radiol. 2023;33(9):6054–65.
Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556; 2014.
Lee CY, Xie S, Gallagher P, Zhang Z, Tu Z. Deeply-supervised nets. arXiv preprint arXiv:1409.5185; 2014.
Lin TY, Goyal P, Girshick R, He K, Dollar P. Focal loss for dense object detection. IEEE Trans Pattern Anal Mach Intell. 2020;42(2):318–27.
Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A. Learning deep features for discriminative localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016. p. 2921–9.
Shen Y, Shamout FE, Oliver JR, Witowski J, Kannan K, Park J, Wu N, Huddleston C, Wolfson S, Millet A, et al. Artificial intelligence system reduces false-positive findings in the interpretation of breast ultrasound exams. Nat Commun. 2021;12(1):5645.
Wang SJ, Liu HQ, Yang T, Huang MQ, Zheng BW, Wu T, Qiu C, Han LQ, Ren J. Automated breast volume scanner (ABVS)-based radiomic nomogram: a potential tool for reducing unnecessary biopsies of BI-RADS 4 lesions. Diagnostics. 2022;12(1).
Romero-Martin S, Elias-Cabot E, Raya-Povedano JL, Gubern-Merida A, Rodriguez-Ruiz A, Alvarez-Benito M. Stand-alone use of artificial intelligence for digital mammography and digital breast tomosynthesis screening: a retrospective evaluation. Radiology. 2022;302(3):535–42.
Lauritzen AD, Rodriguez-Ruiz A, von Euler-Chelpin MC, Lynge E, Vejborg I, Nielsen M, Karssemeijer N, Lillholm M. An artificial intelligence-based mammography screening protocol for breast cancer: outcome and radiologist workload. Radiology. 2022;304(1):41–9.
Tan T, Rodriguez-Ruiz A, Zhang T, Xu L, Beets-Tan R, Shen Y, Karssemeijer N, Xu J, Mann RM, Bao L. Multi-modal artificial intelligence for the combination of automated 3D breast ultrasound and mammograms in a population of women with predominantly dense breasts. Insights Imaging. 2023;14(1):10.
Yi M, Lin Y, Lin Z, Xu Z, Li L, Huang R, Huang W, Wang N, Zuo Y, Li N, et al. Biopsy or follow-up: AI improves the clinical strategy of US BI-RADS 4A breast nodules using a convolutional neural network. Clin Breast Cancer. 2024;24(5).
Gu Y, Xu W, Liu T, An X, Tian J, Ran H, Ren W, Chang C, Yuan J, Kang C, et al. Ultrasound-based deep learning in the establishment of a breast lesion risk stratification system: a multicenter study. Eur Radiol. 2023;33(4):2954–64.
Lang M, Liang P, Shen H, Li H, Yang N, Chen B, Chen Y, Ding H, Yang W, Ji X, et al. Head-to-head comparison of perfluorobutane contrast-enhanced US and multiparametric MRI for breast cancer: a prospective, multicenter study. Breast Cancer Res. 2023;25(1):61.
Misra S, Yoon C, Kim KJ, Managuli R, Barr RG, Baek J, Kim C. Deep learning-based multimodal fusion network for segmentation and classification of breast cancers using B-mode and elastography ultrasound images. Bioeng Transl Med. 2023;8(6).
Xu Z, Wang Y, Chen M, Zhang Q. Multi-region radiomics for artificially intelligent diagnosis of breast cancer using multimodal ultrasound. Comput Biol Med. 2022;149:105920.
Jiang T, Song J, Wang X, Niu S, Zhao N, Dong Y, Wang X, Luo Y, Jiang X. Intratumoral and peritumoral analysis of mammography, tomosynthesis, and multiparametric MRI for predicting Ki-67 level in breast cancer: a radiomics-based study. Mol Imaging Biol. 2022;24(4):550–9.
Chen S, Guan X, Shu Z, Li Y, Cao W, Dong F, Zhang M, Shao G, Shao F. A new application of multimodality radiomics improves diagnostic accuracy of nonpalpable breast lesions in patients with microcalcifications-only in mammography. Med Sci Monit. 2019;25:9786–93.
Huang W, Tan K, Zhang Z, Hu J, Dong S. A review of fusion methods for omics and imaging data. IEEE/ACM Trans Comput Biol Bioinform. 2023;20(1):74–93.
Xi X, Li W, Li B, Li D, Tian C, Zhang G. Modality-correlation embedding model for breast tumor diagnosis with mammography and ultrasound images. Comput Biol Med. 2022;150:106130.
Kim HJ, Choi WJ, Gwon HY, Jang SJ, Chae EY, Shin HJ, Cha JH, Kim HH. Improving mammography interpretation for both novice and experienced readers: a comparative study of two commercial artificial intelligence software. Eur Radiol. 2024;34(6):3924–34.
Lopez-Almazan H, Javier PF, Larroza A, Perez-Cortes JC, Pollan M, Perez-Gomez B, Salas TD, Casals M, Llobet R. A deep learning framework to classify breast density with noisy labels regularization. Comput Methods Programs Biomed. 2022;221:106885.
Ho S, Doig GS, Ly A. Attitudes of optometrists towards artificial intelligence for the diagnosis of retinal disease: a cross-sectional mail-out survey. Ophthalmic Physiol Opt. 2022;42(6):1170–9.
Calisto FM, Santiago C, Nunes N, Nascimento JC. BreastScreening-AI: evaluating medical intelligent agents for human-AI interactions. Artif Intell Med. 2022;127:102285.
Acknowledgements
None.
Funding
This research was supported by the National Natural Science Foundation of China (82271998 and 82071949), College Students’ Innovative Entrepreneurial Training Plan Program (202212121022), and Guangzhou Municipal Science and Technology Department: 2023 Key Research and Development Plan Projects (2023B03J1350).
Author information
Authors and Affiliations
Contributions
Conceptualization, data curation, formal analysis, methodology, writing– original draft: Z.X., S.Z. and Y.G.; validation: J.H. and W.X.; resources: W.H. and J.Z.; investigation: C.Z., Q.D., L.L., Z.J., T.Z., S.X., J.L.; writing– review & editing: X.H., G.W., Y.Z. and Y.L.; project administration: G.W., Y.Z. and Y.L.; funding acquisition: Y.L. The authors read and approved the final manuscript.
Corresponding authors
Ethics declarations
Ethics approval and consent to participate
This study was approved by the Institutional Ethics Committee of the hospital (NFEC-202012-K8) and the requirement for informed consent was waived owing to the retrospective design and use of anonymized data.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
Below is the link to the electronic supplementary material.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Xu, Z., Zhong, S., Gao, Y. et al. Optimizing breast lesions diagnosis and decision-making with a deep learning fusion model integrating ultrasound and mammography: a dual-center retrospective study. Breast Cancer Res 27, 80 (2025). https://doi.org/10.1186/s13058-025-02033-6
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s13058-025-02033-6