Materials and Methods
Patients and Samples
Disease | N |
---|---|
Aplastic anemia | 12 |
Acute lymphoblastic leukemia | 89 |
Acute myeloid leukemia | 352 |
Brain tumors | 44 |
Breast cancer | 137 |
Burkitt lymphoma | 10 |
Carcinoma (not otherwise specified) | 32 |
Clear cell renal cell carcinoma | 8 |
Cholangiocarcinoma | 9 |
Chronic lymphocytic leukemia | 167 |
Chronic myeloid leukemia | 46 |
Chronic myelomonocytic leukemia | 97 |
Colorectal carcinoma | 308 |
Diffuse large B-cell lymphoma | 746 |
Endometrial cancer | 113 |
Esophageal carcinoma | 34 |
Follicular lymphoma | 145 |
Gallbladder carcinoma | 4 |
Gastric carcinoma | 10 |
Gastrointestinal stromal tumor | 11 |
Hairy cell leukemia | 5 |
Head and neck tumor | 4 |
Hodgkin lymphoma | 65 |
Lung cancer | 794 |
Lymphoma (not otherwise classified) | 3 |
Mantle cell lymphoma | 93 |
Marginal zone lymphoma | 76 |
Myelodysplastic syndrome | 316 |
Melanoma | 21 |
Multiple myeloma | 113 |
Myeloproliferative neoplasms | 88 |
Neuroendocrine tumor | 5 |
Normal bone marrow, fresh | 782 |
Normal lymph node | 24 |
Ovarian cancer | 126 |
Pancreatic cancer | 96 |
Prostate cancer | 36 |
Sarcoma | 137 |
Squamous cell carcinoma of skin | 15 |
T-cell acute lymphoblastic leukemia | 7 |
T-cell lymphoma | 145 |
Thyroid cancer | 24 |
Upper gastrointestinal cancer | 23 |
Urothelial cancer | 38 |
Vulva cancer | 9 |
Waldenstrom macroglobulinemia | 31 |
Total | 5450 |
RNA Library Construction and Sequencing
Using Machine Learning Algorithm for Classification of Two Diagnostic Classes
where $m$ is the number of classes, $n_i$ is the number of cases in class $i$, and $t_i$ is the number of correctly classified cases in class $i$, estimated using k-fold cross-validation.
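The counts defined above can be illustrated with a minimal Python sketch. It assumes a simple index-based split into k folds and uses a nearest-centroid rule purely as a stand-in classifier; neither is stated in the text, and the function names are hypothetical. The final macro-average of t_i/n_i is one plausible way to combine the counts, shown as an assumption rather than as the formula used in the study.

```python
from collections import defaultdict


def nearest_centroid_predict(train_x, train_y, x):
    """Stand-in classifier (assumption): pick the class whose mean vector is closest to x."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] += 1

    def squared_distance(label):
        return sum((s / counts[label] - v) ** 2 for s, v in zip(sums[label], x))

    return min(sums, key=squared_distance)


def per_class_cv_counts(xs, ys, k=5):
    """Return {class: (t_i, n_i)} where t_i counts correct held-out calls in class i."""
    n = defaultdict(int)
    t = defaultdict(int)
    for fold in range(k):
        # Index-based fold assignment (assumption): case i is held out in fold i % k.
        train_idx = [i for i in range(len(xs)) if i % k != fold]
        test_idx = [i for i in range(len(xs)) if i % k == fold]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        for i in test_idx:
            n[ys[i]] += 1
            if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
                t[ys[i]] += 1
    return {c: (t[c], n[c]) for c in n}


def mean_per_class_accuracy(counts):
    """Macro-average of t_i / n_i over the m classes (an assumed aggregate)."""
    return sum(t / n for t, n in counts.values()) / len(counts)


# Toy two-class data, invented for illustration only.
xs = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [0.0, 0.0], [0.3, 0.1],
      [5.0, 5.1], [5.2, 4.9], [4.8, 5.0], [5.1, 5.3], [4.9, 4.7]]
ys = ["A"] * 5 + ["B"] * 5
counts = per_class_cv_counts(xs, ys, k=5)
print(counts, mean_per_class_accuracy(counts))
```

On this toy data the two classes are well separated, so every held-out case is classified correctly and each class reports t_i equal to n_i.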
$$F = \frac{\mathrm{MSB}}{\mathrm{MSW}}$$

where MSB was the mean sum of squares between groups, MSW was the mean sum of squares within groups, and F was the analysis of variance statistic, which follows the F distribution. The P value was obtained from the F value. This confidence value measured the stability and robustness of a gene in separating the groups. It did not give a concrete classification accuracy but contributed to the overall confidence in the differences between the class means. Both criteria provided quantitative measures of the relevance of a gene for classification; however, the two measures did not always produce the same ranking. Applying both measures was expected to produce an effective and stable gene selection method for machine learning–based classification systems.
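As an illustration of the F-based relevance measure, the sketch below computes the one-way analysis of variance F statistic for each gene and ranks genes by it. The gene names and expression values are invented for the example, and `anova_f` is a hypothetical helper, not the authors' code. The P value is omitted; it would be read from the F distribution with (k − 1, N − k) degrees of freedom.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for one gene.

    groups: list of lists, one list of expression values per diagnostic class.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ssw = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    msb = ssb / (k - 1)          # mean sum of squares between groups
    msw = ssw / (n_total - k)    # mean sum of squares within groups
    return msb / msw


# Hypothetical expression values for two genes across two classes.
expression = {
    "gene_1": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    "gene_2": [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]],
}
f_values = {gene: anova_f(groups) for gene, groups in expression.items()}
ranking = sorted(f_values, key=f_values.get, reverse=True)
print(f_values, ranking)
```

Genes with larger F values show class means that differ more relative to the within-class spread, so they rank higher under this criterion.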
Using GMNB Classifier for Ranking Diagnostic Classes
Results
High Accuracy in the Differential Diagnosis between Two Diagnostic Classes
Two classes | AUC (95% CI) | Sensitivity, % | Specificity, % | Genes, N | Leave-one-out: AUC − 1 (95% CI) |
---|---|---|---|---|---|
Normal versus AML | 0.964 (0.954–0.974) | 90.9 | 93.2 | 100 | 0.945 (0.933–0.957) |
Normal versus ALL | 0.981 (0.973–0.989) | 95.1 | 95.5 | 200 | 0.977 (0.968–0.985) |
Normal versus CLL | 0.997 (0.994–0.999) | 96.4 | 98.8 | 100 | 0.980 (0.973–0.988) |
Normal versus mantle | 0.992 (0.987–0.997) | 95.1 | 97.8 | 100 | 0.969 (0.959–0.980) |
Normal versus MDS | 0.831 (0.801–0.861) | 78.1 | 75.3 | 400 | 0.826 (0.796–0.856) |
Normal versus MPN | 0.923 (0.884–0.962) | 90.9 | 82.3 | 400 | 0.903 (0.860–0.946) |
MDS versus MPN | 0.884 (0.837–0.931) | 90.9 | 70.8 | 500 | 0.806 (0.748–0.864) |
AML versus MDS | 0.880 (0.854–0.906) | 86.1 | 70.2 | 400 | 0.864 (0.837–0.892) |
CLL versus mantle | 0.986 (0.968–1.00) | 94.6 | 95.2 | 10 | 0.986 (0.968–1.00) |
Marginal versus CLL | 0.984 (0.964–1.00) | 98.7 | 91 | 25 | 0.864 (0.809–0.920) |
Marginal versus follicular | 0.946 (0.917–0.974) | 91 | 93.4 | 550 | 0.942 (0.912–0.971) |
Hodgkin versus normal LN | 0.990 (0.972–1.00) | 95.4 | 100 | 100 | 1.00 (1.00–1.00) |
Hodgkin versus T-cell lymphoma | 0.963 (0.930–0.996) | 92.3 | 91 | 500 | 0.902 (0.850–0.954) |
Hodgkin versus DLBCL | 0.975 (0.948–1.00) | 96.9 | 95.3 | 500 | 0.965 (0.934–0.997) |
DLBCL versus follicular | 0.986 (0.972–0.999) | 95.9 | 93.1 | 600 | 0.975 (0.957–0.993) |
DLBCL versus T-cell lymphoma | 0.967 (0.946–0.988) | 91.7 | 89.8 | 600 | 0.942 (0.915–0.969) |
Lung versus colorectal | 0.982 (0.975–0.989) | 97.2 | 94.5 | 900 | 0.977 (0.969–0.985) |
Lung versus breast | 0.988 (0.982–0.994) | 98 | 92.7 | 700 | 0.988 (0.982–0.994) |
Breast versus ovarian | 0.994 (0.984–1.00) | 100 | 94.2 | 700 | 0.989 (0.976–1.00) |
Ovarian versus endometrial | 0.959 (0.933–0.984) | 92.9 | 91.2 | 600 | 0.853 (0.803–0.902) |
Breast versus colorectal | 0.997 (0.991–1.00) | 97.8 | 98.7 | 800 | 0.987 (0.973–1.00) |
Pancreas versus colorectal | 0.989 (0.980–0.997) | 94.5 | 95.8 | 550 | 0.971 (0.956–0.985) |
Pancreas versus esophageal | 0.999 (0.990–1.00) | 97.1 | 98.9 | 550 | 0.960 (0.914–1.00) |
Ovarian versus lung | 0.994 (0.984–1.00) | 97.6 | 96.6 | 600 | 1.00 (0.997–1.00) |
Lung versus DLBCL | 0.996 (0.992–0.999) | 97.2 | 97.3 | 800 | 0.988 (0.983–0.993) |
Sarcoma versus ovarian | 0.995 (0.986–1.00) | 99.2 | 95.7 | 300 | 1.00 (0.997–1.00) |
Sarcoma versus GIST | 1.00 (0.997–1.00) | 99.3 | 100 | 300 | 1.00 (0.997–1.00) |
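
For readers who want to see how the quantities in this table are defined, the following sketch computes sensitivity, specificity, and AUC from classifier scores. The labels, scores, and threshold are made-up values, the function names are hypothetical, and the AUC is computed by the rank-based (Mann-Whitney) definition, which may differ from the procedure used to produce the table. Confidence intervals are not computed here.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels, scores):
    """Probability that a random positive case scores higher than a random negative case."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented example: 1 marks the first class of a pair, 0 the second.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
sens, spec = sensitivity_specificity(labels, scores, threshold=0.45)
area = auc(labels, scores)
print(sens, spec, area)
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes performance over all thresholds.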

Differential Diagnosis between 47 Different Diagnostic Classes with Ranking
Diagnosis | Cases, N | Accurate diagnosis as first choice (PPA), n (%) | PPV, % | Accurate diagnosis as second choice, n (%) | PPA by first and second choices, % |
---|---|---|---|---|---|
ALL | 26 | 26 (100) | 84 | 0 (0) | 100 |
Colorectal | 101 | 83 (82) | 79 | 4 (4) | 86 |
Brain | 16 | 12 (75) | 75 | 0 (0) | 75 |
Lung | 201 | 177 (88) | 73 | 7 (3) | 91 |
DLBCL | 149 | 127 (85) | 73 | 8 (5) | 91 |
Breast | 31 | 25 (81) | 71 | 2 (6) | 87 |
CLL | 61 | 44 (72) | 69 | 5 (8) | 80 |
Endometrial | 31 | 21 (68) | 66 | 3 (10) | 78 |
MM | 31 | 22 (71) | 65 | 0 (0) | 71 |
Ovarian | 41 | 29 (71) | 63 | 6 (15) | 85 |
Pancreas | 31 | 19 (61) | 58 | 5 (16) | 77 |
Follicular | 36 | 26 (72) | 53 | 5 (14) | 86 |
Mantle | 31 | 18 (58) | 50 | 3 (10) | 68 |
Sarcoma | 40 | 26 (65) | 45 | 1 (3) | 68 |
Hodgkin | 26 | 16 (62) | 41 | 9 (35) | 97 |
Normal | 201 | 92 (46) | 37 | 39 (19) | 65 |
AML | 120 | 106 (88) | 35 | 6 (5) | 93 |
T cell | 41 | 21 (51) | 34 | 8 (20) | 71 |
Marginal | 26 | 8 (31) | 26 | 4 (15) | 46 |
MDS | 101 | 19 (19) | 13 | 47 (47) | 65 |
MPN | 26 | 3 (12) | 9 | 3 (12) | 23 |
CMML | 31 | 2 (6) | 4 | 2 (6) | 13 |
CML | 17 | 0 (0) | 0 | 1 (6) | 6 |
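
The first-choice and second-choice columns can be reproduced from a ranked list of predicted classes per case. The sketch below is one way to do this under stated assumptions: PPA is taken as correct calls divided by true cases of the class, and PPV as correct first-choice calls divided by all first-choice calls of that class. The toy labels and the `rank_metrics` helper are hypothetical.

```python
from collections import Counter


def rank_metrics(truth, ranked):
    """Per-class PPA (first, second, combined) and PPV from ranked predictions.

    truth:  list of true class labels, one per case.
    ranked: list of lists of predicted labels per case, best guess first.
    """
    n = Counter(truth)
    first = Counter(t for t, r in zip(truth, ranked) if r[0] == t)
    second = Counter(t for t, r in zip(truth, ranked) if len(r) > 1 and r[1] == t)
    called_first = Counter(r[0] for r in ranked)
    out = {}
    for c in n:
        out[c] = {
            "ppa_first": first[c] / n[c],
            "ppa_second": second[c] / n[c],
            "ppa_first_or_second": (first[c] + second[c]) / n[c],
            # PPV is undefined when the class is never called as first choice.
            "ppv": first[c] / called_first[c] if called_first[c] else None,
        }
    return out


# Invented cases for illustration only.
truth = ["AML", "AML", "MDS", "MDS", "MDS"]
ranked = [["AML", "MDS"], ["AML", "MDS"], ["AML", "MDS"],
          ["MDS", "AML"], ["Normal", "MDS"]]
metrics = rank_metrics(truth, ranked)
print(metrics)
```

In this toy example MDS is often missed as the first choice but recovered as the second, which mirrors the pattern of a low first-choice PPA alongside a higher combined PPA.
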
Diagnosis | Cases, N | Cases correctly diagnosed as first choice (PPA), n (%) | Sensitivity (95% CI), % | Specificity (95% CI), % | Cases correctly diagnosed as second choice (PPA), n (%) | Cases correctly diagnosed as first and second choices (PPA), % |
---|---|---|---|---|---|---|
Lymphoid | 427 | 389 (91) | 77 (72–81) | 88 (86–90) | 20 (5) | 96 |
Myeloid | 295 | 258 (87) | 44 (38–49) | 77 (75–80) | 26 (9) | 96 |
Carcinoma | 452 | 427 (94) | 81 (77–84) | 95 (92–96) | 17 (4) | 98 |
Normal | 201 | 93 (46) | 46 (39–53) | 96 (95–97) | 41 (20) | 67 |
Sarcoma | 40 | 26 (65) | 65 (48–79) | 99 (98–99) | 1 (3) | 68 |
Total | 1415 | 1189 (84) | | | 109 (8) | 92 |
Discussion
Acknowledgment
Author Contributions
Supplemental Data
- Supplemental Figure S1
Receiver operating characteristic curves for predicting the diagnosis between two diagnostic classes using RNA combined with the machine learning algorithm. The area under the curve (AUC) and 95% CI are shown for each pair of diagnostic classes, together with the number of genes used to distinguish between them. ALL, acute lymphoblastic leukemia; AML, acute myeloid leukemia; CLL, chronic lymphocytic leukemia; FPF, false-positive fraction (1 − specificity); GIST, gastrointestinal stromal tumor; TPF, true-positive fraction (sensitivity).
- Supplemental Table S1
Footnotes
Supported by the Genomic Testing Cooperative.
Disclosures: M.A., H.Z., A.C., I.D.D., and W.M. are employed by and own stock in a diagnostic company that offers RNA sequencing using artificial intelligence. J.M. has served on a speaker's bureau for Amgen, Bristol Myers Squibb, Incyte, Jazz Pharmaceuticals, Stemline, and Takeda and has served as a consultant for AbbVie, CTI BioPharma, and Novartis. A.G. has consulting, advisory board, and honoraria relationships with AstraZeneca, SecuraBio, and TG Therapeutics, none of which is relevant to this work. M.W., A.I., M.D., D.S., M.G., and A.P. have no relevant conflicts of interest.
User license
Creative Commons Attribution – NonCommercial – NoDerivs (CC BY-NC-ND 4.0)