
Deep Domain Adversarial Learning for Species-Agnostic Classification of Histologic Subtypes of Osteosarcoma

Open Access. Published: October 26, 2022. DOI: https://doi.org/10.1016/j.ajpath.2022.09.009
      Osteosarcomas (OSs) are aggressive bone tumors with many divergent histologic patterns. During pathology review, OSs are subtyped based on the predominant histologic pattern; however, tumors often demonstrate multiple patterns. This high tumor heterogeneity, coupled with the scarcity of samples compared with other tumor types, renders histology-based prognosis of OSs challenging. To combat lower case numbers in humans, dogs with spontaneous OSs have been suggested as a model species. Herein, we adversarially train a convolutional neural network to classify distinct histologic patterns of OS in humans using mostly canine OS data during training. We show that adversarial training improves domain adaptation of a histologic subtype classifier from canines to humans, achieving an average multiclass F1 score of 0.77 (95% CI, 0.74–0.79) and 0.80 (95% CI, 0.78–0.81) when compared with the ground truth in canines and humans, respectively. Finally, we applied our trained model to characterize the histologic landscape of 306 canine OSs and uncovered distinct clusters with markedly different clinical responses to standard-of-care therapy.
      Osteosarcoma (OS) is a rare but aggressive pediatric malignancy with approximately 800 cases reported annually in the United States.
      • Ottaviani G.
      • Jaffe N.
      The epidemiology of osteosarcoma.
      Patients with metastatic or relapsed disease have dismal outcomes, with survival rates of <30% despite aggressive salvage regimens that typically include additional surgery, radiotherapy, and chemotherapy with agents such as ifosfamide, etoposide, cyclophosphamide, gemcitabine, and topotecan.
      • Misaghi A.
      • Goldin A.
      • Awad M.
      • Kulidjian A.A.
      Osteosarcoma: a comprehensive review.
      Most osteosarcomas display osteoblastic differentiation, sometimes intermixed with one or more additional histologic patterns, including chondroblastic, fibroblastic, giant cell rich, and vessel rich.
      • Beck J.
      • Ren L.
      • Huang S.
      • Berger E.
      • Bardales K.
      • Mannheimer J.
      • Mazcko C.
      • LeBlanc A.
      Canine and murine models of osteosarcoma.
      • Maxie G.G.
      • Meuten D.J.
      Tumors in Domestic Animals.
      Currently, the only reliable histologic marker for prognosis in human OS is the amount of necrosis achieved after neoadjuvant chemotherapy.
      • Gorlick R.
      • Meyers P.A.
      Osteosarcoma necrosis following chemotherapy: innate biology versus treatment-specific.
      This assessment is based on review of tumor sections harvested after local tumor control via surgery. Despite this, there is a subset of patients with high necrosis that still develop metastatic disease after completion of frontline therapy. Hence, additional prognostic biomarkers are needed for accurate prognosis prediction. Because naturally occurring canine osteosarcoma has strong biological, molecular, and histologic similarities to human osteosarcoma and is at least 10 times more common than human osteosarcoma, it can serve as a powerful translational model for cancer biomarker investigation and drug development.
      • LeBlanc A.K.
      • Breen M.
      • Choyke P.
      • Dewhirst M.
      • Fan T.M.
      • Gustafson D.L.
      • Helman L.J.
      • Kastan M.B.
      • Knapp D.W.
      • Levin W.J.
      • London C.
      • Mason N.
      • Mazcko C.
      • Olson P.N.
      • Page R.
      • Teicher B.A.
      • Thamm D.H.
      • Trent J.M.
      • Vail D.M.
      • Khanna C.
      Perspectives from man's best friend: National Academy of Medicine's Workshop on Comparative Oncology.
      • LeBlanc A.K.
      • Mazcko C.N.
      Improving human cancer therapy through the evaluation of pet dogs.
      • LeBlanc A.K.
      • Mazcko C.N.
      • Khanna C.
      Defining the value of a comparative approach to cancer drug development.
      In dogs with OS, standard of care consists of amputation of the affected limb to achieve local tumor control, followed by systemic platinum and/or anthracycline-based chemotherapy.
      • Selmic L.E.
      • Burton J.H.
      • Thamm D.H.
      • Withrow S.J.
      • Lana S.E.
      Comparison of carboplatin and doxorubicin-based chemotherapy protocols in 470 dogs after amputation for treatment of appendicular osteosarcoma.
      However, many clinical studies demonstrate that development of metastases, most often to the lungs, occurs in >90% of canine patients within several months of diagnosis.
      • Selmic L.E.
      • Burton J.H.
      • Thamm D.H.
      • Withrow S.J.
      • Lana S.E.
      Comparison of carboplatin and doxorubicin-based chemotherapy protocols in 470 dogs after amputation for treatment of appendicular osteosarcoma.
      • Al-Khan A.A.
      • Nimmo J.S.
      • Day M.J.
      • Tayebi M.
      • Ryan S.D.
      • Kuntz C.A.
      • Simcock J.O.
      • Tarzi R.
      • Saad E.S.
      • Richardson S.J.
      • Danks J.A.
      Fibroblastic subtype has a favourable prognosis in appendicular osteosarcoma of dogs.
      • LeBlanc A.K.
      • Mazcko C.N.
      • Cherukuri A.
      • Berger E.P.
      • Kisseberth W.C.
      • Brown M.E.
      • et al.
      Adjuvant sirolimus does not improve outcome in pet dogs receiving standard-of-care therapy for appendicular osteosarcoma: a prospective, randomized trial of 324 dogs.
      • Nagamine E.
      • Hirayama K.
      • Matsuda K.
      • Okamoto M.
      • Ohmachi T.
      • Kadosawa T.
      • Taniyama H.
      Diversity of histologic patterns and expression of cytoskeletal proteins in canine skeletal osteosarcoma.
      • Skorupski K.A.
      • Uhl J.M.
      • Szivek A.
      • Allstadt Frazier S.D.
      • Rebhun R.B.
      • Rodriguez Jr., C.O.
      Carboplatin versus alternating carboplatin and doxorubicin for the adjuvant treatment of canine appendicular osteosarcoma: a randomized, phase III trial.
      In contrast to the workflow in humans, the clinical workflow in dogs does not allow for assessment of response to neoadjuvant therapy; instead, it provides access to the entire tumor at the time of diagnosis via limb amputation. This makes a greater area of untreated tumor available for analysis and correlation with the outcomes of that specific patient.
      Furthermore, in canine OS, beyond tumor stage (ie, de novo metastatic disease), there are no known consistent prognostic features either within the primary tumor histology or other patient factors, such as tumor location, alkaline phosphatase status, and age/sex/breed.
      • Selmic L.E.
      • Burton J.H.
      • Thamm D.H.
      • Withrow S.J.
      • Lana S.E.
      Comparison of carboplatin and doxorubicin-based chemotherapy protocols in 470 dogs after amputation for treatment of appendicular osteosarcoma.
      ,
      • LeBlanc A.K.
      • Mazcko C.N.
      • Cherukuri A.
      • Berger E.P.
      • Kisseberth W.C.
      • Brown M.E.
      • et al.
      Adjuvant sirolimus does not improve outcome in pet dogs receiving standard-of-care therapy for appendicular osteosarcoma: a prospective, randomized trial of 324 dogs.
      ,
      • Skorupski K.A.
      • Uhl J.M.
      • Szivek A.
      • Allstadt Frazier S.D.
      • Rebhun R.B.
      • Rodriguez Jr., C.O.
      Carboplatin versus alternating carboplatin and doxorubicin for the adjuvant treatment of canine appendicular osteosarcoma: a randomized, phase III trial.
      Studies examining the prognostic significance of histologic subtype have identified conflicting findings in different data sets.
      • Al-Khan A.A.
      • Nimmo J.S.
      • Day M.J.
      • Tayebi M.
      • Ryan S.D.
      • Kuntz C.A.
      • Simcock J.O.
      • Tarzi R.
      • Saad E.S.
      • Richardson S.J.
      • Danks J.A.
      Fibroblastic subtype has a favourable prognosis in appendicular osteosarcoma of dogs.
      ,
      • Nagamine E.
      • Hirayama K.
      • Matsuda K.
      • Okamoto M.
      • Ohmachi T.
      • Kadosawa T.
      • Taniyama H.
      Diversity of histologic patterns and expression of cytoskeletal proteins in canine skeletal osteosarcoma.
      In this study, we took advantage of a larger patient cohort accumulated during a prospective randomized clinical trial conducted in >300 canine patients.
      • LeBlanc A.K.
      • Mazcko C.N.
      • Cherukuri A.
      • Berger E.P.
      • Kisseberth W.C.
      • Brown M.E.
      • et al.
      Adjuvant sirolimus does not improve outcome in pet dogs receiving standard-of-care therapy for appendicular osteosarcoma: a prospective, randomized trial of 324 dogs.
      This yielded a well-annotated canine OS data set in which to examine osteosarcoma histology and explore the potential of artificial intelligence (AI)–derived biomarkers. Specifically, we investigate whether techniques in AI using adversarial learning could support the development of a histologic subtype classifier for osteosarcomas that adapts from dogs to humans and a prognostic signature in dogs based on digital pathology whole slide images.
      • Bera K.
      • Schalper K.A.
      • Rimm D.L.
      • Velcheti V.
      • Madabhushi A.
      Artificial intelligence in digital pathology - new tools for diagnosis and precision oncology.
      • Harmon S.A.
      • Sanford T.H.
      • Brown G.T.
      • Yang C.
      • Mehralivand S.
      • Jacob J.M.
      • Valera V.A.
      • Shih J.H.
      • Agarwal P.K.
      • Choyke P.L.
      • Turkbey B.
      Multiresolution application of artificial intelligence in digital pathology for prediction of positive lymph nodes from primary tumors in bladder cancer.
      • Harmon S.A.
      • Tuncer S.
      • Sanford T.
      • Choyke P.L.
      • Turkbey B.
      Artificial intelligence at the intersection of pathology and radiology in prostate cancer.

      Materials and Methods

      Curation of Hematoxylin and Eosin–Stained Slides of Dog and Human Osteosarcomas

      Canine OS tumor samples were curated from a multisite clinical trial.
      • LeBlanc A.K.
      • Mazcko C.N.
      • Cherukuri A.
      • Berger E.P.
      • Kisseberth W.C.
      • Brown M.E.
      • et al.
      Adjuvant sirolimus does not improve outcome in pet dogs receiving standard-of-care therapy for appendicular osteosarcoma: a prospective, randomized trial of 324 dogs.
      Tumors were biopsied pre-amputation and diagnosed as osteosarcoma by anatomic pathologists at Comparative Oncology Trials Consortium (COTC) institutions (https://ccr.cancer.gov/comparative-oncology-program/consortium, last accessed May 13, 2022). At the time of surgical limb amputation, additional tumor tissue was collected by COTC investigators as a part of the standard-of-care portion of the trial schema. All tumors were collected before any treatment. Dogs were randomized to receive either standard of care or standard of care + adjuvant sirolimus (rapamycin) therapy. Statistical analysis of the primary clinical outcomes of the entire cohort of dogs found no differences in disease-free interval or survival between the two arms; thus, cases were included together in the analysis presented herein. In addition, we obtained 39 human osteosarcoma samples from an in-house pathology residency training cohort. Of these 39 samples, only 11 were utilized in our study for validation of domain-agnostic features. Tumor tissue was placed in 10% neutral-buffered formalin for 24 hours and then subjected to EDTA slow decalcification. Tissue was then sectioned and stained with hematoxylin and eosin, according to standard histopathologic practice. Three canine cases were excluded from this study because slides from these cases were not available. Slides from the remaining 306 canine cases and 39 human cases were digitized using a Hamamatsu S60 digital scanner (Hamamatsu Photonics, Hamamatsu, Japan) at ×40 magnification (0.23 μm per pixel). No additional manual quality control of surgical tumor specimen size or percentage tumor tissue was completed before data collection. The methods were performed in accordance with relevant guidelines and regulations and approved by each participating COTC veterinary institution that enrolled canine patients onto the clinical trials from which the image data were derived.

      Annotation and Preprocessing of Whole Slide Image Data

      Pathologist annotations for 95 dog slides and 11 human slides were obtained in XML format using HALO (Albuquerque, NM). Each annotation file contained coordinates of roughly marked region boundaries for each histologic subtype within each slide. Because the osteoblastic subtype is the most dominant subtype in osteosarcoma, the main tumor areas were marked and annotated as osteoblastic. Any regions within this area exhibiting divergent histology were annotated as necrotic, vessel rich (VR), chondroblastic, fibroblastic, or giant cell rich.
      • Beck J.
      • Ren L.
      • Huang S.
      • Berger E.
      • Bardales K.
      • Mannheimer J.
      • Mazcko C.
      • LeBlanc A.
      Canine and murine models of osteosarcoma.
      ,
      • Meuten D.J.
      Tumors in Domestic Animals.
      In canine tumors annotated as VR, CD31 immunohistochemistry was used to confirm the presence of tumor cell (CD31-) lined vascular spaces (Supplemental Figure S1).
      • Ferrer L.
      • Fondevila D.
      • Rabanal R.M.
      • Vilafranca M.
      Immunohistochemical detection of CD31 antigen in normal and neoplastic canine endothelial cells.
      ,
      • Giuffrida M.A.
      • Bacon N.J.
      • Kamstock D.A.
      Use of routine histopathology and factor VIII-related antigen/von Willebrand factor immunohistochemistry to differentiate primary hemangiosarcoma of bone from telangiectatic osteosarcoma in 54 dogs.
      Any unmarked regions falling outside main tumor areas were classified as other and consisted primarily of nontumor tissue, osteoid formations, and, in some cases, slide preparation artifacts, such as folded tissue and slide debris.
      Training deep learning models on whole slide image tiles extracted from multiple magnifications has proven to be effective in a weakly supervised learning setting where region-level annotations by pathologists are not available and histologic features of interest are open ended.
      • Campanella G.
      • Hanna M.G.
      • Geneslaw L.
      • Miraflor A.
      • Werneck Krauss Silva V.
      • Busam K.J.
      • Brogi E.
      • Reuter V.E.
      • Klimstra D.S.
      • Fuchs T.J.
      Clinical-grade computational pathology using weakly supervised deep learning on whole slide images.
      • D'Amato M.
      • Szostak P.
      • Torben-Nielsen B.
      A comparison between single- and multi-scale approaches for classification of histopathology images.
      • Kuklyte J.
      • Fitzgerald J.
      • Nelissen S.
      • Wei H.
      • Whelan A.
      • Power A.
      • Ahmad A.
      • Miarka M.
      • Gregson M.
      • Maxwell M.
      • Raji R.
      • Lenihan J.
      • Finn-Moloney E.
      • Rafferty M.
      • Cary M.
      • Barale-Thomas E.
      • O'Shea D.
      Evaluation of the use of single- and multi-magnification convolutional neural networks for the determination and quantitation of lesions in nonclinical pathology studies.
      • Lu M.Y.
      • Williamson D.F.K.
      • Chen T.Y.
      • Chen R.J.
      • Barbieri M.
      • Mahmood F.
      Data-efficient and weakly supervised computational pathology on whole-slide images.
      However, in this study, we had region-level pathologist annotations that were based on previously defined histologic subtypes of osteosarcoma that are distinguishable at ×10 magnification level.
      • Dahlin D.C.
      Pathology of osteosarcoma.
      The smallest regions of interest annotated by the pathologist have an area of approximately 25,000 μm² and are represented by at least one tile of size 256 × 256 pixels at ×10 magnification. A larger tile size would have resulted in fewer training tiles per histologic subtype, which would further increase class imbalance and cause overfitting, whereas a smaller tile size would have obscured important architectural features that go beyond cellular morphology (eg, tumor cells surrounding blood vessels, which are a characteristic feature of telangiectatic osteosarcoma). Hence, to train our image classification model, each whole slide image was scanned at ×10 magnification level and broken down into 256 × 256 pixel tiles.
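As a rough check of the tile-size argument, the physical footprint of one tile can be worked out from the scanner resolution. The sketch below assumes that ×10 magnification corresponds to a fourfold downsampling of the ×40 scan (0.23 μm per pixel); that assumption and the variable names are illustrative and are not taken from the study code.

```python
# Assumption: x10 is a 4x downsampling of the x40 scan at 0.23 um per pixel.
UM_PER_PIXEL_40X = 0.23
DOWNSAMPLE_40X_TO_10X = 4
TILE_SIDE_PX = 256
SMALLEST_REGION_UM2 = 25_000  # approximate smallest annotated region

um_per_pixel_10x = UM_PER_PIXEL_40X * DOWNSAMPLE_40X_TO_10X
tile_side_um = TILE_SIDE_PX * um_per_pixel_10x
tile_area_um2 = tile_side_um ** 2
region_fraction = SMALLEST_REGION_UM2 / tile_area_um2

print(f"tile side: {tile_side_um:.1f} um")
print(f"tile area: {tile_area_um2:.0f} um^2")
print(f"smallest region / tile area: {region_fraction:.2f}")
```

Under this assumption a tile covers roughly 55,000 μm², so the smallest annotated region would occupy a bit under half of one tile.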
      Tiles containing >85% white space were filtered out. Each remaining tile was assigned a single label based on any overlapping pathologist annotations. If a tile contained one or more tumor lesions of divergent histology (ie, a region exceeding 15% of the tile area), the tile was assigned the histologic class of the most dominant lesion (ie, the divergent lesion covering the highest percentage area). Otherwise, the tile was assigned the label osteoblastic. For example, a tile with 35% of its area marked as fibroblastic was assigned the label fibroblastic. A tile dominated by nontumor tissue or hemorrhage was assigned the label other. All tiles from unmarked slides were regarded as unlabeled.
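The labeling rule above can be sketched as a small function. This is a minimal illustration rather than the study code: the function name, the argument layout, and the `in_tumor_area` flag (standing in for "not dominated by nontumor tissue or hemorrhage") are hypothetical, and the order in which the checks are applied is an assumption.

```python
WHITE_SPACE_MAX = 0.85   # tiles with more white space than this are dropped
DIVERGENT_MIN = 0.15     # minimum tile fraction for a divergent lesion to count

DIVERGENT_CLASSES = ("necrotic", "vessel rich", "chondroblastic",
                     "fibroblastic", "giant cell rich")


def label_tile(white_fraction, coverage, in_tumor_area):
    """Return a tile label, or None if the tile is filtered out.

    white_fraction: fraction of the tile that is white space (0 to 1).
    coverage: dict mapping a divergent class name to the fraction of the
        tile area covered by that annotation.
    in_tumor_area: hypothetical flag, True if the tile lies mostly inside
        an annotated main tumor area.
    """
    if white_fraction > WHITE_SPACE_MAX:
        return None
    if not in_tumor_area:
        return "other"
    # Keep only divergent lesions exceeding the area threshold.
    candidates = {c: f for c, f in coverage.items()
                  if c in DIVERGENT_CLASSES and f > DIVERGENT_MIN}
    if candidates:
        # The most dominant divergent lesion decides the label.
        return max(candidates, key=candidates.get)
    return "osteoblastic"
```

For instance, `label_tile(0.1, {"fibroblastic": 0.35}, True)` returns `"fibroblastic"`, matching the example in the text.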
      For training, we randomly selected 80% of all labeled tiles from dogs (source domain) and an additional 2000 randomly selected labeled tiles from humans (target domain). Of the remaining 20% of labeled tiles from dogs, half were randomly selected for validation and hyperparameter tuning, and the other half were held out for testing along with the labeled human tiles that were not selected for training. For reproducibility, we fixed the random seed in the code that generates the train, validation, and test splits. The distribution of tiles by histologic subtype and train, validation, and test split is shown in Figure 1 and Supplemental Table S1.
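One way to express this split is sketched below. The function name, the seed value, and the representation of tiles as list items are assumptions made for illustration; the actual split code is in the repository cited under Code Availability.

```python
import random


def split_tiles(dog_tiles, human_tiles, n_human_train=2000, seed=0):
    """Split labeled tiles into train, validation, and test sets.

    The seed value is a placeholder; fixing it makes the split reproducible.
    """
    rng = random.Random(seed)
    dogs = list(dog_tiles)
    humans = list(human_tiles)
    rng.shuffle(dogs)
    rng.shuffle(humans)

    n_train = (8 * len(dogs)) // 10          # 80% of dog tiles for training
    n_val = (len(dogs) - n_train) // 2       # half of the rest for validation
    return {
        "train": dogs[:n_train] + humans[:n_human_train],
        "val": dogs[n_train:n_train + n_val],
        "test": dogs[n_train + n_val:] + humans[n_human_train:],
    }
```

Note that in this sketch the validation set contains dog tiles only, which follows the description in the text.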
      Figure 1. Overview of the training data and adversarial learning approach. A: Nonoverlapping whole slide image (WSI) patches from 95 canine whole slide images were extracted at ×10 base magnification and split at random into 80% train, 10% validation, and 10% test. The distribution of patches by each class is shown. B: Nonoverlapping whole slide image patches from 11 human whole slide images were extracted at ×10 base magnification. A total of 2000 patches (approximately 3% of all labeled human patches) were reserved for domain adversarial training of the histologic subtype classifier. The rest were held out for testing. See Materials and Methods for details on how each whole slide image patch was assigned a class. C: Overview of the supervised domain adversarial learning approach. The domain classifier is made to work against the histologic subtype classifier by introducing a gradient reversal layer just before the domain classifier. For more details on the algorithm, see Materials and Methods. Avg, average; CB, chondroblastic; FB, fibroblastic; GC, giant cell rich; max, maximum; OB, osteoblastic; VR, vessel rich.
      Before feeding a tile as input to the classification model, each tile was rescaled to 224 × 224 pixels, and its per-channel pixel intensities (ranging from 0 to 1) were standardized using the following per-channel mean intensities and SDs estimated from the dog training data: mean (R = 0.8938, G = 0.5708, B = 0.7944) and SD (R = 0.1163, G = 0.1528, B = 0.0885). Furthermore, to artificially augment the size of the training set, each tile from a minibatch during training was flipped on one side at random.
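The standardization and flip augmentation can be illustrated on plain Python lists. This is a sketch under stated assumptions: the helper names are hypothetical, rescaling to 224 × 224 pixels is omitted, and "flipped on one side at random" is read here as choosing a horizontal or vertical flip with equal probability, which the text does not confirm.

```python
import random

# Per-channel statistics reported for the dog training data (R, G, B).
MEAN = (0.8938, 0.5708, 0.7944)
SD = (0.1163, 0.1528, 0.0885)


def normalize_pixel(rgb):
    """Standardize one RGB pixel whose intensities lie in [0, 1]."""
    return tuple((v - m) / s for v, m, s in zip(rgb, MEAN, SD))


def random_flip(tile, rng):
    """Flip a tile (a list of pixel rows) horizontally or vertically.

    Assumption: one of the two flips is chosen with equal probability.
    """
    if rng.random() < 0.5:
        return [row[::-1] for row in tile]   # horizontal flip
    return tile[::-1]                        # vertical flip
```

A pixel equal to the channel means maps to zero in every channel, and a pixel one SD above the mean maps to one.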

      Domain Adversarial Training of a Histologic Subtype Classification Model for Osteosarcomas

      Let $(X_1,Y_1),(X_2,Y_2),\ldots,(X_m,Y_m)$ be examples from a source domain ($d$ = dogs) and $(X_{m+1},Y_{m+1}),\ldots,(X_{m+n},Y_{m+n})$ be examples from a target domain ($d$ = humans), where the number of examples available is typically much smaller than the number of examples available from the source domain. To train a classification model that adapts from the source domain to the target domain, we extend the algorithm of Ganin and Lempitsky
      • Ganin Y.
      • Lempitsky V.
      Unsupervised domain adaptation by backpropagation.
      to the supervised setting. Specifically, let $\theta_f$ be the parameters of the feature extraction backbone $G_f(\cdot;\theta_f)$ (ie, the function that takes as input an example $X_i$ and maps it to a set of features), let $\theta_y$ be the parameters of the subtype classifier $G_y(\cdot;\theta_y)$ (ie, the function that receives input from the feature extractor and predicts the class label $Y_i$), and let $\theta_d$ be the parameters of the domain classifier $G_d(\cdot;\theta_d)$ (ie, the function that receives input from the feature extractor and predicts the domain label $d_i$). Furthermore, let:
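      $$
      \begin{aligned}
      E(\theta_f,\theta_y,\theta_d) &= \sum_{i=1}^{m+n} L\big(G_y(G_f(X_i;\theta_f);\theta_y),Y_i\big) - \lambda \sum_{i=1}^{m+n} L\big(G_d(G_f(X_i;\theta_f);\theta_d),d_i\big) \\
      &= \sum_{i=1}^{m+n} L_y^i(\theta_f,\theta_y) - \lambda \sum_{i=1}^{m+n} L_d^i(\theta_f,\theta_d)
      \end{aligned}
      \tag{1}
      $$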


      The first term in Equation 1 represents the subtype classification error, whereas the second term represents the domain classification error; the hyperparameter $\lambda$ controls the trade-off between the two errors. The goal of a domain adaptation algorithm is then to find the saddle point of $E$:
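      $$
      (\hat{\theta}_f,\hat{\theta}_y) = \arg\min_{\theta_f,\theta_y} E(\theta_f,\theta_y,\hat{\theta}_d)
      \tag{2}
      $$

      $$
      \hat{\theta}_d = \arg\max_{\theta_d} E(\hat{\theta}_f,\hat{\theta}_y,\theta_d)
      \tag{3}
      $$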


      The domain classifier tries to minimize the domain classification error (this error enters $E$ with the factor $-\lambda$, so maximizing $E$ over $\theta_d$ minimizes it), and the subtype classifier tries to minimize the subtype classification error. To find the saddle point, the domain classifier is trained adversarially with the subtype classifier. Consequently, the parameters of the feature extractor $\theta_f$ at the saddle point minimize the subtype classification error (ie, the learned features are discriminative) while maximizing the domain classification error (ie, the learned features are domain invariant). Adversarial training is implemented in practice by simply adding a gradient reversal layer just before the domain classifier and performing standard stochastic gradient descent (Figure 1). The update rule for the parameters after incorporating the gradient reversal layer is given by Equations 4, 5, and 6:
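      $$
      \theta_f \leftarrow \theta_f - \mu\left(\frac{\partial L_y^i}{\partial \theta_f} - \lambda \frac{\partial L_d^i}{\partial \theta_f}\right)
      \tag{4}
      $$

      $$
      \theta_y \leftarrow \theta_y - \mu \frac{\partial L_y^i}{\partial \theta_y}
      \tag{5}
      $$

      $$
      \theta_d \leftarrow \theta_d - \mu \frac{\partial L_d^i}{\partial \theta_d}
      \tag{6}
      $$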
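To make the sign reversal in Equation 4 concrete, the following toy sketch applies one update step to scalar parameters. The linear model, the squared-error losses, and all names are assumptions chosen so that the gradients can be written by hand; the study itself uses a resnet50 backbone trained in PyTorch, where the same effect is obtained with a gradient reversal layer.

```python
def dann_step(theta_f, theta_y, theta_d, x, y, d, mu, lam):
    """One stochastic update of Equations 4 to 6 for a scalar toy model.

    Toy model (an assumption for illustration):
        feature      f   = theta_f * x
        subtype loss L_y = 0.5 * (theta_y * f - y) ** 2
        domain loss  L_d = 0.5 * (theta_d * f - d) ** 2
    """
    f = theta_f * x
    err_y = theta_y * f - y
    err_d = theta_d * f - d

    # Hand-derived partial derivatives of the two losses.
    dLy_dtheta_f = err_y * theta_y * x
    dLy_dtheta_y = err_y * f
    dLd_dtheta_f = err_d * theta_d * x
    dLd_dtheta_d = err_d * f

    # Equation 4: the domain gradient reaches the feature extractor with
    # its sign flipped, which is the effect of the gradient reversal layer.
    new_theta_f = theta_f - mu * (dLy_dtheta_f - lam * dLd_dtheta_f)
    # Equation 5: ordinary descent for the subtype classifier.
    new_theta_y = theta_y - mu * dLy_dtheta_y
    # Equation 6: ordinary descent for the domain classifier.
    new_theta_d = theta_d - mu * dLd_dtheta_d
    return new_theta_f, new_theta_y, new_theta_d
```

With `lam = 0` the feature extractor simply descends the subtype loss; with `lam > 0` it is additionally pushed in the direction that increases the domain loss, while the domain classifier itself still descends that loss.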


      The hyperparameter $\mu$ represents the learning rate. To obtain a head start during training, we initialize the parameters of the feature extraction portion of the resnet50 convolutional neural network ($\theta_f$) to the values obtained from pretraining resnet50 on the ImageNet data set.
      • Deng J.
      • Dong W.
      • Socher R.
      • Li L.J.
      • Li K.
      • Li F.F.
      ImageNet: a large-scale hierarchical image database.
      Initializing convolutional neural networks with pretrained weights from ImageNet has previously demonstrated success in transfer learning on many digital pathology applications.
      • Ahmed S.
      • Shaikh A.
      • Alshahrani H.
      • Alghamdi A.
      • Alrizq M.
      • Baber J.
      • Bakhtyar M.
      Transfer learning approach for classification of histopathology whole slide images.
      ,
      • Sharmay Y.
      • Ehsany L.
      • Syed S.
      • Brown D.E.
      HistoTransfer: understanding transfer learning for histopathology.
      Using stochastic gradient descent, we then simultaneously train the histologic subtype classifier and the domain classifier over several epochs using the same resnet50 backbone to find parameters $(\theta_f,\theta_y,\theta_d)$ that get us closest to the saddle point of $E$. To aid in faster convergence, we decrease the learning rate hyperparameter over each epoch, following Ganin and Lempitsky
      • Ganin Y.
      • Lempitsky V.
      Unsupervised domain adaptation by backpropagation.
      :
      $$
      \mu(p) = \frac{\mu_0}{(1+\alpha p)^{\beta}}
      \tag{7}
      $$


      Similarly, the hyperparameter λ is increased over each epoch, following Ganin and Lempitsky,
      • Ganin Y.
      • Lempitsky V.
      Unsupervised domain adaptation by backpropagation.
      while periodically setting it to 0 every three epochs:
      $$
      \lambda(p) = \frac{2}{1+e^{-\alpha p}} - 1
      \tag{8}
      $$


      Such hyperparameter annealing is commonly practiced, achieving better convergence during training.
      • Loshchilov I.
      • Hutter F.
      Sgdr: stochastic gradient descent with warm restarts.
      In Equations 7 and 8, $p$ represents the training progress (the fraction of the total number of epochs completed). The hyperparameters $\mu_0 = 0.001$, $\alpha = 10$, and $\beta = 0.75$ follow Ganin and Lempitsky.
      • Ganin Y.
      • Lempitsky V.
      Unsupervised domain adaptation by backpropagation.
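The two schedules in Equations 7 and 8 can be sketched as follows, using the functional forms of Ganin and Lempitsky's annealing schedules and the hyperparameter values reported above. The function names are hypothetical, and the choice of which epochs have λ reset to 0 is an assumption, because the text states only that the reset happens every three epochs.

```python
import math

MU_0, ALPHA, BETA = 0.001, 10.0, 0.75   # values reported in the text


def learning_rate(p):
    """Equation 7: learning rate at training progress p in [0, 1]."""
    return MU_0 / (1.0 + ALPHA * p) ** BETA


def domain_weight(p):
    """Equation 8: trade-off weight lambda at training progress p."""
    return 2.0 / (1.0 + math.exp(-ALPHA * p)) - 1.0


def schedule(total_epochs=15, reset_every=3):
    """Yield (epoch, mu, lambda) for each epoch.

    Assumption: lambda is set to 0 on epochs whose index is a multiple of
    `reset_every`; the text does not say which epochs are reset.
    """
    for epoch in range(total_epochs):
        p = epoch / total_epochs
        lam = 0.0 if epoch % reset_every == 0 else domain_weight(p)
        yield epoch, learning_rate(p), lam
```

Iterating over `schedule()` shows the learning rate falling monotonically from 0.001 while λ rises toward 1 between resets.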
      The training batch size was set to 256 (sampling 32 patches per whole slide image in each batch). As an early stopping criterion, model training was halted after 15 epochs because the gap between train error and validation error began to widen beyond that point. The parameters achieving the best performance on the validation data set over the 15 epochs were saved and used for making predictions on held-out test data. The resnet50 architecture and training algorithm were implemented in Python using PyTorch on an in-house dedicated server with a single Nvidia RTX A6000 GPU with 48 GB of VRAM.

      Spatial Probability Map Generation and Burden Estimation for Each Histologic Subtype

      To generate spatial probability maps, each whole slide image was processed by the trained patch-level histologic subtype classifier from left to right in a sliding window manner with a window size of 256 × 256 pixels and an overlap of 64 pixels. The resulting probability maps generated were further down sampled to ×5 base magnification via local average pooling of tile probabilities. We eventually generate six spatial probability maps: one for each class (excluding the other class, representing normal/benign/hemorrhagic tissue). The resulting probability maps can then be converted to gray scale or color images and visualized as shown in Figures 2 and 3.
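The sliding-window geometry described above can be sketched as follows. The function name is hypothetical, an overlap of 64 pixels is interpreted as a stride of 192 pixels, and windows that would extend past the image border are simply skipped; border handling is not described in the text.

```python
def window_origins(width, height, window=256, overlap=64):
    """Return top-left (x, y) pixel coordinates of each sliding window.

    Assumption: only windows that fit entirely inside the image are kept.
    """
    stride = window - overlap   # 192 pixels with the values in the text
    xs = range(0, width - window + 1, stride)
    ys = range(0, height - window + 1, stride)
    # Row-major order: left to right within a row, then the next row down.
    return [(x, y) for y in ys for x in xs]
```

Each origin would then be passed, together with the window size, to the patch-level classifier to obtain one probability vector per window.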
      Figure 2. A–D: Pathologist-marked regions versus classifier-generated spatial probability maps for each osteosarcoma subtype over whole slide images of tumor samples from dogs. The probability maps (depicted below each whole slide image) are generated by applying the trained patch-level subtype classifier in a sliding window manner over the whole slide image using a window size of 256 × 256 pixels. For more details, see Materials and Methods. Original magnification, ×10 (A–D). CB, chondroblastic; FB, fibroblastic; GC, giant cell rich; N, necrosis; OB, osteoblastic; VR, vessel rich.
      Figure 3. A and B: Pathologist-marked regions versus classifier-generated spatial probability maps for each osteosarcoma subtype over whole slide images of tumor samples from humans. The probability maps (depicted below each whole slide image) are generated by applying the trained patch-level subtype classifier in a sliding window manner over the whole slide image using a window size of 256 × 256 pixels. Original magnification, ×10 (A and B). CB, chondroblastic; FB, fibroblastic; GC, giant cell rich; N, necrosis; OB, osteoblastic; VR, vessel rich.
      Having generated spatial probability maps for each histologic subtype, one can then estimate its absolute burden in each patient's tumor while accounting for variable number of slides scanned per case using the following approach:
      $$
      \Phi^{\text{case}}_{\text{subtype}} = \frac{1}{N}\sum_{i,j} \mathbb{1}\left[P_{ij}(\text{subtype}) > 0.5\right]
      $$


      $P_{ij}(\text{subtype})$ represents the probability of region $(i,j)$ being classified as a particular subtype. The summation term represents the total area in which that probability exceeds 0.5. The term $N$ in the denominator represents the number of slides scanned per case. We chose to quantify the absolute burden of each subtype instead of the relative burden because each tumor was scanned at the same base magnification, and we had access to multiple slides scanned for each tumor in our cohort, including slides with tissue artifacts, such as folded tissue and osteoid formations. See Supplemental Table S2 for the estimated absolute burden of each subtype for all 306 canine cases analyzed in this study.
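A minimal sketch of this burden estimate is given below. The function name and the nested-list representation of a probability map are assumptions, and each map region is treated as one unit of area.

```python
def subtype_burden(slide_maps, threshold=0.5):
    """Absolute burden of one subtype for one case.

    slide_maps: list with one entry per scanned slide; each entry is a
        2D list of probabilities P_ij for the subtype of interest.
    Returns the number of regions with probability above `threshold`,
    summed over slides and divided by the number of slides N.
    """
    n_slides = len(slide_maps)
    if n_slides == 0:
        raise ValueError("at least one slide is required")
    positive = sum(1 for prob_map in slide_maps
                   for row in prob_map
                   for p in row if p > threshold)
    return positive / n_slides
```

Dividing by the number of slides keeps cases with many scanned slides from appearing to carry a larger burden merely because more tissue was imaged.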

      Data Preprocessing for K-Means Clustering Analysis

      Given the estimated burden of each histologic subtype in each dog sample, we first center and scale the data and then perform a principal component analysis. The projections of each sample along the first two principal components, which capture most of the variability in the data, are then used for K-means clustering.

      Implementation Details of K-Means Clustering and Survival Analysis

      To perform K-means clustering, we used the kmeans() utility function implemented in the R stats package with the following options set: maximum iterations = 500, and nstart (number of random initializations of cluster centers) = 100. For Kaplan-Meier and Cox proportional hazards regression analysis of the clinical data, we used the survfit() and cph() utility functions from the R survival package. Results of these analyses were plotted using the ggsurvplot() and ggforest() utility functions from the R survminer and GGally packages.

      Code Availability

      The code to train a classification model using domain adversarial learning, trained model weights, and scripts to reproduce the downstream results are available (https://github.com/spatkar94/adversarialdogs.git, last accessed September 30, 2022).

      Results

      Overview of Whole Slide Imaging Cohorts Analyzed in this Study and the Adversarial Learning Approach

      To precisely characterize the morphologic heterogeneity of osteosarcomas, we systematically collected and scanned 600 hematoxylin and eosin–stained slides of treatment-naïve primary tumors from a diverse collection of 306 dogs enrolled in a two-armed National Cancer Institute COTC clinical trial.
      • LeBlanc A.K.
      • Mazcko C.N.
      • Cherukuri A.
      • Berger E.P.
      • Kisseberth W.C.
      • Brown M.E.
      • et al.
      Adjuvant sirolimus does not improve outcome in pet dogs receiving standard-of-care therapy for appendicular osteosarcoma: a prospective, randomized trial of 324 dogs.
      The distribution of dogs analyzed in this study by geographic location and breed is summarized in Supplemental Tables S3 and S4. In addition, 39 de-identified hematoxylin and eosin slides of human osteosarcomas were collected to evaluate species-agnostic histologic features. A veterinary anatomic pathologist (J.B.) annotated 95 and 11 slides from canine and human samples, respectively, to identify regions of necrosis or tumor-specific histologic patterns,
      • Beck J.
      • Ren L.
      • Huang S.
      • Berger E.
      • Bardales K.
      • Mannheimer J.
      • Mazcko C.
      • LeBlanc A.
      Canine and murine models of osteosarcoma.
      • Maxie G.G.
      • Meuten D.J.
      Tumors in Domestic Animals.
      including osteoblastic, chondroblastic, fibroblastic, giant cell–rich, and VR regions. Unannotated regions were classified as other.
      We then trained a resnet50 convolutional neural network on whole slide image patches of osteosarcoma to classify them into different histologic subtypes, necrosis, or nontumor areas in both dogs (source domain) and humans (target domain). Figure 1, A and B, and Supplemental Table S1 depict the distribution of whole slide image patches corresponding to each class in training, validation, and test data sets generated for dogs and humans, respectively. Patches from both the dog and human training set were simultaneously fed to a resnet50 convolutional neural network trained using a domain adversarial approach (Figure 1C), which encourages neural networks to learn features that are important for the classification task of interest while at the same time less sensitive to domain-specific differences in the data.
      • Ganin Y.
      • Lempitsky V.
      Unsupervised domain adaptation by backpropagation.
      This was achieved by simultaneously training two classifiers that share the same feature extraction backbone. One classifier aimed to classify whole slide image patches into one of the predefined classes, whereas the other classifier aimed to distinguish the domain of each patch (ie, whether the patch comes from a dog or human sample). During training, the weights of the shared feature extraction backbone are updated such that we arrive at an equilibrium that minimizes classification error while maximizing domain error. Patches from the validation set were used to monitor for any signs of overfitting of the classification model (see Materials and Methods for more details). In the evaluation phase, patches from the held-out test set were evaluated using the trained histologic subtype classifier.

      Adversarial Learning Improves Domain Adaptation of the Histologic Subtype Classifier from Dogs to Humans

      Having trained a patch-level histologic subtype classification model in a domain adversarial manner, we next evaluated the performance of the trained model on held-out test whole slide image patches in both dogs and humans. To evaluate the model's performance, we computed the per-class precision, recall, and F1 scores obtained by comparing the model-predicted class labels of each whole slide image patch in the test set with the ground-truth labels obtained from overlapping pathologist annotations (see Materials and Methods). On average, the model achieved an F1-score of 0.77 (95% CI, 0.74–0.79) in dogs, and an F1-score of 0.80 (95% CI, 0.79–0.81) in humans (Figure 4, A–D). Overall, the histologic subtype classification model adapts from dogs (source domain) to humans (target domain) after seeing <5% of labeled examples from the target domain. The chondroblastic subtype performed worst on the target domain, with low precision (20%) and low recall (23%); it was most often confused with the more dominant osteoblastic subtype.
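      As a minimal sketch of the per-class metrics described above, the following computes precision, recall, and F1 for each class from paired ground-truth and predicted patch labels. The function name and the toy labels are hypothetical, and the unweighted (macro) average is an assumption: the text does not state how the average F1 score was weighted.

```python
def per_class_metrics(y_true, y_pred):
    """Return ({class: (precision, recall, f1)}, unweighted mean F1)."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    metrics = {}
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        metrics[c] = (precision, recall, f1)
    macro_f1 = sum(m[2] for m in metrics.values()) / len(metrics)
    return metrics, macro_f1

# Toy labels: OB = osteoblastic, CB = chondroblastic, FB = fibroblastic
truth = ["OB", "OB", "OB", "CB", "FB"]
preds = ["OB", "OB", "CB", "OB", "FB"]
metrics, macro_f1 = per_class_metrics(truth, preds)
print(metrics, macro_f1)
```

      In the toy labels, the single chondroblastic patch is predicted as osteoblastic and one osteoblastic patch is predicted as chondroblastic, so the chondroblastic class scores zero on all three metrics even though most patches are classified correctly.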
      Figure 4. Performance evaluation on held-out whole slide image patches from the test set in both dogs and humans. A and B: Confusion matrices generated after evaluating model predictions on dog and human whole slide image patches from the held-out test set. The rows represent the predicted class of each whole slide patch (ie, the class achieving the highest probability based on the classification model). The columns represent the ground truth (ie, pathologist-assigned class). Below each confusion matrix is a histogram depicting the distribution of ground truth class labels in the held-out test set. C: The evolution of the test error achieved by the classification model on human whole slide image patches (target domain) as we progress through each training epoch. The red points represent the test error trajectory achieved through adversarial learning. The rest represent the test error trajectories of the remaining control methods. The test error is defined as the average multiclass cross-entropy loss over the entire epoch. D and E: Estimated per-class precision and recall of the classification model on held-out test patches in dogs and humans. The error bars were approximately determined by a bootstrap analysis, where we repeatedly down-sampled the test data sets to 50% of their original size and recomputed precision and recall on each down-sampled version. CB, chondroblastic; FB, fibroblastic; GC, giant cell rich; OB, osteoblastic; VR, vessel rich.
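      The down-sampling procedure used for the error bars in Figure 4 can be sketched as follows. The 50% fraction comes from the caption; the number of repetitions, the percentile-based interval, the fixed seed, and all function names are assumptions made for illustration only.

```python
import random

def subsample_interval(y_true, y_pred, metric, fraction=0.5, n_rounds=1000, seed=0):
    """Recompute `metric` on repeated random subsamples and return a
    (2.5th percentile, 97.5th percentile) interval of the resulting scores."""
    rng = random.Random(seed)
    n = len(y_true)
    k = max(1, int(n * fraction))
    scores = []
    for _ in range(n_rounds):
        idx = rng.sample(range(n), k)  # down-sample without replacement
        scores.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    scores.sort()
    lower = scores[int(0.025 * (n_rounds - 1))]
    upper = scores[int(0.975 * (n_rounds - 1))]
    return lower, upper

def precision_for(cls):
    """Build a metric function that returns the precision of one class."""
    def metric(y_true, y_pred):
        hits = [t for t, p in zip(y_true, y_pred) if p == cls]
        return sum(t == cls for t in hits) / len(hits) if hits else 0.0
    return metric

# Toy test set: 80 osteoblastic and 20 chondroblastic patches,
# with 10 osteoblastic patches misclassified as chondroblastic
truth = ["OB"] * 80 + ["CB"] * 20
preds = ["CB"] * 10 + ["OB"] * 70 + ["CB"] * 20
print(subsample_interval(truth, preds, precision_for("CB")))
```

      Any per-class metric can be passed in place of `precision_for`, so the same routine yields intervals for recall or F1.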
      To evaluate the effect of domain adversarial training on model generalizability from source domain (dogs) to target domain (humans), we performed three control experiments: i) train the image classification model on labeled data from the source domain only and evaluate on target domain (transfer learning), ii) train the image classification model on labeled data from target domain only and evaluate on target domain, and iii) train the image classification model on labeled data from both the source and target domain using standard supervised learning and evaluate on target domain. For each experiment, the classification model was trained starting from the same set of initialized weights and hyperparameters. Overall, we found that the domain adversarial learning approach achieved significantly lower test error per epoch compared with the other three controls when evaluated on the target domain (Figure 4E).
      To visualize the predictions of the patch-level histologic subtype classification model on the whole slide image, we generated spatial probability maps depicting regions of high versus low probability for each histologic subtype based on application of the patch-level histologic subtype classification model over the whole slide image in a sliding window manner (see Materials and Methods for details). As a qualitative validation, Figures 2 and 3 depict pathologist-marked region boundaries within four dog and two human osteosarcoma surgical specimens covering each histologic subtype along with classifier-derived probability maps (one per histologic subtype) over the whole slide image.
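      A sliding-window probability map of the kind described above can be sketched in a few lines. The slide is represented here as a small 2-D list and the classifier as an arbitrary callable; `toy_classifier`, the patch size, and the stride are hypothetical stand-ins for the trained network and the settings given in Materials and Methods.

```python
def probability_maps(slide, patch_size, stride, classify, classes):
    """Apply `classify` to every window of `slide` and collect, for each
    class, a 2-D grid of predicted probabilities (one cell per window)."""
    height, width = len(slide), len(slide[0])
    rows = range(0, height - patch_size + 1, stride)
    cols = range(0, width - patch_size + 1, stride)
    maps = {c: [[0.0] * len(cols) for _ in rows] for c in classes}
    for i, r in enumerate(rows):
        for j, c0 in enumerate(cols):
            patch = [row[c0:c0 + patch_size] for row in slide[r:r + patch_size]]
            probs = classify(patch)  # expected to return {class: probability}
            for c in classes:
                maps[c][i][j] = probs[c]
    return maps

def toy_classifier(patch):
    """Stand-in for the trained network: mean intensity as P(OB)."""
    values = [v for row in patch for v in row]
    p = sum(values) / len(values)
    return {"OB": p, "other": 1.0 - p}

# 4 x 4 toy slide whose left half is "tumor" (1) and right half is not (0)
slide = [[1, 1, 0, 0] for _ in range(4)]
maps = probability_maps(slide, patch_size=2, stride=2,
                        classify=toy_classifier, classes=["OB", "other"])
print(maps["OB"])
```

      A stride smaller than the patch size yields overlapping windows and a smoother map at the cost of more classifier calls.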

      Unsupervised Exploratory Analysis of Whole Slide Imaging Features Uncovers Distinct Populations of Dogs with Different Responses to Standard-of-Care Therapy

      Having generated spatial probability maps of each subtype, we next estimated the absolute burden of each subtype in each canine sample and applied the K-means clustering algorithm to identify clusters of dogs with similar whole slide tumor histology (Supplemental Table S2) (see Materials and Methods). Figure 5A plots the average silhouette score of inferred clusters for different values of K.
      • Rousseeuw P.J.
      Silhouettes - a graphical aid to the interpretation and validation of cluster-analysis.
      The higher the average silhouette score, the more compact and well separated the clusters are (maximum score = 1). The error bars indicate the CI estimated by repeatedly performing K-means clustering on randomly down-sampled versions of the original cohort (down-sampling to approximately 80% of the original cohort size) while keeping K fixed. The highest silhouette score is achieved for K = 3 clusters. Figure 5B depicts the data distribution along the first two principal components and corresponding cluster memberships.
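      To make the silhouette criterion concrete, the sketch below scores a given cluster assignment. It does not perform K-means itself; the function name, the Euclidean distance, and the toy points (standing in for per-case subtype burdens) are assumptions for illustration.

```python
import math

def mean_silhouette(points, labels):
    """Average silhouette score of a clustering (needs at least two clusters).

    For each point, a = mean distance to the other members of its own cluster,
    b = smallest mean distance to the members of any other cluster, and the
    silhouette is (b - a) / max(a, b). Singleton clusters contribute 0.
    """
    def dist(p, q):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(p, q)))

    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)

# Two well-separated pairs of points
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(mean_silhouette(points, [0, 0, 1, 1]))  # compact, well separated
print(mean_silhouette(points, [0, 1, 0, 1]))  # poor assignment
```

      Repeating this score over candidate values of K, each with its own K-means assignment, gives the kind of curve plotted in Figure 5A.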
      Figure 5. K-means clustering analysis of 306 canine osteosarcoma tumors based on estimated burden of histologic subtypes. A: The average silhouette score as a function of the number of clusters used by the K-means algorithm to cluster the data. The higher the average silhouette score, the better the clustering. The smallest value of K achieving the highest silhouette score represents the best possible clustering of the data. B: Principal component (PC) analysis plot, depicting the distribution of all canine osteosarcoma cases based on the estimated burden of each histologic subtype. Points belonging to cluster 1 are red, points belonging to cluster 2 are green, and points belonging to cluster 3 are blue. C: Distribution of the burden of each histologic subtype in each cluster. ∗∗P < 0.01, ∗∗∗P < 0.001, and ∗∗∗∗P < 0.0001 (U-test). Avg, average; CB, chondroblastic; FB, fibroblastic; GC, giant cell rich; N, necrosis; OB, osteoblastic; VR, vessel rich.
      We next examined the distribution of the estimated burden of each subtype in each cluster and the clinical outcomes. The clinical characteristics of the cases analyzed in this study are provided in Table 1. See Supplemental Table S5 for all the clinical metadata. Cluster 3 had significantly higher levels of the vessel-rich regions, whereas cluster 2 had significantly higher tumor necrosis relative to the rest of the cohort and slightly elevated levels of the chondroblastic subtype (Figure 5C). Overall, we observe that dogs belonging to cluster 3 had significantly worse clinical outcomes compared with the other two clusters. Figure 6A shows a Kaplan-Meier plot depicting differences in overall survival rates between dogs belonging to cluster 3 and rest of the cohort (log-rank test P = 0.038), whereas Figure 6B depicts the differences in disease-free interval rates between the dogs belonging to cluster 3 and rest of the cohort (log-rank test P = 0.0071). All dogs belonging to cluster 3 relapsed within 12 months after receiving adjuvant treatment. This negative association remained significant despite adjusting for relevant clinical parameters such as tumor location (proximal humerus versus non-proximal humerus), alkaline phosphatase levels (elevated versus normal), age, weight, sex, and adjuvant treatment type in a multivariable Cox proportional hazards regression model.
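      The Kaplan-Meier curves referred to above rest on the product-limit estimator, sketched here under simple assumptions. The function name and the toy follow-up times are hypothetical; the sketch does not implement the log-rank test or the Cox model used in the study.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up time for each case (eg, days from surgery).
    events: 1 if the event (death or relapse) was observed, 0 if censored.
    Returns a list of (time, survival probability) at each event time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        observed = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)
        if observed:
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored cases both leave the risk set
    return curve

# Four toy cases; the third is censored at day 10
print(kaplan_meier([5, 10, 10, 20], [1, 1, 0, 1]))
```

      Censored cases lower the number at risk without lowering the survival estimate, which is why curves for groups with different follow-up can still be compared.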
      Table 1. Clinical Characteristics of the Dog Osteosarcoma Cohort (N = 306)
      Clinical characteristic                            Value
      Age, years                                         8.1 (1.4–15.6)
      Weight, kg                                         38.8 (21.2–94.5)
      Tumor location
       Proximal humerus                                  64 (21)
       Non-proximal humerus                              242 (79)
      ALP levels
       Elevated                                          74 (24)
       Normal                                            232 (76)
      Sex
       Castrated male                                    171 (56)
       Intact male                                       13 (4)
       Spayed female                                     118 (39)
       Intact female                                     4 (1)
      Disease-free interval, time from surgery, days     157 (3–1127)
      Overall survival, time from surgery, days          235 (3–1652)
      Treatment
       Standard of care                                  155 (51)
       Standard of care + sirolimus (rapamycin)          151 (49)
      For continuous variables, values in parentheses represent the minimum and maximum range, and values outside the parentheses represent the median over the entire cohort. All other data are given as number (percentage).
      ALP, alkaline phosphatase.
      Figure 6. Survival outcomes of cluster 3 versus clusters 1 and 2. A: Top: Kaplan-Meier plot depicting the overall survival rates of cases belonging to cluster 3 versus rest (cluster 1 or cluster 2). Bottom: Estimated hazard ratio of each factor. B: Top: Kaplan-Meier plot depicting the disease-free survival rates of cases belonging to cluster 3 versus rest (cluster 1 or cluster 2). Bottom: Estimated hazard ratio of each factor. The log-rank test P value was estimated to determine the significance of the differences in survival rates. ∗P < 0.05, ∗∗∗P < 0.001 (log-rank test). AIC, Akaike information criterion; ALP, alkaline phosphatase; DFI, disease-free interval; NPH, non-proximal humerus; PH, proximal humerus; SOC, standard of care.
      Finally, we performed subgroup analysis to ensure prognostic signatures remain significant in unlabeled data not used in training. The first subgroup consists of 55 reviewed cases (n = 95 pathologist-annotated slides). The second subgroup consists of the remaining 251 unreviewed cases. In each subgroup, the survival association remains consistent, thus demonstrating the clinical utility of model predictions beyond cases previously annotated by the pathologist (Supplemental Figure S2).

      Discussion

      Through the activities of the National Cancer Institute COTC, this study examines the largest data set of canine osteosarcomas to date for which complete clinical outcome data are available and standardized therapy was applied (n = 306). With the help of this large resource, we demonstrate how deep domain adversarial learning can be used to train a histologic subtype classifier that adapts from dog to human osteosarcoma despite utilizing a small fraction of human data for training. Although this is not the first application of deep learning in osteosarcomas,
      • Mishra R.
      • Daescu O.
      • Leavey P.
      • Rakheja D.
      • Sengupta A.
      Convolutional neural network for histopathological analysis of osteosarcoma.
      • D'Acunto M.
      • Martinelli M.
      • Moroni D.
      Deep learning approach to human osteosarcoma cell detection and classification.
      • Fu Y.
      • Xue P.
      • Ji H.Z.
      • Cui W.T.
      • Dong E.Q.
      Deep model with Siamese network for viable and necrotic tumor regions assessment in osteosarcoma.
      it is the first attempting to identify histologic features of osteosarcoma that transfer from canine to human samples, to the best of our knowledge.
      With the help of the trained species-agnostic histologic subtype classifier, we performed an unsupervised exploratory analysis of whole slide imaging data of 306 dogs and identified distinct clusters that respond differently to standardized chemotherapy based on the classifier-estimated burden of histologic subtypes. Our results are consistent with some prior reports indicating that the presence of specific histologic subtypes may have prognostic value
      • Al-Khan A.A.
      • Nimmo J.S.
      • Day M.J.
      • Tayebi M.
      • Ryan S.D.
      • Kuntz C.A.
      • Simcock J.O.
      • Tarzi R.
      • Saad E.S.
      • Richardson S.J.
      • Danks J.A.
      Fibroblastic subtype has a favourable prognosis in appendicular osteosarcoma of dogs.
      ,
      • Nagamine E.
      • Hirayama K.
      • Matsuda K.
      • Okamoto M.
      • Ohmachi T.
      • Kadosawa T.
      • Taniyama H.
      Diversity of histologic patterns and expression of cytoskeletal proteins in canine skeletal osteosarcoma.
      ; however, a rigorous quantitative evaluation of OS histology that takes tumor heterogeneity into account has not been previously explored, likely because of the difficulty in accumulating a large enough data set and the immense manual labor by the pathologist in annotating each region. This is the first exploratory study using AI to define prognostic value of variant histologic features within a large population of dogs receiving standardized care in a prescriptive clinical trial. As with the diagnostic and therapeutic approach to any cancer, many separate factors should be considered when devising a treatment and prognosis. The predictive value of our approach should be considered alongside other patient factors and not considered the sole method by which prognosis can be assigned for canine patients with OS. Nevertheless, information gleaned from our approach is of substantial clinical value to clinicians treating dogs with OS.
      In this study, we refrain from quantifying overlap between pathologist annotations and AI predictions using Dice or IoU metrics. These metrics are preferable in segmentation applications, where the ground truth segmentation boundaries are precisely defined.
      • Taha A.A.
      • Hanbury A.
      Metrics for evaluating 3D medical image segmentation: analysis, selection, and tool.
      However, because of intratumor heterogeneity, osteoblastic tumor cells are frequently observed intermixed with other histologic subtypes.
      • Beck J.
      • Ren L.
      • Huang S.
      • Berger E.
      • Bardales K.
      • Mannheimer J.
      • Mazcko C.
      • LeBlanc A.
      Canine and murine models of osteosarcoma.
      ,
      • Meuten D.J.
      Tumors in Domestic Animals.
      ,
      • Dahlin D.C.
      Pathology of osteosarcoma.
      Hence, it is not feasible for pathologists to precisely mark region boundaries of each histologic subtype at high resolution for each slide. Although the pathologist annotated most tumor tissue in all annotated sections, there are examples where unannotated tumor tissue was present. Interestingly, these cases offer another example demonstrating the ability of the model to identify tumor tissue that would not be captured by Dice or IoU metrics. For example, in Figure 2D, there are several regions that were predicted to contain osteoblastic tumor cells. On review, the pathologist was able to confirm the presence of osteoblastic tumor tissue in these locations (Supplemental Figure S3). This highlights a potential utility of AI in identifying foci of tumor distal to the main tumor mass. This may be particularly important in tumors that require complete excision and could help by re-orientating the pathologist toward specific regions to review.
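      The limitation of overlap metrics discussed above can be shown with a small sketch. Regions are represented as sets of patch coordinates, and the function name and coordinates are hypothetical. When the classifier flags tumor that the pathologist did not annotate, Dice and IoU fall even if that prediction is later confirmed.

```python
def dice_and_iou(annotated, predicted):
    """Dice coefficient and intersection over union for two sets of patches."""
    annotated, predicted = set(annotated), set(predicted)
    overlap = len(annotated & predicted)
    union = len(annotated | predicted)
    total = len(annotated) + len(predicted)
    dice = 2 * overlap / total if total else 1.0
    iou = overlap / union if union else 1.0
    return dice, iou

# The pathologist annotated two patches; the model also flags two distant
# patches that were not part of the initial annotation.
annotated = {(0, 0), (0, 1)}
predicted = {(0, 0), (0, 1), (5, 5), (5, 6)}
print(dice_and_iou(annotated, predicted))
```

      Both scores penalize the two extra patches as false positives, although in a case such as Figure 2D they may correspond to real tumor tissue.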
      In this study, tumors enriched for VR regions were associated with reduced disease-free interval and overall survival. These vascular structures define the rare telangiectatic subtype of osteosarcoma, which is characterized by blood-filled cystic spaces surrounded by thin septa lined by tumor cells.
      • Beck J.
      • Ren L.
      • Huang S.
      • Berger E.
      • Bardales K.
      • Mannheimer J.
      • Mazcko C.
      • LeBlanc A.
      Canine and murine models of osteosarcoma.
      • Maxie G.G.
      • Meuten D.J.
      Tumors in Domestic Animals.
      Although an early study
      • Matsuno T.
      • Unni K.K.
      • McLeod R.A.
      • Dahlin D.C.
      Telangiectatic osteogenic sarcoma.
      suggested that telangiectatic OS carries a poor prognosis in human patients, others suggest that although there may be a correlation with clinical features, such as pathologic fracture, an association with prognosis is less clear.
      • Huvos A.G.
      • Rosen G.
      • Bretsky S.S.
      • Butler A.
      Telangiectatic osteogenic sarcoma: a clinicopathologic study of 124 patients.
      In dogs, the telangiectatic subtype has been associated with poor prognosis in studies of OS originating in the ulna
      • Sivacolundhu R.K.
      • Runge J.J.
      • Donovan T.A.
      • Barber L.G.
      • Saba C.F.
      • Clifford C.A.
      • de Lorimier L.P.
      • Atwater S.W.
      • DiBernardi L.
      • Freeman K.P.
      • Bergman P.J.
      Ulnar osteosarcoma in dogs: 30 cases (1992-2008).
      (n = 30) or flat and irregular bones
      • Hammer A.S.
      • Weeren F.R.
      • Weisbrode S.E.
      • Padgett S.L.
      Prognostic factors in dogs with osteosarcomas of the flat or irregular bones.
      (n = 45). In our case set, we defined VR regions as containing blood-filled spaces lined by tumor cells. On hematoxylin and eosin staining, these vascular spaces were multifocally lined by polygonal cells rather than flat, spindle-shaped cells, which were more likely to be interpreted as endothelium histologically. CD31 immunohistochemistry staining confirmed the presence of vessels lined by tumor cells in VR-annotated canine osteosarcomas (Supplemental Figure S1). Some VR regions also contained cellular debris, which has been described in human OS.
      • Bacci G.
      • Ferrari S.
      • Ruggieri P.
      • Biagini R.
      • Fabbri N.
      • Campanacci L.
      • Bacchini P.
      • Longhi A.
      • Forni C.
      • Bertoni F.
      Telangiectatic osteosarcoma of the extremity: neoadjuvant chemotherapy in 24 cases.
      • Liu J.J.
      • Liu S.
      • Wang J.G.
      • Zhu W.
      • Hua Y.Q.
      • Sun W.
      • Cai Z.D.
      Telangiectatic osteosarcoma: a review of literature.
      • Sangle N.A.
      • Layfield L.J.
      Telangiectatic osteosarcoma.
      Although VR morphology was uncommon in our data set, the presence of tumor cell–lined vascular structures in largely solid tumors suggests that vascular differentiation can occur within a focal region of these histologically diverse tumors. Such tumors are less likely to be classified as telangiectatic OS, which may inhibit the prognostication of histologic subtype in OS. This is emphasized by a study of OS originating in the ulna (n = 30) that identified reduced survival in dogs with either pure or mixed telangiectatic morphology (ie, telangiectatic or osteoblastic-telangiectatic
      • Sivacolundhu R.K.
      • Runge J.J.
      • Donovan T.A.
      • Barber L.G.
      • Saba C.F.
      • Clifford C.A.
      • de Lorimier L.P.
      • Atwater S.W.
      • DiBernardi L.
      • Freeman K.P.
      • Bergman P.J.
      Ulnar osteosarcoma in dogs: 30 cases (1992-2008).
      ). In fact, up to 65% of canine osteosarcomas are reported to demonstrate multiple histologic subtypes.
      • Nagamine E.
      • Hirayama K.
      • Matsuda K.
      • Okamoto M.
      • Ohmachi T.
      • Kadosawa T.
      • Taniyama H.
      Diversity of histologic patterns and expression of cytoskeletal proteins in canine skeletal osteosarcoma.
      This underlines the utility of AI, which allows pathologists to rapidly quantify the abundance of major and minor histologic patterns within heterogeneous tumors.
      This study has several notable limitations. First, we did not have access to human clinical outcome data to assess the prognostic value added by our approach over what is currently clinically practiced for humans. A future direction will be to apply this method to a larger set of human OS images with matched clinical outcomes to determine algorithm performance in a translational setting. Second, our study is based on annotations from a single anatomic pathologist. Agreement between pathologists can vary based on the feature of interest. Variability may be greater in cases where pathologists must consider an aggregate of histologic features to assign a tumor grade. For example, in one veterinary study of osteosarcomas, agreement was considered moderate for necrosis (intraclass correlation coefficient [ICC] = 0.626), whereas agreement on grade was fair using three different classification systems.
      • Schott C.R.
      • Tatiersky L.J.
      • Foster R.A.
      • Wood G.A.
      Histologic grade does not predict outcome in dogs with appendicular osteosarcoma receiving the standard of care.
      In the future, we aim to convene a comparative pathology board of M.D. and D.V.M. pathologists to review canine and human osteosarcoma histology with the goals of assessing the impact of our model on interobserver variability and identifying additional features, such as immune cell infiltration, that may be incorporated into our prognostic model alongside ongoing genomic work. Third, the data are severely imbalanced, with only a handful of canine and human tumor cases exhibiting uncommon histologic subtypes. To ensure that there were enough training examples of each class for the patch-level classifier, pathologist-annotated whole slide images were broken into nonoverlapping patches scanned at high magnification and split at random into training, validation, and test sets (see Materials and Methods). Patch-based training of neural networks in digital pathology has enabled accurate detection and quantification of complex histologic features from few whole slide images because thousands of image patches can be extracted at high magnification during training.
      • Kuklyte J.
      • Fitzgerald J.
      • Nelissen S.
      • Wei H.
      • Whelan A.
      • Power A.
      • Ahmad A.
      • Miarka M.
      • Gregson M.
      • Maxwell M.
      • Raji R.
      • Lenihan J.
      • Finn-Moloney E.
      • Rafferty M.
      • Cary M.
      • Barale-Thomas E.
      • O'Shea D.
      Evaluation of the use of single- and multi-magnification convolutional neural networks for the determination and quantitation of lesions in nonclinical pathology studies.
      ,
      • Salvi M.
      • Acharya U.R.
      • Molinari F.
      • Meiburger K.M.
      The impact of pre- and post-image processing techniques on deep learning frameworks: a comprehensive review for digital pathology image analysis.
      However, neural networks trained this way are prone to overfitting to slide, staining, or scanner-specific properties.
      • Schomig-Markiefka B.
      • Pryalukhin A.
      • Hulla W.
      • Bychkov A.
      • Fukuoka J.
      • Madabhushi A.
      • Achter V.
      • Nieroda L.
      • Buttner R.
      • Quaas A.
      • Tolkach Y.
      Quality control stress test for deep learning-based diagnostic model in digital pathology.
      In this work, we reasoned that an adversarial learning approach could help neural networks overcome the bias that would be present in domain-specific training paradigms. Adversarial training can, however, be complex in practice compared with standard supervised learning approaches. This is especially relevant during initial phases of training, where noisy signals from the domain classifier can derail the learning algorithm.
      • Ganin Y.
      • Lempitsky V.
      Unsupervised domain adaptation by backpropagation.
      This issue is mitigated by having a good initialization of model parameters and by gradually increasing the influence of the domain classifier in the learning process, as defined in detail in Materials and Methods. Last, no additional manual quality control of surgical tumor specimens was completed before data collection from different sites. Instead, our model was adversarially trained to classify nontumor and necrotic regions in addition to the five histologic subtypes of osteosarcoma based on pathologist annotations. We expect the robustness and accuracy of the classification model to improve as additional data are collected.
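      The gradual increase in the domain classifier's influence mentioned above can be sketched with the schedule proposed in the cited work by Ganin and Lempitsky, in which the adversarial weight rises smoothly from 0 toward 1 as training progresses. Whether this study used exactly this schedule is an assumption; the function name and the default `gamma = 10` are illustrative, and the actual settings are those given in Materials and Methods.

```python
import math

def domain_weight(progress, gamma=10.0):
    """Adversarial weight for a training progress value in [0, 1].

    Starts at 0, so early noisy domain gradients are ignored, and
    approaches 1 as training proceeds.
    """
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0

for step in range(0, 11, 2):
    p = step / 10
    print(f"progress={p:.1f}  weight={domain_weight(p):.4f}")
```

      The returned value would multiply the reversed domain gradient applied to the shared feature extractor at each training step.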
      In summary, deep domain adversarial learning could be a powerful addition to the modern pathologist's toolbox for identification of domain-agnostic histologic and molecular features of tumors and is likely to be useful for many other comparative oncology applications, especially where human data are scarce.

      Acknowledgments

      We thank the Comparative Oncology Clinical Trials Consortium (COTC) members for execution of the COTC-21/022 trials, which provided the clinical outcome data that were analyzed herein; and Dr. Markku Miettinen for granting access to 39 human osteosarcoma slides from his residency training materials.

      Supplemental Data

      Supplemental Figure S1. CD31 labeling of vessel-rich (VR) osteosarcoma regions. A: Whole slide hematoxylin and eosin (H&E), osteosarcoma regions annotated as vessel rich by the pathologist. B and C: H&E and corresponding CD31 immunohistochemistry labeling of a VR region. D: CD31+ endothelial cells lining blood vessels (CD31-positive control). E: A blood-filled space lined by CD31-negative cells, consistent with telangiectatic tumor morphology. Scale bars: 2 mm (A); 500 μm (B and C); 20 μm (D and E). CB, chondroblastic; OB, osteoblastic.
      Supplemental Figure S2. Relationship between cluster membership and survival outcomes among previously annotated (A and B) versus unannotated cases (C and D). P values determining the significance of differences in survival rates were determined by the log-rank test. DFI, disease-free interval.

      Supplemental Figure S3. Example of a case where the trained histologic subtype classifier detects a region of osteoblastic (OB) tumor not annotated by the pathologist. In black: pathologist-marked regions containing osteoblastic tumor (in initial review). Zoomed in on left, example osteoblastic tumor lesion detected by the trained histologic subtype classifier and confirmed by the pathologist. Scale bars: 2 mm (top and bottom right panels); 500 μm (left panel). AI, artificial intelligence.

      References

        • Ottaviani G.
        • Jaffe N.
        The epidemiology of osteosarcoma.
        Cancer Treat Res. 2009; 152: 3-13
        • Misaghi A.
        • Goldin A.
        • Awad M.
        • Kulidjian A.A.
        Osteosarcoma: a comprehensive review.
        SICOT J. 2018; 4: 12
        • Beck J.
        • Ren L.
        • Huang S.
        • Berger E.
        • Bardales K.
        • Mannheimer J.
        • Mazcko C.
        • LeBlanc A.
        Canine and murine models of osteosarcoma.
        Vet Pathol. 2022; 59: 399-414
        • Maxie G.G.
        Jubb, Kennedy & Palmer's Pathology of Domestic Animals. Vol 2. Elsevier Health Sciences, 2015
        • Meuten D.J.
        Tumors in Domestic Animals.
        John Wiley & Sons, 2020
        • Gorlick R.
        • Meyers P.A.
        Osteosarcoma necrosis following chemotherapy: innate biology versus treatment-specific.
        J Pediatr Hematol Oncol. 2003; 25: 840-841
        • LeBlanc A.K.
        • Breen M.
        • Choyke P.
        • Dewhirst M.
        • Fan T.M.
        • Gustafson D.L.
        • Helman L.J.
        • Kastan M.B.
        • Knapp D.W.
        • Levin W.J.
        • London C.
        • Mason N.
        • Mazcko C.
        • Olson P.N.
        • Page R.
        • Teicher B.A.
        • Thamm D.H.
        • Trent J.M.
        • Vail D.M.
        • Khanna C.
        Perspectives from man's best friend: National Academy of Medicine's Workshop on Comparative Oncology.
        Sci Transl Med. 2016; 8: 324ps5
        • LeBlanc A.K.
        • Mazcko C.N.
        Improving human cancer therapy through the evaluation of pet dogs.
        Nat Rev Cancer. 2020; 20: 727-742
        • LeBlanc A.K.
        • Mazcko C.N.
        • Khanna C.
        Defining the value of a comparative approach to cancer drug development.
        Clin Cancer Res. 2016; 22: 2133-2138
        • Selmic L.E.
        • Burton J.H.
        • Thamm D.H.
        • Withrow S.J.
        • Lana S.E.
        Comparison of carboplatin and doxorubicin-based chemotherapy protocols in 470 dogs after amputation for treatment of appendicular osteosarcoma.
        J Vet Intern Med. 2014; 28: 554-563
        • Al-Khan A.A.
        • Nimmo J.S.
        • Day M.J.
        • Tayebi M.
        • Ryan S.D.
        • Kuntz C.A.
        • Simcock J.O.
        • Tarzi R.
        • Saad E.S.
        • Richardson S.J.
        • Danks J.A.
        Fibroblastic subtype has a favourable prognosis in appendicular osteosarcoma of dogs.
        J Comp Pathol. 2020; 176: 133-144
        • LeBlanc A.K.
        • Mazcko C.N.
        • Cherukuri A.
        • Berger E.P.
        • Kisseberth W.C.
        • Brown M.E.
        • et al.
        Adjuvant sirolimus does not improve outcome in pet dogs receiving standard-of-care therapy for appendicular osteosarcoma: a prospective, randomized trial of 324 dogs.
        Clin Cancer Res. 2021; 27: 3005-3016
        • Nagamine E.
        • Hirayama K.
        • Matsuda K.
        • Okamoto M.
        • Ohmachi T.
        • Kadosawa T.
        • Taniyama H.
        Diversity of histologic patterns and expression of cytoskeletal proteins in canine skeletal osteosarcoma.
        Vet Pathol. 2015; 52: 977-984
        • Skorupski K.A.
        • Uhl J.M.
        • Szivek A.
        • Allstadt Frazier S.D.
        • Rebhun R.B.
        • Rodriguez Jr., C.O.
        Carboplatin versus alternating carboplatin and doxorubicin for the adjuvant treatment of canine appendicular osteosarcoma: a randomized, phase III trial.
        Vet Comp Oncol. 2016; 14: 81-87
        • Bera K.
        • Schalper K.A.
        • Rimm D.L.
        • Velcheti V.
        • Madabhushi A.
        Artificial intelligence in digital pathology - new tools for diagnosis and precision oncology.
        Nat Rev Clin Oncol. 2019; 16: 703-715
        • Harmon S.A.
        • Sanford T.H.
        • Brown G.T.
        • Yang C.
        • Mehralivand S.
        • Jacob J.M.
        • Valera V.A.
        • Shih J.H.
        • Agarwal P.K.