From the Swim Across America Laboratory,* Departments of Surgery,¶ Medicine,
and Pathology,
Memorial Sloan-Kettering Cancer Center, New York, and Columbia Genome Center,
Columbia University, New York, New York
| Abstract |
|---|
It is possible to classify some STS by their recurrent chromosomal translocations or somatic mutation,6 such as the presence of SYT-SSX fusion transcript in synovial sarcoma,7,8 EWS-ATF1 in clear-cell sarcoma,9,10 TLS-CHOP in myxoid/round-cell liposarcoma11,12 and ASPL-TFE3 in alveolar soft-part sarcoma.13 Most of these translocations produce chimeric transcription factors, which presumably deregulate the expression of several target genes.14 In the case of gastrointestinal stromal tumors (GIST), a distinct somatic mutation has been described in KIT,15-17 which leads to ligand-independent constitutive activation of its encoded receptor tyrosine kinase. This in turn results in altered cell proliferation and tumorigenesis.
The group of tumors characterized by numerous, non-recurrent chromosomal alterations includes MFH, conventional fibrosarcoma, leiomyosarcoma, de-differentiated liposarcoma and pleomorphic liposarcoma. In particular, the diagnosis of MFH has long been controversial. Originally described in the 1960s as a fibrous xanthoma,18-20 MFH was considered a true histiocytic tumor displaying facultative fibroblastic properties. Subsequent ultrastructural evaluation found the predominant cell type to be in fact a fibroblast or one of its variants, leading to the conclusion that MFH should be reclassified as pleomorphic fibrosarcoma.21,22 Others consider MFH to be a final common pathway for certain types of STS, representing tumor progression or de-differentiation.23-25
The molecular classification of cancer has recently been prompted by the sequencing and annotation of the human genome and technical advancement in gene transcription profiling.26-28 These profound scientific advancements have permitted high-throughput analysis and molecular correlation between tumors that provides insight into molecular pathways and mechanisms. The support vector machine (SVM) model has, in particular, been shown to be useful in classification tasks using gene expression data.29-31
In this study, we investigated the gene expression profiles of 51 high-grade STS, representing nine different histological subtypes. We focused on high-grade lesions, as these often pose a diagnostic challenge and would potentially benefit from molecular-based classification and a diagnostic algorithm. Using hierarchical cluster analysis, multidimensional scaling and SVM analysis, we determined the molecular relationship of STS and compared this to the current histological classification, with the aim of developing a novel biology-based model of STS.
| Materials and Methods |
|---|
Tumor specimens, obtained from 51 patients undergoing surgery at Memorial Sloan-Kettering Cancer Center, included MFH (n = 11), conventional fibrosarcoma (n = 8), leiomyosarcoma (n = 6), round-cell liposarcoma (n = 4), pleomorphic liposarcoma (n = 3), de-differentiated liposarcoma (n = 5), clear-cell sarcoma (n = 4), synovial sarcoma (n = 5), and GIST (n = 5). Specimens were collected under an IRB-approved tissue procurement protocol. Representative tumor tissue was embedded in OCT compound and frozen as tissue blocks using liquid nitrogen. Tumor specimens were selected for analysis according to validation of histological diagnosis. Round-cell liposarcoma, de-differentiated liposarcoma and pleomorphic liposarcoma were dissected from microscopically identified regions within the frozen tumor block, to ensure selection of high-grade areas only. Prior therapy was not considered an exclusion criterion, as we showed in a pilot study that tumors did not cluster differently by prior treatment. For additional details on genotype, subtype, prior therapy, site and stage, see Supplemental Data at http://www.amjpathol.org, or http://www.mskcc.org/genomic.sts.32 Tumor specimens have been used in a similar study in the classification of clear-cell sarcoma.33
Histological and Molecular Diagnosis
In all cases histological slides were available from the primary resection specimen and were reviewed independently by two soft-tissue pathologists (C.R.A., J.M.W.). Histological diagnosis was supported in every case by an appropriate immunohistochemical panel and/or molecular genetic evaluation. RT-PCR using total RNA extracted from frozen tissue was performed for detection of specific fusion transcripts such as SYT-SSX, TLS-CHOP, and EWS-ATF1, used in the molecular diagnosis of synovial sarcoma,34 myxoid/round-cell liposarcoma,12 and clear-cell sarcoma,10 respectively. All GIST tumors were tested for the presence of KIT mutations, using PCR amplification of genomic DNA, followed by direct sequencing.35 These studies were performed in the laboratories of the Division of Molecular Pathology, Memorial Sloan-Kettering Cancer Center.
RNA Isolation and Gene Expression Profiling
Cryopreserved tumor sections were homogenized under liquid nitrogen by mortar and pestle. Total RNA was extracted in Trizol reagent and purified using the Qiagen RNeasy kit. RNA quality was assessed by ethidium bromide agarose gel electrophoresis. cDNA was then synthesized in the presence of oligo(dT)24-T7 from Genset Corp. (La Jolla, CA). cRNA was prepared using biotinylated UTP and CTP and hybridized to HG U95A oligonucleotide arrays (Affymetrix Inc., Santa Clara, CA). Fluorescence was measured by laser confocal scanner (Agilent, Palo Alto, CA) and converted to signal intensity by means of Affymetrix Microarray Suite v4.0 software. For complete expression data, see Supplemental Data at http://www.amjpathol.org, or http://www.mskcc.org/genomic.sts.32
Hierarchical Cluster Analysis
Hierarchical cluster analysis was performed using XCluster (http://genome-www.stanford.edu/sherlock/cluster.html), using a centered Pearson correlation coefficient distance metric and average linkage to measure cluster distances during partitioning.36 A nonparametric bootstrap was used to estimate confidence of the cluster structure.37 For each bootstrap sample, the clustering obtained was compared to the clustering obtained with the original data set. Two clusters (branches of the hierarchy) were considered identical if they contained the same members.
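The clustering and bootstrap comparison described above can be illustrated with a minimal Python sketch. It is not the XCluster implementation. It assumes that the distance is one minus the centered Pearson correlation and that each bootstrap sample is drawn by resampling genes (columns) with replacement, a detail the text does not specify. The function names are hypothetical.

```python
import math
import random

def pearson_distance(x, y):
    """1 - centered Pearson correlation between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 1.0  # assumption: treat a flat profile as uncorrelated
    return 1.0 - sxy / math.sqrt(sxx * syy)

def average_linkage(profiles):
    """Agglomerate samples; return every branch as a frozenset of sample indices."""
    n = len(profiles)
    dist = {(i, j): pearson_distance(profiles[i], profiles[j])
            for i in range(n) for j in range(i + 1, n)}

    def cluster_dist(a, b):
        # Average linkage: mean pairwise distance between members of a and b.
        total = sum(dist[(min(i, j), max(i, j))] for i in a for j in b)
        return total / (len(a) * len(b))

    clusters = [frozenset([i]) for i in range(n)]
    branches = set()
    while len(clusters) > 1:
        _, ia, ib = min((cluster_dist(a, b), ia, ib)
                        for ia, a in enumerate(clusters)
                        for ib, b in enumerate(clusters) if ia < ib)
        merged = clusters[ia] | clusters[ib]
        branches.add(merged)
        clusters = [c for k, c in enumerate(clusters) if k not in (ia, ib)]
        clusters.append(merged)
    return branches

def bootstrap_support(profiles, n_boot=100, seed=0):
    """Fraction of bootstrap clusterings in which each original branch reappears."""
    rng = random.Random(seed)
    reference = average_linkage(profiles)
    n_genes = len(profiles[0])
    counts = {branch: 0 for branch in reference}
    for _ in range(n_boot):
        cols = [rng.randrange(n_genes) for _ in range(n_genes)]  # resample genes
        resampled = [[p[c] for c in cols] for p in profiles]
        for branch in average_linkage(resampled):
            if branch in counts:  # identical = same members
                counts[branch] += 1
    return {branch: counts[branch] / n_boot for branch in reference}
```

A branch with support near 1 reappears in almost every resampled data set; branches with low support correspond to the weakly related groups discussed in the Results.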
Multidimensional Scaling Analysis
As an alternative and independent way of visualizing the cluster structure of the data, a multidimensional scaling analysis was performed. To deal with both the large range and the negative values of the expression data, we took as the distance function d = (1 - r)/2, where r is the Spearman rank-order correlation coefficient. The multidimensional scaling was performed using S-PLUS,38 projecting the data into three dimensions.
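As a rough illustration of this distance, the following Python sketch computes the Spearman rank-order correlation r and maps it to (1 - r)/2, so that perfectly concordant profiles are at distance 0 and perfectly reversed profiles at distance 1. Because only ranks are used, large dynamic range and negative expression values do not distort the result. The helper names are hypothetical, ties are given average ranks (an assumption), and the S-PLUS scaling step itself is not reproduced.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_r(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a constant profile carries no rank information
    return sxy / math.sqrt(sxx * syy)

def spearman_distance(x, y):
    """Distance in [0, 1]: 0 for identical rank order, 1 for reversed order."""
    return 0.5 * (1.0 - spearman_r(x, y))

def distance_matrix(profiles):
    """Symmetric matrix of pairwise distances, suitable as input to an MDS routine."""
    return [[spearman_distance(p, q) for q in profiles] for p in profiles]
```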
Support Vector Machine Analysis
The ability of a machine-learning algorithm to correctly classify each tumor type was measured using SVM analysis with hold-one-out cross-validation.29,30 In brief, during the training phase the SVM takes as input a microarray data matrix, and labels each sample as either belonging to a given class (positive) or not (negative). The SVM treats each sample in the matrix as a point in a high-dimensional feature space, where the number of genes on the microarray determines the dimensionality of the space. The SVM learning algorithm then identifies a hyperplane in this space that best separates the positive and negative training examples. The trained SVM can then be used to make predictions about a test sample's membership in the class. This approach allows us to collect unbiased measurements of the ability of the SVM to classify each sample. We used a standard "hold-one-out" training/testing scheme, in which the SVM is trained separately on training sets made up of all but one of the samples, and then tested on the single "held out" sample. Because a classifier's performance can be hindered by the inclusion of irrelevant data, we used feature selection to identify genes that are most important for classification. The genes in the training data set were ranked in order of their proposed importance in distinguishing the positives from the negatives, as described in more detail in the next section, and the top N genes were taken for each trial. The value N was varied in 12 powers of 2, ranging from 4 to 8192. Thus, the SVM was run 51 times on each of 12 different numbers of features (genes), for each of the tumor classes. Each held-out test sample was counted as either a false positive, false negative, true positive, or true negative.
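The hold-one-out scheme can be sketched in Python as follows. This is an illustration under stated assumptions, not the software used in the study: the classifier is a toy linear soft-margin SVM without a bias term, trained by batch subgradient descent; the gene ranking is a placeholder based on the difference of class means; and all function names and parameter values (`lam`, `lr`, `epochs`) are hypothetical. The point it shows is that genes are ranked and selected inside each fold, using the training samples only.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Toy linear soft-margin SVM (no bias term), batch subgradient descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [lam * wj for wj in w]  # gradient of the L2 regularizer
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1.0:  # sample violates the margin: hinge-loss subgradient
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def predict(w, x):
    """Return +1 or -1 according to the side of the hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1

def rank_by_mean_difference(X, y):
    """Placeholder ranking: genes ordered by |mean(pos) - mean(neg)|, largest first."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == -1]
    score = [abs(sum(p[j] for p in pos) / len(pos) - sum(q[j] for q in neg) / len(neg))
             for j in range(len(X[0]))]
    return sorted(range(len(score)), key=lambda j: -score[j])

def hold_one_out(X, y, n_genes, rank_genes=rank_by_mean_difference):
    """Train on all samples but one, test on the held-out sample, tally outcomes."""
    tally = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for held in range(len(X)):
        train_X = [x for i, x in enumerate(X) if i != held]
        train_y = [t for i, t in enumerate(y) if i != held]
        genes = rank_genes(train_X, train_y)[:n_genes]  # training data only
        w = train_linear_svm([[x[g] for g in genes] for x in train_X], train_y)
        guess = predict(w, [X[held][g] for g in genes])
        if guess == 1:
            tally["TP" if y[held] == 1 else "FP"] += 1
        else:
            tally["FN" if y[held] == 1 else "TN"] += 1
    return tally

# The 12 feature counts described in the text: 4, 8, ..., 8192.
GENE_COUNTS = [2 ** k for k in range(2, 14)]
```

Running `hold_one_out` once per entry of `GENE_COUNTS` and per tumor class would reproduce the shape of the experiment: one tally of true and false positives and negatives for each class and each number of genes.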
Gene Ranking for Feature Selection
To select genes that were the most informative for the SVM, we tested a variety of methods including the Fisher score method30 and parametric and nonparametric statistics. Data reported here were derived from Student's t-test, because it yielded the best SVM performance overall. Each gene in each training data set was subjected to the following procedure. A standard Student's t-test was used to compare the expression in one tumor type to that in the remaining samples. The resulting P values were then used to rank the genes, and the desired number of genes was then selected for use. The corresponding data from the training set was used to train the SVM, and the same genes were used for the test data. It is important to note that the genes were selected solely on the basis of the training data. Finally, a t-test statistic as determined for all samples was used to provide an overall ranking of the genes in order of relevance for each tumor classification. This ranking was used to provide an overview of the most important genes for distinguishing the class.
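A minimal Python sketch of this ranking step is given below. It assumes the pooled-variance form of Student's t-test and a complete data matrix with no missing values. Under those assumptions every gene has the same degrees of freedom, so ordering genes by decreasing |t| gives the same order as ranking by increasing two-sided P value, and the P values need not be computed explicitly. The function names are hypothetical.

```python
import math

def student_t(a, b):
    """Pooled-variance (Student's) two-sample t statistic for groups a and b."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ssa = sum((v - ma) ** 2 for v in a)
    ssb = sum((v - mb) ** 2 for v in b)
    pooled_var = (ssa + ssb) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    if se == 0:
        # Assumption: zero variance in both groups; equal means give t = 0.
        return 0.0 if ma == mb else math.copysign(math.inf, ma - mb)
    return (ma - mb) / se

def rank_genes(expr, labels):
    """Rank gene indices by |t|, comparing one tumor type to all other samples.

    expr   : list of samples, each a list of expression values (one per gene)
    labels : list of booleans, True for samples of the tumor type of interest
    """
    n_genes = len(expr[0])
    t_values = []
    for g in range(n_genes):
        inside = [row[g] for row, lab in zip(expr, labels) if lab]
        outside = [row[g] for row, lab in zip(expr, labels) if not lab]
        t_values.append(student_t(inside, outside))
    return sorted(range(n_genes), key=lambda g: -abs(t_values[g]))

def top_genes(expr, labels, n):
    """Select the n highest-ranked genes from the training data."""
    return rank_genes(expr, labels)[:n]
```

In a cross-validation setting, `rank_genes` would be called on the training samples of each fold only, in keeping with the requirement that genes be selected solely on the basis of the training data.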
| Results |
|---|
We determined the gene expression profile of 51 adult soft tissue sarcomas using 12,559 oligonucleotide probe sets on the U95A GeneChip from Affymetrix. Tumor specimens included nine different histological subtypes, which taken together cover more than 75% of STS cases diagnosed in the United States.
We explored three approaches to data analysis. In the first, we used unsupervised cluster analysis to identify groups of tumors related by similarity in overall gene expression profile using all genes represented on the U95A GeneChip (Figure 1). We identified two principal clusters that discriminate specimens by karyotypic and morphological features. STS characterized by non-recurrent genetic aberrations and karyotypic complexity show poor overall similarity in both gene expression profile and bootstrap analyses. In contrast, STS characterized by single recurrent genetic events clustered distinctly in strong groups. This was shown for all cases of GIST, synovial sarcoma, clear-cell sarcoma and round-cell liposarcoma. Similar groupings were observed on multidimensional scaling (MDS) analysis, again using all genes represented on the U95A GeneChip (Figure 2).
Although the pleomorphic STS were not strongly related overall by gene expression profile, predominant groups were observed on hierarchical cluster analysis in concordance with histological classification. In particular, 5 of 6 leiomyosarcoma specimens (S20-S24) co-clustered with a de-differentiated liposarcoma (S29). This de-differentiated liposarcoma was noted previously to contain divergent leiomyosarcomatous differentiation on routine histological and immunohistochemical assessment. These 6 specimens were designated as "genomic leiomyosarcoma group #1" for further discussion. Similarly, 9 of 11 MFH specimens (S36-S40, S43-S46), including 5 of 6 lesions with myxoid features, clustered together with a single fibrosarcoma (S5). This was designated as "genomic MFH group" for further discussion. The remaining specimens appeared heterogeneous.
Support Vector Machine Analysis
Our second approach incorporated the use of SVM analysis to explore the outcome of genomic diagnosis in both previously-defined histological subtypes and potential novel genomic groups. Specimens were divided into two groups to establish training classes for each diagnostic category. The positive class contained all specimens that belong to a specific category. The negative class contained the remaining specimens. We performed hold-one-out cross-validation, in which one specimen was hidden from the SVM during training and was subsequently given to the "machine" as a test specimen. This was performed over a range of gene numbers to identify the range in which the "machine" operates optimally in diagnosing an unknown specimen. The outcome of the analysis was compared to the predicted subtype of the test specimen and indicated as true/false positive or true/false negative.
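For reference, the sensitivity and specificity of one diagnostic category follow directly from the four outcome counts. The short Python sketch below shows the arithmetic; the function name is hypothetical and the counts in the example are invented for illustration, not taken from the study.

```python
def sensitivity_specificity(tally):
    """Compute (sensitivity, specificity) from true/false positive/negative counts."""
    tp, fp, tn, fn = (tally[k] for k in ("TP", "FP", "TN", "FN"))
    # Sensitivity: fraction of specimens in the class that were recognized.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: fraction of specimens outside the class that were rejected.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity

# Hypothetical tally for one class over 51 held-out tests (illustrative only).
example_tally = {"TP": 5, "FP": 1, "TN": 45, "FN": 0}
example_result = sensitivity_specificity(example_tally)
```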
SVM analysis achieved both high sensitivity and high specificity in GIST, synovial sarcoma, round-cell liposarcoma, and clear-cell sarcoma. In the case of MFH, leiomyosarcoma, and de-differentiated liposarcoma, genomic reclassification of these tumors by cluster analysis improved SVM performance (Figure 3). Interestingly, de-differentiated liposarcomas were diagnosed accurately using as few as four genes, but only up to 64 genes. This limited range of sensitivity is consistent with a genomic-based relationship over few genes that is sufficient for SVM diagnosis yet insufficient to generate clusters using global gene expression. In the case of leiomyosarcoma, the designated "genomic leiomyosarcoma group #1" behaved poorly in SVM analysis, as observed by consistent misclassifications as false positive and false negative. We explored this further by hypothesizing an alternative "genomic leiomyosarcoma group #2" which included the outlier leiomyosarcoma specimen S26. This hypothetical cluster gained support by demonstrating consistently perfect SVM performance over a large range in the number of genes used. These results, taken together, demonstrate the efficacy of a diagnostic algorithm in validating and, in particular, exploring the outcome of cluster analysis techniques.
Our third approach to data analysis was the identification of genes consistent with each tumor subtype for the purpose of useful biological discovery (Figure 4). In the case of MFH, leiomyosarcoma, and de-differentiated liposarcoma, genomic classification was used. This was performed using Student's t-test analysis and cross-referencing the top scoring 500 genes against both the published literature and the gene ontology consortium database (http://www.geneontology.org/) using NetAffx (http://www.affymetrix.com). We further limited this analysis to the top 50 genes for any particular STS subtype. We identified the known genetic markers for distinct subtypes of STS, including KIT (GIST), SYT-SSX (synovial sarcoma), PPARγ (round-cell liposarcoma) and MITF (clear-cell sarcoma). In addition, we discovered several genes that are implicated in diverse biological processes, pathways, and states of differentiation.
|
40
in 5 of 5 specimens and the KIT ligand, stem cell factor (SCF), in 2 of 5 specimens (S15, S17). This finding was not related to any particular mutation in KIT (Table 1)
|
and MYC oncogene. Clear-cell sarcomas demonstrated several genes associated with their melanocytic lineage,33 including SOX10, gp100, and MITF. De-differentiated liposarcomas were characterized by genes located on 12q, including CDK4 and MDM2. Round-cell liposarcomas were characterized by lipid metabolism and adipogenic profiles and included several homeobox genes. Leiomyosarcomas were characterized by genes implicated in the smooth-muscle phenotype. For complete gene list data, see Supplemental Data at http://www.amjpathol.org, or http://www.mskcc.org/genomic.sts.32
| Discussion |
|---|
Data from this analysis demonstrate that STS characterized by specific translocations display remarkably homogeneous and distinct global gene expression profiles, as evident in the case of synovial sarcomas, round-cell liposarcomas and clear-cell sarcomas. This phenomenon was similarly observed in GISTs, characterized by recurrent genetic mutations in KIT. The observation of distinct gene expression profiles in these tumors is striking, in particular their consistent ability to cluster using different algorithms. This finding in GIST is consistent with a previous study that showed 13 GISTs to display a distinct gene expression profile relative to 6 spindle-cell sarcomas.41 Furthermore, the GISTs separated from leiomyosarcomas, including intraabdominal tumors, in support of their different histogenesis. Our findings are supported in a recent study by Nielsen et al.42 Using cDNA microarray technology to profile 41 soft tissue tumors, their study identified GIST, synovial sarcoma and a subset of leiomyosarcoma as distinct groups on hierarchical cluster analysis.
Synovial sarcomas were furthermore shown to be distinct subtypes of STS in recent studies by Allander et al43 and Nagayama et al.44 Interestingly, the latter study suggests synovial sarcoma to be related to MPNST. We identified close proximity of several fibrosarcoma specimens to synovial sarcoma. These three tumor types are often indistinguishable on routine light microscopy and may indeed represent a common class of primitive mesenchymal tumors.
The present study also describes the use of a supervised learning algorithm, SVM analysis, in the diagnosis of STS. The diagnosis of tumors characterized by specific genetic events was highly accurate using as few as 4 to 32 genes. Errors were predominantly confined to reduced specificity at low gene numbers and an eventual drop-off in sensitivity between 1000 and 8000 genes. These findings suggest that, aside from pathognomonic genetic changes that have been reported for these tumors, collective information from a large and diverse set of genes may be considered in their diagnosis and underlying biology.
Data from this report also reveal that STS characterized by pleomorphic phenotypes and complex karyotypes display relatively inconsistent gene expression profiles, in keeping with their cytogenetic heterogeneity. However, within this group of pleomorphic STS, leiomyosarcoma and a subset of MFH were distinguished by their ability to cluster. This particular finding prompted us to explore the possibility of diagnosing these tumors using a genomic platform. SVM analysis attained perfect performance over a limited range in gene number when diagnosing genomic MFH compared to histological MFH. This observation supports our claim that the genomic group MFH is distinct and amenable to objective diagnosis. Because MFH is diagnosed at different rates by different pathologists, it remains unclear, nationally or worldwide, whether specific drugs are more effective for this subtype, beyond the use of doxorubicin, ifosfamide, DTIC, or combinations thereof. The identification of a subset of MFH with a particular characteristic expression profile could potentially facilitate an objective diagnosis of this tumor type and assist in subsequent therapeutic studies.
Unlike genomic MFH, improved SVM performance with specimens selected by genomic classification was not initially shown for leiomyosarcoma. The above findings were intriguing for two reasons. First, they provided further support that the ability to diagnose the genomic MFH group by SVM analysis was not only a consequence of their ability to cluster, but in fact demonstrated that the other tumors in this study were sufficiently different so as not to be misdiagnosed as MFH by SVM analysis. Second, the observation of a consistent misclassification of genomically defined leiomyosarcoma prompted us to repeat this SVM analysis including the specimen that was excluded on cluster analysis. This removed the false positive occurrence in SVM analysis and also improved overall performance.
The observation that SVM performance improved when diagnosing genomic groups versus histological groups was not surprising, as these tumors were selected largely on the basis of genomic correlation. However, this finding was significant and demonstrated an important and logical extension of genomic profiling. It illustrated that genomic correlation between tumors may be exploited to recognize novel classifications, against which meaningful biological/clinical correlates may be considered. We concluded that the genomic classification by cluster analysis of adult STS and SVM support is feasible and presents a user-independent reproducible mechanism by which to establish biology-based classification of soft tissue sarcoma.
Further inspection of the gene lists that discriminate subtypes of STS was particularly informative for biological discovery. In particular we identified features consistent with autocrine growth loops in a subset of GIST, involving SCF and KIT, and in synovial sarcoma, involving WNT5a and components of the downstream signaling pathway, including FRIZZLED-1.
Mutations in KIT occur somatically in many sporadic GISTs. These mutations activate the tyrosine kinase activity of KIT and induce constitutive signaling. Inhibition of the tyrosine kinase activity of KIT by imatinib mesylate induces tumor regression in GISTs.45 The finding of SCF, also known as KIT ligand, in a subset of GISTs is a novel and noteworthy finding that may have implications in understanding potential autocrine growth effects in GIST involving the KIT pathway.
The recent study by Nagayama et al44 similarly identified several genes related to the WNT signaling pathway in synovial sarcoma, including WNT inhibitory factor 1 and Frizzled homolog 10. The finding of PRAME as a discriminating gene in several independent studies42-44 in synovial sarcoma suggests a particularly robust association of the tumor antigen and this STS subtype.
Results of this analysis point to current treatment strategies for patients with STS, including imatinib (STI-571) for GIST and PPARγ agonists for myxoid/round-cell liposarcomas, and suggest additional therapeutic considerations. These include blockade of PI-3 kinase with wortmannin or similar compounds in GIST, and the use of retinoid agonists/antagonists or blockade of WNT signaling in synovial sarcoma.
Whereas Allander et al43 identified a strong association between ERBB2 expression and synovial sarcoma, we did not identify a similar association. This discrepant finding is likely based on tumor subtype selection as we included only monophasic synovial sarcoma in our study and their group identified ERBB2 to be predominantly expressed in biphasic synovial sarcoma.
We have approached the challenge of sarcoma classification using a combination of clustering techniques to propose novel groups and supervised diagnostic techniques to test the proposed grouping. This combined approach allows us to consider the distinction between groups of tumors in terms of diagnostic sensitivity and specificity rather than by similarity in gene expression profile alone. The classification of STS will continue to evolve as additional subtypes of this disease are introduced into the molecular classification scheme. More detailed analysis of the gene expression profiles of each of the more than 50 subtypes of STS will clarify the biological differences within STS and may suggest therapies specific for each subclass of STS, if not therapy specific for an individual patient's tumor. The present study proposes multiple molecular pathways that may become potential targets for therapeutic intervention, and represents one step toward a comprehensive molecular understanding of this rare and heterogeneous group of diseases.
| Acknowledgements |
|---|
| Footnotes |
|---|
Supported by the National Institutes of Health (grant CA-47179 to M.F.B., A.N.H., and C.C.C.), the Etta Weinheim Memorial Fund, and the National Science Foundation (grant IIS-0093302 to W.S.N.).
Accepted for publication May 5, 2003.
| References |
|---|