Published online before print January 10, 2008







From the Laboratory of Receptor Biology and Gene Expression, the Section on Genomic Variation, Pediatric Oncology Branch, and the Genetics Branch, National Cancer Institute, Bethesda, Maryland; the Computer Science Department, University of Massachusetts, Lowell, Massachusetts; the Department of Genetics, Institute for Cancer Research, Rikshospitalet-Radiumhospitalet Medical Center, Oslo, Norway; and the Faculty of Medicine, University of Oslo, Oslo, Norway
Abstract
… NF-κB in a novel feed-forward, self-amplifying, autoregulatory module regulated by the ERBB family of growth factor receptors. The existence of this pathway was verified in vivo by chromatin immunoprecipitation and shown to be deregulated in breast cancer cells overexpressing ERBB2. This analysis indicates that approaches of this type can provide unique insights into the differential regulatory molecular programs associated with breast cancer and will aid in identifying specific transcriptional networks and pathways as potential targets for tumor subtype-specific therapeutic intervention.
Recently, the application of hierarchical clustering to distinguish molecular phenotypes has identified five different molecular subtypes of breast cancer based on their differential expression of 534 genes.2,3
Three of these classes are characterized as having low to absent expression of estrogen receptor (ER) and other specific transcription factors when compared to the other subtypes.4
These three are referred to as 1) the basal-like subtype, characterized by high expression of keratins 5 and 17, laminin, and fatty acid binding proteins, genes that are often more highly expressed in the basal cells of normal breast acini; 2) the ERBB2+ subtype, characterized by higher expression of the epidermal growth factor (EGF) receptor family member ERBB2 and other genes associated with amplification of the ERBB2 locus at 17q12, which includes the growth factor receptor adaptor protein GRB7; and 3) the normal-like subtype, characterized by expression of a large number of genes normally expressed in adipose tissue and other tissues of nonepithelial origin and by higher expression of genes associated with basal rather than luminal epithelial cells. The remaining two molecular phenotypes are the breast tumor subtypes referred to as luminal A and luminal B. These two groups characteristically have the highest expression of ER. In addition, the luminal A subtype is characterized by higher expression of the transcription factors GATA3 and hepatocyte nuclear factor 3α, the estrogen-inducible secreted factor trefoil factor 3 (TFF3), and the estrogen-induced solute carrier SLC39A6/LIV-1. Luminal B is characterized by lower expression of luminal-type genes than luminal A. These markers have been shown repeatedly to segregate breast carcinomas from independent data sets and samples reliably into these five molecular subtypes.2,3
Recent attempts to identify the mechanism underlying the coordinated gene expression patterns observed in various molecular signatures of cancer are based on the rational assumption that co-expressed genes share similar properties of gene regulation. A large contribution to this coordinated expression occurs at the level of transcription; therefore, it is reasonable to presume that similarly regulated genes should have a higher probability of being regulated by similar transcription factors and transcriptional pathways.8,9
The major targets of these transcription factors and pathways are the noncoding sequences that reside primarily in the regulatory regions upstream of the start of transcription of target genes. An assembly of transcription factor binding sites (TFBSs) found to be nonrandomly shared by several members of a gene list associated with a molecular phenotype can therefore be readily considered to represent a regulatory signature of that phenotype. Identification of such regulatory signatures will help define the transcriptional pathways and molecular signaling events that are integrated to mediate the coordinated expression of multiple genes.8,9 The transcriptional networks elucidated by such an approach will provide important insight into the active molecular events responsible for the evolution of specific tumor subtypes of cancer and suggest new functional molecular targets for therapeutic intervention.
In this study we examined the promoter composition of the genetic signatures inferred from a previously published study that defined five different molecular subtypes of breast cancer that correlated with specific clinical outcomes.2,3
Using a position weight matrix scoring system, we determined the relative enrichment of each of these tumor subtype-specific signatures for 409 different TFBS matrices, using a reference background model containing the proximal promoter regions of 15,318 RefSeq genes. Comparison of the TFBS significance scores by hierarchical clustering and principal component analysis (PCA) identified groups and clusters of enriched TFBSs from which sets of transcriptional regulatory networks were inferred. One novel inference derived from this approach, subsequently validated empirically, was a positive autoregulatory loop through which ERBB2 uses nuclear factor (NF)-κB pathways to enforce its own expression. This approach represents a novel and powerful method through which the regulatory circuitry underlying breast cancer subtypes associated with specific patient outcomes can be distinguished and dissected in the context of functionally relevant molecular targets and pathways.
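To make position weight matrix scoring concrete, the sketch below scans a sequence with a toy four-position matrix and rescales each raw window score between the worst and best achievable scores, keeping windows at or above 0.75. This is only loosely modeled on MatInspector's matrix similarity: MatInspector additionally weights positions by information content and applies a separate core-similarity filter, and both steps are omitted here. The matrix `TOY_PWM`, the sequence, and the function names are hypothetical, and only the forward strand is scanned.

```python
# Hypothetical toy matrix: one dict per motif position, mapping base -> weight.
TOY_PWM = [
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.6, "C": 0.1, "G": 0.2, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]


def site_score(pwm, site):
    """Sum of per-position weights for one candidate site."""
    return sum(col[base] for col, base in zip(pwm, site))


def normalized_similarity(pwm, site):
    """Rescale a raw score to [0, 1] using the best and worst possible scores."""
    best = sum(max(col.values()) for col in pwm)
    worst = sum(min(col.values()) for col in pwm)
    return (site_score(pwm, site) - worst) / (best - worst)


def scan(pwm, sequence, threshold=0.75):
    """Return (offset, site, similarity) for each forward-strand window >= threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(sequence) - width + 1):
        site = sequence[i:i + width].upper()
        if any(base not in "ACGT" for base in site):
            continue  # skip windows containing N or other ambiguity codes
        similarity = normalized_similarity(pwm, site)
        if similarity >= threshold:
            hits.append((i, site, round(similarity, 3)))
    return hits
```

For example, `scan(TOY_PWM, "TTGGATCCGGGT")` reports a perfect match at offset 2 and a weaker but above-threshold match at offset 8; lowering `threshold` admits more degenerate sites, which is the trade-off the Discussion raises.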
Materials and Methods
Hierarchical clustering data2 were used to construct an initial list of genes contained within the boundaries of the positive peaks of expression delimited by the gene order and clusters defined in Sorlie and colleagues2 (Figure 1, A and B). To do this, the median expression of each of the 534 genes was determined within each of the five sample clusters, producing five median expression profiles, one per cluster, across the 534 genes (Figure 1B). The genes within the major contiguous positive peaks for each cluster were then broadly selected (color-coded peaks in Figure 1B). This resulted in an initial overlapping list of genes containing 136 genes from the basal subtype, 15 genes from the ERBB2+ subtype, 243 genes from the luminal A subtype, 217 genes from the luminal B subtype, and 108 genes from the normal-like subtype. These gene lists were then refined to remove overlapping and other noninformative genes by ranking genes as subtype discriminators using one-way analysis of variance and selecting genes with P values less than 0.01 (before correction for multiple comparisons) (see Supplementary Table S1 at http://ajp.amjpathol.org). The result was a final list of 201 genes: 95 unique genes for the luminal A subtype, 21 for the luminal B subtype, 66 for the basal subtype, 6 for the ERBB2+ subtype, and 13 for the normal-like subtype (see Supplementary Table S1 at http://ajp.amjpathol.org). Promoter sequences from three 600-bp promoter regions (proximal: –500 bp to +100 bp; upstream: –1100 bp to –500 bp; and downstream: +100 bp to +700 bp, all relative to the transcription start site) were retrieved for each gene in the five tumor subtype-specific gene lists using the ProSpector web-based promoter annotation tool.10
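The median-profile step can be sketched as follows, assuming an expression table stored as `{gene: {sample: log ratio}}` and a `{sample: subtype}` map (both hypothetical structures, with invented toy values). Each gene receives one median per subtype and is provisionally assigned to the subtype with its highest median, provided that median is positive. This approximates, but does not reproduce, the selection of contiguous peaks by eye from Figure 1B.

```python
from statistics import median


def subtype_medians(expression, sample_subtype):
    """Return {gene: {subtype: median expression over that subtype's samples}}."""
    profiles = {}
    for gene, values in expression.items():
        groups = {}
        for sample, value in values.items():
            groups.setdefault(sample_subtype[sample], []).append(value)
        profiles[gene] = {subtype: median(v) for subtype, v in groups.items()}
    return profiles


def assign_to_peak(profiles):
    """Assign each gene to the subtype with its highest median, keeping positive peaks only."""
    assignment = {}
    for gene, medians in profiles.items():
        best = max(medians, key=medians.get)
        if medians[best] > 0:
            assignment[gene] = best
    return assignment


# Hypothetical toy data (log expression ratios), not values from the study.
toy_expression = {
    "KRT5": {"s1": 2.0, "s2": 1.5, "s3": -1.0, "s4": -0.5},
    "ESR1": {"s1": -1.2, "s2": -0.8, "s3": 1.9, "s4": 2.3},
    "ACTB": {"s1": -0.1, "s2": -0.2, "s3": -0.1, "s4": -0.3},
}
toy_subtypes = {"s1": "basal-like", "s2": "basal-like", "s3": "luminal A", "s4": "luminal A"}
```

Here `assign_to_peak(subtype_medians(toy_expression, toy_subtypes))` places KRT5 with basal-like and ESR1 with luminal A, and drops ACTB because none of its subtype medians is positive.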
Statistical Analysis
One-way analysis of variance, hierarchical clustering, and PCA were performed, and intensity plots were generated, using Partek Pro 5.1. Principal component (PC) loading correlation values were also calculated with Partek Pro 5.1. PCA biplot analysis was performed as previously described.11
For this analysis, TFBS matrices whose absolute PC loading (correlation) did not reach 0.75 in at least one of the first four PCs were removed, leaving 208 matrices. This list was further filtered for P ≤ 0.05 in at least one of the five subtypes, leaving 44 matrices for biplot analysis. Randomization was performed by drawing 40,000 random gene lists of each size (6, 13, 21, 66, and 95 genes) from the reference background list of 15,318 RefSeq genes and determining the frequency of each of the 409 matrices in each of the 40,000 random lists for each of the five list sizes using MatInspector. Perl scripts were used to create the random gene lists, to count matches for each of the 409 matrices in each random list, and to calculate P values for each matrix in each random list.
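The randomization can be sketched as an empirical test. The original used Perl scripts and MatInspector output; this Python version assumes a precomputed `{gene: number of matches for one matrix}` table (hypothetical) and estimates a P value as the fraction of random lists of the same size whose total match count is at least the observed count, with the common +1 correction. The toy background and the reduced iteration counts are for illustration only.

```python
import random


def empirical_p(match_counts, gene_list, n_iter=40_000, seed=0):
    """Empirical P value for the total matches in gene_list versus random
    gene lists of the same size drawn from the background."""
    rng = random.Random(seed)
    background = list(match_counts)  # all background gene identifiers
    observed = sum(match_counts[g] for g in gene_list)
    k = len(gene_list)
    at_least_as_extreme = 0
    for _ in range(n_iter):
        random_list = rng.sample(background, k)
        if sum(match_counts[g] for g in random_list) >= observed:
            at_least_as_extreme += 1
    # +1 in numerator and denominator counts the observed list itself
    # and keeps the estimate from being exactly zero.
    return (at_least_as_extreme + 1) / (n_iter + 1)


# Hypothetical toy background: 1,000 genes, the first 10 carrying 3 matches each.
toy_counts = {f"g{i}": (3 if i < 10 else 0) for i in range(1000)}
enriched_list = [f"g{i}" for i in range(6)]
unenriched_list = [f"g{i}" for i in range(500, 506)]
```

Repeating this for each of the 409 matrices and each list size (6, 13, 21, 66, 95) mirrors the structure of the procedure described above; with 40,000 iterations the smallest attainable P value is about 2.5 × 10⁻⁵.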
Pathway and Network Analysis
Gene lists were analyzed with the Ingenuity Pathway Analysis software (Ingenuity Systems, Redwood City, CA). Networks were constructed by overlaying the genes in each gene list, called Focus Genes, onto a global molecular network developed from information contained in the Ingenuity Pathways knowledge base. Networks of these focus genes were then algorithmically generated based on their connectivity. A network is a graphical representation of the molecular relationships between genes: genes are represented as nodes, and the biological relationship between two nodes is represented as an edge (line). All edges are supported by at least one reference from the literature, from a textbook, or from canonical information stored in the Ingenuity Pathways knowledge base. P values for the enrichment of canonical pathways were based on the hypergeometric distribution and calculated with the right-tailed Fisher's exact test for 2 × 2 contingency tables. The composite gene lists constructed using TFBSs included all known genes encoding factors cognate for the particular binding site.
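The right-tailed Fisher's exact test on a 2 × 2 table is the upper tail of a hypergeometric distribution. A minimal sketch, in our own notation (not Ingenuity's): N genes in the reference set, K of them in a pathway, n genes in the query list, and k genes in the overlap. How Ingenuity defines N internally is not stated here, so the choice of reference set is an assumption.

```python
from math import comb


def right_tailed_fisher(k, n, K, N):
    """One-sided (enrichment) Fisher's exact P value, P(X >= k), where X is the
    overlap between an n-gene list and a K-gene pathway drawn from N genes."""
    upper = min(n, K)
    denominator = comb(N, n)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible tables contribute nothing to the sum.
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / denominator
```

The result should agree with `scipy.stats.fisher_exact([[k, n - k], [K - k, N - K - n + k]], alternative="greater")` for the same table; the pure-Python form makes the hypergeometric sum explicit.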
Chromatin Immunoprecipitation
MCF-7 and MDA-MB-231 breast cancer cell lines (gifts from Dr. Alfred Johnson and Dr. Ira Pastan, respectively, National Cancer Institute, Bethesda, MD) were grown in Dulbecco's modified Eagle's medium with 10% fetal calf serum at 37°C and 5% CO2. The cells were serum-starved overnight before a 1-hour stimulation with 12 ng/ml EGF (Peprotech, Rocky Hill, NJ). Cells were then harvested and cross-linked with formalin, and chromatin immunoprecipitation was performed as previously described.12 The antibodies used were a 1:1 mixture containing 5 µg each of affinity-purified antibodies against p65 (sc-109; Santa Cruz Biotechnology, Santa Cruz, CA) and c-rel/Rel.13 Primers used for the ERBB2 promoter were 5'-TATTTTATCCTTGGTGTCGTGGCAGC-3' and 5'-CATTGGCTGGCACTGGTCCC-3'.
Results
Breast cancer subtype-specific signatures were derived from a previously published hierarchical clustering of 534 genes (represented by 552 clones; 500 of these correspond to a single unique UniGene cluster) expressed in 115 breast cancer samples (Figure 1A).2 As noted by Sorlie and colleagues,2 specific core genes within these 534 genes showed particular discriminatory power. For our promoter analysis we sought to expand this core list of subtype-specific genes. Therefore, lists of up-regulated genes, comprising the core genes grouped with the adjacent members within each cluster as delimited by the dendrograms for each subtype (Figure 1A), were constructed based on the peak median gene expression for the 534 genes (Figure 1B). Although focusing exclusively on up-regulated expression biases the analysis toward positive correlations between TFBS enrichment and gene expression, this simplified approach minimizes the noise that would result from attempting to distinguish and interpret both positive and negative discriminators during the promoter analysis. On this basis, a preliminary list of subtype-specific genes was constructed from the peak of median expression for each subtype. To reduce stochastic noise in these lists, we performed a one-way analysis of variance to remove noninformative genes (P > 0.01, before correction for multiple comparisons) that did not contribute significantly to the classification of the five tumor subtypes (see Supplementary Table S1 at http://ajp.amjpathol.org). This reduced the original 534 genes to a total of 201 RefSeq-mapped genes comprising five nonoverlapping subtype-specific genetic signatures (95 for luminal A, 21 for luminal B, 66 for basal, 6 for ERBB2+, and 13 for normal-like subtype; see Supplementary Table S1 at http://ajp.amjpathol.org). As shown in Figure 1C, hierarchical clustering of this refined list resembles the basic partitioning of the tumor subtypes shown in the original hierarchical clustering analysis.2,3 A similar approach was recently described by Muggerud and colleagues.14
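The ANOVA filter asks, for each gene, whether mean expression differs among the five subtype groups. A dependency-free sketch of the F statistic, assuming the values for one gene are supplied as one list per subtype (hypothetical input). Converting F to a P value requires the F distribution's survival function, for example `scipy.stats.f.sf(F, df_between, df_within)`, or the whole test can be run with `scipy.stats.f_oneway`; that step is left out here to keep the block self-contained.

```python
def one_way_anova_f(groups):
    """F statistic and degrees of freedom for a one-way ANOVA.

    groups: list of lists, one list of expression values per tumor subtype.
    """
    values = [x for g in groups for x in g]
    n_total, n_groups = len(values), len(groups)
    grand_mean = sum(values) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of subtype means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of samples around their own subtype mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = n_groups - 1, n_total - n_groups
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

A gene would then be kept when the P value derived from `(f_stat, df_between, df_within)` is at most 0.01, matching the uncorrected cutoff described above.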
Promoter Analysis Reveals Tumor Subtype-Specific Enrichment of TFBSs
To search for potential common regulatory pathways associated with the five subtype-specific genetic signatures, the promoter region of each gene was extracted from –500 bp upstream to +100 bp downstream of the transcription start site (see Materials and Methods). These promoter regions were then scored for matches to 409 different TFBS matrices using the MatInspector module of GEMS Launcher 4.1 (Genomatix). The observed frequency of each of the 409 matrices in the promoters of each subtype-specific signature was then compared with the expected frequency using a reference background model of 15,318 human RefSeq genes. A P value for significant enrichment was estimated under the null hypothesis that the observed frequency of a TFBS in the selected promoter sequences could be explained as a random fluctuation around the background frequency. A complemented Poisson distribution was used to model the random processes governing the frequency of TFBSs in the human genome. We verified this theoretical assumption by comparing the P values derived from the Poisson distribution with frequencies simulated in a random permutation test using 40,000 randomly selected gene lists of the same size as the original lists. This analysis showed good agreement between the analytic and simulated distributions for each of the gene signatures (see Materials and Methods and Supplementary Figure S1 at http://ajp.amjpathol.org). The P values for each of the 409 matrices in each of the tumor subgroup gene lists were then –log2 transformed and analyzed by hierarchical clustering (Figure 2). Clustering of the matrices according to tumor subtype reveals multiple distinct groups of TFBSs that are significantly and selectively enriched within the five subtypes (Figure 2, Table 1). These patterns of TFBS enrichment appear highly position-dependent: they are not recapitulated in the contiguous 600-bp regions downstream or upstream of the proximal –500 to +100 regions characterized in this study (see Supplementary Figure S3 at http://ajp.amjpathol.org).
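The complemented Poisson test can be written out directly. A minimal sketch, assuming the expected count λ for a gene list is the background match rate per promoter multiplied by the list size (our simplification; the exact Genomatix background model is not described here). It returns P(X ≥ observed) = 1 − P(X ≤ observed − 1) and the –log2 score used for clustering.

```python
import math


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected), computed as 1 - P(X <= observed - 1)."""
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)  # P(X = 0)
    cdf = term
    for i in range(1, observed):
        term *= expected / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def enrichment_score(observed, background_matches, background_genes, list_size):
    """-log2 P for one matrix in one gene list, given background match totals."""
    expected = background_matches / background_genes * list_size
    p = poisson_upper_tail(observed, expected)
    return -math.log2(p) if p > 0 else float("inf")
```

The subtraction `1.0 - cdf` loses precision for very small P values; `scipy.stats.poisson.sf(observed - 1, expected)` computes the same tail more accurately.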
… 0.05) bind factors that participate in well-known transcriptional pathways. For example, binding sites for nuclear factor-κB (NF-κB) (P ≤ 0.042), E2F factors (P ≤ 0.01), EGR1 (early growth response protein 1) (P ≤ 0.01), and SMAD (P ≤ 0.01) are enriched in the ERBB2+ group, indicating that signaling through NF-κB-, E2F-, EGR-, and transforming growth factor (TGF)-β-regulated pathways is likely to play functional roles in the biology of tumors classified by this molecular signature.15-17
The two groups of breast cancer that showed the greatest overlap in promoter composition were the luminal B and basal-like subtypes. A common feature of these groups is the relatively high GC content of many of the consensus sequences representing the matrices enriched in both groups, even though the average GC content of the promoter regions of these groups is not substantially different from that of the other subtypes (Table 1).
Interestingly, the promoter signatures derived from this analysis show relationships among the tumor subtypes that differ from those implied by the hierarchical clustering of the original gene expression data (Figure 1A). Dendrograms of the gene expression signature suggest a close relationship between the basal-like and ERBB2+ phenotypes, whereas clustering of the regulatory signatures of these gene groups suggests a closer relationship between the luminal B and basal-like molecular phenotypes (Figure 2G). Although this may reflect differences in the clustering parameters (note also that the rederived list of 201 genes in Figure 1C shows slightly different relationships than Figure 1A), this finding suggests that the similarity between the basal-like and ERBB2+ expression phenotypes may arise from very different signaling pathways, implying different regulatory and/or oncogenic origins even though both are associated with a poor prognosis.
PCA of Tumor Subtype-Specific Regulatory Signatures Reveals Distinct Regulatory Trends and Minimizes Redundancies
In Figure 3A, PCA was used to reduce the initial 409 variables, or dimensions, represented by the 409 TFBS matrices, to three dimensions that capture most of the trends (variance) in the data set. As shown in Figure 3A, the tumor subgroups are separated in a three-dimensional space defined by the first three principal components (PCs), which together represent 80.4% of the variance of the data set. Because more than 80% of the variance can be described by these three PCs, we focused on the patterns that were most apparent in three dimensions. Each PC is a composite derived from specific contributions of each of the original 409 matrices. Thus the relative positions of the data points representing the five tumor subgroups are compared in three-dimensional space based on their aggregate enrichment for the 409 TFBSs. The distance between the data points for each pair of tumor subtypes in these three dimensions represents the relative similarity of their regulatory signatures; shorter distances indicate greater similarity in promoter composition. As assessed by this method, the tumor subtypes have very distinct regulatory signatures. Consistent with the hierarchical clustering in Figure 2, the greatest similarity exists between the basal-like and luminal B regulatory signatures.
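The PCA step and the subtype distances can be sketched with NumPy, assuming a 5 × 409 matrix of –log2 P values (rows are subtypes, columns are matrices). The values below are random placeholders, not the study's data, and Partek's exact preprocessing (for example, whether variables are scaled before PCA) is not stated, so columns are only centered here.

```python
import numpy as np


def pca_scores(data, n_components=3):
    """Project rows (subtypes) onto the leading principal components.

    Returns the score matrix (rows x n_components) and the fraction of the
    total variance explained by each retained component.
    """
    centered = data - data.mean(axis=0)  # center each matrix (column)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


subtypes = ["luminal A", "luminal B", "basal-like", "ERBB2+", "normal-like"]
rng = np.random.default_rng(0)
# Placeholder -log2 P values for 409 matrices; NOT the study's data.
neglog_p = rng.exponential(scale=1.0, size=(len(subtypes), 409))

scores, explained = pca_scores(neglog_p, n_components=3)
# Pairwise Euclidean distances between subtypes in the space of the first three PCs
distances = np.linalg.norm(scores[:, None, :] - scores[None, :, :], axis=-1)
```

Because only five subtypes are compared, the centered data have at most four principal components with nonzero variance, so the "first four PCs" used in the loading filter account for all of the variance; the signs of individual PCs are arbitrary.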
… 0.75). Major contributors to PC4 are BRN2 (L = –0.83), EVI1 (EVI1-myeloid transforming protein) (L = –0.90), NBRE (L = 0.80), NRSE (L = –0.82), PAX3 (L = –0.86), PAX9 (L = –0.79), SOX9 (L = –0.82), and ZNF35 (L = –0.83). All but NBRE have negative correlations, suggesting that they are reciprocally enriched in one of the two subtypes; in this case, all but NBRE are significantly enriched in the basal-like tumor subtype compared with luminal B. Interestingly, most of the factors that bind these sites play substantial roles in embryogenesis and differentiation (see Discussion). In addition, both three- and two-dimensional PCAs show persistent separation along PC1 of the luminal B, basal-like, and luminal A subtypes as a common group, distinct from the normal-like and ERBB2+ subtypes. An examination of the PC1 loadings shows a strong correlation with AP2 (L = 0.987), suggesting that enrichment of this GC-rich matrix is a major distinction separating the luminal B, basal-like, and luminal A regulatory signatures from the normal-like and ERBB2+ signatures.
The PC loadings can also be used to filter the original variables to produce a reduced set that best characterizes the trends in the regulatory signatures, highlighting matrices that are significant discriminators despite borderline or low-ranking P values. To do this, the 409 matrices were screened for those showing a PC loading (L) with |L| ≥ 0.75 in any one of the first four PCs and P ≤ 0.05 in any one of the tumor subtypes. This reduced the list of 409 matrices to a final list of 44 (see Supplementary Table S3 at http://ajp.amjpathol.org). In Figure 3D, these 44 original variables are superimposed on the PCA separation of the subtypes in the form of a biplot. This projection illustrates how the 44 matrices discriminate the different tumor subtype signatures. For instance, AP1 is a dominant discriminating factor and pathway for the normal-like tumor subtype. The ERBB2+ signature projects in the direction of the NF-κB and E2F matrices. The basal-like subtype projects along various PAX matrices, whereas the luminal B subtype projects along the AHR/ARNT pathways. Finally, luminal A appears to project in a direction bounded by the STAF and NF1 pathways. As shown in Figure 3E, these 44 matrices produce a separation of breast cancer subtypes that is similar, although not identical, to the hierarchical clustering in Figure 2G.
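The two-part filter can be sketched as follows. The sketch assumes that a matrix's loading on a PC is the Pearson correlation between that matrix's –log2 P values and the subtype scores on that PC (a common definition; Partek's exact formula is assumed, not confirmed) and keeps matrices with |L| ≥ 0.75 on at least one of the first four PCs and P ≤ 0.05 in at least one subtype. The P values are random placeholders, not the study's data.

```python
import numpy as np


def pc_loadings(data, n_components=4):
    """Pearson correlation of each variable (column) with each PC score vector.

    Returns an array of shape (n_variables, n_components).
    """
    centered = data - data.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    scores = scores - scores.mean(axis=0)  # scores are already ~0-mean; enforce it
    numerator = centered.T @ scores        # cross products of centered variables and scores
    denominator = np.outer(np.linalg.norm(centered, axis=0), np.linalg.norm(scores, axis=0))
    return numerator / denominator


def select_matrices(p_values, loading_cutoff=0.75, p_cutoff=0.05, n_components=4):
    """Boolean mask over matrices (columns) passing both the loading and P value filters."""
    loadings = pc_loadings(-np.log2(p_values), n_components)
    strong_loading = (np.abs(loadings) >= loading_cutoff).any(axis=1)
    significant = (p_values <= p_cutoff).any(axis=0)
    return strong_loading & significant


rng = np.random.default_rng(1)
# Placeholder P values: 5 subtypes (rows) x 409 matrices (columns); NOT the study's data.
p_values = rng.uniform(1e-4, 1.0, size=(5, 409))
keep = select_matrices(p_values)
```

With only five observations per matrix, high absolute correlations arise easily, so on random input many matrices pass the loading filter; the count of 44 reported above reflects the real enrichment data.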
Tumor Subtype-Specific Genes and Transcriptional Regulators Are Associated with Specific Regulatory Networks
To search for regulatory networks or pathways most associated with the regulatory signatures found in the tumor subtype promoters, the genes of each subtype were combined with the transcription factor genes cognate for the most significantly enriched TFBSs within that subtype. This created, for each genetic signature, a composite list of regulated genes and their candidate regulators.10 These enhanced lists were then used to interrogate the Ingenuity Pathways knowledge base25 to construct regulatory networks based on curated interactions between the genes in each list.11 Figure 4 shows the five networks most populated by the five composite gene lists. The basal-like network is dominated by a highly connected Myc node. Luminal A is characterized by multiple hormone and orphan receptor nodes and, interestingly, luminal B contains a central p53 hub. The normal-like signature is dominated by AP-1 pathways and components. Most notably, the ERBB2+ signature contains multiple nodes linking the NF-κB, E2F, EGR, Forkhead, and SMAD pathways and shows many linkages to canonical pathways that regulate cell-cycle progression.
… NF-κB-associated signaling (P = 4.27 × 10⁻³). The basal-like subtype composite signature is also enriched in cell-cycle regulatory pathways, in addition to WNT/β-catenin signaling and endoplasmic reticulum/ER pathways. The normal-like breast cancer subtype shows relative overrepresentation of diverse cytokine- and growth factor-responsive pathways downstream of Jak/Stat and MAP kinase signaling circuitry. The luminal A tumor subtype signature shows a high association with pathways linked to metabolism of hydrophobic amino acids and with VEGF and IGF-1 signaling, although IGF-1 signaling is overrepresented in three of the five signatures. Finally, the luminal B composite signature is characterized by Sonic hedgehog signaling, Notch signaling, sterol synthesis, and p38 MAPK and cell-cycle signaling; the latter two are also shared by the basal-like subtype signature.
… NF-κB and ERBB2 in a Self-Amplifying Regulatory Loop in Human Breast Cancer Cells
The selective enrichment of NF-κB TFBSs in the promoters of the ERBB2+ tumor subtype is consistent with observations that overexpression of ERBB2 is associated with increased activation of NF-κB.26-30 An intriguing aspect of this relationship is that, in addition to ERBB2, the growth factor receptor adaptor protein GRB7, a direct modulator of the ERBB receptor family31 (Figure 4D), is also a likely target of NF-κB pathways, because both the ERBB2 and GRB7 promoters contain binding sites for NF-κB (Table 1 and Supplementary Figure S4 at http://ajp.amjpathol.org). It is therefore likely that these genes participate in a self-enhancing feed-forward loop that amplifies NF-κB molecular signals driven by ERBB2 (see highlighted region in Figure 4D).
At high levels, ERBB2 dimerizes with itself and becomes active in the absence of ligand.32 When present at physiological levels, ERBB2 readily heterodimerizes with all other members of the ERBB family, particularly EGFR, to produce ligand-specific complexes responsive to secreted EGF-1.33
To test whether an autoregulatory loop linking NF-κB to ERBB2 exists in human breast cancer cells, we examined the in vivo association of NF-κB complexes with the ERBB2 promoter before and after EGF-1 stimulation by chromatin immunoprecipitation (ChIP) in cells known to express normal (MCF-7) and amplified (MDA-MB-231) levels of ERBB2.33 An antibody cocktail containing affinity-purified antibodies specific for human p65/RelA and c-rel/Rel was used to perform ChIP in resting and EGF-1-stimulated MCF-7 and MDA-MB-231 breast cancer cells. As shown in Figure 5A, when normalized to nonspecific antibody and input DNA, there is a significant increase in NF-κB association with the ERBB2 promoter of MCF-7 cells after treatment with EGF-1. In contrast, NF-κB binding to ERBB2 in resting MDA-MB-231 cells is significantly higher than in either resting or stimulated MCF-7 cells. Moreover, the response to EGF-1 appears deregulated, because the addition of EGF-1 fails to induce further NF-κB binding and instead produces a variable decrease in MDA-MB-231 cells. These novel data indicate that an autoregulatory loop, similar to that outlined schematically in Figure 5B, exists in human breast cancer cells, and they are the first demonstration that NF-κB associates with the ERBB2 promoter in vivo in human breast-derived cells.
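The text states that ChIP signals were normalized to nonspecific antibody and input DNA but does not say how the ERBB2 promoter fragment was quantified. If the readout were quantitative PCR (an assumption, not stated in the Methods), a common calculation is percent input from threshold cycles (Ct), with specific enrichment expressed relative to the nonspecific antibody. The function names, the 1% input fraction, and the Ct values are hypothetical.

```python
import math


def percent_input(ct_ip, ct_input, input_fraction=0.01):
    """Percent of input chromatin recovered by an IP, from qPCR threshold cycles.

    Assumes ~100% amplification efficiency (template doubles every cycle) and that
    the input sample represents `input_fraction` of the chromatin used per IP.
    """
    # Shift the input Ct to the value expected for 100% of the chromatin
    ct_input_adjusted = ct_input - math.log2(1 / input_fraction)
    return 100 * 2 ** (ct_input_adjusted - ct_ip)


def fold_over_nonspecific(ct_specific, ct_nonspecific, ct_input, input_fraction=0.01):
    """Enrichment of the specific antibody relative to the nonspecific antibody."""
    return (percent_input(ct_specific, ct_input, input_fraction)
            / percent_input(ct_nonspecific, ct_input, input_fraction))


# Invented Ct values for illustration only.
example_fold = fold_over_nonspecific(ct_specific=27.0, ct_nonspecific=30.0, ct_input=25.0)
```

Because the input term cancels in the ratio, a three-cycle difference between specific and nonspecific antibodies corresponds to roughly eightfold enrichment under the doubling assumption.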
Discussion
… NF-κB, E2F, EGR1, and SMAD (TGF-β) transcriptional pathways. The second network, which was more highly associated with the basal-like molecular subtype of breast cancer, was dominated by PAX transcriptional circuitry. The implications of these inferred interactions form the foundation for generating specific hypotheses aimed at defining the underlying biology of these breast cancer subtypes and uncovering promising new therapeutic targets and prognostic molecular markers. Defining functional promoter composition in complex organisms continues to be a daunting task.34-36 In cancer research, a variety of approaches have been conceived, each with particular strengths and weaknesses.6,8,9 In this study we used position weight matrix scoring.37 A typical problem faced by this and many similar promoter annotation approaches is the number of false positives generated. To minimize false positives, we used a background model containing the promoter regions of 15,318 RefSeq genes as the reference for statistical enrichment, assuming a Poisson distribution. This type of reference model has an advantage over a reference of random DNA sequences in that it reflects and maintains the natural bias toward GC richness that exists in many promoter regions of the human genome.38-41 Thus GC-rich TFBSs are not inappropriately overrepresented. The robustness of the enrichment analysis is illustrated by our postanalysis permutation testing, which indicates that our significance scoring is conservative (Supplementary Figure S1 at http://ajp.amjpathol.org).
A recent reexamination of human promoters suggests that mammalian promoters can be classified into four categories according to GC content upstream and downstream of the transcription start site (each combination of high or low GC content upstream and downstream).41 By this classification, all five genetic signatures analyzed in this study are GC-rich (>55%) both upstream and downstream of the transcription start site (class A, according to Bajic and colleagues41) (see Supplementary Figure S2 at http://ajp.amjpathol.org). Thus the differences in promoter composition identified in this study are more likely to reflect true biology than asymmetric fluctuations in GC content.
Another significant feature of position-weighted TFBS matrices that hinders even the most focused promoter analysis is their inherent redundancy and degeneracy, a property that hierarchical clustering cannot readily resolve. Although this feature ensures the ability to detect subtle differences, it adds significant noise to any multivariate analysis. We used PCA to address some of these limitations by reducing the high-dimensional variables to fewer dimensions that explain the most characteristic features or trends in the data sets.10,11,13,42 The noise reduction provided by this transformation limited the negative contribution of the more degenerate and redundant matrices. In this way we believe we increased the likelihood that the observed correlations reflect true linkages of greater biological significance.
When the results of hierarchical clustering and PCA are compared, they are similar although not identical in several respects. Each approach indicated that the basal-like and luminal B subtypes are more similar to each other than to the other three, suggesting a common regulatory phenotype for these two subtypes. The relationship of the ERBB2+ subtype to the normal-like and luminal A subtypes was more variable. Interestingly, when the top binding sites clustered to each subgroup by hierarchical clustering in Figure 2 were compared with the most discriminating TFBS motifs by PCA in Figure 3E, the agreement was 60% for the normal-like subtype, 56% for ERBB2+, 82% for luminal B, 86% for the basal-like subtype, and only 7% for luminal A. The reason for the major discrepancy between the two techniques for the luminal A annotation is not clear. It may arise from the greater bias of PCA toward strong enrichment signals, because most of the TFBSs scored as PC loading discriminators are the most highly enriched TFBSs in the group. It would be interesting in the future to compare different thresholds for the PCA and for position weight matrix scoring. In the current analysis, we used the default threshold of 0.75 for position weight matrix scoring and a cutoff of 0.75 for the PC coefficients because these thresholds have performed well as discriminators in prior studies.10,11
One of the most compelling aspects of this study is the identification and subsequent empirical validation of an autoregulatory linkage of NF-κB pathways to ER-negative/ERBB2-positive breast cancer (Figures 2 to 5). Since its initial identification several years ago, the functional interaction of NF-κB pathways in ER-negative/ERBB2-positive breast cancer has been extensively examined.26-30 It is now widely recognized that NF-κB plays a role in a variety of human cancers.43
ERBB2 is a member of the ERBB family of receptor tyrosine kinases (RTKs) that mediate growth signaling in many different cell lineages. It is overexpressed in more than 20% of invasive breast cancers and is a founding feature of the ERBB2+ tumor subtype signature, which is associated with poor prognosis.44 The ERBB family has four members: the epidermal growth factor receptor (EGFR/ERBB1), ERBB2 (NEU/HER2), ERBB3 (HER3), and ERBB4.33
Feed-forward loops are important network motifs that can act as biological switches, rendering cellular processes more sensitive to sustained rather than transient stimuli.45,46 This is an essential property of the events that control epithelial growth and differentiation. It is particularly relevant here because co-expression of EGFR (ERBB1) and ERBB2 is found in more than 10% of patients with breast cancer and carries a poorer prognosis than elevated ERBB2 expression alone.47,48 Heterodimers between ERBB2 and ERBB3 are believed to be the most biologically active and tumor-promoting forms.32 Both hetero- and homodimerization of ERBB2 are thought to play a significant role in the activation of NF-κB protein complexes and in the evolution of breast cancer.49,50
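The sensitivity of feed-forward loops to sustained rather than transient stimuli can be illustrated with a toy simulation. The sketch assumes a coherent type-1 feed-forward loop with AND logic: input X activates Y, and Z is produced only while X is present and Y exceeds a threshold. All parameters are arbitrary, the integration is a simple Euler scheme, and the model is not fitted to ERBB2 or NF-κB data; the function name is hypothetical.

```python
def simulate_ffl(pulse_length, total_time=20.0, dt=0.01, k_y=1.0, threshold=0.5):
    """Euler integration of a coherent type-1 FFL (X -> Y; X AND Y -> Z).

    X is on for t < pulse_length. Returns the peak level reached by Z.
    """
    y = z = 0.0
    z_peak = 0.0
    for step in range(int(total_time / dt)):
        t = step * dt
        x_on = t < pulse_length   # input signal present during the pulse
        y_on = y > threshold      # Y counts as active only above threshold
        # dY/dt = production (only while X is on) - first-order decay
        y += dt * ((k_y if x_on else 0.0) - y)
        # dZ/dt = production (only while X AND Y are on) - first-order decay
        z += dt * ((1.0 if (x_on and y_on) else 0.0) - z)
        z_peak = max(z_peak, z)
    return z_peak
```

With these parameters Y needs about 0.7 time units to cross its threshold, so `simulate_ffl(0.5)` leaves Z at zero while `simulate_ffl(10.0)` drives Z close to its maximum: the loop filters out the brief pulse and responds only to the persistent input.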
The identification of a feed-forward network motif that drives signaling from the ERBB receptor network through NF-κB has important therapeutic and prognostic implications. As suggested in the schematic illustrations in Figures 4D and 5B, NF-κB functions as a hub, connecting and self-amplifying ligand-dependent signals emanating from different combinations of ligand-bound ERBB family receptors. The involvement of GRB7 suggests that this feed-forward loop may also influence the activity of many other growth factor receptors, including c-Kit, PDGFR (platelet-derived growth factor receptor), and the insulin receptor.31
The fact that multiple ERBB ligands and receptor dimers influence NF-κB signaling highlights the importance of ligand-receptor interactions in the pathophysiology of breast cancer and points to an important area for therapeutic intervention. ERBB2 is targeted therapeutically in breast cancer patients by humanized anti-ERBB2 antibodies such as Herceptin (trastuzumab).51
The interaction of the ERBB and NF-κB networks inferred from our study provides a rationale for designing combinatorial therapeutic strategies that simultaneously target the NF-κB, EGFR, and ERBB2 components of this regulatory network. A representative example would be a regimen combining agents such as Velcade (which targets NF-κB52), Herceptin (which targets ERBB2), and Iressa (which targets EGFR53). It should be noted that while this work was in review, a report describing the combined use of Velcade and Herceptin in a preclinical study was published.54 As we predicted, the combination of these compounds showed significant synergy against ERBB2+ tumors. It is therefore reasonable to propose that one of the major mechanisms underlying this synergy is a multicomponent disruption of the tumor's addiction to reinforced signaling through NF-κB.
EGR1 was also inferred as an interacting member of the ERBB2+ regulatory signature (Figures 2, 4, and 5). EGR1 has previously been shown to play a major role in cell growth, differentiation, survival, and transformation in other epithelial tumors.55 Recent studies suggest that EGR1 regulates EGFR expression.56 EGR1 may therefore participate in a second self-amplifying feed-forward response in the regulation of ERBB2 networks in breast cancer. The EGR family members of the ERBB2+ tumor subtype regulatory signature may accordingly be therapeutic targets in this form of breast cancer.
A previously unrecognized regulatory limb in the ERBB2+ transcriptional network is the TGF-β-regulated pathway (Figures 2, 4, and 5). The SMAD transcription factors are major effectors of TGF-β signaling.57 Cross talk between NF-κB and TGF-β signaling has been described previously, but with conflicting outcomes: in some cases TGF-β signaling inhibited NF-κB signaling,58 whereas in others it was stimulatory.59 The precise role of the interaction between NF-κB and TGF-β in breast cancer will require further investigation. It is tempting to speculate that the tumor microenvironment helps determine the effect of TGF-β. Inflammatory components within the microenvironment may produce interactions between NF-κB and TGF-β signaling that act in synergy to promote more aggressive phenotypes.
Like ERBB2+, the basal-like molecular signature is also associated with a poor prognosis. The basal-like regulatory signature is uniquely enriched for binding sites of transcription factors linked to development and differentiation, including Sox and several Pax family members.21,60 Recent phenotypic characterization of basal-like tumors by immunohistochemistry indicates that they are typically negative for ERBB2 and ER, with infrequent expression of myoepithelial markers.61 The lack of myoepithelial markers is contrary to their presumed basal cell origin.3 Other possible origins for these cells include epithelial-to-mesenchymal transition or derivation from breast epithelial stem cells.62,63 The high enrichment of the basal-like regulatory signature for binding sites of factors that control differentiation and development would argue in favor of either of these possibilities. A curious observation in the basal-like group is the enrichment for ER-binding sites and a marginal enrichment for p53-binding sites. Because both of these genes are mutated or absent in this tumor subtype,3 several of the genes responsible for the basal-like molecular phenotype may be targets for repression by intact ER and p53 signaling. Given the regulatory similarity between the ER-positive luminal B molecular signature and the basal-like molecular signature, it is reasonable to speculate that the basal-like phenotype could have originated from a more luminal B-like phenotype after loss of ER expression and mutation of p53. As with the insights gained from the analysis of the ERBB2+ tumor subtype, these inferences could be important for future pharmacological and gene therapeutic strategies that specifically target these types of breast cancer.
Finally, it must be stressed that this study does not use TFBSs as predictors of outcome. It is essentially an annotation of the most enriched binding sites in gene sets that have previously been defined as predictive. These annotations are then used to define pathways associated with the signatures. This approach therefore inherits all of the flaws of the original study. One drawback to consider is that the original gene expression study was derived from tumor samples that may contain a mixture of many different tissue types, including stromal elements and inflammatory components. The pathway inferences should therefore be interpreted with appropriate caution, or expanded to consider that they could represent a composite signature of both the tumor and the tumor microenvironment. When comparable expression data from microdissected tissue samples become available, it will be of interest to perform a similar analysis.
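This section does not state which statistic ranks the "most enriched" binding sites in a signature gene set. A common choice is a one-sided hypergeometric test of site-bearing promoters in the signature against a scanned background, reported together with a fold enrichment. The sketch below assumes that choice. The function names and the example counts are hypothetical and are not data from this study.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws).

    k: genes in the signature whose promoters carry the binding site
    n: genes in the signature
    K: background genes whose promoters carry the binding site
    N: background genes scanned
    """
    upper = min(n, K)
    # Exact integer arithmetic; math.comb returns 0 when asked for more items than exist.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def fold_enrichment(k, n, K, N):
    """Fraction of site-bearing genes in the signature divided by the background fraction."""
    return (k / n) / (K / N)


# Hypothetical counts for illustration only (not data from this study).
print(fold_enrichment(25, 100, 2000, 20000))
print(hypergeom_enrichment_p(25, 100, 2000, 20000))
```

Because such a test only measures over-representation within a predefined gene set, it cannot correct for mixed tumor and stromal content in the underlying expression data.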
| Acknowledgements |
|---|
| Footnotes |
|---|
Supported by the Intramural Research Program of the National Institutes of Health, National Cancer Institute.
G.I. and S.H.N. contributed equally to this study.
Supplemental material for this article can be found on http://ajp.amjpathol.org.
Accepted for publication November 1, 2007.
| References |
|---|