Epithelium percentage estimation facilitates epithelial quantitative protein measurement in tissue specimens
© Chen et al.; licensee BioMed Central Ltd. 2013
Received: 5 September 2013
Accepted: 26 October 2013
Published: 1 December 2013
The rapid advancement of high-throughput tools for quantitative measurement of proteins has demonstrated the potential for the identification of proteins associated with cancer. However, the quantitative results on cancer tissue specimens are usually confounded by tissue heterogeneity, e.g. regions with cancer usually have significantly higher epithelium content yet lower stromal content.
It is therefore necessary to develop a tool to facilitate the interpretation of the results of protein measurements in tissue specimens.
Epithelial cell adhesion molecule (EpCAM) and cathepsin L (CTSL) are two epithelial proteins whose expressions in normal and tumorous prostate tissues were confirmed by measuring staining intensity with immunohistochemical staining (IHC). The expressions of these proteins were measured by ELISA in protein extracts from OCT embedded frozen prostate tissues. To eliminate the influence of tissue heterogeneity on epithelial protein quantification measured by ELISA, a color-based segmentation method was developed in-house for estimation of epithelium content using H&E histology slides from the same prostate tissues and the estimated epithelium percentage was used to normalize the ELISA results. The epithelium contents of the same slides were also estimated by a pathologist and used to normalize the ELISA results. The computer based results were compared with the pathologist’s reading.
We found that both EpCAM and CTSL levels, measured by ELISA alone, were greatly affected by epithelium content in the tissue specimens. Without adjusting for epithelium percentage, both EpCAM and CTSL levels appeared significantly higher in tumor tissues than in normal tissues (p < 0.001). However, after normalization by the epithelium percentage, ELISA measurements of both EpCAM and CTSL were in agreement with IHC staining results, showing a significant increase only in EpCAM with no difference in CTSL expression in cancer tissues. These results were obtained with normalization by both the computer-estimated and the pathologist-estimated epithelium percentage.
Our results show that estimation of tissue epithelium percentage using our color-based segmentation method correlates well with pathologists' estimation of tissue epithelium percentages. The epithelium contents estimated by color-based segmentation may be useful in immuno-based analysis or clinical proteomic analysis of tumor proteins. The codes used for epithelium estimation as well as the micrographs with estimated epithelium content are available online.
Keywords: Epithelium, Cancer, Stroma, Computer-aided classification
The rapid advancement of high-throughput tools for measurement of proteins from cancer tissues or body fluids has demonstrated the potential for the identification of proteins associated with diseases in all areas of medicine. Most of these high-throughput tools use mass spectrometry (MS)-, microarray-, or immunosorbent-based assays for quantitative analysis of proteins. With the advantage of quantitative measurement, many protein assays with good sensitivity and specificity have been developed for research and clinical use in serum, urine and other body fluids. However, the analysis of proteins in tissue specimens is largely limited to semi-quantitative immunohistochemistry (IHC) assays, which are required to obtain tissue spatial information and cell type-specific staining patterns. The use of quantitative protein assays such as MS, microarray, or enzyme-linked immunosorbent assay (ELISA) on tissue specimens has its own limitations. Due to the loss of spatial information, the measurements acquired are usually confounded by tissue heterogeneity. Since tissue specimens contain various types of cells in which the expression of target proteins differs, protein assay results become hard to interpret and may even be misleading.
With respect to cancer research, assessment of the expression of epithelial proteins is of great interest, since over 90% of the carcinoma is of epithelial origin. Compared to regions with normal tissue, regions with cancer usually have significantly higher epithelium content yet lower stromal content. Depending on tumor density, the epithelium-to-stroma ratio may vary considerably and may influence protein quantitation readings significantly when an epithelial protein is concerned, e.g. a higher epithelial protein reading in tumor tissues might be solely due to the increased epithelial content of the tissue rather than biological overexpression of that protein. Therefore, it is important to consider the epithelium content when analyzing protein levels using quantitative protein assays.
There are a number of approaches to identify and quantify epithelium content from histology slides. Traditionally, the epithelium content is read by a pathologist based on nuclei counts from a hematoxylin and eosin (H&E) stained histology slide. Another approach is to stain the histology slide with anti-cytokeratin antibody CAM 5.2 (staining for epithelia) and Masson trichrome (staining for collagenous stromal structures). More recently, with the digitization of whole slide imaging, a number of algorithms have been developed for computer-assisted readings. These methods rely on image features such as morphology, texture, color and intensity to segment images and classify them into various pathologically different regions. Automated histopathological image analysis reduces inter- and intra-observer errors and provides additional quantitative information to aid diagnosis. However, to our knowledge, these estimates have never been used in the interpretation of protein measurements.
Epithelial cell adhesion molecule (EpCAM) and cathepsin L (CTSL) are epithelial proteins that have been found abundantly expressed in prostate adenocarcinomas [5, 6]. EpCAM is a well-known tumor-associated antigen and is expressed in various adenocarcinomas and squamous cell carcinomas (e.g. prostate, lung, colon, gastric carcinomas) [7–9]. Its expression in normal epithelia, on the other hand, is rather variable yet much lower than in carcinoma cells. CTSL is a lysosomal cysteine proteinase that plays a major role in the catabolism of intracellular and extracellular proteins. Studies on prostate cancer cell lines suggested that CTSL was associated with the motility of prostate tumor cells and therefore might be involved in tumor metastasis [11, 12]. Although previous studies suggested an increase in CTSL mRNA expression in prostate adenocarcinomas, a recent study showed that CTSL staining in prostate tissues is comparable between prostate adenocarcinomas and normal tissues.
In this study, we assessed EpCAM and CTSL levels with ELISA in prostate cancer tissues and determined the effect of epithelium content on tissue protein quantitation. To determine the effect of tissue heterogeneity on the interpretation of the ELISA results, we developed an in-house color-based segmentation method for estimation of epithelium content and used the estimate to normalize the ELISA results. A pathologist's estimation of the epithelium content was also used to normalize the ELISA results. We found that both EpCAM and CTSL levels, measured by ELISA alone, were greatly affected by epithelium content in the tissue specimens. However, after normalization by epithelium percentage, ELISA measurements of both EpCAM and CTSL were in agreement with IHC staining results, demonstrating the need for normalization by epithelium content in quantitative measurement of epithelial proteins in tissue specimens.
Materials and methods
LSAB+ Kits, biotin blocking system and antibody dilution buffer were from Dako, Carpinteria, CA. Goat anti-CTSL antibody, antigen retrieval buffer, recombinant protein, capture and detection antibodies for human CTSL and EpCAM, streptavidin-HRP conjugates and ELISA plates were from R&D Systems, Minneapolis, MN. All other chemicals were from Sigma-Aldrich (St. Louis, MO).
Samples and clinical information were obtained with informed consent, and the study was performed with the approval of the Institutional Review Board of the Johns Hopkins University. Formalin fixed paraffin embedded (FFPE) prostate tissue slides were acquired from 6 individuals with primary prostate tumors. An additional thirty-six OCT-embedded prostate tumors were collected from radical prostatectomy at Johns Hopkins Hospital and Johns Hopkins Bayview Medical Center under the NCI-funded Johns Hopkins prostate cancer SPORE project. These tumors include nineteen specimens with a Gleason score of 6, seven specimens with a Gleason score of 7, five specimens with a Gleason score of 8 and five specimens with a Gleason score of 9 (Additional file 1: Table S1). Eight OCT-embedded normal prostate tissues were collected from healthy transplant donors. All specimens were snap-frozen, embedded in OCT and stored at −80°C until use.
Immunohistochemical staining and tissue microarrays
IHC staining was performed on FFPE prostate tissue slides from 6 individuals with primary prostate tumors. Sections of tissue were deparaffinized and rehydrated. Tissues were incubated in antigen retrieval buffer at 92-95°C for 10 min. CTSL was stained with Universal LSAB™+ Kits per the manufacturer's protocol. Briefly, tissues were blocked with peroxidase block and 3% BSA/PBS for 30 min each, followed by avidin and biotin block with the Biotin Blocking System for 15 min each at room temperature. The tissues were then incubated with goat anti-CTSL primary antibody in antibody dilution buffer at 4 μg/mL, followed by incubation with biotin-labeled anti-goat secondary antibody and high-sensitivity streptavidin-HRP for 30 min each. The CTSL staining was detected with DAB chromogen.
Measurements of proteins from clinical specimens using ELISA
The protein samples were collected by sectioning the OCT-embedded frozen prostate tissues. A section adjacent to about every 15 tissue sections (6 μm each) was stained with H&E for use in the computer-aided and pathologist estimation of epithelium content. For tumor specimens, the adjacent H&E slides were also used to guide cryostat micro-dissection to enrich the tumor tissue in the sample collected for immunoassay analysis. Places where tissues were trimmed were marked on the H&E slides and excluded from epithelium percentage estimation. An estimated ten to twenty 6 μm-thick tissue sections were collected in sterile screw-cap bullet tubes for each sample for protein assays. Proteins were then extracted from tissue sections using cell lysis buffer (50 mM Tris, pH 8.0, 150 mM NaCl, 0.1% SDS, 0.5% sodium deoxycholate, 1% Triton X-100). A BCA assay was performed to determine the protein concentration of each tissue sample, which was then adjusted to 1 μg/mL with PBS. Tissue EpCAM and CTSL levels were then measured by ELISA as described before. Briefly, CTSL (1 μg/mL) or EpCAM (4 μg/mL) capture antibody was coated overnight in a 96-well plate. The wells were then blocked with 3% BSA, incubated with 100 μL of diluted sample and then with CTSL (0.5 μg/mL) or EpCAM (0.2 μg/mL) biotinylated detection antibody for 1 h each, and finally with streptavidin-HRP conjugates (1:200) for 30 min. The assays were then developed with TMB substrate, stopped with H2SO4 and measured by reading the plate at 450 nm with a spectrophotometer.
Estimation of epithelium ratio in prostate tissue specimens
An in-house color-based segmentation method was developed for estimation of epithelial areas in prostate tissue specimens using H&E stained face sections of prostate tissues. For each of the 44 cases (36 tumors and 8 normal prostate tissues), digital slides were acquired by scanning the H&E stained slide with an AT Turbo scanner (Aperio Technologies, Vista, CA). Every 2.1 × 1.3 mm² area of the micrograph was then saved into a .tiff file at a resolution of 72 pixels per inch using the ImageScope software (Aperio Technologies, Vista, CA), and an estimated 13 ± 10 image files were generated for each of the 44 cases. One image from each case was randomly selected to serve as the training image for the classification algorithm and the rest of the images were used as the test image set. The training images were used as the input to the classification training code. All computations were implemented in MATLAB (Mathworks, Natick, MA). Each training image was segmented into four regions based on the pixel colors using a k-means clustering algorithm. The k-means clustering algorithm is a cluster analysis tool for grouping a number of observations into k clusters based on the similarities between the observations. Briefly, the observations were randomly assigned to clusters for initialization and the centroid of each cluster was calculated. In an iterative manner, each observation was reassigned to the cluster of its nearest centroid and the centroids of the clusters were re-calculated to reflect the changes to the clusters, until the centroids converged. In other words, each color cluster was formed by minimizing the squared Euclidean distance of the cluster members to its centroid. This groups pixels with similar colors together into a color cluster of white, bright pink, dark pink or purple in an H&E stained slide, where white, pink and purple correspond to lumen, stroma and epithelium respectively.
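The clustering step described above can be sketched in a few lines of Python. This is only an illustration, not the authors' MATLAB code: the function name `kmeans`, its parameters, the empty-cluster handling and the RGB values in the demo are assumptions made for the example.

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Lloyd-style k-means on a list of equal-length numeric tuples (hypothetical helper)."""
    rng = random.Random(seed)
    # Random initial assignment of observations to clusters, as described in the text.
    labels = [rng.randrange(k) for _ in points]
    centroids = []
    for _ in range(iters):
        # Recompute each centroid as the mean of its current members.
        centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if not members:
                # Assumption: an empty cluster is re-seeded with a random observation.
                members = [rng.choice(points)]
            centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
        # Reassign every observation to its nearest centroid (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:  # assignments are stable, so the centroids have converged
            break
        labels = new_labels
    return labels, centroids

# Made-up RGB pixels loosely resembling white, bright pink, dark pink and purple.
demo_rng = random.Random(1)
base_colors = [(250, 250, 250), (240, 170, 200), (200, 100, 150), (90, 50, 130)]
pixels = [tuple(ch + demo_rng.randint(-5, 5) for ch in color)
          for color in base_colors for _ in range(25)]
pixel_labels, color_centroids = kmeans(pixels, k=4)
print(color_centroids)
```

Because the initial assignment is random, k-means can settle in a local optimum, so which label ends up on which color (and whether all four colors are separated) depends on the seed and the data.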
Each micrograph was divided into a grid of 20 × 20 pixel cells such that each grid cell covered a 0.04 × 0.04 mm² area of the tissue section. The ratio of the area of each of the four colors to the total area of the cell was calculated for each grid cell. The resultant four color ratios sum to 1 in each grid cell; thus only three of the color ratios are linearly independent. The epithelial regions of the original training image were manually marked by an experienced researcher to serve as the benchmark for the training of the classification algorithm. The marked epithelial regions are shown by a green shade. The grid cells were then divided into two groups based on whether they were marked as epithelium or not and illustrated on a scatterplot in the space of the three base colors of the H&E micrographs. Any grid cell with white content greater than 70% was marked as luminal. Knowing the class of each of the grid cells based on the marked epithelial regions, k-means clustering was again used to divide the space of the three base colors into three clusters representing the epithelial, stromal and luminal regions. These clusters, established in the space of three arbitrarily selected color ratios, were then used to segment the images in the test dataset into epithelium, stroma and lumen areas, thus estimating the percentage of each in the whole face section of prostate tissue. The segmentation code worked similarly to the training code: the test image was segmented into four colors and the color ratios were calculated for each grid cell on the image. Each grid cell was classified as epithelial, stromal or luminal depending on its distance from the established clusters in the space of the three color ratios. The epithelium percentage was calculated by the following formula: Epithelium area/(Epithelium area + Stroma area) × 100%.
The MATLAB code and H&E micrographs used for epithelium estimation, as well as the micrographs with estimated epithelium content, are available for download at http://sdrv.ms/17tWsqd.
Wilcoxon rank-sum test (unpaired, two-sided) was used for determination of statistical significance of EpCAM and CTSL immunoassay measurements.
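The grid-based classification and the epithelium percentage formula can be illustrated with the following sketch. It assumes the image has already been reduced to per-pixel color labels (0 = white, 1 = bright pink, 2 = dark pink, 3 = purple, an ordering chosen here for illustration). The function names and the class centroids in `HYPOTHETICAL_CENTROIDS` are invented for the example and are not the values learned in the study.

```python
def grid_color_ratios(label_image, cell=20, n_colors=4):
    """Fraction of each color label inside every cell x cell block of the image."""
    rows, cols = len(label_image), len(label_image[0])
    ratios = {}
    for r0 in range(0, rows - cell + 1, cell):
        for c0 in range(0, cols - cell + 1, cell):
            counts = [0] * n_colors
            for r in range(r0, r0 + cell):
                for c in range(c0, c0 + cell):
                    counts[label_image[r][c]] += 1
            ratios[(r0 // cell, c0 // cell)] = [n / (cell * cell) for n in counts]
    return ratios

# Made-up class centroids in the space of the first three color ratios
# (white, bright pink, dark pink); the purple ratio is implied because the four sum to 1.
HYPOTHETICAL_CENTROIDS = {
    "epithelium": (0.05, 0.10, 0.25),
    "stroma": (0.05, 0.60, 0.30),
    "lumen": (0.90, 0.05, 0.03),
}

def classify_cell(ratio, centroids=HYPOTHETICAL_CENTROIDS):
    """Assign a grid cell to the nearest class centroid (squared Euclidean distance)."""
    feats = ratio[:3]  # only three of the four ratios are independent
    return min(centroids,
               key=lambda name: sum((a - b) ** 2 for a, b in zip(feats, centroids[name])))

def epithelium_percentage(classes):
    """Epithelium area / (epithelium area + stroma area) x 100; lumen cells are excluded."""
    epi = sum(1 for c in classes if c == "epithelium")
    stroma = sum(1 for c in classes if c == "stroma")
    return 100.0 * epi / (epi + stroma) if epi + stroma else float("nan")

# Toy 20 x 40 label image: left grid cell all purple, right grid cell all bright pink.
toy_image = [[3] * 20 + [1] * 20 for _ in range(20)]
toy_classes = [classify_cell(r) for r in grid_color_ratios(toy_image).values()]
print(toy_classes, epithelium_percentage(toy_classes))
```

Since all grid cells have the same area, counting cells per class is equivalent to summing areas in the formula.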
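For two unpaired groups of different sizes, such as the 36 tumor and 8 normal specimens, a two-sided rank-based comparison is commonly computed as a rank-sum statistic. The sketch below is a minimal version that assumes the normal approximation without tie or continuity corrections. The function name `rank_sum_test` and the sample values are hypothetical, and statistical software may use an exact method instead.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation (hypothetical helper)."""
    # Pool both groups, remembering which group each value came from.
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for t in range(i, j + 1):
            ranks[t] = (i + j) / 2 + 1  # average rank shared by tied values
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)  # rank sum of group x
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction (assumption)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return w, z, p

# Made-up measurements, for illustration only.
normal_values = [3.1, 4.0, 2.7, 5.2]
tumor_values = [9.8, 7.5, 12.1, 6.4, 15.0, 8.8]
print(rank_sum_test(normal_values, tumor_values))
```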
IHC staining of CTSL in prostate tissue specimens
ELISA measurements of EpCAM and CTSL in prostate tissue specimens
Estimation of epithelium percentage in prostate tissue
Normalization of EpCAM and CTSL measurements with estimated epithelium percentage
As the IHC staining results show that EpCAM and CTSL are expressed in the prostatic epithelium, we used the epithelium percentage to normalize the EpCAM and CTSL ELISA results. After computer-aided normalization, the average measured EpCAM was 39.11 ± 16.69 and 82.70 ± 39.56 ng/mg total protein/epithelium percentage for normal prostate tissues and prostate tumors respectively (Figure 5C); the average measured CTSL was 27.27 ± 10.38 and 23.44 ± 12.80 ng/mg total protein/epithelium percentage for normal prostate tissues and prostate tumors respectively (Figure 5D). After normalization by the pathologist-estimated epithelium percentage, the average measured EpCAM was 32.02 ± 21.32 and 73.27 ± 34.12 ng/mg total protein/epithelium percentage for normal prostate tissues and prostate tumors respectively (Figure 5E); the average measured CTSL was 19.99 ± 7.78 and 20.74 ± 10.53 ng/mg total protein/epithelium percentage for normal prostate tissues and prostate tumors respectively (Figure 5F). With both epithelium estimations, EpCAM expression was significantly increased in prostate tumors by about 2 fold, compared to the 4.75 fold increase without normalization (Figure 3C). In contrast to the significantly elevated CTSL expression in prostate tumors from ELISA results without normalization by epithelial content (Figure 3D), CTSL expression was comparable between normal tissues and prostate tumors after epithelium normalization. These results are in agreement with previous reports where IHC staining was used to assess the protein expression of EpCAM and CTSL [10, 14], demonstrating the positive impact of epithelium normalization in analyzing immunoassay results.
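The tumor-to-normal fold changes implied by the normalized group means above can be checked with a short calculation. The means are taken from the text; the helper names `normalize` and `fold_change` are hypothetical. Dividing a reading by the epithelium percentage (rather than by a fraction) is an assumption about the units, and it does not affect the fold change.

```python
# Group means reported in the text (ng/mg total protein per epithelium percentage).
normalized_means = {
    ("EpCAM", "computer"): {"normal": 39.11, "tumor": 82.70},
    ("EpCAM", "pathologist"): {"normal": 32.02, "tumor": 73.27},
    ("CTSL", "computer"): {"normal": 27.27, "tumor": 23.44},
    ("CTSL", "pathologist"): {"normal": 19.99, "tumor": 20.74},
}

def normalize(elisa_ng_per_mg, epithelium_percent):
    """Per-specimen normalization: ELISA reading divided by epithelium percentage (assumed units)."""
    return elisa_ng_per_mg / epithelium_percent

def fold_change(means):
    """Ratio of the tumor group mean to the normal group mean."""
    return means["tumor"] / means["normal"]

for (protein, estimator), means in normalized_means.items():
    print(f"{protein:5s} {estimator:11s} tumor/normal = {fold_change(means):.2f}")
```

The EpCAM ratios come out slightly above 2 with either estimator, while the CTSL ratios stay close to 1, which matches the description in the text.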
Tissue specimens are a great source for identification of disease-related molecules, e.g. cancer-related molecules/markers. To evaluate protein expression in tissue specimens, IHC staining is one of the most common techniques utilized. IHC staining provides insight into tissue heterogeneity, disease relevance of protein markers, and the expression pattern of protein markers in different cell types. However, IHC staining is subject to inter-observer error and is at best semi-quantitative. Direct measurements of proteins by mass spectrometry, microarray, or immunosorbent assay, on the other hand, are quantitative and can be standardized for quality assurance. Consequently, use of immunosorbent assays to measure protein expression in tissue specimens is highly desirable if the result can be properly interpreted.
In this study, we introduced epithelial percentage normalization as a tool for interpreting immunoassay results for epithelial proteins. We developed a tool for automated segmentation of micrograph slides into epithelial and non-epithelial regions using k-means clustering. K-means clustering is one of the fastest and simplest cluster analysis tools that can reduce large datasets into smaller, more manageable subspaces based on the similarities observed in the dataset. The proposed epithelial percentage estimation method segments the image into the four colors that are most dominant in typical H&E micrograph slides. The four colors are determined by k-means clustering of the individual pixels on each micrograph. Therefore, although these colors are categorized as white, light pink, dark pink and purple, they may vary slightly from one slide to another depending on the strength of the staining on each slide. Consequently, this method automatically accounts for the variations in color staining between different tissue sections, which is a significant challenge in universal image processing of histology slides. In addition, this algorithm achieved an accuracy of 84% on a database of normal and prostate cancer tissue sections. The false classification rate of 16% was almost equally divided between false positive and false negative results. Although false positive and false negative results are not desirable and must be minimized, in the context of epithelium percentage estimation, the inaccuracy introduced by false classifications is cancelled out to a great degree. Therefore, the balance of false positive and false negative results explains the high correlation of the estimated epithelium percentage with the marked epithelial percentage, which is desirable for normalizing and interpreting the immunoassay for determining the biological changes in protein expression.
It should be noted that, to use epithelium percentage estimation in epithelial protein measurement in tissue specimens, the H&E-stained section needs to be representative of the entire tissue that is studied for a given protein, and the protein needs to be measured on the exact same piece of tissue. This is because the epithelium-to-stroma ratio in a given mass of tissue varies greatly depending on the size of the tissue and the homogeneity of the tissue structures (e.g. a given prostate cancer could be 80% epithelial in some areas and 5% in others). In this study, to ensure the accuracy of the estimated epithelium percentage, an H&E slide adjacent to approximately every 15 sections of prostate tissue (6 μm each) was used for epithelium estimation.
With EpCAM and CTSL expression in prostate tumor tissues, we demonstrated that normalization by epithelial percentage is useful in analyzing ELISA results for epithelial proteins. IHC staining carried out in this study and in previously published studies [10, 14] showed that CTSL expression was not significantly different between normal tissue and prostate tumors. While ELISA alone misleadingly showed a significant increase of CTSL in prostate tumors, after normalization by epithelium percentage, ELISA analysis also showed that CTSL expression was comparable between these two groups. EpCAM was shown to be up-regulated in prostate carcinomas in a number of studies with IHC staining [7–9]. In this study, we also found a significant increase in EpCAM with immunoassay analysis both with and without normalization by epithelium percentage. However, the fold difference in EpCAM expression between tumor and normal tissues dropped from 4.68 to around 2 after normalization, which better depicts the biological difference in EpCAM between tumor and normal cells.
In addition to facilitating analysis of ELISA results for epithelial proteins, epithelium percentage estimation can also be used to account for epithelium heterogeneity in other quantitative assays (e.g. mRNA expression, protein activity assays, clinical proteomic analysis) where spatial information is lost due to sample homogenization. With epithelium percentage normalization, improvement in the accuracy of biochemical measurements in homogenized tissue specimens may be achieved. In addition, by accounting for tissue heterogeneity, interesting new protein markers may be identified. However, further studies need to be carried out to test the accuracy of epithelium percentage estimation.
In summary, we developed an in-house color-based segmentation method for estimation of epithelium content and demonstrated the accuracy of the method in epithelium estimation. Using EpCAM and CTSL as examples, we demonstrated that, after normalization by epithelium percentage, protein expression measured by immunoassay agrees well with that measured by IHC staining, suggesting that normalization by epithelium percentage is helpful in interpreting ELISA results and, similarly, the results of other biochemical or proteomics-based assays.
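The cancellation argument can be made concrete with a small calculation. Here "false positive" means a stromal grid cell labeled as epithelium and "false negative" means an epithelial grid cell labeled as stroma. The counts below are invented so that the accuracy is 84% in both scenarios, and the function names are hypothetical.

```python
def estimated_epithelium_percent(true_epi, true_stroma, false_pos, false_neg):
    """Epithelium percentage as seen by the classifier, given its misclassified cells."""
    estimated_epi = true_epi - false_neg + false_pos
    return 100.0 * estimated_epi / (true_epi + true_stroma)

def accuracy_percent(true_epi, true_stroma, false_pos, false_neg):
    """Share of grid cells that were assigned to the correct class."""
    total = true_epi + true_stroma
    return 100.0 * (total - false_pos - false_neg) / total

# Made-up tissue: 400 epithelial and 600 stromal grid cells (true epithelium = 40%).
balanced = dict(true_epi=400, true_stroma=600, false_pos=80, false_neg=80)
one_sided = dict(true_epi=400, true_stroma=600, false_pos=160, false_neg=0)

for name, case in (("balanced errors", balanced), ("one-sided errors", one_sided)):
    print(name, accuracy_percent(**case), estimated_epithelium_percent(**case))
```

With balanced errors the estimate stays at the true 40% despite 16% of cells being misclassified, whereas the same error rate concentrated in false positives shifts the estimate to 56%.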
This work was supported by Grant U01CA152813 from the National Institutes of Health, National Cancer Institute, the Early Detection Research Network (NIH/NCI/EDRN) and grant (U24CA160036) from the National Cancer Institute Clinical Proteomic Tumor Analysis Consortium (CPTAC).
- Zhang H, Chan DW: Cancer biomarker discovery in plasma using a tissue-targeted proteomic approach. Cancer Epidemiol Biomarkers Prev. 2007, 16: 1915-1917. 10.1158/1055-9965.EPI-07-0420
- American Cancer Society: Cancer Facts and Figures 2013. 2013, American Cancer Society
- Veltri RW, Park J, Miller MC, Marks L, Kojima M, Van Rootselaar C, Khan MA, Partin AW: Stromal-epithelial measurements of prostate cancer in native Japanese and Japanese-American men. Prostate Cancer Prostatic Dis. 2004, 7: 232-237. 10.1038/sj.pcan.4500738
- Gurcan MN, Boucheron LE, Can A, Madabhushi A, Rajpoot NM, Yener B: Histopathological image analysis: a review. IEEE Rev Biomed Eng. 2009, 2: 147-171.
- Baeuerle PA, Gires O: EpCAM (CD326) finding its role in cancer. Br J Cancer. 2007, 96: 417-423. 10.1038/sj.bjc.6603494
- Chauhan SS, Goldstein LJ, Gottesman MM: Expression of cathepsin L in human tumors. Cancer Res. 1991, 51: 1478-1481.
- Went PT, Lugli A, Meier S, Bundi M, Mirlacher M, Sauter G, Dirnhofer S: Frequent EpCam protein expression in human carcinomas. Hum Pathol. 2004, 35: 122-128. 10.1016/j.humpath.2003.08.026
- Woelfle U, Breit E, Zafrakas K, Otte M, Schubert F, Muller V, Izbicki JR, Loning T, Pantel K: Bi-specific immunomagnetic enrichment of micrometastatic tumour cell clusters from bone marrow of cancer patients. J Immunol Methods. 2005, 300: 136-145. 10.1016/j.jim.2005.03.006
- Went P, Vasei M, Bubendorf L, Terracciano L, Tornillo L, Riede U, Kononen J, Simon R, Sauter G, Baeuerle PA: Frequent high-level expression of the immunotherapeutic target Ep-CAM in colon, stomach, prostate and lung cancers. Br J Cancer. 2006, 94: 128-135. 10.1038/sj.bjc.6602924
- Trzpis M, McLaughlin PM, De Leij LM, Harmsen MC: Epithelial cell adhesion molecule: more than a carcinoma marker and adhesion molecule. Am J Pathol. 2007, 171: 386-395. 10.2353/ajpath.2007.070152
- Colella R, Jackson T, Goodwyn E: Matrigel invasion by the prostate cancer cell lines, PC3 and DU145, and cathepsin L + B activity. Biotech Histochem. 2004, 79: 121-127. 10.1080/10520290400010572
- Colella R, Casey SF: Decreased activity of cathepsins L + B and decreased invasive ability of PC3 prostate cancer cells. Biotech Histochem. 2003, 78: 101-108. 10.1080/10520290310001593856
- Friedrich B, Jung K, Lein M, Turk I, Rudolph B, Hampel G, Schnorr D, Loening SA: Cathepsins B, H, L and cysteine protease inhibitors in malignant prostate cell lines, primary cultured prostatic cells and prostatic tissue. Eur J Cancer. 1999, 35: 138-144. 10.1016/S0959-8049(98)00273-1
- Tian Y, Bova GS, Zhang H: Quantitative glycoproteomic analysis of optimal cutting temperature-embedded frozen tissues identifying glycoproteins associated with aggressive prostate cancer. Anal Chem. 2011, 83: 7013-7019. 10.1021/ac200815q
- Chen J, Xi J, Tian Y, Bova GS, Zhang H: Identification, prioritization and evaluation of glycoproteins for aggressive prostate cancer using quantitative glycoproteomics and antibody-based assays on tissue specimens. Proteomics. 2013, 13 (15): 2268-2277. 10.1002/pmic.201200541
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.