A proteomic repertoire of autoantigens identified from the classic autoantibody clinical test substrate HEp-2 cells

Background Autoantibodies are a hallmark of autoimmune diseases. Autoantibody screening by indirect immunofluorescence staining of HEp-2 cells with patient sera is a current standard in clinical practice. Differential diagnosis of autoimmune disorders is based on commonly recognizable nuclear and cytoplasmic staining patterns. In this study, we attempted to identify as many autoantigens as possible from HEp-2 cells using a unique proteomic DS-affinity enrichment strategy. Methods HEp-2 cells were cultured and lysed. Total proteins were extracted from cell lysate and fractionated with DS-Sepharose resins. Proteins were eluted with salt gradients, and fractions with low to high affinity were collected and sequenced by mass spectrometry. Literature text mining was conducted to verify the autoantigenicity of each protein. Protein interaction network and pathway analyses were performed on all identified proteins. Results This study identified 107 proteins from fractions with low to high DS-affinity. Of these, 78 are verified autoantigens with previous reports as targets of autoantibodies, whereas 29 might be potential autoantigens yet to be verified. Among the 107 proteins, 82 can be located to nucleus and 15 to the mitotic cell cycle, which may correspond to the dominance of nuclear and mitotic staining patterns in HEp-2 test. There are 55 vesicle-associated proteins and 12 ribonucleoprotein granule proteins, which may contribute to the diverse speckled patterns in HEp-2 stains. There are also 32 proteins related to the cytoskeleton. Protein network analysis indicates that these proteins have significantly more interactions among themselves than would be expected of a random set, with the top 3 networks being mRNA metabolic process regulation, apoptosis, and DNA conformation change. Conclusions This study provides a proteomic repertoire of confirmed and potential autoantigens for future studies, and the findings are consistent with a mechanism for autoantigenicity: how self-molecules may form molecular complexes with DS to elicit autoimmunity. Our data contribute to the molecular etiology of autoimmunity and may deepen our understanding of autoimmune diseases.

has been rather intriguing. Normally, the immune system reserves immune responses to attack invading microorganisms and protect the body. Due to unclear circumstances, however, the immune response can sometimes go astray and mistakenly attack its own tissue. From a molecular point of view, certain self-molecules are targeted, as if they are non-self, by autoreactive cells, autoantibodies (autoAbs), or other factors, which subsequently leads to damage of the tissue where the autoantigens (autoAgs) reside.
Among the hundreds of thousands of human molecules, only a small portion have been reported to be autoAgs. Moreover, the autoAgs appear to be a random collection of molecules that are expressed in different parts of human body and exhibit various biological functions. Thus, it is mysterious how these different molecules can all trigger a similar cascade of autoimmune responses. A key, missing mechanism is how non-antigenic self-molecules become autoantigenic non-self. Theoretically, a molecule may change itself in many ways by alternating its chemical composition or biochemical properties. A protein may change by mutation, glycosylation, phosphorylation, methylation, citrullination, or fusion with another protein. While simple compositional or structural changes can certainly generate a huge random mix of altered molecules, it cannot explain how they may induce autoimmune responses, let alone similar ones.
Based on our studies, we have proposed a uniform autoantigenicity mechanism by which autoAgs share a common biochemical property in their affinity to dermatan sulfate (DS), and by which autoAgs forming a molecular complex with DS to induce autoreactive B cell responses [1][2][3][4][5]. DS, a glycosaminoglycan polysaccharide, is expressed most abundantly in the skin and other connective tissues [6], and its expression has been reported to increase during high cellular turnaround, such as wound healing [7][8][9]. It is possible that DS is upregulated to facilitate dead cell clearance and new cell growth for tissue regeneration. We found that, when cells are dying or under stress, they express certain self-molecules to which DS has peculiar affinity [1]. By forming complexes with DS, these self-molecules are transformed from non-antigenic singular molecules to antigenic complexes. Furthermore, these DS-autoAg complexes are capable of stimulating autoreactive CD5 B cell proliferation and differentiation [1], likely via PKC-and PI3Kdependent signaling pathways [10,11].
To relate our findings to clinical utilities, this study examined proteins from HEp-2 cells, the classic substrate in the routine clinical tests for antinuclear autoantibodies (ANAs) [12]. HEp-2 cells were chosen for clinical tests because of their human origin, high mitotic activity, and ability to induce expression of clinically important autoantigens such as mitotic nuclear autoantigens. AutoAbs in patient sera are typically detected by indirect fluorescence staining of HEp-2 cells, and clinical diagnoses are based on about 30 commonly recognizable staining patterns, e.g., nuclear, cytoplasmic, homogenous, or speckled [12]. Despite its clinical popularity, this test lacks molecular details of the autoAg targets, as it is not known exactly which autoAgs are recognized by the autoAbs or whether a single or several autoAgs are recognized. Further tests of specific autoAgs by molecular assays such as ELISA are often performed using proteins extracted from HEp-2 cells or other means. This study was thus pursued to identify the repertoire of auto-Ags produced by HEp-2 cells for development of future molecular clinical tests.

HEp-2 cell culture and protein extraction
HEp-2 cell line was obtained from ATCC (Manassas, VA, USA). HEp-2 cells were cultured in Eagle's Minimum Essential Medium supplemented with 10% fetal bovine serum (ThermoFisher Scientific) and penicillin-streptomycin-glutamine mixture (ThermoFisher Scientific) at 37 °C in 75 cm 2 tissue culture flasks. About 100 million cells were harvested and used for protein extraction. Harvested cells were suspended in 10 mL of 50 mM phosphate buffer (pH 7.4) containing the Roche Complete Mini protease inhibitor cocktail (Sigma Aldrich). Cells were homogenized on ice with a microprobe sonicator until the turbid mixture became nearly clear with no visible cells left. The homogenate was centrifuged at 10,000g at 4 °C for 20 min, and the supernatant was collected as the total protein extract. Protein concentrations were measured with the Bio-Rad RC DC protein assay.

DS-Sepharose resin preparation
Dermatan sulfate (DS) (Sigma-Aldrich) were covalently coupled to EAH Sepharose 4B resins (GE Healthcare) as previously described [2][3][4]. Twenty-mL Sepharose 4B resins were washed with distilled water and then with 0.5 M NaCl solution. The resins were mixed with 100 mg of DS that was pre-dissolved in 10 mL of 0.1 M MES buffer (pH 5.0). The mixture was then added 0.58 g of N-(3-dimethylaminopropyl)-N'-ethylcarbodiimide hydrochloride (Sigma-Aldrich). The coupling reaction was carried out by end-over-end tube rotation at 25 °C for 24 h. After the first 60 min, the pH of the reaction mixture was adjusted to 5.0 with 0.1 M NaOH. After the coupling, the resins were washed three times, first with a low-pH buffer (0.1 M acetate, 0.5 M NaCl, pH 5.0) and then with a high-pH buffer (0.1 M Tris, 0.5 M NaCl, pH 8.0). The DS-Sepharose resins were packed in 10 mM phosphate buffer (pH 7.4) into a C 16/20 FPLC column (GE Healthcare Life Sciences).

DS-affinity fractionation
The total proteins extracted from HEp-2 cells were fractionated in a DS-Sepharose column with a BioLogic Duo-Flow system (Bio-Rad). About 40 mg of proteins in 40 mL of 10 mM phosphate buffer (pH 7.4; buffer A) were loaded into the DS-Sepharose column at a rate of 1 mL/ min. After loading, the column was washed with 60 mL of buffer A to remove excess and non-binding proteins, followed by eluting with 40 mL of 0.2 M NaCl in buffer A to further remove very weakly binding proteins. Proteins with low to high DS-affinity were eluted with sequential 40-mL step gradients of 0.4 M, 0.6 M, and 1.0 M NaCl in buffer A. Fractions were desalted and concentrated to 0.5 mL with 5-kDa cut-off Vivaspin 20 centrifugal filters (Sartorius). The protein concentration of each fraction was measured. Fractionated proteins were further separated by 1-D SDS PAGE using 4-12% NuPAGE Novex Bis-Tris gels with MES running buffer (Invitrogen). Each gel lane was divided into three sections and subjected to protein sequencing.

Mass spectrometry sequencing
Protein sequencing was performed at the Taplin Biological Mass Spectrometry Facility at Harvard Medical School (Boston, MA, USA). Protein gel sections were cut into 1-mm 3 pieces, dehydrated with acetonitrile, and dried in a speed-vac. The gel pieces were rehydrated with 50 mM NH 4 HCO 3 containing 12.5 μg/mL modified sequencing-grade trypsin (Promega) at 4 °C for 45 min. Tryptic peptides were separated on a nano-scale C 18 HPLC capillary column and analyzed after electrospray ionization in an LTQ linear ion-trap mass spectrometer (Thermo Scientific). Peptide sequences and protein identities were assigned by matching the measured fragmentation patterns with protein or translated nucleotide databases using Sequest software. Peptides were required to be fully tryptic peptides with XCorr values of at least 1.5 for 1 + ion, 1.5 for 2 + ion, or 3.0 for 3 + ion. All data were manually inspected. Only proteins with ≥ 2 peptide matches were considered positively identified.

Literature search for autoantigen confirmation
Extensive literature searches with PubMed were carried out to identify whether proteins identified in this study had been previously reported as autoantibody-targeted autoantigens. Keywords in the searches included the protein name and the MeSH term "autoantibodies. " When a particular protein name did not yield any results, alternative protein names were obtained from the Uniprot database and used as keywords for repeated searches. In the case of no results, gene names and alternative gene names were used as additional keywords in searches. When multiple reports for an autoantigen were found, reports with the most relevance or open access and free text were preferably cited in this paper.

Pathway and process enrichment analysis by Metascape
The collection of DS-affinity enriched proteins identified from this study was analyzed with Metascape, a gene annotation and analysis resource [13]. Pathway and process enrichment analysis was carried out with KEGG Pathway, GO Biological Processes, Reactome Gene Sets, Canonical Pathways, CORUM, TRRUST, DisGeNET, and PaGenBase. All genes in the genome were used as the enrichment background. Terms with a p value < 0.01, a minimum count of 3, and an enrichment factor (ratio between the observed counts and the counts expected by chance) > 1.5 were collected and grouped into clusters based on their membership similarities. p-values were calculated based on the accumulative hypergeometric distribution, and q-values were calculated using the Banjamini-Hochberg procedure to account for multiple testings. Kappa scores were used as the similarity metric when performing hierarchical clustering on the enriched terms, and sub-trees with a similarity of > 0.3 were considered a cluster. The most statistically significant term within a cluster was chosen to represent the cluster. Protein-protein interaction enrichment analysis was carried out with BioGrid, InWeb_IM, and OmniPath. The resultant network contained the subset of proteins that form physical interactions with at least one other member in the list. The Molecular Complex Detection (MCODE) algorithm was applied to identify densely connected network components. Pathway and process enrichment analysis was applied to each MCODE component independently, and the three best-scoring terms by p-value have been retained as the functional description of the corresponding components.

Protein-protein association network analysis by STRING
The collection of proteins identified from this study were also analyzed with STRING, a database of known and predicted protein-protein interactions [14]. Specific and meaningful protein-protein associations represent proteins jointly contributing to a shared function, but not necessarily physically binding to each other. The database currently covers 24,584,628 proteins from 5090 organisms. Known interactions are sourced from curated databases (metabolic pathways, protein complexes, signal transduction pathways, etc.), experimental evidence (co-purification, co-crystallization, Yeast2Hybrid, genetic interactions, etc.), gene neighborhoods (groups of genes that are frequently observed in each other's genomic neighborhood), gene fusions (genes that are sometimes fused into single open reading frames), gene co-occurrence (gene families whose occurrence patterns across genomes show similarities), gene text mining (automated, unsupervised searching for proteins that are frequently mentioned together), and co-expression (proteins whose genes are observed to be correlated in expression, across a large number of experiments).

DS-affinity fractionation enriches certain HEp-2 proteins
Proteins were extracted from freshly cultured HEp-2 cells and fractionated with DS-affinity resins. Since DS molecules are poly-anionic, ionic interactions are expected to be a main contributor to DS affinity. We therefore developed an NaCl salt step-gradient method to sequentially dissociate and elute proteins with from DS resins. The identities of proteins in each DS-affinity enriched fraction were obtained by mass spectrometry sequencing.

Proteins with high DS-affinity
From fractions eluted with 1.0 M NaCl from DS-affinity resins, only 7 proteins were identified by mass spectrometry sequencing (Table 1). These include two histone proteins (H4 and H2BE), Scl-70, and Ro/SS-A, all of which are among the most classical nuclear autoantigens (Table 1). Three isoforms of ribosomal proteins are also identified, including 60S ribosomal proteins (L6 and L7) and 40S ribosomal protein S9. Ribosomal proteins are also a class of well-known autoantigens, and heterogenous forms have been reported to be recognize autoantibodies, however, it is not clear exactly which and how many isoforms are autoantigens. L6 and L7 have been reported as autoantigens, whereas S9 awaits further confirmation (Table 1).

DS affinity strongly enriches for autoantigenic proteins
Overall, this study identified a total of 107 proteins from DS-affinity enrichment of HEp-2 cellular extracts (Table 1). Based on current literature, 78 of 107 (72.9%) proteins are confirmed autoantigens. The rest (29 proteins) are potential autoantigens yet to be verified in future studies.

The DS-affinity HEp-2 proteome shows protein network and functional enrichment
To understand the set of DS-affinity-associated proteins identified in this study, we performed various protein network analyses. From protein association network STRING analysis, 106 nodes (proteins, with HSP90AA2 not recognized) gave rise to 330 edges (protein-protein associations) vs. the expected 142 edges (with average node degree of 6.23, and average local clustering coefficient of 0.531, and PPI enrichment p-value of < 1.0e−16). The STRING analysis results indicate that the set of proteins identified from this study have significantly more interactions among themselves than what would be expected of a random set of proteins of similar size drawn from the genome. This insight suggests that DS-affinity proteins may share certain biological functions. While not definitive, it may also indicate that these proteins share functional or biochemical properties (i.e., forming macromolecular charge affinity complexes) and perhaps immunological properties   [2,75] (we have previously shown that DS-autoAg complexes stimulate autoreactive B cells and autoantibody production [1][2][3][4]).
The proteins identified from this study are not randomly distributed, but rather can be classified into 4 clusters, chromosome-binding, RNA-binding, vesicle, and cytoskeleton (Fig. 1a). Based on GO Molecular Function and Cellular Component analysis, nuclear proteins are the most significant group. Of the 107, 82 proteins can be found in the nucleus, including 36 DNA-binding, 7 histone-binding, and 28 RNA-binding proteins. In particular, 17 proteins are expressed in the M phase of cell cycle (Fig. 1b). According to Reactome Pathways, 24 proteins can be involved in cell cycle, with 20 potentially attributable to the mitotic phase and 17 protein attributable to the G2/M check points. In addition to nuclear proteins, another prominent group is related to vesicles/granulates. This group comprises of 48 proteins, including 35 vesicle components, 20 vesicle-mediated transporters, and 12 ribonucleoprotein granule proteins (Fig. 1c). Another prominent group are proteins associated with cytoskeleton organization (32/107).

Discussion
Autoantigens are the key molecules in autoimmunity and autoimmune diseases, as they are the tissue-resident targets of autoimmune responses and autoimmune diseases. The mechanism by which the normally non-antigenic self-molecules become auto-antigenic holds a key to the understanding of autoimmunity. We proposed that dermatan sulfate has particular affinity for autoAgs and that DS can convert self-molecules to autoAgs by forming DS-autoAg molecular complexes to induce immune responses, and hence any self-molecule with affinity to DS would have the potential to be an autoAg [1]. Using this unifying mechanism of autoantigenicity, we have developed a DS-affinity enrichment strategy to identify potential autoAgs in cell lines and animal organs [2][3][4].
In this study, we tested whether this strategy would enable us to identify a profile of known autoantigens and perhaps to uncover unknown potential autoantigens from HEp-2 cells.
HEp-2 cells are the classic substrate in clinical tests of autoantibodies for autoimmune diseases [12]. In the indirect immunofluorescence antibody test, microscope glass slides are coated with HEp-2 cells, and human serum is incubated with the HEp-2 cells to allow serum autoantibodies to react with autoantigens in HEp-2 cells. The binding of autoantibodies is detected by fluorescently tagged anti-human Igs and viewed under a microscope. HEp-2 cells give rise to ~ 30 nuclear and cytoplasmic staining patterns associated with various autoimmune conditions (HEp-2 Image Library of University of Birmingham and [12]).
Nuclear patterns from ANAs (antinuclear antibodies) are the most commonly found, ranging from homogeneous, peripheral and nuclear rim, centromere, nuclear pores, to speckled pattern. A few unique cell cycle specific patterns are only identifiable in dividing and mitotic cells, revealing autoantigens such as those expressed in metaphase centrioles and mitotic spindles. Cytoplasmic staining patterns range from Golgi apparatus, mitochondrial, rods and rings, to uncharacterized patterns. Cytoplasmic fiber staining patterns are typically found in association with actin, vimentin, tubulin, cytokeratin, and tropomyosin, all of which we have found in this study.
Despite its wide utility in clinical autoantibody screening for autoimmune diseases, HEp-2 indirect fluorescence test is significantly limited by the lack of molecular specificity. While some patterns are known to be associated with certain autoantibodies/autoantigens, it remains to be better characterized which and how many autoantigens are involved with each pattern. In clinical diagnosis, a particular staining pattern can appear from patients with different autoimmune diseases. For example, a nuclear homogeneous pattern can be from patients with systemic lupus erythematosus, chronic autoimmune hepatitis, or juvenile idiopathic arthritis, and a fine speckled pattern can be from patients with Sjogren's syndrome, subcutaneous or neonatal lupus erythematosus, or congenital heart block, or other overlap syndrome [12]. For further differentiated diagnoses, samples displaying homogenous patterns are further analyzed by molecular assays such as ELISA for anti-dsDNA, anti-histone, and anti-ENA, whereas samples displaying speckled patterns are further analyzed for anti-ENA, anti-SSA, anti-SSB, and anti-dsDNA.
In this study, we identified various commonly found autoantigens with known associations to different HEp-2 cell staining patterns. For example, histones are mostly involved in homogeneous patterns, and lamin A, B, and C are involved in membranous nuclear rim patterns. Large, variable sized speckles in sponge-like patterns are usually associated with hnRNP, whereas fine speckles are mostly due to SSA/Ro, SSB/La, RNA polymerases and others. Cytoplasmic fiber staining The top four protein-protein interaction networks and components identified by MCODE algorithm of Metascape analysis. Red network is most likely involved in cellular response to heat stress, vesicle-mediated transport, and kinase maturation complex 1. Blue network is most likely involved in PID BARD1 pathway, DHX9-ADAR-vigillin-DNA-PK-Ku antigen complex, and Nop56p-associated pre-rRNA complex. Green network is mostly likely involved in mRNA splicing and C complex spliceosome. Purple network is most likely involved in Nop56p-associated pre-rRNA complex, actin-mediated contraction, and actin filament-based movement patterns are in association with actin, vimentin, tubulin, cytokeratin, and tropomyosin.
In this study, we also identified a number of interesting potential autoantigens. For example, we identified the family of 14-3-3 proteins that have been reported as autoantigens elsewhere (Table 1), but their association with any particular HEp-2 staining pattern has not been described. As another example, mitotic cell cycle-related patterns have thus far remain poorly understood, whereas our study identified 14 proteins could potentially contribute to mitotic cell staining patterns (PCNA, XRCC5, XRCC6, PRKDC, MCM6, MCM3, MCM2, TUBB4B, IQGAP1, and others) (Fig. 1b). As a third example, COPA mutations impair ER-Golgi transport and cause hereditary autoimmune mediated lung disease and arthritis, and four deleterious variants in the COPA gene (encoding coatomer subunit alpha) were identified in families with an apparent Mendelian syndrome of autoimmunity characterized by high-titer autoantibodies [15]. Circulating anti-COPE (coatomer protein complex subunit epsilon) has been identified as a potential marker for cardiovascular and cerebrovascular events in patients with obstructive sleep apnea [16]. Our study identified two members, COPG1 and COPB2, as potential autoantigens (Table 1).
In addition to cellular staining location, this study also revealed possible functional association of auto-Ags identified from DS-affinity. As derived from protein-protein interaction network and pathway analyses, the collection of proteins identified in this study are most likely involved in mRNA metabolism and apoptosis (Fig. 2a). Although the former has no clear role in autoimmunity, apoptosis has established connections to autoimmunity and autoimmune diseases. Apoptosis is well recognized as a source of autoAgs [17]. Our previous study also showed clear evidence that DS is particularly attracted to apoptotic cells and their autoAgs expressed in these cells [1]. In regard to the significant role of apoptosis in autoimmunity, our current findings have reached a consistent conclusion with previous reports by others and us.
It is estimated that there are ~ 20,000 protein-coding genes in the human genome, and ~ 10,000 of these protein-coding genes are expressed in a typical cell. After DS-affinity fractionation, we identified only a small but specific subset 107 proteins (~ 1% of the expressed cellular proteome) from fractions with low to high DS affinity. With 73% of these DS-affinity proteins having verified autoantigenicity, this study demonstrates the powerful feasibility of DS-affinity enrichment for identifying potential autoantigens.

Conclusions
By DS-affinity enrichment of autoantigens from HEp-2 cellular protein extracts, this study identified 107 proteins, with 78 being verified and 29 being potential autoantigens. These proteins are not a random pool, but rather are clustered in the nucleus, vesicles, and cytoskeleton. This set of proteins shows significantly more interactions than random sets of proteins, revealing apoptosis as a prominent underlying process. Results from this study are consistent with our prior work on autoimmunity [1][2][3][4][5] and provide further support for a more general principle of autoantigenicity on how self-molecules become non-self autoAgs, which may help unravel the molecular etiology of autoimmunity and deepen our understanding of autoimmune diseases. The exact association of these proteins with HEp-2 staining patterns and associated diseases merits extensive investigation in future studies. This study also provides many interesting potential autoantigens for future studies.