09-558 Project Manager: D. C. Jones
COMPARATIVE EVOLUTIONARY PROTEOMICS OF COTTON
Jonathan Wendel, Iowa State University
Polyploidization is prevalent in higher plants, and all angiosperm species have undergone at least two rounds of polyploidization during their evolutionary history. Within the last decade it has become clear that polyploidization often triggers dramatic, large-scale responses (Wendel & Doyle, 2004; Comai, 2005), including structural and epigenetic modifications (Shaked et al., 2001; Gaeta et al., 2007; Buggs et al., 2009; Ha et al., 2009; Schnable et al., 2011) as well as changes in gene expression (Wang et al., 2006; Bottley & Koebner, 2008; Flagel et al., 2008; Hovav et al., 2008; Flagel et al., 2009; Rapp et al., 2009; Flagel & Wendel, 2010; Koh et al., 2010; Flagel et al., 2012). Although these profound genomic and transcriptomic consequences have been extensively studied in many polyploid plants, including cotton, little is known about the translated gene products, i.e., the proteome, which directly executes cellular biochemical reactions and physiological activities. Because protein levels are influenced by post-translational modification and inherent variation in stability, it is difficult to infer the abundance and activity of proteins, or of the metabolic pathways in which they participate, from transcriptomic data alone. Our project therefore aims to extend evolutionary analyses of polyploidy into the realm of proteomics.
The cotton genus Gossypium includes 45 diploid species divided into 8 genome groups (A-G, K), and 5 allotetraploid species (AD), the latter originating from a polyploidy event between an A-genome diploid and a D-genome diploid species 1-2 million years ago. Among the allotetraploid species, G. hirsutum (the source of Upland cotton) and G. barbadense (the source of Pima cotton) were independently domesticated approximately 5000 years ago (Wendel & Cronn, 2003). This well-documented evolutionary framework, coupled with the substantial genomic and transcriptomic resources available, makes Gossypium an excellent system to extend research on evolutionary processes to the proteomic level. Given the agronomic and economic importance of cotton, the proteomic profiling of key developmental processes, such as seed filling and fiber growth, provides valuable baseline data for crop improvement.
As originally described, our specific research objectives are: 1) to develop technology and tools for describing and studying the cotton fiber and seed proteome, 2) to begin to describe the cotton proteome from the standpoint of fiber development, and how this correlates with existing information on the transcriptome, 3) to understand how the proteome responds to genome doubling; that is, what is novel about polyploid cotton fiber and seed relative to that of its antecedent diploids, and 4) to detail the proteomic consequences of cotton fiber evolution and domestication.
To accomplish these objectives, our proteomic analyses include 1) protein profiling of fiber development using wild and domesticated G. hirsutum; 2) protein profiling of fiber development using wild and domesticated G. barbadense; and 3) comparative proteomic analysis of allopolyploid cotton (AD genome) relative to the two diploid progenitors, G. arboreum (A genome) and G. raimondii (D genome). Using two complementary approaches, two-dimensional gel electrophoresis (2-DE) and Isobaric Tag for Relative and Absolute Quantitation (iTRAQ), a comprehensive research platform was established to conduct proteomic experiments as shown below. As of 2012, we have completed all iTRAQ and most of the 2-DE experiments for all analyses, and we continue to focus on data analysis and manuscript preparation.
Two allotetraploid species, G. hirsutum and G. barbadense, were independently domesticated by ancient human cultures for longer, finer, stronger and whiter seed fibers; their cultivars now account for approximately 99% of cotton production worldwide. Understanding the molecular basis of these parallel morphological and physiological transformations is relevant to both evolutionary biology and crop improvement.
As previously reported, proteomic profiling of fiber development in wild versus domesticated G. hirsutum was conducted using 2-DE analysis with twenty-four gels (2 accessions X 4 developmental stages X 3 biological replicates) and triplicated 8-channel iTRAQ coupled with LC-MS/MS analyses. To collect comprehensive, high-quality spectral data, iTRAQ-labeled peptide mixtures were successively submitted to the quadrupole time-of-flight QSTAR XL, QSTAR Elite and Thermo LTQ Orbitrap MS systems from 2011 to April 2012. Multiple proteomic analysis programs, including Mascot, Proteome Discoverer, ProteoIQ, Scaffold Q+S and ProteinPilot, were tested and compared using the collected data, and ProteinPilot was chosen because it gave the best results in identifying proteins and extracting quantitative signals. Data integration and statistical analyses are under way, and the results are still being processed and interpreted; together with the 2-DE results, they are expected to be described in a manuscript titled "2-DE and iTRAQ based comparative proteomics of fiber domestication in G. hirsutum".
In parallel, comparative profiling of fiber proteomes in wild and domesticated G. barbadense was conducted in a manner similar to the iTRAQ analyses for G. hirsutum. Three biological replicates of fiber proteins from an elite cultivar, Pima S-7, and a wild accession, K101, over a developmental time course (5, 10, 20 and 25 dpa) were independently labeled, pooled and subjected to tandem MS analyses, followed by a thorough search of the spectral data that considered amino acid substitutions and post-translational modifications. Using an in-house inclusive database comprising NCBI angiosperm protein sequences and 6-frame translations of various Gossypium EST/contig assemblies, in 2011 we reported a total of 62,769 spectra and 2159 protein accessions identified with above 95% confidence at approximately 1% false discovery rate. With access to the recently released genome sequence of the D-genome diploid G. raimondii and RNA-seq data from the A-genome diploid G. arboreum (Paterson et al., 2012), we constructed a Gossypium diploid protein database and re-submitted the spectral data for identification. The new database allowed 22% more spectra to be assigned and improved the identification of non-redundant proteins. Comparison of the identified protein lists also suggested that using the Gossypium diploid protein database would resolve the cotton fiber proteome more comprehensively and lead to more accurate functional and quantitative analyses. The analyses and results that follow are therefore based on the Gossypium diploid protein database. Functional categorization based on the PANTHER protein classification system (Mi et al., 2010) revealed that the identified fiber proteome represents most of the functional protein classes encoded in the Gossypium genome (Paterson et al., 2012), except for categories such as extracellular protein and viral protein.
Along with protein identification, a total of 960 proteins were quantitatively profiled through fiber development in wild versus domesticated G. barbadense. As shown in Figure 2, protein expression was compared between adjacent developmental stages within each accession, as well as between the wild and domesticated accessions at each developmental stage. Overall, the domesticated Pima S-7 displayed less expression variation during fiber development (151 proteins, 15.7% of the 960 quantified) than did the wild K101 accession (198 proteins, 20.6%). The greatest number of proteins differentially expressed between the wild and domesticated accessions occurs during fiber initiation (5 dpa, 122 proteins), followed by fewer expression changes at elongation (10 dpa, 76 proteins) and at the transition to secondary cell wall synthesis (20 dpa, 64 proteins). By 25 dpa, the number of differentially expressed proteins rises to 104, which suggests that domestication altered a larger number of proteins associated with secondary cell wall synthesis.
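The percentages quoted above follow directly from the reported counts. The short Python check below uses only numbers from this section; the variable and function names are illustrative, and expressing the per-stage counts as a share of the 960 quantified proteins is an assumption made here, not something the report states.

```python
# Counts reported in this section; names are illustrative only.
QUANTIFIED = 960  # proteins quantitatively profiled

# Proteins changing between adjacent developmental stages, per accession.
changed_during_development = {
    "Pima S-7 (domesticated)": 151,
    "K101 (wild)": 198,
}

# Proteins differing between wild and domesticated at each stage (dpa).
changed_by_domestication = {5: 122, 10: 76, 20: 64, 25: 104}


def percent(count, total=QUANTIFIED):
    """Return count as a percentage of total, rounded to one decimal."""
    return round(100.0 * count / total, 1)


for accession, count in changed_during_development.items():
    print(f"{accession}: {count} proteins ({percent(count)}%)")

# Assumption: the same 960-protein denominator applies per stage.
for dpa, count in changed_by_domestication.items():
    print(f"{dpa} dpa: {count} proteins ({percent(count)}%)")
```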
Together with other completed analyses, including homoeologous protein expression, post-translational modifications and co-analysis with transcriptomic data, the G. barbadense results are described in detail in our manuscript in preparation, titled "Proteomic Profiling of Developing Cotton Fibers from Wild and Domesticated Gossypium barbadense".
With the baseline data generated from fiber proteomes in G. hirsutum and G. barbadense, we designed comparative analyses of fiber proteomes at two developmental stages (10 and 20 dpa) in AD-genome species relative to their diploid progenitor A and D genomes. Using the same biologically replicated protein samples, most of the twenty-four 2-DE gels (4 genomes X 2 developmental stages X 3 biological replicates) have been completed, and all iTRAQ experiments have been finished, generating three times more spectral data than the G. barbadense project. With this large dataset and the results to come from it, we aim to provide a thorough description of how cotton fiber proteomes respond to genome merger and doubling, thereby expanding our understanding of the molecular basis and consequences of polyploidization.
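The gel count in this design is the product of its three factors. As a minimal sketch, the full factorial layout can be enumerated in Python as follows; the genome labels are placeholders, since the report gives only the number of genomes, and the sample naming scheme is hypothetical.

```python
from itertools import product

# Placeholder labels: the report states 4 genomes, 2 stages and
# 3 biological replicates, but does not list sample names.
genomes = ["genome_1", "genome_2", "genome_3", "genome_4"]
stages_dpa = [10, 20]
replicates = [1, 2, 3]

# One 2-DE gel per genome x stage x replicate combination.
gels = [
    f"{genome}_{dpa}dpa_rep{rep}"
    for genome, dpa, rep in product(genomes, stages_dpa, replicates)
]

print(len(gels))  # number of gels in the full factorial design
print(gels[:3])   # first few hypothetical sample labels
```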
In addition to fiber proteomes, our project uses cotton seeds as an independent system to study protein accumulation in the context of polyploidy. In 2011, our research article "Genomically biased accumulation of seed storage proteins in allopolyploid cotton" was published, demonstrating a biased expression pattern of D-genome-derived proteins in the allopolyploid AD genome that is evident in mature seeds. To better understand the proteomic consequences of polyploidization in cotton seeds, we included developing seed tissues sampled through the seed filling process (10, 20, 30 and 40 dpa) for proteomic profiling. Because it is difficult to design and conduct 2-DE or iTRAQ experiments that quantitatively analyze thirty-six samples (3 genomes X 4 developmental stages X 3 biological replicates), we pre-fractionated proteins from each sample by SDS-PAGE into multiple segments, and subjected the in-gel trypsin-digested peptides eluted from each segment to LC-MS/MS analysis. Based on MS/MS spectral counting of the peptides detected in each sample, proteins can be quantified to compare expression levels (Demartini et al., 2011). In 2012, we initiated and completed all protein experiments and LC-MS/MS runs for this analysis, and a total of 3290 proteins were identified from the Gossypium developing seed proteomes.
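Quantification by spectral counting can be sketched as follows. The report does not state which normalization was applied, so this example assumes a normalized spectral abundance factor (NSAF): each protein's spectral count is divided by its length, and the result is divided by the sum of those ratios within the sample. The function name, protein names, lengths and counts are all invented for illustration and are not values from this project.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor for one sample.

    spectral_counts: dict of protein -> number of MS/MS spectra assigned
    lengths: dict of protein -> protein length in amino acids
    Returns a dict of protein -> NSAF; values sum to 1 within the sample.
    """
    # Spectral abundance factor: spectra per residue for each protein.
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("sample has no assigned spectra")
    return {p: value / total for p, value in saf.items()}


# Hypothetical toy data for two samples of a seed filling time course.
lengths = {"protein_a": 500, "protein_b": 250, "protein_c": 1000}
sample_10dpa = {"protein_a": 50, "protein_b": 25, "protein_c": 100}
sample_40dpa = {"protein_a": 100, "protein_b": 25, "protein_c": 100}

early = nsaf(sample_10dpa, lengths)
late = nsaf(sample_40dpa, lengths)

for protein in lengths:
    print(protein, round(early[protein], 3), round(late[protein], 3))
```

Normalizing within each sample makes values comparable across runs that yielded different total numbers of spectra, which matters when thirty-six samples are analyzed separately.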
This year we continued to explore the cotton seed and fiber proteomes to ask how they have been shaped by natural evolutionary history and by human-mediated selection during crop improvement. For all major components proposed in this project, we have completed all protein experiments and mass spectrometry analyses, except for a few 2-DE gels that are expected to be finished in early 2013. Our work has focused on data analysis and on writing manuscripts that describe the qualitative and quantitative protein profiles in developing fibers and seeds and characterize post-translational modifications and other proteomic features with respect to cotton evolution; these are to be completed in 2013.
Project Year: 2012