CMPD: Cancer Mutant Proteome Database Release!

http://cgbc.cgu.edu.tw/cmpd

CMPD is designed to provide a comprehensive, integrated and well-annotated resource focused on protein sequence-altering variations arising from both germline and cancer-associated somatic variants. The mutated protein sequence pool is based on the exome sequencing results of the NCI-60 cell lines, the Cancer Cell Line Encyclopedia (CCLE) and 5,600 more cases from 20 TCGA cancer genomics studies. The identified genetic alterations (SNVs and InDels) were converted to all plausible mutated protein sequences according to each altered transcript. CMPD also provides an intuitive and flexible cascade architecture for prioritizing candidate targets of interest. The mutated protein sequences resulting from these SNVs and InDels were predicted and added to the CMPD database, from which they can be retrieved and used in mass spectrometry (MS)-based sequence database searches.
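The conversion from a genetic alteration to a mutated protein sequence can be illustrated with a small Python sketch. This is not CMPD's actual pipeline, which is not described here; it is a minimal example that assumes the coding sequence (CDS) of a transcript is already known, that variants are given as a 1-based CDS position with reference and alternate alleles, and that the standard genetic code applies. The function names `translate` and `apply_variant` and the toy sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a CDS into protein, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def apply_variant(cds, pos, ref, alt):
    """Replace `ref` with `alt` at 1-based CDS position `pos`.

    Covers SNVs (len(ref) == len(alt) == 1) and simple InDels
    written with a shared leading base (an assumed convention).
    """
    start = pos - 1
    if cds[start:start + len(ref)] != ref:
        raise ValueError("reference allele does not match the CDS")
    return cds[:start] + alt + cds[start + len(ref):]


# Toy transcript (hypothetical): Met-Ala-Lys-stop
cds = "ATGGCCAAGTGA"
wild_type = translate(cds)

# SNV: C>T at CDS position 5 changes codon GCC (Ala) to GTC (Val).
snv_protein = translate(apply_variant(cds, 5, "C", "T"))

# Deletion: GC>G at CDS position 4 removes one base and shifts the frame.
del_protein = translate(apply_variant(cds, 4, "GC", "G"))

print(wild_type, snv_protein, del_protein)
```

In this toy case the SNV yields a single amino-acid substitution, while the one-base deletion shifts the reading frame and changes every residue downstream of it. A real workflow would also have to handle strand orientation, splice junctions and multiple transcripts per gene, none of which are covered by this sketch.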

Whole-exome sequencing, which centres on the protein-coding regions of disease- and cancer-associated genes, is the most cost-effective method to date for deciphering the association between genetic alterations and diseases. Large-scale whole-exome and whole-genome sequencing projects have been launched by various institutions, such as NCI, the Broad Institute and TCGA, to provide a comprehensive catalogue of coding variants in diverse tissue samples and cell lines. Further functional and clinical interrogation of these sequence variations must rely on extensive cross-platform integration of sequencing information and on a proteome database that explicitly and comprehensively archives the corresponding mutated peptide sequences. While such a data resource is critical for the mass spectrometry-based proteomic analysis of exomic variants, no database is currently available for the collection of mutant protein sequences that correspond to recent large-scale genomic data. To address this issue and serve as a bridge between genomic and proteomic datasets, CMPD (http://cgbc.cgu.edu.tw/cmpd) has collected over 2 million genetic alterations. This collection not only facilitates the confirmation and examination of potential cancer biomarkers but also provides an invaluable resource for translational medicine research and opportunities to identify the mutated proteins encoded by mutated genes.