Re-normalize or otherwise re-smooth raw data
Perform comparative genome analysis involving a direct comparison of in-house generated research or clinical data to public datasets (for example, clustering of patient tumor genomes with TCGA tumor genomes)
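As a rough illustration of the comparison described above, the sketch below ranks public samples by how closely their profiles match an in-house sample. It uses a nearest-neighbor lookup with correlation distance as a simpler stand-in for full clustering. The sample identifiers, feature values, and function names are all hypothetical and do not come from any TCGA file format or API.

```python
import math


def correlation_distance(a, b):
    """Return 1 - Pearson correlation between two equal-length profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(var_a * var_b)


def nearest_public_samples(query, public, k=2):
    """Rank public sample IDs by correlation distance to the query profile."""
    ranked = sorted(public, key=lambda sid: correlation_distance(query, public[sid]))
    return ranked[:k]


# Hypothetical toy profiles (for example, normalized values for four features).
in_house = [5.0, 1.0, 4.0, 0.5]
public = {
    "public_A": [5.2, 0.9, 4.1, 0.4],
    "public_B": [0.5, 4.8, 1.0, 5.1],
    "public_C": [4.0, 2.0, 3.5, 1.0],
}

closest = nearest_public_samples(in_house, public, k=1)
full_ranking = nearest_public_samples(in_house, public, k=3)
```

In practice the in-house and public data would first need to be normalized the same way, otherwise batch effects may dominate the distances.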
In this open-access article, published in October 2013 in Applied Mathematics (Biomathematics issue, http://www.scirp.org/journal/am/), we show how the human genome must be considered as a numerical whole. The idea now is to run this kind of analysis on complete genome DNA from cancer cells (LOH) at both the individual-chromosome and whole-genome scales.
Allow patients and their doctors to access data about them securely
Longitudinal sequencing: obtain samples from patients at different time points, for example biopsies at diagnosis, pretreatment, post-treatment, and relapse.
Each tumor contains multiple clones, and genomic alterations are linked to these clones. Clinical factors such as survival and drug response should be correlated with patients at the clonal level rather than the whole-tumor level.
Construct a background mutation rate (noise) model based on the correlation of mutation frequency with expression level and replication time. It has been shown that later replication time and lower expression levels are associated with higher mutation rates across the genome (http://www.nature.com/nature/journal/v499/n7457/full/nature12213.html). Transcription-coupled DNA repair leads to lower mutation rates in highly expressed genes. So I ...
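One way to picture such a background model is the minimal sketch below. For each gene it pools mutation counts from the genes that look most similar in expression and replication time, then uses the pooled rate as that gene's background estimate. This is only one possible approach, not the method of the cited paper. The gene names, field names, and numbers are hypothetical, and `rep_time` is assumed to be larger for later-replicating regions.

```python
import math


def zscores(values):
    """Scale values to zero mean and unit variance so covariates are comparable."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    sd = math.sqrt(var) or 1.0  # fall back to 1.0 for constant input
    return [(v - mean) / sd for v in values]


def background_rates(genes, k=1):
    """Estimate a per-gene background rate from its k nearest covariate neighbors."""
    expr = zscores([g["expression"] for g in genes])
    rept = zscores([g["rep_time"] for g in genes])
    rates = {}
    for i, gene in enumerate(genes):
        def dist(j):
            # Squared Euclidean distance in (expression, replication time) space.
            return (expr[i] - expr[j]) ** 2 + (rept[i] - rept[j]) ** 2

        # The gene itself has distance 0, so the pool is the gene plus k neighbors.
        pool = sorted(range(len(genes)), key=dist)[: k + 1]
        mutations = sum(genes[j]["mutations"] for j in pool)
        bases = sum(genes[j]["bases"] for j in pool)
        rates[gene["name"]] = mutations / bases
    return rates


# Hypothetical toy data: counts of presumed passenger mutations per gene.
genes = [
    {"name": "gene_a", "expression": 10.0, "rep_time": 0.1, "mutations": 1, "bases": 1000},
    {"name": "gene_b", "expression": 9.0, "rep_time": 0.2, "mutations": 3, "bases": 1000},
    {"name": "gene_c", "expression": 1.0, "rep_time": 0.9, "mutations": 10, "bases": 1000},
    {"name": "gene_d", "expression": 2.0, "rep_time": 0.8, "mutations": 14, "bases": 1000},
]

rates = background_rates(genes, k=1)
```

Pooling across similar genes trades some specificity for stability: a single gene rarely carries enough mutations to estimate its own background rate reliably.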
Provide access to high-resolution images and dilution curves for TCGA protein expression arrays for analysis
Provide access to TCGA microsatellite instability data for analysis
One backend framework for custom analysis tools that can be developed by the community.
Provide CPU time for the community, both through allocations that NCI sets for users and on a pay-per-node model.
Please provide a process for deploying databases and web applications such as MSKCC cBio or ISB's Regulome Explorer or GeneSpot.
(You could use GitHub as a platform for accepting contributions.)
In addition to clinical data, tie in claims data. Test feasibility of using CMS virtual data center in conjunction with the NCI cloud to link data. Other multipayer claims databases may also offer longitudinal claims histories.
Bring in statistical data, particularly from longitudinal studies (NLSY, HRES, NHANES) and those that have collected biospecimens. (Develop a standardized re-consent form.)
Subsets of large data sets should be provided for download to test local tools and for development of pipelines before they are uploaded to the cloud.
Correlate expression data from multiple reporters from multiple subjects with genotyping data
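A minimal sketch of this kind of correlation, assuming genotypes are coded as allele dosages (0, 1, or 2) and that every list is ordered by the same subjects. The variant and reporter names are hypothetical, and a real analysis would also need covariates and multiple-testing correction.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists; NaN if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def correlate_reporters(genotypes, expression):
    """Correlate every variant's dosage vector with every reporter's expression."""
    return {
        (variant, reporter): pearson(dosages, levels)
        for variant, dosages in genotypes.items()
        for reporter, levels in expression.items()
    }


# Hypothetical toy data for six subjects.
genotypes = {"variant_1": [0, 1, 2, 0, 1, 2]}
expression = {
    "reporter_up": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    "reporter_down": [3.0, 2.0, 1.0, 3.0, 2.0, 1.0],
}

results = correlate_reporters(genotypes, expression)
```

Each key in `results` is a (variant, reporter) pair, so reporters that track genotype in opposite directions show up with opposite signs.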