Re-normalize or otherwise re-smooth raw data
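As one illustration of what re-normalizing could look like, here is a minimal sketch of quantile normalization across samples. The suggestion above does not name a method, so the choice of quantile normalization, the function name `quantile_normalize`, and the input layout (a dict of equal-length value lists) are all assumptions. Ties are broken by sort order rather than averaged.

```python
def quantile_normalize(samples):
    """Quantile-normalize equal-length value lists keyed by sample name.

    Hypothetical helper for illustration; ties are not averaged.
    """
    names = list(samples)
    n = len(samples[names[0]])
    if any(len(samples[name]) != n for name in names):
        raise ValueError("all samples must have the same number of values")

    # Mean of the k-th smallest value across all samples, for each rank k.
    sorted_cols = [sorted(samples[name]) for name in names]
    rank_means = [sum(col[k] for col in sorted_cols) / len(names) for k in range(n)]

    normalized = {}
    for name in names:
        values = samples[name]
        # Indices of this sample's values, from smallest to largest.
        order = sorted(range(n), key=lambda i: values[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized[name] = out
    return normalized
```

After this step every sample shares the same set of values, so differences between samples reflect rank changes only.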
Perform comparative genome analysis involving a direct comparison of in-house generated research or clinical data to public datasets (for example, clustering of patient tumor genomes with TCGA tumor genomes)
In our main open-access article, published in October 2013 in APPLIED MATHEMATICS (BIOMATHEMATICS issue, http://www.scirp.org/journal/am/), we show how the human genome MUST be considered as a NUMERICAL WHOLE. The idea now is to run this kind of analysis on complete genome DNA from CANCER CELLS (LOH), at both individual-chromosome and whole-genome scales.
Each tumor contains multiple clones, and genomic alterations are linked to these clones. Clinical factors such as survival and drug response should therefore be correlated with patients at the clonal level, not at the whole-tumor level.
Construct a background mutation rate (noise) model based on the correlation of mutation frequency with expression levels or replication timing. It has been shown that later replication timing and lower expression levels are associated with higher mutation rates across the genome (http://www.nature.com/nature/journal/v499/n7457/full/nature12213.html). Transcription-coupled DNA repair lowers the mutation rate in highly expressed genes. So I ...
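A rough sketch of the idea: group genes into bins by expression level and replication timing, then pool mutations and covered bases within each bin to get a bin-specific background rate. This is a deliberately simple stand-in, not the method of the cited paper. The function name `background_rates`, the record fields (`expression`, `replication_time`, `mutations`, `length`), and the equal-count binning are all assumptions made for illustration.

```python
from collections import defaultdict


def background_rates(genes, n_bins=2):
    """Estimate pooled mutation rates per (expression bin, timing bin).

    Hypothetical helper: `genes` is a list of dicts with the keys
    'expression', 'replication_time', 'mutations' and 'length' (bases).
    """
    def make_binner(values):
        ordered = sorted(values)
        # Cut points at equal-count quantiles of the observed values.
        cuts = [ordered[len(ordered) * k // n_bins] for k in range(1, n_bins)]
        # Bin index = number of cut points at or below the value.
        return lambda v: sum(v >= c for c in cuts)

    expr_bin = make_binner([g["expression"] for g in genes])
    time_bin = make_binner([g["replication_time"] for g in genes])

    mutations = defaultdict(int)
    bases = defaultdict(int)
    for g in genes:
        key = (expr_bin(g["expression"]), time_bin(g["replication_time"]))
        mutations[key] += g["mutations"]
        bases[key] += g["length"]

    # Background rate per bin: pooled mutations per covered base.
    return {key: mutations[key] / bases[key] for key in bases}
```

A gene's expected number of background mutations would then be its bin's rate multiplied by its length, and observed counts well above that expectation would be candidates for signal rather than noise.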
One backend framework for custom analysis tools that can be developed by the community.
Provide CPU time for the community with allocations set by NCI to users AND on a pay per node model.
Please provide a process for the deployment of databases and web applications such as MSKCC cBio, or ISB's Regulome Explorer or GeneSpot.
(You could use Github as a platform for accepting contributions.)
Correlate expression data from multiple reporters from multiple subjects with genotyping data
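A minimal sketch of such a correlation, assuming genotypes are encoded as allele dosage (0, 1, or 2) at a single variant and expression is stored per reporter and per subject. The names `pearson` and `correlate_reporters` and the dict layouts are hypothetical. Pearson correlation is just one possible choice of statistic.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns NaN if it is undefined."""
    n = len(xs)
    if n < 2:
        return float("nan")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def correlate_reporters(expression, dosage):
    """Correlate each reporter's expression with genotype dosage.

    expression: {reporter: {subject: value}} (assumed layout)
    dosage:     {subject: 0, 1 or 2}         (assumed encoding)
    """
    results = {}
    for reporter, values in expression.items():
        # Use only subjects that have both a genotype and an expression value.
        shared = sorted(set(values) & set(dosage))
        results[reporter] = pearson(
            [dosage[s] for s in shared],
            [values[s] for s in shared],
        )
    return results
```

Restricting each reporter to the subjects it shares with the genotype table lets reporters with missing measurements still be included.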
GPU technologies are rapidly becoming useful for speeding up some workflows by orders of magnitude. It would be useful to have some GPU resources available for cloud computing.
Provide a series of short online videos and short courses that will help users adopt the new tools and help instructors incorporate them into courses. (Maybe this is obvious, but high-quality tutorials and case studies take significant time to develop.)
A sample could be analyzed for DNA sequence variations, structural variations, CNVs, gene or transcript isoform expression, genome-wide methylation patterns, ChIP-seq for specific transcription factors, metabolomic or proteomic analysis, and other molecular profiles. A framework that allows a researcher to readily identify all molecular data types associated with a particular sample and integrate the results of such analyses ...
To stimulate learning as much as possible, as quickly as possible, the data cloud could have a utility where interested parties could pose "crowd-sourcing" challenges, e.g., in the style of Kaggle. Indeed, Harold Varmus, NCI, and leaders in cancer and genomics could pose the leading questions they would like bright people to take a run at answering, e.g., in the spirit of Hilbert's 23 problems.
Track data provenance and permissions, including IRB approvals and patient consent, and be able to support different levels of permissions rather than insisting on uniform consent
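One way to picture per-sample permissions is a record that carries its own IRB protocol, its set of consented uses, and a provenance log. Everything in this sketch is hypothetical (the `SampleRecord` class, its field names, the use labels, and the `usable_samples` helper), and a real system would need a much richer consent model.

```python
from dataclasses import dataclass, field


@dataclass
class SampleRecord:
    """Hypothetical per-sample record of consent and provenance."""
    sample_id: str
    irb_protocol: str
    consented_uses: frozenset
    provenance: list = field(default_factory=list)

    def record_step(self, description):
        # Append a processing step so the sample's history stays traceable.
        self.provenance.append(description)


def usable_samples(records, intended_use):
    """Return IDs of samples whose consent covers the intended use."""
    return [r.sample_id for r in records if intended_use in r.consented_uses]
```

Because consent is checked per sample, a cohort with mixed consent levels can still be queried, and each analysis sees only the samples it is permitted to use.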
Realign sequencing data sets to a common genome version