Provide access to level 1 and 2 data for TCGA copy number array data for analysis
Track data provenance and permissions, including IRB approvals and patient consent, and support different levels of permission rather than insisting on uniform consent
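One way to picture tiered permissions is a small record type that carries provenance and consent alongside each dataset. The sketch below is only illustrative: the `AccessLevel` tiers, the `DatasetRecord` fields, and the `may_access` helper are all hypothetical names, not part of any existing TCGA or NCI system.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class AccessLevel(IntEnum):
    # Hypothetical tiers; a real system would define its own.
    OPEN = 0
    CONTROLLED = 1
    RESTRICTED = 2


@dataclass(frozen=True)
class DatasetRecord:
    dataset_id: str
    source: str                  # provenance: where the data came from
    irb_approval: str            # hypothetical IRB protocol identifier
    required_level: AccessLevel  # minimum level a user needs
    consented_uses: frozenset = field(default_factory=frozenset)


def may_access(record, user_level, intended_use):
    """Return True only if the user's level is high enough AND the
    intended use is one the patients consented to for this dataset."""
    return (user_level >= record.required_level
            and intended_use in record.consented_uses)


# Example record; every value here is made up for illustration.
example = DatasetRecord(
    dataset_id="example-dataset",
    source="example-study",
    irb_approval="IRB-EXAMPLE",
    required_level=AccessLevel.CONTROLLED,
    consented_uses=frozenset({"cancer_research"}),
)
```

Because consent is stored per dataset rather than globally, two datasets in the same cloud can carry different permitted uses without forcing a uniform consent.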
Provide access to BAM files for TCGA miRNA sequencing data for analysis
To stimulate learning as much as possible, as quickly as possible, the data cloud could have a utility where interested parties could pose "crowd-sourcing" challenges, e.g. in the style of Kaggle. Indeed, Harold Varmus, NCI, and leaders in cancer and genomics could pose the leading questions they would like bright people to take a run at answering, much like Hilbert's 23 problems.
A sample could be analyzed for DNA sequence variations, structural variations, CNVs, gene or transcript isoform expression, genome-wide methylation patterns, ChIP-seq for specific transcription factors, metabolomic or proteomic analysis, and other molecular profiles. A framework that allows a researcher to readily identify all molecular data types associated with a particular sample and integrate the results of such analyses ...
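A sample-centric lookup of this kind could be sketched as a simple index keyed by sample identifier. This is a minimal illustration under stated assumptions: `SampleIndex`, its method names, the data-type labels, and the file locations are all hypothetical.

```python
from collections import defaultdict


class SampleIndex:
    """Maps each sample ID to the molecular data types recorded for it."""

    def __init__(self):
        # sample_id -> {data_type: location of the result file}
        self._by_sample = defaultdict(dict)

    def register(self, sample_id, data_type, location):
        """Record that `data_type` results exist for `sample_id`."""
        self._by_sample[sample_id][data_type] = location

    def data_types(self, sample_id):
        """List every data type known for one sample (sorted)."""
        return sorted(self._by_sample.get(sample_id, {}))

    def samples_with(self, *data_types):
        """List samples that have ALL of the requested data types."""
        wanted = set(data_types)
        return sorted(
            sample_id
            for sample_id, types in self._by_sample.items()
            if wanted.issubset(types)
        )


# Illustrative entries; sample IDs and paths are invented.
index = SampleIndex()
index.register("sample-A", "cnv", "path/to/a_cnv")
index.register("sample-A", "methylation", "path/to/a_meth")
index.register("sample-B", "cnv", "path/to/b_cnv")
```

Calling `index.samples_with("cnv", "methylation")` then selects only the samples for which an integrated analysis of both data types is possible.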
Correlate expression data from multiple reporters from multiple subjects with genotyping data
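As a rough sketch of what such a correlation might look like, the code below computes a Pearson correlation between a genotype dosage (0, 1, or 2 copies of an allele) and the expression value of each reporter across subjects. The input layout and the function names are assumptions made for illustration, and Pearson correlation is only one of several possible association measures.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.
    Returns NaN when it is undefined (too few points or zero variance)."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    if n < 2:
        return float("nan")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def correlate_reporters(expression, genotype):
    """expression: {reporter: {subject: value}}
    genotype:   {subject: dosage}
    Returns {reporter: correlation}, using only subjects present in both."""
    results = {}
    for reporter, values in expression.items():
        shared = sorted(set(values) & set(genotype))
        results[reporter] = pearson(
            [genotype[s] for s in shared],
            [values[s] for s in shared],
        )
    return results


# Toy data, invented for illustration only.
genotype = {"s1": 0, "s2": 1, "s3": 2}
expression = {
    "reporter_up": {"s1": 1.0, "s2": 2.0, "s3": 3.0},
    "reporter_down": {"s1": 3.0, "s2": 2.0, "s3": 1.0},
}
correlations = correlate_reporters(expression, genotype)
```

Restricting each comparison to the subjects shared by both inputs avoids silently misaligning expression and genotype values when some subjects lack one data type.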
Include ENCODE datasets from both normal and cancer cell lines
Mining cancer data in the cloud is great, but to enable ongoing research there should be a connection to specimens so researchers can pursue follow-up studies. This will require storing data about specimens from studies such as TCGA: where they are, how they can be accessed, and what consent they are governed by. Just as the data from publications should be made available to allow reproduction of results, so should samples ...
GPU technologies are rapidly becoming useful for speeding up some workflows by orders of magnitude. It would be useful to have some GPU resources available for cloud computing.
Datasets containing the quantitative inventory of proteins in TCGA tumors are beginning to become available. Both mass spectrometry and affinity-based technologies are generating these data. The cloud should provide a means to connect these data to corresponding TCGA data.
Provide a series of short online videos and short courses that will help users adopt the new tools and help instructors incorporate them into courses. (Maybe this is obvious, but high-quality tutorials and case studies take significant time to develop.)
Provide access to high resolution images and dilution curves for TCGA expression protein arrays for analysis
Provide access to TCGA microsatellite instability data for analysis
One backend framework for custom analysis tools that can be developed by the community.
Provide CPU time for the community, both through allocations set by NCI to users and on a pay-per-node model.