Data Priorities

Correlate genome with claims & statistical data

In addition to clinical data, tie in claims data. Test the feasibility of using the CMS virtual data center in conjunction with the NCI cloud to link data. Other multi-payer claims databases may also offer longitudinal claims histories.

Bring in statistical data, particularly from longitudinal studies (NLSY, HRES, NHANES) and those that have collected biospecimens. (Develop a standardized re-consent form.)

Voting

2 votes
Active

Data Priorities

Connect data with available specimens for follow-up studies

Mining cancer data in the cloud is great, but to enable ongoing research there should be a connection to specimens so that researchers can pursue follow-up studies. This will require storing data about specimens from studies such as TCGA: where they are, how they can be accessed, and what consent governs them. Just as the data from publications should be made available to allow reproduction of results, so should samples ...
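
To make the idea concrete, here is a minimal sketch of what a specimen record might hold, assuming the three pieces of information named above (location, access route, governing consent). The class name, field names, and the `available_for` helper are hypothetical and not part of any existing NCI or TCGA system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecimenRecord:
    """Hypothetical metadata kept alongside the data derived from a specimen."""
    specimen_id: str          # illustrative identifier, not a real scheme
    study: str                # originating study, e.g. "TCGA"
    location: str             # where the physical specimen is held
    access_procedure: str     # how a researcher can request it
    consent_scope: frozenset  # uses the donor's consent permits


def available_for(records, intended_use):
    """Return the records whose consent scope covers the intended use."""
    return [r for r in records if intended_use in r.consent_scope]
```

A follow-up study could then filter the registry by its intended use before contacting the holding site, so that consent is checked before any specimen request is made.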

Voting

3 votes
Active

Data Priorities

Access to proteomic data of TCGA samples

Datasets containing the quantitative inventory of proteins in TCGA tumors are beginning to become available. Both mass spectrometry and affinity-based technologies are generating these data. The cloud should provide a means to connect these data to corresponding TCGA data.
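
One possible way to make that connection is sketched below, assuming both the proteomic and the genomic files are labeled with TCGA-style aliquot barcodes whose first four hyphen-separated fields identify the sample. The function names and the choice of join key are assumptions made for illustration only, and the barcodes used with it should be treated as made-up examples.

```python
from collections import defaultdict


def sample_key(barcode: str) -> str:
    """Reduce an aliquot barcode to its sample-level prefix.

    Assumes the first four hyphen-separated fields
    (project, site, participant, sample) identify the sample.
    """
    return "-".join(barcode.split("-")[:4])


def link_by_sample(genomic_barcodes, proteomic_barcodes):
    """Group barcodes by sample and keep samples that have both data types."""
    linked = defaultdict(lambda: {"genomic": [], "proteomic": []})
    for barcode in genomic_barcodes:
        linked[sample_key(barcode)]["genomic"].append(barcode)
    for barcode in proteomic_barcodes:
        linked[sample_key(barcode)]["proteomic"].append(barcode)
    # Only samples present in both inputs are useful for a joint analysis.
    return {k: v for k, v in linked.items() if v["genomic"] and v["proteomic"]}
```

Joining at the sample level rather than on the full barcode matters because proteomic and genomic assays are typically run on different aliquots of the same sample, so their full barcodes would not match.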

Voting

3 votes
Active

Analysis Priorities

Education and usability

Provide a series of short online videos and short courses that will help users adopt the new tools and help instructors incorporate them into their courses. (Maybe this is obvious, but high-quality tutorials and case studies take significant time to develop.)

Voting

3 votes
Active

Analysis Priorities

Integrative analysis of molecular data types for a given sample

A sample could be analyzed for DNA sequence variations, structural variations, CNVs, gene or transcript isoform expression, genome-wide methylation patterns, ChIP-seq for specific transcription factors, metabolomic or proteomic analysis, and other molecular profiles. A framework that allows a researcher to readily identify all molecular data types associated with a particular sample and integrate the results of such analyses ...

Voting

5 votes
Active

Analysis Priorities

Actively support crowd-sourcing challenges

To stimulate learning as much and as quickly as possible, the data cloud could have a utility where interested parties could pose "crowd-sourcing" challenges, e.g., in the style of Kaggle. Indeed, Harold Varmus, NCI, and leaders in cancer and genomics could pose the leading questions they would like bright people to take a run at answering, much like Hilbert's 23 problems.

Voting

7 votes
Active