Analysis Priorities

Construct background mutation rate

Construct a background mutation rate (noise) model based on the correlation of mutation frequency with expression level or replication timing. It has been shown that later replication timing and lower expression levels are associated with higher mutation rates across the genome (http://www.nature.com/nature/journal/v499/n7457/full/nature12213.html). Transcription-coupled DNA repair lowers the mutation rate in highly expressed genes. So I ...more »
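One way to picture the idea is to estimate each gene's background rate by pooling mutation counts from genes with similar expression and replication timing. The sketch below is only an illustration under stated assumptions: the `Gene` fields, their units, the `background_rates` function, and the neighbour count `k` are all hypothetical and are not taken from the linked paper.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Gene:
    name: str
    expression: float        # hypothetical covariate, e.g. log expression
    replication_time: float  # hypothetical covariate, larger = later
    mutations: int           # observed background (e.g. silent) mutations
    covered_bases: int       # bases with enough coverage to call mutations


def _standardize(values):
    """Scale values to zero mean and unit variance (sd falls back to 1)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    sd = sqrt(var) or 1.0
    return [(v - mean) / sd for v in values]


def background_rates(genes, k=2):
    """Estimate a per-gene background rate by pooling each gene with its
    k nearest neighbours in (expression, replication_time) space."""
    expr = _standardize([g.expression for g in genes])
    rept = _standardize([g.replication_time for g in genes])
    rates = {}
    for i, gene in enumerate(genes):
        # Rank all genes by squared distance to gene i; gene i itself is first.
        order = sorted(
            range(len(genes)),
            key=lambda j: (expr[i] - expr[j]) ** 2 + (rept[i] - rept[j]) ** 2,
        )
        pool = order[: k + 1]
        pooled_mutations = sum(genes[j].mutations for j in pool)
        pooled_bases = sum(genes[j].covered_bases for j in pool)
        rates[gene.name] = pooled_mutations / pooled_bases
    return rates
```

Pooling trades resolution for stability: a larger `k` gives a less noisy estimate for genes with few observed mutations, at the cost of blurring real differences between genes.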

1 vote
Active

Data Priorities

Longitudinal sequencing

Longitudinal sequencing: obtain samples from patients at different time points. For example, biopsy at diagnosis, pretreatment, post-treatment, and relapse.

0 votes
Active

Analysis Priorities

Education and usability

Provide a series of short online videos and short courses that will help users adopt the new tools and help instructors incorporate them into courses. (Maybe this is obvious, but high-quality tutorials and case studies take significant time to develop.)

3 votes
Active

Data Priorities

Provide data subsets for download and tool development

Subsets of large data sets should be provided for download to test local tools and for development of pipelines before they are uploaded to the cloud.
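As a rough illustration of how such a subset might be drawn, the sketch below uses reservoir sampling to pick a fixed number of records from a stream too large to hold in memory. The function name `reservoir_sample` and the fixed `seed` are assumptions made for this example; a real subset would also need to respect sample-level structure and access restrictions.

```python
import random


def reservoir_sample(records, size, seed=0):
    """Return up to `size` records chosen uniformly at random from an
    iterable, reading it once. A fixed seed makes the subset reproducible."""
    rng = random.Random(seed)
    sample = []
    for index, record in enumerate(records):
        if index < size:
            # Fill the reservoir with the first `size` records.
            sample.append(record)
        else:
            # Replace an existing entry with probability size / (index + 1).
            slot = rng.randint(0, index)
            if slot < size:
                sample[slot] = record
    return sample
```

Because the input is read only once, the same function works on a file handle, for example `reservoir_sample(open(path), 1000)` for a hypothetical `path`, without loading the whole file.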

2 votes
Active

Data Priorities

Correlate genome with claims & statistical data

In addition to clinical data, tie in claims data. Test the feasibility of using the CMS virtual data center in conjunction with the NCI cloud to link data. Other multipayer claims databases may also offer longitudinal claims histories.

Bring in statistical data, particularly from longitudinal studies (NLSY, HRES, NHANES) and those that have collected biospecimens. (Develop a standardized re-consent form.)

2 votes
Active

Data Priorities

Patient Access

Allow patients and their doctors to securely access the data held about them.

0 votes
Active

Data Priorities

Access to proteomic data of TCGA samples

Datasets containing the quantitative inventory of proteins in TCGA tumors are beginning to become available. Both mass spectrometry and affinity-based technologies are generating these data. The cloud should provide a means to connect these data to corresponding TCGA data.

3 votes
Active

Analysis Priorities

Provide GPU computational resources

GPU technologies are rapidly becoming useful for speeding up some workflows by orders of magnitude. It would be useful to have some GPU resources available for cloud computing.

3 votes
Active

Data Priorities

Connect data with available specimens for follow-up studies

Mining cancer data in the cloud is great, but to enable ongoing research there should be a connection to specimens so researchers can pursue follow-up studies. This will require storing data about specimens from studies such as TCGA: where they are, how they can be accessed, and what consent they are governed by. Just as the data from publications should be made available to allow reproduction of results, so should samples ...more »
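The three pieces of specimen information named above (location, access, consent) could be captured in a small record type. The sketch below is purely illustrative: `SpecimenRecord`, its field names, and the `available_for` helper are hypothetical and do not describe any existing TCGA or NCI schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecimenRecord:
    specimen_id: str          # hypothetical identifier linking data to a sample
    study: str                # originating study, e.g. "TCGA"
    location: str             # where the physical specimen is stored
    access_procedure: str     # how a researcher can request it
    consent_scope: frozenset  # uses permitted by the donor's consent


def available_for(records, intended_use):
    """Return the specimens whose consent covers the intended use."""
    return [r for r in records if intended_use in r.consent_scope]
```

Keeping consent as an explicit set of permitted uses lets a follow-up study be checked against it before any request for material is made.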

3 votes
Active

Analysis Priorities

Deployment of databases and web applications

Please provide a process for deploying databases and web applications such as MSKCC's cBio, or ISB's Regulome Explorer or GeneSpot.

(You could use GitHub as a platform for accepting contributions.)

2 votes
Active

Analysis Priorities

CANCER whole genomes codon populations analysis

In this main open access article

http://fr.scribd.com/doc/169323556/7401586of18september2013

published in October 2013 by APPLIED MATHEMATICS (BIOMATHEMATICS issue http://www.scirp.org/journal/am/ ) we show how our human genome MUST be considered as a NUMERICAL WHOLE. The idea is now to run this kind of analysis on complete genome DNA from CANCER CELLS (LOH) at the individual-chromosome and whole-genome scales.

0 votes
Active

Analysis Priorities

Support multiple workflow tools and data access mechanisms

Galaxy and GenePattern are examples of systems that could provide access to data sets, pipelines, and publishable, shareable, and reproducible workflows. Ideally, existing familiar and popular platforms such as these would be supported. In addition to improving or enabling interactions between these tools, effort should be directed towards facilitating programmatic access to the underlying data in order to support custom ...more »

12 votes
Active