Store results in the cloud (e.g. variant files), with methods documented and the workflow available in a workflow management system, so the analyses can be reproduced with other data
Track data provenance and analysis methods at a level that supports reproducible research
Construct analysis pipelines that may be reused and shared with other investigators
Analyze whole genome datasets from patients in conjunction with clinical data
Perform analysis in the cloud using custom (locally developed) analysis tools
Provide access to level 1 and 2 data for TCGA expression array data for analysis
Provide access to TCGA tissue slide images for analysis
Investigate how the expression level and mutation profile of a gene list of interest correlate with patient survival
Link genomic analysis results with “external” data such as molecular interaction networks, drug-target associations, or semantic associations extracted from the literature
Galaxy and GenePattern are examples of systems that could provide access to data sets, pipelines, and publishable, shareable, and reproducible workflows. Ideally, existing familiar and popular platforms such as these would be supported. In addition to improving or enabling interactions between these tools, effort should be directed towards facilitating programmatic access to the underlying data in order to support custom...
Most current approaches to big data analysis involve moving data to a server, HPC infrastructure, or cloud where the software tools and reference databases are pre-configured. This is inefficient: it requires making redundant copies of the data each time and incurs additional cost and time moving data back and forth. Since there is no single tool or workflow to analyze genomic data, multiple copies...
Analyze exomic sequence of paired tumor and normal samples, including variant calls
Provide access to level 1 and 2 data for TCGA DNA methylation data for analysis
For those of us who don't want to use the cloud workflow, please make it so we still have access to all the raw data. Please don't lock us into your analytic approaches!
Realign sequencing data sets to a common genome version
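
As a rough illustration of the provenance-tracking request earlier in this list, one possible approach is to store a small record next to each result that captures a checksum of the input, the tool and version used, and the parameters. The sketch below is only an assumption about what such a record might contain; the function names, the field names, and the tool name `hypothetical-caller` are invented for illustration and do not come from any existing platform.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_of_bytes(data: bytes) -> str:
    """Return the hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def make_provenance_record(input_name, input_bytes, tool, version, params):
    """Build a JSON-serializable record describing one analysis step.

    All field names here are hypothetical; a real system would define
    its own schema.
    """
    return {
        "input": {"name": input_name, "sha256": sha256_of_bytes(input_bytes)},
        "tool": tool,
        "version": version,
        # Sort parameters so the same settings always serialize the same way.
        "params": dict(sorted(params.items())),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


# Example: describe a (hypothetical) variant-calling step on a tiny input.
record = make_provenance_record(
    "sample.vcf",
    b"##fileformat=VCFv4.2\n",
    "hypothetical-caller",
    "0.1",
    {"min_qual": 30},
)
print(json.dumps(record, indent=2))
```

Storing a record like this alongside each output file would let another investigator confirm they are starting from the same input and rerun the same step with the same settings.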