Related Tools
- Myrna: Cloud, differential gene expression
Related Publications
- Frazee AC, Langmead B, Leek JT. ReCount: a multi-experiment resource of analysis-ready RNA-seq gene count datasets. BMC Bioinformatics 12:449
Other Documentation
- VLDS poster, 6/11 (coming soon)
ReCount is an online resource consisting of RNA-seq gene count datasets built using the raw data from 18 different studies. The raw sequencing data (.fastq files) were processed with Myrna to obtain tables of counts for each gene. For ease of statistical analysis, we combined each count table with sample phenotype data to form an R object of class ExpressionSet. The count tables, ExpressionSets, and phenotype tables are ready to use and freely available here. By taking care of several preprocessing steps and combining many datasets into one easily accessible website, we make finding and analyzing RNA-seq data considerably more straightforward.
All columns of the table below are sortable: clicking on a column title will alphabetize or order that column (keeping the rows properly aligned). The columns are as follows:
Study
With a few exceptions, the datasets are named for the first author of the paper from which the .fastq files were obtained. The Katz paper contained both mouse and human reads, so two separate datasets were created. The "maqc" dataset was built from reads obtained from the MicroArray Quality Control Project. The "modencodeworm" and "modencodefly" datasets were generated using reads from papers associated with the modENCODE Consortium.
PMID
Papers from which we collected the .fastq files are accessible via the given clickable PubMed ID.
Species
The species of the samples under study.
Number of biological replicates
The number of distinct biological replicates included in the dataset. Gene counts from technical replicates were pooled. The number of technical replicates pooled to give counts for each biological replicate is available in the ExpressionSet and phenotype table for each dataset.
Number of uniquely aligned reads
In each Myrna run, some reads were discarded because they did not align, and others were discarded because they aligned repetitively. The count in this column is the number of reads not discarded. *Note that for Montgomery and Pickrell, the read count is for both datasets combined, since both were processed in the same Myrna run.
ExpressionSet
Click "link" to download an .RData file containing the gene count table and phenotype data in an ExpressionSet. When the R object is loaded into the workspace, the ExpressionSet will be named study.eset, where "study" is replaced with the dataset name given in the first column of the table. To use the ExpressionSets, you will need to install Bioconductor and run the command library(Biobase). For some preliminary information on using ExpressionSets, please click here.
Count Table
Click "link" to download a .txt file containing the raw gene counts output by Myrna.
Phenotype Table
Click "link" to download a .txt file containing phenotype information for each sample in the count table. Each phenotype table contains a sample.id column and a num.tech.reps column, where sample.id is either the HapMap ID of the sample (if applicable) or the SRX number of the sample, which can be used to search for the sample in NCBI's Sequence Read Archive (SRA). The num.tech.reps column tells how many technical replicates were pooled to obtain gene counts for that sample.
Notes
Brief description of the experiment.
Please note that to use the ExpressionSets below, you will need to install Bioconductor and run the command library(Biobase).
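As a minimal sketch of what working with one of these files looks like, the R code below builds a small stand-in ExpressionSet, saves it to an .RData file, and loads it back the way a downloaded file would be loaded. The object name toy.eset, the gene and sample names, and all counts are made up for illustration; a real download supplies an object named after its study. The sketch assumes Bioconductor's Biobase package is installed.

```r
library(Biobase)

# Toy count matrix: genes in rows, samples in columns (made-up values).
counts <- matrix(c(10, 0, 5,
                   3, 7, 2),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("sample1", "sample2")))

# Toy phenotype table; row names must match the count matrix column names.
pheno <- data.frame(sample.id = c("sample1", "sample2"),
                    num.tech.reps = c(1, 2),
                    row.names = c("sample1", "sample2"))

toy.eset <- ExpressionSet(assayData = counts,
                          phenoData = AnnotatedDataFrame(pheno))

# Save and reload, mimicking a downloaded .RData file.
rdata.file <- tempfile(fileext = ".RData")
save(toy.eset, file = rdata.file)
rm(toy.eset)
loaded.names <- load(rdata.file)  # load() returns the names of restored objects

eset <- get(loaded.names[1])
count.matrix <- exprs(eset)       # gene-by-sample count matrix
sample.info <- pData(eset)        # phenotype data frame, one row per sample

dim(count.matrix)
sample.info$num.tech.reps
```

Capturing the return value of load() is a convenient way to find the object name without knowing it in advance.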
The Datasets
Study | PMID | Species | Number of biological replicates | Number of uniquely aligned reads | ExpressionSet | Count table | Phenotype table | Notes |
bodymap | not published, but publicly available here | human | 19 | 2,197,622,796 | link | link | link | Illumina Human BodyMap 2.0 -- tissue comparison |
cheung | 20856902 | human | 41 | 834,584,950 | link | link | link | HapMap - CEU |
core | 19056941 | human | 2 | 8,670,342 | link | link | link | lung fibroblasts |
gilad | 20009012 | human | 6 | 41,356,738 | link | link | link | liver; males and females |
maqc | 20167110 | human | 14 (technical)** 2 (biological) | 71,970,164 | original pooled | original pooled | original pooled | experiment: MAQC-2 |
montgomery | 20220756 | human | 60 | *886,468,054 | link | link | link | HapMap - CEU |
pickrell | 20220758 | human | 69 | *886,468,054 | link | link | link | HapMap - YRI |
sultan | 18599741 | human | 4 | 6,573,643 | link | link | link | cell type comparison |
wang | 18978772 | human | 22 | 223,929,919 | link | link | link | tissue comparison |
katz.mouse | 21057496 | mouse | 4 | 14,368,471 | link | link | link | control vs. CUG-BP1 knockdown myoblasts |
mortazavi | 18516045 | mouse | 3 | 61,732,881 | link | link | link | tissue comparison |
trapnell | 20436464 | mouse | 4 | 111,376,152 | link | link | link | time course |
yang | 20363980 | mouse | 1 | 27,883,862 | link | link | link | hybrid cell line, X always inactive |
bottomly | 21455293 | mouse | 21 | 343,445,340 | link | link | link | 2 inbred mouse strains |
nagalakshmi | 18451266 | yeast | 4 | 7,688,602 | link | link | link | priming technique comparison |
hammer | 20452967 | rat | 8 | 158,178,477 | link | link | link | experimental vs. control at 2 time points |
modencodeworm | 19181841 | worm | 46 | 1,451,119,823 | link | link | link | developmental time course |
modencodefly | 21179090 | fly | 147 (technical)** 30 (biological) | 2,278,788,557 | original pooled | original pooled | original pooled | developmental time course |
**These studies originally contained tables with unpooled technical replicates. The unpooled tables are available under the "original" links, while tables with pooled technical replicates are available under the "pooled" links.
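To illustrate what pooling means here, the sketch below sums the columns of an unpooled count table that belong to the same biological replicate. It is a base-R illustration only, not the code used to build the pooled tables; the run names, the biorep mapping, and the counts are invented for the example.

```r
# Unpooled toy counts: genes in rows, one column per technical replicate.
unpooled <- matrix(c(4, 0, 9,
                     6, 1, 1,
                     2, 3, 5),
                   nrow = 3,
                   dimnames = list(c("geneA", "geneB", "geneC"),
                                   c("run1", "run2", "run3")))

# Hypothetical mapping from each technical replicate to its biological replicate.
biorep <- c(run1 = "bio1", run2 = "bio1", run3 = "bio2")

# rowsum() adds up rows by group, so transpose, pool, and transpose back.
pooled <- t(rowsum(t(unpooled), group = biorep[colnames(unpooled)]))

# Number of technical replicates behind each pooled column.
num.tech.reps <- table(biorep)[colnames(pooled)]

pooled
num.tech.reps
```

The num.tech.reps vector computed at the end plays the same role as the num.tech.reps column described for the phenotype tables.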
Datasets created without truncation
The count tables and ExpressionSets in the above table were created by truncating all reads longer than 35bp to 35bp. Count tables and ExpressionSets created without truncation are available for download here.
Ensembl 61 gene information
Below are links to files containing information about the genes in Ensembl 61, the version used in creating these datasets. (These are the genes.txt files from the Ensembl 61 Myrna reference jar.)
Manifest Files
Below are links to the Myrna manifest files used to create the count tables.
- BodyMap
- Cheung
- Core
- Gilad
- Katz - human
- MAQC (links to locally stored files)
- Montgomery/Pickrell (count tables created with same Myrna run)
- Sultan
- Wang
- Katz - mouse
- Mortazavi
- Trapnell
- Yang
- Bottomly
- Nagalakshmi
- Hammer
- modENCODE - worm
- modENCODE - fly
Getting Started with ExpressionSets
Please click here for a few R commands that are useful when working with ExpressionSets.
Code Used
Commands passed to Myrna
R code used to create ExpressionSets (requires Bioconductor and additional files)
R code used in the "example applications" section of the paper (requires Bioconductor)