ReCount: analysis-ready RNA-seq gene count datasets

Related Tools

Myrna: Cloud, differential gene expression

Related Publications

Frazee AC, Langmead B, Leek JT. ReCount: a multi-experiment resource of analysis-ready RNA-seq gene count datasets. BMC Bioinformatics 12:449

ReCount is an online resource consisting of RNA-seq gene count datasets built from the raw data of 18 different studies. The raw sequencing data (.fastq files) were processed with Myrna to obtain tables of counts for each gene. For ease of statistical analysis, we combined each count table with sample phenotype data to form an R object of class ExpressionSet. The count tables, ExpressionSets, and phenotype tables are ready to use and freely available here. By taking care of several preprocessing steps and collecting many datasets on one easily accessible website, we make finding and analyzing RNA-seq data considerably more straightforward.

All columns of the table below are sortable: clicking on a column title will alphabetize or order that column (keeping the rows properly aligned). The columns are as follows:

Study

With a few exceptions, the datasets are named for the first author of the paper from which the .fastq files were obtained. The Katz paper contained both mouse and human reads, so two separate datasets were created. The "maqc" dataset was built from reads obtained from the MicroArray Quality Control Project. The "modencodeworm" and "modencodefly" datasets were generated using reads from papers associated with the modENCODE Consortium.

PMID

Papers from which we collected the .fastq files are accessible via the given clickable PubMed ID.

Species

The species of the samples under study.

Number of biological replicates

The number of distinct biological replicates included in the dataset. Gene counts from technical replicates were pooled. The number of technical replicates pooled to give counts for each biological replicate is available in the ExpressionSet and phenotype table for each dataset.

Number of uniquely aligned reads

In each Myrna run, reads that did not align or that aligned repetitively were discarded. The count in this column is the number of reads that were not discarded. *Note that for Montgomery and Pickrell, the read count is for both datasets combined, since both were analyzed in the same Myrna run.

ExpressionSet

Click "link" to download an .RData file containing the gene count table and phenotype data in an ExpressionSet. When the R object is loaded into the workspace, the ExpressionSet will be named study.eset, where "study" is replaced with the dataset name given in the first column of the table. To use the ExpressionSets, you will need to install Bioconductor and run the command library(Biobase). For some preliminary information on using ExpressionSets, please click here.

Count Table

Click "link" to download a .txt file containing the raw gene counts output by Myrna.

Phenotype Table

Click "link" to download a .txt file containing phenotype information for each sample in the count table. Each phenotype table contains a sample.id column and a num.tech.reps column, where sample.id is either the HapMap ID of the sample (if applicable) or the SRX number of the sample, which can be used to search for the sample in NCBI's Sequence Read Archive (SRA). The num.tech.reps column tells how many technical replicates were pooled to obtain gene counts for that sample.

Notes

Brief description of experiment.
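As a rough illustration of how the count table and phenotype table columns described above fit together outside of R, the following Python sketch reads both .txt files with the standard library and checks that every sample column in the count table has a phenotype row. The file contents, the tab delimiter, the gene identifiers in the first column, and the function names read_counts and read_phenotypes are all assumptions made for this example; only the sample.id and num.tech.reps column names come from the description above. Check the delimiter of the files you actually download.

```python
import csv
import io

# Hypothetical stand-ins for a downloaded count table and phenotype table.
# The tab-delimited layout with gene IDs in the first column is an assumption.
COUNT_TXT = (
    "gene\tSRX000001\tSRX000002\n"
    "ENSG00000000003\t12\t7\n"
    "ENSG00000000005\t0\t3\n"
)
PHENO_TXT = (
    "sample.id\tnum.tech.reps\n"
    "SRX000001\t2\n"
    "SRX000002\t1\n"
)


def read_counts(handle):
    """Return (sample_ids, {gene_id: [count per sample]})."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    sample_ids = header[1:]  # first column holds the gene identifier
    counts = {row[0]: [int(x) for x in row[1:]] for row in reader if row}
    return sample_ids, counts


def read_phenotypes(handle):
    """Return {sample.id: row dict} from a tab-delimited phenotype table."""
    return {row["sample.id"]: row for row in csv.DictReader(handle, delimiter="\t")}


sample_ids, counts = read_counts(io.StringIO(COUNT_TXT))
pheno = read_phenotypes(io.StringIO(PHENO_TXT))

# Samples that appear as count-table columns but have no phenotype row.
missing = [s for s in sample_ids if s not in pheno]

# Number of pooled technical replicates per sample, from num.tech.reps.
tech_reps = {s: int(pheno[s]["num.tech.reps"]) for s in sample_ids if s in pheno}

# Total counted reads per sample (column sums of the count table).
library_sizes = [sum(col) for col in zip(*counts.values())]
```

To use real files, replace the io.StringIO objects with open(path, newline="") handles.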

Please note that to use the ExpressionSets below, you will need to install Bioconductor and run the command library(Biobase).

The Datasets

Study	PMID	Species	Number of biological replicates	Number of uniquely aligned reads	ExpressionSet	Count table	Phenotype table	Notes
bodymap	not published, but publicly available here	human	19	2,197,622,796	link	link	link	Illumina Human BodyMap 2.0 -- tissue comparison
cheung	20856902	human	41	834,584,950	link	link	link	HapMap - CEU
core	19056941	human	2	8,670,342	link	link	link	lung fibroblasts
gilad	20009012	human	6	41,356,738	link	link	link	liver; males and females
maqc	20167110	human	14 (technical)**, 2 (biological)	71,970,164	original, pooled	original, pooled	original, pooled	experiment: MAQC-2
montgomery	20220756	human	60	*886,468,054	link	link	link	HapMap - CEU
pickrell	20220758	human	69	*886,468,054	link	link	link	HapMap - YRI
sultan	18599741	human	4	6,573,643	link	link	link	cell type comparison
wang	18978772	human	22	223,929,919	link	link	link	tissue comparison
katz.mouse	21057496	mouse	4	14,368,471	link	link	link	control vs. CUG-BP1 knockdown myoblasts
mortazavi	18516045	mouse	3	61,732,881	link	link	link	tissue comparison
trapnell	20436464	mouse	4	111,376,152	link	link	link	time course
yang	20363980	mouse	1	27,883,862	link	link	link	hybrid cell line, X always inactive
bottomly	21455293	mouse	21	343,445,340	link	link	link	2 inbred mouse strains
nagalakshmi	18451266	yeast	4	7,688,602	link	link	link	priming technique comparison
hammer	20452967	rat	8	158,178,477	link	link	link	experimental vs. control at 2 time points
modencodeworm	19181841	worm	46	1,451,119,823	link	link	link	developmental time course
modencodefly	21179090	fly	147 (technical)**, 30 (biological)	2,278,788,557	original, pooled	original, pooled	original, pooled	developmental time course

*Montgomery and Pickrell read counts are for both datasets combined.
**These studies originally contained tables with unpooled technical replicates. The unpooled tables are available under the "original" links, while tables with pooled technical replicates are available under the "pooled" links.
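The relationship between the unpooled and pooled tables can be sketched as follows. This is a minimal illustration that assumes pooling means summing the per-gene counts of all technical replicates belonging to the same biological replicate; it is not the code used to build the datasets. The function name pool_technical_replicates, the run and sample identifiers, and the counts are hypothetical.

```python
from collections import defaultdict


def pool_technical_replicates(counts, run_to_sample):
    """Sum per-gene counts over runs that belong to the same biological sample.

    counts:        {run_id: {gene_id: count}}, one entry per technical replicate
    run_to_sample: {run_id: biological_sample_id}
    Returns (pooled_counts, num_tech_reps).
    """
    pooled = defaultdict(lambda: defaultdict(int))
    num_tech_reps = defaultdict(int)
    for run_id, gene_counts in counts.items():
        sample = run_to_sample[run_id]
        num_tech_reps[sample] += 1
        for gene, n in gene_counts.items():
            pooled[sample][gene] += n
    # Convert the nested defaultdicts to plain dicts for the caller.
    return {s: dict(g) for s, g in pooled.items()}, dict(num_tech_reps)


# Hypothetical unpooled counts: runs 1 and 2 are technical replicates of sample A.
unpooled = {
    "run1": {"gene1": 5, "gene2": 0},
    "run2": {"gene1": 3, "gene2": 2},
    "run3": {"gene1": 1, "gene2": 4},
}
run_to_sample = {"run1": "A", "run2": "A", "run3": "B"}

pooled, num_tech_reps = pool_technical_replicates(unpooled, run_to_sample)
```

The second return value plays the role of the num.tech.reps column in the phenotype tables.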

Datasets created without truncation

The count tables and ExpressionSets in the above table were created by truncating all reads longer than 35 bp to 35 bp. Count tables and ExpressionSets created without truncation are available for download here.
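To make the truncation step concrete, here is a small sketch of what cutting reads to 35 bp looks like at the level of a FASTQ file. It is an illustration only and not the code Myrna runs. It assumes four-line FASTQ records (no wrapped sequence lines), and the function name truncate_fastq and the example reads are hypothetical.

```python
def truncate_fastq(lines, max_len=35):
    """Yield (header, sequence, separator, quality) with reads cut to max_len.

    `lines` is an iterable of FASTQ lines without trailing newlines.
    Reads of max_len bases or fewer pass through unchanged.
    An incomplete trailing record (fewer than four lines) is silently dropped.
    """
    it = iter(lines)
    # zip over the same iterator four times groups the lines into records.
    for header, seq, plus, qual in zip(it, it, it, it):
        # Sequence and quality strings must stay the same length.
        yield header, seq[:max_len], plus, qual[:max_len]


# Hypothetical input: one 40-base read and one 30-base read.
FASTQ_TEXT = (
    "@read1\n" + "A" * 40 + "\n+\n" + "I" * 40 + "\n"
    "@read2\n" + "C" * 30 + "\n+\n" + "I" * 30 + "\n"
)

records = list(truncate_fastq(FASTQ_TEXT.splitlines()))
read_lengths = [len(seq) for _, seq, _, _ in records]
```

Cutting the quality string together with the sequence keeps each record valid for downstream aligners.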

Ensembl 61 gene information

Below are links to files containing information about the genes in Ensembl 61, the version used in creating these datasets. (These are the genes.txt files from the Ensembl 61 Myrna reference jar.)

Manifest Files

Below are links to Myrna manifest files used to create the count tables with Myrna.

BodyMap
Cheung
Core
Gilad
Katz - human
MAQC (links to locally stored files)
Montgomery/Pickrell (count tables created with same Myrna run)
Sultan
Wang
Katz - mouse
Mortazavi
Trapnell
Yang
Bottomly
Nagalakshmi
Hammer
modENCODE - worm
modENCODE - fly

Getting Started with ExpressionSets

Please click here for a few R commands that are useful when working with ExpressionSets.

Code Used

Commands passed to Myrna
R code used to create ExpressionSets (requires Bioconductor and additional files)
R code used in the "example applications" section of the paper (requires Bioconductor)
