ReCount

A multi-experiment resource of analysis-ready RNA-seq gene count datasets

  

There is now a new version of recount that provides processed and summarized expression data for nearly 60,000 human RNA-seq samples from the Sequence Read Archive (SRA). The associated Bioconductor package provides a convenient API for querying, downloading, and analyzing the data. Each processed study consists of meta- and phenotype data, the expression levels of genes and their underlying exons and splice junctions, and corresponding genomic annotation. See our preprint for details.

This website corresponds to the older resource described in our 2011 paper.

ReCount is an online resource consisting of RNA-seq gene count datasets built using the raw data from 18 different studies. The raw sequencing data (.fastq files) were processed with Myrna to obtain tables of counts for each gene. For ease of statistical analysis, we combined each count table with sample phenotype data to form an R object of class ExpressionSet. The count tables, ExpressionSets, and phenotype tables are ready to use and freely available here. By taking care of several preprocessing steps and combining many datasets into one easily accessible website, we make finding and analyzing RNA-seq data considerably more straightforward.

All columns of the table below are sortable: clicking on a column title will alphabetize or order that column (keeping the rows properly aligned). The columns are as follows:

Study

With a few exceptions, the datasets are named for the first author of the paper from which the .fastq files were obtained. The Katz paper contained both mouse and human reads, so two separate datasets were created. The "maqc" dataset was built from reads obtained from the MicroArray Quality Control Project. The "modencodeworm" and "modencodefly" datasets were generated using reads from papers associated with the modENCODE Consortium.

PMID

Papers from which we collected the .fastq files are accessible via the given clickable PubMed ID.

Species

The species of the samples under study.

Number of biological replicates

The number of distinct biological replicates included in the dataset. Gene counts from technical replicates were pooled. The number of technical replicates pooled to give counts for each biological replicate is available in the ExpressionSet and phenotype table for each dataset.
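As an illustration of pooling, the sketch below sums per-gene counts across the technical replicates of each biological replicate and records how many runs were pooled. It assumes that pooling means simple summation, which the description above does not spell out; the function name, run IDs, and counts are all made up for the example.

```python
from collections import defaultdict


def pool_technical_replicates(counts, run_to_sample):
    """Sum per-gene counts over the technical replicates of each sample.

    counts: dict mapping run id -> dict of gene id -> count
    run_to_sample: dict mapping run id -> biological sample id
    Returns (pooled counts per sample, number of technical replicates per sample).
    """
    pooled = defaultdict(lambda: defaultdict(int))
    num_tech_reps = defaultdict(int)
    for run_id, gene_counts in counts.items():
        sample_id = run_to_sample[run_id]
        num_tech_reps[sample_id] += 1
        for gene_id, n in gene_counts.items():
            pooled[sample_id][gene_id] += n
    return {s: dict(g) for s, g in pooled.items()}, dict(num_tech_reps)


# Made-up data: two sequencing runs of sample "A", one run of sample "B".
counts = {
    "run1": {"gene1": 10, "gene2": 0},
    "run2": {"gene1": 5, "gene2": 3},
    "run3": {"gene1": 7, "gene2": 1},
}
run_to_sample = {"run1": "A", "run2": "A", "run3": "B"}

pooled, num_tech_reps = pool_technical_replicates(counts, run_to_sample)
print(pooled)
print(num_tech_reps)
```

The second return value plays the role of the per-sample replicate count that each dataset reports alongside its pooled counts.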

Number of uniquely aligned reads

In each Myrna run, reads that did not align and reads that aligned repetitively were discarded. The count in this column is the number of reads that were kept. *Note that for Montgomery and Pickrell, the read count covers both datasets combined, since both were analyzed in the same Myrna run.

ExpressionSet

Click "link" to download an .RData file containing the gene count table and phenotype data in an ExpressionSet. When the R object is loaded into the workspace, the ExpressionSet will be named study.eset, where "study" is replaced with the dataset name given in the first column of the table. To use the ExpressionSets, you will need to install Bioconductor and run the command library(Biobase). For some preliminary information on using ExpressionSets, please click here.

Count Table

Click "link" to download a .txt file containing the raw gene counts output by Myrna.

Phenotype Table

Click "link" to download a .txt file containing phenotype information for each sample in the count table. Each phenotype table contains a sample.id column and a num.tech.reps column, where sample.id is either the HapMap ID of the sample (if applicable) or the SRX number of the sample, which can be used to search for the sample in NCBI's Sequence Read Archive (SRA). The num.tech.reps column tells how many technical replicates were pooled to obtain gene counts for that sample.
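For readers working outside R, the following sketch shows one way the count and phenotype tables might be read and lined up. The file layout is an assumption: both tables are taken to be tab-delimited with a header row, and the count table is taken to have a "gene" column followed by one column per sample. The inline data and the two sample IDs are invented stand-ins for downloaded files; only the sample.id and num.tech.reps column names come from the description above.

```python
import csv
import io

# Made-up stand-ins for a downloaded count table and phenotype table.
# Assumed layout: tab-delimited text with a header row.
count_txt = "gene\tNA00001\tSRX000002\nENSG01\t12\t7\nENSG02\t0\t3\n"
pheno_txt = "sample.id\tnum.tech.reps\nNA00001\t2\nSRX000002\t1\n"


def read_table(text):
    """Parse tab-delimited text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


count_rows = read_table(count_txt)
pheno_rows = read_table(pheno_txt)

# Sample IDs from the phenotype table should match the count table columns.
sample_ids = [row["sample.id"] for row in pheno_rows]
count_columns = [c for c in count_rows[0] if c != "gene"]
if sample_ids != count_columns:
    raise ValueError("phenotype rows and count columns are not aligned")

# Total gene count per sample, and technical replicates pooled per sample.
totals = {s: sum(int(row[s]) for row in count_rows) for s in sample_ids}
tech_reps = {row["sample.id"]: int(row["num.tech.reps"]) for row in pheno_rows}

# IDs starting with "SRX" are the ones that can be looked up in the SRA.
sra_ids = [s for s in sample_ids if s.startswith("SRX")]
print(totals, tech_reps, sra_ids)
```

With real files, `io.StringIO(text)` would be replaced by an open file handle; the alignment check is worth keeping, since any downstream analysis relies on phenotype rows and count columns describing the same samples in the same order.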

Notes

A brief description of the experiment.

Please note that to use the ExpressionSets below, you will need to install Bioconductor and run the command library(Biobase).

The Datasets

| Study | PMID | Species | Number of biological replicates | Number of uniquely aligned reads | ExpressionSet | Count table | Phenotype table | Notes |
|---|---|---|---|---|---|---|---|---|
| bodymap | not published, but publicly available here | human | 19 | 2,197,622,796 | link | link | link | Illumina Human BodyMap 2.0 -- tissue comparison |
| cheung | 20856902 | human | 41 | 834,584,950 | link | link | link | HapMap - CEU |
| core | 19056941 | human | 2 | 8,670,342 | link | link | link | lung fibroblasts |
| gilad | 20009012 | human | 6 | 41,356,738 | link | link | link | liver; males and females |
| maqc | 20167110 | human | 14 (technical)**, 2 (biological) | 71,970,164 | original, pooled | original, pooled | original, pooled | experiment: MAQC-2 |
| montgomery | 20220756 | human | 60 | *886,468,054 | link | link | link | HapMap - CEU |
| pickrell | 20220758 | human | 69 | *886,468,054 | link | link | link | HapMap - YRI |
| sultan | 18599741 | human | 4 | 6,573,643 | link | link | link | cell type comparison |
| wang | 18978772 | human | 22 | 223,929,919 | link | link | link | tissue comparison |
| katz.mouse | 21057496 | mouse | 4 | 14,368,471 | link | link | link | control vs. CUG-BP1 knockdown myoblasts |
| mortazavi | 18516045 | mouse | 3 | 61,732,881 | link | link | link | tissue comparison |
| trapnell | 20436464 | mouse | 4 | 111,376,152 | link | link | link | time course |
| yang | 20363980 | mouse | 1 | 27,883,862 | link | link | link | hybrid cell line, X always inactive |
| bottomly | 21455293 | mouse | 21 | 343,445,340 | link | link | link | 2 inbred mouse strains |
| nagalakshmi | 18451266 | yeast | 4 | 7,688,602 | link | link | link | priming technique comparison |
| hammer | 20452967 | rat | 8 | 158,178,477 | link | link | link | experimental vs. control at 2 time points |
| modencodeworm | 19181841 | worm | 46 | 1,451,119,823 | link | link | link | developmental time course |
| modencodefly | 21179090 | fly | 147 (technical)**, 30 (biological) | 2,278,788,557 | original, pooled | original, pooled | original, pooled | developmental time course |
*Montgomery and Pickrell read counts are for both datasets combined.
**These studies originally contained tables with unpooled technical replicates. The unpooled tables are available under the "original" links, while tables with pooled technical replicates are available under the "pooled" links.

Datasets created without truncation

The count tables and ExpressionSets in the above table were created by truncating all reads longer than 35bp to 35bp. Count tables and ExpressionSets created without truncation are available for download here.
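To make the truncation step concrete, here is a small sketch that trims one FASTQ record to 35 bases. It assumes bases are kept from the start of the read and removed from the end, which the text above does not state, and it is not the code used to build the datasets. The record layout (a 4-tuple of header, sequence, separator, quality) and the function name are illustrative choices.

```python
MAX_LEN = 35  # truncation length stated for the main datasets


def truncate_fastq_record(record, max_len=MAX_LEN):
    """Trim the sequence and quality strings of one FASTQ record to max_len.

    record: (header, sequence, separator, quality) tuple.
    Assumption: the first max_len bases are kept and the rest dropped.
    """
    header, seq, plus, qual = record
    return (header, seq[:max_len], plus, qual[:max_len])


# Made-up reads: one longer than 35 bp, one shorter.
long_read = ("@read1", "ACGT" * 10, "+", "I" * 40)   # 40 bp
short_read = ("@read2", "ACGT" * 5, "+", "I" * 20)   # 20 bp

truncated = [truncate_fastq_record(r) for r in (long_read, short_read)]
for header, seq, _, qual in truncated:
    print(header, len(seq), len(qual))
```

Reads at or below 35 bp pass through unchanged, so truncation only affects studies whose reads were longer than that.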

Ensembl 61 gene information

Below are links to files containing information about the genes in Ensembl 61, the version used in creating these datasets. (These are the genes.txt files from the Ensembl 61 Myrna reference jar.)

Manifest Files

Below are links to Myrna manifest files used to create the count tables with Myrna.

Getting Started with ExpressionSets

Please click here for a few R commands that are useful when working with ExpressionSets.

Code Used

Commands passed to Myrna
R code used to create ExpressionSets (requires Bioconductor and additional files)
R code used in the "example applications" section of the paper (requires Bioconductor)