Myrna: Cloud, differential gene expression
- Frazee AC, Langmead B, Leek JT. ReCount: a multi-experiment resource of analysis-ready RNA-seq gene count datasets. BMC Bioinformatics 2011, 12:449.
- VLDS poster, 6/11 (coming soon)
ReCount is an online resource consisting of RNA-seq gene count datasets built using the raw data from 18 different studies. The raw sequencing data (.fastq files) were processed with Myrna to obtain tables of counts for each gene. For ease of statistical analysis, we combined each count table with sample phenotype data to form an R object of class ExpressionSet. The count tables, ExpressionSets, and phenotype tables are ready to use and freely available here. By taking care of several preprocessing steps and combining many datasets on one easily accessible website, we make finding and analyzing RNA-seq data considerably more straightforward.
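For readers working outside R, the pairing that an ExpressionSet provides (a genes-by-samples count matrix plus one phenotype record per sample, kept in the same order) can be mimicked in a few lines of Python. This is a minimal sketch, not the code used to build ReCount. It assumes that both downloads are tab-delimited text with a header row, that the count table's first column holds gene IDs, and that its remaining column names match the phenotype table's sample.id values. The helper names (`read_tsv`, `build_eset`), the `gender` column, and all IDs and counts below are hypothetical.

```python
import csv
import io

# Hypothetical tab-delimited excerpts; real ReCount files may be laid out differently.
COUNT_TABLE = """gene\tS1\tS2\tS3
ENSG_A\t10\t0\t7
ENSG_B\t3\t5\t2
"""

PHENO_TABLE = """sample.id\tnum.tech.reps\tgender
S3\t1\tF
S1\t2\tM
S2\t1\tF
"""


def read_tsv(text):
    """Parse tab-delimited text with a header row into (header, rows)."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    rows = [row for row in reader if row]  # skip any blank lines
    return header, rows


def build_eset(count_text, pheno_text):
    """Pair a genes-by-samples count matrix with per-sample phenotype data.

    Phenotype records are reordered so that the i-th record describes the
    i-th count column, the same sample correspondence an ExpressionSet keeps.
    """
    count_header, count_rows = read_tsv(count_text)
    samples = count_header[1:]  # assumption: first column holds gene IDs
    genes = [row[0] for row in count_rows]
    counts = [[int(x) for x in row[1:]] for row in count_rows]

    pheno_header, pheno_rows = read_tsv(pheno_text)
    id_col = pheno_header.index("sample.id")
    by_id = {row[id_col]: dict(zip(pheno_header, row)) for row in pheno_rows}
    missing = [s for s in samples if s not in by_id]
    if missing:
        raise ValueError(f"no phenotype data for samples: {missing}")
    pheno = [by_id[s] for s in samples]

    return {"genes": genes, "samples": samples, "counts": counts, "pheno": pheno}


eset = build_eset(COUNT_TABLE, PHENO_TABLE)
print(eset["samples"])                              # ['S1', 'S2', 'S3']
print([p["num.tech.reps"] for p in eset["pheno"]])  # ['2', '1', '1']
```

Matching on sample.id rather than on row position means a phenotype table listed in a different order still lines up with the count columns.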
All columns of the table below are sortable: clicking on a column title will alphabetize or order that column (keeping the rows properly aligned). The columns are as follows:
Study
With a few exceptions, the datasets are named for the first author of the paper from which the .fastq files were obtained. The Katz paper contained both mouse and human reads, so two separate datasets were created. The "maqc" dataset was built from reads obtained from the MicroArray Quality Control (MAQC) Project. The "modencodeworm" and "modencodefly" datasets were generated using reads from papers associated with the modENCODE Consortium.
PMID
Papers from which we collected the .fastq files are accessible via the given clickable PubMed ID.
Species
The species of the samples under study.
Number of biological replicates
The number of distinct biological replicates included in the dataset. Gene counts from technical replicates were pooled. The number of technical replicates pooled to give the counts for each biological replicate is available in the ExpressionSet and in the phenotype table for each dataset.
Number of uniquely aligned reads
In each Myrna run, some reads were discarded because they did not align, and others were discarded because they aligned to multiple locations (repetitively). The count in this column is the number of reads that were not discarded. *For Montgomery and Pickrell, the read count is for both datasets combined, since both were analyzed in the same Myrna run.
ExpressionSet
Click "link" to download an .RData file containing the gene count table and phenotype data in an ExpressionSet. When the R object is loaded into the workspace, the ExpressionSet will be named study.eset, where "study" is replaced with the dataset name given in the first column of the table. To use the ExpressionSets, you will need to install Bioconductor and run the command library(Biobase). For some preliminary information on using ExpressionSets, please click here.
Count table
Click "link" to download a .txt file containing the raw gene counts output by Myrna.
Phenotype table
Click "link" to download a .txt file containing phenotype information for each sample in the count table. Each phenotype table contains a sample.id column and a num.tech.reps column. The sample.id column gives either the HapMap ID of the sample (if applicable) or its SRX number, which can be used to search for the sample in NCBI's Sequence Read Archive (SRA). The num.tech.reps column gives the number of technical replicates that were pooled to obtain the gene counts for that sample.
Notes
Brief description of the experiment.
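The "Number of uniquely aligned reads" column amounts to total reads minus reads that did not align minus reads that aligned to more than one location. The sketch below illustrates that bookkeeping on a toy input. It is a simplified model of the filtering described above, not Myrna's implementation; the helper `summarize_alignments` and the per-read alignment counts are hypothetical.

```python
from collections import Counter


def summarize_alignments(hits_per_read):
    """Classify reads by how many genomic locations they aligned to.

    hits_per_read maps a read name to its number of reported alignments:
    0 means the read did not align, 1 means it aligned uniquely, and any
    larger number means it aligned repetitively and is discarded.
    """
    tally = Counter()
    for hits in hits_per_read.values():
        if hits == 0:
            tally["unaligned"] += 1
        elif hits == 1:
            tally["unique"] += 1
        else:
            tally["repetitive"] += 1
    return tally


# Toy input: five reads with made-up alignment counts.
toy = {"r1": 1, "r2": 0, "r3": 4, "r4": 1, "r5": 1}
tally = summarize_alignments(toy)
print(tally["unique"], tally["unaligned"], tally["repetitive"])  # 3 1 1

# The kept (uniquely aligned) reads are everything that was not discarded.
assert tally["unique"] == len(toy) - tally["unaligned"] - tally["repetitive"]
```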
Please note that to use the ExpressionSets below, you will need to install Bioconductor and run the command library(Biobase).
| Study | PMID | Species | Number of biological replicates | Number of uniquely aligned reads | ExpressionSet | Count table | Phenotype table | Notes |
|---|---|---|---|---|---|---|---|---|
| bodymap | not published, but publicly available here | human | 19 | 2,197,622,796 | link | link | link | Illumina Human BodyMap 2.0: tissue comparison |
| cheung | 20856902 | human | 41 | 834,584,950 | link | link | link | HapMap - CEU |
| gilad | 20009012 | human | 6 | 41,356,738 | link | link | link | liver; males and females |
| montgomery | 20220756 | human | 60 | *886,468,054 | link | link | link | HapMap - CEU |
| pickrell | 20220758 | human | 69 | *886,468,054 | link | link | link | HapMap - YRI |
| sultan | 18599741 | human | 4 | 6,573,643 | link | link | link | cell type comparison |
| katz.mouse | 21057496 | mouse | 4 | 14,368,471 | link | link | link | control vs. CUG-BP1 knockdown myoblasts |
| yang | 20363980 | mouse | 1 | 27,883,862 | link | link | link | hybrid cell line, X always inactive |
| bottomly | 21455293 | mouse | 21 | 343,445,340 | link | link | link | 2 inbred mouse strains |
| nagalakshmi | 18451266 | yeast | 4 | 7,688,602 | link | link | link | priming technique comparison |
| hammer | 20452967 | rat | 8 | 158,178,477 | link | link | link | experimental vs. control at 2 time points |
| modencodeworm | 19181841 | worm | 46 | 1,451,119,823 | link | link | link | developmental time course |
**These studies originally contained tables with unpooled technical replicates. The unpooled tables are available under the "original" links, while tables with pooled technical replicates are available under the "pooled" links.
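Pooling technical replicates, as in the "pooled" tables, is sketched below under the assumption that pooling means summing each gene's counts across all technical replicates of the same biological replicate (the usual convention for read counts). The helper `pool_technical_replicates`, the replicate IDs, and the replicate-to-sample mapping are hypothetical; the per-sample replicate tally it returns corresponds to what the num.tech.reps column of a phenotype table records.

```python
def pool_technical_replicates(counts, sample_of):
    """Sum gene counts over technical replicates of the same biological sample.

    counts:    {technical_replicate_id: [count for each gene]}
    sample_of: {technical_replicate_id: biological_sample_id}
    Returns (pooled, num_tech_reps): pooled maps each biological sample to its
    summed counts, and num_tech_reps records how many technical replicates
    were combined, mirroring the num.tech.reps phenotype column.
    """
    pooled = {}
    num_tech_reps = {}
    for rep_id, gene_counts in counts.items():
        sample = sample_of[rep_id]
        if sample not in pooled:
            pooled[sample] = [0] * len(gene_counts)
            num_tech_reps[sample] = 0
        pooled[sample] = [a + b for a, b in zip(pooled[sample], gene_counts)]
        num_tech_reps[sample] += 1
    return pooled, num_tech_reps


# Hypothetical unpooled table: three technical replicates, two genes each.
unpooled = {"rep1": [5, 0], "rep2": [7, 2], "rep3": [1, 9]}
mapping = {"rep1": "sampleA", "rep2": "sampleA", "rep3": "sampleB"}
pooled, reps = pool_technical_replicates(unpooled, mapping)
print(pooled)  # {'sampleA': [12, 2], 'sampleB': [1, 9]}
print(reps)    # {'sampleA': 2, 'sampleB': 1}
```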
Datasets created without truncation
The count tables and ExpressionSets in the table above were created by truncating all reads longer than 35 bp to 35 bp. Count tables and ExpressionSets created without truncation are available for download here.
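The truncation step can be sketched on FASTQ text as follows. This is a minimal illustration, not the preprocessing code actually used for ReCount: it assumes plain four-line FASTQ records and that truncation keeps the first 35 bases of each read, cutting the quality string to the same length so every record stays valid. The helper `truncate_fastq` and the example reads are hypothetical.

```python
MAX_LEN = 35  # read-length cap used for the main ReCount tables


def truncate_fastq(lines, max_len=MAX_LEN):
    """Yield FASTQ lines with every read cut to at most max_len bases.

    Assumes plain four-line records (@name, sequence, +, qualities) and that
    truncation keeps the first max_len bases; the quality string is cut to
    the same length as the sequence.
    """
    lines = list(lines)
    if len(lines) % 4 != 0:
        raise ValueError("input is not a whole number of 4-line FASTQ records")
    for i in range(0, len(lines), 4):
        name, seq, plus, qual = lines[i:i + 4]
        yield name
        yield seq[:max_len]
        yield plus
        yield qual[:max_len]


long_read = ["@read1", "A" * 36 + "CG", "+", "I" * 38]  # 38 bp, gets truncated
short_read = ["@read2", "ACGT", "+", "IIII"]             # 4 bp, left unchanged
out = list(truncate_fastq(long_read + short_read))
print(len(out[1]), len(out[3]), out[5])  # 35 35 ACGT
```

Reads that are already 35 bp or shorter pass through unchanged, so only the longer reads are affected.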
Ensembl 61 gene information
Below are links to files containing information about the genes in Ensembl 61, the version used to create these datasets. (These are the genes.txt files from the Ensembl 61 Myrna reference jar.)
Below are links to Myrna manifest files used to create the count tables with Myrna.
- Katz - human
- MAQC (links to locally stored files)
- Montgomery/Pickrell (count tables created with same Myrna run)
- Katz - mouse
- modENCODE - worm
- modENCODE - fly
Getting Started with ExpressionSets
Please click here for a few R commands that are useful when working with ExpressionSets.
Code Used
- Commands passed to Myrna
- R code used to create ExpressionSets (requires Bioconductor and additional files)
- R code used in the "example applications" section of the paper (requires Bioconductor)