Crossbow: Whole Genome Resequencing Analysis in the Clouds

Latest Release

Crossbow 1.2.1	5/30/13
Please cite: Langmead B, Schatz MC, Lin J, Pop M, Salzberg SL. Searching for SNPs with cloud computing. Genome Biol 10:R134.
For release updates, subscribe to the mailing list.

Related Tools

Bowtie: Ultrafast short read alignment

Hadoop: Open Source MapReduce

Contrail: Cloud-based de novo assembly

CloudBurst: Sensitive MapReduce alignment

Myrna: Cloud, differential gene expression

Tophat: RNA-Seq splice junction mapper

Cufflinks: Isoform assembly, quantitation

SoapSNP: Accurate SNP/consensus calling

Reference jars

H. sapiens: hg18/dbSNP 130

s3n://crossbow-refs/hg18.jar

M. musculus: mm9/dbSNP 128

s3n://crossbow-refs/mm9.jar

E. coli: O157:H7, NCBI (no SNPs)

s3n://crossbow-refs/e_coli.jar

Related publications

Langmead B, Schatz M, Lin J, Pop M, Salzberg SL. Searching for SNPs with cloud computing. Genome Biology 10:R134.
Schatz M, Langmead B, Salzberg SL. Cloud computing and the DNA data race. Nature Biotechnology 2010 Jul;28(7):691-3.
Langmead B, Hansen K, Leek J. Cloud-scale RNA-sequencing differential expression analysis with Myrna. Genome Biology 11:R83.
Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 10:R25.
Li R, Li Y, Fang X, Yang H, Wang J, Kristiansen K, Wang J. SNP detection for massively parallel whole-genome resequencing. Genome Res. 2009. 19: 1124-1132.

Crossbow on GitHub - 6/11/2013

Crossbow source now lives in a public GitHub repository.

Version 1.2.1 - May 30, 2013

Fixed some failures caused by changes to the interface of the new SRA utilities.
Updated the examples to work with the new SRA interface.
Fixed the mouse example to properly check the Crossbow predefined paths, if they exist.
Changed soapsnp to static linkage to avoid confusion on multilib platforms, where a user might set an alternate LD_LIBRARY_PATH.

Version 1.2.0 - July 20, 2012

Added support for Hadoop version 0.20.205.
Dropped support for Hadoop versions prior to 0.20.
Updated default Hadoop version for EMR jobs to 0.20.205.
Updated Bowtie version used to 0.12.8.
Fixed issues with streaming jar version parsing.
Fixed documentation bugs regarding --sra-toolkit option, which is superseded by the --fastq-dump option.

Version 1.1.2 - May 23, 2011

Added --just-align and --resume-align options. --just-align causes Crossbow to put the results of the Alignment phase in the --output directory and quit after the alignment phase. You can later "resume" Crossbow by specifying this directory as the --input directory and specifying the --resume-align option.
Fixed issue with .sra input whereby status output from fastq-dump would be interpreted as a read.
Other minor bugfixes.
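
The two-stage workflow described for --just-align and --resume-align can be sketched as a pair of command lines. This is only an illustration: the launcher name and the directory paths are hypothetical placeholders, and a real run will need whatever other options (reference, credentials, and so on) your Crossbow mode requires. Only the four flags shown come from the notes above.

```python
import shlex
from typing import List, Tuple


def two_stage_commands(launcher: str, reads_dir: str,
                       align_dir: str, final_dir: str) -> Tuple[List[str], List[str]]:
    """Build argument lists for an align-only run and a later resumed run.

    `launcher` is a placeholder for whichever Crossbow driver script you use;
    the directory arguments are hypothetical examples.
    """
    # Stage 1: stop after the Alignment phase and leave its results in align_dir.
    align_only = [launcher, "--input", reads_dir,
                  "--output", align_dir, "--just-align"]
    # Stage 2: feed the saved alignments back in as --input and resume from there.
    resume = [launcher, "--input", align_dir,
              "--output", final_dir, "--resume-align"]
    return align_only, resume


if __name__ == "__main__":
    first, second = two_stage_commands("crossbow-launcher", "reads/",
                                       "aligned/", "final/")
    print(shlex.join(first))
    print(shlex.join(second))
```

The point to notice is that the --output directory of the first run becomes the --input directory of the second.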

Version 1.1.1 - February 7, 2011

Added support for the .sra file format, used by the Sequence Read Archive. These files can now be specified in the manifest. Crossbow uses the fastq-dump tool from the SRA Toolkit to convert .sra files to FASTQ files in the preprocess stage.
The examples that included defunct SRA FASTQ files were updated to point to new .sra files instead.

Version 1.1.0 - October 12, 2010

Added --discard-ref-bin and --discard-all options, which can be helpful to reduce Crossbow running time when a run's chief purpose is to test whether it runs all the way through.
Fixed a bug in soapsnp that caused a segmentation fault in the last partition of a chromosome when chromosome length is a multiple of 64.
Revamped the reference jar scripts (in $CROSSBOW_HOME/reftools). The new scripts use Ensembl rather than UCSC & dbSNP. The old scripts (db2ssnp* and *_jar) are still there, but are likely to be deprecated soon.
Fixed a few bugs in the hg19_jar and db2ssnp_hg19 scripts.
Removed the hg18_jar script, which was broken by a reorganization of the dbSNP site.
Uses Bowtie 0.12.7 instead of 0.12.5.
Switched Mouse17 example's manifest files back to use .gz extension instead of .bz2.

Version 1.0.9 - September 13, 2010

Fixed example manifests that point to Short Read Archive files to use .bz2 instead of .gz extensions.

Version 1.0.8 - September 4, 2010

Set the memory cap on the sort task to be inversely proportional to --cpus, to avoid memory footprint blowup on computers with more processors.
Fixed a final issue that affected how Crossbow handles quality value conversion.
Fixed issue whereby bzip2'ed data would be handled incorrectly by the preprocessor.
Fixed a counter in the Preprocess step that would erroneously refer to unpaired reads as paired. Also, the counter label "Read data fetched to EC2" has been changed to "Read data fetched".
In EMR mode, updated where user credentials are found; Amazon changed their path sometime around 8/30/2010.
In EMR mode, updated the manner in which the bootstrap action is specified; the old way was disabled by Amazon sometime around 8/30/2010.
Fixed issue whereby ReduceWrap.pl would crash in cases with a large number of bins (>10 million).
NOTE: The Short Read Archive (SRA) seems to be in the midst of a reorganization that includes files that were previously gzipped being replaced with versions zipped with bzip2. The files will sometimes disappear for a while. If you are having problems with an example where input reads come from the SRA, try renaming files in the manifest file as appropriate. If that doesn't work, please contact us.
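
The sort-task memory cap mentioned in the first item of this release can be pictured with a small sketch. The function name and the base figure are assumptions made for illustration; the actual constants Crossbow uses are not stated here. The idea is simply that the per-task cap shrinks as --cpus grows, so the combined footprint of all simultaneous sort tasks stays roughly constant.

```python
def sort_memory_cap_mb(base_cap_mb: int, cpus: int) -> int:
    """Per-task sort memory cap, inversely proportional to the CPU count.

    `base_cap_mb` is a hypothetical budget for a single-CPU run; it is not
    a value taken from Crossbow itself.
    """
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    # Dividing by cpus keeps cpus * cap at or below the base budget.
    return max(1, base_cap_mb // cpus)


if __name__ == "__main__":
    for cpus in (1, 4, 16):
        print(cpus, sort_memory_cap_mb(8000, cpus))
```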

Version 1.0.7 - August 27, 2010

Fixed issue whereby the order of the arguments to bowtie would result in a crash when POSIXLY_CORRECT was set.
Fixed --keep-all option, which was causing a crash.
Fixed a lingering quality bug whereby qualities were converted immediately to phred33 but phred64 or solexa64 flags would be spuriously passed to Bowtie.
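
For readers unfamiliar with the encodings named in the last item, here is a minimal sketch of converting phred64 and solexa64 quality strings to phred33, using the standard definitions of those encodings. The function names are illustrative and are not taken from the Crossbow source. Once qualities have been converted this way they are phred33, so telling the aligner that they are still phred64 or solexa64 would make it misread them, which is the kind of mismatch the fix above addresses.

```python
import math


def phred64_to_phred33(quals: str) -> str:
    """Re-encode Phred qualities from ASCII offset 64 to ASCII offset 33."""
    return "".join(chr(ord(c) - 64 + 33) for c in quals)


def solexa64_to_phred33(quals: str) -> str:
    """Convert Solexa-scaled qualities (ASCII offset 64) to phred33.

    Solexa and Phred scores differ at the low end, so each score is mapped
    with Q_phred = 10 * log10(10^(Q_solexa / 10) + 1) before re-encoding.
    """
    out = []
    for c in quals:
        q_solexa = ord(c) - 64
        q_phred = 10.0 * math.log10(10.0 ** (q_solexa / 10.0) + 1.0)
        out.append(chr(int(round(q_phred)) + 33))
    return "".join(out)


if __name__ == "__main__":
    print(phred64_to_phred33("hh@"))
    print(solexa64_to_phred33("h;"))
```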

Crossbow

Genotyping from short reads using cloud computing