Site Map
Latest Release
Related Tools
Bowtie: Ultrafast short read alignment |
Hadoop: Open Source MapReduce |
Contrail: Cloud-based de novo assembly |
CloudBurst: Sensitive MapReduce alignment |
Myrna: Cloud, differential gene expression |
Tophat: RNA-Seq splice junction mapper |
Cufflinks: Isoform assembly, quantitation |
SoapSNP: Accurate SNP/consensus calling |
Reference jars
|
H. sapiens: hg18/dbSNP 130
|
s3n://crossbow-refs/hg18.jar
|
M. musculus: mm9/dbSNP 128
|
s3n://crossbow-refs/mm9.jar
|
E. coli: O157:H7, NCBI (no SNPs)
|
s3n://crossbow-refs/e_coli.jar
|
Related publications
- Langmead B, Schatz M, Lin J, Pop M, Salzberg SL. Searching for SNPs with cloud computing. Genome Biology 10:R134.
- Schatz M, Langmead B, Salzberg SL. Cloud computing and the DNA data race. Nature Biotechnology 2010 Jul;28(7):691-3.
- Langmead B, Hansen K, Leek J. Cloud-scale RNA-sequencing differential expression analysis with Myrna. Genome Biology 11:R83.
- Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 10:R25.
- Li R, Li Y, Fang X, Yang H, Wang J, Kristiansen K, Wang J. SNP detection for massively parallel whole-genome resequencing. Genome Res. 2009. 19: 1124-1132.
Authors
Other Documentation
Links
News archive
Crossbow on GitHub - 6/11/2013
Version 1.2.1 - May 30, 2013
- Fixed some failures caused by the new SRA's utilities interface changes.
- Updated the examples to work with the new SRA interface.
- Fixed mouse example to properly check crossbow predefined paths if they exists.
- Changed soapsnp to static linkage to avoid confusion on multilib platforms where user might use alternate LD_LIBRARY_PATH.
Version 1.2.0 - July 20, 2012
- Added support for Hadoop version 0.20.205.
- Dropped support for Hadoop versions prior to 0.20.
- Updated default Hadoop version for EMR jobs to 0.20.205.
- Updated Bowtie version used to 0.12.8.
- Fixed issues with streaming jar version parsing.
- Fixed documentation bugs regarding --sra-toolkit option, which is
superseded by the
--fastq-dump
option.
Version 1.1.2 - May 23, 2011
-
Added
--just-align
and
--resume-align
options.
--just-align
causes Crossbow to put the results of the Alignment phase in the
--output
directory and quit after the alignment phase. You can
later "resume" Crossbow by specifying this directory as the
--input
directory and specifying the
--resume-align
option.
-
Fixed issue with .sra input whereby status output from fastq-dump
would be interpreted as a read.
-
Other minor bugfixes.
Version 1.1.1 - February 7, 2011
-
Added support for the
.sra file format, used by the
Sequence Read Archive. These files can now be
specified in the manifest. Crossbow uses the fastq-convert
tool from the
SRA Toolkit to convert
.sra
files to FASTQ files in the preprocess stage.
-
The examples that included defunct SRA FASTQ files were updated to
point to new
.sra
files instead.
Version 1.1.0 - October 12, 2010
-
Added
--discard-ref-bin
and
--discard-all
options, which can be
helpful to reduce Crossbow running time when a run's chief purpose
is to test whether it runs all the way through.
-
Fixed a bug in soapsnp that caused a segmentation fault in the
last partition of a chromosome when chromosome length is a
multiple of 64.
-
Revamped the reference jar scripts (in $CROSSBOW_HOME/reftools).
The new scripts use Ensembl rather than
UCSC &
dbSNP. The old
scripts (db2ssnp* and *_jar) are still there, but are likely to be
deprecated soon.
-
Fixed a few bugs in the hg19_jar and db2ssnp_hg19 scripts.
-
Removed the hg18_jar script, which was broken by a reorganization
of the dbSNP site.
-
Uses Bowtie 0.12.7 instead of 0.12.5.
-
Switched Mouse17 example's manifest files back to use .gz
extension instead of .bz2.
Version 1.0.9 - September 13, 2010
-
Fixed example manifests that point to Short Read Archive files to
use .bz2 instead of .gz extensions.
Version 1.0.8 - September 4, 2010
-
Set the memory cap on the sort task to be inversely proportional
to --cpus, to avoid memory footprint blowup on computers with more
processors.
- Fixed a final issue that affected how Crossbow handles quality
value conversion.
- Fixed issue whereby bzip2'ed data would be handled incorrectly by
the preprocessor.
- Fixed counter in Preprocess step that would erroneously refer to
unpaired reads as paired. Also "Read data fetched to EC2" has
been changed to "Read data fetched".
- In EMR mode, updated where user credentials are found; Amazon
changed their path sometime around 8/30/2010.
- In EMR mode, updated the manner in which the bootstrap action is
specified; the old way was disabled by Amazon sometime around
8/30/2010.
- Fixed issue whereby ReduceWrap.pl would crash in cases with a
large number of bins (>10 million) .
- NOTE: The Short Read Archive (SRA) seems to be in the midst of a
reorganization that includes files that were previously gzipped
being replaced with versions zipped with bzip2. The files will
sometimes disappear for a while. If you are having problems with
an example where input reads come from the SRA, try renaming files
in the manifest file as appropriate. If that doesn't work, please
contact us.
Version 1.0.7 - August 27, 2010
-
Fixed issue whereby the order of the arguments to bowtie would
result in a crash when POSIXLY_CORRECT was set.
-
Fixed --keep-all option, which was causing a crash.
-
Fixed a lingering quality bug whereby qualities were converted
immediately to phred33 but phred64 or solexa64 flags would be
spuriously passed to Bowtie.
Version 1.0.6 - August 26, 2010
-
Single-computer mode now copies the output that it writes to the
console to a file cb.local.(pid).out. Please include the
contents of this file when reporting issues.
-
Sorting in single-computer mode is now more portable; switched
from command-line sort to pure-Perl File::Sort.
-
Fixed bug whereby the quality setting would be propagated to
Bowtie but not to SOAPsnp, causing SOAPsnp to operate with
incorrect (over-optimistic) quality values when Phred+64 or
Solexa+64 modes were used.
-
More helpful output from MapWrap.pl and ReduceWrap.pl to make it
easier to debug issues in single-computer-mode runs.
-
Fixed issue where web form would incorrectly convert + signs in
AWS secret key to spaces, causing some good credentials to fail
verification.
-
Fixed issue in preprocessor that mishandles copies when user's AWS
secret key contains slash characters.
Version 1.0.5 - August 16, 2010
-
Fixed issue that prevented CROSSBOW_EMR_HOME environment variable
from working.
-
Fixed issue whereby Align.pl script fails to report a count for
the number of reads with alignments sampled due to Bowtie's -M
option.
-
Fixed issue whereby scripts in the $CROSSBOW_HOME/reftools
directory had #!/bin/sh headers but were actually bash scripts.
-
Fixed issue that made it difficult to specify a space-separated
list of arguments to the --bowtie-args and other --*-args
parameters.
-
Fixed issue whereby most documentation referred to arguments with
a single-dash prefix, whereas users with the POSIXLY_CORRECT
environment variable set must use a double-dash prefix.
Documentation and code have been updated to always use double-dash
prefixes.
Myrna paper out - 8/11/10
Major revision: Version 1.0.4 - July 22, 2010
- Crossbow has been largely rewritten as an Amazon Elastic MapReduce
(EMR) application, as opposed to an Elastic Compute Cloud (EC2)
application. EMR runs on top of EC2 and is a more appropriate way
to run Crossbow for several reasons, including:
-
The AWS Console's Elastic MapReduce tab, together with EMR's
Debug Job Flow feature, provide a much friendlier interface for
monitoring and manipulating jobs.
-
The elaborate scripts for automating cluster setup, teardown,
proxy connection, etc., that were in old versions of Crossbow
are all gone. They are either no longer relevant, or else are
handled automatically by EMR.
- A web-based GUI for composing and submitting EMR jobs has been
added. Most helpfully, the web GUI has features for
sanity-checking inputs; e.g. whether the user's credentials as
entered are valid, whether the input URL exists, etc.
- Crossbow is now fully "tri-mode", with separate cloud, Hadoop, and
single-computer operating modes. All three modes share a great
deal of common infrastructure, making all three modes easier to
maintain going forward.
-
Crossbow's Hadoop mode is now much improved, having an interface
very similar to cloud and single-computer modes.
-
A new single-computer operating mode has been added that (a)
uses many processors/cores to shorten computation time, and (b)
does not require the user to have a cloud account or a Hadoop
installation. It also doesn't require Java; just appropriate
versions of Bowtie, SOAPsnp (some of which are included), Perl,
and other tools. Its interface is very similar to cloud and
Hadoop modes.
- The manual is entirely rewritten. It now contains information
about running in all three modes (cloud, Hadoop, single-computer),
and gives multiple examples for how to run in each mode.
- Fixed a bug whereby allele frequency columns in the provided
reference jars had T and G columns switched.
- SOAPsnp reduce step now outputs more counter and status
information.
- SOAPsnp reduce step outputs an additional column per SNP
indicating paired-end coverage.
- Compatible with Bowtie versions 0.12.0 and above. Bowtie 0.12.5
now included.
- Compatible with Hadoop 0.18 or 0.20; both were tested.
- Many other new options and features. See manual.
Crossbow 0.2.0 and Crossbow for Elastic MapReduce - Coming soon
- Crossbow has been rewritten to allow it to run as an
Amazon Elastic MapReduce
application. EMR is a much easier and slicker
way of running MapReduce programs on EC2. It will also be easier
for us to maintain.
- Fixed a bug whereby allele frequency columns in the provided
reference jars had T and G columns switched.
- SOAPsnp reduce step now outputs more counter and status
information.
- SOAPsnp reduce step outputs an additional column per SNP
indicating paired-end coverage.
- Compatible with Bowtie
versions 0.12.0 and above.
Bowtie 0.12.0 - 12/23/09
Crossbow paper out - 11/20/09
0.1.3 release - 10/21/09
-
cb-local now gives the user clear feedback when worker nodes fail
to confirm the MD5 signature of the reference jar. If this
failure occurs several times per node across all nodes, the
supplied MD5 is probably incorrect.
-
An extra Reduce step was added to the end of the Crossbow job to
bin and sort SNPs before downloaded to the user's computer. This
step also renames output files by chromosome and deletes empty
output files.
- Added another example that uses recently-published mouse
chromosome 17 data (sequenced by Sudbery et al). The TUTORIAL
file now points to this new example.
- More and clearer messages in the output from cb-local.
Crossbow piece on Cloudera Blog - 10/15/09
- Mike Schatz,
Crossbow co-author, has a great
guest post on Cloudera's
"Hadoop & Big Data" blog about Crossbow.
0.1.2 release - 10/13/09
- Many fixes for the scripts that automate the reference-jar
building process.
- Added two utility scripts, dist_mfa and sanity_check, to the
reftools subdirectory. See their documentation for details.
- Added scripts for building a reference jar for C. elegans using
UCSC's ce6 (WormBase's WS190) assembly and information from dbSNP.
This small genome is used in the new TUTORIAL.
- New TUTORIAL steps the user through preprocessing reads from the
NCBI Short Read Archive, creating a reference jar from a UCSC
assembly (ce6 in this case) and a set of SNP descriptions from
dbSNP, then running Crossbow and examining the resulting SNPs.
- Extended the preprocess-and-copy infrastructure to allow output
from a single input file to be split over many output files. This
is critical for achieving good load balance across a breadth of
datasets.
0.1.1 release - 10/9/09
- Added scripts that automate the reference-jar building process for
UCSC genomes hg18 and mm9. These scripts can be adapted to other
species. See the new "Using Automatic Scripts" subsection of the
"Building a Reference Jar" section of the MANUAL for details.
- License agreement files are now organized better. All licenses
applying to all software included in Crossbow are in LICENSE*
files in the Crossbow root directory.
- Minor updates to MANUAL
0.1.0 release - 10/3/09
The first public release of Crossbow is now available for download.
This release includes:
- A detailed manual (MANUAL in the expanded archive)
that includes step-by-step instructions for how to get
started with Amazon Web
Services and Crossbow.
- Scripts for copying and preprocessing short-read FASTQ files
into Amazon's S3 storage service, including an easy-to-use
interactive script (cb-copy-interactive)
- Scripts for running Crossbow either locally or on Amazon's EC2
utility computing service, including an easy-to-use
interactive script (cb-interactive)