Bowtie

An ultrafast memory-efficient short read aligner

Site Map

Latest Release

Links

Related Tools

Pre-built indexes

Consider using Illumina's iGenomes collection. Each iGenomes archive contains pre-built Bowtie and Bowtie 2 indexes.

H. sapiens, NCBI GRCh38 2.7 GB
H. sapiens, NCBI GRCh38
with 1KGenomes major SNPs
2.7 GB
  How we built this, FASTA
H. sapiens, UCSC hg19 2.7 GB
 colorspace: full
H. sapiens, UCSC hg19
with 1KGenomes major SNPs
2.7 GB
  How we built this, FASTA
H. sapiens, UCSC hg18 2.7 GB
 colorspace: full
H. sapiens, NCBI v37 2.7 GB
 colorspace: full
H. sapiens, NCBI v36 2.7 GB
 colorspace: full
M. musculus, UCSC mm8 2.4 GB
 colorspace: full
M. musculus, UCSC mm9 2.4 GB
 colorspace: full
M. musculus, NCBI v37 2.4 GB
 colorspace: full
R. norvegicus, UCSC rn4 2.4 GB
 colorspace: full
B. taurus, UMD v3.0 2.4 GB
 colorspace: full
C. familiaris, UCSC canFam2 2.4 GB
 colorspace: full
G. gallus, UCSC, galGal3 1.1 GB
 colorspace: full
D. melanogaster, Flybase, r5.22 150 MB
 colorspace: full
A. thaliana, TAIR, TAIR9 120 MB
 colorspace: full
C. elegans, Wormbase, WS200 75 MB
 colorspace: full
S. cerevisiae, CYGD 15 MB
 colorspace: full
E. coli, NCBI, st. 536 5 MB
 colorspace: full

All indexes are for assemblies, not contigs. Unplaced or unlocalized sequences and alternate haplotype assemblies are excluded.

Some unzip programs cannot handle archives >2 GB. If you have problems downloading or unzipping a >2 GB index, try downloading in two parts.

Check .zip file integrity with MD5s.

Publications

Contributors

Related links

Getting started


Download and extract the appropriate Bowtie binary release into a fresh directory. Change to that directory.


Performing alignments


The Bowtie source and binary packages come with a pre-built index of the E. coli genome, and a set of 1,000 35-bp reads simulated from that genome. To use Bowtie to align those reads, issue the following command. If you get an error message "command not found", try adding a ./ before the bowtie.


bowtie e_coli reads/e_coli_1000.fq

The first argument to bowtie is the basename of the index for the genome to be searched. The second argument is the name of a FASTQ file containing the reads.

Depending on your computer, the run might take a few seconds up to about a minute. You will see bowtie print many lines of output. Each line is an alignment for a read. The name of the aligned read appears in the leftmost column. The final line should say Reported 699 alignments to 1 output stream(s) or something similar.

Next, issue this command:


bowtie -t e_coli reads/e_coli_1000.fq e_coli.map

This run calculates the same alignments as the previous run, but the alignments are written to e_coli.map (the final argument) rather than to the screen. Also, the -t option instructs Bowtie to print timing statistics. The output should look something like this:


Time loading forward index: 00:00:00
Time loading mirror index: 00:00:00
Seeded quality full-index search: 00:00:00
# reads processed: 1000
# reads with at least one reported alignment: 699 (69.90%)
# reads that failed to align: 301 (30.10%)
Reported 699 alignments to 1 output stream(s)
Time searching: 00:00:00
Overall time: 00:00:00


Installing a pre-built index


Download the pre-built S. cerevisiae genome package by right-clicking the "S. cerevisiae, CYGD" link in the "Pre-built indexes" section of the right-hand sidebar and selecting "Save Link As..." or "Save Target As...". All pre-built indexes are packaged as .zip archives, and the S. cerevisiae archive is named s_cerevisiae.ebwt.zip. When it has finished downloading, extract the archive into the Bowtie indexes subdirectory using your preferred unzip tool. The index is now installed.

To test that the index is properly installed, issue this command from the Bowtie install directory:


bowtie -c s_cerevisiae ATTGTAGTTCGAGTAAGTAATGTGGGTTTG

This command searches the S. cerevisiae index with a single read. The -c argument instructs Bowtie to obtain read sequences directly from the command line rather than from a file. If the index is installed properly, this command should print a single alignment and then exit.

If you would rather install pre-built indexes somewhere other than the indexes subdirectory of the Bowtie install directory, simply set the BOWTIE_INDEXES environment variable to point to your preferred directory and extract indexes there instead.


Building a new index


The pre-built E. coli index included with Bowtie is built from the sequence for strain 536, known to cause urinary tract infections. We will create a new index from the sequence of E. coli strain O157:H7, a strain known to cause food poisoning. Download the sequence file by right-clicking this link and selecting "Save Link As..." or "Save Target As...". The sequence file is named NC_002127.fna. When the sequence file is finished downloading, move it to the Bowtie install directory and issue this command:


bowtie-build NC_002127.fna e_coli_O157_H7

The command should finish quickly, and print several lines of status messages. When the command has completed, note that the current directory contains four new files named e_coli_O157_H7.1.ebwt, e_coli_O157_H7.2.ebwt, e_coli_O157_H7.rev.1.ebwt, and e_coli_O157_H7.rev.2.ebwt. These files constitute the index. Move these files to the indexes subdirectory to install it.

To test that the index is properly installed, issue this command:


bowtie -c e_coli_O157_H7 GCGTGAGCTATGAGAAAGCGCCACGCTTCC

If the index is installed properly, this command should print a single alignment and then exit.


Finding variations with SAMtools


SAMtools is a suite of tools for storing, manipulating, and analyzing alignments such as those output by Bowtie. SAMtools understands alignments in either of two complementary formats: the human-readable SAM format, or the binary BAM format. Because Bowtie can output SAM (using the -S/--sam option), and SAM can can be converted to BAM using SAMtools, Bowtie users can make full use of the analyses implemented in SAMtools, or in any other tools supporting SAM or BAM.

We will use SAMtools to find SNPs in a set of simulated reads included with Bowtie. The reads cover the first 10,000 bases of the pre-built E. coli genome and contain 10 SNPs throughout. First, we run bowtie to align the reads, being sure to specify the -S option. We also specify an output file that we will use as input for the next step (though pipes can be used to accomplish the same thing without the intermediate file):


bowtie -S e_coli reads/e_coli_10000snp.fq ec_snp.sam

Next, we convert the SAM file to BAM in preparation for sorting. We assume that SAMtools is installed and that the samtools binary is accessible in the PATH.


samtools view -bS -o ec_snp.bam ec_snp.sam

Next, we sort the BAM file, in preparation for SNP calling:


samtools sort ec_snp.bam ec_snp.sorted

We now have a sorted BAM file called ec_snp.sorted.bam. Sorted BAM is a useful format because the alignments are both compressed, which is convenient for long-term storage, and sorted, which is conveneint for variant discovery. Finally, we call variants from the Sorted BAM:


samtools pileup -cv -f genomes/NC_008253.fna ec_snp.sorted.bam

For this sample data, the samtools pileup command should print records for 10 distinct SNPs, the first being at position 541 in the reference.

See the SAMtools web site for details on how to use these and other tools in the SAMtools suite.