BLASTing Illumina reads in FASTQ format

Sequences typically come in FASTA or FASTQ format, or in compressed versions of these (with an additional .gz or .bz2 extension).

BLAST uses FASTA format for queries and for database creation, so the BLAST tools do not directly understand FASTQ format. This is because:

FASTQ files typically result from Illumina or Nanopore sequencing. They are usually huge files containing tens to hundreds of millions of reads, many of which come from the same subset of the genome or transcriptome, or from a particular amplicon. Such information is highly redundant. When this is the case:

If you want to BLAST a FASTQ file, you’re probably not using the best approach

It is likely that you want to first reduce redundancy in your dataset. The most biologically relevant way is often to perform whole genome or transcriptome assembly of your raw reads prior to BLASTing them. Sometimes, simple deduplication or collapsing is sufficient.
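
For exact duplicates, collapsing can be as simple as counting identical sequence lines. The following is a minimal sketch rather than a recommended pipeline: it assumes an uncompressed FASTQ file with exactly four lines per record, and the function name collapse_reads, the file names, and the unique_N;size=COUNT header style are all illustrative choices. It only merges reads that are identical character for character, so reads that differ by a single sequencing error, or that come from the opposite strand, stay separate.

```shell
# Sketch: collapse exact-duplicate reads from a FASTQ file into FASTA,
# most abundant sequence first. Assumes four lines per FASTQ record.
collapse_reads() {
    # $1: path to an uncompressed FASTQ file (hypothetical input)
    awk 'NR % 4 == 2' "$1" |             # keep only the sequence line of each record
        LC_ALL=C sort |                  # bring identical sequences together
        uniq -c |                        # count each distinct sequence
        LC_ALL=C sort -k1,1nr -k2,2 |    # most abundant first, ties ordered by sequence
        awk '{ printf ">unique_%d;size=%d\n%s\n", NR, $1, $2 }'
}

# Usage (hypothetical file names):
#   collapse_reads reads.fq > collapsed.fasta
```

For anything beyond a quick look at the data, a dedicated tool remains the safer choice.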

If you do want to work with the raw FASTQ reads, BLAST often isn’t the best way to perform analysis.

But what if I really do need to run BLAST on FASTQ files?

While it is often inappropriate to BLAST raw reads, gaining biological insight sometimes does depend on it.

SequenceServer automatically detects FASTQ input and converts it to FASTA format. Just paste the FASTQ reads into the search box, and they will instantly be converted to FASTA for BLASTing.

Command-line batch conversion of FASTQ to FASTA

If you have huge numbers of reads, you’ll want to use a more automated approach to convert the FASTQ file to a FASTA file. Using a tried and tested tool is less risky than creating your own custom script by creatively using grep, sed, Python, Perl or ChatGPT. The following seqtk command is one easy way:

seqtk seq -A input.fq > output.fasta
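
After converting, it can be worth confirming that no reads were lost along the way. The sketch below compares record counts using only standard Unix tools. The function names are hypothetical, and it assumes an uncompressed FASTQ file with four lines per record. It counts FASTQ lines rather than lines starting with @, because quality strings can also begin with @.

```shell
# Number of records in an uncompressed FASTQ file.
# Fails (prints nothing) if the line count is not a multiple of four.
count_fastq_records() {
    awk 'END { if (NR % 4) exit 1; printf "%d\n", NR / 4 }' "$1"
}

# Number of records in a FASTA file: one ">" header line per record.
count_fasta_records() {
    grep -c '^>' "$1" || true    # grep exits non-zero when the count is 0
}

# Succeeds only when both files hold the same number of records.
same_record_count() {
    [ "$(count_fastq_records "$1")" = "$(count_fasta_records "$2")" ]
}

# Usage with the file names from the seqtk command above:
#   same_record_count input.fq output.fasta && echo "read counts match"
```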

Before using huge numbers of reads for database creation or as queries, it’s often a good idea to remove redundancy. You can directly reduce redundancy with a tool like CD-HIT, but it’s often best to run a quick assembly (e.g. with SPAdes or MEGAHIT).

If you need a transcriptome, metagenome or genome assembly done on your raw data, we can help you with that. Contact support with your details and we’ll get back to you. We offer cheap and fast transcriptome, genome, and metagenome assembly services.

