SequencingFormats

We typically provide the following files and formats as part of a data package for HiSeq and Genome Analyzer sequencing data.

FASTQ Files (Quality Score Files)

This file format is used frequently at the Sanger Institute to bundle a sequence and its quality data.

An Illumina FASTQ file normally uses four lines per sequence:

Line 1 - begins with a '@' character and is followed by a sequence identifier
Line 2 - the sequence
Line 3 - begins with a '+' character and is optionally followed by the same sequence identifier
Line 4 - the quality values for each base in the sequence in Line 2

An example:

@WICMT-SOLEXA_0043:7:1:1500:1199#0/1;1
CTCTGCTGTCTCTTGTGTAAGAAGANANNNCNTTCT
+WICMT-SOLEXA_0043:7:1:1500:1199#0/1;1
fffffffRcfad[ddffa]faaaaBBBBBBBBBBBB
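The four-line layout described above can be read with a few lines of Python. This is a minimal sketch, not the pipeline's own reader: the names `FastqRecord` and `read_fastq` are hypothetical, and it assumes that every record occupies exactly four lines (no wrapped sequence or quality lines).

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    identifier: str  # text after the leading '@' on line 1
    sequence: str    # line 2
    quality: str     # line 4, one character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with exactly four lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        header = header.rstrip("\r\n")
        sequence = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@"):
            raise ValueError(f"line 1 must start with '@': {header!r}")
        if not plus.startswith("+"):
            raise ValueError(f"line 3 must start with '+': {plus!r}")
        if len(sequence) != len(quality):
            raise ValueError("line 4 must hold one quality value per base")
        yield FastqRecord(header[1:], sequence, quality)


# The example record from this page, held in memory instead of a file.
example = (
    "@WICMT-SOLEXA_0043:7:1:1500:1199#0/1;1\n"
    "CTCTGCTGTCTCTTGTGTAAGAAGANANNNCNTTCT\n"
    "+WICMT-SOLEXA_0043:7:1:1500:1199#0/1;1\n"
    "fffffffRcfad[ddffa]faaaaBBBBBBBBBBBB\n"
)
records = list(read_fastq(io.StringIO(example)))
for record in records:
    print(record.identifier, len(record.sequence))
```

For a file on disk, the same function can be given an open text handle, for example `with open(path) as handle: ...`, where `path` is whatever FASTQ file you received.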

The sequence identifier in the above example is "WICMT-SOLEXA_0043:7:1:1500:1199#0/1;1". Please note that the last two characters, ";1", may be found only in datasets sequenced by us: we do not remove filtered reads but flag each read with a 1 (not filtered, i.e. good) or a 0 (filtered, i.e. bad). Please refer to the FAQs for the filtering criteria and what they mean.

WICMT-SOLEXA	Instrument name
0043	A unique random string for the whole run (pretty much meaningless)
7	flowcell lane
1	tile number within the flowcell lane
1500	'x'-coordinate of the cluster within the tile
1199	'y'-coordinate of the cluster within the tile
#0	index number for a multiplexed sample (0 for no indexing)
/1	the member of a pair, /1 or /2 (paired-end or mate-pair reads only)
;1	filter flag: 1 means the read was not filtered (good), 0 means it was filtered (bad). This flag is typically not present in FASTQ files created by the Illumina pipeline
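The field breakdown above can be turned into a small parser. The sketch below is illustrative only: `ReadId`, `parse_read_id` and the regular expression are hypothetical names and choices, not part of any Illumina tool. It assumes the instrument name and run string are joined by the last underscore before the first colon, and it treats the index, pair member and filter flag as optional, since they are not always present.

```python
import re
from typing import NamedTuple, Optional


class ReadId(NamedTuple):
    instrument: str
    run: str
    lane: int
    tile: int
    x: int
    y: int
    index: Optional[str]           # "0" when the sample is not indexed
    member: Optional[int]          # 1 or 2 for paired reads, else None
    passed_filter: Optional[bool]  # None when the ";0"/";1" flag is absent


# Assumed layout: instrument_run:lane:tile:x:y#index/member;flag
_ID_PATTERN = re.compile(
    r"^(?P<instrument>[^:]+)_(?P<run>[^_:]+)"
    r":(?P<lane>\d+):(?P<tile>\d+):(?P<x>\d+):(?P<y>\d+)"
    r"(?:#(?P<index>[^/;]+))?"
    r"(?:/(?P<member>[12]))?"
    r"(?:;(?P<flag>[01]))?$"
)


def parse_read_id(identifier: str) -> ReadId:
    """Split a sequence identifier (without the leading '@') into its fields."""
    match = _ID_PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"unrecognised identifier: {identifier!r}")
    g = match.groupdict()
    return ReadId(
        instrument=g["instrument"],
        run=g["run"],
        lane=int(g["lane"]),
        tile=int(g["tile"]),
        x=int(g["x"]),
        y=int(g["y"]),
        index=g["index"],
        member=int(g["member"]) if g["member"] else None,
        passed_filter=(g["flag"] == "1") if g["flag"] is not None else None,
    )


read_id = parse_read_id("WICMT-SOLEXA_0043:7:1:1500:1199#0/1;1")
print(read_id)
```

Keeping `passed_filter` as `None` when the flag is missing lets downstream code tell "no filter information" apart from "filtered", which matters for files that did not come from our pipeline.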
