Latest revision as of 17:03, 10 July 2009

ELAND EXTENDED FORMAT - s_<LANE>_eland_extended_<GENOME>_<SEED>-<READLENGTH>.TXT

Column 1. Machine Name
Column 2. Run Number
Column 3. Lane
Column 4. Tile
Column 5. X Coordinate of cluster
Column 6. Y Coordinate of cluster
Column 7. Usually blank
Column 8. Read number (1 or 2 for paired-read analysis, blank for a single-read analysis)
Column 9. Read
Column 10. Quality string—In symbolic ASCII format (ASCII character code = quality value + 64)
Column 11. Match chromosome — Name of chromosome match OR code indicating why no match resulted
- Codes used when no chromosome name is given: "NM" = no match; "QC" = bad read (not enough base calls or too many Ns); "#:#:#" = number of matches found
Column 12. Match Contig — Gives the contig name if there is a match and the match chromosome is split into contigs (Blank if no contigs)
Column 13. Match Position — Always with respect to forward strand, numbering starts at 1 (Blank if no match found)
Column 14. Match Strand—“F” for forward, “R” for reverse (Blank if no match found)
Column 15. Match Descriptor — Concise description of alignment (Blank if no match found)
- A numeral denotes a run of matching bases
- A letter denotes substitution of a nucleotide: For a 35 base read, “35” denotes an exact match and “32C2” denotes substitution of a “C” at the 33rd position
Column 16. Single-Read Alignment Score — Alignment score of a single-read match or, for a paired read, the alignment score of the read if it were treated as a single read. Blank if no match found; any score less than 4 should be considered as aligned to a repeat
Column 17. Paired-Read Alignment Score — Alignment score of a paired read and its partner, taken as a pair. Blank if no match found; any scores less than 4 should be considered as aligned to a repeat
Column 18. Partner Chromosome — Name of the chromosome if the read is paired and its partner aligns to another chromosome (Blank for single-read analysis)
Column 19. Partner Contig — Not blank if read is paired and its partner aligns to another chromosome and that partner is split into contigs (Blank for single-read analysis)
Column 20. Partner Offset — If a partner of a paired read aligns to the same chromosome and contig, this number, added to the Match Position, gives the alignment position of the partner (Blank for single-read analysis)
Column 21. Partner Strand — To which strand did the partner of the paired read align? “F” for forward, “R” for reverse (Blank if no match found, blank for single-read analysis)
Column 22. Filtering — Did the read pass quality filtering? “Y” for yes, “N” for no
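
As a rough illustration of the column layout above, the following Python sketch splits one line into named fields, converts the quality string to numeric values using the stated offset of 64, expands a match descriptor, and computes the partner position from the partner offset. The field names, the assumption that the file is tab-delimited, and all sample values are illustrative and do not come from this page; descriptors are assumed to contain only run lengths and substituted bases.

```python
import re

# Hypothetical names for the 22 columns listed above (not defined by the format itself).
FIELDS = [
    "machine", "run", "lane", "tile", "x", "y", "column7", "read_number",
    "read", "quality", "match_chromosome", "match_contig", "match_position",
    "match_strand", "match_descriptor", "single_read_score",
    "paired_read_score", "partner_chromosome", "partner_contig",
    "partner_offset", "partner_strand", "filtering",
]

def parse_extended_line(line):
    """Split one line into a dict keyed by FIELDS (assumes tab-delimited text)."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    return dict(zip(FIELDS, values))

def decode_quality(quality_string, offset=64):
    """Column 10: quality value = ASCII character code - 64."""
    return [ord(ch) - offset for ch in quality_string]

def match_status(match_chromosome):
    """Column 11: distinguish the no-match codes from a chromosome name."""
    if match_chromosome == "NM":
        return "no_match"
    if match_chromosome == "QC":
        return "qc_failure"
    if re.fullmatch(r"\d+:\d+:\d+", match_chromosome):
        return "match_counts"
    return "matched"

def expand_descriptor(descriptor):
    """Column 15: numerals are runs of matching bases, letters are substitutions."""
    ops = []
    for token in re.findall(r"\d+|[A-Za-z]", descriptor):
        if token.isdigit():
            ops.append(("match", int(token)))
        else:
            ops.append(("sub", token))
    return ops

def partner_position(record):
    """Column 20: partner position = match position + partner offset."""
    if record["match_position"] and record["partner_offset"]:
        return int(record["match_position"]) + int(record["partner_offset"])
    return None

# Made-up sample line for a 4-base read, for demonstration only.
sample = "\t".join([
    "HWI-EAS000", "1", "3", "42", "100", "200", "", "1",
    "ACGT", "hhhb", "chr1.fa", "", "1000", "F", "4", "30",
    "60", "", "", "150", "R", "Y",
])
record = parse_extended_line(sample)
print(decode_quality(record["quality"]))
print(expand_descriptor("32C2"))
print(partner_position(record))
```

Keeping the blank columns as empty strings, rather than dropping them, preserves the column positions, which matters because several fields are blank for unmatched or single-read data.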

ELAND NORMAL/FIXEDLENGTH FORMAT - s_<LANE>_eland_fixedlength_<GENOME>_<READLENGTH>.TXT

Each line of the output file contains the following fields:

Column 1. Sequence ID
Column 2. Sequence
Column 3. Type of match codes:
- NM—No match found
- QC—No matching done: QC failure (too many Ns)
- U0—Best match found was a unique exact match
- U1—Best match found was a unique 1-error match
- U2—Best match found was a unique 2-error match
- R0—Multiple exact matches found
- R1—Multiple 1-error matches found, no exact matches
- R2—Multiple 2-error matches found, no exact or 1-error matches
Column 4. Number of exact matches found
Column 5. Number of 1-error matches found
Column 6. Number of 2-error matches found
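
The match codes in Column 3 follow a regular pattern: "U" or "R" for a unique or repeated best match, followed by the number of errors in that best match. The Python sketch below decodes a code and, as a consistency check, derives the code that the counts in Columns 4 to 6 would imply. The function names and dictionary keys are hypothetical, and the derivation from counts is an inference from the code descriptions above rather than a documented rule.

```python
def decode_match_code(code):
    """Decode a Column 3 match code such as 'U1' or 'R0'."""
    if code == "NM":
        return {"matched": False, "reason": "no match found"}
    if code == "QC":
        return {"matched": False, "reason": "QC failure"}
    if len(code) == 2 and code[0] in "UR" and code[1] in "012":
        return {"matched": True, "unique": code[0] == "U", "errors": int(code[1])}
    raise ValueError(f"unrecognized match code: {code!r}")

def code_from_counts(exact, one_error, two_error):
    """Infer the match code implied by the counts in Columns 4, 5 and 6.

    Assumption: the best match level is the lowest error count with at
    least one hit, and it is 'unique' only if exactly one hit exists there.
    """
    for errors, count in enumerate((exact, one_error, two_error)):
        if count > 0:
            return ("U" if count == 1 else "R") + str(errors)
    return "NM"

print(decode_match_code("U1"))
print(code_from_counts(0, 1, 7))
```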

The following fields are only used if a unique best match was found:

Column 7. Genome file in which match was found
Column 8. Position of match (bases in file are numbered starting at 1)
Column 9. Direction of match (F=forward strand, R=reverse)
Column 10. How N characters in read were interpreted (“.”=not applicable, “D”=Deletion, “I”=Insertion)

The following field is only used in the case of a unique inexact match:

Column 11. Position and type of first substitution error (A numeral refers to a run of matching bases; an upper-case base or N refers to a base in the reference that differs from the read. For example, 11A: after 11 matching bases, base 12 is A in the reference but not in the read)
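
The substitution-error field can be read as a run length followed by a reference base. A minimal Python sketch, assuming the field always has the form shown in the 11A example (one numeral followed by one upper-case base or N); the function name is hypothetical:

```python
import re

def parse_substitution(field):
    """Return (1-based position of the mismatch, reference base) for e.g. '11A'."""
    m = re.fullmatch(r"(\d+)([ACGTN])", field)
    if m is None:
        raise ValueError(f"unexpected substitution field: {field!r}")
    matching_bases = int(m.group(1))
    # The mismatch sits immediately after the run of matching bases.
    return matching_bases + 1, m.group(2)

print(parse_substitution("11A"))
```

For 11A this yields position 12 with reference base A, matching the example in the field description.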

ELAND ITERATIVE FORMAT - s_<LANE>_eland_iterative_<GENOME>_<MAXREADLENGTH>-<MINREADLENGTH>.TXT

Same as "ELAND NORMAL/FIXEDLENGTH FORMAT", except for an additional column at the end that records the history of a read's alignment at each iteration, until it matches.


Difference between revisions of "AlignmentFormat"
