AlignmentFormat

From Genome Technology Core (GTC) wiki - Sequencing and Microarray
Revision as of 17:03, 10 July 2009 by Sgupta (talk | contribs)

ELAND EXTENDED FORMAT - s_<LANE>_eland_extended_<GENOME>_<SEED>-<READLENGTH>.TXT

  • Column 1. Machine Name
  • Column 2. Run Number
  • Column 3. Lane
  • Column 4. Tile
  • Column 5. X Coordinate of cluster
  • Column 6. Y Coordinate of cluster
  • Column 7. Usually blank
  • Column 8. Read number (1 or 2 for paired-read analysis, blank for a single-read analysis)
  • Column 9. Read
  • Column 10. Quality string—In symbolic ASCII format (ASCII character code = quality value + 64)
  • Column 11. Match chromosome — Name of chromosome match OR code indicating why no match resulted
    • Code for no match - "NM" - No Match, "QC" - Bad Read/Not enough base calls/too many N's, "#:#:#" - Number of matches found with 0, 1, and 2 errors (read matched more than one position)
  • Column 12. Match Contig — Gives the contig name if there is a match and the match chromosome is split into contigs (Blank if no contigs)
  • Column 13. Match Position — Always with respect to forward strand, numbering starts at 1 (Blank if no match found)
  • Column 14. Match Strand—“F” for forward, “R” for reverse (Blank if no match found)
  • Column 15. Match Descriptor — Concise description of alignment (Blank if no match found)
    • A numeral denotes a run of matching bases
    • A letter denotes substitution of a nucleotide: For a 35 base read, “35” denotes an exact match and “32C2” denotes substitution of a “C” at the 33rd position
  • Column 16. Single-Read Alignment Score — Alignment score of a single-read match or, for a paired read, the alignment score of the read if it were treated as a single read. Blank if no match found; any score less than 4 should be considered as aligned to a repeat
  • Column 17. Paired-Read Alignment Score — Alignment score of a paired read and its partner, taken as a pair. Blank if no match found; any scores less than 4 should be considered as aligned to a repeat
  • Column 18. Partner Chromosome — Name of the chromosome if the read is paired and its partner aligns to another chromosome (Blank for single-read analysis)
  • Column 19. Partner Contig — Not blank if read is paired and its partner aligns to another chromosome and that partner is split into contigs (Blank for single-read analysis)
  • Column 20. Partner Offset — If a partner of a paired read aligns to the same chromosome and contig, this number, added to the Match Position, gives the alignment position of the partner (Blank for single-read analysis)
  • Column 21. Partner Strand — To which strand did the partner of the paired read align? “F” for forward, “R” for reverse (Blank if no match found, blank for single-read analysis)
  • Column 22. Filtering — Did the read pass quality filtering? “Y” for yes, “N” for no
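
The column layout above can be turned into a small reader. The following is only a sketch, not part of any Illumina tool: it assumes the file is tab-delimited with exactly 22 fields per line (the page does not state the delimiter), and the field names and helper functions are hypothetical names chosen for illustration. It shows the quality decoding from Column 10 (ASCII code minus 64), the Match Descriptor rules from Column 15 (digits and letters only, as described above), and the partner position from Columns 13 and 20.

```python
# Hypothetical field names, one per column in the order listed above.
FIELDS = [
    "machine", "run", "lane", "tile", "x", "y", "col7", "read_number",
    "read", "quality", "match_chrom", "match_contig", "match_pos",
    "match_strand", "match_descriptor", "single_score", "paired_score",
    "partner_chrom", "partner_contig", "partner_offset", "partner_strand",
    "passed_filter",
]


def parse_extended_line(line):
    """Split one line into a dict of strings (assumes tab-delimited fields)."""
    values = line.rstrip("\r\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(values)))
    return dict(zip(FIELDS, values))


def decode_quality(quality):
    """Column 10: quality value = ASCII character code - 64."""
    return [ord(ch) - 64 for ch in quality]


def parse_descriptor(descriptor):
    """Column 15: return ([(position, letter), ...], read_length).

    Handles only the two cases described above: a numeral is a run of
    matching bases, a letter is a substitution at the next position.
    """
    substitutions = []
    pos = 0    # read bases consumed so far
    run = ""   # digits of the current run length
    for ch in descriptor:
        if ch.isdigit():
            run += ch
        elif ch.isalpha():
            pos += int(run) if run else 0
            run = ""
            pos += 1
            substitutions.append((pos, ch))
        else:
            raise ValueError("unsupported descriptor character: %r" % ch)
    pos += int(run) if run else 0
    return substitutions, pos


def partner_position(record):
    """Column 13 + Column 20, or None if either field is blank."""
    if record["match_pos"] and record["partner_offset"]:
        return int(record["match_pos"]) + int(record["partner_offset"])
    return None
```

For example, `parse_descriptor("32C2")` returns `([(33, "C")], 35)`, matching the 35-base example above, and `decode_quality("h")` returns `[40]`. All values are kept as strings in the parsed record because many columns are legitimately blank.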

ELAND NORMAL/FIXEDLENGTH FORMAT - s_<LANE>_eland_fixedlength_<GENOME>_<READLENGTH>.TXT

Each line of the output file contains the following fields:

  • Column 1. Sequence ID
  • Column 2. Sequence
  • Column 3. Type of match codes:
    • NM—No match found
    • QC—No matching done: QC failure (too many Ns)
    • U0—Best match found was a unique exact match
    • U1—Best match found was a unique 1-error match
    • U2—Best match found was a unique 2-error match
    • R0—Multiple exact matches found
    • R1—Multiple 1-error matches found, no exact matches
    • R2—Multiple 2-error matches found, no exact or 1-error matches
  • Column 4. Number of exact matches found
  • Column 5. Number of 1-error matches found
  • Column 6. Number of 2-error matches found

The following fields are only used if a unique best match was found:

  • Column 7. Genome file in which match was found
  • Column 8. Position of match (bases in file are numbered starting at 1)
  • Column 9. Direction of match (F=forward strand, R=reverse)
  • Column 10. How N characters in read were interpreted (“.”=not applicable, “D”=Deletion, “I”=Insertion)

The following field is only used in the case of a unique inexact match:

  • Column 11. Position and type of first substitution error (A numeral refers to a run of matching bases; an upper-case base or N refers to a base in the reference that differs from the read. For example, 11A: after 11 matching bases, base 12 is A in the reference but not in the read)
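
As a rough illustration of how the match codes and the conditional columns fit together, here is a minimal sketch of a line reader. It assumes tab-delimited fields (not stated on this page), and the function and key names are hypothetical. It reads only Columns 1 to 9 and fills in the location fields only when the match code is U0, U1, or U2.

```python
# Match codes from Column 3, as listed above.
MATCH_TYPES = {
    "NM": "no match found",
    "QC": "no matching done: QC failure (too many Ns)",
    "U0": "unique exact match",
    "U1": "unique 1-error match",
    "U2": "unique 2-error match",
    "R0": "multiple exact matches",
    "R1": "multiple 1-error matches, no exact matches",
    "R2": "multiple 2-error matches, no exact or 1-error matches",
}


def parse_fixedlength_line(line):
    """Parse Columns 1-9 of one line (assumes tab-delimited fields)."""
    f = line.rstrip("\r\n").split("\t")
    if len(f) < 3:
        raise ValueError("expected at least 3 fields, got %d" % len(f))
    code = f[2]
    if code not in MATCH_TYPES:
        raise ValueError("unknown match code: %r" % code)
    rec = {
        "id": f[0],
        "sequence": f[1],
        "match_type": code,
        "description": MATCH_TYPES[code],
        "unique": code in ("U0", "U1", "U2"),
    }
    # Columns 4-6: number of exact, 1-error, and 2-error matches, if present.
    counts = f[3:6]
    rec["counts"] = tuple(int(c) for c in counts) if len(counts) == 3 else None
    # Columns 7-9 are only meaningful for a unique best match.
    if rec["unique"] and len(f) >= 9:
        rec["genome_file"] = f[6]
        rec["position"] = int(f[7])   # 1-based position in the genome file
        rec["strand"] = f[8]          # "F" or "R"
    return rec
```

A caller can filter on `rec["unique"]` to keep only reads with a single best placement, and use `rec["counts"]` to see how repetitive the remaining reads are.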

ELAND ITERATIVE FORMAT - s_<LANE>_eland_iterative_<GENOME>_<MAXREADLENGTH>-<MINREADLENGTH>.TXT

Same as "ELAND NORMAL/FIXEDLENGTH FORMAT", except for an additional column at the end that records the history of a read's alignment at each iteration until it matches.