Difference between revisions of "AlignmentFormat"
From Genome Technology Core (GTC) wiki - Sequencing and Microarray
Latest revision as of 16:03, 10 July 2009
ELAND EXTENDED FORMAT - s_<LANE>_eland_extended_<GENOME>_<SEED>-<READLENGTH>.TXT
- Column 1. Machine Name
- Column 2. Run Number
- Column 3. Lane
- Column 4. Tile
- Column 5. X Coordinate of cluster
- Column 6. Y Coordinate of cluster
- Column 7. Usually blank
- Column 8. Read number (1 or 2 for paired-read analysis, blank for a single-read analysis)
- Column 9. Read
- Column 10. Quality string—In symbolic ASCII format (ASCII character code = quality value + 64)
- Column 11. Match chromosome — Name of chromosome match OR code indicating why no match resulted
- Codes for no match: "NM" (no match), "QC" (bad read: not enough base calls or too many Ns), "#:#:#" (number of matches)
- Column 12. Match Contig — Gives the contig name if there is a match and the match chromosome is split into contigs (Blank if no contigs)
- Column 13. Match Position — Always with respect to forward strand, numbering starts at 1 (Blank if no match found)
- Column 14. Match Strand—“F” for forward, “R” for reverse (Blank if no match found)
- Column 15. Match Descriptor — Concise description of alignment (Blank if no match found)
- A numeral denotes a run of matching bases
- A letter denotes substitution of a nucleotide: For a 35 base read, “35” denotes an exact match and “32C2” denotes substitution of a “C” at the 33rd position
- Column 16. Single-Read Alignment Score — Alignment score of a single-read match or, for a paired read, the alignment score of the read if it were treated as a single read. Blank if no match found; any scores less than 4 should be considered as aligned to a repeat
- Column 17. Paired-Read Alignment Score — Alignment score of a paired read and its partner, taken as a pair. Blank if no match found; any scores less than 4 should be considered as aligned to a repeat
- Column 18. Partner Chromosome — Name of the chromosome if the read is paired and its partner aligns to another chromosome (Blank for single-read analysis)
- Column 19. Partner Contig — Not blank if read is paired and its partner aligns to another chromosome and that partner is split into contigs (Blank for single-read analysis)
- Column 20. Partner Offset — If a partner of a paired read aligns to the same chromosome and contig, this number, added to the Match Position, gives the alignment position of the partner (Blank for single-read analysis)
- Column 21. Partner Strand — To which strand did the partner of the paired read align? “F” for forward, “R” for reverse (Blank if no match found, blank for single-read analysis)
- Column 22. Filtering — Did the read pass quality filtering? “Y” for yes, “N” for no
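The quality string (Column 10) and match descriptor (Column 15) described above can be decoded with a few lines of Python. This is a minimal sketch, not part of any Illumina tool: the function names are hypothetical, and the descriptor parser assumes the descriptor contains only numerals and substitution letters, as described in this section. Any other characters are ignored.

```python
import re


def decode_quality(qual, offset=64):
    """Convert a symbolic quality string (Column 10) to numeric quality values.

    Each quality value is the ASCII character code minus 64.
    """
    return [ord(ch) - offset for ch in qual]


def parse_match_descriptor(desc):
    """Return (1-based read position, letter) for each substitution in Column 15.

    A numeral advances the position by a run of matching bases;
    a letter marks a substitution at the next position.
    """
    substitutions = []
    pos = 0
    for run, base in re.findall(r"(\d+)|([A-Za-z])", desc):
        if run:
            pos += int(run)  # run of matching bases
        else:
            pos += 1  # the substituted base itself
            substitutions.append((pos, base))
    return substitutions


def partner_position(match_position, partner_offset):
    """Alignment position of the partner: Column 13 plus Column 20."""
    return match_position + partner_offset


if __name__ == "__main__":
    print(decode_quality("@Ah"))
    print(parse_match_descriptor("32C2"))
    print(partner_position(1000, 215))
```

For a 35 base read, `parse_match_descriptor("35")` yields no substitutions, while `"32C2"` yields a single substitution of “C” at position 33, which matches the example given for Column 15.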
ELAND NORMAL/FIXEDLENGTH FORMAT - s_<LANE>_eland_fixedlength_<GENOME>_<READLENGTH>.TXT
Each line of the output file contains the following fields:
- Column 1. Sequence ID
- Column 2. Sequence
- Column 3. Type of match codes:
- NM—No match found
- QC—No matching done: QC failure (too many Ns)
- U0—Best match found was a unique exact match
- U1—Best match found was a unique 1-error match
- U2—Best match found was a unique 2-error match
- R0—Multiple exact matches found
- R1—Multiple 1-error matches found, no exact matches
- R2—Multiple 2-error matches found, no exact or 1-error matches
- Column 4. Number of exact matches found
- Column 5. Number of 1-error matches found
- Column 6. Number of 2-error matches found
The following fields are only used if a unique best match was found:
- Column 7. Genome file in which match was found
- Column 8. Position of match (bases in file are numbered starting at 1)
- Column 9. Direction of match (F=forward strand, R=reverse)
- Column 11. How N characters in read were interpreted (“.”=not applicable, “D”=Deletion, “I”=Insertion)
The following field is only used in the case of a unique inexact match:
- Column 12. Position and type of first substitution error (A numeral refers to a run of matching bases, an upper case base or N refers to a base in the reference that differs from the read. For example, 11A: after 11 matching bases, base 12 is A in the reference but not in the read)
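The match codes in Column 3 and the substitution field can be interpreted as in the following Python sketch. The function names are hypothetical and chosen only for illustration. The code assumes the substitution field holds a single numeral followed by one upper case letter, as in the 11A example above.

```python
import re


def parse_match_code(code):
    """Interpret a Column 3 match code.

    Returns (kind, errors), where kind is "NM", "QC", "unique" or "multiple"
    and errors is the number of errors in the best match (None for NM/QC).
    """
    if code in ("NM", "QC"):
        return (code, None)
    m = re.fullmatch(r"([UR])([012])", code)
    if m is None:
        raise ValueError("unrecognised match code: %r" % code)
    kind = "unique" if m.group(1) == "U" else "multiple"
    return (kind, int(m.group(2)))


def first_substitution(field):
    """Return (1-based position, reference base) from the substitution field.

    "11A" means 11 matching bases, then base 12 is A in the reference.
    """
    m = re.fullmatch(r"(\d+)([A-Z])", field)
    if m is None:
        raise ValueError("unrecognised substitution field: %r" % field)
    return (int(m.group(1)) + 1, m.group(2))


if __name__ == "__main__":
    print(parse_match_code("U1"))
    print(parse_match_code("R2"))
    print(first_substitution("11A"))
```

Because the position, direction and substitution fields are only used for a unique best match, a caller would typically check that `parse_match_code` returns `"unique"` before reading them.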
ELAND ITERATIVE FORMAT - s_<LANE>_eland_iterative_<GENOME>_<MAXREADLENGTH>-<MINREADLENGTH>.TXT
Same as "ELAND NORMAL/FIXEDLENGTH FORMAT", except for an additional column at the end that holds the history of a read's alignment at each iteration, until it matches.