ELANDV1.1

From Genome Technology Core (GTC) wiki - Sequencing and Microarray
Revision as of 11:26, 6 October 2009 by Sgupta (talk | contribs)

To run the program

USAGE: perl /nfs/WICMT/pub/public_apps/ELAND/run_eland_v1.1.pl config.txt

Output Files

1. [Prefix]_[GenomeUsed]_[SeedUsed]_eland_extended.txt - Gives all the positions a read maps to, if it matches fewer than 10 positions.

2. [Prefix]_[GenomeUsed]_[SeedUsed]_export.txt - Contains all read, quality value and alignment information for a lane of data.

3. [Prefix]_[GenomeUsed]_[SeedUsed].ylf - (Y)oung (L)ab (F)ormat file - Used as input for the Young lab error model for ChIP-Seq analysis.

4. [Prefix]_[GenomeUsed]_[SeedUsed]_sorted.txt - Contains all the uniquely mapped reads, sorted by position on the genome.

5. [Prefix]_[GenomeUsed]_[SeedUsed]_genomesize.xml - Summary for the genome being used for the alignment

6. TechnicalAnalysis-[Prefix]_[GenomeUsed]_[SeedUsed].xls - QC Report for the ELAND alignment - Please refer here for details - http://jura.wi.mit.edu/genomecorewiki/index.php/QCOutputFormat

7. [Prefix]_[GenomeUsed]_[SeedUsed]_tagcount.txt - Lists each read with its count, its position on the genome, and the gene name if it falls in a transcribed region.
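
The naming pattern above can be sketched as a small helper, for example to check that a run produced everything it should. This is only an illustration: the function name expected_outputs is hypothetical, and how the script actually renders [GenomeUsed] and [SeedUsed] in file names is not stated on this page, so both are passed in as plain strings.

```python
def expected_outputs(prefix, genome_used, seed_used):
    """Return the seven output file names listed above.

    Assumption: genome_used and seed_used are already the exact strings
    the script puts into the file names (this page does not define them).
    """
    stem = f"{prefix}_{genome_used}_{seed_used}"
    return [
        f"{stem}_eland_extended.txt",
        f"{stem}_export.txt",
        f"{stem}.ylf",
        f"{stem}_sorted.txt",
        f"{stem}_genomesize.xml",
        f"TechnicalAnalysis-{stem}.xls",
        f"{stem}_tagcount.txt",
    ]
```

For instance, expected_outputs("1_TEST", "mm8", "25") would start with "1_TEST_mm8_25_eland_extended.txt", where "mm8" and "25" are placeholder values, not confirmed renderings.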

Instructions for making the config file

FORMAT - INDEX,TYPE,VALUE
ANY LINES STARTING WITH A POUND SIGN (#) ARE IGNORED
INPUTFILE (REQUIRED) - Name of the sequence.txt (fastq) file.
INPUTDIR (REQUIRED) - Full path to the directory where the INPUTFILE lives.
OUTPUTDIR (REQUIRED) - Where you want the results. You should have write privileges for this location.
GENOME (REQUIRED) - Full path to the genome to use; select one from /lab/solexa_public/Genome/. Email sgupta@wi.mit.edu if you have questions about this.
UNIQUENAME (REQUIRED) - User-defined label. This can be anything (sample name, IP name, etc.).
SEEDLENGTH (OPTIONAL) - Recommended value is 25; can be any number between 15 and 32. If not specified, defaults to the minimum of the read length and 32.
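
As a rough pre-flight check, the rules above can be expressed as a short parser. This is a sketch under stated assumptions, not the logic of run_eland_v1.1.pl: the name parse_config is hypothetical, blank lines are skipped (the page only says that lines starting with # are ignored), the 15 to 32 range is treated as inclusive, and "required" is checked per file rather than per index because the page does not spell out how indexes are matched.

```python
KNOWN_TYPES = {"INPUTFILE", "INPUTDIR", "OUTPUTDIR", "GENOME",
               "UNIQUENAME", "SEEDLENGTH"}
REQUIRED_TYPES = KNOWN_TYPES - {"SEEDLENGTH"}


def parse_config(text):
    """Parse INDEX,TYPE,VALUE lines into a list of (index, type, value)."""
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # '#' lines are ignored per the format rules; skipping blank
        # lines is an assumption made for convenience.
        if not line or line.startswith("#"):
            continue
        parts = line.split(",", 2)
        if len(parts) != 3:
            raise ValueError(f"line {lineno}: expected INDEX,TYPE,VALUE")
        index, kind, value = (p.strip() for p in parts)
        if kind not in KNOWN_TYPES:
            raise ValueError(f"line {lineno}: unknown TYPE {kind!r}")
        if kind == "SEEDLENGTH":
            # Assumption: the documented range 15 to 32 is inclusive.
            if not (value.isdecimal() and 15 <= int(value) <= 32):
                raise ValueError(f"line {lineno}: SEEDLENGTH must be 15 to 32")
        entries.append((index, kind, value))
    missing = REQUIRED_TYPES - {kind for _, kind, _ in entries}
    if missing:
        raise ValueError(f"missing required settings: {sorted(missing)}")
    return entries
```

Running this on a config.txt before submitting a job catches typos in TYPE names and out-of-range seed lengths early; it does not verify that the paths exist.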

Example 1

Align s_1_sequence.txt (lives at /nfs/WICMT/pub/scripts/test/) to the /lab/solexa_public/Genome/RefGenome_Eland_mm8/ genome and output results to /nfs/WICMT/pub/scripts/test/, with all result files having the prefix "1_TEST".

1,INPUTFILE,s_1_sequence.txt
1,INPUTDIR,/nfs/WICMT/pub/scripts/test/
1,OUTPUTDIR,/nfs/WICMT/pub/scripts/test/
1,GENOME,/lab/solexa_public/Genome/RefGenome_Eland_mm8/
1,UNIQUENAME,TEST
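
When many single-file configs are needed, the five lines above could be generated rather than typed. The helper below is a hypothetical convenience (single_lane_config is not part of the ELAND scripts); it only reproduces the INDEX,TYPE,VALUE layout shown in this example and adds SEEDLENGTH when one is given.

```python
def single_lane_config(index, input_file, input_dir, output_dir,
                       genome, unique_name, seed_length=None):
    """Build config text for one input file, one line per setting."""
    rows = [
        ("INPUTFILE", input_file),
        ("INPUTDIR", input_dir),
        ("OUTPUTDIR", output_dir),
        ("GENOME", genome),
        ("UNIQUENAME", unique_name),
    ]
    if seed_length is not None:
        # SEEDLENGTH is optional, so it is written only when supplied.
        rows.append(("SEEDLENGTH", str(seed_length)))
    return "".join(f"{index},{kind},{value}\n" for kind, value in rows)


if __name__ == "__main__":
    # Reproduces the Example 1 settings.
    print(single_lane_config(
        1,
        "s_1_sequence.txt",
        "/nfs/WICMT/pub/scripts/test/",
        "/nfs/WICMT/pub/scripts/test/",
        "/lab/solexa_public/Genome/RefGenome_Eland_mm8/",
        "TEST",
    ), end="")
```

The printed text can be saved as config.txt and passed to the command shown under "To run the program".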

Example 2

Align s_1_sequence.txt AND s_2_sequence.txt (both live at /nfs/WICMT/pub/scripts/test/) to the /lab/solexa_public/Genome/RefGenome_Eland_mm8/ genome and output results to /nfs/WICMT/pub/scripts/test/. Result files for s_1_sequence.txt have the prefix "1_TEST" and result files for s_2_sequence.txt have the prefix "3_TEST".

1,INPUTFILE,s_1_sequence.txt
3,INPUTFILE,s_2_sequence.txt
13,INPUTDIR,/nfs/WICMT/pub/scripts/test/
13,OUTPUTDIR,/nfs/WICMT/pub/scripts/test/
13,GENOME,/lab/solexa_public/Genome/RefGenome_Eland_mm8/
13,UNIQUENAME,TEST

Example 3

Align s_1_sequence.txt (lives at /nfs/WICMT/pub/scripts/test1/) to the /lab/solexa_public/Genome/RefGenome_Eland_mm8/ genome and output results to /nfs/WICMT/pub/scripts/test1/. Result files for s_1_sequence.txt have the prefix "1_TEST".

Align s_2_sequence.txt (lives at /nfs/WICMT/pub/scripts/test2/) to the /lab/solexa_public/Genome/RefGenome_Eland_hg18/ genome and output results to /nfs/WICMT/pub/scripts/test2/. Result files for s_2_sequence.txt have the prefix "3_TEST".

1,INPUTFILE,s_1_sequence.txt
3,INPUTFILE,s_2_sequence.txt
1,INPUTDIR,/nfs/WICMT/pub/scripts/test1/
3,INPUTDIR,/nfs/WICMT/pub/scripts/test2/
1,OUTPUTDIR,/nfs/WICMT/pub/scripts/test1/
3,OUTPUTDIR,/nfs/WICMT/pub/scripts/test2/
1,GENOME,/lab/solexa_public/Genome/RefGenome_Eland_mm8/
3,GENOME,/lab/solexa_public/Genome/RefGenome_Eland_hg18/
1,UNIQUENAME,TEST
3,UNIQUENAME,TEST

Example 4

Align s_1_sequence.txt (lives at /nfs/WICMT/pub/scripts/test1/) to the /lab/solexa_public/Genome/RefGenome_Eland_mm8/ genome AND to the /lab/solexa_public/Genome/RefGenome_Eland_hg18/ genome. Results from the mm8 alignment go to /nfs/WICMT/pub/scripts/test1/ with the prefix "1_TEST", and results from the hg18 alignment go to /nfs/WICMT/pub/scripts/test1/ with the prefix "3_TEST".


1,INPUTFILE,s_1_sequence.txt
1,INPUTDIR,/nfs/WICMT/pub/scripts/test1/
1,OUTPUTDIR,/nfs/WICMT/pub/scripts/test1/
1,GENOME,/lab/solexa_public/Genome/RefGenome_Eland_mm8/
3,GENOME,/lab/solexa_public/Genome/RefGenome_Eland_hg18/
1,UNIQUENAME,TEST
3,UNIQUENAME,TEST