General procedure for creating a phylogenetic tree:
ASSEMBLY OF CONTIGS
1. A very easy program to use is DNA Baser. Watch the video tutorial:
http://www.dnabaser.com/index_2minutes.html and download it here:
2. Open the DNA Baser software.
3. For vector removal [primer removal]:
>Go to ‘Tools’.
>Select ‘vector cleaning’.
>Enter name and recognition sequence of your vectors in provided spaces.
>Then select ‘cut recognition sequence’
4. To assemble a contig:
>In “All supported files” select the type of files you want to work with (e.g. FASTA). Next to “All supported files” select the folder in your drive that contains the files you want to analyze.
>Add the files you want to assemble to the job list by selecting them and clicking the add files button (“+”).
>To start the assembly, click the forward button.
>The contig is ready. DNA Baser removes low-quality ends and vector sequences (if you specified them, as explained above).
>In case you want to edit a base you can double click it and enter the new value (letter).
>To navigate to next ambiguity press button “Stop Next”.
>To finish the contig click “Finish” and the contig is ready.
5. When all your sequences are assembled, you can put all your contigs in the same Word document.
NOTE: Batch assembly, or aligning groups of sequences by name, is also possible in DNA Baser (it automatically recognizes sequence pairs). Procedure:
>Select the folder with the raw sequences.
>Choose the pattern for batch assembly.
>Press ‘start batch’.
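As an illustration of what "recognizing sequence pairs by name" means, here is a minimal sketch that groups read files by a shared sample name. It is not DNA Baser's actual algorithm: the `_F`/`_R` suffix convention, the file names, and the function name `pair_reads_by_name` are assumptions made for this example only.

```python
from collections import defaultdict


def pair_reads_by_name(filenames, forward_tag="_F", reverse_tag="_R"):
    """Group read files that share a sample name into forward/reverse pairs.

    The suffix tags are a hypothetical naming pattern, not a DNA Baser setting.
    """
    groups = defaultdict(dict)
    for name in filenames:
        stem = name.rsplit(".", 1)[0]  # drop the file extension, if any
        if stem.endswith(forward_tag):
            groups[stem[: -len(forward_tag)]]["forward"] = name
        elif stem.endswith(reverse_tag):
            groups[stem[: -len(reverse_tag)]]["reverse"] = name
    # keep only the samples for which both directions were found
    return {sample: g for sample, g in groups.items() if len(g) == 2}


# Hypothetical file names
files = ["soilA_F.ab1", "soilA_R.ab1", "soilB_F.ab1", "notes.txt"]
pairs = pair_reads_by_name(files)
```

Samples with only one read direction (here `soilB`) are left out, so they can be checked by hand before assembly.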
SEARCHING SEQUENCE DATABASES
1. BLASTing your sequences:
Note a: Make sure that each sequence in your Word document has just the sequence name/code as its FASTA header (“>Sequence Name”) and not “> Contig – Sample Name- Generated with DNA Baser”.
Note b: BLAST can’t handle many sequences at the same time. If you have a large number of sequences, consider BLASTing a few at a time; ten sequences is a good number.
After each BLAST run, check whether the sequence is in the right orientation. If it is not (the aligned database sequence runs from a high to a low number), you can use a program like MEGA to reverse it (MEGA webpage with free download: http://www.megasoftware.net/):
>Open the MEGA 4 program.
>Go to ‘Alignment’ and select ‘Alignment explorer/CLUSTAL’, then select ‘Create new alignment’, and reply ‘Yes’ to the question ‘are you building a DNA (yes) sequence alignment or protein (no)’. A window will open.
>Go to ‘Edit’ and select ‘insert sequence from file’.
>Open the FASTA file that contains your sequence and then go to ‘Data’ and select ‘reverse complement’.
>To save the new reversed sequence: Go to ‘Data’ and select ‘Export Alignment’ and next FASTA (if you want to save it as a FASTA file).
>Alternatively: you can also copy paste your sequence instead of opening it from a file, but for that you have to select ‘insert blank sequence’ instead of ‘insert sequence from file’ and then copy and paste it (by using “Edit>Paste”).
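The orientation check and the reversal described above can also be sketched in a few lines of Python. This is only an illustration of what ‘reverse complement’ does, not MEGA's implementation; the function names and the example coordinates are made up for this sketch.

```python
# Complement table for DNA, including the IUPAC ambiguity codes.
# S, W and gap characters are their own complement and are left unchanged.
_COMPLEMENT = str.maketrans(
    "ACGTRYKMBDHVNacgtrykmbdhvn",
    "TGCAYRMKVHDBNtgcayrmkvhdbn",
)


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def needs_reversing(subject_start, subject_end):
    """True when the aligned database sequence runs from a high to a low number."""
    return subject_start > subject_end


seq = "ATGCCGTN"
# Hypothetical coordinates read from a BLAST alignment
if needs_reversing(subject_start=1480, subject_end=35):
    seq = reverse_complement(seq)
```

Applying `reverse_complement` twice returns the original sequence, which is a quick way to check that nothing was lost.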
3. Blast results: for each sequence you BLAST, you can choose the closest hit and the closest known species. Select the FASTA format for each blast result you selected (Display settings>FASTA). Send them to the Clipboard (on the BLAST webpage: ‘Send to>Clipboard’), later to be copied to a file of yours.
4. Older protocol: a second option is to try the Ribosomal Database Project:
http://rdp.cme.msu.edu/index.jsp (for Bacteria and Archaea)
5. Check all sequences for the possibility of chimeras: use the older version 8.1:
http://126.96.36.199/html/analyses.html (Chimera check)
From the result, check the following:
a. Is it in the right orientation (5’ -> 3’)? If not, note this and correct it later (click the sequence in BioEdit, and change it via ‘Sequence’ ‘Nucleotide’ ‘Reverse complement’).
b. Are there any inserts/deletions of your sequence vs. the closest hit? If so, check the original sequence chromatogram to see if these inserts/deletions are real. Correct the text file, if required.
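Note b above suggests BLASTing about ten sequences at a time. If the contigs are kept in a plain-text FASTA file rather than Word, the splitting can be sketched as follows. The helper names (`parse_fasta`, `batches`, `to_fasta`) and the example data are assumptions for this illustration.

```python
def parse_fasta(text):
    """Return a list of (name, sequence) tuples from FASTA-formatted text."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def batches(records, size=10):
    """Yield consecutive groups of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def to_fasta(records):
    """Format (name, sequence) tuples back into FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


# Made-up example: 23 short sequences split into groups of ten
example = "".join(f">seq{i}\nACGT\n" for i in range(23))
groups = list(batches(parse_fasta(example)))
```

Each group can then be pasted into the BLAST query box with `to_fasta(group)`.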
ASSEMBLING TEXT FILES FOR DETAILED PHYLOGENETIC ANALYSIS
1. Put the corrected text files into one large Word document.
2. Collect the sequences observed in the GenBank/BLAST search (decide which to compare, why, and how many) in a Word document (initially make a separate document from the one prepared under 1; combine them later):
a. Click on the accession number in the BLAST search; this will bring you to Pubmed-nucleotide.
b. Mark the sequence! Select Display ‘Fasta’ and send to ‘clipboard’ (if you want to select more than one sequence) or ‘text’. In the latter case, copy the text and put it in a Word document. If you use the clipboard, first finish selecting all sequences and sending them to ‘clipboard’, then select ‘clipboard’ and proceed as for ‘text’.
c. Also include one sequence from GenBank as the ‘outgroup’, e.g. E. coli for Archaea, or an alphaproteobacterium when you compare only betaproteobacteria.
3. Save the Word documents as text file(s).
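Text that has passed through a word processor can pick up stray spaces, digits, or duplicated entries. The following is a minimal sanity check that could be run on the saved sequences before alignment; it is a sketch, and the function name, the allowed character set, and the example records are assumptions.

```python
# IUPAC nucleotide codes plus the gap character
VALID = set("ACGTURYKMSWBDHVN-")


def check_records(records):
    """Return a list of human-readable problems found in (name, sequence) pairs."""
    problems = []
    seen = set()
    for name, seq in records:
        if name in seen:
            problems.append(f"duplicate name: {name}")
        seen.add(name)
        bad = sorted(set(seq.upper()) - VALID)
        if bad:
            problems.append(f"{name}: unexpected characters {bad}")
    return problems


# Made-up records: one duplicated name, one sequence with a space and a digit
records = [
    ("isolate_1", "ACGTTGCA"),
    ("isolate_1", "ACGTNGCA"),
    ("outgroup_E_coli", "ACGT TGC1"),
]
problems = check_records(records)
```

An empty list means no problem of these two kinds was found; it does not say anything about sequence quality.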
Alignments and Trees via MEGA
How to create a tree in MEGA (MEGA webpage and free download:
Alignment in MEGA:
1. Put all the sequences you want to use in the same FASTA file (your sequences and Blast hits).
2. In MEGA go to ‘Alignment>Alignment Explorer/CLUSTAL’ and then select ‘Retrieve sequences from a file’.
3. You can also copy paste the sequences instead of opening them from a file, but for that you have to select ‘Create a new alignment’ and then go to ‘Edit’ and select ‘insert blank sequence’ and then copy and paste them (by using “Edit>Paste”).
4. Select all the sequences by clicking on them (using the ‘up arrow’ key on your keyboard and your mouse; they will be highlighted in dark blue), then select ‘Alignment>Align by ClustalW’>Ok.
5. Visually inspect whether the alignment makes sense.
6. After the alignment is done, trim it by deleting part of the beginning and part of the end of the aligned sequences so that all sequences have the same length. To do this, select the region with the mouse, right-click, and select ‘delete’. Save the session, and also keep the old file(s).
7. Export alignment to MEGA format: go to ‘Data’ and select ‘export alignment>MEGA format’.
8. Open the MEGA file by right-clicking on it (in the folder where you saved it) and opening it with the MEGA program.
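The trimming in step 6 amounts to cutting away the leading and trailing columns in which at least one sequence has no data yet. A minimal sketch of that idea, assuming the alignment is held as a dictionary of equal-length strings with `-` as the gap character (the function name and example sequences are made up):

```python
def trim_to_shared_columns(alignment, gap="-"):
    """Cut leading and trailing columns until every sequence starts and ends with a base."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    # position of the first base, and position just after the last base, per sequence
    starts = [len(seq) - len(seq.lstrip(gap)) for seq in alignment.values()]
    ends = [len(seq.rstrip(gap)) for seq in alignment.values()]
    start, end = max(starts), min(ends)
    if start >= end:
        raise ValueError("sequences do not overlap")
    return {name: seq[start:end] for name, seq in alignment.items()}


aligned = {
    "isolate_1": "--ACGTTGCA--",
    "hit_1":     "TTACGTAGCATG",
    "outgroup":  "---CGTTGCAT-",
}
trimmed = trim_to_shared_columns(aligned)
```

Internal gaps are left untouched; only the ragged ends are removed.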
Tree in MEGA:
1. Go to ‘Distances>Choose model’; under Nucleotide select ‘Jukes and Cantor’, then click OK.
2. Go to ‘Phylogeny>Bootstrap Test of Phylogeny>Neighbour Joining’. In the window that opens, check that Bootstrap is set to 100 replicates, that Model>Nucleotide is ‘Jukes-Cantor’, and that Gaps/Missing Data is ‘Complete deletion’ (which removes all gaps), or ‘Pairwise deletion’ if that doesn’t work. Click ‘Compute’ to obtain your tree.
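To make the two settings above concrete: the Jukes-Cantor model converts the proportion p of differing sites between two sequences into a distance d = -3/4 ln(1 - 4p/3), and pairwise deletion skips a site only for the pair in which it is missing. The following is a minimal sketch of that calculation, not MEGA's code; the function name and the choice of `-` and `?` as missing-data characters are assumptions.

```python
import math


def jukes_cantor_distance(seq_a, seq_b, gap_chars="-?"):
    """Jukes-Cantor distance between two aligned sequences, with pairwise deletion."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue  # pairwise deletion: skip sites missing in this pair only
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    p = differences / compared  # uncorrected proportion of differing sites
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


# One difference in ten compared sites: p = 0.1
d = jukes_cantor_distance("ACGTACGTAC", "ACGTACGTTC")
```

With complete deletion, by contrast, any column containing a gap in any sequence would be removed from the whole alignment before distances are computed.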
[older protocol] Alignments in Bioedit and trees in Treecon
- Open Bioedit, and ‘File’ ‘Open’ to open your txt file with sequences
- Select all sequences you want to compare from the Window (‘shift’ + right click on the names on the left).
- Under ‘Accessory Applications’ select ‘Clustal W Multiple Alignment’.
- If you want to see the consensus, first select ‘Output clustal’; then press “Run Clustalw”.
- Visually check the alignment. If needed, manually correct (via ‘Mode’ ‘Edit’)
- Save the alignment as a .phy file.
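For orientation, a .phy file in strict interleaved PHYLIP layout starts with a line giving the number of sequences and the alignment length, followed by blocks of sequence in which the names (padded to 10 characters) appear only in the first block. The sketch below writes that layout; it is an illustration, not BioEdit's exporter, and the function name, block width, and example sequences are assumptions.

```python
def to_phylip_interleaved(alignment, block=60):
    """Format {name: aligned_sequence} as strict interleaved PHYLIP text."""
    names = list(alignment)
    if not names:
        raise ValueError("alignment is empty")
    length = len(alignment[names[0]])
    if any(len(alignment[n]) != length for n in names):
        raise ValueError("all sequences must have the same aligned length")
    short = [n[:10] for n in names]  # strict PHYLIP names are 10 characters wide
    if len(set(short)) != len(short):
        raise ValueError("names are not unique within the first 10 characters")
    lines = [f" {len(names)} {length}"]
    for start in range(0, length, block):
        for name, label in zip(names, short):
            chunk = alignment[name][start:start + block]
            # names appear only in the first block, padded to 10 characters
            prefix = label.ljust(10) if start == 0 else ""
            lines.append(prefix + chunk)
        lines.append("")  # blank line between blocks
    return "\n".join(lines)


# Made-up alignment, with a small block width so that two blocks are produced
text = to_phylip_interleaved({"isolate_1": "ACGTAC", "outgroup": "ACGTTC"}, block=4)
```

Because names are cut to 10 characters, it helps to give sequences short, unique names before exporting.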
BioEdit can also be used to trim the sequences to the same length, so that they all start and end at the same position:
> input the alignment file
> select the point up to which you want to remove the start of the sequences
>go to edit\select to beginning\delete
> select the point from which you want to remove the end of the sequences
> go to edit\select to end\delete
> select all sequences
> unlock indels
> Degap to remove gaps
>Lock the indels
> Save the file (copy sequence to word, then save as txt file under new name)
> Make a new alignment for this new txt file
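The cut-and-degap steps above can be summarised in code: keep only the columns between the chosen start and end points, then strip the gap characters so the sequences can be realigned. This is a minimal sketch under the assumption that gaps are written as `-` or `.`; the function name, column positions, and sequences are made up.

```python
def cut_and_degap(alignment, start, end, gap_chars="-."):
    """Keep alignment columns start..end-1 and remove gap characters from each sequence."""
    table = str.maketrans("", "", gap_chars)
    return {name: seq[start:end].translate(table) for name, seq in alignment.items()}


aligned = {
    "isolate_1": "TTAC-GTTGCAAA",
    "hit_1":     "--ACGGT-GCAA-",
}
# Hypothetical cut points: drop the first 2 and the last 2 columns
ungapped = cut_and_degap(aligned, start=2, end=11)
```

The resulting sequences are no longer aligned, which is why the last step above makes a new alignment from the saved file.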
Create the phylogenetic tree in TreeCon:
a. Open the alignment file (phy) by TreeCon, and then follow the steps below:
1. Distance estimation
> start distance estimation
> select file as phy format
> nucleic acid sequence & PHYLIP interleaved
> select sequence----OK,
> Jukes and Cantor (distance estimation) / all (alignment position) / yes (bootstrap analysis) / not taken into account (insert and deletion), then OK; 100 for bootstrap samples
2. Infer tree topology
> start inferring topology
> neighbor-joining / Yes (bootstrap analysis)
> finished, go to next step
3. Root unrooted trees
> start root unrooted trees
> output options, select single sequence (forced). Select your ‘special sequence’, see
> Yes (bootstrap analysis), OK
4. Draw phylogenetic tree
> open\new tree (default)
> the tree can be modified in this window.
> show bootstrap values and the scale for substitution rate. Afterwards, the font style and the name of each sequence can be changed as desired in the following steps:
• in the ‘Draw phylogenetic tree’ window,
• open a tree .trc file
• select customize\sequences names (font or change) or
• select group of sequence for modifying the name of the whole group
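The bootstrap analysis selected in the steps above (100 samples) rests on a simple idea: each replicate is a new alignment of the same length, built by drawing columns from the original alignment at random with replacement, and a tree is then inferred from each replicate. The following sketch shows only the resampling step; it is not TreeCon's or MEGA's implementation, and the function name, seed, and example alignment are assumptions.

```python
import random


def bootstrap_alignments(alignment, replicates=100, seed=0):
    """Yield resampled alignments made by drawing columns with replacement."""
    rng = random.Random(seed)  # fixed seed so that the sketch is reproducible
    names = list(alignment)
    length = len(alignment[names[0]])
    for _ in range(replicates):
        # the same column indices are used for every sequence in a replicate
        columns = [rng.randrange(length) for _ in range(length)]
        yield {n: "".join(alignment[n][c] for c in columns) for n in names}


aligned = {
    "isolate_1": "ACGTTGCA",
    "hit_1":     "ACGTAGCA",
    "outgroup":  "ACCTAGGA",
}
replicas = list(bootstrap_alignments(aligned, replicates=100))
```

The bootstrap value shown on a branch is then the percentage of replicate trees in which that grouping appears.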