table of contents
- expected learning outcomes
- getting started
- exercise 1: basic functions in Jalview
- exercise 2: comparison of two different alignment programs (ClustalW2 and MAFFT) using nucleotide sequences
- exercise 3: comparison of two different alignment programs (ClustalW2 and MAFFT) using protein sequences
- exercise 4: exploring the MAFFT settings
- exercise 5: exploring other features in Jalview
expected learning outcomes
The objective of this activity is to become familiar with the features of several multiple alignment and alignment visualization programs, including data input and output, basic visualization and editing functions, alignment options, and differences between nucleotide and amino acid alignments.
getting started
While a large number of alignment programs have been developed, we are going to focus on two of them: ClustalW2 and MAFFT. ClustalW2 (with its graphical user interface version, ClustalX2) is a widely used version of the long-established Clustal alignment program, and MAFFT is, by several benchmark measures, among the best-performing alignment programs.
For visualization we will use the program Jalview, from which you can invoke these two alignment programs (as well as others). You can access help files at any time within the program by clicking on 'Help'>'Documentation'. There is also a program tutorial available at: http://www.jalview.org/tutorial/TheJalviewTutorial_screen.pdf.
This activity is structured to be done either by yourself or with a partner.
- Links to the three data sets for this activity can be found below. Download them and make a note of where you save them on your computer:
- Atg5.fasta: amino acid sequences of the autophagy-specific gene 5 from several Drosophila species
- 1ped.fasta: nucleotide sequences of alcohol dehydrogenase from a variety of organisms; modified from BAliBASE.
- 1ped_lg_mafft.fasta: nucleotide sequences of alcohol dehydrogenase from a variety of organisms; modified from BAliBASE, with multiple redundant sequences.
- Start Jalview
exercise 1: basic functions in Jalview
- Close the example file windows to simplify viewing and switch to the Jalview window.
- Load the data set Atg5.fasta by going to 'File' > 'Input alignment' > 'from File'. Have a look at the data. Is it aligned?
- Try some of the basic commands:
- To select a taxon, click on any taxon name on the left side.
- To select all sequences at once, on Macs you can type Command-A. On Windows or Linux, you can use Control-A.
- To deselect all sequences at once, go to 'Select'>'Deselect All'.
- To move a sequence to another position in the data set, select it by clicking its name, then press the up or down arrow keys.
- Sequences can be edited manually:
- Left click and drag to select where you want to begin editing.
- Right click the highlighted sequence and then select 'Selection'>'Edit'>'Edit Sequence'.
- Enter the characters you wish to insert, or insert a space for a gap in the data set.
- You can undo changes by going to 'Edit'>'Undo'.
- Close the file with or without saving.
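To answer "Is it aligned?" for a FASTA file outside Jalview, you can read the records and check whether every sequence has the same length and gap characters appear. The sketch below is a heuristic under stated assumptions: gaps are written as '-' or '.', and `read_fasta` and `looks_aligned` are hypothetical helper names, not part of Jalview.

```python
from __future__ import annotations


def read_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs; sequences may span several lines."""
    records: list[tuple[str, str]] = []
    name = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def looks_aligned(records: list[tuple[str, str]]) -> bool:
    """Heuristic: an aligned file has equal-length rows and contains gap characters."""
    lengths = {len(seq) for _, seq in records}
    has_gaps = any("-" in seq or "." in seq for _, seq in records)
    return len(records) > 1 and len(lengths) == 1 and has_gaps
```

Unaligned input with equal-length, gap-free sequences would be reported as not aligned, which is why this is only a quick check.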
exercise 2: comparison of two different alignment programs (ClustalW2 and MAFFT) using nucleotide sequences
If you would like to work with a partner, designate one partner A, and the other partner B.
Both A and B:
- Open data set 1ped.fasta in Jalview.
A only:
- Select the 1ped.fasta window.
- Perform a basic alignment with ClustalW2 by clicking 'Web Service' > 'Alignment' > 'ClustalW Multiple Sequence Alignment'.
- Once the alignment process is complete, a new window with the aligned data will appear, along with a graph below the alignment. This consensus graph shows, for each column of the alignment, the percentage of sequences that share the most common base.
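The consensus value described above can be sketched as a per-column count of the most common base. This is an approximation, assuming gaps never count as the modal residue and that gaps are kept in the denominator unless `ignore_gaps=True`; Jalview's exact handling may differ, and `column_consensus` is a hypothetical name.

```python
from collections import Counter


def column_consensus(seqs, ignore_gaps=False):
    """Return (modal residue, percentage) for each column of an aligned set of equal-length rows."""
    gaps = set("-. ")
    profile = []
    for column in zip(*seqs):
        residues = [c.upper() for c in column if c not in gaps]
        if not residues:  # an all-gap column has no consensus residue
            profile.append(("-", 0.0))
            continue
        residue, count = Counter(residues).most_common(1)[0]
        # Denominator: all sequences, or only those without a gap in this column.
        denominator = len(residues) if ignore_gaps else len(column)
        profile.append((residue, 100.0 * count / denominator))
    return profile
```

For the rows `ACGT`, `ACGA` and `AC-A`, the third column has consensus G at about 67% with gaps counted, or 100% when gaps are ignored.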
B only:
- Select the 1ped.fasta window.
- Perform a basic alignment with MAFFT by clicking 'Web Service' > 'Alignment' > 'Mafft Multiple Sequence Alignment'.
- Once the alignment process is completed, a new window with the aligned data will appear.
Both A and B:
- Compare the alignment produced by ClustalW2 (partner A) with the one produced by MAFFT (partner B). Are they different? Which one do you prefer, the MAFFT or the ClustalW2 alignment? Why? (Hint: these are protein-coding genes.)
- Export the nucleotide alignments in FASTA format by clicking 'File' > 'Save As...'. In the dialog box that opens, choose a new filename and select FASTA as the file format. Do not close the alignment window.
- Build two trees, one from each of your nucleotide alignments: go to each aligned nucleotide sequence window (MAFFT and ClustalW2) and click 'Calculate' > 'Calculate Tree' > 'Neighbor Joining Using % Identity'. (Note: these trees are useful for evaluating your alignments, but Jalview should not be your only tree-building method.)
- Compare the trees from both the MAFFT and ClustalW2 alignments. Do the topologies and/or branch lengths differ?
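To see what 'Neighbor Joining Using % Identity' starts from, the sketch below builds a distance matrix (100 minus percent identity) and finds the first pair that neighbor joining would merge, using the standard Q-criterion. It assumes identity is computed only over columns where neither sequence has a gap; Jalview's own definition may differ, and all function names are hypothetical.

```python
from itertools import combinations


def percent_identity(a, b, gaps="-."):
    """Percent of identical residues over columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x not in gaps and y not in gaps]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def distance_matrix(seqs):
    """Symmetric matrix of distances d = 100 - % identity."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = 100.0 - percent_identity(seqs[i], seqs[j])
    return d


def first_nj_join(d):
    """Pair (i, j) minimising Q(i, j) = (n - 2) * d[i][j] - sum_k d[i][k] - sum_k d[j][k]."""
    n = len(d)
    totals = [sum(row) for row in d]
    best = None
    for i, j in combinations(range(n), 2):
        q = (n - 2) * d[i][j] - totals[i] - totals[j]
        if best is None or q < best[0]:
            best = (q, i, j)
    return best[1], best[2]
```

A full neighbor-joining run repeats this step, replacing the joined pair with a new node and recomputing distances, until only three nodes remain.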
exercise 3: comparison of two different alignment programs (ClustalW2 and MAFFT) using protein sequences
A only:
- Find the original 1ped.fasta window.
- Click 'Calculate' > 'translate cDNA'. The translated amino acid sequences open in a new window; use that window for the following steps.
- Click 'Web Service' > 'Alignment' > 'ClustalW Multiple Sequence Alignment'. Notice that two new graphs appear along with the alignment: conservation and quality. Conservation is a score reflecting how well the physico-chemical properties of the amino acids are conserved in each column of the alignment. Quality is a score reflecting the likelihood of the substitutions observed in each column, based on the BLOSUM62 substitution matrix. For more detail, click on 'Help' > 'Documentation' > 'Alignment Annotations'.
- Save the alignment file (click File > Save As...).
- Build a tree from your amino acid alignment by selecting 'Calculate' > 'Calculate Tree' > 'Neighbor Joining using BLOSUM62'.
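The 'translate cDNA' step converts each codon into an amino acid. The sketch below assumes the standard genetic code, reading frame 0 (or a frame you pass), that alignment gaps are removed before translation, and that unknown codons become 'X'; `translate` is a hypothetical helper, not Jalview's implementation.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cdna, frame=0):
    """Translate a coding sequence; '*' marks stop codons, 'X' marks unknown codons."""
    seq = cdna.upper().replace("U", "T")
    seq = "".join(c for c in seq if c not in "-.")  # drop alignment gaps
    protein = []
    for i in range(frame, len(seq) - 2, 3):  # incomplete trailing codons are ignored
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)
```

For example, `translate("ATGGCCTAA")` reads the codons ATG, GCC and TAA as methionine, alanine and a stop.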
B only:
- Find the original 1ped.fasta window.
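- Click 'Calculate' > 'translate cDNA'. The translated amino acid sequences open in a new window; use that window for the following steps.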
- Click 'Web Service' > 'Alignment' > 'Mafft Multiple Sequence Alignment'.
- Save the alignment file (click File > Save As...).
- Build a tree from your amino acid alignment by selecting 'Calculate' > 'Calculate Tree' > 'Neighbor Joining using BLOSUM62'.
Both A and B:
- Compare the amino acid alignments and trees. Which one do you prefer? Does it make sense to align protein-coding sequences using their protein translations, or should you instead build alignments from the nucleotide sequences?
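One common compromise for the question above is to align the protein translations and then thread the original codons back onto the aligned protein, so that each protein gap becomes a three-base gap. Tools such as PAL2NAL take this approach. The sketch below assumes the coding sequence is ungapped, starts in frame 0, and was the source of the protein row; `thread_codons` is a hypothetical name.

```python
def thread_codons(aligned_protein, cdna):
    """Map one gapped protein alignment row back onto its ungapped coding sequence."""
    usable = len(cdna) - len(cdna) % 3
    codons = [cdna[i:i + 3] for i in range(0, usable, 3)]
    out = []
    k = 0  # index of the next unused codon
    for aa in aligned_protein:
        if aa in "-.":
            out.append("---")  # one protein gap = one missing codon
        else:
            if k >= len(codons):
                raise ValueError("protein row is longer than the coding sequence")
            out.append(codons[k])
            k += 1
    return "".join(out)
```

Because gaps are only ever inserted as whole codons, the resulting nucleotide alignment keeps every sequence in its reading frame. Codons beyond the last aligned residue (for example, a stop codon omitted from the protein) are dropped.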
exercise 4: exploring the MAFFT settings
We will now run MAFFT through a different online service, JABAWS, where its settings are easier to modify. Specifically, we will assess how a high versus a low gap opening penalty affects the alignment; you may then examine some of the other gap-related settings.
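A simple way to see why the gap opening penalty matters is a generic affine gap cost, where each run of k gaps costs gap_open + (k - 1) * gap_extend. This is one common textbook convention, assumed here for illustration; MAFFT's internal scoring differs in its details, and `affine_gap_cost` is a hypothetical name.

```python
import re


def affine_gap_cost(row, gap_open, gap_extend):
    """Total gap penalty of one aligned row under an affine gap model."""
    runs = re.findall(r"-+", row)  # each maximal run of '-' is one gap
    return sum(gap_open + (len(run) - 1) * gap_extend for run in runs)


one_long_gap = "AC----GTACGT"  # four gap characters in one run
many_short_gaps = "A-C-G-T-ACGT"  # four gap characters in four runs

for gap_open in (20, 0.1):
    print(gap_open,
          affine_gap_cost(one_long_gap, gap_open, 0.1),
          affine_gap_cost(many_short_gaps, gap_open, 0.1))
```

With a gap opening penalty of 20, the single run costs 20.3 while the four scattered gaps cost 80, so the aligner strongly prefers a few long gaps. With an opening penalty of 0.1, the two rows cost essentially the same, so scattered gaps are no longer discouraged.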
A only:
- Find the original 1ped.fasta window.
- Click 'Web Service'>'JABAWS Alignment'>'http://compbio.dundee.ac.uk/jabaws'>'MafftWS'>'Edit Settings and run..'
- In the window that opens, set 'Gap Opening Penalty' to 20 (scroll down under "Parameters").
B only:
- Find the original 1ped.fasta window.
- Click 'Web Service'>'JABAWS Alignment'>'http://compbio.dundee.ac.uk/jabaws'>'MafftWS'>'Edit Settings and run..'
- In the window that opens, set 'Gap Opening Penalty' to 0.1 (scroll down under "Parameters").
Both A and B:
- Compare your alignment with the default MAFFT alignment from exercise 2, and compare the alignments made with the increased (A) and decreased (B) gap opening penalties with each other. Which of the three alignments do you prefer, and why?
- Optional: try setting and combining other gap parameters (e.g. ep, lep, lop; see the MAFFT manual for details) and compare the results.
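If MAFFT is installed locally, the same comparison can be scripted. The sketch assumes a `mafft` executable on your PATH and assumes that the JABAWS 'Gap Opening Penalty' setting corresponds to MAFFT's `--op` option (with `--ep` as the offset parameter); `mafft_command` and `run_mafft` are hypothetical helpers. MAFFT writes the alignment to standard output, which is redirected to a file here.

```python
import shutil
import subprocess


def mafft_command(fasta_path, op=None, ep=None):
    """Build a MAFFT command line with optional gap opening (--op) and offset (--ep) values."""
    cmd = ["mafft"]
    if op is not None:
        cmd += ["--op", str(op)]
    if ep is not None:
        cmd += ["--ep", str(ep)]
    cmd.append(fasta_path)
    return cmd


def run_mafft(fasta_path, out_path, **settings):
    """Run MAFFT and save the alignment it prints to stdout."""
    if shutil.which("mafft") is None:
        raise FileNotFoundError("mafft is not on PATH")
    with open(out_path, "w") as out:
        subprocess.run(mafft_command(fasta_path, **settings), stdout=out, check=True)


# Example (requires a local MAFFT install):
# run_mafft("1ped.fasta", "1ped_op20.fasta", op=20)
# run_mafft("1ped.fasta", "1ped_op0.1.fasta", op=0.1)
```

Keeping command construction separate from execution makes it easy to check exactly which settings each run used.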
exercise 5: exploring other features in Jalview
Spend some time trying out some of the other features in Jalview, following the activities below. You can learn about additional features and functionality of Jalview in the program manual and tutorial at: http://www.jalview.org/tutorial/TheJalviewTutorial_screen.pdf.
Removing redundant sequences:
Often, you will need to identify and remove redundant sequences prior to running your analyses.
- Open 1ped_lg_mafft.fasta in Jalview.
- Click on 'Edit' > 'Remove Redundancy'.
- Select your redundancy threshold (e.g. 98%) by adjusting the slider to that value (the redundant sequences will be highlighted in black), then click 'Remove'.
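Redundancy removal can be sketched as a greedy filter: keep a sequence only if its percent identity to every already-kept sequence is at or below the threshold. This is an illustration, not Jalview's exact procedure. It assumes longer (ungapped) sequences should be preferred and that identity is measured over columns where neither sequence has a gap. `remove_redundant` is a hypothetical name.

```python
def percent_identity(a, b, gaps="-."):
    """Percent of identical residues over columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x not in gaps and y not in gaps]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def ungapped_length(seq):
    return sum(c not in "-." for c in seq)


def remove_redundant(records, threshold=98.0):
    """Greedily keep (name, aligned_seq) records whose identity to all kept ones is <= threshold."""
    kept, removed = [], []
    # Consider longer sequences first so that they are the ones retained.
    for name, seq in sorted(records, key=lambda r: ungapped_length(r[1]), reverse=True):
        if any(percent_identity(seq, other) > threshold for _, other in kept):
            removed.append(name)
        else:
            kept.append((name, seq))
    return kept, removed
```

The result depends on the order in which sequences are considered, which is why the sketch fixes that order explicitly.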
Loading sequences from a public database:
Jalview can search the EMBL, PDB, Pfam, and UniProt databases and load sequences so that you can align and analyze them. Try searching one of these databases for sequences you are interested in working with. Note their accession numbers and enter them in step 4, or use the accession numbers provided below.
- Close all alignments you have open in Jalview (save them first if you wish). Note that if you have an alignment open, the sequences will automatically be added to it.
- Click on 'File' > 'Fetch Sequence(s)'.
- Click on 'EMBL' in the drop-down menu (or if you chose to search a different database, select that one here).
- Enter: X53828; X53829; X53930; X5831 (or the numbers of the sequences you found yourself) in the box and click 'OK'.
- You can then save these sequences in a variety of different formats to analyze as you wish.
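The same fetch can be scripted outside Jalview. The sketch assumes the ENA Browser API FASTA endpoint (`https://www.ebi.ac.uk/ena/browser/api/fasta/{accession}`); check the current ENA documentation before relying on it. `parse_accessions` and `fetch_fasta` are hypothetical helpers, and the parser simply splits the kind of semicolon-separated list entered in step 4.

```python
import re
import urllib.request

# Assumed ENA Browser API endpoint returning FASTA for one accession.
ENA_FASTA_URL = "https://www.ebi.ac.uk/ena/browser/api/fasta/{}"


def parse_accessions(text):
    """Split a list like 'X53828; X53829' on semicolons, commas, or whitespace."""
    return [token for token in re.split(r"[;,\s]+", text) if token]


def fetch_fasta(accession, timeout=30):
    """Download one record as FASTA text (needs network access)."""
    with urllib.request.urlopen(ENA_FASTA_URL.format(accession), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example (requires network access):
# for acc in parse_accessions("X53828; X53829"):
#     print(fetch_fasta(acc)[:80])
```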
Saving alignments as graphics:
You may need an image of your alignment for publication. Jalview will allow you to save one in HTML, EPS or PNG format.
- Open any of the alignments you have worked with so far.
- To wrap the alignment on the page, click on 'Format' > 'Wrap'.
- Click on 'File'>'Export'>'HTML' and enter a name for your new image.
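'Format' > 'Wrap' splits a long alignment into stacked blocks of a fixed number of columns. A rough plain-text analog is sketched below for quick inspection or notes; it is not what Jalview's HTML, EPS or PNG export produces, and `wrap_alignment` and its `width` parameter are hypothetical.

```python
def wrap_alignment(records, width=60):
    """Render (name, aligned_seq) records as blocks of `width` columns separated by blank lines."""
    name_width = max(len(name) for name, _ in records)
    length = max(len(seq) for _, seq in records)
    blocks = []
    for start in range(0, length, width):
        # Left-align names so the sequence columns line up in every block.
        lines = [f"{name:<{name_width}}  {seq[start:start + width]}" for name, seq in records]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


print(wrap_alignment([("s1", "ACGTAC"), ("s2", "AC-TAC")], width=4))
```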