Multiple Sequence Alignments
Based on a previous exercise by Rein Aasland, Hans-Petter Kleppen, Angèle Abboud, and Pål Puntervoll
Extensively revised and updated by Rein Aasland on 28-OCT-2016
In the previous exercise you explored the relationship of Cyclins to other proteins by performing Blast searches. In this exercise, we will look more closely at the Cyclin family of sequences by performing multiple sequence alignments.
Basic Multiple Sequence Alignment with Clustal and MAFFT using Jalview
Jalview is a free program for multiple sequence alignment editing, visualisation and analysis. You can start Jalview directly from the Jalview web site by clicking the purple LAUNCH JALVIEW DESKTOP button at the upper right corner. You can answer yes to the repeated questions about file extensions. (See the quick reference card.)
We will perform two different multiple sequence alignments of the human Cyclin sequences you stored in the previous PC lab session: [Search for Human Cyclin-A1 Orthologs Using Blast (blastp)].
- In Jalview, open your “.fasta” file with all the sequences: FILE>INPUT ALIGNMENT>FROM FILE (to see your file in the list, make sure that “FASTA” is selected as the File Format).
- Use ClustalW with default settings for your first multiple alignment and save the file: WEB SERVICE>ALIGNMENT>CLUSTAL>WITH DEFAULTS [suggested file name: cyclin_blast_human_basic.aln].
- Go back to the first window, which contains all your sequences in FASTA format. Perform another alignment, this time using MAFFT with default parameters: WEB SERVICE>ALIGNMENT>MAFFT>WITH DEFAULTS [suggested file name: cyclin_blast_human_mafft.aln]. NB: This part of the exercise was changed on Oct 27th.
- Colour the alignments according to your taste (we recommend the ClustalX colour scheme): COLOUR>CLUSTALX. Also colour the alignment by conservation: COLOUR>BY CONSERVATION (use the default threshold).
Note that Jalview can colour the amino acid residues with a colouring scheme based on chemical properties, plurality and context. This colouring makes it easier to navigate and evaluate the alignment and it also emphasises conservation in the sequence family.
- Compare the two alignments produced by Clustal and MAFFT. Focus on the regions that contain the two cyclin domains. Where do you observe major differences? Is one of the alignments more convincing than the other? If so, why?
Note: To get the position of an amino acid in a sequence, point at the amino acid and read the position in the status area at the bottom of the window.
- Identify the first and last positions in the alignment that have identical residues in all sequences, and note the corresponding residue numbers in the CCNA1_HUMAN sequence. Do the domains annotated by SMART extend beyond these positions? Which domain appears to be the most conserved?
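The search for fully conserved columns can also be sketched in a few lines of Python. This is only an illustration on a made-up toy alignment, not part of the exercise: the sequence names, the sequences and the helper names `conserved_columns` and `residue_number` are assumptions, and reading a real alignment file saved from Jalview would require a parser that is not shown here.

```python
def conserved_columns(alignment):
    """Return 0-based indices of columns where all sequences share one residue."""
    length = len(next(iter(alignment.values())))
    columns = []
    for i in range(length):
        residues = {seq[i] for seq in alignment.values()}
        # A column counts as conserved only if it holds a single residue and no gap.
        if len(residues) == 1 and "-" not in residues:
            columns.append(i)
    return columns


def residue_number(aligned_seq, column, start=1):
    """Map a 0-based alignment column to a residue number in the ungapped sequence.

    `start` is the number of the first residue (1 for a full-length sequence).
    Returns None if the sequence has a gap in that column.
    """
    if aligned_seq[column] == "-":
        return None
    return start + sum(1 for ch in aligned_seq[:column] if ch != "-")


# Toy alignment (hypothetical sequences, equal length, '-' marks a gap).
alignment = {
    "seqA": "MK-LVDEAG",
    "seqB": "MRTLVDE-G",
    "seqC": "M-TLVDQAG",
}

cols = conserved_columns(alignment)
first, last = cols[0], cols[-1]
print("conserved columns (0-based):", cols)
print("first/last residue in seqA:",
      residue_number(alignment["seqA"], first),
      residue_number(alignment["seqA"], last))
```

Note that alignment columns and residue numbers differ as soon as a sequence contains gaps, which is why Jalview reports both in the status area.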
Another criterion we will use to compare the two alignments is how well the secondary structure elements align.
NB: New text here. To aid in the comparison of the alignments, we can use Jalview to display sequence features on top of the sequences in the alignments:
- Select all the sequences in the Clustal alignment.
- Fetch the database references: WEB SERVICE>FETCH DB REFERENCES>STANDARD DATABASES.
- Hover over the sequence name of CCNA1_HUMAN and click on the top cross-reference (>sp|.....)
- Open the Sequence Feature Settings window: VIEW>FEATURE SETTINGS
- Select the feature 'helix', while leaving all the others unselected (you can use 'invert selection' to help get all selected or unselected).
- This will reveal alpha-helices in those proteins that have a known structure (determined by crystallography).
- Which sequences have secondary structure elements annotated?
- Some of the alpha-helices in the Clustal alignment (cyclin_blast_human_basic.aln) are disrupted by gaps. Focus on helices 9, 10 and 11.
- Note the start and end positions of the alpha-helices of CCNA2_HUMAN that are disrupted. (Use the mouse pointer to identify the alpha-helices.)
- Do the same analysis with the MAFFT alignment (cyclin_blast_human_mafft.aln).
- Are there any differences?
- Observe the alignment of secondary structure elements within the two multiple sequence alignments.
- Based on these observations, is it possible to state that one of the alignments is more correct than the other?
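The idea of a helix being "disrupted by gaps" can be made concrete with a small Python sketch. It assumes that you have the aligned sequence as a string and the helix boundaries as residue numbers; the toy sequence, the helix positions and the helper name `helix_gap_report` are all made up for illustration and do not correspond to the real CCNA2_HUMAN annotation.

```python
def helix_gap_report(aligned_seq, helices):
    """Count gaps that fall inside each helix of one aligned sequence.

    `helices` is a list of (start, end) residue numbers, 1-based and inclusive.
    Returns tuples (start, end, first_column, last_column, gaps), where the
    columns are 0-based positions in the alignment.
    """
    # Map each residue number to the alignment column it occupies.
    residue_to_column = {}
    residue = 0
    for column, ch in enumerate(aligned_seq):
        if ch != "-":
            residue += 1
            residue_to_column[residue] = column

    report = []
    for start, end in helices:
        first_col = residue_to_column[start]
        last_col = residue_to_column[end]
        # Gaps between the first and last helix residue split the helix.
        gaps = aligned_seq[first_col:last_col + 1].count("-")
        report.append((start, end, first_col, last_col, gaps))
    return report


# Hypothetical aligned sequence and helix annotation.
aligned = "MKT--LVDEAG-KL"
helices = [(2, 5), (6, 9)]

report = helix_gap_report(aligned, helices)
for start, end, first_col, last_col, gaps in report:
    state = "disrupted" if gaps else "intact"
    print(f"helix {start}-{end}: columns {first_col}-{last_col}, {gaps} gap(s), {state}")
```

Running the same check on the Clustal and the MAFFT version of a sequence gives a simple count to put next to your visual impression from Jalview.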
Advanced Multiple Sequence Alignments and Construction of Trees
In the part Basic Multiple Sequence Alignment with Clustal and MAFFT using Jalview, we observed that the alignment parameters in ClustalW can be tuned to improve an alignment. In this part of the exercise we will look at two different modes of use that in some cases may produce alignments of better quality. We will also construct a phylogenetic tree based on the alignment.
Using the latest version of Clustal, Clustal Omega
- In Jalview, use the window with the original file of (unaligned) human cyclin sequences as input.
- Align as before using the WEB SERVICE menu, but this time choose Clustal Omega (ClustalO) with default settings [suggested file name: cyclin_blast_human_clustalO.aln].
- Choose ClustalX colouring and inspect the alignment. Check whether the placement of gaps has changed in the second half of the cyclin alignment.
- Compare this alignment to the ones you obtained for questions Q1 and Q2.
- Briefly describe the differences between the alignments obtained with ClustalW and Clustal Omega.
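One way to put a number on the difference between two alignments of the same sequences is to ask how many aligned residue pairs they share. The Python sketch below illustrates this on two made-up toy alignments; the helper names `aligned_pairs` and `shared_pair_fraction` are assumptions for this example, and this is not a measure that Jalview reports.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs that share a column.

    Each pair is (name1, residue1, name2, residue2) with 1-based residue numbers.
    """
    names = sorted(alignment)
    counters = {name: 0 for name in names}
    pairs = set()
    for column in range(len(alignment[names[0]])):
        present = []
        for name in names:
            if alignment[name][column] != "-":
                counters[name] += 1
                present.append((name, counters[name]))
        # Every two residues found in the same column form one aligned pair.
        for (n1, r1), (n2, r2) in combinations(present, 2):
            pairs.add((n1, r1, n2, r2))
    return pairs


def shared_pair_fraction(aln_a, aln_b):
    """Fraction of the aligned pairs in aln_a that are also aligned in aln_b."""
    pairs_a = aligned_pairs(aln_a)
    pairs_b = aligned_pairs(aln_b)
    return len(pairs_a & pairs_b) / len(pairs_a) if pairs_a else 0.0


# Two hypothetical alignments of the same two sequences.
aln_a = {"s1": "MKT-LV", "s2": "MK-ALV"}
aln_b = {"s1": "MKT-LV", "s2": "MKA-LV"}

print("pairs of A found in B:", shared_pair_fraction(aln_a, aln_b))
print("pairs of B found in A:", shared_pair_fraction(aln_b, aln_a))
```

The measure is not symmetric, so it is worth computing in both directions. A value close to 1 in both directions means that the two programs largely agree on which residues belong together.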
Making a phylogenetic tree
You may have noticed from inspecting the alignments that there seem to exist two distinct groups of human cyclins. To investigate this further, we will expand our analysis to mammalian cyclins:
- Repeat the Blast search you did in PC lab exercise 2 Sequence searches, except this time limit the search to mammalian sequences (Mammalia) instead of only human ones. As before, save all the sequences with an E-value below 1e-10 in FASTA format [suggested file name: cyclin_blast_mammalia.fasta].
- Open this FASTA file with Jalview and align the sequences with Clustal Omega using default settings, as you did in the previous sections.
- Next, you will build a tree using the Neighbour-Joining functionality in Jalview: CALCULATE>CALCULATE TREE>NEIGHBOUR JOINING USING PAM250
- What are the paralogs of CCNA1_HUMAN? What are the orthologs of CCNA1_HUMAN?
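To give an idea of what happens behind the tree menu, here is a minimal Python sketch of the neighbour-joining procedure. It is not Jalview's implementation. As a simplification, distances are the fraction of differing residues between two aligned sequences, not PAM250-based values. The toy sequences and the helper names (`p_distance`, `distance_table`, `neighbour_joining`) are assumptions for this example. The function needs at least two sequences and may produce small negative branch lengths on real data.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing positions, ignoring columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_table(alignment):
    """Pairwise distances, keyed by frozenset({name1, name2})."""
    return {frozenset((a, b)): p_distance(alignment[a], alignment[b])
            for a, b in combinations(alignment, 2)}


def neighbour_joining(names, dist):
    """Join the closest neighbours step by step and return a Newick string."""
    nodes = list(names)
    d = dict(dist)
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node: sum of its distances to all other nodes.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Join the pair with the smallest Q value.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        dij = d[frozenset((i, j))]
        length_i = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))
        length_j = dij - length_i
        new = f"({i}:{length_i:.3f},{j}:{length_j:.3f})"
        # Distance from the new internal node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((new, k))] = 0.5 * (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - dij)
        nodes = [k for k in nodes if k not in (i, j)] + [new]
    a, b = nodes
    return f"({a},{b}:{d[frozenset((a, b))]:.3f});"


# Hypothetical toy alignment with two groups of similar sequences.
toy = {
    "A1": "MKTLVDEAG",
    "A2": "MKTLVDQAG",
    "B1": "MRSLIDEVG",
    "B2": "MRSLIEEVG",
}
print(neighbour_joining(list(toy), distance_table(toy)))
```

The printed Newick string can be pasted into any tree viewer. Sequences that end up in the same pair of brackets are the closest neighbours, which is also how you read the tree drawn by Jalview.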
The tree seems to confirm that it does make sense to group the cyclins into two large groups, while some sequences may fall outside these major groups. You can now use your own subjective judgement to find a good way to split the tree and alignment in two large groups (but you don't have to do this here, now).
Note: The final PC-lab session will continue with the structural aspects of cyclins. Since you may want to reinspect some of the files from this exercise, we advise you to keep all files.