Sequence Searching and Multiple Sequence Alignments

Pål Puntervoll

Based on previous exercise by Rein Aasland & Hans-Petter Kleppen

Last updated: 16-OCT-2009

In this exercise you will work on the cyclin protein family. Cyclins are regulatory proteins that bind to and regulate cyclin-dependent kinases. They play an important role during cell cycle progression.

Note: Part I of this exercise will be completed in the PC lab. There will be no report to hand in. Part II is homework.


PC lab

Sequence Searching and Pairwise Alignments

The purpose of this part of the exercise is to get acquainted with standard Blast and its more sensitive variant, PSI-Blast. We will also do a pairwise alignment using the Smith-Waterman algorithm.


Search for Human Cyclin-A1 Orthologs Using Blast (blastp)

Use the ExPASy blast server:

[Figure: annotated screenshot of the ExPASy Blast submission form]

  1. Enter the UniProt identifier for Cyclin-A1: CCNA1_HUMAN, or paste in the sequence.
  2. Restrict the search to human sequences only.
  3. Search only in the SwissProt part of UniProt and exclude fragments.

Inspect the results.

Q1
Count the number of sequences that are annotated as cyclins. Do not count splice variants (splice variants have the database prefix sp_vs). Note the identifier, accession number and E-value of the poorest-scoring cyclin sequence.

Before we move on, save all sequences from the initial search with an E-value below 1e-10 in FASTA format (suggested file name: cyclin_blast_human.fasta):

[Figure: ExPASy Blast results page, selecting sequences for retrieval]

  1. Click on the SELECT UP TO... button. Then mark the last sequence you want to include.
  2. Select 'Retrieve sequences (FASTA format)', and click on the SUBMIT QUERY button.
Note: Exclude the query sequence and all splice variants. 

This file of sequences will be used in the next part of this exercise.
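The selection rule above (E-value below 1e-10, no query sequence, no splice variants) can be written as a small filter. This is a minimal sketch that assumes the hit list has already been copied out of the ExPASy results page into simple records; the `Hit` structure, identifiers, accessions and E-values are all hypothetical, not ExPASy output.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One row of a Blast hit list (hypothetical record, not an ExPASy format)."""
    database: str    # "sp" for SwissProt entries, "sp_vs" for splice variants
    identifier: str
    accession: str
    evalue: float


def select_hits(hits: List[Hit], query_accession: str, max_evalue: float = 1e-10) -> List[Hit]:
    """Keep hits with E-value below max_evalue, excluding the query and splice variants."""
    return [
        h for h in hits
        if h.evalue < max_evalue
        and h.accession != query_accession
        and h.database != "sp_vs"
    ]


# Invented hit list for illustration only.
hits = [
    Hit("sp", "QUERY_HUMAN", "Q00000", 0.0),
    Hit("sp_vs", "QUERY_HUMAN", "Q00000-2", 1e-150),
    Hit("sp", "HITA_HUMAN", "Q00001", 1e-80),
    Hit("sp", "HITB_HUMAN", "Q00002", 1e-12),
    Hit("sp", "HITC_HUMAN", "Q00003", 2e-5),
]

for h in select_hits(hits, query_accession="Q00000"):
    print(h.identifier, h.accession, h.evalue)
```

Remember that 1e-10 means 10^-10: a hit with E-value 2e-5 is far above the cutoff and is left out.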

The Effect of Scoring Matrices

ExPASy Blast automatically chooses the scoring matrix based on the length of the query sequence (see the ExPASy help). For CCNA1_HUMAN, BLOSUM62 is chosen. Repeat the search using BLOSUM80 and BLOSUM45.

Q2
Count the number of cyclin sequences found in the new searches. Is it different from the initial search? What scoring matrix do you think is the most appropriate to use here? Briefly justify your answer.

Inspect the alignment of the query sequence and the poorest scoring cyclin from the initial search (BLOSUM62). Make note of the start and end positions for each sequence. Now, investigate the two sequences using SMART or Pfam. Note the start and end positions of the domains that are reported.
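Comparing an aligned region with a reported domain comes down to comparing two intervals. A small sketch, assuming 1-based inclusive positions as shown by Blast, SMART and Pfam; the positions used below are made up for illustration.

```python
from typing import Optional, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end)


def overlap(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the shared part of two intervals, or None if they do not overlap."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


def domain_coverage(aligned: Interval, domain: Interval) -> float:
    """Fraction of the domain's residues that lie inside the aligned region."""
    shared = overlap(aligned, domain)
    if shared is None:
        return 0.0
    return (shared[1] - shared[0] + 1) / (domain[1] - domain[0] + 1)


# Hypothetical positions, not real Blast or Pfam results.
aligned_region = (200, 300)
cyclin_domain = (210, 340)
print(overlap(aligned_region, cyclin_domain))  # (210, 300)
print(round(domain_coverage(aligned_region, cyclin_domain), 2))
```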

Q3
What is the correspondence between the parts of the sequences that were aligned and the domains reported by SMART or Pfam?

Smith-Waterman Pairwise Alignment

Align the two sequences with a pairwise alignment using the Smith-Waterman algorithm. Use the server at EBI, and remember to choose local alignment. Note the start and end positions for each sequence.
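The principle of local (Smith-Waterman) alignment fits in a few lines of Python. This is a toy version under simplifying assumptions: a flat match/mismatch score and a linear gap penalty, whereas the EBI server typically uses a substitution matrix such as BLOSUM62 with separate gap opening and extension penalties. It returns the score and the 1-based start and end positions in each sequence, the same numbers you are asked to note.

```python
from typing import List, Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, Tuple[int, int], Tuple[int, int]]:
    """Best local alignment score plus 1-based inclusive (start, end) spans in a and b."""
    H: List[List[int]] = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Flooring at 0 lets an alignment start anywhere: this is what makes it local.
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j
    # Trace back from the best cell until a zero cell marks the start of the alignment.
    i, j = best_i, best_j
    while H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i + 1, best_i), (j + 1, best_j)


print(smith_waterman("XXXABCDEYY", "ZABCDEZ"))  # (10, (4, 8), (2, 6))
```

Only the conserved stretch ABCDE is reported; the unrelated flanking residues are simply left out of a local alignment, unlike in a global alignment.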

Q4
What is the correspondence between the parts of the sequences that were aligned using Smith-Waterman and the domains reported by SMART or Pfam? How do you explain the difference to the results obtained with Blast?

Perform a Sensitive PSI-Blast Search to Identify Remote Homologues of Cyclins

Position-Specific Iterated Blast, or PSI-Blast, is a more sensitive version of Blast. The first round of a PSI-Blast search (iteration 1) is a normal Blast search. The significant hits from this search are aligned to the query and turned into a profile (a position-specific scoring matrix). The next round (iteration 2) searches the database with this profile instead of the query sequence alone. Iterating in this way, new hits can be found and included in the profile for further searches.
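To make the idea of a profile concrete, here is a minimal sketch of a position-specific scoring matrix built from a few aligned hits, and of scoring a sequence against it. It deliberately simplifies what PSI-Blast does: it assumes a uniform background frequency, a flat pseudocount and no sequence weighting, and the toy "hits" are invented.

```python
import math
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
Column = Dict[str, float]


def build_pssm(aligned: List[str], pseudocount: float = 1.0) -> List[Column]:
    """Log2-odds score for every residue at every position of a gap-free alignment.

    Simplifications (assumptions, not PSI-Blast's actual procedure): uniform
    background frequencies, a flat pseudocount and no sequence weighting.
    """
    background = 1.0 / len(AMINO_ACIDS)
    pssm: List[Column] = []
    for position in range(len(aligned[0])):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in aligned:
            counts[seq[position]] += 1
        total = sum(counts.values())
        pssm.append({aa: math.log2((c / total) / background) for aa, c in counts.items()})
    return pssm


def score(pssm: List[Column], seq: str) -> float:
    """Score an ungapped sequence of the same length as the profile."""
    return sum(column[aa] for column, aa in zip(pssm, seq))


# Toy hits "aligned" to a query; invented for illustration.
hits = ["MKVL", "MKIL", "MRVL"]
profile = build_pssm(hits)
for candidate in ("MKVL", "MRIL", "WWWW"):
    print(candidate, round(score(profile, candidate), 2))
```

A sequence that matches residues seen in the hits at each position scores higher than one that does not, even if it differs from the original query; this position-specific view is what lets later iterations pick up more remote homologues.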

Perform the first iteration of a PSI-Blast search with CCNA1_HUMAN, again restricting the search to SwissProt and human sequences:

[Figure: annotated screenshot of the NCBI Blast submission form]

  1. Paste in the sequence or the accession number. (The identifier will not work here.)
  2. Choose the right database.
  3. Restrict to human sequences.
Q5
How do these results compare to those obtained in the normal Blast search (in the part "Search for Human Cyclin-A1 Orthologs Using Blast (blastp)")?

Perform the second iteration. Include only the sequences selected by default.

Q6
Now, what is the best-scoring new apparent non-cyclin sequence (different from the one you identified in the part "Search for Human Cyclin-A1 Orthologs Using Blast (blastp)")? Note its identifier, accession number and E-value.

Investigate the apparent non-cyclin sequence using SMART or Pfam. Does the domain analysis suggest why this sequence was picked up in the PSI-Blast search?

Reciprocal searches are often used to verify that sequences found with PSI-Blast really are related to the query sequence. Starting from one (or more) of the hit sequences, the aim of a reciprocal search is to find the original query sequence again.
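Counting how many iterations a reciprocal search needs (as asked in Q7) amounts to finding the first iteration whose hit list contains the original query. A trivial sketch, assuming each iteration's hits have been collected as a set of accessions; the accessions below are hypothetical.

```python
from typing import Iterable, Optional, Set


def iterations_to_recover(iteration_hits: Iterable[Set[str]], query_accession: str) -> Optional[int]:
    """Return the 1-based iteration at which query_accession first appears, or None."""
    for iteration, hits in enumerate(iteration_hits, start=1):
        if query_accession in hits:
            return iteration
    return None


# Hypothetical accessions found in each PSI-Blast iteration of a reverse search.
reverse_search = [
    {"Q00010", "Q00011"},
    {"Q00010", "Q00011", "Q00012"},
    {"Q00010", "Q00011", "Q00012", "Q00000"},
]
print(iterations_to_recover(reverse_search, "Q00000"))  # 3
```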

Perform a new PSI-Blast search using the sequence from Q6. In this case, search in the nr database, still restricted to only human sequences.

Q7
How many iterations are needed to pick up the original query sequence (CCNA1_HUMAN)?

Multiple Sequence Alignments

Now that you have explored the relationship of cyclins to other proteins with Blast searches, we will look more closely at the cyclin family by building multiple sequence alignments.


Basic Multiple Sequence Alignment Using ClustalX and Alignment Visualisation with Jalview

Make two multiple alignments of the human cyclin sequences you stored in the part "Search for Human Cyclin-A1 Orthologs Using Blast (blastp)".

  1. Use ClustalX with default settings: ALIGNMENT>DO COMPLETE ALIGNMENT [suggested file name: cyclin_blast_human_basic.aln].
  2. Select all sequences (using the mouse over the names) and remove all gaps using EDIT>REMOVE ALL GAPS from the top menu. Use ALIGNMENT>ALIGNMENT PARAMETERS>MULTIPLE ALIGNMENT PARAMETERS to change the gap opening penalty from 10 to 20. Perform the alignment as above [suggested file name: cyclin_blast_human_gap20.aln].
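Why does raising the gap opening penalty change the alignment? Under a simple affine gap model, each gap pays the opening penalty once plus an extension penalty per gapped position, so a higher opening penalty makes many short gaps relatively more expensive than one long gap. The sketch below assumes this plain affine model and an extension penalty of 0.2 (believed to be the ClustalX default for multiple alignment; check the ALIGNMENT PARAMETERS dialog). ClustalX itself further adjusts penalties by residue and position, so this only illustrates the trend.

```python
from typing import Iterable


def gap_cost(gap_lengths: Iterable[int], gap_open: float, gap_extend: float) -> float:
    """Affine penalty: every gap pays gap_open once plus gap_extend per gapped position."""
    return sum(gap_open + gap_extend * length for length in gap_lengths)


one_long_gap = [6]            # six gapped positions in a single gap
three_short_gaps = [2, 2, 2]  # the same six positions split into three gaps

for gap_open in (10, 20):
    extra = gap_cost(three_short_gaps, gap_open, 0.2) - gap_cost(one_long_gap, gap_open, 0.2)
    print(f"gap opening {gap_open}: three short gaps cost {extra:.1f} more than one long gap")
```

In this model the difference is twice the opening penalty (20.0 versus 40.0), so the gap20 alignment is pushed towards fewer, longer gaps.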

Jalview is a multiple sequence alignment editor that has many useful features. We will use it to compare the two alignments.

Note: You can start Jalview directly from the Jalview web site. Go to the download section, and click on the big blue INSTALL WITH JAVA WEBSTART button. You can answer yes to the repeated questions about file extensions. 

Open the two alignments (cyclin_blast_human_basic.aln and cyclin_blast_human_gap20.aln) in Jalview. Colour the alignments according to your taste (we recommend the ClustalX colour scheme): COLOUR>CLUSTALX. Also colour the alignment by conservation: COLOUR>BY CONSERVATION (use the default threshold).

Q8
Compare the two alignments. In which region do you observe differences? Which alignment looks more convincing?

To aid in the comparison of the alignments, we can use Jalview to display sequence features on top of the sequences in the alignments: open the Sequence Feature Settings window: VIEW>FEATURE SETTINGS... . Click on the DAS SETTINGS tab, select InterPro in the list of DAS services, and click on the FETCH DAS FEATURES button. When all features have been fetched, click on the FEATURE SETTINGS tab. By default all features are displayed. Click on the INVERT SELECTION button to deselect everything. Then, check SMART domain and Pfam-A. Click on the OPTIMISE ORDER button. Return to the alignment window, and observe the effect of the feature visualisation.

Q9
Note the first and last alignment positions at which all sequences are identical. Do the domains annotated by SMART or Pfam extend beyond these positions? Which domain appears to be the most conserved?
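Finding the fully identical columns by eye is error-prone in a long alignment. A short sketch that lists them, assuming the alignment has been read into a dictionary of equal-length aligned strings with "-" as the gap character (as in ClustalX .aln files); the toy alignment is invented.

```python
from typing import Dict, List


def identical_columns(alignment: Dict[str, str]) -> List[int]:
    """1-based alignment columns where all sequences carry the same residue (no gaps)."""
    return [
        col
        for col, residues in enumerate(zip(*alignment.values()), start=1)
        if residues[0] != "-" and all(r == residues[0] for r in residues)
    ]


# Toy alignment, invented for illustration.
aln = {"seqA": "MK-LVGA", "seqB": "MKALIGA", "seqC": "MR-LVGA"}
cols = identical_columns(aln)
print(cols)               # [1, 4, 6, 7]
print(cols[0], cols[-1])  # first and last identical columns: 1 7
```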
Q10
SMART seems to be less sensitive than Pfam. Can you find an explanation for this?

A last criterion we will use to compare the two alignments is the alignment of secondary structure elements. Open the Sequence Feature Settings window again: VIEW>FEATURE SETTINGS... . Click on the DAS SETTINGS tab, select UniProt, and click on the FETCH DAS FEATURES button. When all features have been fetched, click on the FEATURE SETTINGS tab. Select only secondary structure elements (helix, strand and turn). Return to the alignment window.

Q11
Which sequences have secondary structure elements annotated?

Some of the alpha-helices in the first alignment (cyclin_blast_human_basic.aln) are disrupted by gaps.
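A helix is disrupted when the alignment places one or more gap characters between its first and last residue. The check can be sketched as follows, assuming helix boundaries are given as residue numbers counted from 1 along the ungapped sequence (Jalview may number residues differently if a sequence does not start at residue 1); the aligned sequence below is a toy example.

```python
from typing import Dict


def residue_to_column(aligned_seq: str) -> Dict[int, int]:
    """Map 1-based residue numbers of one aligned sequence to 1-based alignment columns."""
    mapping: Dict[int, int] = {}
    residue = 0
    for col, ch in enumerate(aligned_seq, start=1):
        if ch != "-":
            residue += 1
            mapping[residue] = col
    return mapping


def helix_has_gap(aligned_seq: str, start: int, end: int) -> bool:
    """True if the alignment inserts a gap between residues start..end (inclusive)."""
    cols = residue_to_column(aligned_seq)
    return "-" in aligned_seq[cols[start] - 1:cols[end]]


# Toy aligned sequence; residue numbering is assumed to start at 1.
seq = "MKV--LLAE-R"
print(helix_has_gap(seq, 2, 6))  # True: gap between V3 and L4
print(helix_has_gap(seq, 4, 7))  # False
```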

Q12
Note the start and end positions of the alpha-helices of CCNA2_HUMAN that are disrupted. (Use the mouse pointer to identify the alpha-helices.)

Do the same analysis with the second alignment (cyclin_blast_human_gap20.aln).

Q13
Are there any differences?

Observe the alignment of secondary structure elements in the two alignments.

Q14
Based on these observations, is it possible to state that one of the alignments is more correct than the other?

Homework

Advanced Multiple Sequence Alignments and Construction of Trees

In the part "Basic Multiple Sequence Alignment Using ClustalX and Alignment Visualisation with Jalview" we observed that the alignment parameters in ClustalX can be tuned to improve an alignment. In this part of the exercise we will look at two other alignment modes in ClustalX that in some cases may produce alignments of better quality.

Using Profile Alignment in ClustalX

Before we start using the profile alignment mode in ClustalX, we will take a short detour. You may (or may not) have noticed from inspecting the alignments that there seem to be two distinct groups of human cyclins. To investigate this further, we will expand our analysis to mammalian cyclins.

Repeat the Blast search you did in the part "Search for Human Cyclin-A1 Orthologs Using Blast (blastp)", but this time limit the search to mammalian sequences (Mammalia) instead of human ones only. As before, save all sequences with an E-value below 1e-10 in FASTA format [suggested file name: cyclin_blast_mammalia.fasta].

Align the sequences with ClustalX using default settings. Next, you will build a tree using the Neighbour-Joining (N-J) method in ClustalX. First exclude positions with gaps: TREES>EXCLUDE POSITIONS WITH GAPS. Then build the tree: TREES>BOOTSTRAP N-J TREE. If you used the file names suggested by ClustalX, this should produce a file called cyclin_blast_mammalia.phb.
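The two preparatory ideas behind this step can be sketched in Python: removing every column that contains a gap, and drawing a bootstrap replicate by resampling columns with replacement. In a bootstrap analysis, a tree is rebuilt from each replicate, and the bootstrap value of a branch reflects how often it reappears. This is an illustration of the principle, not ClustalX's code; the toy alignment is invented and the random seed is arbitrary.

```python
import random
from typing import Dict

Alignment = Dict[str, str]


def drop_gap_columns(alignment: Alignment) -> Alignment:
    """Remove every column in which at least one sequence has a gap."""
    columns = list(zip(*alignment.values()))
    keep = [i for i, col in enumerate(columns) if "-" not in col]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


def bootstrap_replicate(alignment: Alignment, rng: random.Random) -> Alignment:
    """Draw columns with replacement to make one pseudo-alignment of the same length."""
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


aln = {"s1": "MK-LVGA", "s2": "MKALIGA", "s3": "MR-LVGA"}
ungapped = drop_gap_columns(aln)
print(ungapped)  # {'s1': 'MKLVGA', 's2': 'MKLIGA', 's3': 'MRLVGA'}
replicate = bootstrap_replicate(ungapped, random.Random(1))
print(replicate)
```

Columns are resampled as whole units, so the residues of different sequences at the same position stay together in every replicate.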

Open the cyclin_blast_mammalia.phb file in the tree drawing program NJplot. To inspect the quality of the tree, check the box next to BOOTSTRAP VALUES.

Q15
What is the largest group of apparent orthologs in the tree? What are the paralogs of CCNA1_HUMAN?

The tree seems to confirm that it does make sense to group the cyclins into two major groups.

Note: Some sequences may fall outside the major groups. In the following, add them to one of the major groups. 

Use the tree to guide the splitting of the cyclin_blast_human.fasta file (from the part "Search for Human Cyclin-A1 Orthologs Using Blast (blastp)") into two files [suggested names: cyclin_blast_human_1.fasta and cyclin_blast_human_2.fasta]. Use a plain text editor such as Notepad for this; if you use Word, make sure to save the files as plain text. Align each of the two new files using ClustalX (default settings).
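As an alternative to editing the file by hand, the split can be scripted. A minimal sketch, assuming UniProt-style headers such as `sp|Q00001|CYCX_HUMAN ...` and a set of identifiers for one group read off the tree; the example records and identifiers are hypothetical.

```python
import re
from typing import Iterable, List, Optional, Set, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (header, sequence) pairs."""
    records: List[Record] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def header_tokens(header: str) -> Set[str]:
    """Split a header such as 'sp|Q00001|CYCX_HUMAN Cyclin X' on '|' and whitespace."""
    return set(re.split(r"[|\s]+", header))


def split_records(records: Iterable[Record], group1_ids: Set[str]) -> Tuple[List[Record], List[Record]]:
    """Records whose header contains any ID in group1_ids go to group 1; the rest to group 2."""
    group1: List[Record] = []
    group2: List[Record] = []
    for header, seq in records:
        (group1 if header_tokens(header) & group1_ids else group2).append((header, seq))
    return group1, group2


def to_fasta(records: Iterable[Record], width: int = 60) -> str:
    """Format records as FASTA text with sequence lines wrapped at `width` characters."""
    lines: List[str] = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Invented example records; the real input would be cyclin_blast_human.fasta.
example = """>sp|Q00001|CYCX_HUMAN Hypothetical cyclin X
MKVLAAGT
>sp|Q00002|CYCY_HUMAN Hypothetical cyclin Y
MRILSTGA
"""
g1, g2 = split_records(read_fasta(example), {"CYCX_HUMAN"})
print(to_fasta(g1), to_fasta(g2), sep="")
```

For the real data, read cyclin_blast_human.fasta into a string, list the identifiers of one group from the tree, and write the two results to cyclin_blast_human_1.fasta and cyclin_blast_human_2.fasta. Matching whole header tokens, rather than substrings, avoids accidentally matching one identifier inside a longer one.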

Now we finally return to the profile alignment mode in ClustalX. Restart ClustalX and change from 'MULTIPLE ALIGNMENT MODE' to 'PROFILE MODE' (upper left corner). Use FILE>LOAD PROFILE 1 to load the first alignment (cyclin_blast_human_1.aln), and FILE>LOAD PROFILE 2 to load the second alignment (cyclin_blast_human_2.aln). Perform the alignment: ALIGNMENT>ALIGN PROFILE 2 TO PROFILE 1 [suggested name: cyclin_blast_human_profile.aln]. Restart ClustalX and open the new alignment.
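The key property of profile-to-profile alignment is that each input alignment is kept intact: whole columns are aligned against whole columns, and new gaps are inserted as complete columns. The sketch below shows this with a Needleman-Wunsch alignment of two profiles. It is a simplified illustration, not ClustalX's algorithm: the column score is assumed to be the mean identity over all residue pairs between two columns, and a single linear gap penalty replaces ClustalX's gap scheme.

```python
from typing import List, Sequence, Tuple

Column = Tuple[str, ...]


def column_score(col1: Column, col2: Column) -> float:
    """Mean identity over all residue pairs between two columns (gap characters ignored)."""
    pairs = [(x, y) for x in col1 for y in col2 if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(1.0 for x, y in pairs if x == y) / len(pairs)


def align_profiles(p1: Sequence[str], p2: Sequence[str], gap: float = -0.5) -> List[str]:
    """Global alignment of two profiles (lists of equal-length aligned strings).

    Whole columns are aligned, so each profile's internal alignment is kept;
    new gaps are inserted as complete columns.
    """
    c1, c2 = list(zip(*p1)), list(zip(*p2))
    n, m = len(c1), len(c2)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] + gap
    for j in range(1, m + 1):
        F[0][j] = F[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + column_score(c1[i - 1], c2[j - 1]),
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    # Trace back from the bottom-right cell, collecting merged columns.
    gap1, gap2 = ("-",) * len(p1), ("-",) * len(p2)
    merged: List[Column] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + column_score(c1[i - 1], c2[j - 1]):
            merged.append(c1[i - 1] + c2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            merged.append(c1[i - 1] + gap2)
            i -= 1
        else:
            merged.append(gap1 + c2[j - 1])
            j -= 1
    merged.reverse()
    return ["".join(row) for row in zip(*merged)]


profile1 = ["ACDE", "ACDE"]
profile2 = ["ADE"]
for row in align_profiles(profile1, profile2):
    print(row)
```

Here the second profile receives a whole gap column opposite C, while the rows of the first profile are left exactly as they were.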

Compare the two alignments: the one with gap open penalty 20 (cyclin_blast_human_gap20.aln) and the profile based one (cyclin_blast_human_profile.aln) using Jalview as above.

Q16
Again, by observing differences in conserved positions and in the alignment of secondary structure elements, which alignment seems to be better?

Using Structure Masks in ClustalX

ClustalX allows you to perform multiple alignments in which gap penalties are increased within the secondary structure elements of sequences with known structure (read about structure masks in the ClustalX help). This procedure often gives alignments of better quality.

Copy the following cyclin A2 sequence, which contains the structure mask (based on the structure pdb:1jsu), to a plain text file named 1jsu-str-clw.txt:

CLUSTAL X (1.81) multiple sequence alignment


!SS_1JSU        ...AAaaAAAAAAAAAAAAa......aAa......aAAAAAAAAAAAAAAa.....aAAA
1JSU            NEVPDYHEDIHTYLREMEVKCKPKVGYMKKQPDITNSMRAILVDWLVEVGEEYKLQNETL 60
                ************************************************************

!SS_1JSU        AAAAAAAAAAa......aAAAAAAAAAAAAAAAAAa......aAAAAAa......aAAAA
1JSU            HLAVNYIDRFLSSMSVLRGKLQLVGTAAMLLASKFEEIYPPEVAEFVYITDDTYTKKQVL 120
                ************************************************************

!SS_1JSU        AAAAAAAAa.........aAAAAAAAa.......aAAAAAAAAAAAAAAa.aAAa....a
1JSU            RMEHLVLKVLTFDLAAPTVNQFLTQYFLHQQPANCKVESLAMFLGELSLIDADPYLKYLP 180
                ************************************************************

!SS_1JSU        AAAAAAAAAAAAAAAa.....aAAAAAa...aAAaaAAAAAAAAAAAaaAAa...aAAAa
1JSU            SVIAGAAFHLALYTVTGQSWPESLIRKTGYTLESLKPCLMDLHQTYLKAPQHAQQSIREK 240
                ************************************************************

!SS_1JSU        ...aAa..aAa........ 
1JSU            YKNSKYHGVSLLNPPETLNL 260
                ********************

Note that this file is in Clustal format.
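The !SS_1JSU lines form the mask: each character lies above one residue of 1JSU, and runs of letters mark secondary structure elements. A small sketch that extracts these runs as 1-based residue ranges, assuming that A/a marks helix residues and B/b strand residues (the mask above contains only helices) and treating upper- and lower-case letters alike; any other character, such as ".", ends a run.

```python
from typing import Dict, List, Optional, Tuple

Segment = Tuple[int, int]  # 1-based, inclusive (start, end)


def ss_segments(mask: str) -> Dict[str, List[Segment]]:
    """Find helix ('A'/'a') and strand ('B'/'b') runs in a ClustalX !SS_ mask string."""
    kinds = {"A": "helix", "B": "strand"}
    segments: Dict[str, List[Segment]] = {"helix": [], "strand": []}
    current: Optional[str] = None
    start = 0
    for pos, ch in enumerate(mask + ".", start=1):  # sentinel '.' closes a final run
        kind = kinds.get(ch.upper())
        if kind != current:
            if current is not None:
                segments[current].append((start, pos - 1))
            current, start = kind, pos
    return segments


# The first 29 characters of the mask above.
print(ss_segments("...AAaaAAAAAAAAAAAAa......aAa")["helix"])  # [(4, 20), (27, 29)]
```

To analyse the whole mask, concatenate the !SS_1JSU lines (without the name) before calling the function, so that positions match the residue numbering 1 to 260 shown on the right.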


Start ClustalX and change from 'MULTIPLE ALIGNMENT MODE' to 'PROFILE MODE' as above. Use FILE>LOAD PROFILE 1 to load the structure mask file (1jsu-str-clw.txt), and accept when asked whether to use the structure mask. Use FILE>LOAD PROFILE 2 to load the cyclin sequence file you obtained in the part "Search for Human Cyclin-A1 Orthologs Using Blast (blastp)" (cyclin_blast_human.fasta). Perform the alignment: ALIGNMENT>ALIGN SEQUENCES TO PROFILE 1 [suggested name: cyclin_blast_human_str.aln].

Select LOCK SCROLL at the top of the window and scroll along the alignment. You will now see the secondary structure mask above the alignment. Note that alpha-helices are not interrupted by gaps.

Compare the two alignments: the one with gap open penalty 20 (cyclin_blast_human_gap20.aln) and the one with structure mask (cyclin_blast_human_str.aln) using Jalview as above.

Q17
Yet again, by observing differences in conserved positions and in the alignment of secondary structure elements, which alignment seems to be better?
Note: The final PC-lab session will continue with the structural aspects of cyclins. Since you may want to reinspect some of the files from this exercise, we advise you to keep all files.