Tutorial: a simple Blastp search for homologs in all species

This search is essentially the same as the previous one (question 7), except that we search not only among human protein sequences but among all sequences in the database. You may therefore get hundreds of hits, which makes the data much more difficult to evaluate. Hence, we ask a simpler question for the report (see below).

  1. Go to the NCBI Blast search page (Make sure that the blastp tab is selected in the tab heading).
  2. Paste the FASTA version of your protein sequence into the box under "Enter Query Sequence".
  3. From the "Database" pull-down menu under "Choose Search Set", select "UniProtKB/Swiss-Prot".
  4. Skip the "Organism" field (leave it empty so that all species are searched).
  5. Under "General parameters", set the "Max target sequences" to 250.
  6. Further down the page, under "Scoring Parameters", set the "Matrix" option to "BLOSUM80". (This scoring matrix is most suitable when searching for closely related sequences.)
  7. Then hit the blue Blast button.
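
For readers who prefer scripting, the same search settings can be sketched with Biopython's `NCBIWWW.qblast`, which submits queries to the NCBI BLAST web service. This is a minimal sketch, not part of the exercise: it assumes Biopython is installed and that `swissprot` is the NCBI database name corresponding to "UniProtKB/Swiss-Prot" in the web form. The helper names `blastp_settings` and `run_blastp` and the output file name are hypothetical.

```python
# Sketch: the web-form settings above as arguments to Biopython's NCBIWWW.qblast.
# Assumptions: Biopython is installed, and "swissprot" is the NCBI database name
# behind "UniProtKB/Swiss-Prot". Helper names and the output path are hypothetical.


def blastp_settings(max_targets=250, matrix="BLOSUM80"):
    """Keyword arguments for qblast mirroring steps 1 and 3-6 of the tutorial."""
    return {
        "program": "blastp",           # step 1: blastp tab
        "database": "swissprot",       # step 3: UniProtKB/Swiss-Prot (assumed name)
        "hitlist_size": max_targets,   # step 5: "Max target sequences"
        "matrix_name": matrix,         # step 6: scoring matrix
        # step 4: no entrez_query is set, so no organism restriction is applied
    }


def run_blastp(fasta_text, out_path="blastp_results.xml"):
    """Submit the query to NCBI (network access required) and save the XML report."""
    # Imported here so blastp_settings() can be used without Biopython installed.
    from Bio.Blast import NCBIWWW

    handle = NCBIWWW.qblast(sequence=fasta_text, **blastp_settings())
    try:
        report = handle.read()
    finally:
        handle.close()
    with open(out_path, "w") as out:
        out.write(report)
    return out_path
```

Calling `run_blastp(query)` with your FASTA text as a string submits the search and writes the XML report to disk; like the web form, it can take several minutes. Nothing is sent to NCBI until that function is called.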

The search might take quite a while, so be very patient. Once the results page appears, inspect the different sections as you scroll down.

  1. At the top you will find a brief summary of the search.
  2. Then follows a graphic summary. The upper part shows matches to known protein domains and superfamilies. This information can be very useful, and you can click on the icons to learn more about them.
  3. Next comes a graphical representation of the list of hits. Red lines show the most closely related sequences (including the sequence you searched with), followed by shorter hits with lower scores.
  4. Scroll further down until you reach the list of Descriptions. Here you will see the protein names along with Max score, Total score, Query cover, E-value, and percent identity.
  5. We will consider only sequences with E-values lower than 1e-10 as significant.
  6. Scroll further down until you reach the "Alignments" section. The first hit is normally your query sequence itself and is 100% identical. Skip it and move on to the next hits with E < 1e-10. Note the percent identity and inspect each alignment: do you see a large number of matching residues? Also note the full name/description of each entry and check that it relates to the sequence you searched with.
  7. For the report, list the entries that you judge to be true homologs of your enzyme and that come from the following model organisms: zebrafish (Danio rerio); nematode (Caenorhabditis elegans); fruit fly (Drosophila melanogaster); yeast (Saccharomyces cerevisiae).
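
The selection criteria in steps 5 to 7 can be sketched as a small filter. This is a hypothetical helper for shortlisting, not a substitute for reading the alignments and descriptions yourself. It assumes you have recorded each hit's accession, description, organism name, E-value and percent identity (for example by copying them from the Descriptions table) into the `Hit` record defined here. Organism names are matched by prefix because Swiss-Prot often appends a strain, e.g. "Saccharomyces cerevisiae (strain ATCC 204508 / S288c)". The `percent_identity` function illustrates how the identity figure is computed: identical positions divided by alignment length, with gap columns counted in the length.

```python
from dataclasses import dataclass


# Hypothetical record for one row copied from the "Descriptions" table.
@dataclass
class Hit:
    accession: str
    description: str
    organism: str
    evalue: float
    percent_identity: float


E_VALUE_CUTOFF = 1e-10  # step 5: significance threshold used in this exercise

# Step 7: model organisms; matched by prefix so strain suffixes are accepted.
MODEL_ORGANISMS = (
    "Danio rerio",
    "Caenorhabditis elegans",
    "Drosophila melanogaster",
    "Saccharomyces cerevisiae",
)


def percent_identity(aligned_query, aligned_subject):
    """Identical positions / alignment length x 100; gap columns count toward the length."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    matches = sum(1 for q, s in zip(aligned_query, aligned_subject) if q == s and q != "-")
    return 100.0 * matches / len(aligned_query)


def candidate_homologs(hits, query_accession):
    """Significant hits from the model organisms, skipping the self-hit (step 6)."""
    return [
        h
        for h in hits
        if h.evalue < E_VALUE_CUTOFF
        and h.accession != query_accession
        and h.organism.startswith(MODEL_ORGANISMS)
    ]
```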

