Last updated: Sept 25, 2001
GEN 240B, Spring 2002: Sequence Comparisons and Genomics Databases by Close, Jiang, Lonardi, Swanson (due June 3rd, 2002)
We will perform similarity searches using BLAST and FASTA as hands-on tools. We will also briefly discuss the Smith-Waterman
algorithm as a potential similarity search tool.
FASTA Protein Similarity Search
Link to the Fasta3 search engine at the European Bioinformatics Institute.
You may check out the links to the "Help" and "Tool" screens.
Your e-mail address and a search title for the sequence are optional entries. Enter "Murine IL-7 Receptor" as your search
title. Choose "interactive" as the option for results, although you can also receive them by e-mail.
Change the scoring matrix to Blosum62 since this matrix has been shown to detect most protein similarities when the query
sequence is long. (The murine IL-7 receptor is 459 amino acids long.)
To limit the number of hits (similar sequences) in this search, change the number of scores to 30 and the number of
alignments to 10. You may get a histogram of the results by changing the "HIST" drop-down menu to "yes". Leave the other
parameters at their default values. We will search the default database, "swall", which is the Swiss-Prot
non-redundant database combined with TrEMBL and TrEMBLnew (TrEMBL = Translated EMBL; TrEMBLnew = new sequences in TrEMBL).
Copy and paste the murine IL-7 receptor (IL-7R) sequence from this text file. The
input sequence can be in any format.
Click the "Run Fasta3" button for the search results.
You may view the same results as a graphical output by clicking on the "VisualFasta" button from the "Results of Search"
screen.
Interpretation of the results:
In general, one treats sequence similarities with an E() value < 0.02 as statistically significant matches.
As expected, the murine IL-7R sequences in the database (Acc. #'s Q9R0C1, P16872) show the best
similarities to the query sequence (with the highest opt and z-scores). Only two other sequences, corresponding to the
human IL-7R gene (Acc. #'s P16871, Q9UPC1), show fairly high opt and z-scores. Even the human protein isoforms (Acc. #'s
P16871-02, P16871_01), which share some identical residues with the query sequence, have lower opt and z-scores.
Try changing the parameters to see how they affect the results.
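The E() cutoff mentioned in the interpretation notes above can be applied mechanically once the hit list has been copied out of the results page. The following is only a minimal sketch: the accession names, opt scores and E() values are hypothetical placeholders, not real Fasta3 output, and the record layout is an assumption made for illustration.

```python
# Hypothetical hit records: (accession, opt score, E() value).
# All names and numbers are placeholders, not real search output.
hits = [
    ("HYPO_A", 3100, 1e-150),
    ("HYPO_B", 2100, 3e-90),
    ("HYPO_C", 140, 0.015),
    ("HYPO_D", 95, 0.8),
    ("HYPO_E", 80, 4.2),
]

E_CUTOFF = 0.02  # significance threshold quoted in the exercise text


def significant_hits(records, cutoff=E_CUTOFF):
    """Return the records whose E() value is strictly below the cutoff,
    sorted from most to least significant (smallest E() first)."""
    kept = [r for r in records if r[2] < cutoff]
    return sorted(kept, key=lambda r: r[2])


for accession, opt, evalue in significant_hits(hits):
    print(f"{accession}\topt={opt}\tE()={evalue:g}")
```

Raising or lowering `E_CUTOFF` shows directly how the choice of threshold changes which hits are reported as significant.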
BLAST Protein Similarity Search
Connect to the BLAST site at NCBI. Click the link to the
"Standard protein-protein BLAST [blastp]" page. Familiarize yourself with the various features of the site.
Choose "nr" for the non-redundant database to search.
As with the FASTA exercise above, copy and paste the murine IL-7 receptor (IL-7R) sequence from
this text file into the large data entry field. This sequence is already in the FASTA
format.
Limit the number of hits to a manageable size by changing the Expect value from its default of 10 to 1. You may
also restrict the number of hits returned by decreasing the number of Descriptions and Alignments.
Use the default Blosum62 scoring matrix as selected at the bottom of the page.
Click on the "Search" button to perform the similarity search immediately. On the Blast CGI screen that shows up next,
view the results by pressing the "Format results" button. You may also check for any conserved domains between your
sequence and the database sequences. You may wish to get the BLAST results by e-mail by providing your e-mail address on
the BLAST search screen.
Try changing the parameters to see how they affect the results.
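The BLAST steps above note that the query sequence is already in FASTA format: a header line beginning with ">" followed by one or more lines of sequence. As a rough illustration of that layout, the sketch below parses such text into (header, sequence) pairs. The function name `parse_fasta` and the example record are hypothetical; the record is a placeholder, not the murine IL-7R sequence.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs.

    A record starts at a line beginning with '>' and runs until the next
    such line; the sequence lines in between are joined together.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Placeholder record for illustration only.
example = """>hypothetical_protein placeholder record
MKTAYIAKQR
QISFVKSHFS
"""

for name, seq in parse_fasta(example):
    print(name, len(seq))
```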
Smith-Waterman Algorithm
Connect to the Bioccelerator site at EMBL to use the Smith-Waterman algorithm for sequence similarity searches. You have to
figure out how to use the site and how to interpret the results. Try changing the parameters to see how they affect the
results. Notes:
This search tool is a mathematically rigorous dynamic programming algorithm that iteratively calculates similarity in
matrix cells (pairwise comparisons between the query and database sequences).
It is very computationally intensive, so similarity searches may take longer than with FASTA or BLAST.
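To make the notes above concrete, here is a minimal sketch of the matrix-cell calculation for a single pair of sequences. It is not the Bioccelerator implementation: it assumes a simple match/mismatch score and a linear gap penalty (the values are arbitrary), whereas protein searches normally use a substitution matrix such as Blosum62 with affine gap penalties. It returns only the best local alignment score, not the alignment itself.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings a and b.

    Each cell (i, j) holds the best score of any local alignment ending
    at a[i-1] and b[j-1]; scores are never allowed to drop below zero,
    which is what makes the alignment local rather than global.
    """
    prev = [0] * (len(b) + 1)  # previous row of the score matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("ACGT", "ACT"))
```

Every cell of the matrix is filled for every query/database pair, so the work grows with the product of the two sequence lengths. This is why the method is slower than the heuristic searches used by FASTA and BLAST.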
Comparison of Fasta, BLAST and Smith-Waterman Search Results
Check for database sequences that were returned as common hits by all three search algorithms. How do these
common sequences appear in the graphical alignment figures from BLAST, FASTA and SW?
How do the statistical significance scores for the common sequences compare among the three programs?
Compare the interfaces in terms of accessibility, parameters, documentation and visualization capabilities.
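One simple way to find the hits shared by all three searches is to collect the accession numbers from each result page and intersect them. The sketch below assumes the accessions have already been copied into Python sets by hand; the identifiers are hypothetical placeholders, not actual search results.

```python
# Hypothetical accession sets, one per search tool (placeholders only).
fasta_hits = {"HYPO_A", "HYPO_B", "HYPO_C", "HYPO_D"}
blast_hits = {"HYPO_A", "HYPO_B", "HYPO_D", "HYPO_E"}
sw_hits = {"HYPO_A", "HYPO_B", "HYPO_C", "HYPO_E"}


def common_hits(*hit_sets):
    """Return the accessions present in every given set, sorted by name."""
    if not hit_sets:
        return []
    return sorted(set.intersection(*map(set, hit_sets)))


print("Common to all three:", common_hits(fasta_hits, blast_hits, sw_hits))
print("FASTA only:", sorted(fasta_hits - blast_hits - sw_hits))
```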
Submit by June 3rd
Send your report by e-mail to Stefano Lonardi or drop it off at his office (SURGE 320).