California State University Monterey Bay - Biology 241L - KIBAK
   

Are Whales Hippos? - An Introductory Bioinformatics Lab

Molecular Phylogenetics

 

 

Bioinformatics is emerging as a hugely important field affecting all areas of biology.  Even though bioinformatics is formally the application of computer technologies to biological sciences - ranging from automated analysis of microarrays containing thousands of individual experiments to the development of browser tools for looking at whole genomes - students in all areas of biology need to be familiar with software tools developed by bioinformaticians to accomplish routine tasks in biology.

 

 

Skills developed in this lab:
  • Use of National Center for Biotechnology Information (NCBI) databases
  • Retrieval of sequences from NCBI
  • Alignment of homologous protein sequences using ClustalW
  • Using ClustalW output to prepare phylogenetic networks (trees)
  • Testing evolutionary hypotheses
 
[Figure: Tree A and Tree B]
 

Research Question:

Are whales and dolphins a sister group to the artiodactyls (even-toed ungulates)? Or should they be placed within the artiodactyls as a sister group to the hippopotami? In other words, are whales a kind of even-toed ungulate, as hippos are, or are they only related to even-toed ungulates? For background see pages 559 - 561 of your textbook, "Whale Evolution: A Case History" (in Biological Science 2nd Ed. by Scott Freeman). There is also a more detailed discussion here.

To answer our research question we will build a phylogenetic tree of relatedness using protein sequence data from the National Center for Biotechnology Information (NCBI).

 

 

 

STEP ONE - Obtaining an appropriate Cytochrome b protein sequence for the analysis

 


Search "All Databases" for "Mirounga"


As you can see, there is a vast amount of information cataloged even for this monachine phocid...
...also known as the Elephant Seal.


Try clicking on "PubMed Central: free, full text journal articles."

Here, for example, you will find an important article that should be read by all ESTP and BIO majors.

"Sequential megafaunal collapse in the North Pacific Ocean: An ongoing legacy of industrial whaling?"


Go ahead and refine the search a bit by clicking "Protein" and adding the search modifier for "organism"  like this:

Mirounga [orgn]

That should reduce the number of hits a bit. Adding "cytochrome b" with quotes like this should help a lot:

Mirounga [orgn] "cytochrome b"

Finally, if you add the search modifier for "protein" like this:

Mirounga [orgn] "cytochrome b" [prot]

 ...it should knock it down to about eight hits that include the Cytochrome b sequences for Mirounga leonina and Mirounga angustirostris.
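
The same search can also be expressed as a single query string and sent to NCBI's E-utilities "esearch" service. The sketch below only builds the query and the URL; it does not contact NCBI. The function names are made up for this example, and the explicit AND is an assumption that makes the implied combination of the two terms visible.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities search service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_protein_query(organism, protein_name):
    """Combine an organism filter and a quoted protein name into one search term."""
    return f'{organism}[orgn] AND "{protein_name}"[prot]'


def esearch_url(term, retmax=20):
    """Return the URL that would run the search against the Protein database."""
    params = {"db": "protein", "term": term, "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


term = build_protein_query("Mirounga", "cytochrome b")
url = esearch_url(term)
print(term)
print(url)
```

Pasting the printed URL into a browser should return a list of record IDs for the same hits you see on the web page, although the exact number of hits changes as the database grows.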

 
   


When collecting sequences for any kind of analysis it is preferable to use a RefSeq (reference sequence) record if one is available. Here there is one for Mirounga leonina (for more information on RefSeqs see the NCBI Handbook).

Create a folder called "SEQs" somewhere on your hard drive where you can find it again (perhaps write down the pathname).

Click on YP_778785 and make sure it is Cytochrome b from Mirounga leonina and is 379 amino acids long.

Scan the sequence record. If you ever need the DNA sequence that codes for this protein you can click on the CDS link down near the bottom by the amino acid sequence.

Change the Display from GenPept to FASTA format using the drop-down menu. Then copy and paste the FASTA format sequence into Notepad and save it in your SEQs folder as a text file named "Cytb_M_leonina.txt".

You now know how to find and retrieve a sequence from the protein sequence database at NCBI.
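
If you would like to double-check a saved file, a short script can read the FASTA text and report the length of each sequence. This is a minimal sketch, not part of the required lab: the helper names are invented for illustration, the toy record is made up, and the download function (which uses the E-utilities "efetch" service) is defined but never called here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the NCBI E-utilities fetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession):
    """Return the URL that requests one protein record in FASTA format."""
    params = {"db": "protein", "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def download_fasta(accession):
    """Fetch the record over the network (not run in this example)."""
    with urlopen(efetch_fasta_url(accession)) as response:
        return response.read().decode("utf-8")


def parse_fasta(text):
    """Return {header: sequence} for every record in FASTA-formatted text."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequence lines are joined together
    return records


# Toy record (made up) standing in for the contents of a saved file.
example = ">toy_protein hypothetical record\nMTNIRK\nTHPLAK\n"
records = parse_fasta(example)
for name, sequence in records.items():
    print(name, len(sequence))
```

Running parse_fasta on the contents of "Cytb_M_leonina.txt" should report one record whose length matches the 379 amino acids shown on the sequence page.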


 


STEP TWO - Using one sequence to obtain others.

We now have to retrieve about nine more sequences from the database. There is a convenient way to do this quickly.

Go back to the NCBI home page (Google "NCBI" if you have to).

Click on the "Blast" tab at the top of the NCBI home page.

The Basic Local Alignment Search Tool (BLAST) allows you to search a sequence against a sequence database to find similar sequences. Kind of like "Googling" sequences. It is crude for alignments, and not as sensitive as some other search algorithms, but it is VERY fast.

Select Protein Blast since you will be searching with the Elephant Seal Cytochrome b protein sequence you just saved to your SEQs folder.


 

#1 Copy and paste the Cytochrome b sequence (FASTA format) into the dialog box:


 

#2 Select "Reference proteins" from the drop-down menu, instead of Human or NR (non-redundant).


 

#3 Cytochrome b is found in almost all organisms. Since we will be looking at mammals to answer the question of where whales belong, it helps to narrow the search to just mammals by typing in "mammalia" here. If you are looking for a specific Cytochrome b, you can type in the Latin name here.


 

#4 Click "BLAST" and wait. This can take a couple of minutes... and you should see a couple of different screens. Be patient, there are thousands of people searching this database right now, and it is run by your tax dollars, not Google :)
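
For reference, the four choices made on the web form correspond roughly to the parameters below in NCBI's BLAST URL API. This is only a sketch: the parameter names and the database name "refseq_protein" reflect my reading of that API and should be checked against the current NCBI documentation, the function name is made up, and nothing is submitted to NCBI.

```python
from urllib.parse import urlencode

# Address of the NCBI BLAST service (the request itself is not sent here).
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def blast_put_params(fasta_text, taxon="mammalia", max_hits=100):
    """Collect the form choices from steps #1 to #4 as request parameters."""
    return {
        "CMD": "Put",                      # submit a new search (#4)
        "PROGRAM": "blastp",               # protein query vs. protein database
        "DATABASE": "refseq_protein",      # "Reference proteins" (#2)
        "ENTREZ_QUERY": f"{taxon}[orgn]",  # restrict hits to one taxon (#3)
        "HITLIST_SIZE": max_hits,          # number of results to keep
        "QUERY": fasta_text,               # the pasted FASTA sequence (#1)
    }


# Toy query (made up); in the lab this would be the elephant seal sequence.
params = blast_put_params(">toy_query\nMTNIRKTHPLAK")
body = urlencode(params)
print(body)
```

Writing the settings down this way is also a handy record for your Methods section, since it lists exactly which database and taxon filter were used.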


 

The resulting screen should look like the one below. You can scroll down to check out the hits and alignments if you want... but to pick out a bunch of sequences for a phylogenetic study, it is VERY convenient to use the crude tree of potential taxonomic relationships that BLAST produces under "Taxonomy Reports."





Click "Taxonomy Reports" to cause a page like the one below to display. Here you can easily retrieve sequence data from representative taxa for your analysis. But wait... open a new tab in your browser, because we will save even more time by using preassembled sequences.




STEP THREE - Prepare a multiple alignment of the sequences.

Copy and paste all ten of the sequences below at once into the ClustalW Alignment Tool at the Kyoto University Bioinformatics Center in Kyoto, Japan. Then select "Execute Multiple Alignment." Wait for the result, then copy and paste the alignment into a Word or Notepad file. Scroll all the way to the bottom of the alignment. Select "N-J tree" and click the "Exec" button. Do not click "Generate Profile HMM."


>Platypus
MNNLRKTHPLIKIVNHSFIDLPTPSNISSWWNFGSLLGLCLIIQILTGLFLAMHYTSDTSTAFSSVAHIC
RDVNYGWLIRYMHANGASLFFMCIFLHIGRGLYYGSYTQTETWNIGVVLLFTVMATAFVGYVLPWGQMSF
WGATVITNLLSAIPYIGTTLVEWIWGGFSVDKATLTRFFAFHFILPFVIAALAVIHLLFLHETGSNNPSG
LNSDPDKIPFHPYYSVKDLVGFFMTILVLLTLVLFTPDLLGDPDNYTPANPLSTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALVASILILILVPLLHTSYQRGLAFRPLTQMLFWILVTDLLTLTWIGGQPVEQPFIII
GQLASILYFLLITTLIPLTGLLENDLLKW

>Wolf
MTNIRKTHPLAKIVNNSFIDLPAPSNISAWWNFGSLLGVCLILQILTGLFLAMHYTSDTATAFSSVTHIC
RDVNYGWIIRYMHANGASMFFICLFLHVGRGLYYGSYVFMETWNIGIVLLFATMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTDLVEWIWGGFSVDKATLTRFFAFHFILPFIIAALAMVHLLFLHETGSNNPSG
ITSDSDKIPFHPYYTIKDILGALLLLLILMSLVLFSPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALVFSILILAFIPLLHTSKQRSMMFRPLSQCLFWLLVADLLTLTWIGGQPVEHPFIII
GQVASMLYFTILLILMPTVSVIENNLLKW

>Elephant Seal
MTNIRKTHPLAKIINNSFIDLPTPPNISAWWNFGSLLGICLILQILTGLFLAMHYTPDTTTAFSSVTHIC
RDVNYGWIIRYMHANGASMFFICLYMHMGRGLYYGSYTFTETWNIGIILLFTIMATAFMGYVLPWGQMSF
WGATVITNLLSAVPYVGDDLVQWIWGGFSIDKATLTRFFALHFILPFVALALAAVHLLFLHETGSNNPSG
IPSDSDKIPFHPYYTIKDILGALLLILTLMLLVLFSPDLLGDPDNYTPANPLSTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALILSILILAIIPLLHTSSQRGMMFRPISQCLFWLLVADLLTLTWIGGQPVEHPYIII
GQLASILYFTILLVLMPITSIIENNILKW

>Pig
MTNIRKSHPLMKIINNAFIDLPAPSNISSWWNFGSLLGICLILQILTGLFLAMHYTSDTTTAFSSVTHIC
RDVNYGWVIRYLHANGASMFFICLFIHVGRGLYYGSYMFLETWNIGVVLLFTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTDLVEWIWGGFSVDKATLTRFFAFHFILPFIITALAAVHLLFLHETGSNNPTG
ISSDMDKIPFHPYYTIKDILGALFMMLILMILVLFSPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALVASILILILMPMLHTSKQRSMMFRPLSQCLFWMLVADLITLTWIGGQPVEHPFIII
GQLASILYFLIILVLMPITSIIENNLLKW

>Orca
MTNIRKTHPLMKILNNAFIDLPTPSNISSWWNFGSLLGLCLITQILTGLLLAMHYTPDTSTAFSSVAHIC
RDVNYGWFIRYLHANGASMFFICLYAHIGRSLYYGSYMFQETWNVGVLLLLAVMATAFVGYVLPWGQMSF
WGATVITNLLSAIPYIGTTLVEWIWGGFSVDKATLTRFFAFHFILPFIITALAAVHLLFLHETGSNNPTG
IPSNMDMIPFHPYHTIKDTLGALLLILTLLALTLFAPDLLGDPDNYTPANPLSTPAHIKPEWYFLFAYAI
LRSVPNKLGGVLALLLSILILIFIPMLQTSKQRSMMFRPFSQLLFWTLIADLLTLTWIGGQPVEHPYIIV
GQLASILYFLLILVLMPTISLIENKLLKW

>Rhinoceros
MTNIRKSHPLVKIINHSFIDLPTPSNISSWWNFGSLLGICLILQILTGLFLAMHYTPDTTTAFSSVTHIC
RDVNYGWMIRYLHANGASMFFICLFIHVGRGLYYGSYTFLETWNIGIILLFTLMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIILALAITHLLFLHETGSNNPSG
IPSNMDKIPFHPYYTIKDILGALLLILVLLILVLFFPDILGDPDNYTPANPLSTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILLLIPYLHTSKQRSMMFRPLSQCMFWLLVADLLTLTWIGGQPVEHPFIII
GQLASILYFSLILVLMPLAGIIENNLLKW

>Horse
MTNIRKSHPLIKIINHSFIDLPAPSNISSWWNFGSLLGICLILQILTGLFLAMHYTSDTTTAFSSVTHIC
RDVNYGWIIRYLHANGASMFFICLFIHVGRGLYYGSYTFLETWNIGIILLFTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTTLVEWIWGGFSVDKATLTRFFAFHFILPFIITALVVVHLLFLHETGSNNPSG
IPSNMDKIPFHPYYTIKDILGLLLLILLLLTLVLFSPDLLGDPDNYTPANPLSTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALILSILILALIPTLHMSKQRSMMFRPLSQCVFWLLVADLLTLTWIGGQPVEHPYVII
GQLASILYFSLILIFMPLASTIENNLLKW

>Hippopotamus
MTNIRKSHPLMKIINDAFVDLPAPSNISSWWNFGSLLGVCLILQILTGLFLAMHYTPDTLTAFSSVTHIC
RDVNYGWVIRYMHANGASIFFICLFTHVGRGLYYGSHTFLETWNIGVILLLTTMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTDLVEWIWGGFSVDKATLTRFFAFHFILPFVITALAIVHLLFLHETGSNNPTG
IPSNADKIPFHPYYTIKDILGILLLMTTLLTLTLFAPDLLGDPDNYTPANPLSTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALALSILILALIPMLHTSKQRSLMFRPLSQCLFWALIADLLTLTWIGGQPVEHPFIII
GQVASILYFLLILVLMPVAGIIENKLLKW


>Cow
MTNIRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGICLILQILTGLFLAMHYTSDTTTAFSSVTHIC
RDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTG
ISSDVDKIPFHPYYTIKDILGALLLILALMLLVLFAPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILISALIPLLHTSKQRSMMFRPLSQCLFWALVADLLTLTWIGGQPVEHPYITI
GQLASVLYFLLILVLMPTAGTIENKLLKW


>Blue Whale
MTNIRKTHPLMKIINDAFIDLPTPSNISSWWNFGSLLGLCLIVQILTGLFLAMHYTPDTMTAFSSVTHIC
RDVNYGWVIRYLHANGASMFFICLYAHMGRGLYYGSHAFRETWNIGVILLFTVMATAFVGYVLPWGQMSF
WGATVITNLLSAIPYIGTTLVEWIWGGFSVDKATLTRFFAFHFILPFIIMALAIVHLIFLHETGSNNPTG
IPSDMDKIPFHPYYTIKDILGALLLILTLLMLTLFAPDLLGDPDNYTPANPLSTPAHIKPEWYFLFAYAI
LRSIPNKLGGVLALLLSILVLALIPMLHTSKQRSMMFRPFSQFLFWVLVADLLTLTWIGGQPVEHPYVIV
GQLASILYFLLILVLMPVTSLIENKLMKW
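
Once sequences are aligned, a tree-building program typically starts by measuring how different each pair of sequences is. One of the simplest measures is the p-distance: the fraction of aligned positions at which two sequences differ. The sketch below uses short toy strings rather than the full sequences, and skipping gap columns is an assumption of this example; ClustalW may apply its own rules and corrections.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of aligned positions that differ, ignoring columns with a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


# Toy alignment of three short strings (for illustration only).
alignment = {
    "taxon1": "MTNIRKTHPL",
    "taxon2": "MTNIRKSHPL",
    "taxon3": "MNNLRK-HPL",
}

# Distance for every pair of taxa in the alignment.
distances = {
    (a, b): p_distance(alignment[a], alignment[b])
    for a, b in combinations(alignment, 2)
}
for pair, value in distances.items():
    print(pair, round(value, 3))
```

Here taxon1 and taxon2 differ at one of ten positions, so their distance is 0.1, while each differs from taxon3 at two of the nine ungapped positions. Smaller distances suggest closer relatives, which is the information the tree summarizes.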

      

 

 

STEP FOUR - Capture a copy of the alignment and the tree that results from that alignment.
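
The "N-J tree" option builds the tree by neighbor joining, which repeatedly joins the pair of taxa that minimizes a criterion based on pairwise distances. The sketch below is a bare-bones version that returns only the branching order, with no branch lengths, for five hypothetical taxa with made-up distances. It illustrates the general method and is not ClustalW's actual implementation.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Return an unrooted tree (Newick text, topology only) by neighbor joining.

    names: list of taxon labels
    dist:  dict mapping (label, label) pairs to distances
    """
    nodes = list(names)
    d = {frozenset(pair): value for pair, value in dist.items()}
    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence: sum of distances from each node to all others.
        r = {a: sum(d[frozenset((a, b))] for b in nodes if b != a) for a in nodes}
        # Join the pair with the smallest Q = (n - 2) * d(a, b) - r(a) - r(b).
        a, b = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        joined = f"({a},{b})"
        # Distance from the new internal node to each remaining node.
        for c in nodes:
            if c not in (a, b):
                d[frozenset((joined, c))] = 0.5 * (
                    d[frozenset((a, c))] + d[frozenset((b, c))] - d[frozenset((a, b))]
                )
        nodes = [c for c in nodes if c not in (a, b)] + [joined]
    return "(" + ",".join(nodes) + ");"


# Made-up distances among five hypothetical taxa.
toy_distances = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
tree = neighbor_joining(["a", "b", "c", "d", "e"], toy_distances)
print(tree)
```

In the printed text, taxa enclosed in the same pair of parentheses are each other's closest relatives in the tree. Reading your ClustalW tree the same way tells you which group the whales fall into.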

 


Feel free to go back to the browser tab that is open to the Taxonomy Report. Check for additional interesting species. It might be worth it to go all the way back to the original search and broaden it beyond mammals. When you are setting up the Blastp parameters, change the "100 results" to "500 results" or use the elephant seal sequence to search for Aves, or Chondrichthyes, or whatever taxa you think might provide a good, unbiased root for your tree.



When preparing files for analysis you should be aware that the tree drawing program constructs a name to label the tree from the information that is on the first line of the FASTA format file. It is important to keep the accession numbers in your notes, but to remove all but the simplest description from that first line. Otherwise you get a messy tree.
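
Cleaning up the first line of each record can be done by hand, but a small script keeps the accession numbers and the short labels together. The sketch below assumes a header laid out as an accession, then a description, then the species name in square brackets at the end; the function name is made up, and headers without brackets simply have their whole text turned into a label.

```python
import re


def simplify_header(header):
    """Split a FASTA header into (accession, short_label).

    The accession is taken to be the first word of the header. For headers
    without a bracketed species name, the whole header becomes the label.
    """
    text = header.lstrip(">").strip()
    accession = text.split()[0]
    match = re.search(r"\[([^\]]+)\]\s*$", text)
    species = match.group(1) if match else text
    # Replace spaces and punctuation with underscores for a tidy tree label.
    label = re.sub(r"\W+", "_", species).strip("_")
    return accession, label


accession, label = simplify_header(">YP_778785 cytochrome b [Mirounga leonina]")
print(accession, label)
```

Record the accession in your notes for the Methods section, and use ">" followed by the short label as the first line of the sequence you paste into ClustalW.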



To finish this lab, prepare a tree that includes the Cytochrome b sequences given in lab, the Cytochrome b sequence you used for your Integrative Project, and a Cytochrome b from one additional organism that could shed light on the position of whales and dolphins in the tree of life. Conclude whether the tree you prepare provides evidence for or against the grouping of whales within the even-toed ungulates.

An extension of this lab might be to go back and use the DNA sequences coding for these Cytochrome b proteins for your alignments instead of the amino acid sequences.


Your lab write-up should include:

Title, your name, and a one or two paragraph Introduction stating the problem regarding the taxonomy of whales.

Under Methods, list the accession numbers and species names of the sequences you used, where you obtained the sequences, which software you used to align the sequences, and which tree-building algorithm you used. Include enough detail that others could replicate your results.

Under Results prepare at least two figures with captions, then refer to the figures in your narrative. One figure should be of the alignment that includes the two additional Cytochrome b sequences, and one figure should be of the resulting tree.

Under Discussion state your conclusion and why your tree supports this conclusion. You may wish to search "artiodactyls hippopotamus" in Google, Google Scholar, or PubMed for some inspiration. Be sure to accurately cite any literature or website you choose to include in your discussion.





© Henrik Kibak 2004 - 2009