Introductory Bioinformatics Lab for Introduction to Cell and Molecular Biology
Henrik Kibak - Fall 2004
Bioinformatics is emerging as a hugely important field affecting all areas of biology. While bioinformatics is formally the application of computer technologies to biological sciences - ranging from automated analysis of microarrays containing thousands of individual experiments to the development of browser tools for looking at whole genomes - students in all areas of biology need to be familiar with software tools developed by bioinformaticians to accomplish routine tasks in biology.
First we will look at the taxonomic position of Euglena using Cytochrome C as a demonstration exercise. You will then have the tools to answer the question: "Are whales and dolphins a sister group to Artiodactyls (even-toed ungulates), or should they be placed within the Artiodactyls as a sister group to Hippopotami?" You will answer that question during next week's lab (Reading HERE).
It is impossible to provide a reasonable guide to even a small section of this tremendous resource... you will have to explore it yourself... Most of the instructions will be given in lab. If you miss lab, you will have to work with a classmate to capture some of the steps.
As you can see, there is a vast amount of information cataloged even for this monachine phocid...
Here, for example, you will find an important article that should be read by all ESSP majors: "Sequential megafaunal collapse in the North Pacific Ocean: An ongoing legacy of industrial whaling?"
To see what is available for Euglena, let's enter that instead of Mirounga. Go ahead and refine the search a bit by clicking "Protein" and adding the search modifier for "organism" like this:
That should reduce the number of hits a bit. Adding "cytochrome c" with quotes like this should help a lot:
Finally, if you add the search modifier for "protein" like this:
...it should knock it down to about three hits. These include the Cytochrome C sequences for Euglena viridis and Euglena gracilis, which were obtained many years ago by direct protein sequencing, and a more recent one with no information on how it was obtained.
Create a folder called "Seqs" somewhere on your hard drive where you can find it again (perhaps write down the pathname). Save the first Euglena viridis sequence to that folder as a web page called "Cyt_c_Eug_vir.html".
Now erase your previous search terms and try typing in "Cytochrome C" in quotes. What results do you get when you search? Click on "Protein" if you aren't already in the Protein database. You should see "Page 1" of at least "1,567 pages" of results! A bit more than Mirounga... To refine the search, try adding [prot] after the "Cytochrome C"; that should get it down to only 25 pages of results (!). Finally, try adding "mammalia" to the search terms as in the example below:
What do you see? You should see that the results have been narrowed to 56 items (2004) on 3 pages. Click on the first one if it is P68096. The sequences are available in a variety of formats which are selected via the "Display" button. The sequences can also be sent to "text" for printing or saved in a file. Copying and pasting into Notepad also works. There is also information associated with structure, taxonomy, other genes and publications, etc. In order to save time I have downloaded five sequences for us to use in this exercise. Follow the steps below the sequences.
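If you would like to see how the search modifiers fit together, here is a small Python sketch that assembles search strings like the ones used above. It is only an illustration: the helper names `entrez_term` and `build_query` are made up for this example, the `[organism]` and `[prot]` modifiers are the ones mentioned in the text, and joining the pieces with AND is an assumption about how the search box combines terms. The exact strings shown in lab may look different.

```python
def entrez_term(text, field=None):
    """Quote multi-word text and append an optional field modifier.

    Hypothetical helper for illustration; not part of any NCBI tool.
    """
    quoted = f'"{text}"' if " " in text else text
    return f"{quoted}[{field}]" if field else quoted


def build_query(*terms):
    """Join individual search terms with AND (assumed default behaviour)."""
    return " AND ".join(terms)


# Narrow a Euglena search to cytochrome c proteins.
euglena_query = build_query(
    entrez_term("Euglena", "organism"),
    entrez_term("cytochrome c", "prot"),
)

# Narrow a cytochrome c search to mammals.
mammal_query = build_query(
    entrez_term("Cytochrome C", "prot"),
    entrez_term("mammalia"),
)

print(euglena_query)  # Euglena[organism] AND "cytochrome c"[prot]
print(mammal_query)   # "Cytochrome C"[prot] AND mammalia
```

You can paste either printed string into the search box and compare the number of hits with what you got by clicking.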
The Cytochrome C sequences we will use (in FASTA format):
>Arabidopsis gi|4539007 Cytochrome c [Arabidopsis thaliana]
>Euglena GI|117985:1-102 Cytochrome c [Euglena viridis]
>Hippo gi|65451 Cytochrome c [Hippopotamus amphibius]
>Mosquito gi|31202411|ref|XP_310154.1| [Anopheles gambiae]
Preparing sequences for comparison by aligning them using ClustalX
We want the program also to compare the aligned sequences for us and see how different they are from each other. The more "different" they are, the less related they should be, and the more "distant" they should appear on a phylogenetic tree. The program first finds the two most related sequences then adds the next most related "neighbor" sequence. It calculates a difference score and outputs a little file of brackets and numbers that show the relationships and degree of relationship in the form of "branch lengths."
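To make the idea of "how different they are" concrete, here is a minimal Python sketch that scores each pair of aligned sequences by the fraction of positions at which they differ, then reports the most similar pair. This is a simplification and not what ClustalX actually runs internally; the function names and the three toy sequences are invented for illustration and are not real cytochrome c data.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned positions at which two sequences differ.

    Positions where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    return differing / compared if compared else 0.0


def closest_pair(alignment):
    """Return (distance, name1, name2) for the most similar pair."""
    return min(
        (p_distance(alignment[m], alignment[n]), m, n)
        for m, n in combinations(sorted(alignment), 2)
    )


# Hypothetical toy alignment, made up for this example.
toy = {
    "seqA": "ACDEFGHIKL",
    "seqB": "ACDEFGHIRL",
    "seqC": "ACNEYG-IRL",
}

print(closest_pair(toy))  # (0.1, 'seqA', 'seqB')
```

Here seqA and seqB differ at one position out of ten, so they would be joined first; seqC differs from both at more positions and would be added afterwards.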
So Euglena is slightly more closely related to animals than plants... Are you convinced?
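If you want to check the numbers behind the tree for yourself, the "brackets and numbers" file can be read with a few lines of Python. The sketch below assumes the file is in the parenthesized, Newick-style tree format, and it only pulls out the branch length attached to each named sequence. The function name, taxon names, and branch lengths are all made up for illustration; they are not results from this exercise.

```python
import re


def leaf_branch_lengths(newick):
    """Map each leaf name to the length of the branch leading to it."""
    # A leaf is a name that directly follows '(' or ',' and is
    # followed by ':length'. Internal branches follow ')' and are skipped.
    pattern = r"[(,]\s*([^():,;\s]+)\s*:\s*([0-9.eE+-]+)"
    return {name: float(length) for name, length in re.findall(pattern, newick)}


# Hypothetical tree with invented branch lengths.
tree = "((taxonA:0.05,taxonB:0.20):0.10,taxonC:0.35,taxonD:0.40);"

for name, length in leaf_branch_lengths(tree).items():
    print(name, length)
```

Each pair of parentheses groups sequences that share a branch point, and the number after each colon is the branch length, so longer numbers mean more accumulated differences.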
"Are whales and dolphins a sister group to Artiodactyls (even-toed ungulates), or should they be placed within the Artiodactyls as a sister group to Hippopotami?" You will answer that question during next week's lab.
Pancreatic Ribonuclease sequences for your project:
You may, upon consultation with me, choose a different project for your lab... some of you may choose to work with fish, insects or plants... Or, perhaps the most challenging and interesting of all, comparing whales, seals, bears, weasels... However, be aware that it will take you extra time since you will have to find your own sequences to compare and confirm that they are what you think they are... not always easy for beginners.
Your write-up should consist of:
Due December 17, 2004.
Another interesting dataset to try... Remember, you can pick and choose your own... no need to run them all.
>Rhinocerus (white) ATP7A [Ceratotherium simum]
>Horse ATP7A [Equus caballus]
>Hippopotamus ATP7A [Hippopotamus amphibius]
>Elephant (African) ATP7A [Loxodonta africana]
>Whale (Humpback) ATP7A [Megaptera novaeangliae]
>Okapia (Giraffe family) ATP7A [Okapia johnstoni]
>Pig (note "X" at 2nd to last residue) ATP7A [Sus scrofa]
>Manatee (Caribbean) ATP7A [Trichechus manatus]
>Dolphin (Bottle-nosed) ATP7A [Tursiops truncatus]
If you are desperate to produce a nicer image of your tree than a screen shot will provide, follow the steps below:
Viewing three-dimensional structures of proteins and their sequences
Some proteins have had their structures determined by X-ray crystallography or nuclear magnetic resonance (NMR). This is an arduous but rewarding endeavor, and it is especially important for understanding enzyme mechanisms or for drug discovery.
© Henrik Kibak 2004