MICa 8006
Protein Sequence Analysis

Homework 3


Due before class, Tuesday, September 26, 2006

Part A (5 pts) GCG

In this part, you will use p0aa25.uniprot_sprot, test1.pep and test2.pep.
test1.pep and test2.pep are on the class website in FASTA format. If you have not already done so, convert them to GCG format before you start.
  1. Align p0aa25.uniprot_sprot and test1.pep using the modules bestfit and gap. Write a short description of the differences between the two alignments, if any. Which alignment is "better"? Why? WARNING: GCG uses the same default output filename for both these modules. CHANGE one or both names (How?) or your second result will overwrite your first.
  2. Use the same two modules to align p0aa25.uniprot_sprot and test2.pep. Write a short description of the differences between the two alignments, if any. Which alignment is "better"? Why?
  3. Use pileup to align all three proteins: p0aa25.uniprot_sprot, test1.pep and test2.pep. Create a .png file of the dendrogram. Based on this dendrogram, can you determine if test2.pep is more similar to test1.pep or to p0aa25.uniprot_sprot? Why or why not?
  4. Use blast to compare test2.pep to the UNIPROT protein database. If this finds an identical sequence, give its name. What is the strongest hit with known function?
  5. Find protein sequences of several (10-20) members of the protein family with that known function (How?), and use pileup to create a multisequence alignment with test2.pep and all these sequences. If you were studying this "for real" you would find every possible member of the protein family, but for this homework assignment, only use 10-20 sequences in the family. Is test2.pep a member of that protein family? Why or why not?
  6. Is test2.pep a putative protein with known function, a conserved hypothetical, or an unknown protein?

Submit: Printed copies of all of the alignments and dendrograms. A copy of the first part of the blast output, through the first hit with known function. Write your short descriptions or answers to questions directly on the GCG output pages that relate to them.

Part B (5 pts) Perl

Based on Pseudocode for HW3B, adapt the program you wrote in homework 2 to become a Perl program, with comments, that:
  1. creates three hashes, all with keys = 1 letter amino acid codes, one with values = amino acid names, the second with values = three letter codes, the third with amino acid property codes;
  2. creates one hash with keys = amino acid property codes and values = full names of amino acid properties;
  3. randomly chooses an amino acid;
  4. randomly prints its 1 letter code, name, or three letter code;
  5. requests the user to input the other two items;
  6. requests the user to input the full name of one property of that amino acid;
  7. requests the user to draw the chemical structure of the R group of the amino acid and prints four blank lines so the user has space to do so; and
  8. repeats these steps until the user requests it to end, producing different random values each time it is run.
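
As a rough illustration of steps 1 through 4 and step 7, the sketch below builds the four hashes for only three amino acids and picks a random clue. The hash names, the sub names and the one-letter property codes (n, a, b) are assumptions made up for this example, not part of the assignment or of the Pseudocode for HW3B; your program should cover all 20 amino acids and use the property codes from class. Perl seeds its random number generator automatically the first time rand is called, so each run normally produces different values.

```perl
use strict;
use warnings;

# Illustrative subset only; the assignment expects all 20 amino acids.
my %name_of  = (G => 'glycine', D => 'aspartate', K => 'lysine');
my %three_of = (G => 'Gly',     D => 'Asp',       K => 'Lys');

# Hypothetical property codes, invented for this sketch.
my %prop_code_of = (G => 'n', D => 'a', K => 'b');
my %prop_name_of = (n => 'nonpolar', a => 'acidic', b => 'basic');

# Return a randomly chosen one-letter code (a key shared by the three hashes).
sub random_amino_acid {
    my @codes = sort keys %name_of;
    return $codes[ int rand @codes ];
}

# Return one of the three forms of the amino acid, chosen at random.
sub random_clue {
    my ($code) = @_;
    my @forms = ($code, $name_of{$code}, $three_of{$code});
    return $forms[ int rand @forms ];
}

my $aa        = random_amino_acid();
my $prop_code = $prop_code_of{$aa};
print "Clue: ", random_clue($aa), "\n";
print "Property code $prop_code stands for $prop_name_of{$prop_code}\n";
print "Draw the R group here:\n", "\n" x 4;   # four blank lines for the drawing
```

This sketch prints the property instead of asking for it, and it does not loop or read any input; those steps are left to your own program.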

Submit: a printed copy of your program and the output it produces when it is run three times, with three or more loops each time. On the printed output, manually draw 2-D structures of the requested R groups. Also, email your program to Flora Fan at fanx0038@umn.edu.

Part C (10 pts) Perl

Based on the Pseudocode for HW3C, add to the Perl program in Part B, so it:
  1. checks the accuracy of the information input by the user, except for the R group structure;
  2. praises correct answers and corrects incorrect answers;
  3. counts the correct and the total number of questions; and,
  4. when the user requests it to end, reports the percent correct and the date and time.
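
One possible shape for the checking and scoring steps is sketched below. The sub names check_answer and percent_correct, the wording of the messages, and the decision to ignore letter case and surrounding spaces are all assumptions made for this example; the Pseudocode for HW3C may organize the work differently. The sketch feeds in two canned answers instead of reading from the keyboard, so it can be run as is.

```perl
use strict;
use warnings;

my ($correct, $total) = (0, 0);

# Compare one answer with the expected value and update both counters.
# Assumption: letter case and leading/trailing whitespace are ignored.
sub check_answer {
    my ($given, $expected) = @_;
    $given =~ s/^\s+|\s+$//g;
    $total++;
    if (lc $given eq lc $expected) {
        $correct++;
        print "Correct, well done!\n";
        return 1;
    }
    print "Not quite: the answer is $expected.\n";
    return 0;
}

# Percent of questions answered correctly; 0 if nothing was asked yet.
sub percent_correct {
    return $total ? 100 * $correct / $total : 0;
}

# Canned answers stand in for user input in this demonstration.
check_answer('Glycine ', 'glycine');   # counted as correct
check_answer('Ala',      'Gly');       # counted as incorrect

my $now = localtime;   # in scalar context, a readable date and time string
printf "Score: %d of %d (%.1f%%)\n", $correct, $total, percent_correct();
print "Finished: $now\n";
```

In the real program, each answer would instead be read from <STDIN> and passed through chomp before it is checked.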

Submit: a printed copy of your program and the output it produces when run three times as described below, with three or more loops each time. One run should have all answers correct, one run all answers incorrect, and one run a mixture of correct and incorrect answers. Again, draw correct R groups on the printed output. Also, email your program to Flora Fan at fanx0038@umn.edu.


February 21, 2008 Lynda Ellis

© 2009, University of Minnesota.
All rights reserved.
