MICa 8006
Protein Sequence Analysis

Homework 4


Due before class, Tuesday, October 3, 2006

Part A (10 pts) GCG

In this part, you will use test2.pep, which is on the class website in FASTA format. Convert it to GCG format before you start, if you have not already done so.
  1. Analyze this protein sequence using the GCG modules motifs, hmmerpfam, coilscan, hthscan, spscan.
  2. Use the GCG module findpatterns to search this sequence for the simple glutaredoxin motif: cysteine, proline, tyrosine or phenylalanine, cysteine. Read the findpatterns documentation to learn how to write this pattern so that findpatterns can understand it.
  3. Analyze this protein using the appropriate GCG modules you learned to use in HW2.
  4. Write a short report on test2.pep, suitable for a biologist not familiar with protein sequence analysis, describing the results of your GCG analyses on it in HW 3 and HW 4, and referring to the output from these GCG analyses as appropriate.
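
For comparison with item 2, the same four-residue motif can be written as a Perl regular expression. This is only a minimal sketch: it is not findpatterns syntax (which you still need to look up in the GCG documentation), and the sequence below is a made-up example, not test2.pep.

```perl
use strict;
use warnings;

# Motif from item 2 in one-letter codes: Cys, Pro, (Tyr or Phe), Cys.
my $glutaredoxin = qr/CP[YF]C/;

# Hypothetical test sequence, not taken from test2.pep.
my $seq = 'MKTAYIVCPYCGKAEL';

# Collect the 1-based start position of every (non-overlapping) match.
my @starts;
while ($seq =~ /$glutaredoxin/g) {
    push @starts, $-[0] + 1;
}
print "Motif found at position(s): @starts\n";
```

The character class [YF] expresses "tyrosine or phenylalanine"; the other three positions are fixed residues.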

Submit: a printed copy of the output (you do not have to repeat the HW 3 analyses; just resubmit the parts you include in your report) and a short report summarizing all you have learned about this protein.

Part B (5 pts) Perl

For this assignment, use the file tests.pep, on the class website. It contains multiple sequences in FASTA format. Based on Pseudocode for HW4B, write a Perl program that will:
  1. open this file,
  2. read through it, changing all sequence information to upper case and concatenating the lines of each sequence into one long string,
  3. remove any blank spaces (white space), and,
  4. for each sequence, print out the FASTA ID line and the cleaned-up sequence, with one blank line between sequence sets.
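
One possible shape for the four steps above is sketched below. It is not the course pseudocode, just one way to do it. The subroutine name read_fasta and the in-memory demo data are assumptions made so the sketch runs without tests.pep.

```perl
use strict;
use warnings;

# Read FASTA records from a filehandle and return a list of
# [id_line, sequence] pairs. Each sequence is upper-cased, stripped of
# white space, and joined into a single string.
sub read_fasta {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;                  # drop the newline and trailing blanks
        if ($line =~ /^>/) {
            push @records, [$line, ''];      # ID line starts a new record
        }
        elsif (@records) {
            $line =~ s/\s+//g;               # remove embedded white space
            $records[-1][1] .= uc $line;     # append to the current sequence
        }
    }
    return @records;
}

# Hypothetical stand-in for tests.pep so the sketch is self-contained.
my $demo = ">seq1 first example\nmkta yiv\ncpycg\n\n>seq2 second example\nacde\nfghik\n";
open my $fh, '<', \$demo or die "Cannot open in-memory data: $!";
my @records = read_fasta($fh);
close $fh;

# Print ID line, cleaned sequence, then one blank line per record.
for my $record (@records) {
    my ($id, $seq) = @$record;
    print "$id\n$seq\n\n";
}
```

To read the real file, the open line would name it instead, for example: open my $fh, '<', 'tests.pep' or die "Cannot open tests.pep: $!";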

Submit: a printed copy of your program and the output it produces when it is run. Also, email your program to Flora Fan at fanx0038@umn.edu

Part C (5 pts) Perl

Based on Pseudocode for HW4C, add to the program in Part B, so that for each sequence, it will, in addition, calculate its amino acid composition, and output this after the sequence.
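
A hash keyed by residue letter is one natural way to tally amino acid composition in Perl. The sketch below assumes the sequence has already been cleaned as in Part B. The subroutine name composition, the example sequence, and the output layout (count plus percentage) are illustrative choices, not requirements of the assignment.

```perl
use strict;
use warnings;

# Count how often each residue letter occurs in a cleaned, upper-case sequence.
sub composition {
    my ($seq) = @_;
    my %count;
    $count{$_}++ for split //, $seq;
    return %count;
}

# Hypothetical sequence, not taken from tests.pep.
my $seq    = 'MKTAYIVCPYCG';
my %count  = composition($seq);
my $length = length $seq;

# One line per residue: letter, count, percentage of the sequence.
for my $aa (sort keys %count) {
    printf "%s\t%d\t%.1f%%\n", $aa, $count{$aa}, 100 * $count{$aa} / $length;
}
```

Because the loop runs only over residues that were actually seen, an empty sequence prints nothing and never divides by zero.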

Submit: a printed copy of your program and the output it produces when it is run. Also, email your program to Flora Fan at fanx0038@umn.edu

Part D (10 pts) Perl

Based on Pseudocode for HW4D, add to the Perl program in Part B so that it asks the user to input a sequence pattern in regular expression format (where should you put this addition?), reports whether each sequence contains a match to that regular expression, and, at the end of the run, reports the date and time.

Run it at least three times: twice using the regular expressions for the PROSITE patterns for N-glycosylation and thioredoxin that were demonstrated in class examples, and once using a different regular expression for a PROSITE pattern that includes gaps and alternative residues and is found in at least one sequence (you can run the individual sequences through GCG motifs or the PROSITE website to find such patterns).
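
The matching and time-stamp steps might look roughly like the sketch below. It uses the PROSITE N-glycosylation pattern N-{P}-[ST]-{P} rewritten as the regular expression N[^P][ST][^P]. The subroutine name compile_pattern and the two short sequences are hypothetical, and the pattern is hard-coded here only so the sketch runs without waiting for input.

```perl
use strict;
use warnings;

# Compile a user-supplied pattern; return undef if it is not a valid regex.
sub compile_pattern {
    my ($text) = @_;
    my $regex = eval { qr/$text/ };
    return $regex;
}

# PROSITE N-glycosylation pattern N-{P}-[ST]-{P} written as a regex.
# In the real program this string would come from the user, e.g.
#   chomp(my $pattern = <STDIN>);
my $pattern = 'N[^P][ST][^P]';
my $regex   = compile_pattern($pattern) or die "Invalid pattern: $pattern\n";

# Hypothetical ID/sequence pairs standing in for the cleaned tests.pep data.
my @records = (
    ['>seq1', 'MKNGSAYIV'],
    ['>seq2', 'MKNPSAYIV'],
);

# Report, for each sequence, whether the pattern matches anywhere in it.
my %found;
for my $record (@records) {
    my ($id, $seq) = @$record;
    $found{$id} = $seq =~ $regex ? 1 : 0;
    printf "%s %s %s\n", $id, $found{$id} ? 'contains' : 'does not contain', $pattern;
}

# localtime in scalar context gives a human-readable date and time string.
print "Run finished: ", scalar(localtime), "\n";
```

Wrapping qr// in eval keeps a mistyped pattern from crashing the program with a raw regex error; the caller can print a friendlier message instead.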

The US mirror of the PROSITE website is: http://us.expasy.org/prosite/.
A list of all current PROSITE motifs is found at:
http://us.expasy.org/cgi-bin/prosite-list.pl.
(What computer language was the program that wrote that last web page written in?)

Submit: a printed copy of your program and the output it produces when it is run. Also, email your program to Flora Fan at fanx0038@umn.edu


February 21, 2008 Lynda Ellis

© 2009, University of Minnesota.
All rights reserved.
