String Processing

More Hashes

In this lesson the student will learn how to:
  1. Add new items to a hash
  2. Work with command line input
By the end of this lesson the student will be able to:

	Utilizing hashes, create a script which translates a
	sequence of nucleotides into an amino acid sequence.

Reading Frame

When translating a DNA strand into an amino acid sequence there are six possible reading frames to consider. First, the DNA can be read in either direction. Second, in a single direction you can start reading at the first nucleotide, the second nucleotide, or the third nucleotide (starting at the fourth nucleotide gives the same codons as starting at the first, minus the first codon). Two directions multiplied by three possible starting positions makes six possible reading frames. In the script you are to write for this assignment you only need to consider one reading frame (reading forward from the first nucleotide).
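
The six reading frames can be listed with a short script. This is only a sketch: the sequence is made up, the subroutine names reverse_complement and codons_in_frame are invented for this illustration, and it assumes that "reading in the other direction" means reading the complementary strand (the reverse complement).

```perl
#!/usr/bin/perl
use strict;      # optional safety checks, not used in the lesson's own scripts
use warnings;

# Return the reverse complement of a DNA string (assumption: this is
# what "reading in the other direction" refers to).
sub reverse_complement {
    my ($dna) = @_;
    my $rc = reverse $dna;      # scalar context: reverses the string
    $rc =~ tr/ACGT/TGCA/;       # swap each base for its complement
    return $rc;
}

# Split a sequence into codons starting at a 0-based offset (0, 1, or 2).
# Leftover nucleotides that do not fill a whole codon are dropped.
sub codons_in_frame {
    my ($seq, $offset) = @_;
    my @codons;
    for (my $i = $offset; $i + 3 <= length($seq); $i += 3) {
        push @codons, substr($seq, $i, 3);
    }
    return @codons;
}

my $dna = "ATGGCCTAA";          # made-up example sequence

# Two strands times three offsets gives six reading frames.
foreach my $strand ($dna, reverse_complement($dna)) {
    foreach my $offset (0 .. 2) {
        my @codons = codons_in_frame($strand, $offset);
        print "$strand offset $offset: @codons\n";
    }
}
```

The assignment only needs the first of these frames: the forward strand at offset 0.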

A question that you might not have considered is whether it is possible to untranslate an amino acid sequence into an RNA sequence and then into a DNA sequence. That is, given an amino acid sequence, can you specify the DNA sequence that was used to create it? The answer is no. However, you can use the amino acid sequence to list the possible nucleotide sequences which match it. Since in many cases more than one codon specifies an amino acid, it is impossible to tell which codon was actually used (except for an amino acid which is specified by only a single codon, as is the case for methionine and tryptophan).
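
To get a feel for how many nucleotide sequences can match one amino acid sequence, you can multiply the number of codons for each amino acid. The sketch below does this with a hash whose counts are taken from the codon table in this lesson; the name count_back_translations and the example peptide are made up for illustration.

```perl
#!/usr/bin/perl
use strict;      # optional safety checks, not used in the lesson's own scripts
use warnings;

# Number of codons that specify each amino acid (counted from the table).
my %codon_count = (
    A => 4, R => 6, N => 2, D => 2, C => 2, Q => 2, E => 2,
    G => 4, H => 2, I => 3, L => 6, K => 2, M => 1, F => 2,
    P => 4, S => 6, T => 4, W => 1, Y => 2, V => 4,
);

# Multiply the codon counts of every amino acid in the sequence.
sub count_back_translations {
    my ($protein) = @_;
    my $total = 1;
    foreach my $aa (split //, $protein) {
        die "Unknown amino acid: $aa\n" unless exists $codon_count{$aa};
        $total *= $codon_count{$aa};
    }
    return $total;
}

my $peptide = "MLW";   # made-up example: methionine, leucine, tryptophan
print "$peptide matches ", count_back_translations($peptide),
      " possible RNA sequences\n";
```

Only a sequence built entirely from M and W has exactly one matching RNA sequence.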

Since you will need this information for the script you are to write, here are the amino acids (by one-letter code) and the RNA codons which specify them. The "." row lists the three stop codons:


A     GCU GCC GCA GCG
R     CGU CGC CGA CGG AGA AGG
N     AAU AAC
D     GAU GAC
C     UGU UGC
Q     CAA CAG
E     GAA GAG
G     GGU GGC GGA GGG
H     CAU CAC
I     AUU AUC AUA
L     UUA UUG CUU CUC CUA CUG
K     AAA AAG
M     AUG
F     UUU UUC
P     CCU CCC CCA CCG
S     UCU UCC UCA UCG AGU AGC
T     ACU ACC ACA ACG
W     UGG
Y     UAU UAC
V     GUU GUC GUA GUG
.     UAA UAG UGA

More Hashes:

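#!/usr/bin/perl

# Start with a hash of three codons.
%h = ( "ACC" => "T", "GUU" => "V", "AUU" => "I" );

# Add new items one at a time by assigning a value to a new key.
$h{"ACA"} = "T";
$h{"CUC"} = "L";
$h{"GAA"} = "E";

# Print every codon with its amino acid (keys come back in no particular order).
foreach $item (keys(%h)){
   print "$item --> $h{$item}\n";
}
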
As you can see, adding single elements to a hash is fairly simple.
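
When several codons share one amino acid, the same assignment can be done in a loop. The following is a minimal sketch that adds all four threonine codons from the table and uses exists to skip a key that is already in the hash; the array name @threonine is made up for this example.

```perl
#!/usr/bin/perl
use strict;      # optional safety checks, not used in the lesson's own scripts
use warnings;

my %h = ( "ACC" => "T", "GUU" => "V", "AUU" => "I" );

# All four codons for threonine (T), taken from the table.
my @threonine = ("ACU", "ACC", "ACA", "ACG");

foreach my $codon (@threonine) {
    if (exists $h{$codon}) {
        print "$codon is already in the hash\n";
    } else {
        $h{$codon} = "T";       # assigning to a new key adds the item
        print "added $codon\n";
    }
}

# keys in scalar context gives the number of items in the hash.
my $count = scalar(keys %h);
print "The hash now holds $count codons\n";
```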

The following script takes input from the command line:

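#!/usr/bin/perl

# @ARGV holds the words typed after the program name.
@numbers = @ARGV;
$total=0;

# Add up every number given on the command line.
foreach $n (@numbers){
   $total+=$n;
}
print "TOTAL: $total\n";
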
Run this program like this:

   prog_name 3 5 4 3

Try it with a few sets of numbers. Your sets can be of any length.
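
If you try a set that contains something other than a number, Perl treats it as 0 without complaint. As a possible variation, the sketch below counts the arguments and skips any that do not look like numbers. The subroutine name sum_numbers and the pattern used to recognize a number are choices made for this illustration only.

```perl
#!/usr/bin/perl
use strict;      # optional safety checks, not used in the lesson's own scripts
use warnings;

# Add up the values that look like plain numbers (for example 3, -2, or 4.5)
# and report the others on STDERR instead of adding them.
sub sum_numbers {
    my @numbers = @_;
    my $total = 0;
    foreach my $n (@numbers) {
        if ($n =~ /^-?\d+(?:\.\d+)?$/) {
            $total += $n;
        } else {
            print STDERR "Skipping non-numeric argument: $n\n";
        }
    }
    return $total;
}

# An array in scalar context gives the number of items it holds.
my $count = scalar(@ARGV);
print "Got $count arguments\n";
print "TOTAL: ", sum_numbers(@ARGV), "\n";
```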

If the user does not supply the values on the command line, the following script prompts for each one that is missing:

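#!/usr/bin/perl

# Take up to three values from the command line.
($one, $two, $three) = @ARGV;

# For each value that was not given, prompt for it and read it from the keyboard.
# chomp removes the newline that comes with the typed input.
if($one eq ""){
   print "NAME: ";
   $one = <STDIN>;
   chomp($one);
}
if($two eq ""){
   print "PHONE: ";
   $two = <STDIN>;
   chomp($two);
}
if($three eq ""){
   print "ZIP: ";
   $three = <STDIN>;
   chomp($three);
}
print "$one, $two, $three\n";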

ASSIGNMENT:

Write a script which takes a nucleotide sequence and translates it into an amino acid sequence. Either prompt the user for the sequence or accept it as a command line argument. Don't worry about reading frame: read forward from the first nucleotide.