String Processing

Transcription and Translation

In this lesson the student will learn how to:
  1. Use pattern matching to read a string three letters at a time
  2. Use the if-elsif-else construct to simulate translation
  3. Use the unless construct
By the end of this lesson the student will be able to:

           Write a Perl script to simulate the processes of 
           transcription and translation.

64 Codons


A     GCU GCC GCA GCG
R     CGU CGC CGA CGG AGA AGG
N     AAU AAC
D     GAU GAC
C     UGU UGC
Q     CAA CAG
E     GAA GAG
G     GGU GGC GGA GGG
H     CAU CAC
I     AUU AUC AUA
L     UUA UUG CUU CUC CUA CUG
K     AAA AAG
M     AUG
F     UUU UUC
P     CCU CCC CCA CCG
S     UCU UCC UCA UCG AGU AGC
T     ACU ACC ACA ACG
W     UGG
Y     UAU UAC
V     GUU GUC GUA GUG
.     UAA UAG UGA

The column on the left shows the one-letter abbreviations for the twenty amino acids; the dot represents the stop codons. The groups of three nucleotide letters on the right are the 64 possible codons.
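As a quick check on the table, its rows can be loaded into a Perl hash that maps each codon to its one-letter code. This is only a sketch: the names @table and %codon_to_aa are chosen here for illustration and are not part of the lesson's scripts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One row per amino acid: the one-letter code followed by its codons,
# copied from the table above ("." marks the stop codons).
my @table = (
    "A GCU GCC GCA GCG",
    "R CGU CGC CGA CGG AGA AGG",
    "N AAU AAC",
    "D GAU GAC",
    "C UGU UGC",
    "Q CAA CAG",
    "E GAA GAG",
    "G GGU GGC GGA GGG",
    "H CAU CAC",
    "I AUU AUC AUA",
    "L UUA UUG CUU CUC CUA CUG",
    "K AAA AAG",
    "M AUG",
    "F UUU UUC",
    "P CCU CCC CCA CCG",
    "S UCU UCC UCA UCG AGU AGC",
    "T ACU ACC ACA ACG",
    "W UGG",
    "Y UAU UAC",
    "V GUU GUC GUA GUG",
    ". UAA UAG UGA",
);

# Build the lookup: codon => one-letter code (illustrative name).
my %codon_to_aa;
for my $row (@table) {
    my ($aa, @codons) = split ' ', $row;
    $codon_to_aa{$_} = $aa for @codons;
}

# Every one of the 4 * 4 * 4 = 64 codons should appear exactly once.
print "Codons in table: ", scalar(keys %codon_to_aa), "\n";
print "AUG codes for $codon_to_aa{AUG}\n";
print "UGA codes for $codon_to_aa{UGA}\n";
```

Counting the keys confirms that the 21 rows together cover all 64 codons with no duplicates.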

Here are the names of the amino acids along with their three-letter and one-letter abbreviations:
One-letter Code   Three-letter Code   Amino Acid Name
A                 ala                 alanine
R                 arg                 arginine
N                 asn                 asparagine
D                 asp                 aspartic acid
C                 cys                 cysteine
Q                 gln                 glutamine
E                 glu                 glutamic acid
G                 gly                 glycine
H                 his                 histidine
I                 ile                 isoleucine
L                 leu                 leucine
K                 lys                 lysine
M                 met                 methionine
F                 phe                 phenylalanine
P                 pro                 proline
S                 ser                 serine
T                 thr                 threonine
W                 trp                 tryptophan
Y                 tyr                 tyrosine
V                 val                 valine

Sample Scripts:

#!/usr/bin/perl
$str = "aaacccgggaaatttaaacccggg";
$out = "";
print $str . "\n";
while (length($str) > 2) {
    $sb = substr($str, 0, 3);    # the next codon: the first three letters
    if ($sb =~ /aaa/) {
        $out .= "K";
    } elsif ($sb =~ /ccc/) {
        $out .= "P";
    } elsif ($sb =~ /ggg/) {
        $out .= "G";
    } elsif ($sb =~ /uuu/) {
        $out .= "F";
    } else {
        $out .= "BAD CODON";
        print "$sb\n";
    }
    $str =~ s/$sb//;    # removes the current codon from the original sequence
}
print $out . "\n";
This script will translate four different codons into their corresponding amino acids. Notice that an if-elsif-else construct is used to consider each possible match one at a time. Any other codon, such as the ttt in the sample string, falls through to the else branch and is reported as a bad codon. A script which actually took any codon and translated it into its corresponding amino acid would require 64 possible patterns. (There are other approaches which will work with fewer lines of code which we will consider in future lessons.)
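The script above peels codons off the front of the string with substr. The lesson objectives also mention reading a string three letters at a time with pattern matching and using unless; one possible way to do both is sketched here. The helper name split_codons and the sample sequence are made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# split_codons is a hypothetical helper name used only in this sketch.
sub split_codons {
    my ($seq) = @_;
    # In list context, /(...)/g returns every non-overlapping run of
    # three characters, left to right. A trailing run of one or two
    # characters does not match and is silently dropped.
    my @codons = $seq =~ /(...)/g;
    return @codons;
}

my $str    = "aaacccgggaaauuuaaacccggga";    # 25 letters: 8 codons plus 1 extra
my @codons = split_codons($str);
print join(" ", @codons), "\n";

# unless runs its block only when the condition is false, so this
# reads as "unless the length is a multiple of three, warn".
unless (length($str) % 3 == 0) {
    print "Warning: ", length($str) % 3, " leftover base(s) ignored\n";
}
```

Each element of @codons could then be passed through the same if-elsif-else chain used in the script above.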

Here's a little script which prints out all 64 possible codons:

#!/usr/bin/perl
@nuc = ( "A", "U", "G", "C" );
$one = 0;
$two = 0;
$three = 0;
for ($i = 0; $i < 64; $i++) {
    $one   = $i / 16;
    $two   = ($i / 4) % 4;
    $three = $i % 4;
    print "" . ($i + 1) . ") " . $nuc[$one] . $nuc[$two] . $nuc[$three] . "\n";
}
There are better ways of writing a script to output the 64 possible codons, but this one works. The most confusing part of this one is the math involving the division and modulus operations within the loop. It might take you a while to realize why it works, but take some time to compare the output of this script with these operations and make sure that you understand why they work as they do.
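One way to see why the arithmetic works is to treat the loop counter as a three-digit number in base 4, where each digit selects one nucleotide. The sketch below isolates that calculation in a helper (codon_for_index is a name invented here) and makes the truncation explicit with int. The original script gets the same result because Perl truncates a fractional array index and the % operator works on integers.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @nuc = ("A", "U", "G", "C");

# codon_for_index is a hypothetical helper name for this sketch.
# It treats $i (0..63) as a three-digit base-4 number.
sub codon_for_index {
    my ($i) = @_;
    my $one   = int($i / 16);       # 16s place: changes once every 16 steps
    my $two   = int($i / 4) % 4;    # 4s place: changes every 4 steps, wraps after 3
    my $three = $i % 4;             # 1s place: changes every step, wraps after 3
    return $nuc[$one] . $nuc[$two] . $nuc[$three];
}

# A few spot checks. For 27: int(27/16) = 1 (U), int(27/4) % 4 = 6 % 4 = 2 (G),
# and 27 % 4 = 3 (C), giving UGC.
for my $i (0, 5, 27, 63) {
    printf "%2d -> %s\n", $i, codon_for_index($i);
}
```

Because the script prints $i+1 as the line number, index 27 appears as line 28 in its output.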

ASSIGNMENT:

Write a Perl script which simulates the entire process of transcription and translation. Prompt the user for an input string and produce output like this:


 DNA: (whatever the user provides)
 RNA: (transcription product)
 AA:  (translation product)

Make sure that you have a special output message for bad input.
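As a starting point for the transcription step only, here is a rough sketch that checks the input and produces the RNA. It rests on an assumption the assignment does not state: the input is taken to be the coding strand, so transcription is modeled as replacing every T with U. The helper name transcribe and the two test strings are invented for this example, and translation is left to you.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# transcribe is a hypothetical helper name for this sketch.
# Assumption: the input is the coding strand, so the RNA is the same
# sequence with T replaced by U. Returns undef for bad input.
sub transcribe {
    my ($dna) = @_;
    $dna = uc $dna;
    # Reject anything that is not made up only of A, C, G and T.
    unless ($dna =~ /^[ACGT]+$/) {
        return;
    }
    (my $rna = $dna) =~ tr/T/U/;    # copy the string, then swap T for U
    return $rna;
}

# Fixed strings stand in for user input here.
for my $dna ("aaacccgggttt", "aaaxccc") {
    my $rna = transcribe($dna);
    if (defined $rna) {
        print " DNA: $dna\n RNA: $rna\n";
    } else {
        print "Bad input: $dna\n";
    }
}
```

In the finished script the fixed strings would be replaced by a line read from the user with the trailing newline removed.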