String Processing

Recursion

In this lesson the student will learn how to:

Write subroutines
Pass variables to subroutines
Write recursive subroutines

By the end of this lesson the student will be able to:


    Write a recursive subroutine which creates a 
    random nucleotide sequence of specified length.

Random Sequences

Exon sequences can often be recognized by their GC content. Exons have a higher GC content than introns or the nucleotide sequences that lie between genes. This is because methylated C's tend to be deaminated to produce T's. Outside of exons, such a change rarely makes any difference, and so these areas tend to drift towards a lower GC content (remember that G's pair with C's, so losing a C on one strand eventually costs a G on the other). Within an exon, however, such a change does make a difference and may render a gene non-functional. An organism carrying a non-functional gene is less likely to produce offspring, and so the change tends not to proliferate.

While there were still many genes to locate in the human genome, one method used to locate potential genes was to search for GC-rich portions of the genome. There are some rule-breaking genes, but for the most part genes have an above-average GC content compared to non-gene portions of the genome.
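To make the idea of GC content concrete, here is a minimal sketch of how it could be measured. The subroutine name gc_content and both sequences are made up for illustration and are not part of the lesson's own scripts. It uses subroutines, which are introduced in the next section.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the lesson): returns the fraction
# of G and C characters in a sequence, as a number from 0 to 1.
sub gc_content {
    my ($seq) = @_;
    return 0 if length($seq) == 0;    # avoid dividing by zero
    my $gc = ($seq =~ tr/GCgc//);     # tr returns how many characters matched
    return $gc / length($seq);
}

# Made-up sequences, for illustration only.
my $rich = "GCGCGGCCAT";
my $poor = "ATATTTAAGC";
printf "%s GC CONTENT: %.2f\n", $rich, gc_content($rich);
printf "%s GC CONTENT: %.2f\n", $poor, gc_content($poor);
```

A real gene finder would slide a window along the genome and compare each window's value with the genome-wide average. This sketch only shows the calculation for a single sequence.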

Subroutines

Subroutines are a useful way to avoid writing the same code over and over. They allow you to reuse the lines of code contained within them. Try out the following example:

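    #!/usr/bin/perl

    # Ask for the user's name and return it.
    # "my" keeps a variable private to the subroutine that declares it.
    sub name {
        print "ENTER YOUR NAME PLEASE: ";
        my $hold = <STDIN>;
        chomp $hold;
        return $hold;
    }

    # Ask for the street address and return it.
    sub street {
        print "ENTER YOUR STREET ADDRESS: ";
        my $hold = <STDIN>;
        chomp $hold;
        return $hold;
    }

    # Ask for city, state and zip code; return them as a single line.
    sub zip {
        print "ENTER YOUR CITY: ";
        my $c = <STDIN>;
        chomp $c;
        print "ENTER YOUR STATE: ";
        my $s = <STDIN>;
        chomp $s;
        print "ENTER YOUR ZIP: ";
        my $z = <STDIN>;
        chomp $z;
        return "$c, $s $z";
    }

    # Print the address stored in $na, $str and $z.
    sub addr {
        print "\n\n";
        print $na . "\n";
        print $str . "\n";
        print $z . "\n";
    }

    $na  = name();
    $str = street();
    $z   = zip();
    addr();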

There are four subroutines in this example. Notice the keyword return at the end of the first three: it hands a value back to the line that called the subroutine. Also understand that the subroutines are called in the last four lines of the program and that this is the only time they actually execute. Defining a subroutine does not run it.

Now try out this example:

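    #!/usr/bin/perl

    # Add two numbers and return the sum.
    sub madd {
        my ($x, $y) = @_;    # the arguments arrive in the array @_
        my $sum = $x + $y;
        return $sum;
    }

    # Print a prompt, read one line from the keyboard and return it.
    sub getnum {
        my ($p) = @_;
        print $p;
        my $n = <STDIN>;
        chomp $n;
        return $n;
    }

    $num1 = getnum("ENTER NUMBER: ");
    $num2 = getnum("ENTER ANOTHER: ");
    $ans  = madd($num1, $num2);
    print "ANSWER: $ans\n";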

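This example actually passes values to the subroutines. The values placed between the parentheses in the call arrive inside the subroutine in the special array @_. Notice that getnum is called twice, each time with a different prompt. You can reuse a subroutine as many times as you want to.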
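Because the arguments arrive as an ordinary array, a subroutine does not have to know in advance how many values it will receive. The following is a small sketch of that idea. The subroutine name total is invented for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical subroutine: adds up however many numbers it is given.
sub total {
    my @nums = @_;    # copy the argument list into a private array
    my $sum = 0;
    foreach my $n (@nums) {
        $sum += $n;
    }
    return $sum;
}

print "TWO ARGS:   " . total(2, 3) . "\n";
print "THREE ARGS: " . total(2, 3, 4) . "\n";
print "NO ARGS:    " . total() . "\n";
```

The three calls print 5, 9 and 0. Copying @_ into variables declared with my means the subroutine works on its own private copies rather than on variables shared with the rest of the script.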

Finally, try this subroutine.

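    #!/usr/bin/perl

    @letters = ( "a", "g", "u", "c" );
    $nuc = "aug";    # the sequence begins with the start codon
    srand(time);

    # Append one random codon to $nuc. Unless that codon is a stop
    # codon, call addnuc again to append another one.
    sub addnuc {
        my $hold = "";
        for (my $i = 0; $i < 3; $i++) {
            my $n = int(rand(4));
            $hold .= $letters[$n];
        }
        $nuc .= $hold;
        if ($hold ne "uaa" && $hold ne "uag" && $hold ne "uga") {
            addnuc();
        }
    }

    # addnuc appends to $nuc itself, so its return value is not needed.
    addnuc();
    print $nuc . "\n";
    print "LENGTH: " . length($nuc) . "\n";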

This subroutine is recursive. This means that it calls itself. The addnuc subroutine keeps calling itself until the codon it has just generated is one of the stop codons uaa, uag or uga. If you ran this one a few times, then you probably noticed that the length of the nucleotide sequence it creates varies from execution to execution of the script.
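Every recursive subroutine needs two parts: a base case that stops the recursion, and a recursive case that calls the subroutine again on a smaller problem. The sketch below shows both parts on a different task, counting the g's and c's in a sequence one letter per call. The subroutine name count_gc and the sequence are invented for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical example: counts the g's and c's in a sequence by
# examining one letter per call.
sub count_gc {
    my ($seq) = @_;
    return 0 if $seq eq "";             # base case: nothing left to count
    my $first = substr($seq, 0, 1);     # the letter handled by this call
    my $rest  = substr($seq, 1);        # everything after the first letter
    my $hit   = ($first eq "g" || $first eq "c") ? 1 : 0;
    return $hit + count_gc($rest);      # recursive case: one letter shorter
}

my $seq = "augggcuaa";    # made-up sequence
print "GC COUNT: " . count_gc($seq) . "\n";
```

Each call passes along a string that is one letter shorter, so the base case is always reached. Because the variables are declared with my, every call gets its own private copies, and one call cannot overwrite the values of another.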

ASSIGNMENT:

Write a script which first asks the user to specify the length of a nucleotide sequence. Write a subroutine which recursively calls itself to create a random nucleotide sequence of the length specified. (Obviously, a more direct way to create a sequence of fixed length is with a for-loop, but that's not how you are to do it here!) Your recursive subroutine should only add one nucleotide per call!