String Processing

SUMMARY

TITLE PERL TOPICS BIOLOGICAL TOPICS
Strings and DNA
  • Represent a DNA sequence using a string
  • Use substr to inspect a single character in a string
  • Concatenate strings using the dot operator
  • Identify the four nucleotides in DNA
  • Create complementary strings of nucleotides
Substitution
  • Use the s/// operator
  • Describe the difference between DNA and RNA in terms of nucleotide content
  • Explain the process of transcription in general terms
  • Distinguish between RNA and DNA by nucleotide content
  • Understand that there are several different types of RNA
  • Understand that an RNA transcript is created from a DNA template
Transliteration
  • Use the tr/// operator
  • Convert lowercase to uppercase and vice versa
  • Know the difference between introns and exons
  • Recognize the difference between an RNA and DNA transcript
  • List the nucleotides used in DNA and RNA
Pattern Matching
  • Use pattern matching with conditionals
  • Use the g option to force exhaustive pattern matching
  • Count using the pattern matching construct
  • Know what codons are
  • Understand the relationship between RNA and amino acids
  • Know that multiple codons code for certain amino acids
Random Sequences
  • Use the seed functions to improve random number generation
  • Use the rand function to generate random numbers
  • Generate a random sequence of nucleotides
  • Use the substr and length functions to isolate portions of a string
  • Understand that mutations are changes to DNA
  • Understand that mutations can have harmful, neutral, or beneficial effects for the organism
Transcribing and Translating Sequences
  • Use pattern matching to read a string three letters at a time
  • Use the if-elsif-else construct to simulate translation
  • Use the unless construct
  • Know that codons correspond to amino acids
  • Know that amino acids have three-letter and one-letter abbreviations
Working With Substrings
  • Use the substr function in a variety of situations
  • Use the substr function to insert into a string
  • Know how to use the index and rindex functions
  • Recall a few details about exons and introns
  • Know that AUG codes for methionine and serves as a start codon
  • Identify the stop codons
Arrays: Split and Join
  • Use the split function
  • Use the join function
  • Use the push function
  • Compare the similarity of two nucleotide sequences
  • Know that sequence similarity has biological significance
Arrays: Push, Pop, Shift, and Unshift
  • Add items to an array using push
  • Add items to an array using unshift
  • Retrieve items from an array using pop
  • Retrieve items from an array using shift
  • Know that nucleotide sequences are held together by phosphodiester bonds
  • Know that two strands of DNA are held together by hydrogen bonds
  • Know that DNA sequences are reported in the 5' to 3' direction
Splicing Arrays
  • Use the splice function
  • Slice an array
  • Understand that mutations are changes in DNA sequence
  • Understand that mutations can be insertions, substitutions, or deletions of one or more nucleotides
Regular Expressions
  • Know that regular expressions are generalized patterns
  • Be able to use a subset of regular expressions
  • Know the parts of an amino acid
  • Know that there are 20 amino acids coded for by the genetic code
  • Know the names and symbols for the 20 amino acids
  • Know that amino acids can be grouped based on various properties
More Regular Expressions
  • Incorporate quantification symbols into regular expressions
  • Understand how the {}, *, +, and ? symbols are used to specify quantification
  • Know the location of centromeres and telomeres
  • Know that TTAGGG is the repeating motif found in mammalian telomeric DNA
  • Understand the terms telomerase, oncology, and replicative cell senescence
Conditionals
  • Use if-elsif-else constructs
  • Use the unless construct
  • Use if after a statement
  • Use the or construct
  • Know the names and symbols for the 20 amino acids
  • Know that there are amino acids other than those used in the genetic code
Iteration
  • Use a variety of loop constructs
  • Compare the various loop constructs
  • Create and use embedded loops
  • Understand the terms ribosome, tRNA, and anticodon
  • Know that the ribosome is where mRNA and tRNA come together to produce an amino acid sequence
Recursion
  • Write subroutines
  • Pass variables to subroutines
  • Write recursive subroutines
  • Know that exons have a higher GC content than the rest of the genome
  • Know that non-exon areas of the genome tend to drift towards a lower GC content over time
Hashes
  • Know how to create and use hashes
  • Know how to extract the keys and values from a hash
  • Know how to extract a single value from a hash
  • Know the four levels of protein structure
  • Review the amino acid symbols and names
More Hashes
  • Add new items to a hash
  • Work with command line input
  • Understand that there are six possible reading frames for a DNA sequence
  • Be aware of codon to amino acid correspondences
Restriction Enzymes
  • Use parentheses in regular expressions
  • Create complex regular expressions
  • Understand that restriction enzymes can be used to cut nucleotide sequences at specific sites
  • Know that each restriction enzyme has a specific site called the cut site where the actual cut is made
IUB Ambiguity Codes
  • Write regular expressions for IUB ambiguity codes
  • Specify the portion of a sequence before and after a pattern match
  • Understand how to use IUB ambiguity codes
Palindromes
  • Recognize palindromes programmatically
  • Write a script using many previously presented techniques
  • Recognize different kinds of palindromes
Regular Expressions Revisited
  • Specify whether matching will be greedy or minimal
  • Do calculations within regular expressions
  • Use regular expressions to represent codons
  • Understand that the human nuclear genetic code is not exactly the same as the human mitochondrial genetic code
Two-Dimensional Arrays
  • Create two-dimensional arrays
  • Access values stored in two-dimensional arrays
  • Understand that repetitive DNA is found throughout mammalian genomes
  • Understand that certain diseases are associated with excessively repetitive DNA
Formatted Output
  • Use the printf function
  • Use field specifiers with the printf function
  • Understand that the affinity between two nucleotide strands can be measured by counting the number of complementary base pairs between them
Best Matches
  • Modify a script to add functionality
  • Locate the best match between two non-identical nucleotide sequences
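
As a rough illustration of how several of the Perl topics listed above fit together, here is a minimal sketch. It is not taken from the course units. The sequences and the helper names count_matches and best_offset are hypothetical, and the best-match search shown is only one simple way to do it: slide a short sequence along a longer one and count identical positions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical example sequence, chosen only for illustration.
my $dna = 'ATGGCC' . 'TTAGGG';          # dot operator joins two strings

my $first_base = substr($dna, 0, 1);    # substr offsets start at 0

# tr/// swaps each base for its partner (A<->T, C<->G) in a copy of $dna.
(my $complement = $dna) =~ tr/ACGT/TGCA/;

# s///g replaces every T with U, giving the RNA form of the same sequence.
(my $rna = $dna) =~ s/T/U/g;

# A /g match in list context returns every match; assigning that list
# through () to a scalar yields the number of matches.
my $gc_count = () = $dna =~ /[GC]/g;

# Capture the RNA three letters at a time.
my @codons = $rna =~ /(...)/g;

# Count the positions at which two sequences carry the same base.
sub count_matches {
    my ($x, $y) = @_;
    my $n = length($x) < length($y) ? length($x) : length($y);
    my $matches = 0;
    for my $i (0 .. $n - 1) {
        $matches++ if substr($x, $i, 1) eq substr($y, $i, 1);
    }
    return $matches;
}

# Slide $probe along $target and return the offset with the most matches.
# If $probe is longer than $target, no window fits and (0, -1) is returned.
sub best_offset {
    my ($target, $probe) = @_;
    my ($best_offset, $best_score) = (0, -1);
    for my $offset (0 .. length($target) - length($probe)) {
        my $window = substr($target, $offset, length($probe));
        my $score  = count_matches($window, $probe);
        ($best_offset, $best_score) = ($offset, $score)
            if $score > $best_score;
    }
    return ($best_offset, $best_score);
}

my ($offset, $score) = best_offset($dna, 'GCCTAA');

# printf with field specifiers: %-12s left-justifies the label in 12 columns.
printf "%-12s %s\n", 'DNA',        $dna;
printf "%-12s %s\n", 'First base', $first_base;
printf "%-12s %s\n", 'Complement', $complement;
printf "%-12s %s\n", 'RNA',        $rna;
printf "%-12s %s\n", 'Codons',     join(' ', @codons);
printf "%-12s %d\n", 'G+C count',  $gc_count;
printf "%-12s offset %d, %d of %d positions\n",
    'Best match', $offset, $score, length('GCCTAA');
```

With the example values above, the best match for GCCTAA falls at offset 3 with 5 of 6 positions identical. Counting identical positions is a deliberately simple score; the course units may use a different measure.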