Simple Flat File Format


Flat file databases are not true databases, but they are still useful for storing, organizing, and retrieving data. They are widely used, especially for biological data, so you will run across them fairly often. There is no single way to organize a flat file database. In this lesson we will examine one scheme that could be used to organize a flat file database of information about biotech companies. Here's the basic scheme:
NM  Geron Corporation
SY  GERN
CO  Thomas Okarma
AD  230 Constitution Drive, Menlo Park, CA, 94025
FO  nuclear transfer, stem cells, oncology
Each line begins with a two-letter code. Here's a key to the two-letter codes:
  NM  company name
  SY  stock symbol
  CO  CEO of company
  AD  address
  FO  focus areas of research and development
Exactly two spaces separate the code from the information. This fixed separator makes the file easier to parse.
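
To make the parsing idea concrete, here is a minimal sketch in Python (not the Perl used later in this lesson). It assumes one field per line, a two-character code, and exactly two spaces before the value. The function name parse_record is hypothetical and chosen only for illustration.

```python
def parse_record(text):
    """Parse one company record into a dict keyed by two-letter code."""
    record = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Characters 0-1 are the code, 2-3 the separator, the rest the value.
        code, sep, value = line[:2], line[2:4], line[4:]
        if sep != "  ":
            raise ValueError(f"expected two spaces after the code: {line!r}")
        record[code] = value.strip()
    return record


sample = """NM  Geron Corporation
SY  GERN
CO  Thomas Okarma
AD  230 Constitution Drive, Menlo Park, CA, 94025
FO  nuclear transfer, stem cells, oncology"""

record = parse_record(sample)
print(record["NM"])  # Geron Corporation
```

Because the separator is always in the same position, the sketch can slice each line at fixed offsets rather than search for a delimiter, so commas and spaces inside the value (as in the address) cause no trouble.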

Here are four more examples:

NM  Perlegen
SY  private
CO  Brad Margus
AD  2021 Stierlin Ct, Mountain View, CA, 94043
FO  genomics, SNPs, haplotypes

NM  Incyte Genomics
SY  INCY
CO  Paul Friedman
AD  3160 Porter Drive, Palo Alto, CA, 94304
FO  genomic information and software

NM  Protein Design Labs, Incorporated
SY  PDLI
CO  Douglas Ebersole
AD  34801 Campus Drive, Fremont, CA, 94555
FO  monoclonal antibodies

NM  Nanogen
SY  NGEN
CO  V. Randy White
AD  10398 Pacific Center Court, San Diego, CA, 92121
FO  microelectronics for genomics research and medical diagnostics
While it is possible to understand information presented in this format, it is not well suited to a general audience. What we need is a way of parsing this information and preparing it for display. We could collect a lot more information about companies (and we will), so this assignment is simply a scaled-down exercise.
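
As a rough illustration of "preparing it for display", the following Python sketch turns an already parsed record into a small HTML page. It is only one possible layout and is not taken from the lesson's Perl script; the names LABELS and render_html are hypothetical.

```python
from html import escape

# Human-readable labels for the two-letter codes in the key above.
LABELS = {
    "NM": "Company name",
    "SY": "Stock symbol",
    "CO": "CEO",
    "AD": "Address",
    "FO": "Focus areas",
}


def render_html(record):
    """Return a simple HTML page for one parsed company record."""
    name = escape(record.get("NM", "Unknown company"))
    rows = []
    for code, label in LABELS.items():
        if code == "NM" or code not in record:
            continue  # the name goes in the heading; skip missing fields
        rows.append(f"<tr><th>{label}</th><td>{escape(record[code])}</td></tr>")
    return (
        f"<html><head><title>{name}</title></head><body>\n"
        f"<h1>{name}</h1>\n<table>\n" + "\n".join(rows) + "\n</table>\n</body></html>"
    )


record = {
    "NM": "Nanogen",
    "SY": "NGEN",
    "CO": "V. Randy White",
    "AD": "10398 Pacific Center Court, San Diego, CA, 92121",
    "FO": "microelectronics for genomics research and medical diagnostics",
}
page = render_html(record)
print(page)
```

Escaping each value before it goes into the page keeps characters such as & or < in a company's data from being read as markup.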

Here's a parser written in Perl that outputs a properly coded web page:
makeHTML.pl
util.pm

(Make sure the information files are saved as company_name.des and that each company has its own file.)
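
The one-file-per-company convention can be sketched in Python as follows. This is an illustration under stated assumptions, not the lesson's Perl code: it assumes every .des file sits in a single directory and that the file name (minus the extension) identifies the company. The function name load_companies and the temporary demo files are hypothetical.

```python
import tempfile
from pathlib import Path


def load_companies(directory):
    """Read every .des file in a directory into {file_stem: record}."""
    companies = {}
    for path in sorted(Path(directory).glob("*.des")):
        record = {}
        for line in path.read_text().splitlines():
            # Accept only lines shaped like "XX  value".
            if len(line) > 4 and line[2:4] == "  ":
                record[line[:2]] = line[4:].strip()
        companies[path.stem] = record
    return companies


# Demo with two throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "nanogen.des").write_text("NM  Nanogen\nSY  NGEN\n")
    Path(tmp, "perlegen.des").write_text("NM  Perlegen\nSY  private\n")
    companies = load_companies(tmp)

print(sorted(companies))  # ['nanogen', 'perlegen']
```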

We could implement this as a CGI script, but we won't go that far for this assignment. The important thing is to inspect the Perl code so that you understand what was done here.


ASSIGNMENT:
You will do four things to successfully complete this assignment.
  1. Add a field for WEBSITE (abbreviated WB) to the flat file format used in this lesson.
  2. Modify the Perl script so that a link to the company home page is created on the web page your script generates.
  3. Update your flat files to include the relevant information for the five companies covered in this lesson, and set up a new file for Sequenom.
  4. Make sure that the information presented for the sample companies is current.
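
To show the general shape of step 2 without giving away the Perl solution, here is a small Python sketch that turns a WB value into an HTML link. The function name website_link is hypothetical, and the URL is a placeholder, not Sequenom's real address.

```python
from html import escape


def website_link(record):
    """Return an HTML anchor for the WB field, or '' if it is absent."""
    url = record.get("WB")
    if not url:
        return ""
    name = record.get("NM", url)  # fall back to the URL as link text
    return f'<a href="{escape(url, quote=True)}">{escape(name)}</a>'


# Placeholder URL for illustration only.
record = {"NM": "Sequenom", "WB": "http://www.example.com/"}
link = website_link(record)
print(link)  # <a href="http://www.example.com/">Sequenom</a>
```

Returning an empty string when WB is missing lets older files without the new field still produce a valid page.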