Data manipulations

Array operation

shift() and friends

my @a = (1,2,3,4,5);
my $first = shift (@a);   # $first = 1, @a = (2,3,4,5)
unshift (@a, $first);     # $first = 1, @a = (1,2,3,4,5)
my $last = pop (@a);      # $last = 5,  @a = (1,2,3,4)
push (@a, $last);         # $last = 5,  @a = (1,2,3,4,5)
shift() and unshift() operate on the left end (the beginning) of an array; push() and pop() operate on the right end. Each of these functions removes or inserts elements at one end of the array.

In addition to removing an element from the array, shift() and pop() return the removed element, so you can assign it to another variable (e.g. $first or $last above) if needed. Both functions return the special value undef if the array is empty.
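
The undef return value can be used to empty an array one element at a time. The following is only a small sketch; the names @queue, @taken and $nothing are made up for this illustration. Note that the loop would also stop early if the array itself contained an undef element.

```perl
use strict;
use warnings;

my @queue = ('a', 'b');
my @taken;

# shift() returns undef once @queue is empty, so test the result with defined()
while (defined(my $item = shift @queue)) {
    push @taken, $item;
}

my $nothing = pop @queue;   # @queue is already empty, so $nothing is undef

print "taken: @taken\n";
print "pop on an empty array gave undef\n" unless defined $nothing;
```

Testing with defined() rather than plain truth matters here, because an element such as 0 or "" is false but still a valid value.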

splice()

# splice @targetArray,OFFSET,LENGTH,@insertArray

my @target = (11,12,13,14,15);
my @ins = (10, 20);
splice @target, 1, 3, @ins;  # @target becomes (11, 10, 20, 15), @ins unchanged
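
Two other common uses of splice() are removal without insertion and insertion without removal. The sketch below reuses @target from above; @removed is a name invented for this example.

```perl
use strict;
use warnings;

my @target = (11, 12, 13, 14, 15);

# Without an insert list, splice only removes, and it returns what it removed
my @removed = splice @target, 1, 3;   # @removed = (12,13,14), @target = (11,15)

# With LENGTH 0 nothing is removed, so this is a pure insertion before index 1
splice @target, 1, 0, 12, 13;         # @target = (11,12,13,15)

print "@removed | @target\n";
```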


reverse()

my @a = (1,2,3);
my @b = reverse(@a);  # @a = (1,2,3), @b = (3,2,1)

reverse() returns a list with the elements in reverse order. Note that the argument array (@a) is unaltered. If you want to reverse an array "in place", you can assign the result back to the same variable:

@a = reverse(@a);    # @a = (3,2,1) now.

You can also apply reverse to a character string (a scalar variable). In scalar context, it returns the string with its characters in reverse order.

my $s = "hot dog";
my $rev = reverse($s);
print $rev, "\n";  # prints "god toh"
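
The string is reversed above only because the assignment to $rev puts reverse() in scalar context. In list context, reverse() treats $s as a one-element list and hands it back unchanged. A short sketch of the difference ($scalar_rev and $list_rev are names chosen for this example):

```perl
use strict;
use warnings;

my $s = "hot dog";

my $scalar_rev = reverse($s);   # scalar context: the characters are reversed
my ($list_rev) = reverse($s);   # list context: a one-element list is "reversed"

print "$scalar_rev\n";          # god toh
print "$list_rev\n";            # hot dog

# print supplies list context, so ask for scalar context explicitly
print scalar(reverse($s)), "\n";
```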

chomp()

You already know that chomp($var) removes the trailing newline character from a string. You can also give it an array as an argument.
@fileContent = <INFILE>;  # ("line1\n", "line2\n","line3\n", ...)
chomp(@fileContent);      # ("line1", "line2","line3", ...)
It then removes the trailing newline from each element.
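
To try this without a real file, you can read from a string through an in-memory filehandle. This is only a sketch: $text, $in and $removed are names invented here, and the three-line content is made up.

```perl
use strict;
use warnings;

# Open a filehandle on a string instead of a file on disk
my $text = "line1\nline2\nline3\n";
open(my $in, '<', \$text) or die "cannot open in-memory file: $!";
my @fileContent = <$in>;             # each element still ends with "\n"
close($in);

# chomp returns the total number of characters it removed
my $removed = chomp(@fileContent);

print join("|", @fileContent), "\n"; # line1|line2|line3
print "removed $removed newlines\n";
```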

sort()

qw()

@x = qw(small medium large);  # equivalent to @x = ('small', 'medium', 'large');
If you are initializing a large array or hash with character strings, it becomes tedious to type quotes around each element. qw() splits its contents on whitespace and treats each resulting word as a quoted string.

An example of initializing a large hash (codon -> amino acid):

# '*' indicates the termination codon
%aminoAcid = qw (TTT F TTC F TTA L TTG L TCT S TCC S TCA S TCG S TAT Y TAC Y TAA * TAG * TGT C TGC C TGA * TGG W CTT L CTC L CTA L CTG L CCT P CCC P CCA P CCG P CAT H CAC H CAA Q CAG Q CGT R CGC R CGA R CGG R ATT I ATC I ATA I ATG M ACT T ACC T ACA T ACG T AAT N AAC N AAA K AAG K AGT S AGC S AGA R AGG R GTT V GTC V GTA V GTG V GCT A GCC A GCA A GCG A GAT D GAC D GAA E GAG E GGT G GGC G GGA G GGG G);
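
Because qw() returns a flat list, assigning it to a hash pairs the words up as key, value, key, value, and so on. The sketch below uses a deliberately tiny subset of the table; returning '?' for an unknown codon is an assumption made for this example, not part of the original.

```perl
use strict;
use warnings;

# Three entries only, for illustration: words are taken in key/value pairs
my %aminoAcid = qw(ATG M TGG W TAA *);

my $codon = 'ATG';
# '?' for a codon missing from the table is an assumption of this sketch
my $aa = exists $aminoAcid{$codon} ? $aminoAcid{$codon} : '?';

print "$codon -> $aa\n";
```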

Other convenient methods of array operation

Appending one array to another

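my @a = ("See", "you");
my @b = ("later", "Alligator");

# method 1 (better): append @b to @a in place
push (@a, @b);
print "@a\n";             # prints "See you later Alligator"

# method 2 (slower): use instead of method 1, not in addition to it;
# it builds a new list and copies it back into @a
@a = (@a, @b);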

Perl lists do not nest: a list placed inside another list is flattened into it (a true array of arrays requires references). In other words, (("a1", "a2"), ("b1", "b2")) automatically becomes ("a1", "a2", "b1", "b2"). So the 2nd method produces a one-dimensional array (automatic flattening).

But push is a more efficient way to achieve the same goal.
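
A quick way to convince yourself of the flattening is to count the elements. In this sketch the names @x, @y, @flat and $count are chosen just for the illustration.

```perl
use strict;
use warnings;

my @x = ("a1", "a2");
my @y = ("b1", "b2");

my @flat  = (@x, @y);        # flattened: four scalars, not two arrays
my $count = scalar(@flat);   # 4

push @x, @y;                 # same contents, but @x is extended in place

print "$count elements: @flat\n";
```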

Extracting unique elements from a list

my @a = (1,3,5,2,5,4,3,2,1,5);
my @uniqArr = Unique(@a);  # you get (1,2,3,4,5)

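sub Unique {
  my %seen = ();                 # keys are the distinct elements seen so far
  foreach my $element (@_) {
    $seen{$element}++;           # duplicates end up sharing one key
  }
  # sort() compares as strings by default, which is fine for single digits
  return (sort(keys(%seen)));
}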
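
Unique() relies on the default sort(), which compares elements as strings, so multi-digit numbers would come back in string order. A possible variant for numeric data is sketched below; UniqueNumeric is a hypothetical name, not a function from these notes.

```perl
use strict;
use warnings;

# Hypothetical variant of Unique() that sorts the result numerically
sub UniqueNumeric {
    my %seen;
    $seen{$_}++ foreach @_;
    return sort { $a <=> $b } keys %seen;
}

my @nums         = (10, 9, 100);
my @default_sort = sort @nums;                           # string order: (10, 100, 9)
my @uniq         = UniqueNumeric(10, 9, 100, 9, 10);     # (9, 10, 100)

print "@default_sort\n";
print "@uniq\n";
```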

grep(): Finding all elements matching certain criteria

A straightforward method:
@matched = ();
foreach my $i (@list) {
  push (@matched, $i) if ($i =~ /^\d+$/ && $i < 20);
}

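This will find all non-negative integers which are less than 20 (the pattern /^\d+$/ accepts digits only, so a value with a minus sign does not match).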

An easier method:

@matched = grep {$_ =~ /^\d+$/ && $_ < 20} @list;
Each element is assigned to $_ in turn, and if it satisfies the test (inside the { }), the value is added to @matched.
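
Here is the grep version run on some made-up data, as a sketch. It also shows that grep in scalar context returns the number of matching elements rather than the elements themselves; $count is a name chosen for this example.

```perl
use strict;
use warnings;

my @list = (3, 'abc', 25, 17, '4x', 0);

# The numeric test is only reached when the pattern matched,
# so non-numeric strings never trigger a warning
my @matched = grep { /^\d+$/ && $_ < 20 } @list;   # (3, 17, 0)

# In scalar context grep returns how many elements passed the test
my $count = grep { /^\d+$/ } @list;                # 4

print "@matched ($count integers in total)\n";
```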

Let's say we have a hash table %age, which contains the age of each person (the keys are the names of the people). Can you write a grep statement to get an array of the names (keys of this hash) of the people who are younger than 21?

Scalar manipulations

Changing cases

my $song = "Old Joe Clark";
$song = lc($song);      # becomes "old joe clark"
$song = uc($song);      # becomes "OLD JOE CLARK"
$song = lcfirst($song); # becomes "oLD JOE CLARK"
$song = ucfirst($song); # becomes "OLD JOE CLARK"
Note that lcfirst and ucfirst change only the first character of the string; the case of the other characters is not changed.
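
All four functions return a new string and leave their argument alone, which is why the examples above assign the result back to $song. One common use is comparing strings without regard to case. A small sketch ($upper, $typed and $same are names invented here):

```perl
use strict;
use warnings;

my $song  = "Old Joe Clark";
my $upper = uc($song);        # "OLD JOE CLARK"; $song itself is unchanged

# Case-insensitive comparison: lower-case both sides before comparing
my $typed = "old JOE clark";
my $same  = (lc($typed) eq lc($song)) ? 1 : 0;

print "$song / $upper / same title: $same\n";
```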

Can you write a function which takes a string as its argument and returns a string in which the first character of each word is upper case and the rest are lower case? Capitalize($song) should then always return "Old Joe Clark".

length()

my $string = "kermit the frog";
my $len = length($string);   # 15 characters including spaces in $string

Conversion between scalar and list

split(): scalar to list

@line = split /\s+/, $_;
@line = split /\s+/;
@line = split;

@csvdata = split /\s*,\s*/, $csvString;
split() splits the string (2nd argument) at each match of the pattern inside /  /, and returns a list. Note that we are using regular expressions now.

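If you omit the 2nd argument (string scalar), it operates on $_. If you omit the pattern as well, it splits on whitespace (after skipping any leading whitespace). So the first 3 statements are equivalent, except that when the string begins with whitespace, the /\s+/ forms return an empty first element.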
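
The effect of leading whitespace is easy to check. In the sketch below, split with the special pattern ' ' is used because that is what a bare split does internally; the variable names and sample strings are made up.

```perl
use strict;
use warnings;

my $line = "  alpha beta\tgamma";

my @by_pattern = split /\s+/, $line;   # ("", "alpha", "beta", "gamma")
my @by_default = split ' ', $line;     # ("alpha", "beta", "gamma")

# The CSV pattern also swallows the spaces around each comma
my @csvdata = split /\s*,\s*/, "a , b,c";   # ("a", "b", "c")

print scalar(@by_pattern), " vs ", scalar(@by_default), " fields\n";
```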

join(): list to scalar

my @a = (1, 20, 36);
print join("\t", @a), "\n";      # prints a tab-delimited line "1\t20\t36"

my $concatenated = join("", @a); # $concatenated is "12036"

my ($hour, $min, $sec) = (20, 31, 16);
my $timeString = join (':', $hour, $min, $sec); # becomes "20:31:16"

my $sep = "\n";
my $threeLine = join($sep, @a);  # "1\n20\n36": one element per line

Note that split() uses a pattern /  /, but join takes a character string.
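
split() and join() are often used together to change the delimiter of a line. The sketch below turns a comma-separated string into a tab-delimited one; the sample string is made up. It also shows the pattern-versus-string difference: '|' is an ordinary character for join but must be escaped in a split pattern.

```perl
use strict;
use warnings;

# Comma-separated in, tab-delimited out
my $csvString = "AY167979, Bras.juncea ,rbcL";
my $tsv = join("\t", split(/\s*,\s*/, $csvString));

# join takes the separator literally; split needs the | escaped
my $piped = join('|', 'a', 'b', 'c');   # "a|b|c"
my @back  = split /\|/, $piped;         # ("a", "b", "c")

print "$tsv\n";
print "@back\n";
```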

Exercises

  1. You have two files with 3 tab-delimited columns. The 1st column contains the GenBank accession numbers. We want to combine these two files, but we don't want duplicated accession numbers. In other words, we want the accession numbers to be unique in the combined file. If an accession number is duplicated, you can keep any one of its lines. Write a program which merges the two files.

    If you do:

    uniqAcc.pl fileA fileB > outFile

    the result should look like the following outFile

    fileA:
    AY167979        Bras.juncea     # Brassica juncea rbcL gene
    AY167976        Bras.rapa       # Brassica rapa rbcL
    AF267640        Bras.napus      # Brassica napus rbcL
    
    fileB:
    AF267640        B.napus         # Brassica napus rbcL
    AY167979        B.juncea        # Brassica juncea rbcL gene
    U91966          A.thal          # Arabidopsis thaliana rbcL
    
    outFile:
    AY167979        Bras.juncea     # Brassica juncea rbcL gene
    AY167976        Bras.rapa       # Brassica rapa rbcL
    AF267640        Bras.napus      # Brassica napus rbcL
    U91966          A.thal          # Arabidopsis thaliana rbcL
    

  2. Make a program which reads in a FASTA file and prints out the "reverse complement" in FASTA format. Extra points if each output line has a reasonable length (say <=70 characters per line).

    Reverse complement of 'AAGCTTGC' is 'GCAAGCTT'.

  3. Make a program (selectSites.pl) which takes 3 arguments: input_file_name, beginning site, last site. So if you run the program as

    selectSites.pl in.fasta 6 200

    it will read in the FASTA file, select the sequence between site 6 and site 200 from each sample, and print out the result in FASTA format.

  4. Make a program which translates DNA sequences into amino acid sequences.


Naoki Takebayashi 2011-10-19