m//g
/pattern/i
/ATG/i   # matches ATG, atg, ATg, aTg, etc.
s/pattern/replace/g
You can combine this with the case-insensitive option:
$degenSeq =~ s/[MRWSYKVHDBN]/N/gi;
All degenerate bases (both upper and lower case) are replaced by 'N'.
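As a small sketch of the /gi combination, the snippet below masks degenerate bases and also reports how many were replaced. The sub name mask_degenerate and the sample sequence are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every degenerate base code with 'N'.
# Returns the masked sequence and the number of replacements made.
sub mask_degenerate {
    my ($seq) = @_;
    # s///g returns the number of substitutions, or a false value if
    # nothing matched, so "|| 0" gives a plain 0 in that case.
    my $count = ($seq =~ s/[MRWSYKVHDBN]/N/gi) || 0;
    return ($seq, $count);
}

my ($masked, $n) = mask_degenerate('ATGRyCCwTTn');
print "$masked ($n replaced)\n";   # ATGNNCCNTTN (4 replaced)
```

Note that 'N' is itself in the character class, so an existing 'N' or 'n' is counted as a replacement too.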
When the pattern itself contains '/', it is tedious to escape each '/' with '\/'. You can actually use almost any punctuation character as the field delimiter of s/ / /:
s#^https://#http://#;
s@^https://@http://@;
s/^https:\/\//http:\/\//;
These are all equivalent.
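A quick way to convince yourself is to run the three forms on the same string and compare the results. This is only a sketch; the URL and the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $url = 'https://example.org/a/b';   # made-up URL for illustration

# The same substitution written with three different delimiters.
# Each line copies $url first, then edits the copy.
(my $with_hash  = $url) =~ s#^https://#http://#;
(my $with_at    = $url) =~ s@^https://@http://@;
(my $with_slash = $url) =~ s/^https:\/\//http:\/\//;

print "$with_hash\n$with_at\n$with_slash\n";
```

All three lines of output should be identical; only the readability of the source differs.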
For matching, other delimiters can be used with the m/pattern/ form (this is equivalent to /pattern/).
m#^https://#
m@^https://@
$&   # the part of the string that actually matched the pattern
$`   # the part of the string before the match
$'   # the part of the string after the match
Example:
my $seq = 'TTTGAATTCAAA';
if ($seq =~ /GAATTC/) {   # EcoRI site
    print "Found EcoRI site: $`:$&:$'\n";
}
This is similar to $1, $2, etc. when ( ) is used in a regex, but these variables are set automatically.
Useful for checking that your complex regex is correct.
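To make the comparison with $1, $2 concrete, here is a minimal sketch that extracts the same three pieces twice, once with explicit ( ) groups and once with the automatic variables. The array names are made up for illustration.

```perl
use strict;
use warnings;

my $seq = 'TTTGAATTCAAA';

# Explicit captures: three groups reproduce what $`, $& and $' give for free.
my @by_capture = $seq =~ /^(.*?)(GAATTC)(.*)$/;

# Automatic variables: copy them right away, because the next
# successful match overwrites them.
my @automatic;
if ($seq =~ /GAATTC/) {
    @automatic = ($`, $&, $');
}

print join(':', @by_capture), "\n";   # TTT:GAATTC:AAA
print join(':', @automatic), "\n";    # TTT:GAATTC:AAA
```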
m//g
my $seq = "ATGTTTCCCTTTAAA";
while ($seq =~ /TTT/ig) {
    # prints: end position, match length, 1-based start position, matched text
    print pos($seq), ":", length($&), ":", pos($seq) - length($&) + 1, ":", "$&\n";
}
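The same loop can be wrapped in a sub that collects the start positions instead of printing them. This is a sketch only; the name match_starts is hypothetical, and \Q...\E is used on the assumption that the motif should be treated as literal text and not as a regex.

```perl
use strict;
use warnings;

# Hypothetical helper: return the 1-based start position of every
# non-overlapping, case-insensitive match of $motif in $seq.
sub match_starts {
    my ($seq, $motif) = @_;
    my @starts;
    while ($seq =~ /\Q$motif\E/ig) {
        # pos() is the 0-based offset just past the end of the match.
        push @starts, pos($seq) - length($&) + 1;
    }
    return @starts;
}

print join(',', match_starts('ATGTTTCCCTTTAAA', 'TTT')), "\n";   # 4,10
```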
perl -e 'print "Hi!\n";'
The -e option executes the "expressions" that follow it. The expressions must come RIGHT AFTER the -e option.
perl -ne 'print if (/(camel|llama)/);' infile
This is equivalent to:
while (<>) {
    print if (/(camel|llama)/);
}
which is similar to "grep".
perl -ne 'chomp; @a = split /\t/; print join("\t", @a[1,3]), "\n"' infile
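Written out as a script, the body of the column-extracting one-liner might look like the sketch below. The sub name pick_columns and the sample line are made up for illustration; the script assumes each line has at least four tab-separated fields.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the body of the -ne one-liner:
# take one tab-delimited line and return its 2nd and 4th fields,
# joined by a tab and terminated by a newline.
sub pick_columns {
    my ($line) = @_;
    chomp $line;
    my @a = split /\t/, $line;
    return join("\t", @a[1, 3]) . "\n";
}

# Made-up input line with four fields.
print pick_columns("geneA\tchr1\t100\t+\n");
```

Array indices start at 0, so @a[1,3] is a slice holding the second and fourth columns.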
grep '>' fastaFile ?
perl -i.orig -ne 's/mouse/cat/g;print' *.txt
This works on multiple files (*.txt): every 'mouse' is replaced by 'cat' in each file.
It also creates a backup of each file, using the suffix given after -i (i.e. mouse.txt.orig is created from mouse.txt).
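Inside a script, the same in-place behaviour can be had by setting the special variable $^I, which holds the backup suffix that -i sets on the command line. The following is a minimal sketch under that assumption; the sub name replace_in_files is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: edit the given files in place and keep a
# .orig backup of each, much as -i.orig does on the command line.
sub replace_in_files {
    my @files = @_;
    local $^I   = '.orig';   # backup suffix, as given after -i
    local @ARGV = @files;    # the files that <> will read
    while (<>) {
        s/mouse/cat/g;
        print;               # written to the file being edited
    }
    return;
}

# Usage: perl script.pl *.txt
replace_in_files(@ARGV) if @ARGV;
```

Because $^I and @ARGV are localized, the in-place mode ends when the sub returns.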
A practical example.
perl -i.orig -ne 'if(/\A>/) {s/\A>([^\|]+\|){3}([^\|]+).+/>$2/};print' *.fasta
Try running the command on files downloaded from GenBank, and see what happens (similar to what we did in the previous section).
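As a rough preview of what to expect, the same substitution can be applied to single lines inside a script. This is only a sketch: the sub name simplify_header is hypothetical, and the short sequence line is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce a GenBank-style header to ">accession".
# Lines that do not start with '>' are returned unchanged.
sub simplify_header {
    my ($line) = @_;
    if ($line =~ /\A>/) {
        # Skip the first three '|'-terminated fields, keep the fourth
        # as $2, and drop the rest of the line. The trailing newline
        # survives because '.' does not match it.
        $line =~ s/\A>([^\|]+\|){3}([^\|]+).+/>$2/;
    }
    return $line;
}

print simplify_header(">gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]\n");
print simplify_header("ATGACCAACATCCGA\n");   # made-up sequence line
```

The header line comes out as ">AAD44166.1", and the sequence line is printed as is.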
FASTA files downloaded from GenBank usually have header (sequence name) lines like the following:
>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
The general format is:
>gi|gi-number|(gb|emb|dbj)|accession|locus
Friedl, J. E. F., 2002. Mastering Regular Expressions. O'Reilly & Associates.