Overview
perl fileName runs a script you have created.
perl -e '...code...;' runs Perl code straight from the command line.
perl -h shows you a whole list of switches. For example, -n puts a while (<>) {...} construct around whatever code you supply (the code itself is supplied with -e, hence -ne). -w turns on warnings (to help you write better code). -d turns on the built-in Perl debugger (which has a big set of its own commands, including s to step to the next line, /xx/ to search the code for xx, and much more).
#!/usr/bin/perl (or wherever perl is installed on the machine) as the first line of a script negates the need to call perl at the command line (#!perl is sufficient in Windows).
@ARGV is a magic array that holds all arguments passed through the command line (eg. "myScript.pl arg1 arg2")
open(FILEHANDLE, "actualFilePath"); -- reads existing file
open(FILEHANDLE, ">actualFilePath"); -- create file and write to it
open(FILEHANDLE, ">>actualFilePath"); -- append to existing file
open(FILEHANDLE, "| output-pipe command"); -- set up an output filter
open(FILEHANDLE, "input-pipe command |"); -- set up an input filter
STDIN and STDOUT are predefined filehandles - a program's normal input and output channels. A line read from STDIN always has a line break tacked onto its end; you can chop the result to remove the line break (chop removes the last character of whatever is fed to it; chomp removes only a trailing line break)
close(fileHandle); closes a filehandle when you are done with it.
$var = <STDIN>; reads one line of input into $var.
while($line = <fileHandle>) { ... } loops through every line of a file. Even shorter, if you drop the $line= part, then the value of each iteration is automatically assigned to $_. Even shorter, you can do: statement while (<fileHandle>)
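The three loop forms above can be sketched as follows. This is a minimal illustration, not the only way to do it: the scratch file comes from the core File::Temp module, and lexical filehandles with three-argument open are used instead of the bareword FILEHANDLE style shown earlier (both are assumptions of this example).

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small scratch file so the example is self-contained.
my ($out, $path) = tempfile(UNLINK => 1);
print $out "first\nsecond\nthird\n";
close($out);

# Form 1: explicit loop variable. Each line still carries its line break,
# so chomp it before storing.
my @lines;
open(my $in, '<', $path) or die "Cannot open $path: $!";
while (my $line = <$in>) {
    chomp $line;
    push @lines, $line;
}
close($in);

# Form 2: no loop variable, so each line lands in $_.
my $with_e = 0;
open($in, '<', $path) or die "Cannot open $path: $!";
while (<$in>) {
    $with_e++ if /e/;    # the regex is applied to $_
}
close($in);

# Form 3: statement-modifier form, the whole loop on one line.
my $count = 0;
open($in, '<', $path) or die "Cannot open $path: $!";
$count++ while (<$in>);
close($in);

print "read $count lines (@lines), $with_e contain an 'e'\n";
```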
@ARGV = (pathName);
$^I = ".bk";
while (<>) { print; }
Setting $^I turns on in-place editing: the first element in @ARGV is renamed to its existing name plus ".bk" at the end. Next, the file is read line by line into $_. Next, a filehandle called ARGVOUT is opened, and print is targeted upon it, instead of STDOUT as usual. ARGVOUT is opened on the original file name, so all changes are made in the "new" file.
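A minimal sketch of in-place editing with $^I, assuming a scratch directory from File::Temp and a made-up file name (notes.txt). The settings are wrapped in local so they do not leak into the rest of a script; that wrapping is a choice of this example, not something the notes above require.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory and a hypothetical file to edit.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/notes.txt";

open(my $fh, '>', $path) or die "Cannot create $path: $!";
print $fh "alpha\nbeta\n";
close($fh);

{
    # With $^I set, <> renames the file to notes.txt.bk, reads it line by
    # line into $_, and print writes to the new notes.txt via ARGVOUT.
    local @ARGV = ($path);
    local $^I   = '.bk';
    while (<>) {
        s/beta/gamma/;
        print;
    }
}

# Read both files back to show what happened.
open($fh, '<', $path) or die "Cannot open $path: $!";
my @edited = <$fh>;
close($fh);

open($fh, '<', "$path.bk") or die "Cannot open $path.bk: $!";
my @backup = <$fh>;
close($fh);

print "edited: @edited", "backup: @backup";
```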
The same in-place edit can be run from the command line: perl -i.bk -ne "print if /regex/" pathName. Recall that -e supplies the code, and -n adds the loop.
opendir(FILEHANDLE, "actualPath") can be used like the open function, but to open whole directories instead of files. Then you can read the directory contents with: while($file = readdir FILEHANDLE), which puts each file name into $file.
Note: perl always assumes you are in the directory the script was run from (or the directory designated by the last chdir). So, to test/access $file from within the loop, you'll need to chdir, or explicitly state the file's path before $file (eg. "actualPath/$file"), etc. This is not an issue if you are simply listing the files, but not doing anything with them.
modulus: | % |
exponentiation: | ** |
repetition: | x (right value equals number of repeats) |
concatenation: | . |
$var++ and $var-- increment and decrement $var by 1 (after it has been referenced by the = operator). Putting the operators before the $var alters its value before it is referenced.
$var .= $var2 appends $var2 to the end of $var. This works with the other operators, as well.
OR | or | lower precedence; evaluates the entire left side for truth, then (only if false) evaluates the right side |
 | || | higher precedence; binds only to the values immediately to its left and right. It evaluates the value immediately to its left, then (if false) evaluates the value immediately to its right. Any code on the left side that was NOT part of that evaluation is not forgotten, and is ultimately run against the result. This would, for example, break: open FILE, $file || print "failed"; (the error message doesn't print, even if the file does not exist, because the statement is read as open FILE, ($file || print "failed") and $file evaluates to true - it doesn't evaluate ALL of the left side) |
AND | && | higher precedence; evaluates left side, and if false, returns false without evaluating the right side |
 | and | lower precedence than &&. See OR for details. |
 | & | bitwise AND. Unlike &&, it always evaluates both sides: if the left side is false, it still evaluates the right side (even if it will ultimately return false anyway, overall). Remember, in Perl, evaluation means something could change (the value of a variable, for example) |
XOR | xor | Either/Or; returns false unless one side is true, and one side is false |
NOT | not | lower precedence |
 | ! | higher precedence |
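The precedence difference between || and or is easiest to see with a call that fails. In this sketch, try_open is a hypothetical stand-in for open on a missing file (it always reports failure), and the messages are collected in @log rather than printed; both are assumptions made to keep the example self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper standing in for a failing open(): it records what
# it was asked to open and always returns false.
my @seen;
sub try_open {
    my ($name) = @_;
    push @seen, $name;
    return 0;
}

my $file = 'missing.txt';
my @log;

# || binds tighter than the call: this is read as
# try_open($file || push(...)). $file is true, so the push never runs.
try_open $file || push(@log, 'failed (||)');

# "or" has lower precedence than the call, so the whole call is tested.
try_open $file or push(@log, 'failed (or)');

# Parentheses around the arguments make || safe as well.
try_open($file) || push(@log, 'failed (parens)');

print "$_\n" for @log;
```

Only the second and third messages end up in @log, which is why or (or explicit parentheses) is the usual choice after open.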
== is "equal", != is "not equal", < is "less than", > is "greater than", <= is "less than or equal", <=> is "0 if equal, 1 if former is greater, -1 if latter is greater" - this operator should be used to compare numeric values. For strings, use cmp (which serves the same purpose).
-e fileName | file exists |
-r fileName | file is readable |
-w fileName | file is writeable |
-d fileName | file is a directory |
-f fileName | file is a regular file |
-T fileName | file is a text file |
Strings need the string comparison operators: "A" == "B" evaluates to true (as numbers, both strings are 0), "A" eq "B" evaluates to false; same for > (gt) and < (lt).
next skips to the next loop iteration. last (or last if ...) skips to the end of the block, as if the loop condition had returned false.
LOOPNAME: while (...) {...} - labels a loop; you can then last, next, and redo that loop specifically (in multilevel loops) - eg. last LOOPNAME if ...;
$! is a predefined variable: it contains the most recent error message (as reported by the operating system)
CTRL-C will interrupt a Perl program
*var = *var2 results in everything named var being a synonym for everything named var2 ($var, @var, %var, and so on). You can also assign a reference to the glob: *var = \$var2 makes only the scalar $var a synonym for $var2, not also @var and %var.
$var = *STDOUT stores the filehandle's glob in a scalar, so that it can be passed around
if ($var =~ /someText/ ) returns true if someText exists within $var
s/text1/text2/ replaces the first occurrence of text1 with text2 (add the g modifier, s/text1/text2/g, to replace all occurrences). This can also be written as s{text1}{text2}
\s represents a single whitespace character, \w represents a single word character (letter, digit, or underscore), \d represents a digit, and \b represents a word boundary (an imaginary position, 0 chars wide) - this enables you to find "Fred" as opposed to "Frederick". \w+ represents a whole word.
Curly braces (eg. /\d{7,11}/) match the preceding character if it repeats between the minimum and maximum number of times given. Omit the maximum for no max. Omit the comma and max to match exactly the first number of times.
* next to a character matches zero or more of the character. ? matches zero or one.
[^...] negates a character class: it matches any character that is NOT listed to the right of the ^
$#aryName retrieves the last index of an array. Assigning to $#aryName changes the length of the array.
scalar(@aryName) returns the number of elements in an array.
print @aryName writes all the items in the array, in one long string; print "@aryName" writes all items, but separates each with a single space.
@aryName[3..5] represents indexes 3 thru 5
$aryName[indexNum] = strValue;
@aryName = qw(stringNoQuotes1 stringNoQuotes2 etc) inserts values (as strings) into @aryName, split by space(s)
push(@aryName, value1, value2, etc...) | adds value(s) to end of array |
pop(@aryName) | removes and returns value from end of array |
shift(@aryName) | removes and returns value from beginning of array |
unshift(@aryName, value1, etc...) | adds value(s) to beginning of array |
splice(@aryName, offset, iRem, values...) | deletes iRem values from array, starting at offset, and replaces them with values (which are optional) |
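The array functions in the table can be traced step by step. This sketch uses made-up single-letter values; the comments show the array contents after each call.

```perl
use strict;
use warnings;

my @letters = qw(b c d);

push(@letters, 'e', 'f');          # b c d e f
my $popped = pop(@letters);        # returns 'f', leaves b c d e
unshift(@letters, 'a');            # a b c d e
my $shifted = shift(@letters);     # returns 'a', leaves b c d e

# Remove 2 elements starting at index 1 and put 3 new ones in their place.
my @removed = splice(@letters, 1, 2, 'X', 'Y', 'Z');   # leaves b X Y Z e

my $count      = scalar(@letters);   # number of elements
my $last_index = $#letters;          # index of the last element
my @slice      = @letters[1 .. 3];   # elements at indexes 1 thru 3

print "array: @letters\n";
print "removed: @removed, count: $count, last index: $last_index\n";
print "slice: @slice\n";
```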
%aryName = (
"key1" => "value1",
"key2" => "value2",
);
%aryName = ('key1', 'value1', 'key2', 'value2', etc...);
$aryName{'key1'} = 'value2'; (*Note the use of {} instead of [] as in regular arrays)
delete $aryName{'keyName'}
someCode keys %aryName
someCode values %aryName
someCode @aryName{'key1', 'key2'}
someCode scalar(keys %aryName) - returns # of elements
someCode exists $aryName{'key1'} - boolean
foreach (keys %aryName) {print "key $_ contains $aryName{$_}"; }, or:
while (($key, $value)=each %aryName) {print "key $key contains $value"; }
foreach(sort keys %aryName) and (reverse sort keys %aryName) loop through the keys in sorted and reverse-sorted order
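The hash operations above, put together in one small sketch. The hash name %age and its contents are invented for the example.

```perl
use strict;
use warnings;

my %age = (
    fred   => 35,
    wilma  => 33,
    barney => 34,
);

$age{betty} = 32;           # add (or overwrite) a key
delete $age{barney};        # remove a key and its value

my $size      = scalar(keys %age);            # number of pairs
my $has_fred  = exists $age{fred}   ? 1 : 0;  # key present?
my $has_gone  = exists $age{barney} ? 1 : 0;  # deleted above
my @two       = @age{'fred', 'wilma'};        # hash slice: several values at once

# keys come back in no particular order, so sort them for stable output.
my @report;
foreach (sort keys %age) {
    push @report, "key $_ contains $age{$_}";
}
my @backwards = reverse sort keys %age;

print "$_\n" for @report;
print "size $size, slice @two, reversed keys @backwards\n";
```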
undef $varName - as opposed to just setting it equal to "";
\n = newline, \r = carriage return, \t = tab, \f = form feed, \b = backspace, \a = alert (bell), \e = escape, \cC = control-C
\u = uppercase next char, \l = lowercase next char, \U = uppercase all following chars, \L = lowercase all following chars, \Q = backslash all following non-alphanumeric chars
$var = @var puts the number of elements in @var into $var; assigning a literal list instead (eg. $var = (4, 5, 6)) puts the last value of the list in $var. A list can contain other lists as well: eg. (@foo, @bar, &someSub)
The empty list is written (). You can destroy an entire array by setting it equal to ()
Environment variables are held in %ENV. You can see the value of every environment variable simply by calling set from the command prompt.
$var = `someCommand $someVar` will interpolate $someVar, then run the enclosed system command as a whole through the shell. You can run any linux command this way. Using backticks will return the entire output of the process invoked, not just a success or error code. This can also be accomplished with qx/someCommand/ or qx(someCommand). Other ways to call system commands include:
open HANDLE, "someCommand |"
while (<HANDLE>) {print "$. $_";}
- reads the command's output line by line. Putting the pipe before someCommand instead causes data to be piped to the process.
exec list - this ends execution of the Perl script and runs whatever is found in list. If it fails, it instead returns false and the script carries on. This apparently doesn't work in Windows
system("commandName") - runs commandName, then carries on with the script, returning 0 for success or an error code (which must be divided by 256 to be understood, because the command's exit status is stored in the high byte of the return value), unlike backticks.
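A sketch of the difference between backticks and system. To avoid depending on any particular external program, the command being run here is perl itself, located through the $^X variable; that choice, and the tiny one-line scripts passed to it, are assumptions of this example.

```perl
use strict;
use warnings;

# $^X holds the path of the perl binary running this script, so it is a
# command that is known to exist on this machine.
my $perl = $^X;

# Backticks (or qx) capture whatever the command prints.
my $output = `"$perl" -e "print 6*7"`;
my $same   = qx{"$perl" -e "print 6*7"};

# system returns the raw wait status; the command's exit status sits in
# the high byte, so shift right by 8 bits (the same as dividing by 256).
my $status = system($perl, '-e', 'exit 3');
my $exit   = $status >> 8;

print "captured: $output\n";
print "exit status: $exit\n";
```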
<FILEHANDLE>
yields the next line from the associated file (ending with newline char). STDIN, STDOUT, and STDERR are the most common examples (and are pre-opened). Assigning <FILEHANDLE> to an array (as opposed to a scalar) puts every line of the file into that array, one line per element.
while (<>) {...} processes each argument (which should be a filename) passed to the script. <> reads from the files named in @ARGV: each file is opened in turn (as if it was a filehandle) and read line by line within the loop.
@aryName = <*something> creates a list of every file in the directory matching something. Similarly, while (<*>) {...} processes all files within the working directory (which can be changed with chdir) - this method is not as efficient as using opendir/readdir
/PATTERN/ - the match operator
?PATTERN? - like the match operator, but only matches once
The =~ operator points a regex at a variable other than $_. It is useful, among other things, as the operator in a condition that invokes a regex, ie: if ($varName =~ /pattern/) {...}. This can also be used with a while loop (and global modifier) - useful to make things happen every time a match occurs. !~ is the negated form: it is true when the pattern does not match.
m// is the explicit match operator, s/// is the substitution operator. A substitution will operate upon $_ unless a target is specified as follows: $varName =~ s///
The results of a match can be assigned (with =) to a variable: @aryName = m//
The explicit match operator, m, can be used with ANY character as regex delimiter; for example, you could run a match with m## or m%% instead of m// - this is good for readability if your pattern has lots of forward slashes in it.
m// conditionalOperator statement
The metacharacters are: \ | ( ) [ { ^ $ * + ? .
| is the equivalent of "OR". Putting parentheses around part of a pattern allows for sub-processing of it, eg. /(Fred|Wilma|Barney) Flintstone/ This also stores the match made by the subpattern in a backreference: \1, \2, etc..., depending on how many groupings came before it. That backreference can be used in a following pattern within the same regex. In addition, its value is also stored in the corresponding variable $1, $2, etc... - useful outside of the regex. Backreferences are good, for example, if you need to find an HTML tag and its corresponding close-tag. To prevent the creation of backreferences (in case you simply don't need them, you just want a subpattern), use ?: in the following manner: (?:somePattern)
You can use a variable within a pattern (eg. $foo), but be aware that doing so slows down the process considerably. The pattern has to recompile each time through in case the variable changed (unless you use the o modifier). You can even determine one variable name through another: eg, ${$varName}
null will match at the leftmost position in the string. Rule 2: the engine tries alternatives in a pattern from left to right.
^ | Matches at the beginning of the string (or line, if /m is used) |
$ | Matches at the end of the string (or line, if /m is used) |
\b | Matches at word boundary |
\B | Matches except at word boundary |
\A | Matches at the beginning of the string |
\Z | Matches at the end of the string |
\G | Matches where previous m//g left off |
(?=...) | Matches if engine would match ... next [this is a sub-assertion] |
(?!...) | Matches if engine wouldn't match ... next [this is a sub-assertion] |
greedy | minimal | meaning |
{n,m} | {n,m}? | Must occur at least n times but no more than m times |
{n,} | {n,}? | Must occur at least n times |
{n} | {n}? | Must match exactly n times |
* | *? | 0 or more times |
+ | +? | 1 or more times |
? | ?? | 0 or 1 time |
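Each quantifier in the table has a second form with a trailing ?, which matches as little as possible instead of as much as possible. A small sketch of the difference, using made-up sample strings:

```perl
use strict;
use warnings;

my $text = '<b>bold</b> and <i>italic</i>';

# Greedy: .+ grabs as much as it can, so the match runs to the LAST '>'.
my ($greedy) = $text =~ /<(.+)>/;

# Minimal: .+? grabs as little as it can, so the match stops at the FIRST '>'.
my ($minimal) = $text =~ /<(.+?)>/;

# The same applies to counted ranges.
my ($most_digits)  = 'id 1234567' =~ /(\d{2,4})/;    # takes 4 digits
my ($least_digits) = 'id 1234567' =~ /(\d{2,4}?)/;   # takes 2 digits

print "greedy:  $greedy\n";
print "minimal: $minimal\n";
print "digits:  $most_digits vs $least_digits\n";
```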
"." matches any character except newline (unless you use the /s modifier - then it matches newline as well). Such forward-slash modifiers come at the end of the pattern slashes; eg. s/pattern1/pattern2/s. The period, in conjunction with the asterisk (.*), matches anything at all, like % in SQL.
Character classes are enclosed in square brackets: [abc] or [a-c]. Use a backslash to protect a hyphen that would otherwise be interpreted as a range delimiter (however, hyphens appearing as the first or last character within the brackets are interpreted literally - they don't need a backslash).
\a | alarm (beep) |
\n | newline |
\r | carriage return |
\t | tab |
\f | formfeed |
\e | escape |
\d | digit |
\D | non-digit |
\w | word char: letter, digit, or underscore (add a + to the end to match an entire word); equals [a-zA-Z0-9_] |
\W | non-word char |
\s | whitespace char (same as [ \t\n\r\f]) - that is, space, tab, newline, carriage return, or formfeed |
\S | non-whitespace char |
Within a pattern, you can refer back to a parenthesized subpattern with \1, \2, etc... Again, this doesn't just repeat that pattern, it references the actual value matched by that pattern when it ran. If your substitution was overwriting a value you wanted to preserve, you could use this to put it right back (within the replacement string). To avoid saving a backreference, use (?: ...)
You can backreference outside of a pattern with $1, $2, etc... - scope extends to the end of the enclosing block or eval string, or to the next successful pattern match (whichever comes first).
Use $+ to return whatever the last bracket match matched, $& to return the entire matched string, $` to return everything before the matched string, and $' to
return everything after the matched string.
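A minimal sketch of the capture and match variables, using an invented sample line. The variable names are arbitrary; the values in the comments follow from how the pattern lines up against the sample text.

```perl
use strict;
use warnings;

my $line = 'Name: Fred Flintstone, age 35';
my ($first, $last, $age, $whole, $before, $after, $last_group);

if ($line =~ /(\w+) (\w+), age (\d+)/) {
    ($first, $last, $age) = ($1, $2, $3);   # the three parenthesized groups
    $whole      = $&;    # the entire matched string
    $before     = $`;    # everything before the match
    $after      = $';    # everything after the match (empty here)
    $last_group = $+;    # whatever the last group matched
}

# \1 inside the pattern refers to what group 1 actually matched,
# which makes it easy to spot a doubled word.
my ($doubled) = 'this is is a test' =~ /\b(\w+) \1\b/;

# $1 and $2 in the replacement put captured text back, here in swapped order.
(my $swapped = 'one two three') =~ s/^([^ ]+) +([^ ]+)/$2 $1/;

print "$first / $last / $age\n";
print "before: '$before', whole: '$whole', after: '$after'\n";
print "doubled word: $doubled, swapped: $swapped\n";
```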
A backslash followed by an octal number (eg. \033) matches the character with that value, unless a backreference with that number exists. This also works with hexadecimal values (eg. \x1B).
s/^([^ ]+) +([^ ]+)/$2 $1/; # swap the first two words
/(\w+)\s*=\s*\1/; # match "foo = foo"
/.{80,}/; # match line of at least 80 chars
if (/Time: (..):(..):(..)/) { # pull fields out of a line
$hours = $1;
$minutes = $2;
$seconds = $3;
}
/^fee|fie|foe$/ matches "fee" at the beginning of the string, or "fie" anywhere, or "foe" at the end. /^(fee|fie|foe)$/ matches a string consisting solely of "fee", "fie", or "foe".
(?...) represents a regex extension
(?# someText) is used to comment your patterns (a simple # is sufficient if you've enabled the /x switch)
(?:...) groups a pattern, but prevents the saving of a backreference
(?=...) positive lookahead assertion (a zero-width subpattern); succeeds if ... would match next. Note: this assertion should come at the end of your pattern, not before.
(?!...) negative lookahead assertion (a zero-width subpattern); succeeds if ... would NOT match next. Note: this assertion should come at the end of your pattern, not before.
Modifiers come after the closing delimiter (eg. m/somePattern/someModifier):
g | match globally (find all occurrences); this returns a list of matches (in a list context) or returns true for every match until it finally returns false (in a scalar context) |
i | case-insensitive |
m | treat string as multiple lines (^ and $ match at every line) |
o | only compile pattern once |
s | treat string as single line (. also matches \n) |
x | use extended regular expressions (ignore whitespace and # comments in the pattern, so that you can make it more readable) |
e | only applicable to substitutions; tells the engine to treat the text in the replacement position (eg. s/PATTERN/REPLACEMENT/e) as an expression, rather than simple text. e modifiers can be stacked - the more you use, the more passes a regex will make (and, proportionally, the more evaluations) |
tr/SEARCHLIST/REPLACELIST/ doesn't use regular expressions; it scans a string character by character and replaces each character in SEARCHLIST with the corresponding character in REPLACELIST, then returns the number of characters replaced or deleted. This is actually faster than a regex, so it's good to use it when possible. Its modifiers are:
c - search for every character NOT in SEARCHLIST
d - delete characters specified in SEARCHLIST (and found), but not partnered by an item in REPLACELIST. If this is not done, the last character in REPLACELIST will be used to replace every un-partnered character in SEARCHLIST
s - sequences of characters that were translated to the same character are reduced to a single instance
tr/a-zA-Z//s changes "bookkepper" to "bokeper"
tr/a-zA-Z/ /cs changes non-alphas to a single space
-r | file is readable by effective uid/gid (in caps, by real uid/gid) |
-w | file is writable by effective uid/gid (in caps, by real uid/gid) |
-x | file is executable by effective uid/gid (in caps, by real uid/gid) |
-o | file is owned by effective uid/gid (in caps, by real uid/gid) |
-e | file exists |
-z | file has zero size |
-s | file has non-zero size |
-f | file is a plain file |
-d | file is a directory |
-l | file is a symbolic link |
-p | file is a named pipe |
-S | file is a socket |
-b | file is a block special file |
-c | file is a character special file |
-t | filehandle is opened to a tty |
-T | file is a text file |
-B | file is a binary file (opposite of -T) |
-M | age of file (at script startup) in days since modification; returned in days (including fractional) |
-A | age of file (at script startup) in days since last access; returned in days (including fractional) |
-C | age of file (at script startup) in days since inode change; returned in days (including fractional) |
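A few of the file tests from the table, tried against a scratch file. The directory comes from File::Temp and the file names (sample.txt, absent) are made up for the example; each test result is reduced to 1 or 0 so it prints cleanly.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory with one small text file in it.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/sample.txt";
open(my $fh, '>', $file) or die "Cannot create $file: $!";
print $fh "plain text\n";
close($fh);

my %is = (
    dir_is_directory => (-d $dir)          ? 1 : 0,
    file_exists      => (-e $file)         ? 1 : 0,
    file_is_plain    => (-f $file)         ? 1 : 0,
    file_has_size    => (-s $file)         ? 1 : 0,
    file_is_empty    => (-z $file)         ? 1 : 0,
    missing_exists   => (-e "$dir/absent") ? 1 : 0,
    # _ reuses the information gathered by the previous test,
    # so the file is only looked up once.
    plain_and_text   => (-f $file && -T _) ? 1 : 0,
);

print "$_: $is{$_}\n" for sort keys %is;
```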
When using -T, it is good to test with -f first (eg, next unless -f $file && -T _;)
.. returns a list of values (counting by ones) from the left value to the right value - useful for loops. In a scalar context, .. returns a boolean value. That value is false as long as the left operand is false. Once the left operand is true, the value will be true until the right operand is true
Expression ? If_True_Then : If_False_Then
A statement can be followed by a modifier: if expr, or expr, unless expr, while expr, and until expr
for ($i = 1; $i < 10; $i++) { ... }
foreach $varName (@aryName) { ... }
Declare variables with my, which will limit the scope to the block, subroutine, eval, or file in question. A block is simply a block of code enclosed in solitary curly braces - useful for limiting scope. Declare multiple variables this way by enclosing them in parentheses: eg, my ($var1, $var2); Declare them with local to, at least, make the variable(s) accessible to other subroutines/functions called from within the block in question; otherwise, they won't even be available there. local is also the only way to limit the scope of global special variables.
use integer; tells the compiler that it may use integer operations till the end of the enclosing block.
sub subName(LIST) {...}; (LIST) is not required. Passing in an array as an argument turns the array into a flat list of scalars. The arguments passed into a sub can be accessed through the local @_ variable. Another way of calling subroutines includes prefixing the subName with &, which is mostly optional, but makes a difference within a subroutine itself (calling &subName with no argument list automatically passes along the current @_, subName by itself does not). Subroutines can return more than one value; for example:
($rVar1, $rVar2) = &subName(...);
return ($var1, $var2);
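A sketch of the points above: arguments arriving as one flat list in @_, a sub returning more than one value, and the difference the & prefix makes. All sub names here (min_max, count_args, forwarder, no_forward) are hypothetical.

```perl
use strict;
use warnings;

# Returns two values. Arrays passed in are flattened into one list in @_.
sub min_max {
    my @nums = @_;
    my ($min, $max) = ($nums[0], $nums[0]);
    for my $n (@nums) {
        $min = $n if $n < $min;
        $max = $n if $n > $max;
    }
    return ($min, $max);
}

my @first  = (4, 9);
my @second = (1, 7);
my ($lowest, $highest) = min_max(@first, @second, 5);

# How many arguments did this sub receive?
sub count_args { return scalar(@_); }

# &count_args with no parentheses hands the caller's @_ straight through...
sub forwarder  { return &count_args; }

# ...while a plain call with empty parentheses passes nothing.
sub no_forward { return count_args(); }

my $forwarded     = forwarder('x', 'y', 'z');
my $not_forwarded = no_forward('x', 'y', 'z');

print "min $lowest, max $highest\n";
print "forwarded $forwarded args, otherwise $not_forwarded\n";
```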
$_ - the default input and pattern-searching space ($ARG)
$. - current input line number of the last filehandle read ($INPUT_LINE_NUMBER, $NR)
$/ - input record separator; newline by default ($INPUT_RECORD_SEPARATOR, $RS)
$! - contains the current error number, or the error message when used as a string ($OS_ERROR, $ERRNO)
$` - the string preceding whatever was matched by the last successful pattern match
$' - the string following whatever was matched by the last successful pattern match
$, - defines what Perl should put in between elements of lists that are printed
use moduleName;
To create your own module, start in Perl's lib directory. Within lib, make a directory for your module. Save the script into that directory as scriptName.pm (perl module). If the script is simply one or more subroutines, it should end with: 1; which is necessary because all Perl modules must return true. To use the module, just call it with: use dirName::scriptName; then, you can call any subroutine within the module.
File::Find - enables use of the find function, which takes two params: a subroutine that determines what to do with each file found, and a list of directories to be searched. The filenames found are stored in $_ (within your subroutine)
strict - a pragma which requires all variables, references, and subroutines be explicitly declared
$someVar = Win32::ChangeNotify->new($pathToWatch, $subTrees, $events);
while (1) {
$someVar->wait;
...
$someVar->reset;
}
$events contains a predefined string saying what sort of event to watch for (for example, "FILE_NAME", which covers filename changes). The wait method (within the loop) causes the program to pause until the directory is changed somehow. The reset method does what it says after a change has occurred, and the code between wait and reset has run.
Functions
abs - returns absolute value of supplied argument (or $_, if no argument)
alarm int - fires an alarm signal; int is the number of seconds from now for it to go off
chdir directory - changes Perl's working directory (you should add ...or die "Error"; when using chdir)
join delimiter, @aryName - joins the elements of @aryName into one string, separated by delimiter. A comma-delimited list of strings or scalar variables can be supplied to join in place of @aryName
grep expression, @aryName - returns a new array containing the elements of @aryName that match expression. If expression is not a simple regex (say, a statement including conditional logic), it should be enclosed in {} - in which case, a comma is not necessary after it. If expression is not a match condition at all, but an action (say, chop), the chopped remains of elements will be returned. Expression can also be a subroutine call
int(value) - casts value as an integer (no decimal points)
lc (strValue) - makes strValue lowercase
map expression, @aryName - like grep, it returns an array, but that array contains the result of evaluating expression against each element of @aryName, rather than the matching elements themselves. If expression is not a match condition at all, but an action (say, chop), the letters that were chopped will be returned. Expression can also be a subroutine call. If you want map to return values rather than results, you can add ;$_ to the end of expression (within the curly braces), which results in $_ being the last thing evaluated.
printf FILEHANDLE list - outputs a formatted string to FILEHANDLE, or to STDOUT if FILEHANDLE is omitted. To simply return, rather than output, use sprintf. The first item in list is a string that indicates how to format the rest of the items. That string should be formatted as follows: %m.nx - where m and n are optional sizes, and x is one of the following:
c - character
d - decimal integer
e - exponential format
f - fixed point format
s - string
x - hexadecimal number
X - hex num with uppercase letters
split delimiter, $varName, numElements - turns $varName into a list, sliced by delimiter (a string value or regex). The target can be a scalar (in which case the number of elements created is returned), or an array (in which case the elements are plugged into it), or a list of scalars and/or arrays (in which case as many elements as there are room for get plugged in; the rest are discarded). In the latter case, the target list should be enclosed in parentheses. The last parameter, numElements, sets the maximum number of return elements.
sort subname list - sorts list (array or list of scalars) by standard string comparison, unless you use subname (note: no comma after it), which should name a subroutine that returns 1, 0, or -1, based on comparisons of $a and $b, which should *not* be modified within the subroutine - just referenced.
srand - initializes the random number generator. After this, you can call rand, which will either generate a number between 0 and 1, or between 0 and whatever number is fed to it as a parameter
References and Nested Data Structures
$someVar = \$someOtherVar;
$someConstant = \someOtherConstant;
$someSub = \&someOtherSub;
To access the value of a referenced item, you need to "dereference" it. Otherwise, the reference you create is nothing more than a pointer at the value's space in memory, which is great for passing it around, but useless once you actually need the value itself. For example:
$var = $$var2;
&$someSub(1,2,3);
$$arrayref[0] = "January";
References can point at other references: after $refrefref = \\\"hello"; the string "hello" can later be retrieved with $$$$refrefref
The arrow operator -> is another way to dereference. For example, instead of: $$arrayref[0] = "January", you can do: $arrayref->[0] = "January"
A scalar that holds a variable's name (rather than a reference) can be dereferenced too:
$var = "hey";
$$var = 1; # $hey is set to equal 1
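The reference and dereference forms above, gathered into one sketch that runs under use strict. The names (@months, add, %data) are invented for illustration. Name-based references like the $hey example are left out because strict refs disallows them.

```perl
use strict;
use warnings;

# Reference to an array, dereferenced two ways.
my @months   = ('Jan', 'Feb');
my $arrayref = \@months;
$$arrayref[0]  = 'January';     # prefix form
$arrayref->[1] = 'February';    # arrow form, same effect on element 1

# Reference to a scalar.
my $count    = 10;
my $countref = \$count;
$$countref++;                   # changes $count itself

# Reference to a subroutine, called two ways.
sub add { return $_[0] + $_[1]; }
my $subref = \&add;
my $sum_a  = &$subref(1, 2);
my $sum_b  = $subref->(3, 4);

# Three levels of reference need four dollar signs to reach the string.
my $refrefref = \\\"hello";
my $greeting  = $$$$refrefref;

# Nested data structure: a hash whose values are array and hash references.
my %data = (
    list  => [1, 2, 3],
    names => { first => 'Fred' },
);
my $second = $data{list}->[1];
my $name   = $data{names}->{first};

print "@months, count $count, sums $sum_a and $sum_b\n";
print "$greeting, $second, $name\n";
```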