AutoTutor Authoring Tool -


Natural Language Processing
Vasile Rus

Outline
- Introduction to Perl

About Perl
- 1987: Larry Wall develops Perl.
- October 18, 1989: Perl 3.0 is released under the GNU General Public License (GPL).
- March 21, 1991: Perl 4.0 is released under the GPL and the new Perl Artistic License.
- Now: Perl 6 (since renamed Raku; Perl 5 remains the maintained Perl line).
- Perl was not conceived as a full programming language per se: Wall's original intent was to develop a scripting language more powerful than Unix shell scripting, but not as tedious as C.
- Perl is an interpreted language: there is no explicitly separate compilation step. Rather, the interpreter reads the whole file, converts it to an internal form, and executes it immediately.
- P.E.R.L. = Practical Extraction and Report Language

Recommended Readings
- Learning Perl by Randal L. Schwartz and Tom Phoenix, ISBN 0-596-00132-0, O'Reilly (the "Llama book")
- Online: (OBSOLETE; check the class website for a local copy)

Taste of Perl
Problem: substitute text in multiple files

    perl -p -i.bak -e 's/andrew/andy/gi' *.txt

- g: globally (replace every occurrence, not only the first)
- i: ignore case
- -p: wraps a read-and-print loop around your -e code
- -i.bak: edits each .txt file in place and keeps the original as a .bak backup

Taste of Perl
A procedural equivalent for the same problem:

    $original = 'andrew';
    $replacement = "andy";
    $nchanges = 0;
    # No record separator: the default "\n" is not used,
    # so an entire file is read at once below as one record.
    undef $/;
    foreach $file (@ARGV) {
        if (! open(INPUT, "<$file")) {
            print STDERR "Can't open input file $file\n";
            next;
        }
        # Read input file as one long record.
        $data = <INPUT>;
        close INPUT;
        if ($data =~ s/$original/$replacement/gi) {
            $bakfile = "$file.bak";
            # Abort if can't backup original or output.
            # $! contains the errno value or the error string.
            if (! rename($file, $bakfile)) {
                die "Can't rename $file $!";
            }
            if (! open(OUTPUT, ">$file")) {
                die "Can't open output file $file\n";
            }
            print OUTPUT $data;
            close OUTPUT;
            print STDERR "$file changed\n";
            $nchanges++;
        } else {
            print STDERR "$file not changed\n";
        }
    }
    print STDERR "$nchanges files changed.\n";
    exit(0);

Variables
A variable is a name of a place where some information is stored. For example:

    $yearOfBirth = 1976;
    $currentYear = 2000;
    $age = $currentYear - $yearOfBirth;
    print $age;

The variables in the example program can be identified as such because their names start with a dollar sign ($). Perl uses different prefix characters for different kinds of names in programs. Here is an overview:
- $: variable containing a scalar value such as a number or a string
- @: variable containing a list with numeric keys (an array)
- %: variable containing a list with strings as keys (a hash)
- &: subroutine
- *: matches all structures with the associated name
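A minimal sketch of the prefix characters in use. The variable and subroutine names are made up for illustration; the years and version numbers are the ones from the About Perl slide. Note that a single element of an array or hash is a scalar, so it is accessed with $.

```perl
use strict;
use warnings;

my $language      = "Perl";                         # $ : scalar
my @release_years = (1987, 1989, 1991);             # @ : array (numeric keys)
my %version_for   = (1989 => "3.0", 1991 => "4.0"); # % : hash (string keys)

# & : subroutine (the & is optional when calling with parentheses)
sub first_year { return $release_years[0]; }        # one element is a scalar: $

my $summary = "$language: first year " . first_year()
            . ", version in 1991: $version_for{1991}";
print "$summary\n";
```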

Operations on numbers
Perl contains the following arithmetic operators:
- +: sum
- -: subtraction
- *: product
- /: division
- %: modulo division
- **: exponent
Apart from these operators, Perl contains some built-in arithmetic functions. Some of these are mentioned in the following list:
- abs($x): absolute value
- int($x): integer part
- rand(): random number between 0 and 1 (0 included, 1 excluded)
- sqrt($x): square root

Input and output

    # age calculator
    print "Please enter your birth year ";
    $yearOfBirth = <>;
    chomp($yearOfBirth);
    print "Your age is ", 2007 - $yearOfBirth, ".\n";

    # count the number of lines in a file
    open(INPUTFILE, "<$myfile") || die "Could not open the file $myfile\n";
    $count = 0;
    while ($line = <INPUTFILE>) { $count++; }
    print "$count lines in file $myfile\n";

    # open for writing
    open OUTPUTFILE, ">$myfile";

Conditional structures

    # determine whether number is odd or even
    print "Enter number: ";
    $number = <>;
    chomp($number);
    if ($number - 2*int($number/2) == 0) {
        print "$number is even\n";
    } elsif (abs($number - 2*int($number/2)) == 1) {
        print "$number is odd\n";
    } else {
        print "Something strange has happened!\n";
    }

Numeric test operators
An overview of the numeric test operators:
- ==: equal
- !=: not equal
- <: less than
- <=: less than or equal to
- >: greater than
- >=: greater than or equal to
All these operators can be used for comparing two numeric values in an if condition.
Truth expressions can be combined with three logical operators:
- and: and (alternative: &&)
- or: or (alternative: ||)
- not: not (alternative: !)
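As a rough illustration of how the numeric tests combine with the logical operators, the sketch below labels a number. The subroutine name classify and the threshold of 10 are assumptions chosen for the example, and it uses the % operator from the arithmetic list instead of the int() trick shown earlier.

```perl
use strict;
use warnings;

# Hypothetical helper: label a number using numeric tests and logical operators.
sub classify {
    my ($n) = @_;
    return "negative"   if $n < 0;
    return "small even" if $n <= 10 and $n % 2 == 0;   # word form: and
    return "small odd"  if $n <= 10 && $n % 2 != 0;    # symbol form: &&
    return "large";
}

print "$_ is ", classify($_), "\n" foreach (-3, 4, 7, 25);
```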

Iterative structures

    # print numbers 1-10 in three different ways
    $i = 1;
    while ($i <= 10) { print "$i\n"; $i++; }

    for ($i = 1; $i <= 10; $i++) { print "$i\n"; }

    foreach $i (1,2,3,4,5,6,7,8,9,10) { print "$i\n"; }

Stop a loop, or force continuation:

    last;   # like break in C
    next;   # like continue in C

Exercise: Read ten numbers and print the largest, the smallest and a count representing how many of them are divisible by three. The body of the reading loop could contain:

    if (not(defined($largest)) or $number > $largest) { $largest = $number; }
    if (not(defined($smallest)) or $number < $smallest) { $smallest = $number; }
    if ($number - 3*int($number/3) == 0) { $count3++; }

A parenthesis: Perl philosophy(ies)
- There is more than one way to do it.
- If you want to shoot yourself in the foot, who am I to stop you?
And a comment: DO write comments in your Perl programs!

Basic string operations
- Strings are stored in the same type of variables we use for storing numbers.
- String values can be specified between double and single quotes. !!! Between double quotes variables are interpolated (evaluated); between single quotes they are not.
Comparison operators for strings:
- eq: equal
- ne: not equal
- lt: less than
- le: less than or equal to
- gt: greater than
- ge: greater than or equal to
- Example: if ($a eq $b) { ... }

String substitution and string matching
The power of Perl!
- The s/// operator modifies sequences of characters.
- The tr/// operator changes individual characters.
- The m/// operator checks for matching (or, in short, //).
For s/// and tr///:
- the first part, between the first two slashes, contains a search pattern
- the second part, between the final two slashes, contains the replacement
- behind the final slash we can put characters (options) to modify the behavior of the commands
By default s/// only replaces the first occurrence of the search pattern:
- append a g to the operator to replace every occurrence
- append an i to the operator to make the search case insensitive
The tr/// operator allows these modifiers:
- c (replace the complement of the search class)
- d (delete characters of the search class that are not replaced)
- s (squeeze sequences of identical replaced characters to one character)
The =~ operator binds a string to a pattern match, substitution or transliteration.
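A small sketch of m// and s/// bound to a string with =~. The sample sentence and variable names are invented for illustration. The substitutions are applied to copies so that the original string stays available for comparison.

```perl
use strict;
use warnings;

my $text = "The Bug bit the bug";

# m// (or just //) only tests for a match; =~ binds it to $text.
my $has_bug    = ($text =~ m/bug/)      ? "yes" : "no";  # lower-case "bug" occurs
my $starts_bug = ($text =~ /^bug/)      ? "yes" : "no";  # but not at the beginning
my $starts_the = ($text =~ /^the bug/i) ? "yes" : "no";  # i ignores case

# s/// applied to copies, so $text itself is not modified.
(my $first_only = $text) =~ s/bug/feature/;    # first case-sensitive match only
(my $all_cases  = $text) =~ s/bug/feature/gi;  # every match, any case

print "$has_bug $starts_bug $starts_the\n";
print "$first_only\n$all_cases\n";
```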

Examples

    # replace first occurrence of "bug"
    $text =~ s/bug/feature/;
    # replace all occurrences of "bug"
    $text =~ s/bug/feature/g;
    # convert to lower case (tr takes character ranges, not bracketed classes)
    $text =~ tr/A-Z/a-z/;
    # delete vowels
    $text =~ tr/AEIOUaeiou//d;
    # replace nonnumber sequences with x
    $text =~ tr/0-9/x/cs;
    # replace all capital characters by CAPS
    $text =~ s/[A-Z]/CAPS/g;

Grep with Perl
Simple example: print all lines from a file that include a given sequence of characters (emulate grep behavior).
Regular expression: a template for strings. In Perl a regular expression is like a predicate: it returns true if a string matches the regular expression, and false otherwise.

    $_ = "pattern matching";
    if ( /pattern/ ) {   # equivalent to ($_ =~ /pattern/)
        print "matching";
    } else {
        print "not matching";
    }

Regular expressions
- \b: word boundaries
- \d: digits
- \n: newline
- \r: carriage return
- \s: white space characters
- \t: tab
- \w: alphanumeric characters (letters, digits and underscore)
- ^: beginning of string
- $: end of string
- .: any character (except newline, by default)
- [bdkp]: characters b, d, k and p
- [a-f]: characters a to f
- [^a-f]: all characters except a to f
- (abc|def): string abc or string def (parentheses indicate alternatives and act as memory: the matched strings are stored in \1, \2, ... inside the pattern and in $1, $2, ... after the match)
Examples:
1. Clean an HTML formatted text
2. Grab URLs from a Web page
3. Transform all lines from a file into lower case

More on Regular Expressions
- *: zero or more times
- +: one or more times
- ?: zero or one time
- {p,q}: at least p times and at most q times
- {p,}: at least p times
- {p}: exactly p times

Lists and Arrays
- Scalar: elementary data, numbers or strings (in Perl)

- List = ordered collection of scalars
- Array = variable that contains a list
- Each element of a list or array is an independent scalar

Lists and Arrays

    @a = ();                              # empty list
    @b = (1,2,3);                         # three numbers
    @c = ("Jan","Piet","Marie");          # three strings
    @d = ("Dirk",1.92,46,"20-03-1977");   # a mixed list

Variables and sublists are interpolated in a list:

    @b = ($a,$a+1,$a+2);                    # variable interpolation, assume $a = 1
    @c = ("Jan",("Piet","Marie"));          # list interpolation
    @d = ("Dirk",1.92,46,(),"20-03-1977");  # empty list interpolation
    @e = ( @b, @c );                        # same as (1,2,3,"Jan","Piet","Marie")

Practical construction operators
- the range operator ($x..$y)

    @x = (1..6);           # same as (1, 2, 3, 4, 5, 6)
    @y = (1.2..5.2);       # operands are truncated to integers: same as (1, 2, 3, 4, 5)
    @z = (2..5,8,11..13);  # same as (2,3,4,5,8,11,12,13)

- the qw() ("quote word") function: qw(Jan Piet Marie) is a shorter notation for ("Jan","Piet","Marie").

Split function

    $string = "Jan Piet\nMarie \tDirk";
    @list = split /\s+/, $string;   # yields ( "Jan","Piet","Marie","Dirk" )

    $string = " Jan Piet\nMarie \tDirk\n";
    # watch out, empty string at the beginning!!! (trailing empty fields are dropped)
    @list = split /\s+/, $string;   # yields ( "", "Jan","Piet","Marie","Dirk" )

    $string = "Jan:Piet;Marie---Dirk";   # use any regular expression...
    @list = split /[:;]|---/, $string;   # yields ( "Jan","Piet","Marie","Dirk" )

    $string = "Jan Piet";   # use an empty regular expression to split into characters
    @letters = split //, $string;   # yields ( "J","a","n"," ","P","i","e","t" )

Split Function - Example
Example:
1. Separate simple punctuation from words in a text (, . ; ! ? ( ) )
2. Add all the digits in a number

More about Arrays

    @array = ("an","bert","cindy","dirk");
    $length = @array;          # $length now has the value 4
    print $length;             # prints 4
    print $#array;             # prints 3 (the index of the last element)
    print $array[$#array];     # prints "dirk"
    print scalar(@array);      # prints 4
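One possible sketch for the second split exercise (adding all the digits in a number), together with a check of the array-length idioms. The helper name digit_sum is an assumption for this example, and it expects a non-negative integer.

```perl
use strict;
use warnings;

# Hypothetical helper: add all the digits in a non-negative integer.
sub digit_sum {
    my ($number) = @_;
    my $sum = 0;
    $sum += $_ foreach split //, $number;   # empty pattern: one character per element
    return $sum;
}

my @fields = split /\s+/, " Jan Piet\n";    # leading blank gives an empty first field
my $count  = @fields;                       # array in scalar context: number of elements
my $last   = $fields[$#fields];             # $#fields is the index of the last element

print digit_sum(1976), " $count $last\n";
```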

More about Arrays

    ($a, $b) = ("one","two");
    ($onething, @manythings) = (1,2,3,4,5,6);
    # now $onething equals 1 and @manythings = (2,3,4,5,6)
    ($array[0],$array[1]) = ($array[1],$array[0]);   # swap the first two

Pay attention to the fact that an assignment first evaluates the right-hand side of the expression, and then makes a copy of the result:

    @array = ("an","bert","cindy","dirk");
    @copyarray = @array;        # makes a copy
    $copyarray[2] = "XXXXX";    # @array is unchanged

Manipulating lists and their elements
push ARRAY, LIST appends the list to the end of the array. If the second argument is a scalar rather than a list, it is appended as the last item of the array.

    @array = ("an","bert","cindy","dirk");
    @brray = ("evelien","frank");
    push @array, @brray;     # @array is ("an","bert","cindy","dirk","evelien","frank")
    push @brray, "gerben";   # @brray is ("evelien","frank","gerben")

Manipulating lists and their elements
pop ARRAY does the opposite of push: it removes the last item of its argument list and returns it. If the list is empty it returns undef.

    @array = ("an","bert","cindy","dirk");
    $item = pop @array;   # $item is "dirk" and @array is ("an","bert","cindy")

shift ARRAY works on the left end of the list, but is otherwise the same as pop.
unshift ARRAY, LIST puts items on the left side of the list, just as push does for the right side.

Working with lists
Convert lists to strings:

    @array = ("an","bert","cindy","dirk");
    print "The array contains $array[0] $array[1] $array[2] $array[3]";   # interpolate
    print "The array contains @array";

The function join STRING, LIST:

    $string = join ":", @array;       # $string now has the value "an:bert:cindy:dirk"
    $string = join "+", "", @array;   # $string now has the value "+an+bert+cindy+dirk"
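The slides show push and pop in code but only describe shift and unshift, so here is a short sketch of all four working on the two ends of one array. The extra names "zoe" and "yan" are made up for the example.

```perl
use strict;
use warnings;

my @queue = ("an", "bert", "cindy", "dirk");

my $first = shift @queue;        # removes "an" from the left end
unshift @queue, "zoe", "yan";    # adds two items on the left end
my $last  = pop @queue;          # removes "dirk" from the right end
push @queue, $first;             # adds "an" back on the right end

my $joined = join ",", @queue;
print "$joined (removed: $last)\n";
```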

Working with lists
Iteration over lists:

    for ($i = 0; $i <= $#array; $i++) {
        $item = $array[$i];
        $item =~ tr/a-z/A-Z/;
        print "$item ";
    }

    foreach $item (@array) {
        # note: $item is an alias for the element, so this also changes @array itself
        $item =~ tr/a-z/A-Z/;
        print "$item ";   # prints an upper-case version of each item
    }

Grep and map
grep CONDITION, LIST returns a list of all items from the list that satisfy some condition. For example:

    @large = grep $_ > 10, (1,2,4,8,16,25);   # returns (16,25)
    @i_names = grep /i/, @array;              # returns ("cindy","dirk")

map OPERATION, LIST generalizes grep: it performs an arbitrary operation on each element of a list and returns the results. For example:

    @more = map $_ + 3, (1,2,4,8,16,25);       # returns (4,5,7,11,19,28)
    @initials = map substr($_,0,1), @array;    # returns ("a","b","c","d")

Hashes (Associative Arrays)
- Associate (non-numeric) keys with values.
- Allow for almost instantaneous lookup of a value that is associated with some particular key.
- Existing, defined and true: if a key does not exist in the hash, accessing it returns the undef value.
- The special test function exists(HASHENTRY) returns true if the hash key exists in the hash.
- if($hash{$key}){...} and if(defined($hash{$key})){...} are both false if the key $key has no associated value. The first is also false when the value is 0 or the empty string.

Hashes (contd) - Examples

    $wordfrequency{"the"} = 12731;   # creates key "the", value 12731
    $phonenumber{"John Smith"} = "+1-901-678-5259";
    $index{$word} = $nwords;
    $occurrences{$a}++;   # if this is the first reference, the value associated
                          # with $a will be increased from 0 to 1

    # fill the hash
    %birthdays = ("An","25-02-1975","Bert","12-10-1953","Cindy","23-05-1969","Dirk","01-04-1961");

    # fill the hash; the same as above, but more explicit
    %birthdays = (An => "25-02-1975", Bert => "12-10-1953",
                  Cindy => "23-05-1969", Dirk => "01-04-1961");

    @list = %birthdays;            # make a list of the key/value pairs
    %copy_of_bdays = %birthdays;   # copy a hash

Operations on Hashes
keys HASH returns a list with only the keys in the hash. In a scalar context it returns the number of keys.
values HASH returns a list with only the values in the hash, in the same order as the keys returned by keys.

    foreach $key (sort keys %hash) {
        push @sortedlist, ($key, $hash{$key});
        print "Key $key has value $hash{$key}\n";
    }

Operations on Hashes
- Reverse the direction of the mapping, i.e. construct a hash with keys and values swapped:

    %backwards = reverse %forward;

  (if %forward has two identical values associated with different keys, those will end up as only a single element in %backwards)
- Hash slice:

    @birthdays{"An","Bert","Cindy","Dirk"} = ("25-02-1975","12-10-1953","23-05-1969","01-04-1961");

- each(HASH) traverses a hash:

    while (($name,$date) = each(%birthdays)) {
        # ${name} is needed: "$name's" would be read as the variable $name::s
        print "${name}'s birthday is $date\n";
    }
    # alternative: foreach $key (keys %birthdays)
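The difference between existing, defined and true is easy to mix up, so here is a minimal sketch that tells the cases apart. The hash contents and the subroutine name status are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical data: one true value, one false value, one undefined value.
my %stock = (apples => 3, pears => 0, plums => undef);

sub status {
    my ($key) = @_;
    return "missing"   unless exists  $stock{$key};   # key is not in the hash at all
    return "undefined" unless defined $stock{$key};   # key exists, value is undef
    return "false"     unless $stock{$key};           # value is 0 or ""
    return "true";
}

print "$_: ", status($_), "\n" foreach qw(apples pears plums kiwis);
```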

Multidimensional data structures
Perl does not really have multi-dimensional data structures, but it offers a nice way of emulating them, using references:

    $matrix[$i][$j] = $x;
    $lexicon1{"word"}[0] = $partofspeech;
    $lexicon2{"word"}{"noun"} = $frequency;

Array of arrays

    @matrix = (   # an array of references to anonymous arrays
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    );

Multidimensional structures
Hash of arrays

    %lexicon1 = (   # a hash from strings to anonymous arrays
        the  => [ "Det", 12731 ],
        man  => [ "Noun", 658 ],
        with => [ "Prep", 3482 ]
    );

Hash of hashes

    %lexicon2 = (   # a hash from strings to anonymous hashes of strings to numbers
        the  => { Det => 12731 },
        man  => { Noun => 658, Verb => 12 },
        with => { Prep => 3482 }
    );

Programming Example
A program that reads lines of text, gives a unique index number to each word, and counts the word frequencies:

    #!/usr/local/bin/perl
    # read all lines in the input
    $nwords = 0;
    while (defined($line = <>)) {
        # cut off leading and trailing whitespace
        $line =~ s/^\s*//;
        $line =~ s/\s*$//;
        # and put the words in an array
        @words = split /\s+/, $line;
        if (!@words) {   # there are no words?
            next;
        }
        # process each word (defined() keeps the loop going for the word "0")
        while (defined($word = pop @words)) {
            # if it's unknown assign a new index
            if (!exists($index{$word})) {
                $index{$word} = $nwords++;
            }
            # always update the frequency
            $frequency{$word}++;
        }
    }
    # now we print the words sorted
    foreach $word (sort keys %index) {
        print "$word has frequency $frequency{$word} and index $index{$word}\n";
    }

A note on sorting
If we would like to have the words sorted by their frequency instead of alphabetically, we need a construct that imposes a different sort order. The sort function can use any sort order that is provided as an expression.
- the usual alphabetical sort order: sort { $a cmp $b } @list;
- !! $a and $b are placeholders for the two items from the list that are to be compared. Do not attempt to replace them with other variable names. Using $x and $y instead will not provide the same effect.
- a numerical sort order: sort { $a <=> $b } @list;
- for a reverse sort, change the order of the arguments: sort { $b <=> $a } @list;

- to sort the keys of a hash by their value instead of by their own identity, substitute the values for the arguments of sort: sort { $hash{$b} <=> $hash{$a} } ( keys %hash )

Basics about Subroutines
Calls to subroutines can be recognized because subroutine names often start with the special character &.

    sub askForInput {
        print "Please enter something: ";
    }
    # function call
    &askForInput();

Tip: put related subroutines in a file (usually with the extension .pm = Perl module) and include the file with the command require:

    # files with subroutines are stored here
    # (single quotes keep the backslashes literal)
    use lib 'C:\PERL\MYLIBS';
    # we will use this file
    require "nlp.pm";

Variables Scope
A variable $a is used both in the subroutine and in the main part of the program.

    $a = 0;
    print "$a\n";
    sub changeA { $a = 1; }
    print "$a\n";
    &changeA();
    print "$a\n";

The value of $a is printed three times. Can you guess what values are printed? $a is a global variable.

Variables Scope
Hide variables from the rest of the program using my.

    my $a = 0;
    print "$a\n";
    sub changeA { my $a = 1; }
    print "$a\n";
    &changeA();
    print "$a\n";

What values are printed now?

Communication between subroutines and programs
Provide the arguments in the subroutine call: &doSomething(2,"a",$abc). Perl converts all arguments to a flat list. This means that &doSomething((2,"a"),$abc) will result in the same list of arguments as the earlier example.

Access the argument values inside the subroutine with the special list @_. E.g.

    my ($number, $letter, $string) = @_;   # reads the parameters from @_

A tricky problem is passing two or more lists as arguments of a subroutine. With &subr(@a,@b) the subroutine receives the two lists as one big list, and it will be unable to determine where the first ends and where the second starts. Solution: pass the lists as reference arguments: &subr(\@a,\@b).

Communication between subroutines and programs
- Subroutines also use a list as output.

    # the return statement from a subroutine
    return (1,2);   # or simply (1,2)
    # read the return values from the subroutine
    ($a,$b) = &subr();

- Read the main program's arguments from @ARGV. Unlike C there is no argc variable: use scalar(@ARGV) for the count. $ARGV[0] is the first argument, not the program name (which is in $0).

More about file management
- open(INFILE,"myfile"): reading
- open(OUTFILE,">myfile"): writing
- open(OUTFILE,">>myfile"): appending
- open(INFILE,"someprogram |"): reading from program
- open(OUTFILE,"| someprogram"): writing to program
- opendir(DIR,"mydirectory"): open directory
Operations on an open file handle
- $a = <INFILE>: read a line from INFILE into $a
- @a = <INFILE>: read all lines from INFILE into @a
- $a = readdir(DIR): read a filename from DIR into $a
- @a = readdir(DIR): read all filenames from DIR into @a
- read(INFILE,$a,$length): read $length characters from INFILE into $a
- print OUTFILE "text": write some text in OUTFILE
Close files/directories
- close(FILE): close a file
- closedir(DIR): close a directory

Other file management commands
- binmode(HANDLE): change file mode from text to binary
- unlink("myfile"): delete file myfile
- rename("file1","file2"): change name of file file1 to file2
- mkdir("mydir"): create directory mydir
- rmdir("mydir"): delete directory mydir
- chdir("mydir"): change the current directory to mydir
- system("command"): execute command command
- die("message"): exit program with message message
- warn("message"): warn user about problem message
Example

    open(INFILE,"myfile") or die("cannot open myfile!");

Other
About $_: the default variable. Many operators read from it or assign to it when no other variable is named. Examples:

    while (<INFILE>)    # $_ contains the current line read
    foreach (@array)    # $_ contains the current element in @array

Summary
- Introduction to Perl
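The file commands above can be tried end to end with a throwaway file. This is only a sketch: it swaps in lexical file handles and the three-argument form of open (instead of the bareword handles and two-argument form used on the slides), and it relies on the core module File::Temp to pick a temporary file name.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# tempfile() returns a handle opened for writing and the file's name.
my ($out, $filename) = tempfile();
print $out "line $_\n" for 1 .. 3;
close($out) or die "cannot close $filename: $!";

# Reading: slurp all lines into an array.
open(my $in, "<", $filename) or die "cannot open $filename: $!";
my @lines = <$in>;
close($in);

# Appending: ">>" adds to the end instead of overwriting.
open(my $app, ">>", $filename) or die "cannot append to $filename: $!";
print $app "line 4\n";
close($app);

# Reading again, one line at a time.
open(my $again, "<", $filename) or die "cannot reopen $filename: $!";
my $count = 0;
$count++ while <$again>;
close($again);

unlink($filename) or warn "could not delete $filename: $!";
print scalar(@lines), " lines before appending, $count after\n";
```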
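To close, a small sketch that ties $_ together with split, grep and the sort-by-value idiom from the note on sorting. The two input lines are made up (they reuse words from the lexicon slides), and ties in frequency are broken alphabetically, which is an addition for determinism rather than something the slides prescribe.

```perl
use strict;
use warnings;

my @lines = ("the man with the hat", "the hat");   # hypothetical input
my %frequency;

foreach (@lines) {                   # $_ holds the current line
    my @words = split /\s+/;         # split works on $_ when no string is given
    $frequency{$_}++ foreach @words; # here $_ holds the current word
}

# grep also sets $_ to each element in turn.
my @hat_lines = grep { /hat/ } @lines;

# Most frequent first; equal frequencies fall back to alphabetical order.
my @by_frequency =
    sort { $frequency{$b} <=> $frequency{$a} or $a cmp $b } keys %frequency;

print "$_: $frequency{$_}\n" foreach @by_frequency;
```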
