Lecture 1: Introduction, Basic UNIX - Computer Science

Introduction to Awk

Awk is a convenient and expressive programming language that can be applied to a wide variety of computing and data manipulation tasks.

Awk
  • Works well on record-type data
  • Reads input file(s) a line at a time
  • Parses each line into fields
  • Performs user-defined tests against each line and performs actions on matches

Other Common Uses
  • Input validation
      • Does every record have the same number of fields?
      • Do values make sense (negative time, hourly wage > $1000, etc.)?
  • Filtering out certain fields
  • Searches
      • Who got a zero on lab 3?
      • Who got the highest grade?
  • Many others

Invocation
  • Write little one-liners on the command line (very handy). To print the 3rd field of every line:
        $ awk '{ print $3 }' input.txt
  • Execute an awk script file:
        $ awk -f script.awk input.txt
  • Or use this shebang as the first line, and give your script execute permissions:
        #!/bin/awk -f
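The two invocation styles can be compared side by side. The following is a minimal sketch, assuming a POSIX shell with awk and mktemp available; the temporary directory and the sample input are made up for illustration.

```shell
# Hypothetical sample input; file names mirror the slide but the data is invented.
tmp=$(mktemp -d)
printf 'a b c\nd e f\n' > "$tmp/input.txt"

# One-liner: print the 3rd field of every line.
one_liner=$(awk '{ print $3 }' "$tmp/input.txt")

# The same program stored in a script file and run with -f.
printf '{ print $3 }\n' > "$tmp/script.awk"
from_file=$(awk -f "$tmp/script.awk" "$tmp/input.txt")

echo "$one_liner"
echo "$from_file"
rm -r "$tmp"
```

Both commands should print the same two lines. For the shebang style, note that awk is installed at /usr/bin/awk on many systems, so the first line may need adjusting; `command -v awk` shows the path on a given machine.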

Form of an AWK program

AWK programs are entries of the form:
    pattern { action }

pattern
  • Some test: a regular expression or a C-like condition
  • If omitted, the action is applied to every line
action
  • A statement or set of statements
  • If omitted, the default action is to print the entire line, much like grep
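The effect of leaving out the pattern or the action can be seen in a short experiment. This is a sketch under stated assumptions: the three sample records are invented, and a POSIX shell with awk is assumed.

```shell
# Made-up sample data for illustration.
data=$(printf 'alpha 1\nbeta 2\ngamma 3\n')

# Pattern only: no action, so matching lines are printed whole.
pattern_only=$(printf '%s\n' "$data" | awk '$2 >= 2')

# Action only: no pattern, so the action runs on every line.
action_only=$(printf '%s\n' "$data" | awk '{ print $1 }')

# Regular-expression pattern with an explicit action.
both=$(printf '%s\n' "$data" | awk '/^b/ { print $2 }')

echo "$pattern_only"
echo "$action_only"
echo "$both"
```

The first command behaves like a numeric grep, the second like a column extractor, and the third combines the two.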

Form of an AWK program

  • Input files are parsed a record (line) at a time
  • Each line is checked against each pattern, in order
  • There are 2 special patterns:
      • BEGIN: true before any records are read
      • END: true at end of input (after all records have been read)
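A small program that uses BEGIN, END, and two ordinary rules shows the order in which things run. This is an illustrative sketch, not part of the lecture; the input numbers and the variable names `big` and `sum` are invented.

```shell
report=$(printf '3\n8\n5\n' | awk '
BEGIN  { print "start" }                  # runs before any record is read
$1 > 4 { big++ }                          # checked first for each line
       { sum += $1 }                      # no pattern: runs for every line
END    { print "sum=" sum, "big=" big }   # runs after the last record
')
echo "$report"
```

Each input line is tested against the `$1 > 4` rule and then against the pattern-less rule, in that order, and the END action reports the totals accumulated along the way.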

Awk Features
  • Patterns can be regular expressions or C-like conditions.
  • Each line of the input is matched against the patterns, one after the next. If a match occurs, the corresponding action is performed.
  • Input lines are parsed and split into fields, which are accessed by $1, ..., $NF, where NF is a variable set to the number of fields.
  • The variable $0 contains the entire line. By default, lines are split on white space (blanks, tabs).

Variables
  • Not declared, nor typed
  • No character type
  • Only strings and floats (support for ints)
  • $n refers to the nth field (where n is some integer value)
        # prints each field on the line
        for( i=1; i<=NF; ++i )
            print $i

Some Built-in Variables
    FS    the input field separator
    OFS   the output field separator
    NF    # of fields; changes w/each record
    NR    the # of records read (so far). So, the current record #
    FNR   the # of records read so far, reset for each named file
    $0    the entire input line

Example
Print pay for those employees who actually worked:
    $ cat emp.data
    Beth 4.00 0
    Dan 3.75 0
    Kathy 4.00 10
    Mark 5.00 20
    Mary 5.50 22
    Susie 4.25 18
    $ awk '$3 > 0 { print $1, $2*$3 }' emp.data
    Kathy 40
    Mark 100
    Mary 121
    Susie 76.5
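NF and NR also make the input-validation use mentioned earlier straightforward. The sketch below assumes records in the emp.data layout (name, rate, hours); the two malformed records and the wording of the messages are invented for illustration.

```shell
# Sample records in the emp.data layout; the last two are deliberately bad
# and do not come from the original data file.
problems=$(printf 'Kathy 4.00 10\nMark 5.00\nMary 5.50 -3\n' | awk '
NF != 3 { print "line " NR ": expected 3 fields, got " NF; next }
$3 < 0  { print "line " NR ": negative hours for " $1 }
')
echo "$problems"
```

The `next` statement skips the remaining rules for a record that already failed the field-count check, so the hours test only runs on records with the expected shape.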

Example CSV file
    $ cat students.csv
    smith,john,js12
    jones,fred,fj84
    bee,sue,sb23
    fife,ralph,rf86
    james,jim,jj22
    cook,nancy,nc54
    banana,anna,ab67
    russ,sam,sr77
    loeb,lisa,guitarHottie
    $ cat getEmails.awk
    #!/bin/awk -f
    BEGIN { FS = "," }
    { printf( "%s's email is: %[email protected]\n", $2, $3 ); }
    $ getEmails.awk students.csv
    john's email is: [email protected]
    fred's email is: [email protected]
    sue's email is: [email protected]
    ralph's email is: [email protected]
    jim's email is: [email protected]
    nancy's email is: [email protected]
    anna's email is: [email protected]
    sam's email is: [email protected]
    lisa's email is: [email protected]

Example output separator
    $ cat out.awk
    #!/bin/awk -f
    BEGIN { FS = ","; OFS = "-*-"; }
    { print $1, $2, $3; }
    $ out.awk students.csv
    smith-*-john-*-js12
    jones-*-fred-*-fj84
    bee-*-sue-*-sb23
    fife-*-ralph-*-rf86
    james-*-jim-*-jj22
    cook-*-nancy-*-nc54
    banana-*-anna-*-ab67
    russ-*-sam-*-sr77
    loeb-*-lisa-*-guitarHottie

Flow Control
  • Awk syntax is much like C

  • Same loops, if statements, etc.
  • AWK: Aho, Weinberger, Kernighan
  • Kernighan and Ritchie wrote the book "The C Programming Language"

Associative Arrays
  • Awk also supports arrays that can be indexed by arbitrary strings.
  • They are implemented using hash tables.
        Total["Sue"] = 100;
  • It is possible to loop over all indices that have currently been assigned values.
        for (name in Total)
            print name, Total[name];

Example using Associative Arrays
    $ cat scores
    Fred 90
    Sue 100
    Fred 85
    Sam 70
    Sue 98
    Sam 50
    Fred 70
    $ cat total.awk
    { Total[$1] += $2 }
    END {
        for (i in Total)
            print i, Total[i];
    }
    $ awk -f total.awk scores
    Sue 198
    Sam 120
    Fred 245

Useful one-liners
  • Line count:
        awk 'END {print NR}'
  • grep:
        awk '/pat/'
  • head:
        awk 'NR<=10'
  • Add line #s to a file:
        awk '{print NR, $0}'
        awk '{ printf( "%5d %s\n", NR, $0 )}'
  • Many more. See the resources tab on the course webpage for links to more examples.
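The associative-array technique extends naturally from totals to averages by keeping a second array of counts. This is a sketch rather than part of the lecture: it reuses the scores data from the example above, and the array names `total` and `count` are chosen here for illustration.

```shell
averages=$(printf 'Fred 90\nSue 100\nFred 85\nSam 70\nSue 98\nSam 50\nFred 70\n' | awk '
{ total[$1] += $2; count[$1]++ }     # accumulate per-name sum and count
END {
    for (name in total)
        printf("%s %.1f\n", name, total[name] / count[name])
}' | sort)
echo "$averages"
```

The order in which `for (name in total)` visits indices is unspecified, so the output is piped through sort to make it predictable.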
