Introduction to AWK utility
AWK is a programming language created by Aho, Weinberger, and Kernighan, whose initials give the language its name. It is useful for manipulating data files, retrieving and processing text, generating reports, and prototyping and experimenting with algorithms.
Versions: awk, nawk, mawk, pgawk, and gawk (GNU).
AWK CONTD.
An AWK program is a sequence of pattern {action} pairs and function definitions. Short programs are entered on the command line, usually enclosed in single quotes (' ') to avoid shell interpretation. Longer programs can be read from a file with the -f option.
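The two ways of supplying a program can be sketched as follows. The file names scores.txt and first_field.awk and the sample data are made up for illustration, and a temporary directory from mktemp is assumed to be available.

```shell
# Work in a throwaway directory so no real files are touched.
tmp=$(mktemp -d)

# Hypothetical sample data: a name and a score on each line.
printf 'ann 90\nbob 75\n' > "$tmp/scores.txt"

# Short program typed on the command line, protected from the shell by ' '.
inline_out=$(awk '{ print $1 }' "$tmp/scores.txt")

# The same program stored in a file and loaded with -f.
printf '{ print $1 }\n' > "$tmp/first_field.awk"
file_out=$(awk -f "$tmp/first_field.awk" "$tmp/scores.txt")

printf '%s\n' "$inline_out" "$file_out"
rm -rf "$tmp"
```

Both invocations should print the first field of every line, so the two results are expected to be identical.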
Syntax
• pattern {action}
• One, but not both, of pattern and {action} can be omitted,
• i.e., a program must have a pattern, an {action}, or both.
• If the pattern is missing, the action is applied to all lines (every line is implicitly matched);
• if the action is missing, the matched line is printed (the action is implicitly {print}).
• E.g., the command awk '/for/' testfile prints all lines in testfile that contain the string "for".
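A small sketch of both omissions, using three made-up lines piped in place of testfile:

```shell
# Hypothetical input standing in for testfile.
input=$(printf 'for i in 1 2 3\necho done\nwait for it\n')

# Pattern only: the missing action defaults to { print }.
pattern_only=$(printf '%s\n' "$input" | awk '/for/')

# Action only: the missing pattern matches every line.
action_only=$(printf '%s\n' "$input" | awk '{ print $1 }')

printf '%s\n' "$pattern_only" "---" "$action_only"
```

The first command should keep only the two lines containing "for"; the second should print the first word of all three lines.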
Basic Terminology of Input files
• Data in the input file is broken into records as determined by the record separator variable, RS.
• By default, RS = "\n", i.e. a newline.
• With this default, each line of data or text in the input file is a record.
• Records are read in one at a time, and the current record is stored in the field variable $0.
• A record is split into fields, which are stored in the field buffers $1, $2, ..., $NF.
• A field is thus a unit of data in a line (record).
• Each field in a record is separated from the other fields by the field separator, FS.
• The default field separator is whitespace.
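The record and field terminology can be illustrated with a short sketch. The input lines are made up; NR (the current record number) and the -F command-line option (which sets FS) are standard awk features that the text above does not name.

```shell
# Default splitting: one record per line, fields separated by whitespace.
# NR is the record number, NF the number of fields, $NF the last field.
default_out=$(printf 'alpha beta gamma\none two\n' |
    awk '{ print NR, NF, $1, $NF }')

# Colon-separated data: -F: sets the field separator FS to ":".
colon_out=$(printf 'root:x:0\nuser:x:1000\n' |
    awk -F: '{ print $1, $3 }')

printf '%s\n' "$default_out" "---" "$colon_out"
```

With the default separator the first record has three fields and the second has two; with FS set to ":" the same splitting rule applies to colon-delimited text.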
Some System/Built-in Variables
EXAMPLE 1
EXAMPLE 2
• A pattern can be: BEGIN, END, an expression, or a range written as expression , expression
• Note that BEGIN and END patterns require an action.
• An AWK script can be divided into three main parts as follows:
• BEGIN: performs pre-processing that must be completed before awk starts reading records from the input file.
• It is mostly used to initialize variables and to create report headings.
• BODY: contains the main processing logic to be applied to input records,
• like a loop that processes input data one record at a time:
• the body normally executes once for each record.
• END: contains post-processing logic to be executed after all input data has been processed.
• Logic such as printing a report's grand total is performed in this part of the script.
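The three parts can be sketched as one small report script. The item names, quantities, and the variable name total are invented for illustration.

```shell
report=$(printf 'pens 3\nbooks 5\nbags 2\n' | awk '
BEGIN { print "ITEM QTY"; total = 0 }   # pre-processing: heading and initialization
      { print $1, $2; total += $2 }     # body: runs once for each input record
END   { print "TOTAL", total }          # post-processing: grand total
')

printf '%s\n' "$report"
```

The heading is printed before any input is read, each record adds to the running total, and the grand total appears only after the last record.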
Statements
• Statements in an AWK program are terminated by newlines, semicolons, or both.
• Groups of statements such as actions or loop bodies are blocked via {...} as in C.
• The last statement in a block doesn't need a terminator.
• Blank lines have no meaning; an empty statement is terminated with a semicolon.
• Long statements can be continued with a backslash, \.
• A statement can be broken without a backslash after a comma, left brace, &&, ||, do, else, the right parenthesis of an if, while or for statement, and the right parenthesis of a function definition.
• A comment in AWK starts with #.
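A minimal sketch of several of these rules in one action. The input numbers and the variable names sum and diff are made up for illustration.

```shell
stmt_out=$(printf '4 7\n10 2\n' | awk '
# A comment starts with # and runs to the end of the line.
{
    sum = $1 + $2; diff = $1 - $2      # two statements separated by a semicolon
    if (sum > 10 &&
        diff > 0)                      # line broken after && without a backslash
        print "big:", sum,
              diff                     # line broken after a comma
    else
        print "small: " \
              sum                      # a backslash continues the statement
}')

printf '%s\n' "$stmt_out"
```

The layout is chosen only to exercise the line-breaking rules; the same action could be written on fewer lines.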
Expressions and operators
Primary AWK expressions are:
– numeric constants,
– string constants,
– variables,
– fields,
– arrays and
– function calls.
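Each kind of primary expression appears once in the sketch below. The names n, s, and seen are hypothetical; length() is a standard built-in function.

```shell
expr_out=$(printf 'hello world\n' | awk '{
    n = 42                 # numeric constant stored in a variable
    s = "text"             # string constant
    seen[$1] = n           # array element indexed by the first field
    # variable, variable, field, array element, function call:
    print n, s, $2, seen["hello"], length($0)
}')

printf '%s\n' "$expr_out"
```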
The identifier for a variable, array or function can be a sequence of letters, digits and underscores that does not start with a digit.
• Variables are not declared; they exist when first referenced and are initialized to null (the empty string, which also acts as 0 in numeric context).
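The sketch below shows that undeclared variables can be used straight away. The names line_count and unset_name are made up for illustration.

```shell
uninit_out=$(printf 'a\nb\nc\n' | awk '
    { line_count++ }                           # never declared; counts up from 0
END { print line_count, "[" unset_name "]" }   # an unused variable prints as an empty string
')

printf '%s\n' "$uninit_out"
```

The counter reaches 3 without any initialization, and the never-assigned variable contributes nothing between the brackets.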
New expressions are composed with the following operators in order of increasing precedence.
Expression pattern types
• An expression pattern uses matching:
• it either searches through an entire record for a possible match using a regular expression enclosed in slashes (/.../),
• or explicitly searches for a match in a particular field or group of fields using the operators ~ (match) or !~ (no match).
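The difference between record-wide and field-specific matching can be sketched with made-up data in which the string "sales" occurs in different fields:

```shell
# Hypothetical records: a name followed by a department.
data=$(printf 'ann sales\nbob admin\nsalesbot admin\n')

# /sales/ searches the whole record.
record_match=$(printf '%s\n' "$data" | awk '/sales/ { print $1 }')

# $2 ~ /sales/ searches only the second field.
field_match=$(printf '%s\n' "$data" | awk '$2 ~ /sales/ { print $1 }')

# $2 !~ /sales/ selects records whose second field does not match.
field_nomatch=$(printf '%s\n' "$data" | awk '$2 !~ /sales/ { print $1 }')

printf '%s\n' "$record_match" "---" "$field_match" "---" "$field_nomatch"
```

The record-wide search also selects salesbot, because the string occurs in its first field, while the field-specific tests look only at the department.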