Working with More Files and Text

Pipes

Part 1: Standard Input and Standard Output

We can send the standard output of one command to the standard input of another.

Philosophy

“Unix Philosophy”: A program should do one thing, and do it well.

There are many command line utilities that do one thing.

Pipes allow us to connect them together.

Pipes

Probably the most powerful feature of the command line.

$ ls | head
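A slightly longer pipeline can make the idea concrete. The following is a minimal sketch, not one of the course demos: the word list is made-up sample data and the function name top_words is hypothetical. Each stage is a small utility that reads standard input and writes standard output.

```shell
#!/bin/sh
# Count how often each word appears and show the two most common.
# The input words are made-up sample data.
top_words() {
    printf '%s\n' apple banana apple cherry banana apple |
        sort |        # put identical lines next to each other
        uniq -c |     # collapse each run into "count word"
        sort -rn |    # order by count, largest first
        head -n 2     # keep only the first two lines
}

top_words
```

No stage knows about the others; each one only reads lines and writes lines, which is why the stages can be rearranged or swapped freely.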

Demo

./demos/01-standard_input_and_output.sh

Part 2: Text processing commands

Demo: Data Processing

./demos/02-commands.sh

./demos/03-data_processing.sh
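The demo scripts are not reproduced here. As a rough sketch of the kind of text processing they cover, the following assumes the common utilities wc, cut, sort, head, and tail, and uses a made-up data file of names and scores.

```shell
#!/bin/sh
# Made-up sample data: a name and a score, separated by a single space.
data=$(mktemp)
printf '%s\n' 'ann 72' 'bob 95' 'cat 58' 'dan 95' 'eve 81' > "$data"

# wc -l counts the lines it reads from standard input.
count=$(wc -l < "$data" | tr -d ' ')

# cut selects a column, sort -n orders it numerically,
# and tail -n 1 keeps the last (largest) line.
highest=$(cut -d ' ' -f 2 "$data" | sort -n | tail -n 1)

# sort -u drops duplicate values before they are counted.
distinct=$(cut -d ' ' -f 2 "$data" | sort -u | wc -l | tr -d ' ')

echo "rows: $count, highest score: $highest, distinct scores: $distinct"
rm -f "$data"
```

Replacing tail -n 1 with head -n 1 gives the smallest value instead, since the list is sorted in ascending order.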

Practice Problems

  1. The sandbox directory contains a directory named log/.

    1. How many regular files are under this directory and all of its sub-directories?
    2. How many directories are under this directory and all of its sub-directories?
    3. How many regular files whose filename ends with .txt are under this directory and all of its sub-directories?
  2. The sandbox directory contains a directory named data/.

    1. The oscillations.txt file contains data that was collected using three different detectors. The first column is time (in seconds), and the next three columns are the values read from detectors 1, 2, and 3.

      1. How long was data collected during this run?
      2. What were the minimum and maximum values read by detector 1?
      3. What were the minimum and maximum values read by detector 2?
      4. What were the minimum and maximum values read by detector 3?
      5. What was the tenth largest value read by detector 1?
      6. What was the tenth largest value read by detector 2?
      7. What was the tenth largest value read by detector 3?
  3. Write a shell script named count_log_files.sh that contains a single pipeline that prints the number of regular files with a filename that ends with .log at or below the current directory.

  4. Write a script named largest_file.sh that prints out the name of the file in the current directory that contains the most lines, and the number of lines it contains.

  5. Write a shell script named max_value_column_3.sh that takes one argument that specifies a datafile name and prints the maximum value in column three of the file.

  6. Write a shell script named max_value.sh that takes two arguments, the name of a data file and the column number to consider, and prints the maximum value in the given column for the given file.

Part 3: gawk

$ gawk '{print $1}' file.txt
$ cat file.txt | gawk '{print $1}'
$ cat file.txt | gawk '$1 > 10 {print $1}'
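A gawk program is a list of pattern { action } pairs: $1 is the first whitespace-separated field of the current line, and an action with no pattern runs on every line. The sketch below uses made-up sample data; the AWK variable and the sample function are conveniences introduced here (the variable falls back to another awk when gawk is not installed), not part of the slides.

```shell
#!/bin/sh
# Use gawk as in the slides; fall back to another awk if gawk is missing.
AWK=$(command -v gawk || command -v awk)

# Made-up sample data: a value and a label on each line.
sample() { printf '%s\n' '4 low' '12 mid' '26 high'; }

# No pattern: the action runs on every line and prints the first field.
all_first=$(sample | "$AWK" '{print $1}')

# Pattern $1 > 10: the action runs only where the first field exceeds 10.
big_first=$(sample | "$AWK" '$1 > 10 {print $1}')

# END runs once after the last line; NR is the number of lines read.
average=$(sample | "$AWK" '{sum += $1} END {print sum / NR}')

echo "all:     $(echo $all_first)"
echo "over 10: $(echo $big_first)"
echo "average: $average"
```

The END block is what makes totals and averages possible: the per-line action accumulates, and the result is printed only after all input has been read.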

Part 4: sed

$ cat file.csv | sed 's/,/ /g'
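The sed expression s/,/ /g substitutes a space for every comma on each line. A minimal sketch with made-up CSV lines (the csv function is hypothetical) shows the effect of the trailing g flag:

```shell
#!/bin/sh
# Made-up CSV sample: name,score
csv() { printf '%s\n' 'ann,72' 'bob,95'; }

# s/,/ /g replaces every comma on each line with a space.
spaced=$(csv | sed 's/,/ /g')

# Without the trailing g, only the first comma on each line is replaced.
first_only=$(printf 'a,b,c\n' | sed 's/,/ /')

printf '%s\n' "$spaced" "$first_only"
```

Converting commas to spaces is handy because many text tools, gawk included, split fields on whitespace by default.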

Example : Travel Data

Let’s ask some questions about the travel rates…

How many places are listed in the table?

What city has the most expensive lodging?

What city has the least expensive lodging?

Where is the greatest lodging meal allowance?

Where is the lowest lodging meal allowance?

How many places in Kansas are there?

Where is the greatest lodging allowance in Kansas?

Where is the lowest lodging allowance in Kansas?

What is the lodging allowance in Hays?

What is the average lodging allowance in Kansas?

What is the average meal allowance in Kansas?

What is the average lodging + meal allowance in Kansas?

What state has the most destinations?

What state has the fewest destinations?

How many different states are there?
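The travel table itself is not reproduced here, so the following sketch assumes a hypothetical file format, one destination per line as city,state,lodging,meals, and every number in it is invented for illustration. It shows how a few of the questions above might be answered with pipelines; the real data may need different field numbers or separators.

```shell
#!/bin/sh
# Use gawk as in the slides; fall back to another awk if gawk is missing.
AWK=$(command -v gawk || command -v awk)

# HYPOTHETICAL data: city,state,lodging,meals. All values are invented.
rates=$(mktemp)
cat > "$rates" <<'EOF'
Hays,KS,90,50
Wichita,KS,100,60
Topeka,KS,110,55
Denver,CO,180,75
Omaha,NE,120,60
EOF

# How many places are listed?
places=$(wc -l < "$rates" | tr -d ' ')

# Most expensive lodging: sort on field 3, numerically, largest first.
priciest=$(sort -t , -k 3,3nr "$rates" | head -n 1 | cut -d , -f 1)

# How many places are in Kansas?
kansas=$(grep -c ',KS,' "$rates")

# Average lodging allowance in Kansas.
ks_avg=$("$AWK" -F , '$2 == "KS" {sum += $3; n++} END {print sum / n}' "$rates")

# State with the most destinations: count each state, largest count first.
top_state=$(cut -d , -f 2 "$rates" | sort | uniq -c | sort -rn |
    head -n 1 | "$AWK" '{print $2}')

# How many different states are there?
states=$(cut -d , -f 2 "$rates" | sort -u | wc -l | tr -d ' ')

echo "places=$places priciest=$priciest kansas=$kansas"
echo "ks_avg=$ks_avg top_state=$top_state states=$states"
rm -f "$rates"
```

The "least" and "fewest" variants follow the same pattern with the sort order reversed; ties would need extra handling that this sketch leaves out.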
