Pipes and Redirection

                                                                 Standard Input, Output, and Text Processing

              ██ Standard Input and Output

                                                                                                                                                                       2 / 47

              ██ How Commands Read and Write

              Every command has three streams — connections for reading and writing data:

                 •  stdin (standard input, stream 0) — where the command reads input from

                 •  stdout (standard output, stream 1) — where the command sends its output

                 •  stderr (standard error, stream 2) — where the command sends error messages

              By default:

                 •  stdin is your keyboard

                 •  stdout is your terminal

                 •  stderr is also your terminal

              But these defaults can be changed!

                                                                                                                                                                       3 / 47

              ██ Visualizing the Streams

                                                                         ┌──────────────────┐
                                                                         │                  │
                                                                keyboard ─►  stdin   stdout ──► terminal
                                                                         │                  │
                                                                         │    COMMAND       │
                                                                         │                  │
                                                                         │           stderr ──► terminal
                                                                         └──────────────────┘

              We can redirect stdout and stderr to files, or send stdout directly to another command's stdin.

                                                                                                                                                                       4 / 47

              ██ Remember the WARNING from Module 01?

              Some commands read from stdin when no filename is given:

                                                                $ cat

              The command is waiting for you to type — reading from stdin (keyboard).

              Press Ctrl-C to cancel.

              Now you know why this happens: cat reads from stdin when there's no file argument.

              This is true for head, tail, grep, wc, and many other commands too.

                                                                                                                                                                       5 / 47

              ██ File Redirection

                                                                                                                                                                       6 / 47

              ██ Saving Output to a File

              Use > to redirect stdout to a file:

                                                                $ date > timestamp.txt

                                                                $ wc -l *.txt > line_counts.txt

                 •  If the file doesn't exist, it's created

                 •  If the file already exists, it's overwritten (no warning!)

                                                                                                                                                                       7 / 47

              ██ Appending to a File

              Use >> to add stdout to the end of a file:

                                                                $ echo "Run 1 started" >> log.txt

                                                                $ date >> log.txt

                                                                $ echo "Run 1 finished" >> log.txt

                 •  If the file doesn't exist, it's created

                 •  If the file already exists, new output is added to the end

                                                                                                                                                                       8 / 47

              ██ Overwrite vs. Append

                                                      $ echo "Line 1" > file.txt     # creates file with "Line 1"

                                                      $ echo "Line 2" > file.txt     # OVERWRITES — now only "Line 2"

                                                      $ echo "Line 3" >> file.txt    # appends: file now has "Line 2" and "Line 3"

              Common mistake: Using > when you meant >>!

              When in doubt, use cat file.txt to check what's there before writing.

                                                                                                                                                                       9 / 47

              ██ Redirecting Error Messages

              Error messages go to stderr, not stdout — so > won't capture them:

                                                 $ ls /nonexistent > output.txt    # error message still appears on screen!

              Use 2> to redirect stderr:

                                                                $ ls /nonexistent 2> errors.txt

                                                                $ ./myscript.sh 2> errors.txt

              To redirect both stdout and stderr to the same file:

                                                                $ ./myscript.sh > output.txt 2>&1
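
              A minimal sketch of the three forms above. It assumes a made-up shell function named noisy that writes one line to each stream, and a scratch directory from mktemp; neither is part of the course files.

```shell
# Hypothetical helper: writes one line to stdout and one line to stderr.
noisy() {
    echo "normal output"        # goes to stdout (stream 1)
    echo "error message" >&2    # goes to stderr (stream 2)
}

workdir=$(mktemp -d)            # scratch directory so no real file is overwritten

noisy > "$workdir/out.txt" 2> "$workdir/err.txt"   # each stream to its own file
noisy > "$workdir/both.txt" 2>&1                   # both streams to the same file

cat "$workdir/out.txt"     # normal output
cat "$workdir/err.txt"     # error message
cat "$workdir/both.txt"    # both lines
```

              Order matters in the last form: 2>&1 points stderr at wherever stdout is going at that moment, so it has to come after > output.txt.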

                                                                                                                                                                      10 / 47

              ██ Demo - File Redirection

                                                                                                                                                                      11 / 47

              ██ Pipes

                                                                                                                                                                      12 / 47

              ██ The Unix Philosophy

              ▍ "Write programs that do one thing and do it well.

              ▍ Write programs to work together."

              — Doug McIlroy, one of the creators of Unix

              This is why there are so many small, focused commands:

              cat, head, tail, sort, wc, grep, uniq, cut, ...

              Each does one thing. Pipes let you combine them.

                                                                                                                                                                      13 / 47

              ██ Connecting Commands with Pipes

              The | character (pipe) sends the stdout of one command to the stdin of the next:

                                                                $ command1 | command2 | command3

                                    command1 ──► stdout ──► stdin ──► command2 ──► stdout ──► stdin ──► command3 ──► stdout ──► terminal

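              command1 doesn't need to know that its output is going to another command instead of the terminal, and command2 doesn't need to know that its input is coming from command1 instead of the keyboard. Each one just reads stdin and writes stdout.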
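
              As a concrete instance of the command1 | command2 | command3 shape, here is a small sketch. The three fruit names are made-up sample data; printf, sort, and head stand in for the three commands.

```shell
# Stage 1: printf writes three lines to stdout.
# Stage 2: sort reads them from stdin and writes them back out in order.
# Stage 3: head reads the sorted lines and keeps only the first two.
result=$(printf 'pear\napple\nfig\n' | sort | head -n 2)

echo "$result"
# apple
# fig
```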

                                                                                                                                                                      14 / 47

              ██ Your First Pipeline

                                                                $ ls | head

                 •  ls lists the files in the current directory, sending the list to stdout

                 •  | sends that list to head's stdin

                 •  head prints just the first 10 lines

              Count the number of files in a directory:

                                                                $ ls | wc -l

              See files sorted by line count:

                                                                $ wc -l *.txt | sort -n

                                                                                                                                                                      15 / 47

              ██ Pipelines vs. Reading Files

              Many commands can read from either a file or stdin:

                                                                $ wc -l data.txt          # reads from a file

                                                                $ cat data.txt | wc -l    # reads from stdin

              Both count lines in data.txt. The pipe version becomes useful when the input comes from another command — not a static file.

                                                       $ ls *.txt | wc -l        # count how many .txt files there are
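
              One visible difference between the two forms is sketched below, assuming a three-line sample file created in a scratch directory. When wc is given a filename it prints that name; when it reads stdin there is no name to print.

```shell
workdir=$(mktemp -d)                                # scratch directory for the sample file
printf 'one\ntwo\nthree\n' > "$workdir/data.txt"    # made-up three-line file
cd "$workdir"

from_file=$(wc -l data.txt)          # wc opens the file itself
from_stdin=$(cat data.txt | wc -l)   # wc only sees a stream of lines

echo "$from_file"     # 3 data.txt   (leading spaces vary between systems)
echo "$from_stdin"    # 3
```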

                                                                                                                                                                      16 / 47

              ██ Demo - Pipes

                                                                                                                                                                      17 / 47

              ██ Text Processing Commands

                                                                                                                                                                      18 / 47

              ██ Commands That Power Pipelines

              Most of these commands read from stdin (or a file) and write to stdout, so they're designed to work in pipelines. find and seq only produce output, so they belong at the start of a pipeline:

                 •  grep — keep only lines that match a pattern

                 •  wc — count lines, words, or characters

                 •  sort — sort lines

                 •  uniq — remove or count duplicate lines

                 •  cut — extract specific columns

                 •  head — get top of stream

                 •  tail — get bottom of stream

                 •  cat — send multiple files to stdout

                 •  paste — print files side-by-side

                 •  column — align columns

                 •  find — search for files by name, type, or other attributes

                 •  seq — print a sequence of numbers

                                                                                                                                                                      19 / 47

              ██ grep: Filtering Lines

              grep prints only the lines that match a pattern:

                                                                $ grep "error" log.txt

                                                                $ cat log.txt | grep "error"

              Useful options:

                 •  -i — case-insensitive (error, Error, ERROR all match)

                 •  -v — invert: print lines that do not match

                 •  -c — count matching lines instead of printing them

                                                         $ grep -i "warning" log.txt           # any case

                                                         $ grep -v "^#" config.txt             # skip comment lines

                                                         $ grep -c "error" log.txt             # just the count
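
              To see what each option keeps or drops, here is a small sketch run against a made-up log file (the file contents are invented for illustration):

```shell
workdir=$(mktemp -d)
cat > "$workdir/log.txt" <<'EOF'
# sample log (made-up data)
Run started
ERROR: disk full
warning: retrying
error: retry failed
Run finished
EOF

grep "error" "$workdir/log.txt"         # only the lowercase "error: retry failed" line
grep -i "error" "$workdir/log.txt"      # both the "ERROR" and the "error" lines
grep -v "^#" "$workdir/log.txt"         # every line except the comment on line 1
grep -c -i "error" "$workdir/log.txt"   # prints 2
```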

                                                                                                                                                                      20 / 47

              ██ wc: Counting

              wc counts things in its input:

                                                                $ wc data.txt

                                                                  100  500  3200 data.txt

                                                                # lines words bytes

              Useful options:

                 •  -l — count lines only

                 •  -w — count words only

                 •  -c — count bytes only (the same as characters for plain ASCII text; use -m to count characters)

                                                       $ wc -l *.txt                       # line count of each file

                                                       $ ls | wc -l                        # count files in directory

                                                       $ grep "error" log.txt | wc -l      # count error lines

                                                                                                                                                                      21 / 47

              ██ sort: Sorting Lines

              sort sorts lines alphabetically by default:

                                                                $ sort names.txt

                                                                $ cat names.txt | sort

              Useful options:

                 •  -n — sort numerically (so 10 comes after 9, not after 1)

                 •  -r — reverse order

                 •  -k2 — sort on the key that starts at the 2nd whitespace-separated column (use -k2,2 to compare only that column)

                                                      $ sort -n numbers.txt              # numeric sort

                                                      $ sort -rn numbers.txt             # largest first

                                                      $ wc -l *.txt | sort -n            # files ordered by line count
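
              The difference between text order and numeric order is easiest to see on a tiny example. The sketch below uses made-up numbers and names.

```shell
workdir=$(mktemp -d)
printf '10\n9\n100\n1\n' > "$workdir/numbers.txt"   # made-up sample data

sort "$workdir/numbers.txt"       # text order:    1, 10, 100, 9
sort -n "$workdir/numbers.txt"    # numeric order: 1, 9, 10, 100
sort -rn "$workdir/numbers.txt"   # largest first: 100, 10, 9, 1

# Sort name/score pairs numerically on the second column only.
by_score=$(printf 'carol 78\nalice 92\nbob 85\n' | sort -k2,2n)
echo "$by_score"
# carol 78
# bob 85
# alice 92
```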

                                                                                                                                                                      22 / 47

              ██ uniq: Removing Duplicates

              uniq removes consecutive duplicate lines:

                                                                $ sort colors.txt | uniq

              Important: sort first! uniq only removes adjacent duplicates.

              Useful options:

                 •  -c — prefix each line with its count

                 •  -d — print only lines that appear more than once

                                                   $ sort log.txt | uniq -c              # count occurrences of each line

                                                   $ sort log.txt | uniq -c | sort -rn   # most frequent first
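
              Why the sort matters can be sketched with a made-up colors.txt in which the duplicates are not all next to each other:

```shell
workdir=$(mktemp -d)
printf 'red\nblue\nred\nred\ngreen\nblue\n' > "$workdir/colors.txt"   # made-up data

uniq "$workdir/colors.txt"          # 5 lines: only the adjacent "red", "red" pair is collapsed
sort "$workdir/colors.txt" | uniq   # 3 lines: blue, green, red

# Count each color and list the most frequent first.
counts=$(sort "$workdir/colors.txt" | uniq -c | sort -rn)
echo "$counts"                      # 3 red, then 2 blue, then 1 green
```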

                                                                                                                                                                      23 / 47

              ██ cut: Extracting Columns

              cut extracts specific columns from each line:

                                                        $ cut -d: -f1 /etc/passwd      # first field, colon-delimited

                                                        $ cut -d, -f2 data.csv         # second column of a CSV

                 •  -d — delimiter character (default: tab)

                 •  -f — field number(s) to extract

                                                          $ cut -d, -f1,3 data.csv       # columns 1 and 3

                                                          $ cut -d, -f2- data.csv        # column 2 through the end
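
              The field selections above can be tried on a small CSV. The sketch below assumes a made-up three-column file; the names and values are invented.

```shell
workdir=$(mktemp -d)
cat > "$workdir/data.csv" <<'EOF'
name,score,city
Alice,92,Boulder
Bob,85,Denver
EOF

cut -d, -f2 "$workdir/data.csv"     # score / 92 / 85
cut -d, -f1,3 "$workdir/data.csv"   # name,city / Alice,Boulder / Bob,Denver
cut -d, -f2- "$workdir/data.csv"    # score,city / 92,Boulder / 85,Denver
```

              cut splits on every delimiter it sees, so a CSV with quoted fields that contain commas needs a different tool.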

                                                                                                                                                                      24 / 47

              ██ head: First Lines of a Stream

              head prints the first lines of its input (default: 10):

                                                                $ head data.txt

                                                                $ cat data.txt | head

              Useful options:

                 •  -n N — print the first N lines

                                                     $ head -n 5 data.txt              # first 5 lines

                                                     $ sort -rn scores.txt | head -n 3   # top 3 scores

                                                     $ ls -t | head -n 5                 # 5 most recently modified files

                                                                                                                                                                      25 / 47

              ██ tail: Last Lines of a Stream

              tail prints the last lines of its input (default: 10):

                                                                $ tail data.txt

                                                                $ cat data.txt | tail

              Useful options:

                 •  -n N — print the last N lines

                 •  -f — follow: keep printing as new lines are added (great for watching log files)

                                                    $ tail -n 20 log.txt              # last 20 lines

                                                    $ sort -n numbers.txt | tail -n 3 # 3 largest numbers

                                                    $ tail -f /var/log/syslog         # watch a live log (Ctrl-C to stop)
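
              head and tail can also be chained to pull a range out of the middle of a stream. A minimal sketch, assuming a sample file that holds the numbers 1 through 10 (generated with seq):

```shell
workdir=$(mktemp -d)
seq 1 10 > "$workdir/data.txt"      # one number per line, 1 through 10

head -n 3 "$workdir/data.txt"       # 1, 2, 3
tail -n 3 "$workdir/data.txt"       # 8, 9, 10

# Lines 4 and 5: keep the first five lines, then the last two of those.
middle=$(head -n 5 "$workdir/data.txt" | tail -n 2)
echo "$middle"
# 4
# 5
```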

                                                                                                                                                                      26 / 47

              ██ cat: Concatenate Files

              cat sends one or more files to stdout — it concatenates them:

                                                     $ cat file.txt              # print one file

                                                     $ cat file1.txt file2.txt   # print two files back to back

                                                     $ cat *.log                 # combine all log files into one stream

              This is its core use in pipelines — combining multiple files before piping:

                                                    $ cat *.log | grep "error" | wc -l    # count errors across all logs

                                                    $ cat header.txt data.txt footer.txt > report.txt

                                                                                                                                                                      27 / 47

              ██ paste: Side-by-Side Files

              paste merges files line by line, printing them as columns:

                                                                $ paste names.txt scores.txt

                                                                Alice   92

                                                                Bob     85

                                                                Carol   78

              Useful options:

                 •  -d — delimiter between columns (default: tab)

                                                        $ paste -d, names.txt scores.txt    # comma-separated output

                                                        $ paste -d' ' first.txt last.txt    # space-separated columns

              Useful when you have related data split across files and want to combine them.
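
              A small sketch that recreates the names.txt and scores.txt example in a scratch directory (the file contents are assumed from the output shown above):

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf 'Alice\nBob\nCarol\n' > names.txt
printf '92\n85\n78\n' > scores.txt

paste names.txt scores.txt       # tab-separated: Alice<TAB>92, Bob<TAB>85, Carol<TAB>78
paste -d, names.txt scores.txt   # comma-separated: Alice,92 / Bob,85 / Carol,78

# The combined stream can feed further commands, e.g. the lowest score first.
lowest=$(paste -d, names.txt scores.txt | sort -t, -k2,2n | head -n 1)
echo "$lowest"                   # Carol,78
```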

                                                                                                                                                                      28 / 47

              ██ column: Aligning Columns

              column formats tab- or space-separated input into neatly aligned columns:

                                                                $ column -t data.txt

              Useful options:

                 •  -t — table mode: align all columns

                 •  -s — input separator (for non-tab delimited input)

                                                      $ cat data.csv | column -t -s,     # align a comma-separated file

                                                      $ paste names.txt scores.txt | column -t

              Mostly useful at the end of a pipeline to make output readable:

                                                                $ cut -d: -f1,3,7 /etc/passwd | column -t -s:

                                                                                                                                                                      29 / 47

              ██ find: Searching for Files

              find searches a directory tree for files matching criteria and prints their paths:

                                                  $ find . -name "*.txt"          # all .txt files under current directory

                                                  $ find /var/log -name "*.log"   # all .log files under /var/log

              Useful options:

                 •  -name — match by filename (supports wildcards)

                 •  -type f / -type d — files only / directories only

                 •  -newer file — files modified more recently than file

                                              $ find . -type f -name "*.py"              # Python files only

                                              $ find data/ -newer checkpoint.txt         # files changed since last checkpoint

                                              $ find . -name "*.log" | xargs grep "error"  # grep across found files

              Unlike the other commands, find generates filenames — it's a pipeline source, not a filter.
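
              A sketch of find acting as a pipeline source, using a made-up directory tree built under a scratch directory (all file names and contents are invented):

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/logs/old"
echo "error: disk full" > "$workdir/logs/app.log"
echo "all good"         > "$workdir/logs/old/app.log"
echo "notes"            > "$workdir/readme.txt"

find "$workdir" -type f -name "*.log" | sort    # the two .log files
find "$workdir" -type d | sort                  # the three directories

# Hand the found filenames to grep; -l prints only the names of files that match.
matches=$(find "$workdir" -type f -name "*.log" | xargs grep -l "error")
echo "$matches"                                 # .../logs/app.log
```

              Plain xargs splits its input on whitespace, so this form assumes none of the paths contain spaces.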

                                                                                                                                                                      30 / 47

              ██ Building Pipelines: Examples

              Count lines containing errors across all log files:

                                                                $ cat *.log | grep -i "error" | wc -l

              Find the 5 most common words in a file:

                                                     $ cat essay.txt | tr ' ' '\n' | sort | uniq -c | sort -rn | head -5

              Count how many files were modified today (approximate: it also matches files last modified on this date in an earlier year):

                                                                $ ls -l | grep "$(date +"%b %e")" | wc -l
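
              Pipelines like these are easiest to build one stage at a time, checking the output before adding the next command. The sketch below walks through the word-count pipeline on a made-up two-line essay.txt.

```shell
workdir=$(mktemp -d)
printf 'the cat saw the dog\nthe dog ran\n' > "$workdir/essay.txt"   # made-up text

# Stage 1: one word per line (-s squeezes repeats so no blank lines appear).
tr -s ' ' '\n' < "$workdir/essay.txt"

# Stage 2: add sort so identical words end up next to each other.
tr -s ' ' '\n' < "$workdir/essay.txt" | sort

# Stage 3: count each word, order by count, and keep the top two.
top_words=$(tr -s ' ' '\n' < "$workdir/essay.txt" | sort | uniq -c | sort -rn | head -n 2)
echo "$top_words"    # 3 the, then 2 dog
```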

                                                                                                                                                                      31 / 47

              ██ Demo - Text Processing

                                                                                                                                                                      32 / 47

              ██ Puzzle Time!

                                                                                                                                                                      33 / 47

              ██ Practice: Working with Columns

              The file data-processing/multi-column-data.txt has multiple columns of data.

                 1. How many columns are there in this file?

                 2. How many lines are there in this file?

                 3. Print the maximum value in column 1.

                 4. Print the minimum value in column 2.

                 5. Print the 5th largest value in column 3.

                 6. Print the value in column 1 on the row that contains the minimum value in column 4.

                 7. How many different values are in column 3?

                 8. Create a new file named multi-column-extracted.txt that contains just columns 2 and 4, sorted in descending order on column 2.

                                                                                                                                                                      34 / 47

              ██ Practice: What Line Is That On?

              Still using data-processing/multi-column-data.txt

                 1. On what line number does the maximum value in column 3 appear?

                                                                                                                                                                      35 / 47

              ██ Practice: Exploring Directories

              The simulations/ directory contains a bunch of run-*.out files.

                 1. How many run-*.out files are there?

                 2. These files contain status lines printed from a simulation. How many errors were there?

                 3. What was the greatest number of files that were processed on a successful run (no errors)?

                                                                                                                                                                      36 / 47

              ██ entr

              So how does the entr command work?

                                                                                                                                                                      37 / 47

              ██ Command Substitution

                                                                                                                                                                      38 / 47

              ██ Using Command Output as Arguments

              You already saw this in Module 02:

                                                                for i in $(seq 1 5); do

              $(command) is command substitution — the shell runs the command inside and replaces $(...) with its output.

              This lets you use the output of one command as an argument to another.

                                                                                                                                                                      39 / 47

              ██ Command Substitution: Basic Examples

              Store command output in a variable:

                                                                today=$(date +%Y-%m-%d)

                                                                echo "Today is ${today}"

              Use directly in a string:

                                                                echo "Logged in as $(whoami) on $(hostname)"

              Use in a filename:

                                                                cp data.txt "data_backup_$(date +%Y%m%d).txt"

                                                                                                                                                                      40 / 47

              ██ Command Substitution vs. Pipes

              Both connect commands, but differently:

              Pipes — send stdout as stdin to the next command:

                                                          $ ls | wc -l          # wc reads the file list from stdin

              Command substitution — insert output as arguments:

                                                         $ wc -l $(ls *.txt)   # passes filenames as arguments to wc

              Use pipes when the next command reads from stdin.

              Use command substitution when the next command takes filenames or values as arguments.
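
              The two forms give different answers on the same files, which the sketch below shows. It assumes two made-up files, one.txt and two.txt, in a scratch directory.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf 'a\nb\n' > one.txt    # 2 lines
printf 'c\n' > two.txt       # 1 line

# Pipe: wc reads the listing itself from stdin, so it counts filenames.
name_count=$(ls *.txt | wc -l)
echo "$name_count"           # 2

# Command substitution: the filenames become arguments, so wc opens each file.
totals=$(wc -l $(ls *.txt) | tail -n 1)
echo "$totals"               # 3 total
```

              The unquoted $(ls *.txt) is split on whitespace, so this form assumes filenames without spaces; wc -l *.txt gives the same result without that limitation.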

                                                                                                                                                                      41 / 47

              ██ Command Substitution in Scripts

              Create uniquely named output files — useful for preserving each run's results:

                                                                #!/bin/bash

                                                                set -e

                                                                output="results_$(date +%Y%m%d_%H%M%S).txt"

                                                                echo "Writing results to ${output}..."

                                                                ./analyze.sh > "${output}"

                                                                echo "Done! Results saved to ${output}"

              Every run creates a new file with a timestamp in the name.

                                                                                                                                                                      42 / 47

              ██ Demo - Command Substitution

                                                                                                                                                                      43 / 47

              ██ Summary

              We learned:

                 •  Every command has stdin, stdout, and stderr streams

                 •  > redirects stdout to a file (overwrite), >> appends

                 •  2> redirects stderr; 2>&1 merges stderr into stdout

                 •  | (pipe) sends stdout of one command to stdin of the next

                 •  grep filters lines, wc counts, sort sorts, uniq deduplicates, cut extracts columns, head/tail slice streams, cat combines files, paste merges side-by-side, column aligns output

                 •  $(command) substitutes the output of a command as an argument

              Key habit: Before writing a complex script, build and test the pipeline at the command line first!

                                                                                                                                                                      44 / 47

              ██ Next Time

              In Module 04, we'll go deeper into text processing:

                 •  More powerful column manipulation with gawk

                 •  Stream editing with sed

                 •  Regular expressions

                 •  Putting it all together with real data

                                                                                                                                                                      45 / 47
