Pipes and Redirection
Standard Input, Output, and Text Processing
██ Standard Input and Output
██ How Commands Read and Write
Every command has three streams (connections for reading and writing data):
• stdin (standard input, stream 0) — where the command reads input from
• stdout (standard output, stream 1) — where the command sends its output
• stderr (standard error, stream 2) — where the command sends error messages
By default:
• stdin is your keyboard
• stdout is your terminal
• stderr is also your terminal
But these defaults can be changed!
██ Visualizing the Streams
             ┌──────────────────────┐
keyboard ──► │ stdin         stdout │ ──► terminal
             │                      │
             │       COMMAND        │
             │                      │
             │               stderr │ ──► terminal
             └──────────────────────┘
We can redirect stdout and stderr to files, or send stdout directly to another command's stdin.
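Here is a minimal sketch of the diagram in action, assuming bash and a throwaway directory created with `mktemp -d`. The hypothetical function `two_streams` writes one line to each output stream (`>&2` sends an `echo` to stderr instead of stdout), and the `>` and `2>` operators from the File Redirection slides send each stream to its own file.

```shell
# Work in a scratch directory so no real files are touched
cd "$(mktemp -d)" || exit 1

# Hypothetical helper: one line to stdout, one line to stderr
two_streams() {
  echo "normal output"
  echo "error message" >&2
}

# stdout goes to out.txt, stderr goes to err.txt
two_streams > out.txt 2> err.txt

cat out.txt   # normal output
cat err.txt   # error message
```

Nothing reaches the terminal while `two_streams` runs, because both of its streams were pointed at files.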
██ Remember the WARNING from Module 01?
Some commands read from stdin when no filename is given:
$ cat
The command is waiting for you to type — reading from stdin (keyboard). Press Ctrl-C to cancel.
Now you know why this happens: cat reads from stdin when there's no file argument.
This is true for head, tail, grep, wc, and many other commands too.
██ File Redirection
██ Saving Output to a File
Use > to redirect stdout to a file:
$ date > timestamp.txt
$ wc -l *.txt > line_counts.txt
• If the file doesn't exist, it's created
• If the file already exists, it's overwritten (no warning!)
██ Appending to a File
Use >> to add stdout to the end of a file:
$ echo "Run 1 started" >> log.txt
$ date >> log.txt
$ echo "Run 1 finished" >> log.txt
• If the file doesn't exist, it's created
• If the file already exists, new output is added to the end
██ Overwrite vs. Append
$ echo "Line 1" > file.txt # creates file with "Line 1"
$ echo "Line 2" > file.txt # OVERWRITES — now only "Line 2"
$ echo "Line 3" >> file.txt # appends — file has lines 2 and 3
Common mistake: Using > when you meant >>!
When in doubt, use cat file.txt to check what's there before writing.
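A runnable version of the overwrite/append sequence, assuming bash in a scratch directory. It also shows the shell option `set -o noclobber`, which makes `>` refuse to overwrite an existing file while still allowing `>>` to append.

```shell
# Scratch directory so nothing real gets overwritten
cd "$(mktemp -d)" || exit 1

echo "Line 1" > file.txt    # creates file.txt
echo "Line 2" > file.txt    # overwrites: only "Line 2" remains
echo "Line 3" >> file.txt   # appends: "Line 2" then "Line 3"
cat file.txt

# Optional safety net: > now fails instead of clobbering an existing file
set -o noclobber
echo "oops" > file.txt || echo "refused to overwrite file.txt"
echo "Line 4" >> file.txt   # appending is still allowed
set +o noclobber            # turn the safety net back off
```

With `noclobber` set, `>|` forces an overwrite when you really do mean to replace the file.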
██ Redirecting Error Messages
Error messages go to stderr, not stdout, so > won't capture them:
$ ls /nonexistent > output.txt # error message still appears on screen!
Use 2> to redirect stderr:
$ ls /nonexistent 2> errors.txt
$ ./myscript.sh 2> errors.txt
To redirect both stdout and stderr to the same file:
$ ./myscript.sh > output.txt 2>&1
██ Demo - File Redirection
██ Pipes
██ The Unix Philosophy
▍ "Write programs that do one thing and do it well.
▍ Write programs to work together."
(Doug McIlroy, one of the creators of Unix)
This is why there are so many small, focused commands: cat, head, tail, sort, wc, grep, uniq, cut, ...
Each does one thing. Pipes let you combine them.
██ Connecting Commands with Pipes
The | character (pipe) sends the stdout of one command to the stdin of the next:
$ command1 | command2 | command3
command1 ──► stdout ──► stdin ──► command2 ──► stdout ──► stdin ──► command3 ──► stdout ──► terminal
command1 never knows its output isn't going to the terminal. command2 never knows its input isn't coming from the keyboard.
██ Your First Pipeline
$ ls | head
• ls lists all files, sending the list to stdout
• | sends that list to head's stdin
• head prints just the first 10 lines
Count the number of files in a directory:
$ ls | wc -l
See files sorted by line count:
$ wc -l *.txt | sort -n
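A sketch of these pipelines in a scratch directory, assuming bash; `touch` creates 12 made-up empty files named file01.txt through file12.txt. When its stdout is a pipe rather than the terminal, ls prints one name per line, which is why counting lines counts files.

```shell
cd "$(mktemp -d)" || exit 1

# seq -w pads to equal width, giving 01, 02, ..., 12
for i in $(seq -w 1 12); do touch "file${i}.txt"; done

ls | head     # only the first 10 names
ls | wc -l    # 12
```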
██ Pipelines vs. Reading Files
Many commands can read from either a file or stdin:
$ wc -l data.txt # reads from a file
$ cat data.txt | wc -l # reads from stdin
Both count lines in data.txt. The pipe version becomes useful when the input comes from another command — not a static file.
$ ls *.txt | wc -l # count how many .txt files there are
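One visible difference between the two forms: given a filename, wc prints the name next to the count; reading from stdin, it has no name to print. A sketch assuming bash, with a made-up three-line data.txt:

```shell
cd "$(mktemp -d)" || exit 1
printf 'a\nb\nc\n' > data.txt

wc -l data.txt          # 3 data.txt
cat data.txt | wc -l    # 3 (some systems pad the number with spaces)
```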
██ Demo - Pipes
██ Text Processing Commands
██ Commands That Power Pipelines
Most of these commands read from stdin (or a file) and write to stdout, so they are designed to work in pipelines (find and seq produce output without reading any input):
• grep — keep only lines that match a pattern
• wc — count lines, words, or characters
• sort — sort lines
• uniq — remove or count duplicate lines
• cut — extract specific columns
• head — get top of stream
• tail — get bottom of stream
• cat — send multiple files to stdout
• paste — print files side-by-side
• column — align columns
• find — search for files by name, type, or other attributes
• seq — print a sequence of numbers
██ grep: Filtering Lines
grep prints only the lines that match a pattern:
$ grep "error" log.txt
$ cat log.txt | grep "error"
Useful options:
• -i — case-insensitive (error, Error, ERROR all match)
• -v — invert: print lines that do not match
• -c — count matching lines instead of printing them
$ grep -i "warning" log.txt # any case
$ grep -v "^#" config.txt # skip comment lines
$ grep -c "error" log.txt # just the count
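To try these options without a real log, here is a sketch assuming bash and a made-up four-line log.txt:

```shell
cd "$(mktemp -d)" || exit 1

# printf repeats '%s\n' for each argument: one argument per line
printf '%s\n' "INFO start" "ERROR disk full" "error retrying" "INFO done" > log.txt

grep "error" log.txt         # error retrying (case-sensitive match)
grep -i "error" log.txt      # ERROR disk full, error retrying
grep -v "INFO" log.txt       # every line without INFO: the two error lines
grep -c -i "error" log.txt   # 2
```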
██ wc: Counting
wc counts things in its input:
$ wc data.txt
100 500 3200 data.txt
(lines, words, bytes)
Useful options:
• -l — count lines only
• -w — count words only
• -c — count bytes only (-m counts characters, which differ from bytes for non-ASCII text)
$ wc -l *.txt # line count of each file
$ ls | wc -l # count files in directory
$ grep "error" log.txt | wc -l # count error lines
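A quick check of the three counts on a known input, assuming bash: "one two\n" is 8 bytes and "three\n" is 6, so the input is 2 lines, 3 words, and 14 bytes.

```shell
printf 'one two\nthree\n' | wc -l   # 2
printf 'one two\nthree\n' | wc -w   # 3
printf 'one two\nthree\n' | wc -c   # 14
```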
██ sort: Sorting Lines
sort sorts lines alphabetically by default:
$ sort names.txt
$ cat names.txt | sort
Useful options:
• -n — sort numerically (so 10 comes after 9, not after 1)
• -r — reverse order
• -k2 — sort on the text from the 2nd whitespace-separated column onward (use -k2,2 to sort on the 2nd column alone)
$ sort -n numbers.txt # numeric sort
$ sort -rn numbers.txt # largest first
$ wc -l *.txt | sort -n # files ordered by line count
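A sketch contrasting the default and numeric orders, assuming bash; the names and scores are made up. The last line uses `-k2,2` so that only the second column is the sort key, with `-n` comparing it as a number.

```shell
printf '10\n9\n1\n' | sort      # 1, 10, 9 (compared character by character)
printf '10\n9\n1\n' | sort -n   # 1, 9, 10
printf '10\n9\n1\n' | sort -rn  # 10, 9, 1

# Sort made-up "name score" lines by the score column only
printf 'carol 78\nalice 92\nbob 85\n' | sort -k2,2 -n   # carol 78, bob 85, alice 92
```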
██ uniq: Removing Duplicates
uniq removes consecutive duplicate lines:
$ sort colors.txt | uniq
Important: sort first! uniq only removes adjacent duplicates.
Useful options:
• -c — prefix each line with its count
• -d — print only lines that appear more than once
$ sort log.txt | uniq -c # count occurrences of each line
$ sort log.txt | uniq -c | sort -rn # most frequent first
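Why "sort first" matters, as a small sketch assuming bash: the two "red" lines below are not adjacent, so uniq alone leaves both.

```shell
printf 'red\nblue\nred\n' | uniq                      # red, blue, red
printf 'red\nblue\nred\n' | sort | uniq               # blue, red
printf 'red\nblue\nred\n' | sort | uniq -c | sort -rn # "2 red" first, then "1 blue"
```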
██ cut: Extracting Columns
cut extracts specific columns from each line:
$ cut -d: -f1 /etc/passwd # first field, colon-delimited
$ cut -d, -f2 data.csv # second column of a CSV
• -d — delimiter character (default: tab)
• -f — field number(s) to extract
$ cut -d, -f1,3 data.csv # columns 1 and 3
$ cut -d, -f2- data.csv # column 2 through the end
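A sketch of the field selections on a made-up three-line data.csv, assuming bash. When several fields are selected, cut keeps the delimiter between them.

```shell
cd "$(mktemp -d)" || exit 1
printf 'name,score,city\nalice,92,Paris\nbob,85,Oslo\n' > data.csv

cut -d, -f2 data.csv     # score, 92, 85
cut -d, -f1,3 data.csv   # name,city / alice,Paris / bob,Oslo
cut -d, -f2- data.csv    # score,city / 92,Paris / 85,Oslo
```

cut splits on every comma, so it will miscount fields in a CSV whose values contain quoted commas.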
██ head: First Lines of a Stream
head prints the first lines of its input (default: 10):
$ head data.txt
$ cat data.txt | head
Useful options:
• -n N — print the first N lines
$ head -n 5 data.txt # first 5 lines
$ sort -rn scores.txt | head -3 # top 3 scores
$ ls -t | head -5 # 5 most recently modified files
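A sketch of "top N" with made-up scores, assuming bash: head only takes the first lines it is given, so the ranking has to come from sort earlier in the pipeline.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' 78 92 85 99 60 > scores.txt

sort -rn scores.txt | head -n 3   # 99, 92, 85
head -n 2 scores.txt              # 78, 92 (file order, not sorted)
```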
██ tail: Last Lines of a Stream
tail prints the last lines of its input (default: 10):
$ tail data.txt
$ cat data.txt | tail
Useful options:
• -n N — print the last N lines
• -f — follow: keep printing as new lines are added (great for watching log files)
$ tail -n 20 log.txt # last 20 lines
$ sort -n numbers.txt | tail -3 # 3 largest numbers
$ tail -f /var/log/syslog # watch a live log (Ctrl-C to stop)
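Chaining head and tail cuts a slice out of the middle of a stream. A sketch assuming bash, using seq as a stand-in for real data; `tail -n +N` is the related form that starts printing at line N.

```shell
seq 1 100 | tail -n 3                 # 98, 99, 100
seq 1 100 | head -n 20 | tail -n 5    # 16 through 20: keep 20 lines, then the last 5 of those
seq 1 10 | tail -n +8                 # 8, 9, 10: everything from line 8 on
```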
██ cat: Concatenate Files
cat sends one or more files to stdout, concatenating them:
$ cat file.txt # print one file
$ cat file1.txt file2.txt # print two files back to back
$ cat *.log # combine all log files into one stream
In pipelines, its core use is combining multiple files into one stream before piping:
$ cat *.log | grep "error" | wc -l # count errors across all logs
$ cat header.txt data.txt footer.txt > report.txt
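A sketch with two made-up log files, assuming bash. Piping through cat merges them into one stream and gives a single total, while handing the files straight to grep -c gives one count per file.

```shell
cd "$(mktemp -d)" || exit 1
printf 'ok\nerror A\n' > a.log
printf 'error B\nerror C\nok\n' > b.log

cat *.log | grep "error" | wc -l   # 3: one total across both files
grep -c "error" *.log              # a.log:1 and b.log:2
```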
██ paste: Side-by-Side Files
paste merges files line by line, printing them as columns:
$ paste names.txt scores.txt
Alice 92
Bob 85
Carol 78
Useful options:
• -d — delimiter between columns (default: tab)
$ paste -d, names.txt scores.txt # comma-separated output
$ paste -d' ' first.txt last.txt # space-separated columns
Useful when you have related data split across files and want to combine them.
██ column: Aligning Columns
column formats tab- or space-separated input into neatly aligned columns:
$ column -t data.txt
Useful options:
• -t — table mode: align all columns
• -s — input separator (for non-tab delimited input)
$ cat data.csv | column -t -s, # align a comma-separated file
$ paste names.txt scores.txt | column -t
Mostly useful at the end of a pipeline to make output readable:
$ cut -d: -f1,3,7 /etc/passwd | column -t -s:
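A sketch combining paste and column on made-up names and scores, assuming bash. Note that column is not installed on every minimal system, and its exact spacing differs between implementations.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' Alice Bob Carol > names.txt
printf '%s\n' 92 85 78 > scores.txt

paste names.txt scores.txt                        # tab between name and score
paste -d, names.txt scores.txt                    # Alice,92 / Bob,85 / Carol,78
paste -d, names.txt scores.txt | column -t -s,    # the same pairs, aligned as a table
```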
██ find: Searching for Files
find searches a directory tree for files matching criteria and prints their paths:
$ find . -name "*.txt" # all .txt files under current directory
$ find /var/log -name "*.log" # all .log files under /var/log
Useful options:
• -name — match by filename (supports wildcards)
• -type f / -type d — files only / directories only
• -newer file — files modified more recently than file
$ find . -type f -name "*.py" # Python files only
$ find data/ -newer checkpoint.txt # files changed since last checkpoint
$ find . -name "*.log" | xargs grep "error" # grep across found files
Unlike most of the other commands, find generates filenames instead of filtering its input: it's a pipeline source, not a filter.
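A sketch on a small made-up directory tree, assuming bash. find's output order depends on the filesystem, so pipe it through sort when the order matters.

```shell
cd "$(mktemp -d)" || exit 1
mkdir -p data/raw
touch notes.txt data/a.txt data/raw/b.txt data/run.py

find . -name "*.txt" | sort             # ./data/a.txt, ./data/raw/b.txt, ./notes.txt
find . -type d                          # ., ./data, ./data/raw
find . -type f -name "*.txt" | wc -l    # 3
```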
██ Building Pipelines: Examples
Count lines containing errors across all log files:
$ cat *.log | grep -i "error" | wc -l
Find the 5 most common words in a file:
$ cat essay.txt | tr ' ' '\n' | sort | uniq -c | sort -rn | head -5
Count how many files were modified today:
$ ls -l | grep "$(date +"%b %e")" | wc -l
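The word-frequency pipeline, run on a made-up sentence instead of essay.txt and split one stage per line so each step can carry a comment (a line ending in | continues on the next line). Assumes bash; top 3 here instead of 5 because the sentence is short.

```shell
printf 'the cat saw the dog and the cat ran\n' |
  tr ' ' '\n' |   # one word per line
  sort |          # bring identical words together
  uniq -c |       # count each group of identical words
  sort -rn |      # largest counts first
  head -3         # "3 the", "2 cat", then one word seen once
```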
██ Demo - Text Processing
██ Puzzle Time!
██ Practice: Working with Columns
The file data-processing/multi-column-data.txt has multiple columns of data.
1. How many columns are there in this file?
2. How many lines are there in this file?
3. Print the maximum value in column 1.
4. Print the minimum value in column 2.
5. Print the 5th largest value in column 3.
6. Print the value in column 1 on the row that contains the minimum value in column 4.
7. How many different values are in column 3?
8. Create a new file named multi-column-extracted.txt that contains just columns 2 and 4, sorted in descending order on column 2.
██ Practice: What Line Is That On?
Still using data-processing/multi-column-data.txt:
1. On what line number does the maximum value in column 3 appear?
██ Practice: Exploring Directories
The simulations/ directory contains a bunch of run-*.out files.
1. How many run-*.out files are there?
2. These files contain status lines printed from a simulation. How many errors were there?
3. What was the greatest number of files that were processed on a successful run (no errors)?
██ entr
So how does the entr command work?
██ Command Substitution
██ Using Command Output as Arguments
You already saw this in Module 02:
for i in $(seq 1 5); do
$(command) is command substitution — the shell runs the command inside and replaces $(...) with its output.
This lets you use the output of one command as an argument to another.
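A complete loop of that shape, assuming bash; the loop body is an illustrative stand-in, not necessarily the one from Module 02. The shell first runs `seq 1 5`, then splits its output into the five words the loop iterates over.

```shell
# $(seq 1 5) is replaced by "1 2 3 4 5" before the loop starts
for i in $(seq 1 5); do
  echo "run ${i}"
done
# prints "run 1" through "run 5", one per line
```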
██ Command Substitution: Basic Examples
Store command output in a variable:
today=$(date +%Y-%m-%d)
echo "Today is ${today}"
Use directly in a string:
echo "Logged in as $(whoami) on $(hostname)"
Use in a filename:
cp data.txt "data_backup_$(date +%Y%m%d).txt"
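The same examples as one runnable script, assuming bash and a scratch directory with a made-up data.txt. The backup name is kept in a variable so the script can report it afterwards.

```shell
cd "$(mktemp -d)" || exit 1
echo "sample" > data.txt

today=$(date +%Y-%m-%d)      # four-digit year, month, day
echo "Today is ${today}"

backup="data_backup_$(date +%Y%m%d).txt"
cp data.txt "${backup}"
echo "Backed up data.txt to ${backup}"
ls                           # data.txt plus the dated backup
```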
██ Command Substitution vs. Pipes
Both connect commands, but differently:
Pipes send stdout as stdin to the next command:
$ ls | wc -l # wc reads the file list from stdin
Command substitution inserts output as arguments:
$ wc -l $(ls *.txt) # passes filenames as arguments to wc
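To see the difference side by side, a sketch assuming bash and two made-up files. Through the pipe, wc counts lines of the file list itself; through command substitution, wc receives the names as arguments and opens each file.

```shell
cd "$(mktemp -d)" || exit 1
printf 'a\nb\n' > one.txt
printf 'c\nd\ne\n' > two.txt

ls *.txt | wc -l     # 2: two names in the list
wc -l $(ls *.txt)    # 2 one.txt, 3 two.txt, 5 total
```

The shell splits $(...) output on whitespace, so a filename containing a space would become two arguments here; wc -l *.txt avoids that.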
Use pipes when the next command reads from stdin. Use command substitution when the next command takes filenames or values as arguments.
██ Command Substitution in Scripts
Create uniquely named output files, which is useful for preserving each run's results:
#!/bin/bash
set -e
output="results_$(date +%Y%m%d_%H%M%S).txt"
echo "Writing results to ${output}..."
./analyze.sh > "${output}"
echo "Done! Results saved to ${output}"
Every run creates a new file with a timestamp in the name.
██ Demo - Command Substitution
██ Summary
We learned:
• Every command has stdin, stdout, and stderr streams
• > redirects stdout to a file (overwrite), >> appends
• 2> redirects stderr; 2>&1 merges stderr into stdout
• | (pipe) sends stdout of one command to stdin of the next
• grep filters lines, wc counts, sort sorts, uniq deduplicates, cut extracts columns, head/tail slice streams, cat combines files, paste merges side-by-side, column aligns output
• $(command) substitutes the output of a command as an argument
Key habit: Before writing a complex script, build and test the pipeline at the command line first!
██ Next Time
In Module 04, we'll go deeper into text processing:
• More powerful column manipulation with gawk
• Stream editing with sed
• Regular expressions
• Putting it all together with real data