Getting Started with Scripting

by Seth Kenlon

The world's best conceptual introduction to shell scripting comes from an old video in which Brian W. Kernighan (the "K" in `awk`) and Lorinda L. Cherry (co-author of `bc`) demonstrate how one of the founding principles of UNIX was to empower users to use existing utilities to create complex and customized tools.

In the [condensed] words of Kernighan: "Think of the UNIX system programs basically as [...] building blocks with which you can create things. [...] The notion of pipelining is the fundamental contribution of the [UNIX] system; you can take a bunch of programs...and stick them together end to end so that the data flows from the one on the left to the one on the right and the system itself looks after all the connections. The programs themselves don't know anything about the connection; as far as they're concerned, they're just talking to the terminal."

That's all shell scripting is: if you can figure out how to complete a task in a POSIX shell, then you can automate that task.

True to its name, the *shell script* is not an "object oriented" discipline, but a line-by-line "recipe" for what you want your computer to do, in the same sequence that you would have done it manually.

Since a shell script is a recipe consisting of common, everyday commands, familiarity with a UNIX or Linux (generically known as `POSIX`) shell is helpful, but this article assumes nothing. The more you practise using the shell, though, the easier it is to formulate new scripts; it's like learning a foreign language: the more vocabulary you internalize, the easier it is to form complex sentences.

When you open a terminal window, you are opening a *shell*. There are several shells out there, and this tutorial is valid for `bash`, `tcsh`, `ksh`, `zsh`, and probably others. In a few sections, I do provide some bash-specific examples, but the final script abandons those, so you can either switch to bash for the lesson about setting variables, or do some simple syntax adjustment.

If you're new to all of this, just use bash (the default on most Linux distributions, Cygwin, and WSL, and available on macOS and most BSDs).

Hello World

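You can generate your own `hello world` script from a terminal window. Mind the quotation marks; single and double have different effects. In particular, the first line must use single quotes, because inside double quotes an interactive bash treats `!` as a history-expansion character.

$ echo '#!/bin/sh' > hello.sh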
$ echo "echo 'hello world' " >> hello.sh

As you can see, apart from the first line, writing a shell script is just a matter of echoing or pasting commands into a text file.

To run the script as an application:

$ chmod +x hello.sh
$ ./hello.sh
hello world

And that's, more or less, all there is to it!
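To double-check what the two `echo` commands wrote, you can print the file with `cat` before running it. This sketch simply repeats the steps above in one go; run it in a scratch directory, since it overwrites any existing `hello.sh`:

```shell
#!/bin/sh
# Repeat the two echo commands from above; single quotes keep the ! literal.
echo '#!/bin/sh' > hello.sh
echo "echo 'hello world' " >> hello.sh

# Show the script's contents: a shebang line, then one command.
cat hello.sh

# Make it executable and run it.
chmod +x hello.sh
./hello.sh
```

The output should show the two lines of the script (the shebang and the `echo` command), followed by `hello world` from running it.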

Despacer

If there's one thing that complicates the interaction between humans and computers, it's spaces in file names. You've seen it on the internet: URLs like `http://example.com/cats/omg%2ccutest%20cat%20photo%21%211.jpg`. Or maybe it's tripped you up when running a simple command:

$ cp llama pic.jpg ~/photos
cp: cannot stat 'llama': No such file or directory
cp: cannot stat 'pic.jpg': No such file or directory

The solution is to "escape" the space with a backslash, or to wrap the whole name in quotation marks:

$ ls foo\ bar.txt
foo bar.txt
$ ls "foo bar.txt"
foo bar.txt

That gets inconvenient, so why not write a script to remove those annoying spaces from file names?

Create a file to hold the script, starting with a "shebang" (`#!`) to let your system know that the file should run in a shell:

$ echo '#!/bin/sh' > despace

Good code starts with documentation. Defining the purpose lets us know what to aim for. Here's a good README:

despace is a shell script that removes spaces from file names.

Usage:
$ despace "foo bar.txt"

Now let's figure out how to do it manually, and build the script as we go.

Assuming you have a file called "foo bar.txt" in an otherwise empty directory, try this:

$ ls
foo bar.txt

Computers are all about input and output. In this case, the input has been a request to `ls` the current directory. The output is what you would expect: the name of the file in that directory.

In UNIX, output can be sent as the input of another command through a "pipe". Whatever's on the opposite side of the pipe acts as a sort of filter. The `tr` utility is designed to translate or delete characters in the text passed through it; for this task, use the `-d` option to delete the character given in quotes (GNU `tr` also accepts the long form `--delete`, but `-d` works on every POSIX system).

$ ls "foo bar.txt" | tr -d ' '
foobar.txt

And now we have just the output we need.

In bash, output can also be stored in a *variable*. You can think of a variable as an empty box, into which you might put information for storage:

$ NAME=foo

When you need the information back, you can "look in the box" by referencing a variable name preceded by a `$`.

$ echo $NAME
foo
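One rule trips up many newcomers: there must be no spaces around the `=` in an assignment. With spaces, the shell treats `NAME` as the name of a command to run instead. A short sketch (the `GREETING` variable is just an illustration):

```shell
#!/bin/sh
# Correct: no spaces around the equals sign.
NAME=foo
echo "$NAME"

# A value containing spaces needs quotes, just like a file name on the command line.
GREETING="hello world"
echo "$GREETING"

# Incorrect (commented out): the shell would try to run a command called NAME
# with the arguments "=" and "foo".
# NAME = foo
```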

So to get the output of our despacing command and set it aside for later, use a variable. To place the *results* of a command into a variable, use backticks:

$ NAME=`ls "foo bar.txt" | tr -d ' '`
$ echo $NAME
foobar.txt

This gets us halfway to our goal: we now have a method to determine the destination filename from the source filename.

So far, the script looks like this:

#!/bin/sh

NAME=`ls "foo bar.txt" | tr -d ' '`
echo $NAME

The second part of the script must perform the renaming. You probably already know that command:

$ mv "foo bar.txt" foobar.txt

Remember, though, that in the script the destination name is stored in a variable. You already know how to reference variables:

#!/bin/bash

NAME=`ls "foo bar.txt" | tr -d ' '`
echo $NAME
mv "foo bar.txt" $NAME

You can try out your first draft by marking it executable and running it in your test directory. Make sure you have a test file.

$ touch "foo bar.txt"
$ chmod +x despace
$ ./despace
foobar.txt
$ ls
foobar.txt
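If you'd rather not clutter a real directory, you can run the same check in a throwaway one. This sketch assumes `mktemp -d` is available (it is on most Linux, BSD, and macOS systems), and the `TESTDIR` variable is a name chosen just for this example:

```shell
#!/bin/sh
# Work in a throwaway directory so no real files are touched.
TESTDIR=$(mktemp -d)
cd "$TESTDIR" || exit 1

# Write the first draft of despace (hardcoded to "foo bar.txt").
cat > despace <<'EOF'
#!/bin/sh
NAME=`ls "foo bar.txt" | tr -d ' '`
echo $NAME
mv "foo bar.txt" $NAME
EOF
chmod +x despace

# Create the test file, run the script, and list what's left.
touch "foo bar.txt"
./despace
ls
```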

Despacer v2.0

The script works, but not exactly as our documentation describes. It's currently very specific, and will only work on a file called `foo\ bar.txt`, and nothing else.

When a POSIX shell runs a command, it refers to the command itself as `$0` and to anything typed after it, in order, as `$1`, `$2`, `$3`, and so on. Your shell script counts as a POSIX command, so try swapping out `"foo bar.txt"` for `$1`:

#!/bin/bash

NAME=`ls $1 | tr -d ' '`
echo $NAME
mv $1 $NAME
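Before testing the new version, you can watch positional parameters at work with a tiny throwaway script. The name `args.sh` is made up for this example, and the sketch writes it into the current directory:

```shell
#!/bin/sh
# Write a hypothetical demo script, args.sh, that reports its positional parameters.
cat > args.sh <<'EOF'
#!/bin/sh
echo "script name:     $0"
echo "first argument:  $1"
echo "second argument: $2"
echo "argument count:  $#"
EOF
chmod +x args.sh

# Quoting "one two" makes it a single argument, so $1 contains the space.
./args.sh "one two" three
```

Running it should print `./args.sh` as the script name, `one two` as the first argument (the quotes keep it together), `three` as the second, and a count of `2`. The `$#` variable holds the number of arguments; despace doesn't use it, but it's handy for checking input.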

Create a few new test files with spaces in the names:

$ touch "one two.txt"
$ touch "cat dog.txt"

Then test your new script:

$ ./despace "one two.txt"
ls: cannot access 'one': No such file or directory
ls: cannot access 'two.txt': No such file or directory

It looks like you've found a bug.

The bug is not actually a bug, as such; everything's working as designed, it's just not how you want it to work. Your script expands the unquoted `$1` variable to exactly what it contains, "one two.txt", and then the shell splits that value at the bothersome space, so `ls` receives two separate arguments, `one` and `two.txt`.
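You can watch this splitting happen without touching any files. The sketch below stores a file name in a variable called `FILE` (a name chosen just for this example) and hands it to `printf`, which applies its format once per argument, first unquoted and then quoted:

```shell
#!/bin/sh
FILE="one two.txt"

# Unquoted: the shell splits the value at the space, so printf gets two arguments.
printf '[%s]\n' $FILE

# Quoted: the value reaches printf intact, as one argument.
printf '[%s]\n' "$FILE"
```

The unquoted call prints `[one]` and `[two.txt]` on separate lines, while the quoted call prints `[one two.txt]`.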

The answer is to wrap the variable in double quotes, the same way we wrap filenames in quotes. (Single quotes won't work here, because they stop the shell from expanding `$1` at all.)

#!/bin/bash

NAME=`ls "$1" | tr -d ' '`
echo "$NAME"
mv "$1" "$NAME"

Another test or two:

$ ./despace "one two.txt"
onetwo.txt
$ ./despace c*g.txt
catdog.txt

This script acts the same as any other POSIX command. You can use it in conjunction with other commands just as you would expect to be able to use any POSIX utility:

$ find ~/test0 -type f -exec /path/to/despace {} \;
$ for FILE in ~/test1/* ; do /path/to/despace "$FILE" ; done

And so on.
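The quotes matter in the loop, too: `"$FILE"` must stay quoted, or the shell splits each name again before despace sees it. Here's a self-contained sketch of the same kind of loop. Because `/path/to/despace` is only a placeholder, the loop body inlines despace's rename instead, and it works in a throwaway directory created with `mktemp -d` (assumed to be available):

```shell
#!/bin/sh
# Set up a throwaway directory with a few names containing spaces.
TESTDIR=$(mktemp -d)
cd "$TESTDIR" || exit 1
touch "one two.txt" "cat dog.txt" "red green blue.txt"

# "$FILE" is quoted, so each name arrives as a single argument.
# The body inlines despace's rename rather than calling /path/to/despace.
for FILE in *; do
    mv "$FILE" "$(ls "$FILE" | tr -d ' ')"
done

ls
```

The final `ls` should list `catdog.txt`, `onetwo.txt`, and `redgreenblue.txt`.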

Despacer v2.5

The despace script is functional, but technically it could be optimised, and it could use a few usability improvements.

First of all, the variable is actually not needed; the shell can calculate the required information all in one go.

POSIX shells have an order of operations. The same way we, in maths, solve for statements in brackets first, the shell resolves statements in backticks (or the equivalent `$()` syntax, which bash and other POSIX shells also support) before executing a command. Therefore, the statement:

$ mv foo\ bar.txt `ls foo\ bar.txt | tr -d ' '`

gets transformed into:

$ mv foo\ bar.txt foobar.txt

and then the actual `mv` command is performed, leaving us with just `foobar.txt`.
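POSIX shells accept both backticks and `$(...)` for command substitution; in both cases the inner pipeline runs first and its output replaces it in the outer command. This sketch uses `printf` instead of `ls` so that no file needs to exist, and the names `OLD`, `NEW1`, and `NEW2` are chosen only for the illustration:

```shell
#!/bin/sh
OLD="foo bar.txt"

# Backtick form: the pipeline inside runs before the assignment happens.
NEW1=`printf '%s\n' "$OLD" | tr -d ' '`

# $() form: same result, and it nests more easily than backticks.
NEW2=$(printf '%s\n' "$OLD" | tr -d ' ')

echo "$NEW1"
echo "$NEW2"
```

Both lines print `foobar.txt`; command substitution also strips the trailing newline from the pipeline's output.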

Knowing this, the shell script can actually be condensed to just this:

#!/bin/bash

mv "$1" `ls "$1" | tr -d ' '`

That looks disappointingly simple, reduced to basically a one-liner, but don't let that minimize your sense of accomplishment; there are plenty of simple utilities that make life significantly easier.

Besides, your script can still use improvement. Some more testing reveals a few weak points. For instance, running `despace` with no argument produces unhelpful errors:

$ ./despace
ls: cannot access '': No such file or directory
mv: missing destination file operand after ''
Try 'mv --help' for more information.

These errors are confusing because they're for `ls` and `mv`, but as far as the user knows, it wasn't `ls` or `mv`, but `despace`, that they ran.

If you think about it, our little script shouldn't even attempt to rename a file if it didn't get a file name as part of the command in the first place. So let's use what we know about variables along with the `test` command.

If and Test

The `if` statement is what turns your little despace utility from a script into a **program**. This is serious code territory. But don't worry, it's also pretty easy to understand and use.

An `if` statement is a kind of switch; if something is true, then we'll do one thing, and if it's false, we'll do something different. That's exactly the kind of binary decision making computers are best at; all we have to do is define for the computer what needs to be true or false and what to do as a result.

The easiest way to test whether something is true or false is the `test` utility. You rarely type `test` itself; instead, you use its alternative form, `[`, which works the same way but expects a closing `]`. Try this in a terminal:

$ if [ 1 = 1 ]; then echo "yes, true, affirmative"; fi
yes, true, affirmative
$ if [ 1 = 123 ]; then echo "yes, true, affirmative"; fi
$

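That's how a test works. There's all manner of shorthand; the one we'll use is the `-z` option, which is true if the length of a string of characters is zero (0). (When comparing strings, use a single `=`; `==` is a bash extension that some shells, such as dash, reject.) Here's how the idea translates in our despace script: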

#!/bin/bash

if [ -z "$1" ]; then
   echo "Provide a \"file name\", using quotes to nullify the space." >&2
   exit 1
fi

mv "$1" `ls "$1" | tr -d ' '`

I've broken the `if` statement into separate lines for readability, but the concept remains: if the data inside the `$1` variable is empty (zero characters are present), then print an error message and exit.

Try it:

$ ./despace
Provide a "file name", using quotes to nullify the space.
$

Success!

Well, actually it was a failure; but it was a *pretty* failure, and more importantly, a *helpful* failure.

Notice that I also include the statement `exit 1`. Every command finishes with a numeric exit status: `0` means success, and any other value signals an error. By exiting with `1`, despace tells whatever ran it that something went wrong. That's important for yourself and for other people who may want to use despace in scripts that *depend* on despace succeeding in order for everything else to happen correctly.
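Here is a sketch of how a caller sees that status, using `-z` on both an empty and a non-empty argument. It wraps despace's check in a function called `require_arg` (a hypothetical name, so the example runs without the real script); inside a function, `$1` is the function's own first argument, and `return` sets the status the way `exit` does for a whole script:

```shell
#!/bin/sh
# require_arg is a hypothetical stand-in for despace's argument check.
require_arg() {
    if [ -z "$1" ]; then
        echo "Provide a \"file name\", using quotes to nullify the space." >&2
        return 1
    fi
    return 0
}

# An empty argument fails: the message goes to standard error, the status is 1.
require_arg ""
echo "status with an empty argument: $?"

# A non-empty argument succeeds, so the if branch runs.
if require_arg "foo bar.txt"; then
    echo "argument accepted"
fi
```

With the real script, the same status is available after running it; for example, `./despace "one two.txt" && echo renamed` prints `renamed` only if despace exits with `0`.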

The final improvement is to add something to protect the user from overwriting files accidentally. Ideally, we'd pass this option through to the script so that it's optional, but for the sake of simplicity, we'll just hardcode it. The `-i` option tells `mv` to ask for permission before overwriting a file that already exists:

#!/bin/bash

if [ -z "$1" ]; then
   echo "Provide a \"file name\", using quotes to nullify the space." >&2
   exit 1
fi

mv -i "$1" `ls "$1" | tr -d ' '`
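To exercise the finished script end to end, you can write it into a throwaway directory and try both the failure and the success case. This sketch assumes `mktemp -d` is available, uses `#!/bin/sh` since nothing in this version needs bash, and sends the error message to standard error:

```shell
#!/bin/sh
# Throwaway directory (mktemp -d assumed to be available).
TESTDIR=$(mktemp -d)
cd "$TESTDIR" || exit 1

# Write the final despace script.
cat > despace <<'EOF'
#!/bin/sh
if [ -z "$1" ]; then
   echo "Provide a \"file name\", using quotes to nullify the space." >&2
   exit 1
fi

mv -i "$1" `ls "$1" | tr -d ' '`
EOF
chmod +x despace

# No argument: expect the error message and exit status 1.
./despace
echo "exit status: $?"

# With an argument: the file is renamed (no prompt, since onetwo.txt doesn't exist yet).
touch "one two.txt"
./despace "one two.txt"
ls
```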

Now your shell script is helpful, useful, and friendly. And you're a programmer, so don't stop now. Learn new commands, use them in your terminal, take note of what you do, and then script it! Eventually, you'll put yourself out of a job, and the rest of your life will be spent relaxing while your robotic minions run shell scripts.

Happy hacking!