How I Learned to Love tar
Tar stands for tape archive, so if you think of tar files as tape archives, they become a lot easier to manage. Or so I think, because I worked with tapes for years.
Create an Archive
Creating an archive is as simple as -cf: the c is for create and the f is for file, followed by your data paths.
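Here is a minimal sketch of that, assuming GNU tar. The `demo` directory and the `archive.tar` name are made up for illustration.

```shell
# Work in a throwaway directory so nothing real is touched
cd "$(mktemp -d)"
mkdir demo
echo "hello" > demo/a.txt
echo "world" > demo/b.txt

# c = create, f = write to this file, then the paths to pack
tar -cf archive.tar demo
```

Swap `demo` for as many files and directories as you like; they all go into the one archive.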
Compression & Decompression
Tar has had an automatic mode for a while now: as long as you have a way to do it, tar will figure it out for you most of the time, going by the archive's file extension. Let's compress something with the new hotness of zstd. In this example -avcf stands for automatic, verbose, create, file.
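A sketch of what that command could look like, assuming a GNU tar new enough to know about zstd and the `zstd` program being installed. The `demo` directory and archive name are placeholders.

```shell
cd "$(mktemp -d)"
mkdir demo
echo "hello" > demo/a.txt

# a = pick the compressor from the file suffix (.zst -> zstd)
# v = list files as they are added, c = create, f = archive file
tar -avcf archive.tar.zst demo
```

Rename the output to `archive.tar.gz` or `archive.tar.xz` and the same flags should pick gzip or xz instead.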
Now how about extracting a file?
Example
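Something along these lines, again assuming GNU tar with zstd available. The first few lines only build a small archive to extract, and all names are placeholders.

```shell
cd "$(mktemp -d)"
mkdir demo
echo "hello" > demo/a.txt
tar -acf archive.tar.zst demo
rm -r demo

# x = extract; the files come back under ./demo
tar -avxf archive.tar.zst
```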
In that example we told tar to use the flags -avxf, which stands for automatic, verbose, extract, file.
Using Custom Compression
If you need to use something that isn't built into tar, for instance if you're using a version of tar that does not support zstd, you can use -I to set the program to use for compression.
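A small sketch, assuming GNU tar, where -I is short for --use-compress-program. The program you name has to accept -d for decompression, which zstd does. File names are placeholders.

```shell
cd "$(mktemp -d)"
mkdir demo
echo "hello" > demo/a.txt

# -I pipes the archive through the named program on the way out
tar -I zstd -cvf archive.tar.zst demo

# On extraction tar runs the same program with -d to decompress
rm -r demo
tar -I zstd -xvf archive.tar.zst
```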
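Note
Keep in mind, although this doesn't apply to tar itself, that most archiving programs have two extract modes, e and x, and there is a big difference between them: one recreates the paths stored in the archive and the other ignores them. In 7z and unrar, for example, x keeps the stored paths and e extracts everything flat into the current directory. So if Alice archived /home/alice/mytardis with its path, the path-preserving mode will try to put the data back under that same path. If Bob wants /home/bob/mytardis rather than /home/alice/mytardis, Bob needs the other mode, so check which is which in your tool's manual.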
Adding and Removing Data
So you want to add or remove data from your tar file? Good news, you can!
However, it's not as easy as you may like it to be. In most cases it might be a lot easier to use the --exclude flag and remake the file. However, let's solve this problem one step at a time.
Did you know the --exclude flag also works for extracting?
First things first, let's get a list of all the files in the tar. Here are two examples of how to list things and search for things in a tar file.
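Roughly like this, assuming GNU tar with zstd. The archive is built on the spot with made-up names so the listing has something to show.

```shell
cd "$(mktemp -d)"
mkdir demo
echo "keep" > demo/keep.txt
echo "oops" > demo/oops.foo
tar -acf archive.tar.zst demo

# t = list the contents without extracting anything
tar -atf archive.tar.zst

# Search the listing for a path with grep
tar -atf archive.tar.zst | grep 'oops'
```

The path that grep prints is the exact member name you would hand to tar later.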
Tip
Don't worry if you leave off the -a, as tar is pretty smart and will figure out that it's compressed. Just remember to use the main part of the command, e.g. -tf, at a minimum.
Now that you know the path in the file to the data, you're going to use the --delete flag to remove the data or the --append flag to append data.
Keep in mind, there is no way to append or delete from a compressed archive.
If you have a compressed tar you will need to decompress it first. Let's use file.name.here.tar.zstd as an example. This works the same for anything the file was compressed with, e.g. zstd or gzip.
Example
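One way the round trip could look, assuming GNU tar (for --delete) and the zstd command line tool. The first block only builds a stand-in archive, and keep.txt is a made-up extra file.

```shell
cd "$(mktemp -d)"
# Stand-in for file.name.here.tar.zstd, containing the unwanted oops.foo
echo "keep" > keep.txt
echo "oops" > oops.foo
echo "yes"  > yes_this.bar
tar -cf - keep.txt oops.foo | zstd -q -o file.name.here.tar.zstd

# 1. Decompress (-o names the output in case zstd can't derive it from .zstd)
zstd -dq file.name.here.tar.zstd -o file.name.here.tar
rm file.name.here.tar.zstd

# 2. Remove oops.foo, then append yes_this.bar
tar --delete -f file.name.here.tar oops.foo
tar --append -f file.name.here.tar yes_this.bar

# 3. Compress again and drop the uncompressed copy
zstd -q file.name.here.tar -o file.name.here.tar.zstd
rm file.name.here.tar
```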
So what we did is decompress file.name.here.tar.zstd, and that became file.name.here.tar. Then we removed oops.foo and added yes_this.bar to the tar file. Lastly we compressed file.name.here.tar to create a shiny new file.name.here.tar.zstd.
Backups with Tar
Tar is, or was, a backup tool first and foremost, so it goes without saying you can do the normal differential and incremental backups with it. Differential backups are hard to automate, so let's just stick with incremental backups for now.
Give it the -g
When you want to create an incremental backup you will need to first create a full backup and a snapshot file.
Example
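A sketch assuming GNU tar, where -g is short for --listed-incremental, with zstd installed. The `data` directory and the backup file names are placeholders.

```shell
cd "$(mktemp -d)"
mkdir data
echo "one" > data/one.txt

# -g names the snapshot (snar) file. It does not exist yet, so tar
# creates it and dumps everything: this is the full backup.
tar -avcf backup-full.tar.zst -g backup.snar data
```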
This command will compress the backup using zstd, create the snar file, and make a full backup. Once that's done you can make as many backups as you want, and the snar will store lists of changes between backups.
Now to make our first incremental backup!
Example
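Continuing the same idea as a self-contained sketch, under the same assumptions (GNU tar, zstd installed, made-up names). The sleep is only there so this quick demo's change is clearly newer than the snapshot.

```shell
cd "$(mktemp -d)"
mkdir data
echo "one" > data/one.txt
tar -acf backup-full.tar.zst -g backup.snar data   # full backup first

# Something changes after the full backup
sleep 1
echo "two" > data/two.txt

# Same snapshot file, new archive name: only what changed since the
# last run is stored, and backup.snar is updated in place
tar -avcf backup-inc1.tar.zst -g backup.snar data
```

To restore, you would extract the full backup first and then each incremental in the order it was made.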
Excluding Data
Tar can exclude data in a number of ways.
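A few of those ways, sketched with GNU tar and made-up file names. Note that the exclude options come before the paths they should apply to.

```shell
cd "$(mktemp -d)"
mkdir -p demo/cache
echo "keep" > demo/keep.txt
echo "temp" > demo/scratch.tmp
echo "junk" > demo/cache/blob.bin

# 1. A shell-style pattern on the command line
tar -cf by-pattern.tar --exclude='*.tmp' demo

# 2. A file holding one pattern per line (-X is short for --exclude-from)
printf '%s\n' '*.tmp' 'demo/cache' > exclude.list
tar -cf by-list.tar -X exclude.list demo

# 3. The same flag while extracting: unpack everything except the cache
mkdir restore
tar -xf by-pattern.tar -C restore --exclude='demo/cache'
```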
File Streaming with Tar
Tar was created to stream data to tapes, so you can do a lot of things with it that you wouldn't think you should be able to do. Thanks to the Unix way... EVERYTHING is a file.
Copying a Large Amount of Small Files
Let's say we have 10 million small files that we need to move from /var/omg/why/did/you/do/this to /mnt/this/is/where/it/should/have/been. That is going to take a while, right? Well...
Example
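The classic trick is to pipe one tar into another. Here is a sketch with two temporary directories standing in for the real source and destination paths.

```shell
# Stand-ins for /var/omg/why/did/you/do/this and /mnt/this/is/where/it/should/have/been
src="$(mktemp -d)"
dst="$(mktemp -d)"
mkdir -p "$src/nested"
echo "a" > "$src/a.txt"
echo "b" > "$src/nested/b.txt"

# One tar packs the tree to stdout ("-f -"), the other unpacks it from stdin.
# -C changes directory first, so paths are stored relative to $src.
tar -cf - -C "$src" . | tar -xf - -C "$dst"
```

This copies rather than moves, so the source is still there until you check the result and remove it yourself.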
You can also do this over ssh! Now let's add some compression too!
Example
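# Make a file on the other end
tar -cf - /var/lib/data | zstd | ssh user@hostname "cat >/var/backups/backup.tar.zst"
# Send the data compressed with a progress bar!
# (-C on the sending side stores paths relative to /var/lib/data, so they land
# directly in the target directory; GNU tar generally can't auto-detect
# compression on a pipe, hence --zstd on the receiving side)
tar -cf - -C /var/lib/data . | pv | zstd | ssh user@hostname "tar --zstd -xf - -C /var/lib/data/"
# Send the data with a progress bar
tar -cf - -C /var/lib/data . | pv | ssh user@hostname "tar -xf - -C /var/lib/data/"
# Get the data
ssh user@hostname "tar -cf - -C /var/lib/data ." | tar -xvf - -C /var/lib/data/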
Note
The pipe view command pv will give you different information depending on where it's put and what flags are used. The example above shows the raw data being passed. If you wanted to see the compressed data being passed, put it after zstd.
The above examples show how to use tar with ssh. You can mix and match as needed. Your only limit is your creativity!
Oh, and that being said, rsync is almost always the better tool for sending data around. The edge cases are special files like symbolic links, special devices, sockets, named pipes, etc. If you're sending and backing up anything like that, or other data rsync is bad for, tar is your go-to.