Recently, I bought my very first tape drive. Yes, it is 2019. Tape drives are not archaic, obsolete pieces of technology; they excel at particular use cases. I bought an LTO-7 drive, which offers 6 TB of storage per tape, or even more if the input is compressible. Earlier this year, I wrote a piece about my NAS setup. The final missing piece of that setup was a way to create the off-site backups needed in case of catastrophic events (fire, etc.). Using my new tape drive, I can quickly ship backups to other geographical locations.

Before purchasing an expensive tape drive, I did a lot of research to be prepared for all the pitfalls and caveats. However, I ran into some hiccups and inconveniences that my research didn't prepare me for. This article covers the things I have learned by using the drive on CentOS 7.

This article might sound a bit pessimistic because I focus on the negative aspects; however, I do not regret spending money on a tape drive.

Rewind devices

Several blogs (e.g., 1, 2) outline how to work with a tape drive using the command-line tool mt. On CentOS, the tape drive is available via two devices:

  • /dev/st0 and
  • /dev/nst0.

(There are more devices with different compression levels. I didn't try any of them and don't think I am missing any functionality.)

The stX device rewinds the tape whenever the device is closed, i.e., after every command (rewind-device). The nstX device keeps its current position after every operation (non-rewind-device).

Assume there is already a single file on the tape, the tape is at its beginning, and you want to write a second file after the first one. To append, you first need to forward space one file: mt -f /dev/st0 fsf 1. Issuing this command makes the tape drive scan through the tape until it finds a file mark. However, since this is a rewind-device, the tape rewinds as soon as mt exits, so the next write starts at the beginning of the tape again.

I don't see a good use case for a rewind-device with mt. In the above example, if we mistakenly assumed that /dev/st0 is a non-rewind-device and started writing, we would overwrite the first file.

My recommendation is to always use a non-rewind-device, e.g., /dev/nst0, and rewind manually if this is what you want to do.

mt -f /dev/nst0 fsf N  # skip forward over N files
# write the next file here, e.g., with dd or tar
mt -f /dev/nst0 rewind  # rewind manually if needed
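The recommendation can be enforced in scripts with a small guard. The following is a minimal Python sketch that shells out to the real mt command but refuses to touch anything that does not look like a non-rewind device. It assumes the standard Linux naming scheme, where /dev/stN rewinds and /dev/nstN does not. The names `is_non_rewind`, `mt` and the `dry_run` flag are hypothetical and not part of any existing tool.

```python
from __future__ import annotations

import os
import subprocess


def is_non_rewind(device: str) -> bool:
    """Assumption: Linux st naming, where /dev/nstN never rewinds on close."""
    return os.path.basename(device).startswith("nst")


def mt(device: str, *args: str, dry_run: bool = False) -> list[str]:
    """Run `mt -f DEVICE ARGS...`, but only on a non-rewind device."""
    if not is_non_rewind(device):
        raise ValueError(f"{device} looks like a rewind device; use /dev/nstN")
    cmd = ["mt", "-f", device, *args]
    if not dry_run:
        # check=True raises CalledProcessError if mt reports a failure
        subprocess.run(cmd, check=True)
    return cmd
```

For example, `mt("/dev/nst0", "fsf", "1")` skips one file. `mt("/dev/st0", "fsf", "1")` raises a ValueError instead of silently rewinding afterwards.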

mt essentials

The Swiss Army knife for tape devices is mt. You need it to rewind or forward the tape to the desired position, to check the status of the tape, and to eject the tape. The command provides many ways to navigate through the tape. Four of them are of particular interest. The following table details these four navigation commands and other basic commands. (The device option -f /dev/nst0 is omitted for brevity.)

Command       Description
mt status     Prints the current position of the tape
mt fsf n      Advances n files on the tape
mt bsfm n     Goes back n-1 files on the tape
mt rewind     Goes to the beginning of the tape (BOT)
mt asf n      Same as rewind followed by fsf n
mt offline    Rewinds the tape and ejects the cartridge
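To make the positioning semantics in the table concrete, here is a toy Python model. The `ToyTape` class is hypothetical and not a driver interface. It tracks only the file number and the block offset, and it encodes my reading of the mt-st semantics:

- fsf lands on the first block of the next file.
- bsfm n crosses n file marks backwards and stays on the forward side of the last one.
- asf n is a rewind followed by fsf n.

```python
from __future__ import annotations


class ToyTape:
    """Toy model of a tape with n_files files, each followed by a file mark.

    The position is (file, block). file == n_files means "after the last
    file mark", i.e., where the next file would be appended.
    """

    def __init__(self, n_files: int):
        self.n_files = n_files
        self.file = 0
        self.block = 0

    def status(self) -> tuple[int, int]:
        return (self.file, self.block)

    def read_blocks(self, k: int) -> None:
        self.block += k  # simplification: no end-of-file checks

    def rewind(self) -> None:
        self.file, self.block = 0, 0

    def fsf(self, n: int) -> None:
        # skip forward over n file marks, land on the first block of the next file
        if self.file + n > self.n_files:
            raise ValueError("fsf past the end of recorded data")
        self.file, self.block = self.file + n, 0

    def bsfm(self, n: int) -> None:
        # skip backward over n file marks, then stay on the forward side of
        # the last one: the start of file (current - (n - 1))
        if n > self.file:
            raise ValueError("bsfm would run into the beginning of the tape")
        self.file, self.block = self.file - (n - 1), 0

    def asf(self, n: int) -> None:
        self.rewind()
        self.fsf(n)
```

In this model, `bsfm 1` in the middle of a file returns to the start of that same file. This matches the "goes back n-1 files" description in the table.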

If you want to know more about the layout of tapes, I recommend reading the etutorial’s chapter on tape drives.

Read and write programs

Everything is a file, and so are the files on a tape. If the tape is positioned at the beginning of a file or at the end of the recorded data, there are several programs to read a file from or write a file to the tape. Surprisingly enough, cat and bash's standard input and output redirection work with tape drives. If the tape is positioned at the beginning of a file, sha256sum < /dev/nst0 computes the checksum of that file. However, because of buffer sizes (see the section on buffer sizes below), I do not recommend this method. There are better alternatives.
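The same checksum can be computed in Python with an explicit, large read buffer. This sketch assumes the Linux st driver convention that a read() returning zero bytes signals a file mark. The 1 MiB buffer is an assumption: it must be at least as large as the block size used when the file was written. The function name is hypothetical.

```python
import hashlib
import os

BUF_SIZE = 1024 * 1024  # assumption: >= the block size used for writing


def sha256_of_tape_file(fd: int, bufsize: int = BUF_SIZE) -> str:
    """Hash everything read from fd until read() returns zero bytes.

    On a Linux non-rewind tape device, a zero-byte read signals a file mark.
    On a regular file, it signals the end of the file.
    """
    h = hashlib.sha256()
    while True:
        chunk = os.read(fd, bufsize)  # at most one tape block per call
        if not chunk:
            break
        h.update(chunk)
    return h.hexdigest()
```

Usage: `fd = os.open("/dev/nst0", os.O_RDONLY)`, then `sha256_of_tape_file(fd)`, then `os.close(fd)`.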

tar

For me, tar was always a tool to create a tar archive of files. Over the last few weeks, I learned that tar actually stands for tape archive. Tar seems to be the natural tool to copy files to tape.

(Image: xkcd comic about tar. CC BY-NC 2.5, Randall Munroe, xkcd.com)

(Super quick tar cheat sheet for xkcd-like situations: tar -cf ARCHIVE FILES to create an archive, tar -xf ARCHIVE -C DEST to extract an archive into DEST. Throw in -z (gzip) or -j (bzip2) for compression.)

Indeed, tar can be used to write files to tape. It is surprisingly simple.

$ tar -cf /dev/nst0 FILES-TO-BE-ARCHIVED

However, the write speed using the default options is staggeringly slow. The reason is tar's small default write size; I will say more about buffer sizes in a later section. The write chunk size of tar can be increased with the blocking factor option -b. The tar block size is 512 bytes, and the blocking factor is a multiplier for it. GNU tar's default of 20 yields 10 KiB records, whereas -b 2048 leads to write chunks of 1 MiB (2048 × 512 bytes). However, tar pads the archive to a whole number of records. An archive created with -b 2048 is therefore not byte-identical to one created with the default settings, which matters if you intend to compare checksums.
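The checksum issue follows from how tar pads its output. The following sketch estimates the size of a plain ustar archive for a given blocking factor. It assumes one 512-byte header per regular file and no extended (PAX) headers. The function name is hypothetical.

```python
BLOCK = 512  # tar's fixed header/data block size in bytes


def tar_archive_size(file_sizes, blocking_factor=20):
    """Estimate the size of a plain ustar archive of regular files."""
    record = blocking_factor * BLOCK
    size = 0
    for n in file_sizes:
        # one header block plus the data rounded up to whole 512-byte blocks
        size += BLOCK + -(-n // BLOCK) * BLOCK
    size += 2 * BLOCK  # end-of-archive marker: two zero-filled blocks
    # tar pads the archive to a whole number of records
    return -(-size // record) * record
```

For a single 1,000,000-byte file, the default blocking factor of 20 yields 1,003,520 bytes, while -b 2048 yields 1,048,576 bytes. The two byte streams, and therefore their checksums, differ even though both archives hold the same file.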

dd

The tool dd is another classic command-line tool from the Linux toolbox. If you browse articles about dd, there seem to be two groups: those who recommend using dd for a given task, and those who say nobody should ever use dd. A common misconception about dd concerns the count= argument. The tool performs count-many read operations of bs= bytes each. If a read returns fewer bytes than requested, dd performs a partial read; this can happen, for example, when the input is a pipe or a device with a smaller block size. Partial reads can be padded (thus inserting spaces or null bytes into the binary stream!) and are counted as complete reads for the count= argument. However, if you are aware of these issues, I think dd is a convenient tool to copy data to tape. In short:

  • Do not use options that pad partial reads (such as conv=sync) unless you know what you are doing, and
  • if you use count=, also use iflag=fullblock and bs=.

(The situation is more complicated if you try to clone a damaged disk which produces read errors; however, this is a special case.)
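The following Python sketch shows what iflag=fullblock does conceptually: keep calling read() until a full bs-sized block is collected or the input ends. The function name `read_fullblock` is hypothetical; this is an illustration, not dd's source code.

```python
import os


def read_fullblock(fd: int, bs: int) -> bytes:
    """Collect up to bs bytes, retrying partial reads until EOF."""
    parts = []
    remaining = bs
    while remaining > 0:
        chunk = os.read(fd, remaining)  # may return fewer bytes, e.g., from a pipe
        if not chunk:  # EOF: return whatever was collected (possibly b"")
            break
        parts.append(chunk)
        remaining -= len(chunk)
    return b"".join(parts)
```

Without this loop, a short read from a pipe would count as a full block for count=. When writing to tape, it would also produce a shorter tape block.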

To copy a file (including a tar archive), you can use

dd if=YourFile.tar of=/dev/nst0 bs=1M status=progress
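For illustration, here is a rough Python equivalent of the dd command above, without progress output. The function name `copy_in_blocks` is hypothetical. Unbuffered file objects ensure that every block reaches the kernel in a single write() call, which on a tape should correspond to one tape block.

```python
def copy_in_blocks(src_path: str, dst_path: str, bs: int = 1024 * 1024) -> int:
    """Copy src to dst with one write() per bs-sized block; return the block count."""
    blocks = 0
    # buffering=0 gives raw file objects, so each .write() is one write() call
    with open(src_path, "rb", buffering=0) as src, open(dst_path, "wb", buffering=0) as dst:
        while True:
            block = src.read(bs)  # a regular file returns full blocks except the last
            if not block:
                break
            if dst.write(block) != len(block):
                raise OSError("short write: block was not written completely")
            blocks += 1
    return blocks
```

Usage: `copy_in_blocks("YourFile.tar", "/dev/nst0")`.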

SIT Archive Tool

For my backup strategy, I wanted to automate the full chain of the following steps:

  1. Create a tar archive of input files,
  2. Compress the archive using multiple cores (pigz),
  3. Symmetrically encrypt the compressed archive using GPG,
  4. Write the encrypted archive to tape, and
  5. Compute the SHA256 checksum of the archive on the fly.

The SIT Archive Tool was specifically designed to perform these steps. The steps are chained via pipes, so no intermediate files are written, which saves disk space. The final step uses dd to copy the archive to the tape.
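The chain can be sketched in Python as follows. This is not how the SIT Archive Tool is implemented. The sketch makes three substitutions:

- It swaps pigz for Python's single-threaded gzip, using tarfile's streaming mode (w|gz).
- It omits the GPG encryption step entirely.
- It replaces dd with a hypothetical `BlockWriter` that emits fixed 1 MiB blocks while updating the SHA256 checksum on the fly.

```python
import hashlib
import os
import tarfile

BLOCK_SIZE = 1024 * 1024  # one write() per 1 MiB block, like dd bs=1M


class BlockWriter:
    """Hypothetical file-like sink: hashes all data and writes fixed-size blocks."""

    def __init__(self, fd: int, block_size: int = BLOCK_SIZE):
        self.fd = fd
        self.block_size = block_size
        self.buf = bytearray()
        self.sha256 = hashlib.sha256()

    def _emit(self, block: bytes) -> None:
        if os.write(self.fd, block) != len(block):
            raise OSError("short write: block was not written completely")

    def write(self, data) -> int:
        self.sha256.update(data)  # checksum of exactly the bytes sent to the device
        self.buf += data
        while len(self.buf) >= self.block_size:
            self._emit(bytes(self.buf[: self.block_size]))
            del self.buf[: self.block_size]
        return len(data)

    def flush_last_block(self) -> None:
        if self.buf:  # the final block may be shorter than block_size
            self._emit(bytes(self.buf))
            self.buf.clear()


def archive_to_device(src_dir: str, device: str) -> str:
    """Stream a gzip-compressed tar of src_dir to device; return its SHA256."""
    # O_CREAT only matters when device is a regular file (e.g., for testing)
    fd = os.open(device, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        sink = BlockWriter(fd)
        # mode "w|gz" produces a compressed stream without ever seeking
        with tarfile.open(fileobj=sink, mode="w|gz") as tar:
            tar.add(src_dir, arcname=os.path.basename(src_dir))
        sink.flush_last_block()
    finally:
        os.close(fd)
    return sink.sha256.hexdigest()
```

Usage: `archive_to_device("/path/to/InputDirectory", "/dev/nst0")` returns the hex digest of exactly the bytes that went to the device.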

To run the full chain on a directory, run

sat /path/to/InputDirectory -o tape /dev/nst0

Incredible read and write speeds

According to the datasheet of my tape drive, the native read and write speed of the drive is 300 MB/s. Without any tuning, this is too fast for my setup if I stream from a spinning HDD or transfer the data across my local network. Luckily, the drive can adapt to the input speed and slow down to 80 MB/s, which is comparable to old spinning HDDs. The effective write speed can be much higher if the data is compressible, because the tape drive tries to compress the data on the fly. If your data can be compressed by a factor of 2.5, the transfer rate of data to the drive can range from 200 MB/s to 750 MB/s. For already compressed or encrypted data, you should expect the native speeds.
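These numbers boil down to simple arithmetic. The sketch below uses the datasheet figures quoted above in decimal units (1 TB = 10^6 MB). It assumes the compression ratio applies uniformly to the whole stream. The function names are hypothetical.

```python
from __future__ import annotations

NATIVE_MAX_MB_S = 300.0  # native speed from the datasheet
NATIVE_MIN_MB_S = 80.0   # slowest speed the drive can match


def host_rate_range(compression_ratio: float) -> tuple[float, float]:
    """Range of host-side transfer rates (MB/s) for a given compression ratio."""
    return (NATIVE_MIN_MB_S * compression_ratio, NATIVE_MAX_MB_S * compression_ratio)


def hours_to_fill(capacity_tb: float, native_rate_mb_s: float) -> float:
    """Hours needed to write capacity_tb of on-tape (post-compression) data."""
    return capacity_tb * 1e6 / native_rate_mb_s / 3600
```

A compression ratio of 2.5 gives the 200 to 750 MB/s range mentioned above. At the native 300 MB/s, filling a 6 TB tape takes roughly 5.6 hours; at 80 MB/s, it takes about 20.8 hours.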

So what happens when your system cannot provide the data fast enough? The tape drive has a 1 GB internal buffer and tries to write as much as possible in one go. If the buffer runs empty, the tape stops, rewinds a bit, repositions at the end of the last write operation, and waits for the buffer to fill again. This repositioning is aptly named shoe-shining, and it happens quite frequently if the data stream is slow. Shoe-shining reduces the lifetime of both the tape and the tape drive.
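Here is a deliberately simplified toy model of the buffer behavior. It rests on three assumptions:

- The drive drains its 1 GB buffer at no less than its minimum speed of 80 MB/s.
- The host refills the buffer at a constant rate.
- The data is incompressible.

Under these assumptions, the time between buffer underruns is the buffer size divided by the deficit. The model ignores the drive's real speed-matching logic, and the function name is hypothetical.

```python
import math

BUFFER_MB = 1000.0     # ~1 GB internal buffer (decimal units assumed)
MIN_DRIVE_MB_S = 80.0  # slowest speed the drive can match


def seconds_until_underrun(input_mb_s: float, buffer_mb: float = BUFFER_MB,
                           min_drive_mb_s: float = MIN_DRIVE_MB_S) -> float:
    """Seconds a full buffer lasts before the drive has to stop and reposition."""
    if input_mb_s >= min_drive_mb_s:
        return math.inf  # the drive can slow down to the input rate: no underrun
    # the buffer drains at the difference between drive speed and input rate
    return buffer_mb / (min_drive_mb_s - input_mb_s)
```

At an input rate of 50 MB/s, for example, the buffer would run dry after about 33 seconds in this model.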

How can you tune your system in order to keep up with the tape and provide a high enough throughput?

Buffer sizes

One way to improve the data throughput is to increase the write buffer size of the tool you use to write to the tape. Changing the buffer size of dd from 1 byte to 1 MB can change the speed by several orders of magnitude.
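To get a feeling for the per-call overhead, the following Python sketch writes the same amount of data with different block sizes. The function `write_throughput` is hypothetical. Pointed at os.devnull, it only measures system-call overhead, not tape behavior. Pointing it at a tape would also be misleading, because the zero-filled blocks compress perfectly.

```python
import os
import time


def write_throughput(path: str, total_bytes: int, block_size: int) -> float:
    """Write total_bytes to path in block_size chunks and return MB/s."""
    block = bytes(block_size)  # zero-filled: fine for /dev/null, NOT for a tape
    n_blocks = total_bytes // block_size
    fd = os.open(path, os.O_WRONLY)
    try:
        start = time.perf_counter()
        for _ in range(n_blocks):
            os.write(fd, block)  # one system call per block
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return n_blocks * block_size / max(elapsed, 1e-9) / 1e6
```

Usage: `for bs in (512, 64 * 1024, 1024 * 1024): print(bs, write_throughput(os.devnull, 256 << 20, bs))`.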

I found it quite challenging to find reliable information about the data structure of LTO-7 tapes. My current understanding is based on various sources and on small experiments I carried out myself. Data is written in blocks. Every block adds a bit of overhead, including checksums and compression flags. mt status indicates that the block size is not fixed (variable-block mode). I assume that the block size is identical to the size passed to write(), which is consistent with the Linux st driver documentation for variable-block mode. Therefore, small blocks lead to more overhead and thus lower write speeds.

There seems to be an upper limit on the block size somewhere above 1 MB. If the block size is too large, dd fails with an "Invalid argument" error.

In my experience, the block size also affects the read speed. Every read() call retrieves at most a single block. A tape written with small blocks can therefore only be read back in small chunks, which leads to low read speeds.

The reading tool needs a buffer size at least as large as the block size. If the buffer size of dd is too small during reading, dd fails with a "Cannot allocate memory" (ENOMEM) error.
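When reading with a tool of your own, it helps to turn that error into something actionable. The sketch below assumes the Linux st driver behavior of failing a read() with ENOMEM when the next block is larger than the requested byte count. The function name `read_block` and the 1 MiB default are hypothetical.

```python
import errno
import os


def read_block(fd: int, bufsize: int = 1024 * 1024) -> bytes:
    """Read at most one tape block, turning ENOMEM into an actionable error."""
    try:
        return os.read(fd, bufsize)
    except OSError as exc:
        if exc.errno == errno.ENOMEM:
            raise OSError(
                errno.ENOMEM,
                f"tape block is larger than the {bufsize}-byte read buffer; "
                "use a buffer at least as large as the block size used for writing",
            ) from exc
        raise
```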

As mentioned earlier, if the system is not able to sustain the data throughput, the tape does shoe-shining. How do you find the optimal buffer size? If you compress or encrypt on the fly, i.e., pipe your input through pigz or gpg, you might want to test the throughput of different (compression) algorithms and buffer sizes. For this, you need an (incompressible) source stream with a high output rate, so that you do not end up measuring the speed of the source itself. On a Linux system, there are several options to consider.

  • A regular file on a (spinning) hard disk drive will most likely be (part of) the bottleneck.
  • /dev/zero is incredibly fast but also perfectly compressible.
  • /dev/urandom is incompressible but not fast enough.

I have developed the Pipe Source tool, which provides a fast and incompressible output stream to measure the throughput of external tools in order to optimize their buffer size settings. For example, to measure the throughput of symmetric GPG encryption, run

pipesrc | gpg -c > /dev/null
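The idea behind such a source can be sketched in Python. This is not how pipesrc is implemented; it is one possible approach under stated assumptions. It generates a pool of random bytes once, then repeatedly streams slices of it into the command under test and times how long the command takes to consume them. The function `measure_pipe_throughput` is hypothetical.

```python
from __future__ import annotations

import os
import subprocess
import time


def measure_pipe_throughput(cmd: list[str], total_bytes: int = 256 << 20,
                            chunk_size: int = 1 << 20) -> float:
    """Stream total_bytes of random data into cmd's stdin and return MB/s."""
    # Generate the random pool once, before timing starts, so the source is
    # not the bottleneck. Assumption: 16 MiB is far beyond the 32 KiB window
    # of gzip/pigz, so the repetition is invisible to them (long-range
    # compressors could still exploit it).
    pool = memoryview(os.urandom(16 << 20))
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.DEVNULL)
    start = time.perf_counter()
    sent = offset = 0
    while sent < total_bytes:
        n = min(chunk_size, total_bytes - sent)
        if offset + n > len(pool):
            offset = 0  # wrap around to the start of the pool
        proc.stdin.write(pool[offset:offset + n])
        offset += n
        sent += n
    proc.stdin.close()  # EOF lets the command finish
    if proc.wait() != 0:
        raise RuntimeError(f"{cmd[0]} exited with status {proc.returncode}")
    elapsed = time.perf_counter() - start
    return sent / max(elapsed, 1e-9) / 1e6
```

For example, `measure_pipe_throughput(["pigz", "-p", "4", "-c"])` estimates how fast pigz with four threads can compress incompressible input.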

Encrypted tapes

LTO-7 drives support on-the-fly AES encryption. To enable this feature and manage encryption keys, I found the command-line tool stenc very useful. It compiles smoothly on CentOS if you run touch NEWS README before compilation. The tool can set the encryption/decryption key and disable encryption again. Please note that the drive usually keeps the keys until it is power cycled.

To generate a new 256-bit encryption key (LTO drives use AES-256), run

stenc -g 256 -k my_key.hex

To enable encryption, run

stenc -f /dev/nst0 -e on -k my_key.hex

To disable encryption again, run

stenc -f /dev/nst0 -e off

Drive Temperature

I noticed that during normal operation, the half-height internal tape drive heats up quite significantly. To counter this, I installed special fans that create a constant stream of air below and above the drive, which prevents overheating of the drive and potentially the tape.