Full and Incremental Backups Using Tar Linux Command


In this article we will discuss using the tar command to perform full and incremental backups of files and directories on Linux systems. Tar is a very useful tool for backing up files, directories, and even full systems. Here we will use it only to back up data (files and directories) in the following scenario: a full backup every month, on day one of the month, and a daily incremental backup until the end of the month. We will use cron jobs to schedule (automate) the backup process and to auto-remove expired backups, that is, backups that have exceeded the retention period.

This article includes a shell script that performs both types of backup and a shell script that tracks and removes expired backups. I’ll explain the idea behind each script and go step by step through automating the whole process.

Requirements:

  • In general, you need root privileges to back up most of the systems you will work with.
  • You must have enough storage space to save your full and incremental backups.
  • You should have some experience with the tar command; reading our Sysadmins Most Used Tar Command Examples in Linux article is enough.
  • It is better to have a little experience in writing shell scripts. If not, never mind, you can use my scripts with small modifications.
  • It is better to have a little experience in scheduling tasks with cron jobs. If not, never mind, I’ll show you how to schedule a cron job.

After reading this four-part article, you will have the experience you need to protect your data, the first task sysadmins must master.
So, let’s start.

Part 1: Performing Full and Incremental Backups Using the Tar Linux Command.

There are three kinds of backups: the simplest is a ‘full’ or level 0 backup, then comes the differential backup, and finally the incremental backup. In this article I’ll perform only full and incremental backups of my data.
A full backup is a backup of all files and directories. In general, it takes longer to finish than an incremental backup and needs much more storage.
An incremental backup, on the other hand, contains only the files and directories that were modified, changed, or added since the last backup (either full or incremental), and it marks the files and directories that were deleted. It takes less time and needs less storage space.

Here’s the backup scenario I’m using in my work:

  1. Take a monthly full backup on day one of each month.
  2. Take a daily incremental backup from day 2 of each month to the last day of the month.
  3. After a certain period, remove the old full and incremental backups to save storage space. This period is known as the retention period.
Hint:
 1. The above scenario is totally based on including a time-stamp in the names of the backup files and their snapshot files. This time-stamp is very important: it speeds up removing the old backups and makes the restore process easier.

Taking a Full and Incremental Backup Using Tar command.

Here’s the tar command I use to take a full and incremental backup of my data. I’ll back up an image directory called uploads, which is updated daily with users’ photos and has the full path “/var/discourse/shared/standalone/uploads/”. I need two directories for the results: I’ll save my backup files in “/Backups/uploads/” and my snapshot file in “/Backups/snap_files/”. We include a time-stamp in the name of the compressed backup file, “images_uploads-`date +%Y-%m-%d`.tar.gz”, and in the name of the snapshot file, “images_uploads-`date +%Y-%m`.snap”. Notice the difference between the two time-stamps: the backup file name changes every day, while the snapshot file name changes only once a month. To perform a full and daily incremental backup of the uploads directory, I run the following command:

 # tar -cvzpf /Backups/uploads/images_uploads-`date +%Y-%m-%d`.tar.gz -g /Backups/snap_files/images_uploads-`date +%Y-%m`.snap /var/discourse/shared/standalone/uploads/

Let’s discuss each option we used in the above command.

c –> Creates a new .tar archive file.
v –> Verbosely lists the files as they are processed.
f –> Specifies the file name of the archive file.
z –> Compresses the archive file with gzip.
p or --preserve-permissions, --same-permissions –> Preserves file permission information when extracting (the default for the superuser), so there is no need to add this option explicitly if you are the root user.
g or --listed-incremental=FILE –> Handles incremental backups using the snapshot data in FILE.

What exactly does the above tar command do?

The above command performs a full backup the first time it runs, then performs an incremental backup every time it runs after that, until the last day of the current month. On the first day of the next month it performs a full backup again, followed by incremental backups until the end of that month, and so on. Of course, you need to schedule this task to run daily using cron jobs; later in this article I’ll explain the cron job we use.

Why does the above tar command behave this way?

This behavior comes from the “-g” option. By default, tar performs a full backup every time it runs. Because we added the “-g” option, it performs a full backup on its first run and creates the snapshot file “images_uploads-`date +%Y-%m`.snap” named by the “-g” option, in which it records the state of the directories it archived. The second time it runs, it detects the existing snapshot file, so it does not create a new one; it reads this file and updates it. With the help of the existing snapshot file, it backs up only the modified, changed, or added files and marks the deleted files in the incremental archive, and it does this every time it runs.

On the first day of the next month, the tar command will not find a snapshot file for the current month, because the time-stamp in the snapshot file name has changed to the new month. So tar performs a full backup and creates a new snapshot file for this month, and so on…
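To see this behavior for yourself without touching real data, here is a small sketch that runs the same kind of command twice in a throwaway directory. The directory layout and the file names (data, old.txt, new.txt, demo.snap) are made up for this example, and it assumes GNU tar.

```shell
#!/bin/bash
# Demonstration in a scratch directory; none of the article's real paths are touched.
work=$(mktemp -d)                  # hypothetical scratch area for this example
mkdir "$work/data" "$work/backups"
echo "first" > "$work/data/old.txt"

# Run 1: the snapshot file does not exist yet, so tar archives everything.
tar -czpf "$work/backups/run1.tar.gz" -g "$work/backups/demo.snap" -C "$work" data

sleep 1                            # make sure the next file is clearly newer than run 1
echo "second" > "$work/data/new.txt"

# Run 2: the snapshot file exists, so only new or changed files are archived.
tar -czpf "$work/backups/run2.tar.gz" -g "$work/backups/demo.snap" -C "$work" data

run1_list=$(tar -tzf "$work/backups/run1.tar.gz")
run2_list=$(tar -tzf "$work/backups/run2.tar.gz")
echo "Run 1 contains:"; echo "$run1_list"
echo "Run 2 contains:"; echo "$run2_list"
```

The second listing should show the directory entry and new.txt but not old.txt, because old.txt has not changed since the first run.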

Examples of Naming the Full and Incremental Backups and the Snapshot Files.

Let’s say we are in August and run the above command today (at the time of writing this post, August 4, 2016). The first backup will be a full backup named “images_uploads-2016-08-04.tar.gz”, and tar will create a snapshot file named “images_uploads-2016-08.snap”. The next time the same tar command runs in August, it will find the existing snapshot file, so it will perform an incremental backup named “images_uploads-2016-08-05.tar.gz”, and so on until the end of August.
Now, suppose it is September 01, 2016. The above tar command will not find a snapshot named “images_uploads-2016-09.snap”, so it will create it and do a full backup. Every other day it will do an incremental backup until the end of September, and it will repeat this process forever (assuming the above tar command runs daily in a cron job).

Now you understand the idea of my backup scenario. The next part discusses how to schedule (automate) the above process.
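The naming rule can be checked for any date with a short sketch. The helper name backup_names is hypothetical and exists only for this illustration; it assumes GNU date, whose -d option formats a date other than today.

```shell
#!/bin/bash
# Hypothetical helper: print the archive name and the snapshot name used on a given date.
# $1 = date (YYYY-MM-DD), $2 = optional name prefix (defaults to the article's images_uploads).
backup_names() {
    echo "${2:-images_uploads}-$(date -d "$1" +%Y-%m-%d).tar.gz"   # changes every day
    echo "${2:-images_uploads}-$(date -d "$1" +%Y-%m).snap"        # changes once a month
}

backup_names 2016-08-04
backup_names 2016-08-05
backup_names 2016-09-01
```

The two August dates share the snapshot name images_uploads-2016-08.snap, while September 01 gets a new one, which is what triggers the new full backup.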

Part 2: Writing the Shell Backup Script and Adding It to a Cron Job to Automate the Backup Process.

The above tar command must run daily to perform the full and incremental backups. Of course you will not run it manually; you need to schedule it to run at a certain time every day using cron jobs.

Before we create the cron job, we will put the above command in a shell script and make it a general script. I’ll use variables to set the locations of the backup files and the snapshot files, and the directory you want to back up.
Here’s our general tar backup shell script:

#!/bin/bash

#### The Fixed Variables I Use ####

Backup_file_timestamp=$(date +%Y-%m-%d)          ### This is the time-stamp used in naming the backup files. This variable is fixed ###
Snapshot_file_timestamp=$(date +%Y-%m)           ### This is the time-stamp used in naming the snapshot files. This variable is fixed ###

#### You can set the value of the following variables with names of backups files,   ####
#### and the storage place you will save your backups files and snapshots files into ####

Backup_Snapshot_file_name=images_uploads                           ### One variable names both the backup files and the snapshots. You can change its value ###
Backups_Destination=/Backups/uploads                               ### Your backups storage location ###
Snapshots_Destination=/Backups/snap_files                          ### Your snapshots storage location ###
Data_to_be_Backed_up=/var/discourse/shared/standalone/uploads      ### The important data directory you need to back up ###

##### Here comes the tar command that performs the backup. This part is fixed; there is no value to change here ####
##### The variables are quoted so that paths containing spaces still work ####

tar -cvzpf "$Backups_Destination/$Backup_Snapshot_file_name-$Backup_file_timestamp.tar.gz" -g "$Snapshots_Destination/$Backup_Snapshot_file_name-$Snapshot_file_timestamp.snap" "$Data_to_be_Backed_up"

Now, save the above script with the name “Server-Backup_V1.sh” in your administration scripts directory (if you have one). As shown in the comments in the above shell script, you need to change the values of only four variables: “Backup_Snapshot_file_name, Backups_Destination, Snapshots_Destination, and Data_to_be_Backed_up”. Set them to the values used in your backup infrastructure. The directory values must be absolute paths.
Now it’s time to use cron jobs. We will create a cron job that runs daily at 12 AM (midnight) to execute the Server-Backup_V1.sh backup script.

In your terminal, run the following command:

# crontab -e

And append the following cron job to the end of the existing jobs, as follows:

00 00 * * * /bin/bash /home/mohammed.semari/Semari-Scripts/Server-Backup_V1.sh

Save, then exit. Now your cron job will run daily at 12 AM (midnight) to execute your backup script located at “/home/mohammed.semari/Semari-Scripts/Server-Backup_V1.sh”.

Hints:
 1. You must create the directories your backup system needs. In our case you need two directories to save the backup and snapshot files, and one directory to save your backup script "Server-Backup_V1.sh".
 2. Always use absolute paths for files and directories in your administration tasks. We use absolute paths in the cron jobs and for the backup locations in the above script.
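The first hint can be handled with a few lines run once before the first backup. This is only a sketch: BACKUP_ROOT is a hypothetical variable added for this example. It defaults to a scratch directory so the snippet can be tried without root; set it to /Backups to match the layout used in this article.

```shell
#!/bin/bash
# Sketch: create the backup and snapshot directories before the first run.
# BACKUP_ROOT is an assumption of this example, not part of the article's scripts.
BACKUP_ROOT=${BACKUP_ROOT:-$(mktemp -d)}

Backups_Destination=$BACKUP_ROOT/uploads
Snapshots_Destination=$BACKUP_ROOT/snap_files

# -p creates missing parent directories and does not fail if they already exist.
mkdir -p "$Backups_Destination" "$Snapshots_Destination"

# Stop early if either directory is missing or not writable.
for dir in "$Backups_Destination" "$Snapshots_Destination"; do
    if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
        echo "Cannot write to $dir" >&2
        exit 1
    fi
done
echo "Backup directories are ready under $BACKUP_ROOT"
```

To create the real directories, run it as root with BACKUP_ROOT=/Backups set in the environment.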

Part 3: Removing Old / Expired Backups.

How long you keep old backup files in your storage is known as “the backup retention period”; backups older than that are expired and can be removed. It’s important to set the retention period carefully. The ideal case is to make it as long as you can, but this needs a lot of disk space. Feel free to set this value as you wish on your systems.

I’ll write a shell script to remove the backup files and snapshot files that are older than one month. I’ll use the same variable names as in the backup script, because at the end of this article I’ll merge the two scripts into one. For now I want to show you the ideas behind this retention script.

Here’s our backup retention shell script:

#!/bin/bash

#### You can set the value of the following variables with names of the storage place you will save your backups files and snapshots files into ####

Backup_Snapshot_file_name=images_uploads       ### One variable names both the backup files and the snapshots. You can change its value ###
Backups_Destination=/Backups/uploads           ### Your backups storage location ###
Snapshots_Destination=/Backups/snap_files      ### Your snapshots storage location ###
Retention_period=3                             ### 3 always keeps the backups of one full previous month. This value must be >= 3 and <= 12, because only the month number is matched ###

##### Here comes the code that removes the old backups. This part is fixed; there is no value to change here ####

NOW=$(date +%m)
### Month to remove: (Retention_period - 1) months before the current month.                          ###
### 10# forces base 10 so that 08 and 09 are not read as octal numbers.                               ###
### The modulo wraps January and February back into the previous year without relying on array tricks ###
del=$(printf '%02d' $(( (10#$NOW - Retention_period + 12) % 12 + 1 )))
rm -f -- "$Backups_Destination/$Backup_Snapshot_file_name"-????-"$del"-??.tar.gz
rm -f -- "$Snapshots_Destination/$Backup_Snapshot_file_name"-????-"$del".snap

Now, save the above script with the name “AutoRetention_V1.sh” in your administration scripts directory (if you have one). If you use it on its own, you need to run it monthly with a cron job, just as we did with the backup script. As for me, I’ll merge both scripts into one and run the new script daily at 12 AM (midnight).

Hint:
1. If you use AutoRetention_V1.sh as a separate script, it's better to run it monthly. Add the following cron job to your existing cron jobs, and of course change the file path to the location where you saved your file.

Here’s the cron job in this case:

00 00 1 * * /bin/bash /home/mohammed.semari/Semari-Scripts/AutoRetention_V1.sh

What is the idea behind the retention script we created?

A good question. Here’s the idea: we simply need to remove the old backups that were created more than a specific period ago (in our case, one month). Because we take a full backup only once, at the start of each month, and incremental backups until the end of the month, we cannot remove old backups file by file. If we removed only the full backup taken on day one of the month, the remaining incremental backups would be worth nothing whenever the files we want to restore had not changed since the full backup. So, we have to remove old backups month by month.
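The dependency between a month's archives is easiest to see during a restore. The sketch below is an illustration in a scratch directory, not part of the article's scripts; the file names are made up, and it assumes GNU tar, whose manual describes extracting incremental archives with --listed-incremental=/dev/null (here -g /dev/null), starting with the full archive and then each incremental in order.

```shell
#!/bin/bash
# Scratch demonstration: a restore needs the full archive first, then every incremental in order.
work=$(mktemp -d)
mkdir "$work/data" "$work/backups" "$work/restore"
echo "kept"    > "$work/data/kept.txt"
echo "removed" > "$work/data/removed.txt"

# Day 1: full backup (no snapshot file exists yet).
tar -czpf "$work/backups/day1.tar.gz" -g "$work/backups/demo.snap" -C "$work" data

sleep 1
rm "$work/data/removed.txt"              # deleted after the full backup
echo "added" > "$work/data/added.txt"    # added after the full backup

# Day 2: incremental backup. kept.txt is unchanged, so it is stored only in day1.tar.gz.
tar -czpf "$work/backups/day2.tar.gz" -g "$work/backups/demo.snap" -C "$work" data

# Restore in chronological order. -g /dev/null makes tar treat the archives as
# incremental without needing the original snapshot file.
for archive in "$work/backups/day1.tar.gz" "$work/backups/day2.tar.gz"; do
    tar -xzpf "$archive" -g /dev/null -C "$work/restore"
done
ls "$work/restore/data"
```

After both steps the restored directory should hold kept.txt and added.txt but not removed.txt. Without day1.tar.gz there would be no copy of kept.txt at all, which is why a month's archives are removed together.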

In the above script, I set the value of “Retention_period” to 3. This does the following:
At the start of each month (day one), the “AutoRetention_V1.sh” script removes the backups of the month before the previous month. For example, suppose it is August 01 and the script runs at 12 AM (midnight): it removes the backup files and the snapshot file of June. At that point we still have the backups and snapshot of July in our storage. The backup script then takes its daily backups until the end of August. On September 01, the “AutoRetention_V1.sh” script removes the backup files and snapshot file of July, leaving the backups and snapshots of August in our storage, and so on.

Hint:
 1. Always set "Retention_period" to the number of previous months of backups you want to keep, plus two. For example, if you need to keep the backups of the previous 3 months, set "Retention_period" to 5 in the above script.
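To check which month a given "Retention_period" removes before trusting it with real backups, the month calculation can be tried on its own. The helper name month_to_remove is hypothetical; it is a sketch of the same rule the retention script follows (the removed month is Retention_period minus one months before the current month), written with a modulo so the wrap-around in January and February is explicit.

```shell
#!/bin/bash
# Hypothetical helper mirroring the retention script's month selection.
# $1 = current month (01-12), $2 = Retention_period; prints the two-digit month to remove.
month_to_remove() {
    now=${1#0}          # strip a leading zero so 08 and 09 are not read as octal numbers
    period=$2
    # Months are counted 1..12; the double modulo keeps the result positive when
    # the subtraction crosses into the previous year.
    printf '%02d\n' $(( ((now - period) % 12 + 12) % 12 + 1 ))
}

month_to_remove 08 3    # prints 06: in August, June is removed and July is kept
month_to_remove 09 3    # prints 07
month_to_remove 01 3    # prints 11: in January, November of the previous year is removed
month_to_remove 08 5    # prints 04: May, June and July are kept
```

Because only the month number is matched, the value has to stay at 12 or below; a larger value would wrap around and select a month that is still inside the period you want to keep.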

Part 4: Merging the Backup Script and the Retention Period Script Into One.

As the above two scripts use the same variables, it’s better to merge them into one script. We will name the new script “Full_Backup_Systems_V1.sh” and run it daily at 12 AM (midnight). I’ll leave out most of the comments, as they already appear in the two scripts above.

Here’s our final backup and retention period script:

#!/bin/bash

echo -e "\e[1;34m=============================================== \e[0m"
echo -e "\e[1;34mThis script creates full and incremental backups, and removes old backups that exceed the retention period. \e[0m"
echo -e "\e[1;34mThis Script Created By: \e[0m"
echo -e "\e[1;34mMimastech.com Engineers \e[0m"
echo -e "\e[1;34mFeel Free To Use This Script \e[0m"
echo -e "\e[1;34mRegards \e[0m"
echo -e "\e[1;34m=============================================== \e[0m"
echo " "

      
### Part 1 Variables ####
#### The Fixed Value Variables I Use ####
     
Backup_file_timestamp=$(date +%Y-%m-%d)
Snapshot_file_timestamp=$(date +%Y-%m)

#### The Changeable Value Variables I Use ####

Backup_Snapshot_file_name=images_uploads
Backups_Destination=/Backups/uploads
Snapshots_Destination=/Backups/snap_files
Data_to_be_Backed_up=/var/discourse/shared/standalone/uploads
Retention_period=3

##### Part 2: The Full and Incremental Backups #####
        
tar -cvzpf "$Backups_Destination/$Backup_Snapshot_file_name-$Backup_file_timestamp.tar.gz" -g "$Snapshots_Destination/$Backup_Snapshot_file_name-$Snapshot_file_timestamp.snap" "$Data_to_be_Backed_up"

##### Part 3: Removing Old Backups ####
       
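NOW=$(date +%m)
### Month to remove: (Retention_period - 1) months before the current month.           ###
### 10# forces base 10 for 08 and 09; the modulo wraps January and February correctly. ###
del=$(printf '%02d' $(( (10#$NOW - Retention_period + 12) % 12 + 1 )))
rm -f -- "$Backups_Destination/$Backup_Snapshot_file_name"-????-"$del"-??.tar.gz
rm -f -- "$Snapshots_Destination/$Backup_Snapshot_file_name"-????-"$del".snap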

############ End Of the Script ###############

Now, save the above script with the name “Full_Backup_Systems_V1.sh” in your administration scripts directory, and use a cron job to run it daily at 12 AM (midnight).

Summary

In this article, we discussed how to back up your data with full and incremental backups using the tar Linux command. We created a full backup on the first day of every month and incremental backups on the other days until the end of the month, and we used cron jobs to schedule the backup process. You have two options for using our scripts: either run the backup script and the retention script separately, or use the full backup script created by merging the two. All you need to do is change the values of a few variables in the scripts.

I hope this article is good enough for you.
See you in other articles
