Bacula – copying a job; doing it better

I recently wrote about Bacula’s
Copy job option. In that solution,
I used an approach which copied all uncopied jobs from disk to tape. I’ve
since found a few annoyances with that initial approach and am now experimenting
with a new one. An added side effect: less wear on the tape library.

My initial approach used Selection Type = PoolUncopiedJobs. My
new approach, one I had originally considered and discarded, uses Selection Type = SQL Query
with three jobs, each working on a different pool. The results are much smoother and
faster.

Duplicate Job Control

By default, Bacula allows
duplicate jobs. For the purposes of identifying a duplicate job, Bacula goes by the job name.
Various directives allow fine tuning of how duplicates are handled. This comes into play with the solution
I have selected.

The approach I am using will select multiple jobs for copying. Thus, the potential for duplicates
is high, and almost guaranteed given the nature of backups. If the original jobs do not
allow duplicates and your selection method returns duplicate jobs, you will wind up with
cancellations, which means the next copy job will pick up the targets that were canceled.
This can become a never-ending cycle.

The goal of my jobs is to copy any and all jobs that meet my criteria, so I have removed all of my
duplicate job control. This may sound rash, but I may yet address the problem of duplicate jobs by
using Max Start Delay and/or Max Wait Time; see the sketch below.
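
For reference, here is a sketch of the duplicate-control directives as they might appear in a Job resource. This is not my configuration; the values are illustrative, and you should verify the exact semantics of each directive against the Bacula documentation for your version:

Job {
  Name = "SomeBackupJob"            # hypothetical job name
  # ... the usual Client, FileSet, Pool, etc. ...

  # refuse to queue a second job with the same name
  Allow Duplicate Jobs = no

  # when a duplicate does appear, cancel the queued one
  Cancel Queued Duplicates = yes

  # give up on jobs that wait too long to start or to get resources
  Max Start Delay = 6 hours
  Max Wait Time   = 12 hours
}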

The new copy job

This is my new copy job. It works on my file-based pool for incremental backups. Note the
following points:

  • Level = Incremental
  • Pool = IncrFile
  • P.Name = ‘IncrFile’

The goal of this job is to copy jobs from the IncrFile pool over to tape.

Job {
  Name     = "CopyToTape-Inc"
  Type     = Copy
  Level    = Incremental
  Pool     = IncrFile
  JobDefs  = "DefaultJobCopyDiskToTape"
  
  Schedule = Never
            
  Selection Type = SQL Query
  Selection Pattern = "
 SELECT DISTINCT J.JobId, J.StartTime
   FROM Job J, Pool P
   WHERE P.Name = 'IncrFile' 
     AND P.PoolId = J.PoolId
     AND J.Type = 'B' 
     AND J.JobStatus IN ('T','W')
     AND J.jobBytes > 0
     AND J.JobId NOT IN
         (SELECT PriorJobId 
            FROM Job
           WHERE Type IN ('B','C')
             AND Job.JobStatus IN ('T','W')
             AND PriorJobId != 0)
ORDER BY J.StartTime
LIMIT 1;
"
}

I added the LIMIT clause for testing purposes. My production job does not have that clause.
I wanted to see that this works with just one job to copy; walk first, run later.
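
Before removing the LIMIT clause, it can be useful to know how many jobs the query would select. Here is a count variant of the same query, a sketch derived from the selection pattern above rather than output from my system:

SELECT count(*)
  FROM Job J, Pool P
 WHERE P.Name = 'IncrFile'
   AND P.PoolId = J.PoolId
   AND J.Type = 'B'
   AND J.JobStatus IN ('T','W')
   AND J.jobBytes > 0
   AND J.JobId NOT IN
       (SELECT PriorJobId
          FROM Job
         WHERE Type IN ('B','C')
           AND Job.JobStatus IN ('T','W')
           AND PriorJobId != 0);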

Originally, that went to the Fulls tape pool

This section outlines a problem I encountered and how I solved it. It also shows you
the JobDefs I’m using.

When I initially ran this job, the result went to the Fulls tape pool, not to the Incrementals tape pool
as I expected:

02-Feb 19:39 bacula-dir JobId 50343: The following 1 JobId was chosen to be copied: 45849
02-Feb 19:39 bacula-dir JobId 50343: Copying using JobId=45849 Job=ngaio.2011-01-11_05.55.02_52
02-Feb 19:39 bacula-dir JobId 50343: Bootstrap records written to /home/bacula/working/bacula-dir.restore.83.bsr

The above is the bconsole output. Below is some SQL I ran out of curiosity.

bacula=# select jobid, job, name, type, level, poolid from job where jobid = 50343;
 jobid |                  job                  |      name      | type | level | poolid
-------+---------------------------------------+----------------+------+-------+--------
 50343 | CopyToTape-Inc.2011-02-02_19.39.15_03 | CopyToTape-Inc | c    | F     |      3
(1 row)
  
bacula=# select poolid, name, nextpoolid from pool where poolid = 3;
 poolid | name  | nextpoolid
--------+-------+------------
      3 | Fulls |
(1 row)

bacula=#

Interesting. There is no value for nextpoolid. In fact, none of the pools have this value set.
That is not, however, the reason why the job went to the wrong pool.
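
As an aside, if you want the catalog to pick up the Next Pool value from the configuration, bconsole's update command can refresh a pool record from its resource definition. A sketch from memory; verify the syntax against your version:

*update pool=FullFile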

However, I think the clue is here, in the bconsole output from when I started the job:

JobName:       CopyToTape-Inc
Bootstrap:     *None*
Client:        polo-fd
FileSet:       Full Set
Pool:          FullFile (From Job resource)
Read Storage:  MegaFile (From Pool resource)
Write Storage: DigitalTapeLibrary (From Storage from Pool's NextPool resource)
JobId:         *None*
When:          2011-02-02 19:39:14
Catalog:       MyCatalog
Priority:      400
OK to run? (yes/mod/no): yes
Job queued. JobId=50343

Right there… you can see the source pool is FullFile. Why? It should be IncrFile. That would explain
why the write pool is Fulls, as can be seen in the definition of the FullFile Pool:

Pool {
  Name             = FullFile
  Pool Type        = Backup
  Recycle          = yes
  AutoPrune        = yes
  Volume Retention = 3 years
  Storage          = MegaFile
  Next Pool        = Fulls

  Maximum Volume Bytes = 5G

  LabelFormat = "FullAuto-"
}
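
For comparison, the IncrFile pool would be defined along the same lines, with its Next Pool directive pointing at the incremental tape pool. This is a sketch, not my actual definition; the retention, label format, and the tape pool name Incrementals mirror the FullFile resource above and are assumptions:

Pool {
  Name             = IncrFile
  Pool Type        = Backup
  Recycle          = yes
  AutoPrune        = yes
  Volume Retention = 3 years        # assumed; mirrors FullFile
  Storage          = MegaFile
  Next Pool        = Incrementals   # the incremental tape pool (assumed name)

  Maximum Volume Bytes = 5G

  LabelFormat = "IncrAuto-"         # assumed naming
}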

Oh, but I think I found the cause. The JobDefs…

JobDefs {
  Name        = "DefaultJobCopyDiskToTape"
  Type        = Backup
  Level       = Incremental
  Client      = polo-fd
  FileSet     = "Full Set"
  Schedule    = "WeeklyCycleForCopyingToTape"
  Storage     = DigitalTapeLibrary
  Messages    = Standard

  Pool        = FullFile # required parameter for all Jobs

  #
  # since this JobDef is meant to be used with a Copy Job
  # these Pools are the source for the Copy... not the destination.
  # The Destination is determined by the Next Pool directive in
  # the respective Pools.
  #
  Full         Backup Pool = FullFile
  Differential Backup Pool = DiffFile
  Incremental  Backup Pool = IncrFile

  Priority    = 400

  # don't spool data when backing up to tape from local disk;
  # no sense spooling local data
  Spool Data       = no
  Spool Attributes = yes

  RunAfterJob  = "/home/dan/bin/dlt-stats-kraken"

  Maximum Concurrent Jobs = 6
}

Right there… the pool is defined as FullFile. Thus, I altered my original Job
to include Pool = IncrFile as shown above. But before I made that change,
I discovered that this also worked:

run job=CopyToTape-Inc Pool=IncrFile yes

I ran a few test cases and I was satisfied that it would do what I wanted.
I removed the LIMIT clause, issued a reload in bconsole, and ran the job
manually. It took several hours to run.
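
In bconsole, that sequence looked roughly like this (paraphrased from memory, not a captured session):

*reload
*run job=CopyToTape-Inc yes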

But wait! There’s more!

The above job works on the IncrFile pool. But I have other pools that need copying to tape.
The following query shows how much data is waiting.

bacula=#  SELECT P.Name, pg_size_pretty(sum(J.jobbytes)::bigint)
bacula-#    FROM Job J, Pool P
bacula-#    WHERE P.Name in ( 'IncrFile' , 'DiffFile', 'FullFile' )
bacula-#      AND P.PoolId = J.PoolId
bacula-#      AND J.Type = 'B'
bacula-#      AND J.JobStatus IN ('T','W')
bacula-#      AND J.jobBytes > 0
bacula-#      AND J.JobId NOT IN
bacula-#          (SELECT PriorJobId
bacula(#             FROM Job
bacula(#            WHERE Type IN ('B','C')
bacula(#              AND Job.JobStatus IN ('T','W')
bacula(#              AND PriorJobId != 0)
bacula-# group by 1;
   name   | pg_size_pretty
----------+----------------
 FullFile | 212 GB
 IncrFile | 57 GB
 DiffFile | 94 GB
(3 rows)

bacula=# 

I will have two more jobs to create; one for Fulls, another for Differentials. They appear below:

Job {
  Name     = "CopyToTape-Diff"
  Type     = Copy
  Level    = Differential
  Pool     = DiffFile
  JobDefs  = "DefaultJobCopyDiskToTape"

  Schedule = Never

  Selection Type = SQL Query
  Selection Pattern = "
  SELECT DISTINCT J.JobId, J.StartTime
    FROM Job J, Pool P
   WHERE P.Name = 'DiffFile'
     AND P.PoolId = J.PoolId
     AND J.Type = 'B'
     AND J.JobStatus IN ('T','W')
     AND J.jobBytes > 0
     AND J.JobId NOT IN
         (SELECT PriorJobId
            FROM Job
           WHERE Type IN ('B','C')
             AND Job.JobStatus IN ('T','W')
             AND PriorJobId != 0)
ORDER BY J.StartTime
"
}

And here is the job for the Fulls pool:

Job {
  Name     = "CopyToTape-Full"
  Type     = Copy
  Level    = Full
  Pool     = FullFile
  JobDefs  = "DefaultJobCopyDiskToTape"

  Schedule = Never

  Selection Type = SQL Query
  Selection Pattern = "
 SELECT DISTINCT J.JobId, J.StartTime
   FROM Job J, Pool P
  WHERE P.Name = 'FullFile'
    AND P.PoolId = J.PoolId
    AND J.Type = 'B'
    AND J.JobStatus IN ('T','W')
    AND J.jobBytes > 0
    AND J.JobId NOT IN
        (SELECT PriorJobId
           FROM Job
          WHERE Type IN ('B','C')
            AND Job.JobStatus IN ('T','W')
            AND PriorJobId != 0)
ORDER BY J.StartTime
"
}

I ran them manually, to see how well they worked. I was very happy. 🙂

Never say NEVER!

You will see that the schedule used by these jobs is called Never. Here is what that schedule
looks like:

Schedule {
  Name = "Never"
}

Lacking a Run directive, Jobs using this Schedule will never run automatically. I will be
creating a new schedule just for copy jobs; that will not be covered in this article. For
contrast, a sketch of a schedule that does run appears below.
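
A minimal schedule with a Run directive, purely to illustrate what Never is missing; the name and timing here are made up, not what I will use:

Schedule {
  Name = "CopyCycle"                # hypothetical name
  Run  = Full sun at 23:05          # run at Full level on Sundays at 23:05
}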

Much better

Copy jobs are fantastic. I really like them. They enable me to do the copy from disk to tape
that I’ve always wanted to do. Your backups are streamed to disk, the quickest way to back up,
which reduces the backup window for your clients. You can then copy to tape at any time. You
can also afford to use slower tape media because of this.

What am I using? DLT 7000. It has been amazingly reliable, for used hardware and tapes.

How does this approach improve upon the old one? This solution puts all the jobs
headed to the same pool into one batch, which means less changing of tapes. My old approach
would put incrementals, fulls, and differentials all into the same batch of jobs. The new
approach deals with only one level/pool at a time.

2 thoughts on “Bacula – copying a job; doing it better”

  1. We used bacula a few years ago, but with the setup we had back then, it made fuzzy backups of SQL. I take it you’ve set up a separate job to make a valid SQL dump before bacula visits at night.
