Bacula – copying a job; doing it better
I recently wrote about Bacula's
Copy job option. In that solution,
I used an approach that copied all uncopied jobs from disk to tape. I've
found a few annoyances with that initial approach, so now I'm experimenting
with a new one. An added side effect: less wear on the tape library.
My initial approach used Selection Type = PoolUncopiedJobs. My new
approach, an idea I had originally discarded, uses Selection Type = SQL Query
and three jobs, each with a different pool. The results are much smoother and
faster.
Duplicate Job Control
By default, Bacula allows
duplicate jobs. For the purposes of deciding what constitutes a duplicate, Bacula goes by the job name.
Various directives exist for fine-tuning how duplicates are handled. This comes into play with the solution
I have selected.
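For reference, here is a sketch of where those directives sit in a Job resource. The values are illustrative only, not taken from my configuration:

Job {
  # ... the rest of the Job definition ...
  # Illustrative settings: with duplicates disallowed, the two
  # Cancel directives decide which of two duplicate jobs survives.
  Allow Duplicate Jobs = no
  Cancel Queued Duplicates = yes
  Cancel Running Duplicates = no
}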
The approach I am using will select multiple jobs for copying. Thus, the potential for duplicates
is high, and almost guaranteed given the nature of backups. If the original jobs do not
allow duplicates and your selection method returns duplicate jobs, you will wind up with
cancellations, which means the next copy job will pick up the targets that were canceled.
This can turn into a never-ending cycle.
The goal of my jobs is to copy any and all jobs that meet my criteria, so I have removed all my
duplicate job control. This may sound rash, but I may yet resolve the problem of duplicate jobs by
using Max Start Delay and/or Max Wait Time.
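If I do go that route, here is roughly where those directives would sit. The time values below are placeholders I have not tested:

Job {
  Name = "CopyToTape-Inc"
  # ... the rest of the Job definition, as shown below ...
  # Placeholder values: give up on a job that cannot start within
  # 4 hours, and cancel one that has been waiting longer than 8 hours.
  Max Start Delay = 4 hours
  Max Wait Time = 8 hours
}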
The new copy job
This is my new copy job. It works on my file-based pool for incremental backups. Note the
following points:
- Level = Incremental
- Pool = IncrFile
- P.Name = 'IncrFile'
The goal of this job is to copy jobs from the IncrFile pool over to tape.
Job { Name = "CopyToTape-Inc" Type = Copy Level = Incremental Pool = IncrFile JobDefs = "DefaultJobCopyDiskToTape" Schedule = Never Selection Type = SQL Query Selection Pattern = " SELECT DISTINCT J.JobId, J.StartTime FROM Job J, Pool P WHERE P.Name = 'IncrFile' AND P.PoolId = J.PoolId AND J.Type = 'B' AND J.JobStatus IN ('T','W') AND J.jobBytes > 0 AND J.JobId NOT IN (SELECT PriorJobId FROM Job WHERE Type IN ('B','C') AND Job.JobStatus IN ('T','W') AND PriorJobId != 0) ORDER BY J.StartTime LIMIT 1; " }
I added the LIMIT clause for testing purposes. My production job does not have that clause.
I wanted to see that this works with just one job to copy; walk first, run later.
Originally, that went to the Fulls tape pool
This section outlines a problem I encountered and how I solved it. It also shows you
the JobDefs I’m using.
When I initially ran this job, the result went to the Fulls tape pool, not to the Incrementals tape pool
as I expected:
02-Feb 19:39 bacula-dir JobId 50343: The following 1 JobId was chosen to be copied: 45849
02-Feb 19:39 bacula-dir JobId 50343: Copying using JobId=45849 Job=ngaio.2011-01-11_05.55.02_52
02-Feb 19:39 bacula-dir JobId 50343: Bootstrap records written to /home/bacula/working/bacula-dir.restore.83.bsr
The above is the bconsole output. Below is some SQL, run for my own curiosity.
bacula=# select jobid, job, name, type, level, poolid from job where jobid = 50343;
 jobid |                  job                  |      name      | type | level | poolid
-------+---------------------------------------+----------------+------+-------+--------
 50343 | CopyToTape-Inc.2011-02-02_19.39.15_03 | CopyToTape-Inc | c    | F     |      3
(1 row)

bacula=# select poolid, name, nextpoolid from pool where poolid = 3;
 poolid | name  | nextpoolid
--------+-------+------------
      3 | Fulls |
(1 row)

bacula=#
Interesting. There is no value for nextpoolid. In fact, none of the pools have this value set.
That is not, however, the reason why the job went to the wrong pool.
I think the clue is here, in the bconsole output from when I started the job:
JobName:       CopyToTape-Inc
Bootstrap:     *None*
Client:        polo-fd
FileSet:       Full Set
Pool:          FullFile (From Job resource)
Read Storage:  MegaFile (From Pool resource)
Write Storage: DigitalTapeLibrary (From Storage from Pool's NextPool resource)
JobId:         *None*
When:          2011-02-02 19:39:14
Catalog:       MyCatalog
Priority:      400
OK to run? (yes/mod/no): yes
Job queued. JobId=50343
Right there… you can see the source pool is FullFile. Why? It should be IncrFile. That would be
why the write pool is Fulls, as can be seen in the definition of the FullFile Pool:
Pool {
  Name = FullFile
  Pool Type = Backup
  Recycle = yes
  AutoPrune = yes
  Volume Retention = 3 years
  Storage = MegaFile
  Next Pool = Fulls
  Maximum Volume Bytes = 5G
  LabelFormat = "FullAuto-"
}
Oh, but I think I found the cause. The JobDefs…
JobDefs {
  Name = "DefaultJobCopyDiskToTape"
  Type = Backup
  Level = Incremental
  Client = polo-fd
  FileSet = "Full Set"
  Schedule = "WeeklyCycleForCopyingToTape"
  Storage = DigitalTapeLibrary
  Messages = Standard
  Pool = FullFile   # required parameter for all Jobs

  # Since this JobDef is meant to be used with a Copy Job,
  # these Pools are the source for the Copy... not the destination.
  # The destination is determined by the Next Pool directive in
  # the respective Pools.
  Full Backup Pool = FullFile
  Differential Backup Pool = DiffFile
  Incremental Backup Pool = IncrFile

  Priority = 400

  # Don't spool data when backing up to tape from local disk;
  # there is no sense spooling local data.
  Spool Data = no
  Spool Attributes = yes

  RunAfterJob = "/home/dan/bin/dlt-stats-kraken"
  Maximum Concurrent Jobs = 6
}
Right there… the pool is defined as FullFile. Thus, I altered my original Job
to include Pool = IncrFile, as shown above. But before I made that change,
I discovered that this also worked:
run job=CopyToTape-Inc Pool=IncrFile yes
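Either way, the copies now read from the IncrFile pool, so the write pool is taken from IncrFile's Next Pool directive. For completeness, here is a sketch of what that Pool needs to look like; I am assuming the tape pool is named Incrementals, and the remaining values simply mirror FullFile:

Pool {
  Name = IncrFile
  Pool Type = Backup
  Recycle = yes
  AutoPrune = yes
  Storage = MegaFile
  # The write pool for a Copy job comes from the READ pool's
  # Next Pool directive, so copies from IncrFile land in the
  # Incrementals tape pool.
  Next Pool = Incrementals
  LabelFormat = "IncrAuto-"
}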
I ran a few test cases and I was satisfied that it would do what I wanted.
I removed the LIMIT clause, issued a reload in bconsole, and ran the job
manually. It took several hours to run.
But wait! There’s more!
The above job works on the IncrFile pool, but I have other pools that need copying to tape.
The following query shows how much data is waiting:
bacula=# SELECT P.Name, pg_size_pretty(sum(J.jobbytes)::bigint)
bacula-# FROM Job J, Pool P
bacula-# WHERE P.Name in ( 'IncrFile' , 'DiffFile', 'FullFile' )
bacula-# AND P.PoolId = J.PoolId
bacula-# AND J.Type = 'B'
bacula-# AND J.JobStatus IN ('T','W')
bacula-# AND J.jobBytes > 0
bacula-# AND J.JobId NOT IN
bacula-# (SELECT PriorJobId
bacula(# FROM Job
bacula(# WHERE Type IN ('B','C')
bacula(# AND Job.JobStatus IN ('T','W')
bacula(# AND PriorJobId != 0)
bacula-# group by 1;
   name   | pg_size_pretty
----------+----------------
 FullFile | 212 GB
 IncrFile | 57 GB
 DiffFile | 94 GB
(3 rows)

bacula=#
I will have two more jobs to create: one for Differentials, another for Fulls. They appear below:
Job { Name = "CopyToTape-Diff" Type = Copy Level = Differential Pool = DiffFile JobDefs = "DefaultJobCopyDiskToTape" Schedule = Never Selection Type = SQL Query Selection Pattern = " SELECT DISTINCT J.JobId, J.StartTime FROM Job J, Pool P WHERE P.Name = 'DiffFile' AND P.PoolId = J.PoolId AND J.Type = 'B' AND J.JobStatus IN ('T','W') AND J.jobBytes > 0 AND J.JobId NOT IN (SELECT PriorJobId FROM Job WHERE Type IN ('B','C') AND Job.JobStatus IN ('T','W') AND PriorJobId != 0) ORDER BY J.StartTime " }
And here is the job for the Fulls pool:
Job { Name = "CopyToTape-Full" Type = Copy Level = Full Pool = FullFile JobDefs = "DefaultJobCopyDiskToTape" Schedule = Never Selection Type = SQL Query Selection Pattern = " SELECT DISTINCT J.JobId, J.StartTime FROM Job J, Pool P WHERE P.Name = 'FullFile' AND P.PoolId = J.PoolId AND J.Type = 'B' AND J.JobStatus IN ('T','W') AND J.jobBytes > 0 AND J.JobId NOT IN (SELECT PriorJobId FROM Job WHERE Type IN ('B','C') AND Job.JobStatus IN ('T','W') AND PriorJobId != 0) ORDER BY J.StartTime " }
I ran them manually, to see how well they worked. I was very happy. 🙂
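For the record, running them from bconsole is as simple as this; with the correct Pool now set in each Job, no Pool override is needed:

run job=CopyToTape-Full yes
run job=CopyToTape-Diff yes
run job=CopyToTape-Inc yes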
Never say NEVER!
You will see that the Schedule used by these jobs is called Never. Here is what that Schedule
looks like:
Schedule {
  Name = "Never"
}
Lacking a Run directive, Jobs using this Schedule will never run automatically. I will be
creating a new schedule just for copy jobs. That will not be covered in this article.
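To give a rough idea of where I am heading, a copy-job schedule might look something like this. The name and time are placeholders, not necessarily what I will use:

Schedule {
  Name = "DailyCopyToTape"
  # Placeholder: fire the copy jobs every morning, after the
  # overnight backups to disk have finished.
  Run = sun-sat at 06:30
}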
Much better
Copy jobs are fantastic. I really like them. They enable me to do the copy from disk to tape
that I've always wanted to do. Your backups are streamed to disk, the quickest way to back up,
which reduces the backup window for your clients. You can then copy to tape at any time. You
can also afford to use slower media because of this.
What am I using? DLT 7000. It has been amazingly reliable, for used hardware and tapes.
How does this approach improve upon the old one? This solution puts all the jobs
headed to the same pool into one batch, which means fewer tape changes. My old approach
would put incrementals, fulls, and differentials all into the same batch of jobs. The new
approach deals with only one level/pool at a time.
We used Bacula a few years ago, but with the setup we had back then, it made fuzzy backups of SQL. I take it you've set up a separate job to make a valid SQL dump before Bacula visits at night.
Yes. It’s part of the Run-Before script.
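For anyone setting this up, a minimal sketch of that idea, assuming PostgreSQL and a hypothetical script path (this is not my actual script):

Job {
  # ... the client's regular backup Job ...
  # Hypothetical hook: dump the databases to a file the FileSet
  # covers, so Bacula backs up a consistent dump rather than the
  # live database files.
  ClientRunBeforeJob = "/usr/local/bin/dump-databases.sh"
}

And the script itself could be as simple as:

#!/bin/sh
# dump-databases.sh (hypothetical): write a consistent dump where
# the nightly FileSet will pick it up.
pg_dumpall -U pgsql > /usr/local/backups/pgdumpall.sql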