rsync – synchronizing two file trees
This article originally appeared quite some time ago. But for some
unknown reason, it was lost from the indexes. I’ve just come back to update it with
some new observations about errors.
We now return you to your regularly
scheduled read…
rsync is an amazing and powerful tool for moving files around. I know of people who use it for file transfers, for keeping DNS server records up to date, and, along with sshd, for remotely restarting services when rsync reports a file change (how they do that, I don’t know; I’m just told they do it).
This article describes how you can use rsync to synchronize file trees.
In this case, I’m using two websites to make sure one is a backup of the other. As
an example, I’ll be making sure that one box contains the same files as the other box in
case I need to put the backup box into production, should a failure occur.
Overview
rsync can be used in six different ways, as documented in man rsync:
1. for copying local files. This is invoked when neither source nor destination path contains a : separator.
2. for copying from the local machine to a remote machine using a remote shell program as the transport (such as rsh or ssh). This is invoked when the destination path contains a single : separator.
3. for copying from a remote machine to the local machine using a remote shell program. This is invoked when the source contains a : separator.
4. for copying from a remote rsync server to the local machine. This is invoked when the source path contains a :: separator or a rsync:// URL.
5. for copying from the local machine to a remote rsync server. This is invoked when the destination path contains a :: separator.
6. for listing files on a remote machine. This is done the same way as rsync transfers except that you leave off the local destination.
I’ll only be looking at copying from a remote rsync server to the local machine (4) and copying from a remote machine using a remote shell program (3).
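To make the separator rules concrete, here is a minimal sketch in shell. The function name classify_rsync_path is made up for illustration and is not part of rsync. It looks only at the separators described in the list above and ignores edge cases that real rsync handles, such as a colon that appears after a slash in a local path.

```shell
#!/bin/sh
# classify_rsync_path PATH: print how a single source or destination
# argument would be interpreted, based only on its separators.
# (Hypothetical helper for illustration; a simplification of rsync's rules.)
classify_rsync_path() {
    case "$1" in
        rsync://*) echo "daemon" ;;        # rsync:// URL
        *::*)      echo "daemon" ;;        # double colon: rsync server
        *:*)       echo "remote-shell" ;;  # single colon: rsh or ssh transport
        *)         echo "local" ;;         # no colon: local path
    esac
}

classify_rsync_path /home/dan/test      # prints: local
classify_rsync_path ducky:www           # prints: remote-shell
classify_rsync_path ducky::www          # prints: daemon
classify_rsync_path rsync://ducky/www   # prints: daemon
```

The order of the patterns matters: the double-colon test has to come before the single-colon test, because a path with :: also matches the single-colon pattern.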
Installing
This was an easy port to install (aren’t they all, for the most part?).
Remember, I have the entire ports tree, so I did this:
# cd /usr/ports/net/rsync
# make install
If you don’t have the ports tree installed, you have a bit more work to do…. As far as I know,
you need rsync installed on both client and server, although you do not need to be running rsyncd
unless you are connecting via method 4.
Setting up the server
In this example, we’re going to be using a remote rsync server (4).
On the production web server, I created the /usr/local/etc/rsyncd.conf file. The contents are based on man rsyncd.conf.
uid = rsync
gid = rsync
use chroot = no
max connections = 4
syslog facility = local5
pid file = /var/run/rsyncd.pid

[www]
path = /usr/local/websites/
comment = all of the websites
You’ll note that I’m running rsync as rsync:rsync. I added lines to the password file (using vipw) and to /etc/group to reflect the new user. Something like this:
rsync:*:4002:4002::0:0:rsync daemon:/nonexistent:/sbin/nologin
and
rsync:*:4002:
Then I started the rsync daemon and verified it was running by doing this:
# rsync --daemon
# ps auwx | grep rsync
root 30114 0.0 3.7 936 500 ?? Ss 7:10PM 0:00.04 rsync --daemon
And I found this in /var/log/messages:
rsyncd[30114]: rsyncd version 2.3.2 starting
Then I verified that I could connect to the daemon by doing this:
# telnet localhost 873
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
@RSYNCD: 21
I found the port number, 873, by looking at man rsyncd.conf.
See the security section for more information.
You can also require a user name and password. If you do that, I suggest you make /usr/local/etc/rsyncd.conf unreadable by the world:
chmod 640 /usr/local/etc/rsyncd.conf
This example is straight from the man page. Add this to the configuration file:
auth users = tridge, susan
secrets file = /usr/local/etc/rsyncd.secrets
The /usr/local/etc/rsyncd.secrets file would look something like this:
tridge:mypass
susan:herpass
And don’t forget to hide that file from the world as well:
chmod 640 /usr/local/etc/rsyncd.secrets
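The following is a sketch of creating the secrets file so that it is never world-readable, even for a moment. The scratch path in SECRETS is an assumption made so the example can run anywhere; on the server described here you would point it at /usr/local/etc/rsyncd.secrets. The user names and passwords are the ones from the man page example above.

```shell
#!/bin/sh
# Assumption: SECRETS defaults to a scratch location for demonstration only.
SECRETS="${SECRETS:-${TMPDIR:-/tmp}/rsyncd.secrets.example}"

# Tighten the umask before the file is created, so there is no window
# in which the passwords are readable by everyone.
old_umask=$(umask)
umask 027
printf '%s\n' 'tridge:mypass' 'susan:herpass' > "$SECRETS"
umask "$old_umask"

# Same mode the article uses: owner read/write, group read, others nothing.
chmod 640 "$SECRETS"

# Show the permission string of the resulting file
ls -l "$SECRETS" | cut -c1-10
```

Setting the umask first is the design choice here: creating the file and then running chmod leaves a short gap in which the default permissions apply.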
Setting up the client
You may have to install rsync on the client as well. There wasn’t much to set up on the client. I merely issued the following command. The rsync server in question is ducky.
rsync -avz ducky::www /home/dan/test
In the above example, I’m connecting to ducky, getting the www
collection, and putting it all in /home/dan/test.
And rsync took off! Note that I have not implemented any security here at all. See the security section for that.
I checked the output of my first rsync and decided I didn’t want everything transferred. So I modified the command to this:
rsync -avz --exclude 'nz.freebsd.org/*' --exclude 'wusage/*' ducky::www /home/dan/test
See the man pages for more exclusion options.
I also wanted deleted server files to be deleted on the client. So I did this:
rsync -avz --delete ducky::www /home/dan/test
Of course, you can combine all of these arguments to suit your needs.
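As a hedged illustration of combining these options, the sketch below runs the same flags against two scratch directories on the local machine rather than against ducky. The directory and file names are invented for the demonstration. The exclude pattern is quoted so the local shell does not expand it before rsync sees it.

```shell
#!/bin/sh
# Demonstration only: two local scratch directories stand in for the
# server module and the client directory.
demo_combined_options() {
    src=$(mktemp -d) || return 1
    dst=$(mktemp -d) || return 1

    mkdir "$src/wusage"
    echo "page"   > "$src/index.html"
    echo "report" > "$src/wusage/report.html"
    echo "stale"  > "$dst/removed-on-server.html"

    # -a        archive mode (recursive, preserves times and permissions)
    # --delete  remove destination files that no longer exist in the source
    # --exclude skip everything inside wusage/
    rsync -a --delete --exclude 'wusage/*' "$src/" "$dst/"

    # List what ended up in the destination
    (cd "$dst" && find . | sort)

    rm -rf "$src" "$dst"
}

if command -v rsync >/dev/null 2>&1; then
    demo_combined_options
else
    echo "rsync is not installed; skipping demonstration"
fi
```

After the run, the destination should hold index.html and an empty wusage directory, and removed-on-server.html should be gone. Trying a new combination of options on scratch directories like this is a cheap way to check what --delete will do before pointing it at real data.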
I found the --stats option interesting:
Number of files: 2707
Number of files transferred: 0
Total file size: 16022403 bytes
Total transferred file size: 0 bytes
Literal data: 0 bytes
Matched data: 0 bytes
File list size: 44388
Total bytes written: 132
Total bytes read: 44465
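If you capture the --stats output, for example from a cron job, individual figures are easy to pull out with standard tools. This is a sketch only: the helper name stat_value is hypothetical, and the label text is taken from the output shown above, which may differ between rsync versions.

```shell
#!/bin/sh
# stat_value LABEL: read rsync --stats output on stdin and print the
# value that follows "LABEL: ". (Hypothetical helper, not part of rsync.)
stat_value() {
    awk -F': ' -v label="$1" '$1 == label { print $2 }'
}

# Sample lines copied from the run shown above
stats_output='Number of files: 2707
Number of files transferred: 0
Total file size: 16022403 bytes
Total transferred file size: 0 bytes'

printf '%s\n' "$stats_output" | stat_value "Number of files transferred"   # prints: 0
printf '%s\n' "$stats_output" | stat_value "Total file size"               # prints: 16022403 bytes
```

The exact match on the label is deliberate: a prefix match on "Number of files" would also pick up "Number of files transferred".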
Security
My transfers occur on a trusted network and I’m not worried about
the contents of the transfer being observed. However, you can use ssh as the transfer medium by using the following command:
rsync -e ssh -avz ducky:www test
Note that this differs from the previous example in that there is only one : (colon), not two. See man rsync for details. In this example, we will be grabbing the contents
of ~/www from host ducky using our existing user login. The contents of the remote directory will be
synchronized with the local directory test.
Now, if the server has auth users configured as shown earlier and you try an rsync against the module, you’ll see this:
$ rsync -avz --delete ducky::www /home/dan/test
Password:
@ERROR: auth failed on module www
Here I supplied the wrong password and I didn’t specify the user ID. I suspected it
used my login, and a check of the man page confirmed this. This was my next
attempt. You can see that I added the user name before the host, ducky.
$ rsync -avz --delete susan@ducky::www /home/dan/test
Password:
receiving file list ... done
wrote 132 bytes read 44465 bytes 1982.09 bytes/sec
total size is 16022403 speedup is 359.27
In this case, nothing was transferred as I’d already done several successful rsyncs.
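The speedup figure in that summary can be checked by hand. This sketch assumes that rsync reports the total size of the files divided by the number of bytes that actually crossed the network (written plus read); the numbers are the ones from the run above.

```shell
#!/bin/sh
# speedup = total size / (bytes written + bytes read)
awk 'BEGIN { printf "%.2f\n", 16022403 / (132 + 44465) }'   # prints: 359.27
```

In other words, keeping 16 MB of files in sync cost under 45 KB of traffic, because only the file list had to be exchanged.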
The next section deals with how to use a password in batch mode.
Do it on a regular basis
There’s no sense in having an rsync set up if you aren’t going to use it on a regular basis. In order to use rsync from a cron job, you should supply the password in a non-world readable file. I put my password in /home/dan/test/rsync.password.
Remember to chmod 640 that password file!
I put the command into a script file (rsync.sh), which looks like this:
#!/bin/sh
/usr/local/bin/rsync -avz --stats --delete susan@ducky::www /home/dan/test --password-file /home/dan/test/rsync.password
Remember to chmod 740 the script file!
Then I put this into /etc/crontab in order to run this command every hour (this should be all on one line):
7 * * * * root /usr/home/dan/rsync.sh 2>&1 | mail -s "rsync script" root
The above will mail you a copy of the output.
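If hourly mail becomes noise, one option is a wrapper that prints nothing on success and the full output on failure. The sketch below is an assumption about how you might structure that, not something taken from my setup; run_quietly is a made-up helper name.

```shell
#!/bin/sh
# run_quietly CMD [ARGS...]: run a command, stay silent if it succeeds,
# and print its exit status and captured output if it fails.
# (Hypothetical helper for illustration.)
run_quietly() {
    output=$("$@" 2>&1)
    status=$?
    if [ "$status" -ne 0 ]; then
        printf 'FAILED (exit %d): %s\n' "$status" "$*"
        printf '%s\n' "$output"
    fi
    return "$status"
}

# Usage inside rsync.sh: prefix the existing rsync command line
# with run_quietly.
```

Because the helper returns the original exit status, the calling script can still decide what to do next, such as skipping a follow-up step when the transfer failed.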
If you want to use ssh as your transport medium, I suggest using the authorized_keys feature.
My comments
I think rsync is one of the most powerful tools I’ve seen for transferring files around a network and the Internet. It is just so powerful! Although I actually use cvsup to publish the Diary, I am still impressed with rsync.
Some recent errors I encountered
I was recently adding some new files to my rsync tree. I found these errors:
receiving file list ... opendir(log): Permission denied
opendir(fptest): Permission denied
opendir(example.com): Permission denied
opendir(example.org): Permission denied
readlink dan: Permission denied
opendir(default): Permission denied
It took me a while to understand the problem. It’s a read issue: rsyncd
didn’t have permission to read the files in question. You can either make rsyncd
run as a different user, or change the permissions on the files.
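To find the offending files before rsyncd trips over them, you can search the tree for anything the daemon’s user cannot read. The sketch below assumes the rsync user is neither the owner nor in the group of the files, so only the "other" permission bits matter; find_unreadable_by_others is a hypothetical helper name.

```shell
#!/bin/sh
# find_unreadable_by_others DIR: list directories that lack read and
# search permission for "other", and any other files that lack read
# permission for "other". (Hypothetical helper for illustration.)
find_unreadable_by_others() {
    find "$1" \( -type d ! -perm -005 \) -o \( ! -type d ! -perm -004 \)
}

# Example against the module path from rsyncd.conf:
# find_unreadable_by_others /usr/local/websites
```

Directories are tested for both read and execute bits because opendir needs permission to list the directory and to enter it.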
If you get the user id for rsync wrong, you’ll see this error:
$ rsync -avz xeon::www /home/dan/rsynctest
@ERROR: invalid uid
I had the rsync user misspelt as rysnc.
You should separate the PATH from the HOSTNAME with only one ‘:’ to use SSH; otherwise rsync still uses its own method to transport from server to client.
Another error is that the option --password-file is not useful when using ssh transport.
Quoting from man:
"This option allows you to provide a password in a file for accessing a remote rsync server. Note that this option is only useful when accessing a rsync server using the built in transport, not when using a remote shell as the transport."
Regards,
Massimo
Massimo Lusetti wrote:
>
> You should separate the PATH from the HOSTNAME with only one ‘:’
> to use SSH; otherwise rsync still uses its own method to transport
> from server to client.
Right you are. Thank you. I’ll correct the article.
> Another error is that the option --password-file is not
> useful when using ssh transport.
> Quoting from man:
> "This option allows you to provide a password in a file for
> accessing a remote rsync server. Note that this option is
> only useful when accessing a rsync server using the built in
> transport, not when using a remote shell as the transport."
DOH! Thank you. I’ve updated the article. How’s it look now?
Combining the RSA authentication built into SSH with rsync provides a very secure way to keep file trees in sync.
Perhaps your article can point this out, and possibly reference an article I believe you had about using SSH in this precise way.
Regards.
-lem
I am having this error when trying to retrieve files:
receiving file list ... opendir(log): Permission denied
opendir(fptest): Permission denied
opendir(example.com): Permission denied
What could be the problem? How do you give rsyncd permission, and where is rsyncd? Please do assist, thanks
regards
martin
Have you read the rsync how-to (/rsync.php) on this site?
Great Article! Easy to understand and follow. I have my trees syncing up. I will be pushing the scripts to production after some more testing.
Thanks!
Chris Peterson
So I played around with the rsync and the article got me going with the basics. I was able to transfer directory structures and everything worked great.
I have a unique problem that i wanted rsync to solve. However, on initial use i was not able to use it the way i originally intended and found a work around. I will setup the scenario for you and maybe someone can help find a better solution.
I have MachineA and a hundred of MachineB’s
MachineA has a directory full of files:
/data
Each Machine B has a database with a list of files it needs. Which may not be a complete list of files on MachineA. I have a script that pulls the file list out of the database and I want to sync those files only. If a file gets removed from the database i want it removed from the directory. If a file gets added to the database or updated i want rsync to grab the file.
I tried many different ways to do rsync and found that I couldn’t execute a command like this:
#Pull
machineA# rsync -e ssh --delete --stats -avzcpr root@machineB:`/scripts/grabmedia.pl` /media
#Push (this however worked)
machineA# rsync -e ssh --delete --stats -avzcpr `/scripts/grabmedia.pl` machineB:/media
I wanted to Pull because machine B is a remote site and wanted each machine to initiate the download for speed reasons. (does it make a difference?)
The solution I came up with to make it work is below:
setup ssh authorized keys and then execute this command
#!/bin/sh
# --- syncfiles
ssh chris@machineA "rsync -e ssh --stats -avz `/scripts/grabmedia.pl` chris@machineB:/media"
Is my logic all off? Should i just be running the cronjobs on the server and push to all 100?
I thought the pull(download) was quicker. Plus the machineB’s are the ones keeping track of what files it needs.
Any help would be cool.
Thanks in advance!
Chris Peterson
I think you should be asking your question in the Support forum where many more people will read it, not just those reading this article and then clicking on the comments.
My requirement:- I use multiple windows machines for editing
my code, sometimes my laptop, sometimes my desktop, sometimes
ssh-ing directly on my FreeBSD server. I would also like to
manage the code via a cvs server with :pserver: authentication.
So I first setup a cvsserver with pserver type authentication
and tried using WinCVS from windows and cvs from the unix side.
The problem was the cvs "working directory" files created
from the server records the IP address of the repository and
that IP wouldn’t route correctly when accessed from a different
machine. I would also be forced to use the default port number
2401 for cvs since the way of specifying non-default port
numbers seems to be different with wincvs 1.2 ( the latest
1.3 beta doesn’t work on my machine – so I couldn’t try that).
Another issue was that I wouldn’t like to force myself
to checkin the code half way just because I would like to
move to patio and work with the laptop. So I
decided to sync the working directories across machines.
The solution:-
I run ssh and create tunnel from 127.0.0.1 port 2401 to
my-servers-ip port X where X can be any port (including 2401
itself). If X is anything other than 2401, you would need a
similar setup on the server machine as well (when you work directly
from the machine via ssh+vi).
Now I can do all cvs operations via the fake :pserver: connection
at 127.0.0.1:2401 and have that work from any machine so long
they have ssh. Using ssh tunnel has the added advantage that
I don’t have to open up any port on the firewall even if I
am connected from outside (internet).
Now the working directories are rsynced across. BTW this setup does not require a rsync server setup.
Now whenever I want to switch my editing from my desktop to
laptop, I just save my files, run the ‘upsync.sh’ script,
go to the laptop, run the ‘downsync.sh’ and fire up my editor.
When I am finally done with a version, tested everything,
I can check in the files from wherever I am.
I can post the scripts if anyone is interested.
note: please remove the dots if u intend to reply.
—
Hari Bhaskaran
At first I thought it would be a frightening thing to mirror my mysql databases. Everything I had read involved using mysqldump for each database in mysql, then transferring all the databases over and loading them into the backup database server.
I wanted the flexibility of being able to add a database at any time without having to modify my backup script.
For my solution I am using a primary server, which is live to the world, and a secondary server, which is tucked away waiting for the primary to fail (also on a separate net connection just in case).
On the primary, I’m running rsyncd with the following entry for mysql:
[mysql]
path = /var/db/mysql/
comment = mysql databases
On the secondary (backup) server, I made a script that stops mysqld, rsyncs the databases, and brings mysql back up (with extra verbosity for the fun of it):
#!/bin/sh
echo "----------------"
echo "--mysql-rsync---"
echo "----------------"
date
echo "----------------"
# stop mysql server
echo "stopping mysql server..."
/usr/bin/killall mysqld > /dev/null 2>&1 && echo " stop mysqld"
echo "----------------"
# run rsync
echo "starting rsync..."
rsync -avz --stats --delete 10.0.0.2::databases /var/db/mysql
echo "----------------"
# start mysql up again
echo "starting mysql server..."
if [ -x /usr/local/bin/safe_mysqld ]; then
# "&" already ends the command, so it cannot be followed directly by "&&"
/usr/local/bin/safe_mysqld --user=mysql > /dev/null &
echo " start mysqld"
fi
echo "----------------"
echo "script end"
This script is called by a cron job that runs nightly (and redirects output to an email to me) but can also, of course, be run manually when necessary.
So far, I haven’t had any problems with this method, but if anyone sees any flaw in this, please let me know.
Oops! I made a mistake
The rsync line of the script should read as follows:
rsync -avz --stats --delete 10.0.0.2::mysql /var/db/mysql
When I run the following script on my command line, it works OK, but when I put it into a cron job to run every minute, it does not do anything. I am trying to push my source to all production servers every minute (it is a test right now).
Please advise.
======= begin rsync.sh ===================
#!/bin/sh
/usr/local/bin/rsync -r -q -b -t /apps/rca/migr/bin/ /apps/rca/prod/bin;
/usr/local/bin/rsync -e ssh -rqbt /apps/rca/prod/bin/ oracle@irmuxs620:/home/oracle/prod/bin;
/usr/local/bin/rsync -e ssh -rqbt /apps/rca/prod/bin/ oracle@irmuxs630:/home/oracle/prod/bin;
============= end rsync.sh
Did you press ENTER at the end of the line in your crontab?
Show us your crontab entry.
Hi Dan,
Thanks for prompt attention. I have resolved the issue, it was due to the path for ssh.
Since you are going to read this. What I should do completely replace the contents of my destination ie irmuxs620 from local machine ie /apps/rca/prod/bin/
/usr/local/bin/rsync -e ssh -rqbt /apps/rca/prod/bin/ oracle@irmuxs620:/home/oracle/prod/bin
Again, thanks a lot!
Dan Langille wrote:
>
> Did you press ENTER at the end of the line in your
> crontab?
>
> Show us your crontab entry.
CORRECTION :
Since you are going to read this. What I want to do is to completely replace the contents of my destination ie irmuxs620 from local machine ie /apps/rca/prod/bin/
/usr/local/bin/rsync -e ssh -avz --stats --delete susan@ducky::www /home/dan/test --password-file /home/dan/test/rsync.password
Shouldn’t that be:
/usr/local/bin/rsync -e ssh -avz --stats --delete susan@ducky:www /home/dan/test --password-file /home/dan/test/rsync.password
Notice the single : instead of 2. SSH uses the :/path/to/remote option, so it doesn’t read the rsyncd.conf module on the remote machine.
Check man rsync. And look under GENERAL.
Hi there,
This might seem like a silly question, but what format is the password file used with the --password-file option?
I’ve exported RSYNC_PASSWORD=<password>, with no luck and put a plain-text password in /etc/rsync.pass, and chmod’d it to 640
then using the parameter --password-file /etc/rsync.pass
It -still- asks for the password.
(btw I am using -rsh=’ssh -l <username>’ to run the remote shell command… each time I run it, I get a
Password:
prompt.
Any thoughts?
-Geoff
From man rsync:
--password-file
This option allows you to provide a password in a file for accessing a remote rsync server. Note that this option is only useful when accessing a rsync server using the built in transport, not when using a remote shell as the transport. The file must not be world readable. It should contain just the password as a single line.
So…
1 - it’s plain text, a single line
2 - you should not be using it when using a remote shell as the transport
I think that means it won’t work with ssh. Your choice is then an ssh key with a blank/empty passphrase. Search for "authorized_keys" in the article.
Mind you, I see myself using the password file and ssh as the medium, so I don’t know what happened there.
Well, the file -is- plain text and -is- a single line
ie echo password > whatever-file-you want
Never mind, I’ve skipped ssh for now and am just using the rsync protocol in the daily shell script. It seems to work OK with that, without using --password-file, by setting the password as an environment variable (yes, it clears the environment variable directly after use, though I know this is still not as secure as it could be). I’ve also fixed another problem I was having, where rsync would drop out frequently and throw a timeout error even with a relatively generous timeout. Using the bwlimit parameter and leaving a few kbps free fixed it right up. It probably has something to do with the slow response of some sort of ack in one direction or the other.
Basically, on a 512k/128k ADSL connection, it would have been sending out at about 12-13k/s max. Restricting this to 9k/s left some bandwidth for other services (and perhaps for whatever this protocol uses to acknowledge transfers; I’m not sure how it works exactly) and got rid of the timeout issues that occurred.
Hi,
I am wondering whether there is a way to rsync from a master IDE disk to a slave IDE disk. The master disk and slave disk are on the same PC.
Best regards,
Jennifer
No reason why not.
—
The Man Behind The Curtain