Optimize a CVS server

CVS is not known for its scalability. For operations such as tag and log in particular, the running time depends heavily on the amount of data stored in the module being manipulated. Even on a medium-sized project, these operations can take several minutes, whereas small operations like add, remove and commit keep a reasonable response time.

A modern answer to this problem is to migrate the repository to Subversion, but there are still some tips to keep your CVS server optimized.

Better CVS usage

Limit the repository size

Try to keep the repository size reasonable. Remember that tag and log operations depend on the repository size, not on the workspace size. You can have a project with a 100 MB checkout and a 1 GB repository.

Some tips to limit the repository size:

  • avoid frequently modified binary files
  • remove useless revisions from the RCS files: old untagged revisions are often totally useless. See the rcs documentation, particularly the -o option
  • consider splitting big projects into several modules
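
To see where the repository size actually comes from, it can help to measure the repository itself rather than a checkout. The following is only a minimal sketch: it assumes the repository is an ordinary directory tree of RCS files (names ending in ",v"), and the function names and the demo tree are made up for the illustration.

```python
import os
import shutil
import tempfile


def repository_usage(root):
    """Return (total_bytes, rcs_files) for the tree under root.

    rcs_files is a list of (size, path) tuples for every file whose name
    ends in ",v", sorted with the largest file first.
    """
    total = 0
    rcs_files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            total += size
            if name.endswith(",v"):
                rcs_files.append((size, path))
    rcs_files.sort(reverse=True)
    return total, rcs_files


def build_demo_repository():
    """Create a throwaway tree that only mimics a repository layout."""
    root = tempfile.mkdtemp(prefix="cvs-demo-")
    module = os.path.join(root, "demo_module")  # hypothetical module name
    os.makedirs(module)
    for name, size in (("big.bin,v", 5000), ("main.c,v", 300), ("README,v", 40)):
        with open(os.path.join(module, name), "wb") as handle:
            handle.write(b"x" * size)
    return root


if __name__ == "__main__":
    demo_root = build_demo_repository()
    try:
        total_bytes, largest = repository_usage(demo_root)
        print("repository size:", total_bytes, "bytes")
        for size, path in largest[:2]:
            print(size, os.path.relpath(path, demo_root))
    finally:
        shutil.rmtree(demo_root)
```

Pointing repository_usage at a real repository path gives a list of the files that are the first candidates for pruning or for being moved out of CVS.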

CVS Bots

Watch how automated tools use CVS:

  • continuous integration tools (like CruiseControl) run a lot of log and tag operations; check their schedule periods, etc.
  • periodic backups (with cvslock) can be optimized by processing each CVS module separately

Server performance

After optimizing your CVS usage, it's time to optimize the CVS server performance.

  • avoid Windows CVS servers: CVSNT consumes a lot of CPU and the file system is very slow
  • use the pserver protocol without compression (only on a local network)
  • check whether more RAM would help
  • use a local filesystem (if you find the best filesystem, tell us ;-)
  • optimize the disk storage
    • use several dedicated disks to store the repository
    • prefer several disks to one big disk
    • striping is welcome (via LVM or a RAID controller)
    • SCSI (or SATA) is useful to obtain the highest performance: repository reads can reach 140 MB/s with disks simply striped via LVM
  • check the network bandwidth of your CVS server
  • disable the history log
  • use a dedicated lock directory (like /var/lock/cvs, set with LockDir in CVSROOT/config) on a tmpfs filesystem
  • the same can be done for the temp directory
  • use a journaling filesystem for the CVS repository
  • disable atime updates on your file system
  • monitor your system (iostat is your friend)
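
Several of the points above (local filesystem, tmpfs for the lock directory, no atime updates) can be checked from the mount table. The sketch below is one possible way to do it, assuming a Linux server where /proc/mounts is available. The sample mount table, the paths and the function names are hypothetical, and mount points containing escaped spaces are not handled.

```python
def parse_mounts(text):
    """Parse /proc/mounts-style text into (mount_point, fs_type, options) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        entries.append((mount_point, fs_type, set(options.split(","))))
    return entries


def filesystem_for(path, entries):
    """Return the entry whose mount point is the longest prefix of path."""
    best = None
    for entry in entries:
        mount_point = entry[0]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) > len(best[0]):
                best = entry
    return best


def describe(path, entries):
    """Return (fs_type, noatime_enabled) for the filesystem holding path."""
    entry = filesystem_for(path, entries)
    if entry is None:
        return None
    _mount_point, fs_type, options = entry
    return fs_type, "noatime" in options


# Hypothetical mount table, for illustration only.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext3 rw,errors=remount-ro 0 0
/dev/mapper/vg0-cvs /srv/cvs ext3 rw,noatime 0 0
tmpfs /var/lock/cvs tmpfs rw,size=64m 0 0
"""

if __name__ == "__main__":
    sample = parse_mounts(SAMPLE_MOUNTS)
    for checked_path in ("/srv/cvs/project", "/var/lock/cvs"):
        print(checked_path, describe(checked_path, sample))
```

On a real server, the sample text would be replaced by the content of /proc/mounts, and the paths by the actual repository, lock and temp directories.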

One or several servers

Spreading several projects across several servers isn't necessarily the best solution. Most of the time, activity peaks are due to the activity of one particular project (a delivery, for instance). It seems better to have one big CVS server than several small ones: the peak is then absorbed by the big server rather than by a single small server.

You can try clustering technologies ... and tell us how you do it ;o)

On the same server, having multiple repositories doesn't seem to be useful. With a single repository, the only possible limitation is a few locked files in the CVSROOT directory.
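
The reasoning can be illustrated with a small calculation. All the figures below are invented for the example (arbitrary units of load), and the sketch makes the strong assumption that one big server offers the summed capacity of the small ones.

```python
# Hypothetical load during a delivery of project_a, in arbitrary units.
DEMAND = {"project_a": 3.0, "project_b": 0.25, "project_c": 0.25, "project_d": 0.25}
SMALL_SERVER_CAPACITY = 1.0  # assumed capacity of each small server
BIG_SERVER_CAPACITY = 4.0    # assumed: same total capacity in one machine


def overloaded_when_split(demand, capacity):
    """One small server per project: list the projects exceeding their server."""
    return sorted(name for name, load in demand.items() if load > capacity)


def overloaded_when_pooled(demand, capacity):
    """One big server for all projects: True if the summed load exceeds it."""
    return sum(demand.values()) > capacity


if __name__ == "__main__":
    print("split :", overloaded_when_split(DEMAND, SMALL_SERVER_CAPACITY))
    print("pooled:", overloaded_when_pooled(DEMAND, BIG_SERVER_CAPACITY))
```

With these made-up figures, the small server dedicated to project_a is saturated while the three others stay mostly idle, whereas the single big server still has headroom.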

 
optimize_a_cvs_server.txt · Last modified: 2006/10/22 12:44 by alban
 