Introduction

Darcs is a revision control system, along the lines of CVS or arch. That means that it keeps track of various revisions and branches of your project and allows changes to propagate from one branch to another. Darcs is intended to be an "advanced" revision control system. Two features in particular distinguish it from other revision control systems: 1) each copy of the source is a fully functional branch, and 2) underlying darcs is a consistent and powerful theory of patches.

Every source tree a branch

The primary simplifying notion of darcs is that every copy of your source code is a full repository. This is dramatically different from CVS, in which the normal usage is for there to be one central repository from which source code will be checked out. It is closer to the notion of arch, since the "normal" use of arch is for each developer to create his own repository. However, darcs makes it even easier, since simply checking out the code is all it takes to create a new repository. The advantage is that you can harness the full power of darcs in any scratch copy of your code, without committing your possibly destabilizing changes to a central repository.

Theory of patches

The development of a simplified theory of patches is what originally motivated me to create darcs. This patch formalism means that darcs patches have a set of properties, which make possible manipulations that couldn't be done in other revision control systems. First, every patch is invertible. Secondly, sequential patches (i.e. patches that are created in sequence, one after the other) can be reordered, although this reordering can fail, which means the second patch is dependent on the first. Thirdly, patches which are in parallel (i.e. both patches were created by modifying identical trees) can be merged, and the result of a set of merges is independent of the order in which the merges are performed. This last property is critical to darcs' philosophy, as it means that a particular version of a source tree is fully defined by the list of patches that are in it, i.e. there is no issue regarding the order in which merges are performed. For a more thorough discussion of darcs' theory of patches, see Appendix A.
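
The three properties above can be illustrated with a deliberately small model. The sketch below is not darcs' implementation, and its commutation rule is much cruder than the one described in Appendix A. The names `Hunk`, `commute` and `merge` are hypothetical, a file is modelled as a list of lines, and hunks that overlap or touch are simply treated as dependent.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Hunk:
    """Toy patch: at line index `pos`, replace the lines `old` with `new`."""
    pos: int
    old: Tuple[str, ...]
    new: Tuple[str, ...]

    def apply(self, lines: List[str]) -> List[str]:
        end = self.pos + len(self.old)
        if tuple(lines[self.pos:end]) != self.old:
            raise ValueError("hunk does not apply to this tree")
        return lines[:self.pos] + list(self.new) + lines[end:]

    def invert(self) -> "Hunk":
        # Property 1: every patch has an inverse that undoes it.
        return Hunk(self.pos, self.new, self.old)


def commute(p1: Hunk, p2: Hunk) -> Optional[Tuple[Hunk, Hunk]]:
    """Property 2: given p1 followed by p2, return (p2', p1') with the same
    combined effect when applied in that order, or None if p2 depends on p1."""
    if p2.pos > p1.pos + len(p1.new):
        # p2 touches only lines strictly after what p1 wrote.
        shift = len(p1.new) - len(p1.old)
        return Hunk(p2.pos - shift, p2.old, p2.new), p1
    if p2.pos + len(p2.old) < p1.pos:
        # p2 touches only lines strictly before what p1 wrote.
        shift = len(p2.new) - len(p2.old)
        return p2, Hunk(p1.pos + shift, p1.old, p1.new)
    return None  # overlapping or adjacent: treated as a dependency here


def merge(p: Hunk, q: Hunk) -> Optional[Hunk]:
    """Property 3: p and q were both made against the same tree. Return q',
    the form of q that applies after p, or None on conflict."""
    swapped = commute(p.invert(), q)
    return None if swapped is None else swapped[0]


base = ["a", "b", "c", "d", "e"]
p = Hunk(0, ("a",), ("A", "A2"))
q = Hunk(3, ("d",), ("D",))

# Merging in either order yields the same tree.
p_then_q = merge(p, q).apply(p.apply(base))
q_then_p = merge(q, p).apply(q.apply(base))
```

In this toy model `merge` is built from `invert` and `commute` alone, which hints at why invertibility and reordering matter for the formalism.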

A simple advanced tool

Besides being "advanced" as discussed above, darcs is actually also quite simple. Versioning tools can be seen as three layers. At the foundation is the ability to manipulate changes. On top of that must be placed some kind of database system to keep track of the changes. Finally, at the very top is some sort of distribution system for getting changes from one place to another.

Really, only the first of these three layers is of particular interest to me, so the other two are done as simply as possible. At the database layer, darcs just has an ordered list of patches along with the patches themselves, each stored as an individual file. Darcs' distribution system is strongly inspired by that of arch. Like arch, darcs uses a dumb server, typically Apache or just a local or network file system, when pulling patches. Unlike arch, darcs can only use scp to write to a remote file system. The recommended method for submitting changes is to send patches through gpg-signed email messages, which has the advantage of being mostly asynchronous.

Keeping track of changes rather than versions

In the previous section, I explained revision control systems in terms of three layers. One can also look at them as having two distinct uses. One is to provide a history of previous versions. The other is to keep track of changes that are made to the repository, and to allow these changes to be merged and moved from one repository to another. These two uses are distinct, and almost orthogonal, in the sense that a tool can support one of the two uses optimally while providing no support for the other. Darcs is not intended to maintain a history of versions, although it is possible to kludge together such a revision history, either by making each new patch depend on all previous patches, or by tagging regularly. In a sense, this is what the tag feature is for, but the intention is that tagging will be used only to mark particularly notable versions (e.g. released versions, or perhaps versions that pass a time-consuming test suite).

Other revision control systems are centered upon the job of keeping track of a history of versions, with the ability to merge changes being added as it was seen that this would be desirable. But the fundamental object remained the versions themselves.

In such a system, a patch (I am using patch here to mean an encapsulated set of changes) is uniquely determined by two trees. Merging changes that are in two trees consists of finding a common parent tree, computing the diffs of each tree with their parent, and then cleverly combining those two diffs and applying the combined diff to the parent tree, possibly at some point in the process allowing human intervention, to allow for fixing up problems in the merge such as conflicts.

In the world of darcs, the source tree is not the fundamental object, but rather the patch is the fundamental object. Rather than a patch being defined in terms of the difference between two trees, a tree is defined as the result of applying a given set of patches to an empty tree. Moreover, these patches may be reordered (unless there are dependencies between the patches involved) without changing the tree. As a result, there is no need to find a common parent when performing a merge. Or, if you like, their common parent is defined by the set of common patches, and may not correspond to any version in the version history.
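
To make the "set of common patches" idea concrete, here is a minimal sketch in which a repository is reduced to a list of patch names. Everything in it is an assumption made for illustration: the function name `compare_repositories` and the patch names are hypothetical, and it ignores dependencies and commutation entirely.

```python
from typing import List, Tuple


def compare_repositories(ours: List[str], theirs: List[str]) -> Tuple[List[str], List[str]]:
    """Return (common, to_pull): patch names shared by both repositories,
    and names present only in `theirs`, each in its original order."""
    our_set, their_set = set(ours), set(theirs)
    common = [name for name in ours if name in their_set]
    to_pull = [name for name in theirs if name not in our_set]
    return common, to_pull


# Hypothetical patch names; the two repositories recorded them in different orders.
ours = ["add README", "fix typo", "add parser"]
theirs = ["add README", "speed up lexer", "fix typo"]
common, to_pull = compare_repositories(ours, theirs)
```

Here the common parent is the tree built from "add README" and "fix typo", even though the second repository never recorded exactly that state as a version of its own.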

One useful consequence of darcs' patch-oriented philosophy is that since a patch need not be uniquely defined by a pair of trees (old and new), we can have several ways of representing the same change, which differ only in how they commute and what the result of merging them is. Of course, creating such a patch will require some sort of user input. This is a Good Thing, since the user creating the patch should be the one forced to think about what they really want to change, rather than the user merging the patch. An example of this is the token replace patch (see Section A.5). This feature makes it possible to create a patch, for example, which changes every instance of the variable "stupidly_named_var" to "better_var_name", while leaving "other_stupidly_named_var" untouched. When this patch is merged with any other patch involving "stupidly_named_var", that instance will also be changed to "better_var_name". This is in contrast to a more conventional merging method, which would not only fail to change new instances of the variable, but would also produce conflicts when merging with any patch that modifies lines containing the variable. By using additional information about the programmer's intent, darcs is thus able to make changing a variable name the trivial task that it really is: a search and replace, modulo tokenizing the code appropriately.
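
The effect of a token replace can be sketched with a regular expression. This only illustrates the idea and is not how darcs implements it. The function name `replace_token` is hypothetical, the default set of token characters is an assumption, and the merge is imitated by applying the replacement to text that already contains the other party's new line.

```python
import re


def replace_token(text: str, old: str, new: str, token_chars: str = "A-Za-z_0-9") -> str:
    """Replace `old` with `new` only where `old` stands alone as a token,
    i.e. is not preceded or followed by another token character."""
    pattern = re.compile(
        rf"(?<![{token_chars}]){re.escape(old)}(?![{token_chars}])"
    )
    # A function replacement avoids backslash processing in `new`.
    return pattern.sub(lambda _match: new, text)


ours = "total = stupidly_named_var + other_stupidly_named_var\n"
# A line added in parallel by someone who still uses the old name.
theirs_added = "print(stupidly_named_var)\n"

merged = replace_token(ours + theirs_added, "stupidly_named_var", "better_var_name")
```

The newly introduced use of the old name is renamed along with the existing one, while "other_stupidly_named_var" is left alone because the characters around the match are themselves token characters.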

The patch formalism discussed in Appendix A is what makes darcs' approach possible. In order for a tree to consist of a set of patches, there must be a deterministic merge of any set of patches, regardless of the order in which they are merged. This requires that one be able to reorder patches. While I don't know that the patches are required to be invertible as well, my implementation certainly requires invertibility. In particular, invertibility is required to make use of Theorem 2, which is used extensively in the manipulation of merges.

Features

Record changes locally

In darcs, the equivalent of a CVS "commit" is called record, because it doesn't put the change into any remote or centralized repository. Changes are always recorded locally, meaning no network access is required in order to work on your project and record changes as you make them. Moreover, this means that there is no need for a separate "disconnected operation" mode.

Interactive records

You can choose to perform an interactive record, in which case darcs will prompt you for each change you have made and ask if you wish to record it. Of course, you can tell darcs to record all the changes in a given file, or to skip all the changes in a given file, or go back to a previous change, or whatever. There is also an experimental graphical interface, which allows you to view and choose changes even more easily, and in whichever order you like.

Unrecord local changes

As a corollary to the "local" nature of the record operation, if a change hasn't yet been published to the world--that is, if the local repository isn't accessible by others--you can safely unrecord a change (even if it wasn't the most recently recorded change) and then re-record it differently, for example if you forgot to add a file, introduced a bug, or realized that what you recorded as a single change was really two separate changes.

Interactive everything else

Most darcs commands support an interactive interface. The "revert" command, for example, which undoes unrecorded changes, has the same interface as record, so you can easily revert just a single change. Pull, push, send and apply all allow you to view and interactively select which changes you wish to pull, push, send or apply.

Test suites

Darcs has support for integrating a test suite with a repository. If you choose to use this, you can define a test command (e.g. "make check") and have darcs run that command on a clean copy of the project either prior to recording a change or prior to applying changes, and reject changes that cause the test to fail.
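
The general shape of such a test gate can be sketched as follows. This is an assumption-laden illustration rather than a description of darcs internals: `passes_test` and `record_if_tests_pass` are hypothetical names, and copying a directory stands in for whatever darcs does to obtain a clean copy of the project.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def passes_test(tree: str, test_command: str) -> bool:
    """Copy `tree` to a scratch directory, run `test_command` there, and
    report whether it exited with status 0."""
    with tempfile.TemporaryDirectory() as scratch:
        clean_copy = Path(scratch) / "clean"
        shutil.copytree(tree, clean_copy)
        result = subprocess.run(test_command, shell=True, cwd=clean_copy)
        return result.returncode == 0


def record_if_tests_pass(tree: str, test_command: str) -> str:
    # Hypothetical gate: a caller would only record (or apply) on "accepted".
    return "accepted" if passes_test(tree, test_command) else "rejected"
```

Running the command in a scratch copy means that files left behind by the test never end up in the working directory.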

Any old server

Darcs does not require a specialized server in order to make a repository available for read access. You can use http, ftp, or even just a plain old ssh server to access your darcs repository.

You decide write permissions

Darcs doesn't try to manage write access. That's your business. Supported push methods include direct ssh access (if you're willing to give direct ssh access away), using sudo to allow users who already have shell access to only apply changes to the repository, or verification of gpg-signed changes sent via email against a list of allowed keys. In addition, there is good support for submission of patches via email that are not automatically applied, but can easily be applied via a shell escape from a mail reader (this is how I deal with contributions to darcs).

Symmetric repositories

Every darcs repository is created equal (well, with the exception of a "partial" repository, which doesn't contain a full history...), and every working directory has an associated repository. As a result, there is a symmetry between "uploading" and "downloading" changes--you can use the same commands (push or pull) for either purpose.

CGI script

Darcs has a CGI script that allows browsing of the repositories.

Portable

Darcs runs on UNIX (or UNIX-like) systems (which includes MacOS X) as well as on Microsoft Windows.

File and directory moves

Renames and moves of files and directories are, of course, handled properly, so when you rename a file or move it to a different directory, its history is unbroken, and merges with repositories that don't have the file renamed will work as expected.

Token replace

You can use the "darcs replace" command to modify all occurrences of a particular token (defined by a configurable set of characters that are allowed in "tokens") in a file. This has the advantage that merges with changes that introduce new copies of the old token will have the effect of changing it to the new token--which comes in handy when changing a variable or function name that is used throughout a project.

Configurable defaults

You can easily configure the default flags passed to any command on a per-repository basis, a per-user basis, or a combination of the two.

Switching from CVS

[FIXME: sections in brackets in this file are notes to myself or explanatory notes indicating something that is incomplete. I must work more on this.]

[Note: this section is incomplete, but is intended to orient CVS users as to how darcs is different, and how to do with darcs what they would have done with CVS.]

Darcs is very different from CVS.

CVS divides its users into two categories: those who can commit and those who can't. For those who can't, CVS is just a way of getting the latest version. If they want to contribute to the project, they have to use a different tool (probably patch/diff). Darcs doesn't have this clear distinction between those who can commit and those who can't. With darcs, any contributor can take advantage of darcs to make changes and share those changes with others--either with a central repository, or simply with other users who might like to have those improvements. Since it is easy to apply a darcs patch from an email, and easy to use darcs to push patches via email, there is less need to give contributors write access to a centralized repository.

Switching from arch

[Note: this section is incomplete, but is intended to orient arch users as to how darcs is different, and how to do with darcs what they would have done with arch.]

Although arch, like darcs, is a distributed system, and the two systems have many similarities (both require no special server, for example), their essential organization is very different--perhaps more so than the differences between darcs and CVS. But hopefully the biggest difference that arch users will find is that darcs is much simpler and easier to use.

Like CVS, arch has a two-level system--there are repositories, and in order to modify a repository one must check out a working directory. This leads to "interesting" possibilities such as checking out a working directory from one repository and then committing to a different repository. On top of this, arch has a rather complex system for dealing with branches and versioning within each repository. Darcs uses a much simpler scheme, in which each working directory has an associated repository carrying just one branch. Every repository (and every working directory) is a branch.

Unlike darcs, arch is fully capable of running in a truly centralized manner, and when used in that manner (i.e. with only one repository) is roughly feature-equivalent (and complexity-equivalent?) with CVS.

When using arch in a distributed manner, each contributor creates a repository to store his or her modifications. Getting those modifications into a central repository is then a two-step process. First you do a commit to your repository, and then either you or someone with write permissions on the central repository runs a [I don't recall what command] to move the patchset from your repository to the central one. An analogous process is used in darcs. First you use "record" to record your changes locally (this is like committing to your local arch repository). Then either someone with write access pulls your changes into the central repository, or you use push to send your changes to it (or to its maintainer, if you don't have write access).

Isaac Jones 2004-04-12