(Debian) packaging and Git.
The big picture is as follows. In my view, the most natural way to
work on a packaging project in version control [1] is to have an
upstream branch which either tracks upstream Git/Hg/Svn or imports
tarballs (or some combination thereof), and a Debian branch where both
modifications to upstream source and commits to stuff in ./debian are
added [2]. Deviations from this are mainly motivated by a desire to
export source packages, a version-control-neutral interchange format
that still preserves the distinction between upstream source and
distro modifications. Of course, if you're happy with the distro
modifications as one big diff, then you can stop reading now; run

    gitpkg $debian_branch $upstream_branch

and you're done. The other easy case is if your changes don't touch
upstream; then 3.0 (quilt) packages work nicely with ./debian in a
separate tarball.
So the tension is between my preferred integration style and making
source packages with changes to upstream source organized in some
nice way, preferably in logical patches like, uh, commits in a
version control system. At some point we may be able to use some form
of version control repo as a source package, but the issues with that
are for another blog post. For the moment, then, we are stuck with
trying to bridge the gap between a git repository and a 3.0 (quilt)
source package. If you don't know the details of Debian packaging,
just imagine a patch series like you would generate with git
format-patch or apply with (surprise) quilt.
From Git to Quilt.
The most obvious (and the most common) way to bridge the gap between
git and quilt is to export patches manually (or using a helper like
gbp-pq) and commit them to the packaging repository. This has the
advantage of not forcing anyone to use git or specialized helpers to
collaborate on the package. On the other hand, it's quite far from the
vision of using git (or your favourite VCS) to do the integration that
I started with.
The next level of sophistication is to maintain a branch of
upstream-modifying commits. Roughly speaking, this is the approach
taken by git-dpm, by gitpkg, and (with some additional friction from
manually importing and exporting the patches) by gbp-pq. There are
some issues with rebasing a branch of patches: mainly, it seems to
rely on one person at a time working on the patch branch, and it
forces the use of specialized tools or workflows. Nonetheless, both
git-dpm and gitpkg support this mode of working reasonably well [3].
Lately I've been working on exporting patches from (an immutable) git
history. My initial experiments with marking commits with git notes
more or less worked [4]. I put this on the back-burner for two
reasons: first, sharing git notes is still not very well supported by
git itself [5]; second, gitpkg maintainer Ron Lee convinced me to
automagically pick out which patches to export. Ron's motivation (as I
understand it) is to have tools which work on any git repository
without extra metadata in the form of notes.
Linearizing History on the fly.
After a few iterations, I arrived at the following specification.
The user supplies two refs, upstream and head. upstream should be suitable for export as a .orig.tar.gz file [6], and it should be an ancestor of head. At source package build time, we want to construct a series of patches that
1. Is guaranteed to apply to upstream
2. Produces the same work tree as head, outside ./debian
3. Does not touch ./debian
4. As much as possible, matches the git history from upstream to head.
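Conditions (1) to (3) are mechanical enough to check. The following is a minimal Python sketch under a deliberately crude model that is an assumption of this example, not how git-debcherry works: a tree is a dict from path to whole-file contents, and a patch maps each path to an (old, new) pair, with None meaning the file is absent. The names check_series and apply_patch are hypothetical. Condition (4) is a judgement about history and is not checked.

```python
from typing import Dict, List, Optional, Tuple

# Assumed toy model: whole-file contents instead of line-based hunks.
Tree = Dict[str, str]
Patch = Dict[str, Tuple[Optional[str], Optional[str]]]  # path -> (old, new); None = absent


def in_debian(path: str) -> bool:
    return path == "debian" or path.startswith("debian/")


def outside_debian(tree: Tree) -> Tree:
    return {p: c for p, c in tree.items() if not in_debian(p)}


def apply_patch(tree: Tree, patch: Patch) -> Optional[Tree]:
    """Return the patched tree, or None if some file is not in the expected 'old' state."""
    result = dict(tree)
    for path, (old, new) in patch.items():
        if result.get(path) != old:
            return None
        if new is None:
            result.pop(path, None)
        else:
            result[path] = new
    return result


def check_series(upstream: Tree, head: Tree, series: List[Patch]) -> List[str]:
    """List violations of conditions (1) to (3) for a candidate patch series."""
    problems: List[str] = []
    tree = upstream
    for i, patch in enumerate(series, start=1):
        if any(in_debian(p) for p in patch):  # condition (3)
            problems.append(f"patch {i} touches ./debian")
        patched = apply_patch(tree, patch)
        if patched is None:  # condition (1)
            problems.append(f"patch {i} does not apply")
            return problems
        tree = patched
    if outside_debian(tree) != outside_debian(head):  # condition (2)
        problems.append("result differs from head outside ./debian")
    return problems
```

An empty result means the series satisfies (1) to (3) in this model; a real check would compare git trees rather than dicts.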
Condition (4) suggests we want something roughly like git
format-patch upstream..head, removing those patches which are
only about Debian packaging. Because of (3), we have to be a bit
careful about commits that touch both upstream and ./debian. We also
want to avoid outputting patches that have been applied (or worse,
partially applied) upstream. git patch-id can help identify
cherry-picked patches, but not partial application.
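The two filters in the paragraph above (drop packaging-only patches, drop patches already applied upstream) can be sketched as follows. approx_patch_id only imitates the idea behind git patch-id, hashing the changes while ignoring whitespace and line numbers; it does not reproduce git's actual IDs. select_patches and touched_paths are hypothetical names, and the input is assumed to be unified diffs as text. Like git patch-id, this cannot detect partial application.

```python
import hashlib
import re
from typing import List, Set, Tuple


def touched_paths(diff: str) -> Set[str]:
    """Paths named in '--- a/...' or '+++ b/...' headers (/dev/null is skipped)."""
    paths: Set[str] = set()
    for line in diff.splitlines():
        m = re.match(r"(?:--- a/|\+\+\+ b/)(.+)$", line)
        if m:
            paths.add(m.group(1))
    return paths


def approx_patch_id(diff: str) -> str:
    """Hash file headers and changed lines with all whitespace removed.

    Hunk headers (which carry line numbers), index lines and context lines are
    ignored, so the same change at a different position hashes identically.
    """
    h = hashlib.sha1()
    for line in diff.splitlines():
        if line.startswith(("diff --git", "index ", "@@")):
            continue
        if line.startswith(("+", "-")):
            h.update("".join(line.split()).encode())
            h.update(b"\n")
    return h.hexdigest()


def select_patches(candidates: List[Tuple[str, str]], upstream_diffs: List[str]) -> List[str]:
    """Names of candidate patches worth exporting, in their original order."""
    upstream_ids = {approx_patch_id(d) for d in upstream_diffs}
    kept: List[str] = []
    for name, diff in candidates:
        paths = touched_paths(diff)
        if paths and all(p == "debian" or p.startswith("debian/") for p in paths):
            continue  # pure packaging change: stays in ./debian, not in the patch series
        if approx_patch_id(diff) in upstream_ids:
            continue  # same change already upstream (cherry-picked, possibly moved)
        kept.append(name)
    return kept
```

Commits that touch both upstream files and ./debian are kept here as-is; they still need their ./debian part removed before export.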
Eventually I arrived at the following strategy.
1. Use git-filter-branch to construct a copy of the history upstream..head with ./debian (and, for technical reasons, .pc) excised.
2. Filter these commits to remove, e.g., those that are present exactly upstream, those that introduce no changes, or those whose changes are unrepresentable in a patch.
3. Try to revert the remaining commits, in reverse order. The idea here is twofold. First, if a patch occurs twice in history because of merging, only the most recent copy will revert (reverting the earlier copy then fails, since its changes are already gone), allowing earlier copies to be skipped. Second, the state of the temporary branch after all successful reverts represents the difference from upstream not accounted for by any patch.
4. Generate a "fixup patch" accounting for any remaining differences, to be applied before any of the "nice" patches.
5. Cherry-pick each "nice" patch on top of the fixup patch, to ensure we have a linear history that can be exported to quilt. If any of these cherry-picks fail, abort the export.
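To see how the five steps fit together, here is a toy Python model. It is not git-debcherry (a script driving git); it assumes whole-file trees (path to contents, None meaning absent), a history already flattened into a list of per-commit patches in topological order, and plain patch application standing in for revert and cherry-pick. The filters for commits present exactly upstream and for unrepresentable changes are omitted. All names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Tree = Dict[str, str]
Patch = Dict[str, Tuple[Optional[str], Optional[str]]]  # path -> (old, new); None = absent


def excised(path: str) -> bool:
    """True for paths under ./debian or .pc, which never go into the patch series."""
    return any(path == d or path.startswith(d + "/") for d in ("debian", ".pc"))


def apply(tree: Tree, patch: Patch) -> Optional[Tree]:
    out = dict(tree)
    for path, (old, new) in patch.items():
        if out.get(path) != old:
            return None  # precondition fails: "conflict"
        if new is None:
            out.pop(path, None)
        else:
            out[path] = new
    return out


def invert(patch: Patch) -> Patch:
    return {p: (new, old) for p, (old, new) in patch.items()}


def diff(a: Tree, b: Tree) -> Patch:
    return {p: (a.get(p), b.get(p)) for p in sorted(set(a) | set(b)) if a.get(p) != b.get(p)}


def debcherry_model(upstream: Tree, head: Tree, history: List[Tuple[str, Patch]]) -> List[Tuple[str, Patch]]:
    # Step 1: excise ./debian and .pc from every commit.
    commits = [(name, {p: c for p, c in patch.items() if not excised(p)}) for name, patch in history]
    # Step 2 (partial): drop commits that no longer change anything.
    commits = [(name, patch) for name, patch in commits if patch]
    # Step 3: revert newest-first; each commit that reverts cleanly is a "nice" patch.
    temp = {p: c for p, c in head.items() if not excised(p)}
    nice: List[Tuple[str, Patch]] = []
    for name, patch in reversed(commits):
        reverted = apply(temp, invert(patch))
        if reverted is not None:
            temp = reverted
            nice.insert(0, (name, patch))  # keep original (oldest-first) order
    # Step 4: the fixup patch is whatever still separates upstream from temp.
    base = {p: c for p, c in upstream.items() if not excised(p)}
    fixup = diff(base, temp)
    series: List[Tuple[str, Patch]] = [("fixup", fixup)] if fixup else []
    # Step 5: replay the nice patches on top of upstream + fixup (== temp).
    tree = temp
    for name, patch in nice:
        nxt = apply(tree, patch)
        if nxt is None:
            raise RuntimeError(f"cherry-pick of {name} failed; aborting export")
        tree = nxt
        series.append((name, patch))
    return series
```

With a history where the same change to a.c arrives twice (once directly, once via a merge), only the later copy survives step 3, and the earlier one is dropped. In this model the replay in step 5 cannot fail, since it mirrors the reverts exactly; in a real repository, cherry-picks onto the fixup commit can conflict, which is the case where the export is aborted.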
Yep, it seems over-complicated to me too.
TL;DR: Show me the code.
You can clone my current version from

    git://pivot.cs.unb.ca/gitpkg.git

This provides a script "git-debcherry" which does the history linearization discussed above. To test whether and how this works in your repository, you could run

    git-debcherry --stat $UPSTREAM

For actual use, you probably want to use something like

    git-debcherry -o debian/patches

There is a hook in hooks/debcherry-deb-export-hook that does this at source package export time.
I'm aware this is not that fast; it does several expensive operations. On the other hand, you know what Don Knuth says about premature optimization, so I'm more interested in reports of when it does and doesn't work. In addition to crashing, generating a multi-megabyte "fixup patch" probably counts as failure.
Notes
[1] This first part doesn't seem too Debian- or git-specific to me, but I don't know much concrete about other packaging workflows or other version control systems.
[2] Another variation is to have a patched upstream branch and merge that into the Debian packaging branch. The trade-off here is that you can simplify the patch export process a bit, but the repo needs to have taken this disciplined approach from the beginning.
[3] git-dpm merges the patched upstream into the Debian branch. This makes the history a bit messier, but seems to be more robust. I've been thinking about trying this out (semi-manually) for gitpkg.
[4] See e.g. exporting. At the time, though, I did not yet know the many surprising and horrible things people do in packaging histories, so it probably didn't work as well as I thought it did.
[5] It's doable, but one ends up spending a bunch of lines of code duplicating basic git functionality; e.g., there is no real support for tags of notes.
[6] Since, as far as I know, quilt has no way of deleting files except by listing their content, this means in particular that exporting upstream should yield a DFSG-free source tree.