racket (previously known as plt-scheme) is an
interpreter/JIT-compiler/development environment with about 6 years of
subversion history in a converted git repo. Debian packaging has been
done in subversion, with only the contents of ./debian in version
control. I wanted to merge these into a single git repository.
The first step is to create a repo and fetch the relevant history.
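TMPDIR=/var/tmp
export TMPDIR
# The authors file lives next to this script.
ME=$(readlink -f "$0")
AUTHORS=$(dirname "$ME")/authors
mkdir racket && cd racket && git init
# Upstream history, including the release tags.
git remote add racket git://git.racket-lang.org/plt
git fetch --tags racket
# Raise the rename detection limit for the project-wide rename.
git config merge.renameLimit 10000
# Debian packaging history from subversion.
git svn init --stdlayout svn://svn.debian.org/svn/pkg-plt-scheme/plt-scheme/
git svn fetch -A"$AUTHORS"
# Keep the imported packaging history on its own branch.
git branch debian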
A couple of points to note:
At some point there were huge numbers of renames when the project renamed itself, hence the setting for merge.renameLimit.
Note the use of an authors file to make sure the author names and emails are reasonable in the imported history.
git svn creates a branch master, which we will eventually forcibly overwrite; we stash that branch as debian for later use.
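For readers who have not used one before, here is a minimal sketch of what an authors file looks like and a quick sanity check on its format. The logins, names, and addresses are made up for illustration; the real file maps each subversion login that appears in the packaging history.

```shell
#!/bin/sh
set -e
# Hypothetical svn logins and identities; substitute the real committers.
authors=$(mktemp)
cat > "$authors" <<'EOF'
alice = Alice Example <alice@example.org>
bob = Bob Example <bob@example.org>
EOF
# Every line should look like: login = Full Name <email>
# grep -c -v counts the lines that do NOT match that shape.
bad=$(grep -c -v -E '^[^ =]+ = .+ <[^<> ]+@[^<> ]+>$' "$authors" || true)
echo "malformed lines: $bad"
```

A file like this is what gets passed to git svn fetch with -A, as in the script above.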
A couple of complications arose with upstream's git repo.
Upstream releases separate source tarballs for unix, mac, and windows. Each of these is constructed by deleting a large number of files from version control, and occasionally by some last-minute fiddling with README files and so on.
The history of the release tags is not completely linear. For example,
rocinante:~/projects/racket (git-svn)-[master]-% git diff --shortstat v4.2.4 `git merge-base v4.2.4 v5.0`
48 files changed, 242 insertions(+), 393 deletions(-)
rocinante:~/projects/racket (git-svn)-[master]-% git diff --shortstat v4.2.1 `git merge-base v4.2.1 v4.2.4`
76 files changed, 642 insertions(+), 1485 deletions(-)
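To see the same effect in miniature, the following sketch builds a throwaway repository in which a release tag sits on a side branch, so the older tag is not an ancestor of the newer one. The branch name, tag names, and file contents are all hypothetical; only the git commands themselves are the ones used above, plus git merge-base --is-ancestor as a yes/no check.

```shell
#!/bin/sh
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.org"
echo base > file
git add file
git commit -q -m 'common development'
trunk=$(git rev-parse --abbrev-ref HEAD)
# A release prepared on its own branch, with release-only changes.
git checkout -q -b release-1
echo 'release 1 fiddling' > README
git add README
git commit -q -m 'release-only change'
git tag v1
# Development continues on the trunk and is tagged later.
git checkout -q "$trunk"
echo more >> file
git commit -q -a -m 'further development'
git tag v2
if git merge-base --is-ancestor v1 v2; then
    linear=yes
else
    linear=no
fi
echo "v1 is an ancestor of v2: $linear"
# Non-empty output here means the tag carries changes the trunk never saw.
git diff --shortstat v1 "$(git merge-base v1 v2)"
```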
The combination made my straightforward attempt at constructing a history synced with the release tarballs generate many conflicts. I ended up importing each tarball on a temporary branch, and the merges went more smoothly. Note also the use of "git merge -s recursive -X theirs" to resolve conflicts in favour of the new upstream version.
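The effect of -X theirs is easiest to see on a toy conflict. This is a sketch in a scratch repository, with a hypothetical branch called tarball standing in for the temporary import branch; both sides change the same line, and the merge keeps the incoming side without stopping.

```shell
#!/bin/sh
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.org"
echo "version 1" > README
git add README
git commit -q -m 'initial import'
base=$(git rev-parse --abbrev-ref HEAD)
# Stand-in for the temporary tarball branch.
git checkout -q -b tarball
echo "version 2 from tarball" > README
git commit -q -a -m 'import tarball'
# A conflicting change on the branch we merge into.
git checkout -q "$base"
echo "version 1 with local edit" > README
git commit -q -a -m 'local change'
# Conflicting hunks are resolved in favour of the branch being merged.
git merge -q -s recursive -X theirs -m 'merge tarball' tarball
merged=$(cat README)
echo "$merged"
```

Note that -X theirs only settles conflicting hunks within files; conflicts about whether a file exists at all (renames, deletions) can still need manual attention.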
The repetitive bits of the merge are collected as shell functions.
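# Replace the working tree with the contents of tarball $1 and commit it.
import_tgz() {
    if [ -f "$1" ]; then
        git clean -fxd
        # Remove every tracked file so that deletions in the tarball are recorded.
        git ls-files -z | xargs -0 rm -f
        tar --strip-components=1 -zxvf "$1"
        git add -A
        git commit -m "Importing $(basename "$1")"
    else
        echo "missing tarball $1"
    fi
}
# Import the tarball on a temporary branch starting at the release tag,
# then merge that branch into upstream.
do_merge() {
    version=$1
    git checkout -b "v$version-tarball" "v$version"
    import_tgz "../plt-scheme_$version.orig.tar.gz"
    git checkout upstream
    git merge -s recursive -X theirs "v$version-tarball"
}
# Tag the result, record the tarball with pristine-tar, and drop the temporary branch.
post_merge() {
    version=$1
    git tag -f "upstream/$version"
    pristine-tar commit "../plt-scheme_$version.orig.tar.gz"
    git branch -d "v$version-tarball"
}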
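The unpacking step in import_tgz depends on tar's --strip-components=1 to drop the tarball's top-level directory, so that the files land directly in the working tree. A small sketch outside of git, using a made-up tarball name and contents:

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"
# Build a stand-in for an upstream tarball (hypothetical name and contents).
mkdir -p plt-1.0/collects
echo 'readme' > plt-1.0/README
echo 'code' > plt-1.0/collects/main.rkt
tar -zcf demo_1.0.orig.tar.gz plt-1.0
rm -r plt-1.0
# Unpack into a fresh tree, dropping the leading plt-1.0/ component.
mkdir tree
cd tree
tar --strip-components=1 -zxf ../demo_1.0.orig.tar.gz
extracted=$(find . -type f)
echo "$extracted"
```

Without the option, everything would end up under plt-1.0/ and git add -A would record the whole tree as new files in a subdirectory.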
The entire merge script is here. A typical step looks like
do_merge 5.0
git rm collects/tests/stepper/automatic-tests.ss
git add `git status -s | egrep ^UA | cut -f2 -d' '`
git checkout v5.0-tarball doc/release-notes/teachpack/HISTORY.txt
git rm readme.txt
git add collects/tests/web-server/info.rkt
git commit -m'Resolve conflicts from new upstream version 5.0'
post_merge 5.0
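The git status pipeline in that step picks out the paths that git reports as "UA" (unmerged, added by them). The sketch below runs the same filter on canned input so the behaviour is visible without a conflicted repository; the status lines and paths are invented for illustration.

```shell
#!/bin/sh
# Made-up `git status -s` output; the paths are hypothetical.
status='UA collects/a.rkt
UU doc/notes.txt
UA collects/b.rkt
D  readme.txt'
# Keep only the "UA" entries and print the path column.
paths=$(printf '%s\n' "$status" | grep -E '^UA' | cut -f2 -d' ')
echo "$paths"
```

Because cut splits on single spaces, this approach would truncate any path that itself contains a space.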
Finally, we have the comparatively easy task of merging the upstream
and Debian branches. In one or two places git was confused by all of
the copying and renaming of files and I had to manually fix things up
with git rm.
cd racket || /bin/true
set -e
git checkout debian
git tag -f packaging/4.0.1-2 `git svn find-rev r98`
git tag -f packaging/4.2.1-1 `git svn find-rev r113`
git tag -f packaging/4.2.4-2 `git svn find-rev r126`
git branch -f master upstream/4.0.1
git checkout master
git merge packaging/4.0.1-2
git tag -f debian/4.0.1-2
git merge upstream/4.2.1
git merge packaging/4.2.1-1
git tag -f debian/4.2.1-1
git merge upstream/4.2.4
git merge packaging/4.2.4-2
git rm collects/tests/stxclass/more-tests.ss && git commit -m'fix false rename detection'
git tag -f debian/4.2.4-2
git merge -s recursive -X theirs upstream/5.0
git rm collects/tests/web-server/info.rkt
git commit -m 'Merge upstream 5.0'
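Most of the script above repeats one pattern per Debian release: merge the upstream tag, merge the packaging tag, then tag the result. As a rough sketch of that pattern, the hypothetical helper plan_release below only prints the commands for a release rather than running them; it derives the upstream version by stripping the Debian revision. It does not cover the manual git rm fixes or the -X theirs merge used for 5.0.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (not run) the steps for one Debian release.
plan_release() {
    deb=$1            # Debian version, e.g. 4.2.4-2
    up=${deb%-*}      # strip the Debian revision, e.g. 4.2.4
    echo "git merge upstream/$up"
    echo "git merge packaging/$deb"
    echo "git tag -f debian/$deb"
}
plan=$(for v in 4.2.1-1 4.2.4-2; do plan_release "$v"; done)
echo "$plan"
```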