Make a directory lab2-scratch somewhere
outside the cs2613 git repository you created earlier.
Now move to the lab2-scratch directory, and make a clone of the
central repo.
$ git clone -b main https://$username@vcs.cs.unb.ca/git/cs2613-$username cs2613-clone
This creates a new directory "cs2613-clone" containing a clone
of your repository. Notice that in general this is a good way of
checking that your work is properly submitted. The TA and Prof will
do exactly this cloning step in order to mark your work. The clone
is on an equal footing with the original project, possessing its
own copy of the original project’s history.
Sharing changes with a central repo
Optional, but useful if you plan to run git on your own computer
Open a terminal
Navigate to the ~/lab2-scratch/cs2613-clone/journal directory.
Create a new blog entry, and commit it.
Push your changes back to the central repository.
$ git push origin main
Change directory to your original directory
~/cs2613. Bring in the changes you made with
$ git pull origin main
This merges the changes from the central copy of the "main" branch
into the current branch. If you made other changes in the
meantime, then you may need to manually fix any conflicts.
The "pull" command thus performs two operations: it fetches changes from
a remote branch, then merges them into the current branch.
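To see the two halves of "pull" separately, here is a minimal sketch that runs entirely in a temporary directory. The names central.git, alice and bob are made up for the example; a local bare repository stands in for the course server.

```shell
# Sketch only: every repository name below is a hypothetical stand-in.
set -e
scratch=$(mktemp -d)
cd "$scratch"

# A bare repository plays the role of the central repo.
git init -q --bare -b main central.git

# First working copy: commit once and push.
git init -q -b main alice
cd alice
git config user.name "Example Student"
git config user.email student@example.com
git remote add origin ../central.git
echo "first entry" > entry.txt
git add entry.txt
git commit -q -m "Add first journal entry"
git push -q origin main
cd ..

# Second working copy, cloned before the next change exists.
git clone -q -b main central.git bob

# A new commit appears in the central repo.
cd alice
echo "second entry" >> entry.txt
git commit -q -a -m "Add second journal entry"
git push -q origin main
cd ../bob

# "git pull origin main" is roughly these two steps:
git fetch -q origin main      # download the new commits
git merge -q origin/main      # merge them into the current branch

count=$(git rev-list --count HEAD)
echo "commits visible in bob: $count"
```

After the fetch alone, the working files in bob are unchanged; only the merge step updates them.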
Questions
Here are some questions we will be discussing at the beginning of L03.
What is a remote?
What is merging?
What is a conflict?
Git next steps
Congratulations, you now know enough git to finish this course.
There is lots more to learn, particularly about branching and
collaboration. If you want to get a big(ger) picture view, a good
place to start is
Git Concepts Simplified.
Make a directory labs/L03 inside your ~/cs2613 git repository.
All of your work from today should be committed in that directory
(and pushed before you leave).
The DrRacket stepper
Time
30 min
Activity
Individual work
copy arith.rkt, save it as
~/cs2613/labs/L03/arith.rkt and commit it.
run your arith.rkt in DrRacket; observe that the test fails
run your arith.rkt in the DrRacket Stepper. Notice that
after about 7 steps, we have reduced the test case to multiplication by zero which looks wrong. That means that the problem is
with the recursive (reduction) step.
Fix the a > 0 case of my-* to match the following
formula a * b = (a - 1) * b + b, while still using my-+
DrRacket should report "The test passed!" when you have it
working. Commit this version of your code. Remember that git
commit message quality counts in this course, so work on writing good
commit messages.
Semantics
Time
25 min
Activity
Small Groups
Summary
new evaluation rules for and and or
As you read in FICS unit
3,
we can understand evaluation ("running") of Racket programs as a
sequence of "reductions" or "substitutions". These rules are similar to the reduction steps in the DrRacket stepper.
The stepper uses the following rules for and and or
(notice that these rules enforce short circuit evaluation)
Following Exercise 7, write a new set of rules
that requires at least two arguments for and and or.
The rules are for human consumption; you can write them
as comments in DrRacket. You can write "exp1 exp2 ..." to mean at least 2 expressions.
Discuss your answers with your group, and try evaluating a couple of
small examples by hand using your rules.
Test Coverage
Time
25 min
Activity
Individual work
Unit testing is an important part of programming, and has inspired
something called
test driven development.
This activity continues arith.rkt from the first half of the lab.
Under
Language -> Choose Language -> Show Details -> Dynamic Properties
enable
⊙ Syntactic test suite coverage
comment out the given test
run your code in DrRacket again
Most likely you will have some code highlighted with orange text on a
black background. This means that code is not covered by your test
suite. Uncomment the given test, and if needed add tests to cover
each piece of uncovered code.
When you have complete test coverage, commit your code.
In this course, for all racket assignments you will lose marks if
you don't have complete test coverage.
See if you can come up with answers to the following questions for next time.
The programming languages we will study this term are all
dynamically typed. This means that not only the value but also the
type of variables can change at runtime. Why does this make testing
even more important?
What kind of software problems is testing not well suited to find?
Why might mutable state (e.g. instance variables in Java) make
writing unit tests harder?
Having a look at our test post from Lab 1, we can observe the
post template adds a bunch of stuff related to social media. Let's
suppose that for our cs2613 journal we want a more minimal
look.
Start by finding the right files to edit with
$ git grep disqus
git grep is a very useful (and
fast!) tool to find occurrences of strings in your git
repository. Notice in the output there are some files created by
frog; we will clean those up later. Find the template file (under
_src) and edit it to remove the undesired social media links. Check
the results with
$ raco frog -bp
You might notice one more link to twitter in a different template
file. Feel free to remove that one as well.
You are now ready to
commit. You can see what is about to be committed using git diff
with the --cached option:
$ git diff --cached
(Without --cached, git diff will show you any changes that you’ve made
but not yet added to the index.) You can also get a brief summary of the
situation with git status:
$ git status
# On branch master
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# modified: file1
# modified: file2
# modified: file3
#
It’s a good idea to
begin the commit message with a single short (less than 50 character)
line summarizing the change, followed by a blank line and then a more
thorough description. The text up to the first blank line in a commit
message is treated as the commit title, and that title is used
throughout Git.
If you need to make any further adjustments, do so now, and then add any
newly modified content to the index. Finally, commit your changes with:
$ git commit
This will again prompt you for a message describing the change, and then
record a new version of the project.
Alternatively, instead of running git add beforehand, you can use
$ git commit -a
which will automatically notice any modified (but not new) files, add
them to the index, and commit, all in one step. Keep in mind that you
will be marked on the logical structure of your git commits,
so you are better off using git add to explicitly choose what
changes to commit.
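The difference between staged and unstaged changes is easier to see in a small experiment. The following sketch uses a throwaway repository in a temporary directory; the file names and commit messages are invented for the example.

```shell
# Sketch only: a scratch repository with made-up file names.
set -e
scratch=$(mktemp -d)
cd "$scratch"
git init -q -b main demo
cd demo
git config user.name "Example Student"
git config user.email student@example.com

printf 'one\n' > file1
printf 'one\n' > file2
git add file1 file2
git commit -q -m "Add two example files"

# Change both files, but stage only file1.
printf 'two\n' >> file1
printf 'two\n' >> file2
git add file1

staged=$(git diff --cached --name-only)   # what the next commit will contain
unstaged=$(git diff --name-only)          # changes not yet added to the index
echo "staged:   $staged"
echo "unstaged: $unstaged"

# Plain "git commit" records only the staged change.
git commit -q -m "Extend file1 with a second line"
remaining=$(git status --porcelain)
echo "left over after commit: $remaining"
```

Had the last commit used git commit -a instead, the change to file2 would have been swept into the same commit, which is exactly what choosing files with git add avoids.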
Cleaning up generated files
Time
15 minutes
Activity
Individual work
Summary
Get some practice committing your changes to git.
A common phenomenon in software development is the existence of
generated files. These files are created by some tool, typically
based on some source files. In general it is a bad idea to track
generated files in version control because they introduce spurious
changes into history. We'll look at this more later, but for now
let's try to clean up. We can find out what files are generated
e.g. by consulting the
frog docs. Let's
first try a shortcut. Run
$ cd ~/cs2613/journal
$ raco frog --clean
To find out what changed, run
$ git diff --stat
All going well, you will see a bunch of deletions. We can tell git to
make those deletions permanent in several ways. It turns out that
there is a handy option to git add
that tells git to stage all of the changes in the
working tree.
Try to figure out which option it is.
To see the effect of a git add command, run git diff --cached --stat.
To undo the effect of a git add command, run git reset.
When you are satisfied with the changes, run git commit.
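If you want to convince yourself that git reset only unstages and does not touch your files, here is a small sketch in a scratch repository. The file name post.md is made up, and the example stages one named file rather than using the option you are asked to find.

```shell
# Sketch only: scratch repository, hypothetical file name.
set -e
scratch=$(mktemp -d)
cd "$scratch"
git init -q -b main demo
cd demo
git config user.name "Example Student"
git config user.email student@example.com

echo "line 1" > post.md
git add post.md
git commit -q -m "Add example post"

# Modify the file and stage the change.
echo "line 2" >> post.md
git add post.md
staged_before=$(git diff --cached --name-only)

# Undo the staging; the edit stays in the working tree.
git reset -q
staged_after=$(git diff --cached --name-only)
unstaged_after=$(git diff --name-only)

echo "staged before reset: $staged_before"
echo "staged after reset:  ${staged_after:-nothing}"
echo "still modified:      $unstaged_after"
```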
It will turn out that this is not all of the generated files; we can
use git rm to clean up further as we find more.
Make sure you run
$ raco frog -bp
to make sure the blog still works after your deletions.
Viewing project history
Time
15 minutes
Activity
Small group discussion, presenting work to the group, peer feedback
Summary
Reinforce idea of commit message quality
At any point you can view the history of your changes using
$ git log
Use this command to verify that all the changes you expected to be
pushed to the server really were.
If you also want to see complete diffs at each step, use
$ git log -p
Often the overview of the changes is useful to get a feel for each step
$ git log --stat --summary
In most projects, you have to share commit messages with (at least)
the same people who view your source code. Share your "best commit"
message with one or more of your neighbours.
Find something positive to say about the other commit messages you
are reading.
Find a constructive improvement with one of the other
messages. Don't be mean, people have varying levels of experience
with this.
Getting started with racket
Hello Racket World
Time
15 Minutes
Activity
Group walkthrough
Open a terminal
Make a directory ~/cs2613/labs/L02 (if it does not already exist) and change there with cd
Save the following code to ~/cs2613/labs/L02/hello.rkt
#lang htdp/bsl
"hello world"
(* 6 7)
Command Line
run the program with
$ racket hello.rkt
What is the difference between the first and second line of output?
Referring to the
DrRacket documentation
as needed, open and run the hello.rkt program from the previous
part.
What is the meaning of the first line of the file?
Racket Expressions.
Time
20 minutes
Activity
Small groups
Form groups of 2 or 3.
Work through Exercise 0: for each of the following lines of code, someone from the group should guess what is wrong before trying it in the Interactions window of DrRacket.
Work through Exercise
1. In
general where the text asks you to predict the value of an expression,
you can also enter into the definitions window something like
(check-expect (* 6 9) 42)
When you run (e.g. F5), then you will find out if your prediction
is correct.
Work through Exercise
2. You can test your answers by applying the following functions to appropriate arguments. Notice that the value of y does not matter.
#lang htdp/bsl
(define y 18)
(define (t1 x) (or (= x 0) (< 0 (/ y x))))
(define (t2 x) (or (< 0 (/ y x)) (= x 0)))
(define (t3 x) (and (= x 0) (< 0 (/ y x))))
(define (t4 x) (or (< 0 (/ y x)) (not (= x 0))))
Before every lab in this course, you will be given tasks to
complete. These will generally be easy tasks like watching videos, but
you need to complete them in order to keep up with the class.
Command Line Familiarity Check
In an FCS linux lab (remote or locally) log in, and open a
terminal.
Create a file in that directory using one of the available text editors
Now clean up, removing the file and the directory.
If any of this was new to you, then please take the time to go through parts 1 to 5 of the
Learning the Shell
Tutorial.
Read the course syllabus.
The Course Syllabus is available on line. Please
read it, and bring any questions you have about it to the first lab.
Background Reading
For every lab there will be some related reading. I'll point these out
as we go through the lab, but I'll also collect them at the start of
the lab in case you want to get a head start (or refer back to them
later).
Install frog and the other tools we need via the following command (throughout the course, $ at the beginning of a line will
indicate a shell prompt, you don't need to type it.):
$ raco pkg install --auto unb-cs2613
Install the python library pygments, used by frog for syntax highlighting
We'll be using frog to keep a journal of what we learn in
this course.
Open a terminal.
Make a directory called cs2613 that will keep all of your work
(note that case and spaces matter in Linux and this is not the same
as CS2613 or CS 2613). For the rest of this course we will assume
it is directly under your home directory. The shortcut ~/cs2613
will refer to this directory in the lab texts and in the shell.
Make a directory journal inside ~/cs2613. Inside ~/cs2613/journal, run
$ raco frog --init
Try viewing the newly created blog with
$ raco frog -bp
Start a new blog page for today's lab, and delete the fake
entry created by the frog setup command. Note that you may have to
refresh the browser after running raco frog -bp.
Setting up a git repo
Time
30 minutes
Activity
Individual work
Summary
This is where we create the git repository used for
the rest of the term to hand things in.
Change to ~/cs2613. This directory should have one
subdirectory called journal, which has the results of your
experiments above with frog.
Create the git repository
$ git init -b main
Git will reply with something like
Initialized empty Git repository in /home1/ugrads/$username/cs2613/.git/
You’ve now initialized the working directory — you may notice a new
directory created, named ".git". You should mentally replace
"$username" with whatever the login name is that you use to log
into the FCS linux machines.
Read the git-quickref page, and follow the initial configuration steps there.
Note that the "--wait" option for gedit is important here.
Next, tell Git to take a snapshot of the contents of all files under
the journal, with git add:
$ git add journal
Notes
Many revision control systems provide an add command that tells the
system to start tracking changes to a new file. Git’s add command does
something simpler and more powerful: git add is used both for new and
newly modified files, and in both cases it takes a snapshot of the given
files and stages that content in the index, ready for inclusion in the
next commit.
This snapshot is now stored in a temporary staging area which Git calls
the "index". You can permanently store the contents of the index in the
repository with git commit:
$ git commit
This will open an editor and prompt you for a commit message. Enter
one and exit the editor. You’ve now stored the first version of your
project in Git. See
§5.2 of Pro Git
for some hints about writing good commit messages. As mentioned in the
class git policy, you will be marked on your commit messages, so
you may as well get started with good habits.
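The "snapshot" behaviour of git add described in the notes above can be checked with a short experiment. This is a sketch in a scratch repository; the file name notes.txt and its contents are invented for the example.

```shell
# Sketch only: scratch repository, hypothetical file name.
set -e
scratch=$(mktemp -d)
cd "$scratch"
git init -q -b main demo
cd demo
git config user.name "Example Student"
git config user.email student@example.com

echo "draft 1" > notes.txt
git add notes.txt            # the snapshot is taken here
echo "draft 2" > notes.txt   # this later edit is not part of the snapshot

git commit -q -m "Add first draft of notes"

committed=$(git show HEAD:notes.txt)   # content stored in the commit
working=$(cat notes.txt)               # content in the working tree
echo "committed: $committed"
echo "working:   $working"
```

The commit contains the content as it was when git add ran, so the second draft needs another git add before it can be committed.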
Pushing to a central repo
Summary
Learn how to upload your work to a server
Time
20 minutes
Activity
Individual work
Notes
You absolutely have to understand this before continuing in the course,
since all marks in the course will be based on work pushed to the course git repos.
Since we are using the FCS git repositories there is an
existing repository for all students who registered early enough. If
it turns out there is no repository for you, you may need to do the
last step later.
First add the remote. This is something like a nickname for the URL where the
repo will be stored.
Each journal entry should be a minimum of 500 words and a (rough)
maximum of 1000.
You can think about your journal entry as a set of notes for a
friend who missed this particular lab.
Your journal entry should answer the following questions
What new concepts (if any) did you learn about in this lab?
What concepts are familiar from other courses or from your own
knowledge?
What new skills did you practice?
What specific details did you find surprising, interesting,
confusing, difficult, or otherwise important?
What explicit tasks (e.g. reading) were you given during this lab?
You are encouraged to link to other pages both inside and outside
UNB in your journal entries.
Presentation
Imagine a future employer reading your journal right before
interviewing you. Write so that the person interviewing will think
of you as a peer, rather than as an "annoying kid".
Your journal entry should use good spelling and grammar, including
complete sentences and paragraphs.
A certain amount of point form is OK, but don't rely on it exclusively.
Try to keep a neutral tone. It's fine to record positive or negative
opinions, but avoid ranting (or gushing).
This journal entry must be in the standard directory, and
must be named <date>-<title>.md or <date>-<title>.scrbl where <date> is the
date of the corresponding lab. An easy way to ensure this is to
create the journal entry during the lab.
Make sure you preview your journal entry to avoid obvious mistakes.
You will lose marks for syntax errors.
A good reference for git (other than the man pages) is
The Pro Git Book.
Initial configuration
It is a good idea to introduce yourself to Git with your name and
email address before doing any operation. The easiest way to do so is:
$ git config user.name "Your Name Comes Here"
$ git config user.email you@yourdomain.example.com
You may also want to configure an editor to use with git. The default
on fcs-cs2613-dev is
nano; most other
places it is vim. Both are very fast to start up,
but completely keyboard driven. If that doesn't suit you, you can
configure the editor via
$ git config core.editor <something>
The options for <something> include
nano
vim
emacs
"gedit --wait"
You can optionally use git config --global, but note this probably
won't work in an FCS VM.
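To check that the settings took effect, you can read them back. The sketch below does this in a scratch repository so it does not disturb your real configuration; the name, email and editor values are just the placeholders used above.

```shell
# Sketch only: placeholder identity, scratch repository.
set -e
scratch=$(mktemp -d)
cd "$scratch"
git init -q -b main demo
cd demo

git config user.name "Your Name Comes Here"
git config user.email you@yourdomain.example.com
git config core.editor "gedit --wait"

# Read single values back.
name=$(git config --get user.name)
editor=$(git config --get core.editor)
echo "name:   $name"
echo "editor: $editor"

# List what is stored in this repository's own .git/config.
git config --local --list | grep -E '^(user\.|core\.editor)'
```

Without --global the values are stored per repository, so they have to be set again in each new clone.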
A tutorial on git will be offered in the first two labs of the
course (L01 and L02). Like all material presented
in the labs, you are responsible for this material whether you
attend or not. This includes people who register late for the course.
Some hints about using git are available, including pointers to
more documentation.
Technical difficulties with git will not be considered a valid
excuse for late or missed work unless they affect the entire
class (e.g. server downtime).
You will be marked on the quality of your git repository.
Access to the FCS git repos is available from
all machines in FCS Linux Labs. Be aware of
scheduled use of these labs when planning to work on or hand in
coursework.
Remote access (either via lab machines or directly) to FCS git
repos is available if you are connected to the
UNB VPN.
The big picture is as follows. In my view, the most natural way to
work on a packaging project in version control [1] is to have an
upstream branch which either tracks upstream Git/Hg/Svn, or imports of
tarballs (or some combination thereof), and a Debian branch where both
modifications to upstream source and commits to stuff in ./debian are
added [2]. Deviations from this are mainly motivated by a desire to
export source packages, a version control neutral interchange format
that still preserves the distinction between upstream source and
distro modifications. Of course, if you're happy with the distro
modifications as one big diff, then you can stop reading now: run gitpkg
$debian_branch $upstream_branch and you're done. The other easy case
is if your changes don't touch upstream; then 3.0 (quilt) packages
work nicely with ./debian in a separate tarball.
So the tension is between my preferred integration style, and making
source packages with changes to upstream source organized in some
nice way, preferably in logical patches like, uh, commits in a
version control system. At some point we may be able to use some form of
version control repo as a source package, but the issues with that are
for another blog post. At the moment then we are stuck with
trying to bridge the gap between a git repository and a 3.0 (quilt)
source package. If you don't know the details of Debian packaging,
just imagine a patch series like you would generate with git
format-patch or apply with (surprise) quilt.
From Git to Quilt.
The most obvious (and the most common) way to bridge the gap between
git and quilt is to export patches manually (or using a helper like
gbp-pq) and commit them to the packaging repository. This has the
advantage of not forcing anyone to use git or specialized helpers to
collaborate on the package. On the other hand it's quite far from the
vision of using git (or your favourite VCS) to do the integration that
I started with.
The next level of sophistication is to maintain a branch of
upstream-modifying commits. Roughly speaking, this is the approach
taken by git-dpm, by gitpkg, and with some additional friction
from manually importing and exporting the patches, by gbp-pq. There
are some issues with rebasing a branch of patches, mainly it seems to
rely on one person at a time working on the patch branch, and it
forces the use of specialized tools or workflows. Nonetheless, both
git-dpm and gitpkg support this mode of working reasonably well [3].
Lately I've been working on exporting patches from (an immutable) git
history. My initial experiments with marking commits with git notes
more or less worked [4]. I put this on the back-burner for two
reasons, first sharing git notes is still not very well supported by
git itself [5], and second Gitpkg maintainer Ron Lee convinced me to
automagically pick out what patches to export. Ron's motivation (as I
understand it) is to have tools which work on any git repository
without extra metadata in the form of notes.
Linearizing History on the fly.
After a few iterations, I arrived at the following specification.
The user supplies two refs upstream and head. upstream should
be suitable for export as a .orig.tar.gz file [6], and it should
be an ancestor of head.
At source package build time, we want to construct a series of
patches that
Is guaranteed to apply to upstream
Produces the same work tree as head, outside ./debian
Does not touch ./debian
As much as possible, matches the git history from upstream to head.
Condition (4) suggests we want something roughly like git
format-patch upstream..head, removing those patches which are
only about Debian packaging. Because of (3), we have to be a bit
careful about commits that touch both upstream and ./debian. We also
want to avoid outputting patches that have been applied (or worse
partially applied) upstream. git patch-id can help identify
cherry-picked patches, but not partial application.
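To make the patch-id remark concrete, here is a small sketch (not part of git-debcherry) showing that a cherry-picked commit gets a new commit hash but keeps the same patch id. The repository, branch and file names are invented for the example.

```shell
# Sketch only: scratch repository with made-up branch and file names.
set -e
scratch=$(mktemp -d)
cd "$scratch"
git init -q -b main demo
cd demo
git config user.name "Example Packager"
git config user.email packager@example.com

echo base > a.txt
git add a.txt
git commit -q -m "Base commit"

# A fix is made on a side branch...
git checkout -q -b side
echo fix >> a.txt
git commit -q -a -m "Fix a.txt"
fix=$(git rev-parse HEAD)

# ...and cherry-picked onto main after main has moved on.
git checkout -q main
echo other > b.txt
git add b.txt
git commit -q -m "Add b.txt"
git cherry-pick "$fix" > /dev/null
picked=$(git rev-parse HEAD)

# git patch-id prints "<patch id> <commit id>"; keep the first field.
id_fix=$(git show "$fix" | git patch-id | cut -d' ' -f1)
id_picked=$(git show "$picked" | git patch-id | cut -d' ' -f1)

echo "commit ids differ: $fix vs $picked"
echo "patch ids:         $id_fix vs $id_picked"
```

If upstream had applied only part of the fix, the diffs would differ and so would the patch ids, which is the partial-application case mentioned above.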
Eventually I arrived at the following strategy.
Use git-filter-branch to construct a copy of the history
upstream..head with ./debian (and for technical reasons .pc)
excised.
Filter these commits to remove e.g. those that are present
exactly upstream, or those that introduce no changes, or changes
unrepresentable in a patch.
Try to revert the remaining commits, in reverse order. The idea
here is twofold. First, a patch that occurs twice in history
because of merging will only revert the most recent one, allowing
earlier copies to be skipped. Second, the state of the temporary
branch after all successful reverts represents the difference
from upstream not accounted for by any patch.
Generate a "fixup patch" accounting for any remaining
differences, to be applied before any of the "nice" patches.
Cherry-pick each "nice" patch on top of the fixup patch, to
ensure we have a linear history that can be exported to quilt. If
any of these cherry-picks fail, abort the export.
Yep, it seems over-complicated to me too.
TL;DR: Show me the code.
You can clone my current version from
git://pivot.cs.unb.ca/gitpkg.git
This provides a script "git-debcherry" which does the history
linearization discussed above. In order to test out how/if this works
in your repository, you could run
git-debcherry --stat $UPSTREAM
For actual use, you probably want to use something like
git-debcherry -o debian/patches
There is a hook in hooks/debcherry-deb-export-hook that does this at
source package export time.
I'm aware this is not that fast; it does several expensive
operations. On the other hand, you know what Don Knuth says about
premature optimization, so I'm more interested in reports of when it
does and doesn't work. In addition to crashing, generating a
multi-megabyte "fixup patch" probably counts as failure.
Notes
This first part doesn't seem too Debian or git specific to me, but
I don't know much concrete about other packaging workflows or other
version control systems.
Another variation is to have a patched upstream branch and merge
that into the Debian packaging branch. The trade-off here is that you can
simplify the patch export process a bit, but the repo needs to have
taken this disciplined approach from the beginning.
git-dpm merges the patched upstream into the Debian branch. This
makes the history a bit messier, but seems to be more robust. I've
been thinking about trying this out (semi-manually) for gitpkg.
See e.g. exporting. Although I did not then
know the many surprising and horrible things people do in packaging
histories, so it probably didn't work as well as I thought it did.
It's doable, but one ends up spending a bunch of lines of code
on duplicating basic git functionality; e.g. there is no real support
for tags of notes.
Since as far as I know quilt has no way of deleting files except to
list the content, this means in particular exporting upstream should yield a
DFSG Free source tree.
I've been experimenting with a new packaging tool/workflow based on
marking certain commits on my integration branch for export as quilt
patches. In this post I'll walk though converting the package nauty to
this workflow.
Add a control file for the gitpkg export hook, and enable the hook:
(the package is already 3.0 (quilt))
More conventional git-buildpackage style packaging would not need this step.
Import the patches. If everything is perfect, you can use git
quiltimport, but I have several patches not listed in "series", and
quiltimport ignores series, so I have to do things by hand.
% git am /tmp/nauty/debian/patches/feature/shlib.diff
The first line is the subject line of the patch, followed by any
notes from debpatch (in this case, just 'Export: true'), followed
by a diffstat. If more patches were marked, this would be repeated
for each one.
In this case I notice the subject line is kind of cryptic and decide to amend.
git commit --amend
git debpatch list still shows the same thing, which highlights a
fundamental aspect of git notes: they attach to commits. And I just
made a new commit, so
git debpatch -export afb2c20
git debpatch +export HEAD
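The same effect can be reproduced with plain git notes, without debpatch. The following is only a sketch: it uses the default notes ref rather than whatever ref debpatch uses, and it assumes default settings (in particular, no notes.rewriteRef configured, which would copy notes on amend).

```shell
# Sketch only: plain "git notes" in a scratch repository, default config assumed.
set -e
scratch=$(mktemp -d)
cd "$scratch"
git init -q -b main demo
cd demo
git config user.name "Example Packager"
git config user.email packager@example.com

echo "change" > Makefile.in
git add Makefile.in
git commit -q -m "Cryptic subject"
old=$(git rev-parse HEAD)

# Attach a note to the commit.
git notes add -m "Export: true" HEAD

# Amending creates a new commit object; the note stays on the old one.
git commit -q --amend -m "Support building a shared library"
new=$(git rev-parse HEAD)

note_old=$(git notes show "$old")
if git notes show HEAD > /dev/null 2>&1; then
    note_new=present
else
    note_new=missing
fi

echo "old commit $old has note: $note_old"
echo "new commit $new note is:  $note_new"
```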
Now git debpatch list looks ok, so we try git debpatch export as
a dry run. In debian/patches we have
0001-makefile.in-Support-building-a-shared-library-and-st.patch
series
That looks good. Now we are not going to commit this, since one of
our overall goals is to avoid committing patches.
To clean up the export, rm -rf debian/patches
gitpkg master exports a source package, and because I enabled the
appropriate hook, I have the following
Example package: bibutils In this package, I was already maintaining the upstream patches merged into my master branch; I retroactively added the quilt export.
As of version 0.17, gitpkg ships with a hook called
quilt-patches-deb-export-hook. This can be used to export patches from
git at the time of creating the source package.
This is controlled by a file debian/source/git-patches.
Each line contains a range suitable for passing to git-format-patch(1).
The variables UPSTREAM_VERSION and DEB_VERSION are replaced with
values taken from debian/changelog. Note that $UPSTREAM_VERSION is
the first part of $DEB_VERSION
This tells gitpkg to export the given two ranges of commits to
debian/patches while generating the source package. Each commit
becomes a patch in debian/patches, with names generated from the
commit messages. In this example, we get 5 patches from the two ranges.
Thanks to the wonders of 3.0 (quilt) packages, these are applied
when the source package is unpacked.
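For readers who have not used git format-patch with a range, here is a sketch of what the hook does for one range, run by hand in a scratch repository. The tag name upstream/1.0, the file name and the commit messages are made up; the real hook reads its ranges from debian/source/git-patches.

```shell
# Sketch only: hypothetical tag, file and commit names.
set -e
scratch=$(mktemp -d)
cd "$scratch"
git init -q -b main demo
cd demo
git config user.name "Example Packager"
git config user.email packager@example.com

echo "upstream source" > src.c
git add src.c
git commit -q -m "Import upstream"
git tag upstream/1.0

echo "fix 1" >> src.c
git commit -q -a -m "Fix typo in usage message"
echo "fix 2" >> src.c
git commit -q -a -m "Support building a shared library"

# One patch per commit in the range, named after the commit subject.
mkdir -p debian/patches
git format-patch -q -o debian/patches upstream/1.0..HEAD

patches=$(ls debian/patches)
echo "$patches"
```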
Caveats.
Current lintian
complains bitterly about debian/source/git-patches. This should be fixed
with the next upload.
It's a bit dangerous if you check out such a package from git, don't
read any of the documentation, and build with debuild or something
similar, since you won't get the patches applied. There is a
proposed
check
that catches most of such booboos. You could also cause the build to
fail if the same error is detected; this is a matter of personal taste
I guess.
I recently decided to try maintaining a Debian package (bibutils)
without committing any patches to Git. One of the disadvantages of
this approach is that the patches for upstream are not nicely sorted
out in ./debian/patches. I decided to write a little tool to sort out
which commits should be sent to upstream. I'm not too happy about the
length of it, or the name "git-classify", but I'm posting in case
someone has some suggestions. Or maybe somebody finds this useful.
#!/usr/bin/perl

use strict;

my $upstreamonly=0;

if ($ARGV[0] eq "-u"){
  $upstreamonly=1;
  shift(@ARGV);
}

open(GIT,"git log -z --format=\"%n%x00%H\" --name-only @ARGV|");

# throw away blank line at the beginning.
$_=<GIT>;

my $sha="";

LINE: while(<GIT>){
  chomp();
  next LINE if (m/^\s*$/);

  if (m/^\x0([0-9a-fA-F]+)/){
    $sha=$1;
  } else {
    my $debian=0;
    my $upstream=0;

    foreach my $word (split("\x00",$_)) {
      if ($word=~m@^debian/@) {
        $debian++;
      } elsif (length($word)>0) {
        $upstream++;
      }
    }

    if (!$upstreamonly){
      print "$sha\t";
      print "MIXED" if ($upstream>0 && $debian>0);
      print "upstream" if ($upstream>0 && $debian==0);
      print "debian" if ($upstream==0 && $debian>0);
      print "\n";
    } else {
      print "$sha\n" if ($upstream>0 && $debian==0);
    }
  }
}

=pod

=head1 Name

git-classify - Classify commits as upstream, debian, or MIXED

=head1 Synopsis

=over

=item B<git classify> [I<-u>] [I<arguments for git-log>]

=back

=head1 Description

Classify a range of commits (specified as for git-log) as I<upstream>
(touching only files outside ./debian), I<debian> (touching files only
inside ./debian) or I<MIXED>. Presumably these last kind are to be
discouraged.

=head2 Options

=over

=item B<-u> output only the SHA1 hashes of upstream commits (as defined above).

=back

=head1 Examples

Generate all likely patches to send upstream

    git classify -u $SHA..HEAD | xargs -L1 git format-patch -1
racket (previously known as plt-scheme) is an
interpreter/JIT-compiler/development environment with about 6 years of
subversion history in a converted git repo. Debian packaging has been
done in subversion, with only the contents of ./debian in version
control. I wanted to merge these into a single git repository.
The first step is to create a repo and fetch the relevant history.
At some point there were huge numbers of renames when the project renamed itself, hence the setting for merge.renameLimit
Note the use of an authors file to make sure the author names and emails are reasonable in the imported history.
git svn creates a branch master, which we will eventually forcibly overwrite; we stash that branch as debian for later use.
Now a couple of complications arose about upstream's git repo.
Upstream releases separate source tarballs for unix, mac, and windows. Each of these is constructed by deleting a large number of files from version control, and
occasionally some last minute fiddling with README files and so on.
The history of the release tags is not completely linear. For example,
The combination made my straightforward attempt at constructing a
history synched with release tarballs generate many conflicts. I
ended up importing each tarball on a temporary branch, and the merges
went smoother. Note also the use of "git merge -s recursive -X
theirs" to resolve conflicts in favour of the new upstream version.
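The general shape of such a step (import a tarball on a temporary branch, then merge it with conflicts resolved in favour of the new version) might look like the following. This is only a sketch under assumptions: it is not the author's do_merge or post_merge, and the function names, the tarball path, the base tag v5.0 and the target branch tarball-history are all made up for illustration. Only the vVERSION-tarball naming follows the text. With DRY_RUN set, the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch only. The author's real do_merge and post_merge functions live in
# the linked merge script and may differ; every name below is an assumption.

# Print a command, and execute it unless DRY_RUN is set to a non-empty value.
run() {
    echo "+ $*"
    [ -n "$DRY_RUN" ] || "$@"
}

# import_tarball VERSION TARBALL BASE
# Commit the contents of TARBALL on a temporary branch vVERSION-tarball
# that starts at BASE (for instance upstream's release tag).
import_tarball() {
    run git checkout -b "v$1-tarball" "$3" || return 1
    run git rm -r -q --ignore-unmatch . || return 1
    run tar -xzf "$2" --strip-components=1 || return 1
    run git add -A || return 1
    run git commit -q -m "Import upstream tarball $1"
}

# merge_tarball VERSION TARGET
# Merge the temporary branch into TARGET, resolving conflicts in favour of
# the tarball's side of the merge.
merge_tarball() {
    run git checkout "$2" || return 1
    run git merge -s recursive -X theirs "v$1-tarball"
}

# Dry run: only print the commands that would be executed.
DRY_RUN=1
import_tarball 5.0 ../example-5.0.tgz v5.0
merge_tarball 5.0 tarball-history
```

Any conflicts that "-X theirs" cannot settle (deleted or renamed files, for instance) still need manual attention, as the typical step in the text shows.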
The repetitive bits of the merge are collected as shell functions.
The entire merge script is here. A typical step looks like
do_merge 5.0
git rm collects/tests/stepper/automatic-tests.ss
git add `git status -s | egrep ^UA | cut -f2 -d' '`
git checkout v5.0-tarball doc/release-notes/teachpack/HISTORY.txt
git rm readme.txt
git add collects/tests/web-server/info.rkt
git commit -m'Resolve conflicts from new upstream version 5.0'
post_merge 5.0
Finally, we have the comparatively easy task of merging the upstream
and Debian branches. In one or two places git was confused by all of
the copying and renaming of files and I had to manually fix things up
with git rm.
I'm thinking about distributed issue tracking systems that play nice
with git. I don't care about other version control systems anymore :).
I also prefer command line interfaces, because as commentators on the
blog have mentioned, I'm a Luddite (in the imprecise, slang sense).
So far I have found a few projects, and tried to guess how much of a
going concern they are.
Git Specific
ticgit I don't know if
this is github at its best or worst, but the original project seems
dormant and there are several forks. According to the original author,
this one is probably the
best.
git-issues Originally a
rewrite of ticgit in python, it now claims to be defunct.
VCS Agnostic
ditz Despite my not caring about other
VCSs, ditz is VCS agnostic, just making files. Seems active.
cil takes a similar approach to
ditz, is written in Perl rather than Ruby, and should release again
any day now (hint, hint).
You have a gitolite install on
host $MASTER, and you want a mirror on $SLAVE. Here is one way to do
that. $CLIENT is your workstation, which need not be the same as
$MASTER or $SLAVE.
On $CLIENT, install gitolite on $SLAVE. It is ok to re-use your
gitolite admin key here, but make sure you have both public and
private key in .ssh, or confusion ensues. Note that when gitolite
asks you to double check the "host gitolite" ssh stanza, you probably
want to change hostname to $SLAVE, at least temporarily (if not, at
least the checkout of the gitolite-admin repo will fail). You may want
to copy .gitolite.rc from $MASTER when gitolite fires up an editor.
On $CLIENT, copy the "gitolite" stanza of .ssh/config to
a new stanza called e.g. gitolite-slave.
Fix the hostname of the gitolite stanza so it points to $MASTER again.
On $MASTER, as the gitolite user, make a passphraseless ssh key. You
should probably call it something like 'mirror'.
Still on $MASTER. Add a stanza like the following to $gitolite_user/.ssh/config
to the bottom of your gitolite.conf
Add mirror.pub to keydir.
Now overwrite the gitolite-admin repo on $SLAVE
git push -f
Note that empty repos will be created on $SLAVE for every
repo on $MASTER.
Add the following one line post-update hook to any repos you want
mirrored (see the gitolite documentation for how to automate this).
You should not modify the post-update hook of the gitolite-admin
repo.
git push --mirror gitolite-mirror:$GL_REPO.git
Create repos as per normal in the gitolite-admin/conf/gitolite.conf.
If you have set the auto post-update hook installation, then each repo
will be mirrored. You should only push to $MASTER; any changes
pushed to $SLAVE will be overwritten.
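The two pieces of configuration that live on $MASTER, the ssh host stanza and the post-update hook, can be sketched as shell functions that write them to files. This is an illustration under assumptions rather than the exact setup from the steps above: the host alias gitolite-mirror is taken from the hook line, the key name mirror from the key step, and the remote user name git is a guess that depends on how gitolite was installed on $SLAVE. The function names and slave.example.org are hypothetical.

```shell
#!/bin/sh
# Sketch under assumptions: host alias "gitolite-mirror" (from the hook),
# key file "mirror", and remote user "git" (a guess; adjust as needed).
# The passphraseless key itself could be made with, for example:
#   ssh-keygen -N '' -f ~/.ssh/mirror

# write_mirror_stanza CONFIG_FILE SLAVE_HOST: append an ssh host stanza.
write_mirror_stanza() {
    cat >> "$1" <<EOF
Host gitolite-mirror
    HostName $2
    User git
    IdentityFile ~/.ssh/mirror
EOF
}

# write_mirror_hook HOOK_FILE: install the one line post-update hook.
write_mirror_hook() {
    cat > "$1" <<'EOF'
#!/bin/sh
git push --mirror gitolite-mirror:$GL_REPO.git
EOF
    chmod +x "$1"
}

# Example run against scratch files rather than a real gitolite account.
scratch=$(mktemp -d)
write_mirror_stanza "$scratch/config" slave.example.org
write_mirror_hook "$scratch/post-update"
cat "$scratch/config"
```

On a real install the first function would be pointed at the gitolite user's .ssh/config and the second at the hooks directory of each repo to be mirrored.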
I wanted to see how bundles worked, and if there was some potential
for speedup versus mr. The following unoptimized script is about
twice as fast as mr in updating 10 repos. Of course it is not really
doing exactly the right thing (since it only looks at HEAD),
but it is a start maybe. Of course, maybe the performance difference
has nothing to do with bundles. Anyway IPC::PerlSSH is nifty.
#!/usr/bin/perl

use strict;
use File::Slurp;
use IPC::PerlSSH;
use Git;

my %config;
eval(read_file('config.pl'));
die $@ if $@;

my $ips= IPC::PerlSSH->new(Host=>$config{host});

$ips->eval("use Git; use File::Temp qw (tempdir); use File::Slurp;");
$ips->eval('${main::tempdir}=tempdir();');

$ips->store("bundle",
            q{my $prefix=shift;
              my $name=shift;
              my $ref=shift;
              chomp($ref);
              my $repo=Git->repository($prefix.$name.'.git');
              my $bfile="${main::tempdir}/${name}.bundle";
              eval {$repo->command('bundle','create',$bfile,$ref.'..HEAD'); 1}
                or do { return undef };
              my $bits=read_file($bfile);
              print STDERR ("got ",length($bits),"\n");
              return $bits;
});

foreach my $pair (@{$config{repos}}){
  my ($local,$remote)=@{$pair};
  my $bname=$local.'.bundle';
  $bname=~ s|/|_|;
  $bname=~ s|^\.|@|;
  my $repo=Git->repository($config{localprefix}.$local);
  # force some commit to be bundled, just for testing
  my $head=$repo->command('rev-list','--max-count=1', 'origin/HEAD^');
  my $bits=$ips->call('bundle',$config{remoteprefix},$remote,$head);
  write_file($bname, {binmode => ':raw'}, \$bits);
  $repo->command_noisy('fetch',$ENV{PWD}.'/'.$bname,'HEAD');
}
So I have been getting used to madduck's
workflow
for topgit and debian packaging, and one thing that bugged me a bit
was all the steps required to build. I tend to build quite a lot
when debugging, so I wrote up a quick and dirty script to
export a copy of the master branch somewhere
export the patches from topgit
invoke debuild
I don't claim this is anywhere near production quality, but maybe it helps someone.
Assumptions (that I remember)
you use the workflow above
you use pristine tar for your original tarballs
you invoke the script (I call it tg-debuild) from somewhere in your work tree
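The three steps can be sketched roughly as follows. This is not the tg-debuild script itself, only an outline under assumptions: the function name, the build directory layout and the package/version arguments are hypothetical, the current branch is assumed to be a TopGit branch whose dependencies are the patches to export, and debian/patches is assumed not to exist yet in the exported tree. With DRY_RUN set, the commands are only printed.

```shell
#!/bin/sh
# Sketch only, not the author's tg-debuild. Package name, version and build
# directory are illustrative placeholders.

# Print a command, and execute it unless DRY_RUN is set to a non-empty value.
run() {
    echo "+ $*"
    [ -n "$DRY_RUN" ] || "$@"
}

# tg_debuild_sketch PACKAGE VERSION BUILDDIR
tg_debuild_sketch() {
    package=$1
    version=$2
    builddir=$3
    src="$builddir/$package-$version"

    # 1. export a copy of the master branch
    run mkdir -p "$src" || return 1
    run sh -c 'git archive master | tar -x -C "$1"' sh "$src" || return 1

    # 2. export the patches from topgit as a quilt series
    run tg export --quilt "$src/debian/patches" || return 1

    # 3. regenerate the original tarball with pristine-tar, invoke debuild
    run pristine-tar checkout "$builddir/${package}_$version.orig.tar.gz" || return 1
    run sh -c 'cd "$1" && debuild' sh "$src"
}

# Dry run: print the commands for an example package.
DRY_RUN=1
tg_debuild_sketch bibutils 3.40 /tmp/build
```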
You are maintaining a debian package with topgit. You have a topgit
patch against version k and it has been merged into upstream
version m. You want to "disable" the topgit branch, so that patches
are not auto-generated, but you are not brave enough to just
tg delete feature/foo
You are brave enough to follow the instructions of a random blog post.
Checking your patch has really been merged upstream
This assumes that you have tags upstream/j for version j.
git checkout feature/foo
git diff upstream/k
For each file foo.c modified in the output above, have a look at
git diff upstream/m foo.c
This kind of has to be a manual process, because upstream could easily
have modified your patch (e.g. formatting).
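The per-file comparison can at least be looped, leaving only the reading of the diffs as manual work. A minimal sketch, assuming the feature branch is checked out and that tags such as upstream/k and upstream/m exist; the function name compare_with_upstream is hypothetical.

```shell
#!/bin/sh
# compare_with_upstream OLD NEW
# For every file the working tree changes relative to OLD (e.g. upstream/k),
# show how that file differs from NEW (e.g. upstream/m). An empty diff under
# a file name means upstream's version is identical to yours.
compare_with_upstream() {
    git diff --name-only "$1" | while IFS= read -r file; do
        echo "== $file"
        git diff "$2" -- "$file"
    done
}

# Usage, from the feature/foo branch:
#   compare_with_upstream upstream/k upstream/m
```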
The semi-destructive way
Suppose you really never want to see that topgit branch again.
After I worked out the above, I realized that all I had to do was make
an explicit list of topgit branches that I wanted exported. One minor
trick is that the setting seems to have to go before the include, like this
I wanted to report a success story with
topgit which is a rather new patch queue
management extension for git. If that sounds like
gibberish to you, this is probably not the blog entry you are looking
for.
Some time ago I decided to migrate the debian packaging of
bibutils to topgit. This is
not a very complicated package, with 7 quilt patches applied to
upstream source. Since I don't have any experience to go on, I decided
to follow Martin 'madduck' Krafft's suggestion
for workflow.
It all looks a bit complicated (madduck will be the first to agree),
but it forced me to think about which patches were intended to go
upstream and which were not. At the end of the conversion I had 4
patches that were cleanly based on upstream, and (perhaps most
importantly for lazy people like me), I could send them upstream with
tg mail. I did that, and a few days later, Chris Putnam sent me a
new upstream release incorporating all of those patches. Of course, now I have
to package this new upstream release :-).
The astute reader might complain that this is more about me developing
a half-decent workflow, and Chris being a great guy, than about any
specific tool. That may be true, but one thing I have discovered
since I started using git is that tools that encourage good workflow
are very nice. Actually, before I started using git, I didn't even use
the word workflow. So I just wanted to give a public thank you to
pasky for writing topgit and to madduck for pushing it into debian,
and thinking about debian packaging with topgit.
If you want to make many ssh connections to a given host, then the
first thing you need to do is turn on multiplexing. See the
ControlPath and ControlMaster options in ssh config
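For reference, a stanza using those two options might be generated like this. It is a sketch rather than a recommended configuration: the function name and host are placeholders, and the ControlPath pattern is just one common choice built from the standard %r, %h and %p tokens.

```shell
#!/bin/sh
# enable_multiplexing CONFIG_FILE HOST
# Append a stanza that turns on connection sharing for HOST.
enable_multiplexing() {
    cat >> "$1" <<EOF
Host $2
    ControlMaster auto
    ControlPath ~/.ssh/control-%r@%h:%p
EOF
}

# Example run against a scratch file instead of ~/.ssh/config.
scratch=$(mktemp)
enable_multiplexing "$scratch" git.example.org
cat "$scratch"
```

With "ControlMaster auto" the first connection to the host becomes the master, and later connections reuse its socket instead of negotiating a new session.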
Presuming that is not fast enough, then one option is to make many
parallel connections (see e.g. git-sync-experiments2). But this
won't scale very far.
This week I consider the possibility of running a tunneled socket to
a remote git-daemon.
Of course from a security point of view this is awful, but I did it anyway,
at least temporarily.
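One way to set up something of this kind is plain ssh port forwarding to a git-daemon bound to the remote loopback interface. This is an assumption about the general technique, not the exact commands used for the timings here; the function names, host and paths are placeholders, and with DRY_RUN set the commands are only printed.

```shell
#!/bin/sh
# Sketch only: TCP port forwarding stands in for whatever tunnel was
# actually used. Host names and paths are placeholders.

# Print a command, and execute it unless DRY_RUN is set to a non-empty value.
run() {
    echo "+ $*"
    [ -n "$DRY_RUN" ] || "$@"
}

# start_remote_daemon HOST BASE: start git-daemon on HOST, listening only on
# its loopback interface and serving every repository below BASE.
start_remote_daemon() {
    run ssh "$1" git daemon --detach --listen=127.0.0.1 --export-all "--base-path=$2"
}

# open_tunnel HOST: forward local port 9418 (the default git port) to the
# daemon on HOST, then go to the background.
open_tunnel() {
    run ssh -f -N -L 9418:127.0.0.1:9418 "$1"
}

# pull_via_tunnel REPO: pull from the daemon through the tunnel.
pull_via_tunnel() {
    run git pull "git://localhost/$1"
}

# Dry run: print the commands for an example host and repository.
DRY_RUN=1
start_remote_daemon git.example.org /srv/git
open_tunnel git.example.org
pull_via_tunnel repo1.git
```

Note that --export-all serves every repository under the base path to anyone who can reach the port, which is part of why this is a poor idea from a security point of view.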
Running my "usual" test of git pull in 15 up-to-date repos, I get 3.7s
versus about 5s with the multiplexing. So, 20% improvement, probably not
worth the trouble. In both cases I just run a shell script like
cd repo1 && git pull && cd ..
cd repo2 && git pull && cd ..
cd repo3 && git pull && cd ..
cd repo4 && git pull && cd ..
cd repo5 && git pull && cd ..
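The same test can be written as a loop, which is easier to extend as repos are added. A minimal sketch; the function name in_each_repo and the REPO_CMD override are hypothetical conveniences, not part of the original test script.

```shell
#!/bin/sh
# in_each_repo DIR...
# Run a command in each listed directory. The subshell keeps the caller's
# working directory unchanged, so no "cd .." is needed. REPO_CMD defaults
# to "git pull" and is split on whitespace.
in_each_repo() {
    for repo in "$@"; do
        ( cd "$repo" && ${REPO_CMD:-git pull} ) || echo "failed: $repo" >&2
    done
}

# Usage, for timing the whole run:
#   time in_each_repo repo1 repo2 repo3 repo4 repo5
```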
I have been thinking about ways to speed up multiple remote git
operations on the same host. My starting point is
mr, which does the job, but is a
bit slow. I am thinking about giving up some generality for some
speed. In particular it seems like it ought to be possible to optimize for the two following use cases:
many repos are on the same host
mostly nothing needs updating.
For my needs, mr is almost fast enough, but I can see it getting
annoying as I add repos (I currently have 11, and mr update takes
about 5 seconds; I am already running ssh multiplexing).
I am also thinking about the needs of the Debian
Perl Modules Team, which would have over
900 git repos if the current setup was converted to one git repo per
module.
My first attempt, using perl module Net::SSH::Expect to keep an
ssh channel open can be scientifically classified as "utter fail", since
Net::SSH::Expect takes about 1 second to round trip "/bin/true".
Initial experiments using IPC::PerlSSH are more promising. The
following script grabs the head commit in 11 repos in about 0.5
seconds. Of course, it still doesn't do anything useful, but I thought
I would toss this out there in case there already exists a solution to
this problem I don't know about.
#!/usr/bin/perl
use IPC::PerlSSH;
use Getopt::Std;
use File::Slurp;
my %config;
eval( "\%config=(".read_file(shift(@ARGV)).")");
die "reading configuration failed: $@" if $@;
my $ips= IPC::PerlSSH->new(Host=>$config{host});
$ips->eval("use Git");
$ips->store( "ls_remote", q{my $repo=shift;
return Git::command_oneline('ls-remote',$repo,'HEAD');
} );
foreach $repo (@{$config{repos}}){
print $ips->call("ls_remote",$repo);
}
P.S. If you google for "mr joey hess", you will find a Kiss tribute band
called Mr. Speed, started by Joe Hess.
In a previous post I complained that
mr was too slow.
madduck pointed me to the "-j" flag, which
runs updates in parallel. With -j 5, my 11 repos update in 1.2s, so
this is probably good enough to put this project on the back burner
until I get annoyed again.
I have the feeling that the "right solution" (TM) involves running
either git-daemon or something like it on the remote host. The concept
would be to set up a pair of file descriptors connected via ssh to the
remote git-daemon, and have your local git commands talk to that pair
of file descriptors instead of a socket. Alas, that looks like a bit
of work to do, if it is even possible.
To convert an svn repository containing only "/debian" to something
compatible with git-buildpackage, you need to do some work. Luckily
zack
already figured out how.
#
package=bibutils
version=3.40
mkdir $package
cd $package
git svn init --stdlayout --no-metadata svn://svn.debian.org/debian-science/$package
git svn fetch
# drop upstream branch from svn
git branch -d -r upstream
# create a new upstream branch based on recipe from zack
#
git symbolic-ref HEAD refs/heads/upstream
git rm --cached -r .
git commit --allow-empty -m 'initial upstream branch'
git checkout -f master
git merge upstream
git-import-orig --pristine-tar --no-dch ../tarballs/${package}_${version}.orig.tar.gz
If you forget to use --authors-file=file then you can fix up your
mistakes later with something like the following. Note that after
someone has cloned your repo, this makes life difficult for them.
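One common way to do such a rewrite, not necessarily the one originally used here, is git filter-branch with an --env-filter. The filter body is ordinary shell run once per commit. In this sketch the svn login jdoe and the replacement identity are made-up placeholders, as are the names author_filter and rewrite_authors.

```shell
#!/bin/sh
# Filter body evaluated by git filter-branch for every commit. "jdoe" and
# "Jane Doe <jane@example.org>" are placeholders for a real mapping.
author_filter='
if [ "$GIT_AUTHOR_NAME" = "jdoe" ]; then
    GIT_AUTHOR_NAME="Jane Doe"
    GIT_AUTHOR_EMAIL="jane@example.org"
fi
if [ "$GIT_COMMITTER_NAME" = "jdoe" ]; then
    GIT_COMMITTER_NAME="Jane Doe"
    GIT_COMMITTER_EMAIL="jane@example.org"
fi
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL
'

# rewrite_authors: rewrite every ref in the current repository. This changes
# commit hashes, which is exactly what makes life hard for existing clones.
rewrite_authors() {
    git filter-branch --env-filter "$author_filter" -- --all
}

# Usage, inside the converted repository:
#   rewrite_authors
```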
Here is a script I wrote that seems to do the trick
#!/bin/sh
package=$1
stage=$1.from-svn
set -x
# my debian packages live under $SVNROOT/debian, with layout 2
mkdir $stage
cd $stage
git svn init --no-metadata \
--trunk $SVNROOT/debian/trunk/$package \
--branches $SVNROOT/debian/branches/upstream/$package \
--tags $SVNROOT/debian/tags/$package
git svn fetch
git branch -r upstream current
cd ..
# git clone --bare loses some gunk from git-svn. Anyway we need a bare repo
git clone --bare $stage $1.git
rm -rf $stage
Your mileage may vary of course.
UPDATED Apparently 'git branch -r upstream current' no longer
works, if it ever did. If anyone can psychically figure out what I
wanted to do there, I'm happy to translate that into git.