Git

From Organic Design wiki
Revision as of 18:01, 2 July 2015 by Nad (talk | contribs) (Removing deleted items from history)

Git is a distributed revision control and source code management system created by Linus Torvalds, initially for Linux kernel development.

Git's design was inspired by BitKeeper and Monotone. Git was originally designed only as a low-level engine that others could use to write front ends such as Cogito or StGIT. However, the core Git project has since become a complete revision control system that is usable directly. Several high-profile software projects now use Git for revision control, most notably the Linux kernel, X.org Server, One Laptop per Child (OLPC) core development, and the Ruby on Rails web framework.

Git differs from systems such as CVS or SVN in that the database is maintained beside the working filesystem on every peer. Each peer is easily synced to any other using push, pull or fetch. In Subversion you conventionally use three main directory structures:

  • trunk
  • branches
  • tags

In Git the trunk is roughly equivalent to the master branch. HEAD is a reference that resolves to the SHA-1 hash of the commit currently checked out, which is normally the latest commit on the current branch. Branches are used to fork development, for example when bug fixing is required. A tag in Git is just a named reference to a commit's SHA-1 hash, which is effectively a static reference to a particular snapshot of the code.
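
The following is a minimal sketch of that idea, run in a throwaway repository. The file name, the tag v1.0 and the branch bugfix are made up for illustration. It shows that HEAD, a branch and a tag are all just names for commit hashes, and that a tag stays put while HEAD moves on.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name for this demo
git config user.email "demo@example.com"
git config user.name "Demo"

echo one > file.txt
git add file.txt
git commit -q -m "first commit"

# A tag and a branch created now both name the current commit
git tag v1.0
git branch bugfix
head_hash=$(git rev-parse HEAD)
tag_hash=$(git rev-parse v1.0)
branch_hash=$(git rev-parse bugfix)

# After another commit HEAD has moved, but the tag still names the old commit
echo two > file.txt
git commit -q -am "second commit"
new_head=$(git rev-parse HEAD)
tag_after=$(git rev-parse v1.0)

echo "HEAD then: $head_hash"
echo "HEAD now:  $new_head"
echo "v1.0:      $tag_after"
```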

If a repository is cloned, Git tracks two things: master, which is your local branch, and origin/master, which records the last known commit on the master branch of the repository you cloned from. The latter is updated whenever you fetch or pull from that repository, or push your changes back to it.
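
A small sketch of how the two names behave, using two throwaway repositories on the local disk. The directory names source and copy are assumptions for the demo. The origin/master reference only moves once the clone fetches.

```shell
set -e
work=$(mktemp -d)

# "source" stands in for the repository being cloned
git init -q "$work/source"
cd "$work/source"
git symbolic-ref HEAD refs/heads/master   # pin the branch name for this demo
git config user.email "demo@example.com"
git config user.name "Demo"
echo one > file.txt
git add file.txt
git commit -q -m "first"

git clone -q "$work/source" "$work/copy"

# A new commit appears upstream after the clone was made
echo two > file.txt
git commit -q -am "second"
upstream=$(git rev-parse master)

cd "$work/copy"
before=$(git rev-parse origin/master)   # still the commit seen at clone time
git fetch -q
after=$(git rev-parse origin/master)    # now the new upstream commit
local_master=$(git rev-parse master)    # local branch is unchanged by fetch

echo "origin/master before fetch: $before"
echo "origin/master after fetch:  $after"
echo "local master:               $local_master"
```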

Using Git to update Wikimedia extensions

git clone https://gerrit.wikimedia.org/r/p/mediawiki/extensions/<EXT>.git

This clones the entire repo including all revisions to your local system. To change the local directory structure to a specific revision, use checkout with the required commit hash, e.g.

git checkout 6854b712f200a833220c156a2499f502317c20c1

Reverting a working copy

git reset --hard <branch/tag/rev>

Updating just a single file

git fetch
git checkout origin/master -- path/to/file

Links to Setting up Git Servers

Github Webhooks

GitHub offers notifications via webhooks so that services can respond dynamically to events occurring on their repositories. We use this to have some clones on the server update automatically whenever a push occurs. Here's an example PHP script that responds to GitHub notifications in JSON format (the default), using the optional secret to validate the request. Note that the web-server user must have permission to execute git pull on the repo in question.

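// 'SECRET' is the secret configured for the webhook on GitHub
if( array_key_exists( 'HTTP_X_HUB_SIGNATURE', $_SERVER ) ) {
        $sig = $_SERVER['HTTP_X_HUB_SIGNATURE'];
        $body = file_get_contents( 'php://input' );
        $hmac = hash_hmac( 'sha1', $body, 'SECRET' );
        // hash_equals compares in constant time (requires PHP 5.6 or later)
        if( hash_equals( "sha1=$hmac", $sig ) ) {
                $repo = json_decode( $body )->repository->name;
                // Only accept plain names, since the value ends up in a shell command
                if( preg_match( '/^[A-Za-z0-9_-][A-Za-z0-9._-]*$/', $repo ) ) {
                        exec( "cd /PATH/TO/LOCAL/CLONES/$repo && git pull" );
                }
        }
}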
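
To try the script without waiting for a real push, the signature header can be computed by hand. This is a rough sketch that assumes the openssl command-line tool is installed. The sign_payload helper and the sample payload are made up for illustration, and SECRET must match the secret used in the PHP script.

```shell
# Hypothetical helper: prints the hex HMAC-SHA1 of a body ($2) keyed with a secret ($1)
sign_payload() {
  # openssl prefixes the digest with a label ending in "= ", which sed strips
  printf '%s' "$2" | openssl dgst -sha1 -hmac "$1" | sed 's/^.*= *//'
}

# Minimal stand-in for a GitHub push payload, carrying only the field the script reads
body='{"repository":{"name":"example"}}'
signature="sha1=$(sign_payload SECRET "$body")"

# This is the header value the PHP script compares against
echo "X-Hub-Signature: $signature"
```

The body and header can then be posted to wherever the script is installed with any HTTP client, as long as the body is sent byte for byte as it was signed.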

Hosting your own Git repo

Hosting your own Git repo is much simpler than with Subversion, because every clone of the repo is complete and can be used as the source for all the other clones to push to. Nothing needs to be set up or configured for existing Unix accounts to connect to Git repositories over secure shell. Simply clone a repository using the following syntax:

git clone USER@DOMAIN:/PATH/TO/REPO
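
If there is no repository on the server yet, one has to be created there first. The sketch below assumes the common convention of a bare repository (one with no working copy) as the shared one, which the text above does not prescribe. It uses a temporary local directory in place of the server path so that it can be run anywhere.

```shell
set -e
srv=$(mktemp -d)   # stands in for /PATH/TO on the server

# On the server: create an empty bare repository for others to push to
git init -q --bare "$srv/REPO"
git --git-dir="$srv/REPO" symbolic-ref HEAD refs/heads/master

# On a workstation: over SSH this would be  git clone USER@DOMAIN:/PATH/TO/REPO
git clone -q "$srv/REPO" "$srv/work"
cd "$srv/work"
git symbolic-ref HEAD refs/heads/master   # pin the branch name for this demo
git config user.email "demo@example.com"
git config user.name "Demo"

echo hello > README
git add README
git commit -q -m "initial commit"
git push -q origin master

# The server-side repository now holds the same commit as the local clone
server_head=$(git --git-dir="$srv/REPO" rev-parse master)
local_head=$(git rev-parse master)
echo "server: $server_head"
echo "local:  $local_head"
```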

Setting up a web-based viewer is also useful when serving your own repos. The best one I've found is Gitlist (not to be confused with Gitalist, which is so fat that you'll probably need to wait an hour for it to download its dependencies and compile! I don't even want to know what the configuration process is like!). Gitlist is PHP-based, so you just unpack it into your web-space and set up the appropriate rewrite rules etc. as you would for any other webapp like Wordpress or MediaWiki. It has a very basic configuration file in which you point it at the directory containing the repos you want to view and specify any in there that you'd like to be hidden from view, and that's it! You can check out our installation of Gitlist at code.organicdesign.co.nz.

A simple way to have some repos hidden unless a password is supplied in the URL is to add the following snippet to index.php after the $config object has been defined. Note that this isn't very secure, especially if you don't force SSL connections, but if you just want repos that aren't generally accessible yet are easy to provide access to when desired, this is a good method.

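$pass = 'PASSWORD';
$gpas = array_key_exists( 'pass', $_GET ) ? $_GET['pass'] : '';
// Strict comparison avoids PHP's loose matching of numeric-looking strings
if( $gpas !== $pass ) {
	// If the referring page carried the password, redirect to the same URL with it added
	$ref = array_key_exists( 'HTTP_REFERER', $_SERVER ) ? $_SERVER['HTTP_REFERER'] : '';
	if( preg_match( '|pass=' . preg_quote( $pass, '|' ) . '|', $ref ) ) {
		$url = $_SERVER['SCRIPT_URI'] . "?pass=$pass&" . $_SERVER['QUERY_STRING'];
		header( "Location: $url" );
		exit;
	}
	// No password supplied, so hide the private repos
	$config->set( 'git', 'hidden', array( '/PATH/TO/PRIVATE/REPOS' ) );
}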

Automatically updating working-copies

It's often useful to have a remote repository's working copy update automatically whenever it's pushed to. There are a lot of sites around that explain how to do this, but they all refer to a post-update script that no longer exists. I found this site which has a local copy of the script, and I also added a copy to our tools repo here in case that one disappears as well.

All you need to do is save that script into your remote repo's .git/hooks directory with the file name post-update, make sure it's executable, and set the repository's receive.denyCurrentBranch to ignore, so you don't get error messages every time you push to it.

git config receive.denyCurrentBranch ignore
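
As an alternative to the hook script (and not the method described above), Git 2.3 and later have a built-in updateInstead value for the same setting, which makes a push update the checked-out working copy as long as that working copy is clean. The sketch below tries it out with two throwaway repositories. The names live and dev are assumptions for the demo.

```shell
set -e
work=$(mktemp -d)

# "live" plays the remote repository whose working copy should follow pushes
git init -q "$work/live"
cd "$work/live"
git symbolic-ref HEAD refs/heads/master   # pin the branch name for this demo
git config user.email "demo@example.com"
git config user.name "Demo"
echo v1 > index.html
git add index.html
git commit -q -m "v1"

# Accept pushes to the checked-out branch and update the working copy too
git config receive.denyCurrentBranch updateInstead

# "dev" is a clone that pushes a change back
git clone -q "$work/live" "$work/dev"
cd "$work/dev"
git config user.email "demo@example.com"
git config user.name "Demo"
echo v2 > index.html
git commit -q -am "v2"
git push -q origin master

# The file in the live working copy has followed the push
live_content=$(cat "$work/live/index.html")
echo "live now serves: $live_content"
```

The push is refused if the live working copy has uncommitted changes, so this only suits directories that nobody edits in place.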

Changing history with Git

Removing deleted items from history

Two typical reasons for doing this are that you accidentally committed a large binary or similar in the past and want to get rid of it to reduce the size of the repo, or that you want to split part of a repo off into another repo so that the new one only has history for the items that exist in it. The first thing to do is make a fresh clone (or copy an existing local clone directory) and then unlink it from the remote origin, just to be sure you don't push any unwanted changes to it (unless of course you want to push the result to the remote afterwards).

git clone <REMOTE-REPO> foo
cd foo
git remote remove origin

You can either make a file (called delete.txt in this example) containing all the files/directory names you want to delete, or restructure the new repository to the way you want and obtain a list of all the deleted items like this (you can use git log to get the first commit).

git log --oneline --reverse
git log --diff-filter=D --name-only --pretty=format: <start_commit>..HEAD | grep -v '^$' | sort -u > delete.txt

You should probably edit the delete.txt file to ensure that you're not deleting anything you want to keep, such as deleted files that are part of the history of the newly split repo. Also back up the repo directory so you can modify things and retry if there are any problems.

Then the following commands will delete the items and all their history and references from the repo.

git filter-branch -f --tree-filter 'xargs git rm -rf --ignore-unmatch < /ABS/PATH/TO/delete.txt' HEAD
git reset --hard
git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin
git reflog expire --expire=now --all
git gc --aggressive --prune=now

Check out some historical states and see whether the repo is as you want it. If it's not, start the process again from a fresh clone or copy, because running the commands a second time doesn't work.
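
One way to check more systematically is to list every path that is still reachable from any ref and compare it against the list of items to delete. The sketch below builds a throwaway repository to show the idea. The file names are made up, and here the history has not been filtered, so the deleted file is still reported.

```shell
set -e
repo=$(mktemp -d)
list=$(mktemp)            # plays the role of delete.txt
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name for this demo
git config user.email "demo@example.com"
git config user.name "Demo"

# Commit a file we later regret, then delete it in a second commit
echo keep > keep.txt
echo junk > big.bin
git add keep.txt big.bin
git commit -q -m "add files"
git rm -q big.bin
git commit -q -m "remove big.bin"
printf 'big.bin\n' > "$list"

# Every path reachable from any ref, matched as whole lines against the list
leftover=$(git rev-list --all --objects | cut -s -d' ' -f2- | grep -Fx -f "$list" | sort -u)

echo "still in history: $leftover"
```

Run the rev-list line in the filtered repo with the real delete.txt. Empty output suggests the items are gone from all reachable history.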

If this is not a new repo and you want to update these major changes on the remote origin, do the following as well:

git push origin --force --all
git push origin --force --tags

Inserting commits into the past

One interesting thing I had to do on a Git repository, which could be required again, was to insert commits into the history. I had migrated a site over to GitHub that had previously been worked on directly over FTP. After migrating the site and working on it for a week or so, I discovered that many of the directories had a lot of backup versions of files, with the date each was backed up in the filename. Obviously it would be much more organised if those backup files could actually be inserted into the revision history of their associated file. So here's a general procedure for inserting revisions of a file into history.

Here's an example repository containing a file called foo.txt which has three revisions.

git log --oneline
e4e7230 C state
82f95e9 B state
d6f8638 added first file

I want to place two extra revisions before the "B state" revision, so first let's wind the head back to the commit before that:

git reset --hard d6f8638

Now we add our two new commits (with the commit messages A.1 and A.2), followed by an extra commit that puts the file back to the state it was in before the new commits (d6f8638). That way we avoid having to deal with conflicts in the next step. Checking the log again shows what we have:

f00face back to initial state
71acf31 A.2
f8cfb9b A.1
d6f8638 added first file

Then we wind the head back to the top, which is now a detached fork, so we make it into a branch called tmp so we can refer to it more easily.

git checkout e4e7230
git checkout -b tmp

This tmp branch is connected at the first commit since that's where we went back to and forked by making new commits there, so we can use rebase to disconnect it from there and connect it to the new end of the master branch.

git rebase master

Now let's see what we've got:

git log --oneline
ca94250 C state
5e97ab9 B state
f00face back to initial state
71acf31 A.2
f8cfb9b A.1
d6f8638 added first file

That's just what we wanted: the original three commits with our extra two inserted. The only problem left to fix is that we're in the tmp branch, and master's head is still at the "back to initial state" commit. So all we need to do to tidy up is merge the tmp branch into master (a simple fast-forward) and delete it.

git checkout master
git merge tmp
git branch -d tmp

We still need to get these changes to the remote though, and Git won't let you push until you first pull to merge the remote and local versions together. So either pull and then push, entering a comment for the merge commit, or just force the local state onto the origin.

git push origin --force --all
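
The whole procedure, apart from the final push, can be rehearsed on a throwaway repository before trying it on a real one. This is a sketch under the same assumptions as the example above, with a single file foo.txt. The commit helper is made up for the demo, and the commit hashes will differ from the ones shown.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name for this demo
git config user.email "demo@example.com"
git config user.name "Demo"

# Hypothetical helper: write $1 into foo.txt and commit it with message $2
commit() { printf '%s\n' "$1" > foo.txt; git add foo.txt; git commit -q -m "$2"; }

# The original three revisions
commit "initial" "added first file"
first=$(git rev-parse HEAD)
commit "B" "B state"
commit "C" "C state"
top=$(git rev-parse HEAD)

# Wind back, add the new revisions, then restore the earlier content
git reset -q --hard "$first"
commit "A.1" "A.1"
commit "A.2" "A.2"
commit "initial" "back to initial state"

# Name the old tip, replay it on top of the new commits, and tidy up
git checkout -q "$top"
git checkout -q -b tmp
git rebase -q master
git checkout -q master
git merge -q tmp
git branch -q -d tmp

# Commit messages from oldest to newest
history=$(git log --format=%s --reverse)
echo "$history"
```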

See also