Git

From Organic Design wiki
Revision as of 18:15, 18 April 2019 by Nad (talk | contribs) (See also: Learn Enough Git to be Useful - 4 essential workflows for GitHub projects)

Git is a distributed revision control and source code management system created by Linus Torvalds, initially for Linux kernel development. Git's design was inspired by BitKeeper and Monotone. It was originally designed only as a low-level engine that others could use to write front ends such as Cogito or StGIT. However, the core Git project has since become a complete revision control system that is usable directly. Several high-profile software projects now use Git for revision control, most notably the Linux kernel, X.org Server, One Laptop per Child (OLPC) core development, and the Ruby on Rails web framework.

Git differs from systems such as CVS or Subversion in that the full repository database is kept alongside the working copy on every peer. Each peer is easily synced with any other using push, pull or fetch. In Subversion you use three main directory structures: trunk, branches and tags. In Git the nearest equivalent of the trunk is the master branch, and HEAD is a reference to the commit currently checked out, normally the latest commit on the current branch. Branches are used to fork development, for example when bug fixing is required. A tag in Git is just a name for the SHA-1 of a commit, which is effectively a static reference to a particular snapshot of the code.

When a repository is cloned, Git maintains a local master branch, which points to the latest commit in your clone, as well as origin/master, a remote-tracking branch which records the last known commit of master in the repository you cloned from. The latter is updated whenever you fetch or pull from that repository, or push your changes back to it.

Reverting to a specific commit

What I want is a kind of revert functionality similar to the wiki's, where you take the repo back to a particular state marked by a commit ID and then make a new commit noting that you're reverting. Note that I'm not talking about winding the repo back, because that would remove all the subsequent commits, and I'm not talking about branching off at the old commit. The following commands create a new commit that takes us back to the state at SHA1: the hard reset puts the working tree and index into that state, and the soft reset then moves the branch back to its previous tip without touching them.

git reset --hard SHA1
git reset --soft HEAD@{1}
git commit -m "Revert to SHA1"
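To convince myself that this really keeps the history intact, here is a minimal sketch that replays the three commands in a throwaway repository. The temporary directory, the file name state.txt and the commit messages are all made up for the demo.

```shell
set -e
work=$(mktemp -d)                   # throwaway location, assumed for this demo
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
git config user.email "demo@example.com"
git config user.name "Demo"

# Three commits, each replacing the content of state.txt
for n in one two three; do
	echo "$n" > state.txt
	git add state.txt
	git commit -q -m "state $n"
done

target=$(git rev-parse HEAD~2)      # the commit we want to return to ("state one")

git reset -q --hard "$target"       # working tree and index now match the target
git reset -q --soft 'HEAD@{1}'      # branch goes back to its old tip, index and files stay put
git commit -q -m "Revert to $target"

git log --format=%s                 # the revert commit followed by the three originals

reverted_tree=$(git rev-parse 'HEAD^{tree}')
target_tree=$(git rev-parse "$target^{tree}")
commit_count=$(git rev-list --count HEAD)
```

The new commit has exactly the same tree as the target commit, while all three original commits are still in the log beneath it.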

Reverting local changes

To revert local changes back to the local repo's last commit, you can use:

git reset --hard

You can also add a SHA after that to revert further back.

To revert just specific files in the working copy to the last commit:

git checkout -- [FILE]

Running git checkout with no parameters lists the files that have local modifications, and you can specify a single file, several files or a wildcard pattern.
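As a rough illustration of the difference between reverting a single file and reverting everything, the following sketch uses a scratch repository. The file names a.txt and b.txt and the temporary directory are assumptions for the example only.

```shell
set -e
work=$(mktemp -d)                   # throwaway location, assumed for this demo
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
git config user.email "demo@example.com"
git config user.name "Demo"

echo "original a" > a.txt
echo "original b" > b.txt
git add a.txt b.txt
git commit -q -m "initial"

# Make local edits to both files
echo "edited a" > a.txt
echo "edited b" > b.txt

git checkout -- a.txt               # discard the local edit to a.txt only
git status --short                  # b.txt is still listed as modified

a_content=$(cat a.txt)
b_content=$(cat b.txt)

git reset -q --hard                 # now discard every remaining local change
b_after_reset=$(cat b.txt)
```

Both commands throw local edits away for good, so it's worth checking git status before running either of them.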

Make a new "master" repo from a clone

git clone --bare localrepo.git newremoterepo.git
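Here's a small sketch of how that might look end to end, assuming the existing repo has no remote yet. The paths are throwaway ones, and adding the bare clone as origin afterwards is my own addition rather than part of the command above.

```shell
set -e
work=$(mktemp -d)                   # throwaway location, assumed for this demo
git init -q "$work/localrepo"
cd "$work/localrepo"
git symbolic-ref HEAD refs/heads/master
git config user.email "demo@example.com"
git config user.name "Demo"
echo "hello" > README
git add README
git commit -q -m "initial"

# Make the new "master" repo as a bare clone of the existing one
git clone -q --bare "$work/localrepo" "$work/newremoterepo.git"

# Point the existing repo at the new bare repo and check that a push works
git remote add origin "$work/newremoterepo.git"
git push -q -u origin master

is_bare=$(git -C "$work/newremoterepo.git" rev-parse --is-bare-repository)
origin_url=$(git config --get remote.origin.url)
```

A bare repo has no working copy, which is why it's the usual choice for something that other clones push to.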

Get the clone URL from within a repo

You can find the details about a repo's remote origin using the following command:

git remote show origin

If the repo is corrupted, or the remote can't be reached, you may need to read the URL directly from the local configuration instead:

git config --get remote.origin.url

Updating just a single file

Sometimes you want to bring only a single file in a working copy up to date and leave everything else as it is.

git fetch
git checkout origin/master -- path/to/file
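The following sketch shows the effect using two throwaway repositories, one standing in for the origin and one for the working copy. The file names and paths are invented for the example.

```shell
set -e
work=$(mktemp -d)                   # throwaway location, assumed for this demo
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
git config user.email "demo@example.com"
git config user.name "Demo"
echo "old a" > a.txt
echo "old b" > b.txt
git add a.txt b.txt
git commit -q -m "initial"

git clone -q "$work/upstream" "$work/clone"

# Upstream changes both files after the clone was made
echo "new version of a" > a.txt
echo "new version of b" > b.txt
git commit -q -am "update both files"

# In the clone, bring only a.txt up to date
cd "$work/clone"
git fetch -q
git checkout origin/master -- a.txt

a_content=$(cat a.txt)
b_content=$(cat b.txt)
```

Note that the updated file ends up staged in the index as a local change, since the local branch itself hasn't moved.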

Showing local changes

To show the changes between the working directory and the index, i.e. what has been changed but is not yet staged for a commit:

git diff


To show the changes between the index and HEAD (which is the last commit on this branch), i.e. what has been added to the index and staged for a commit:

git diff --cached


To show all the changes between the working directory and HEAD (which includes changes in the index), i.e. everything that has changed since the last commit whether or not it has been staged:

git diff HEAD
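A quick way to see the three variants side by side is a scratch repository with one staged and one unstaged change. This is only a sketch, and the file names staged.txt and unstaged.txt are made up so that the output explains itself.

```shell
set -e
work=$(mktemp -d)                   # throwaway location, assumed for this demo
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
git config user.email "demo@example.com"
git config user.name "Demo"
echo "first" > staged.txt
echo "first" > unstaged.txt
git add staged.txt unstaged.txt
git commit -q -m "initial"

echo "second version" > staged.txt
git add staged.txt                  # this change goes into the index
echo "second version" > unstaged.txt    # this one stays in the working directory only

not_staged=$(git diff --name-only)          # working directory vs index
staged=$(git diff --cached --name-only)     # index vs HEAD
everything=$(git diff HEAD --name-only)     # working directory vs HEAD

echo "not staged: $not_staged"
echo "staged:     $staged"
echo "all:        $everything"
```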

Nuking a local branch and replacing with a completely different one from the origin

This example nukes the local master branch and then replaces it with the one in the remote origin which has completely replaced the original one. This assumes we're on a different branch than master to start with.

git branch -D master
git fetch --all
git checkout --track origin/master
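To try this out safely, the sketch below builds a throwaway upstream whose master is replaced by an unrelated history after the clone was made, and then runs the three commands in the clone. The branch names scratch and fresh, the file names and the paths are assumptions for the demo.

```shell
set -e
work=$(mktemp -d)                   # throwaway location, assumed for this demo
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
git config user.email "demo@example.com"
git config user.name "Demo"
echo "old" > old.txt
git add old.txt
git commit -q -m "old history"

git clone -q "$work/upstream" "$work/clone"

# Upstream replaces master with a completely unrelated history
git checkout -q --orphan fresh
git rm -q -rf .
echo "new" > new.txt
git add new.txt
git commit -q -m "new history"
git branch -M master                # rename "fresh" over the old master

# In the clone, move off master, nuke it, then take the origin's version
cd "$work/clone"
git checkout -q -b scratch
git branch -q -D master
git fetch -q --all
git checkout -q --track origin/master

subject=$(git log -1 --format=%s)
branch=$(git rev-parse --abbrev-ref HEAD)
```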

GitHub Webhooks

GitHub offers notifications via webhooks so that services can respond dynamically to events occurring on repositories. We use this to have some clones on the server automatically update whenever a push occurs. Here's an example PHP script that responds to GitHub notifications that are in JSON format (the default), with the optional secret used to validate the request. Note that the web-server must have permission to execute git pull on the repo in question.

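if( array_key_exists( 'HTTP_X_HUB_SIGNATURE', $_SERVER ) ) {
	$sig = $_SERVER['HTTP_X_HUB_SIGNATURE'];
	$body = file_get_contents( 'php://input' );
	$hmac = hash_hmac( 'sha1', $body, 'SECRET' );

	// hash_equals compares in constant time, which avoids leaking the signature through timing
	if( hash_equals( "sha1=$hmac", $sig ) ) {

		// basename() strips any path components from the name before it's used in the shell command
		$repo = basename( json_decode( $body )->repository->name );
		exec( 'cd ' . escapeshellarg( "/PATH/TO/LOCAL/CLONES/$repo" ) . ' && git pull --no-edit' );
	}
}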
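When testing a script like this it helps to be able to fake a notification by hand. The following sketch computes the same signature the script expects, assuming openssl is installed. The payload is a cut-down stand-in for GitHub's real JSON, and the URL in the commented-out curl line is hypothetical.

```shell
secret='SECRET'                     # must match the secret used in the PHP script
body='{"repository":{"name":"example-repo"}}'    # minimal stand-in payload, not GitHub's full format

# HMAC-SHA1 of the raw body, keeping only the hex digest from openssl's output
hmac=$(printf '%s' "$body" | openssl dgst -sha1 -hmac "$secret" | sed 's/^.*= *//')
signature="sha1=$hmac"
echo "X-Hub-Signature: $signature"

# The fake notification could then be sent like this (hypothetical URL):
# curl -s -X POST -H "X-Hub-Signature: $signature" --data-binary "$body" "https://www.example.com/github-hook.php"
```

The signature is calculated over the exact bytes of the request body, so the body must be sent unmodified, hence --data-binary rather than -d.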

GitLab Webhooks

GitLab also offers webhooks as part of its integration system. It's very similar to GitHub, but the PHP script would look more like this instead:

if( array_key_exists( 'HTTP_X_GITLAB_TOKEN', $_SERVER ) && hash_equals( 'SECRET', $_SERVER['HTTP_X_GITLAB_TOKEN'] ) ) {
	$body = json_decode( file_get_contents( 'php://input' ) );

	// Take the repo name from the last segment of the URL, minus the .git suffix
	if( preg_match( '|([^/]+)\.git$|', $body->repository->url, $m ) ) {
		exec( 'cd ' . escapeshellarg( "/PATH/TO/LOCAL/CLONES/$m[1]" ) . ' && git pull --no-edit' );
	}
}

Note: I've extracted the repo name from url rather than using name because the project name can be different than the repo name.

Hosting your own Git repo

Hosting your own Git repo is much simpler than with Subversion because every clone of the repo is complete and can be used as the source that all the other clones push to. Nothing needs to be set up or configured for existing Unix accounts to connect to Git repositories over secure shell, simply clone a repository using the following syntax:

git clone USER@DOMAIN:/PATH/TO/REPO

If you'd like to restrict the users connecting to the repo so that they can only perform git operations, then you can add the following to their authorized_keys file:

command="git-shell -c \"$SSH_ORIGINAL_COMMAND\"",no-port-forwarding,no-pty,no-agent-forwarding,no-X11-forwarding ssh-rsa...

Setting up a web-based viewer is also useful when serving your own repos. The best one I've found is Gitlist (not to be confused with Gitalist, which is so fat that you'll probably need to wait an hour for it to download its dependencies and compile! I don't even want to know what the configuration process is like!). Gitlist is PHP-based, so you just unpack it into your web-space and set up the appropriate rewrite rules etc. as you would for any other webapp like Wordpress or MediaWiki. It has a very basic configuration file in which you point it at the directory containing the repos you want to view and specify any in there that you'd like to be hidden from view, and that's it! You can check out our installation of Gitlist at code.organicdesign.co.nz.

A simple way to keep some repos hidden unless a password is supplied in the URL is to add the following snippet to index.php after the $config object has been defined. Note that this isn't very secure, especially if you don't force SSL connections, but if you just want repos that aren't generally accessible yet are easy to provide access to when desired, then this is a good method. Note also that the hidden repos config value is a regular expression, not just a path.

$pass = 'PASSWORD';
$gpas = array_key_exists( 'pass', $_GET ) ? $_GET['pass'] : '';
if( $gpas != $pass ) {

	// If the referring page carried the password, redirect to the same URL with the password added
	$ref = array_key_exists( 'HTTP_REFERER', $_SERVER ) ? $_SERVER['HTTP_REFERER'] : '';
	if( preg_match( '|pass=' . preg_quote( $pass, '|' ) . '|', $ref ) ) {
		$url = $_SERVER['SCRIPT_URI'] . "?pass=$pass&" . $_SERVER['QUERY_STRING'];
		header( "Location: $url" );
		exit;
	}

	// No valid password was supplied, so hide the private repos
	$config->set( 'git', 'hidden', array( '/\/PRIVATE-REPO\/PATH\/PATTERN/' ) );
}

Automatically updating working-copies

It's often useful to have a remote repository automatically update its working copy whenever it's pushed to. There are a lot of sites around that explain how to do this, but they all refer to a post-update script that no longer exists. I found this site which has a local copy of the script, and I also added a copy to our tools repo here in case that one disappears as well.

All you need to do is save that script into your remote repo's .git/hooks directory with the file name post-update, make sure it's executable, and set the repository's receive.denyCurrentBranch to ignore, so you don't get error messages every time you push to it.

git config receive.denyCurrentBranch ignore

And if the mode of files may need to be changed (for example along with their ownership) without Git treating them as modified, then set this too:

git config --global core.fileMode false
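To show the general idea, here is a much-reduced sketch of such a hook. It is not the post-update script referred to above, just a few lines of my own that hard-reset the working copy after each push, tried out between two throwaway repositories called server and dev.

```shell
set -e
work=$(mktemp -d)                   # throwaway location, assumed for this demo
git init -q "$work/server"
cd "$work/server"
git symbolic-ref HEAD refs/heads/master
git config user.email "demo@example.com"
git config user.name "Demo"
echo "v1" > file.txt
git add file.txt
git commit -q -m "initial"

# Allow pushes to the checked-out branch
git config receive.denyCurrentBranch ignore

# Minimal stand-in hook: bring the working copy up to date after every push
mkdir -p .git/hooks
cat > .git/hooks/post-update <<'EOF'
#!/bin/sh
# Hooks triggered by a push run inside .git with GIT_DIR set, so step up to the working copy first
cd .. || exit 1
unset GIT_DIR
git reset -q --hard
EOF
chmod +x .git/hooks/post-update

# A developer clones the repo, makes a commit and pushes it back
git clone -q "$work/server" "$work/dev"
cd "$work/dev"
git config user.email "demo@example.com"
git config user.name "Demo"
echo "version two" > file.txt
git commit -q -am "second version"
git push -q origin master

server_content=$(cat "$work/server/file.txt")
```

Be aware that a hard reset throws away any uncommitted edits made directly on the server, which a fuller script might want to guard against.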

To have this update further notify other servers so that they can also update their working copies or perform other actions (similar to GitHub's webhooks solution) you can simply add wget commands after the wc_update function call toward the end of the post-update script.

wget -qO /dev/null "https://www.example.com/git-update.php?pass=foobaz"

The git-update script on the target server can then check the password and perform a sudo git pull using the same procedure as shown above for GitHub's webhooks. If the update call will always come from the same server, then rather than using a password the script can just check REMOTE_ADDR instead.

Separating a sub-directory out to its own repository

Often it's useful to make a single directory within a repository into its own new repository. For example, a client may wish to begin hosting their own repository, or after migrating from Subversion a lot of directories that made sense bundled together in a single repo are no longer practical together (because Subversion allows checking out individual paths, but Git does not).

The git filter-branch command has an option called --subdirectory-filter which is exactly for this purpose. First make a copy or new clone of the repository that contains the directory in question, then run the command as follows:

git remote remove origin
git filter-branch --subdirectory-filter DIRNAME HEAD

You'll see if you look at the files both present and in past commits, or do a git log --oneline that only the files and commits that apply to the sub-directory now exist. But if you check the size of the repo you'll notice that it's actually more than doubled in size! So we need to do a bit of cleaning up to get rid of all the old data completely.

git reset --hard
git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin
git reflog expire --expire=now --all
git gc --aggressive --prune=now
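The whole procedure can be tried out on a scratch repository first. In this sketch the directory names keepme and other are made up, and FILTER_BRANCH_SQUELCH_WARNING is set only to skip the warning and pause that newer versions of Git add to filter-branch.

```shell
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # newer Git versions otherwise pause with a warning
work=$(mktemp -d)                        # throwaway location, assumed for this demo
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
git config user.email "demo@example.com"
git config user.name "Demo"

mkdir keepme other
echo "a" > keepme/a.txt
echo "b" > other/b.txt
git add keepme other
git commit -q -m "both directories"
echo "more a" >> keepme/a.txt
git commit -q -am "keepme only"
echo "more b" >> other/b.txt
git commit -q -am "other only"

# Make keepme the new root, dropping commits that never touched it
git filter-branch --subdirectory-filter keepme HEAD > /dev/null 2>&1

# Clean out the backup refs and unreachable objects
git reset -q --hard
git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin
git reflog expire --expire=now --all
git gc -q --aggressive --prune=now

files=$(git ls-files)
subjects=$(git log --format=%s)
backups=$(git for-each-ref refs/original)
```

After this only a.txt remains, now at the root, and the "other only" commit has gone from the log.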

To create this as a new repo, for example in GitLab, first create a new empty project, then add it as a new remote and push the stripped repo to it:

git remote add gitlab git@gitlab.com:Aranad/PdfBook.git
git push gitlab master

Changing history with Git

Removing items from history

A couple of circumstances that require this are for example when you accidentally committed a large binary or something in the past that you'd like to get rid of to reduce the size of the repo, or when you want to split off some part of a repo into another repo and remove all evidence of it from the current repo.

There are many posts around showing you how to do this with the --index-filter and --tree-filter options of git filter-branch, and while these options succeed in removing all the files as they should, if you look at the commit history there are still many commits in the list that have nothing to do with the remaining files! I haven't figured out what's going on here yet, so to thoroughly remove files and all their associated commits, I see no other option than moving everything you want to keep into its own directory (using filter-branch, because mv will destroy the history), then using the --subdirectory-filter option shown above to make a new repository out of this directory, and then moving the items back to where you want them.

For example here we remove everything except the items listed in the file keep.txt. We do it by making a tmp directory and moving everything we want to keep into it and then using the subdirectory-filter method to retain only that tmp directory's contents.

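First move all the items to keep into the temporary directory throughout history using xargs. Note that this assumes you've already made your keep.txt file list and that it's in the repo directory's parent. The tree filter runs in a temporary checkout rather than in the repo directory itself, so the tmp directory is created inside the filter and keep.txt is referred to by its absolute path.

cd /path/to/my/repo
git filter-branch -f --tree-filter 'mkdir -p tmp && xargs -I{} git mv {} tmp/ < /path/to/my/keep.txt' HEAD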

Then tidy everything up,

git reset --hard
git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin
git reflog expire --expire=now --all
git gc --aggressive --prune=now

Run the sub-directory filter making the tmp directory the new repo root,

git filter-branch --subdirectory-filter tmp HEAD

And finally tidy up again.

git reset --hard
git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin
git reflog expire --expire=now --all
git gc --aggressive --prune=now

You can check that it's worked by checking out various commits and looking at the files, and by searching the commit log for words in comments that you know should only be associated with the files that have been removed. For example, I removed an auction site from my private development repo after the owners wanted to start hosting their own code, so I checked the log for "bid" and "offer".

git log --oneline | grep offer

Inserting commits into the past

One interesting thing I had to do on a Git repository, which could be required again, was to insert commits into the history. I had migrated a site over to GitHub that had previously been worked on directly over FTP. After migrating the site and working on it for a week or so, I discovered that many of the directories had a lot of backup versions of files with the date of the backup in the filename. Obviously it would be much more organised if those backup files could actually be inserted into the revision history of their associated file. So here's a general procedure for how to insert revisions of a file into history.

Here's an example repository containing a file called foo.txt which has three revisions.

git log --oneline
e4e7230 C state
82f95e9 B state
d6f8638 added first file

I want to place two extra revisions before the "B state" revision, so first let's wind the head back to the commit before that:

git reset --hard d6f8638

Now we add our two new commits (which I've commented as A.1 and A.2), and we add an extra commit that puts the file back to the same state it was before the new commits (d6f8638), that way we avoid having to deal with conflicts in the next step. Then check the log again to see what we have.

f00face back to initial state
71acf31 A.2
f8cfb9b A.1
d6f8638 added first file

Then we wind the head back to the top, which is now a detached fork, so we make it into a branch called tmp so we can refer to it more easily.

git checkout e4e7230
git checkout -b tmp

This tmp branch is connected at the first commit since that's where we went back to and forked by making new commits there, so we can use rebase to disconnect it from there and connect it to the new end of the master branch.

git rebase master

Now let's see what we've got:

git log --oneline
ca94250 C state
5e97ab9 B state
f00face back to initial state
71acf31 A.2
f8cfb9b A.1
d6f8638 added first file

That's just what we wanted: the original three commits with our extra two inserted. The only problem left to fix is that we're on the tmp branch, and master's head is still at the "back to initial state" commit. So all we need to do to tidy up is merge the tmp branch into master and delete it.

git checkout master
git merge tmp
git branch -d tmp

We still need to get these changes to the remote though, and it won't let you push until after you pull first to merge the remote and local versions together. So either pull then push and enter a comment for the merge commit, or just force the local state onto the origin.

git push origin --force --all
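The local part of this procedure (everything except the final push) can be replayed in a throwaway repository to see it working. This is only a sketch: the file contents and the temporary directory are invented, and the commit hashes will of course differ from the ones shown above.

```shell
set -e
work=$(mktemp -d)                   # throwaway location, assumed for this demo
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
git config user.email "demo@example.com"
git config user.name "Demo"

# The original three revisions of foo.txt
echo "original" > foo.txt
git add foo.txt
git commit -q -m "added first file"
first=$(git rev-parse HEAD)
echo "B" > foo.txt
git commit -q -am "B state"
echo "CC" > foo.txt
git commit -q -am "C state"
top=$(git rev-parse HEAD)

# Wind back to the first commit and add the two extra revisions
git reset -q --hard "$first"
echo "backup 1" > foo.txt
git commit -q -am "A.1"
echo "backup two" > foo.txt
git commit -q -am "A.2"

# Put the file back to its initial state so the rebase has no conflicts
git checkout -q "$first" -- foo.txt
git commit -q -m "back to initial state"

# Turn the old tip into a branch and move it onto the end of master
git checkout -q "$top"
git checkout -q -b tmp
git rebase -q master

# Fast-forward master to the rebased commits and tidy up
git checkout -q master
git merge -q tmp
git branch -q -d tmp

history=$(git log --format=%s)
final_content=$(cat foo.txt)
echo "$history"
```

The final log lists the commits in the same order as the listing above, and foo.txt ends up with the content of the "C state" revision.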
