Git Repository Merging

Posted on February 15, 2019

I recently blogged about how good a monorepo is for productivity in some cases. If you are working on a project which is spread over multiple repos and struggling with lowered productivity you might be interested in merging them together. I was faced with this task on two of my recent projects and used the method described in this blog post for merging the git repositories together.

Retaining history

The most important requirement for the switch was to make sure the histories of the git repositories were still available, preferably as easily accessible as possible. This ruled out a simple change like mv child_project_a . followed by git add child_project_a, which would bring in the files as one new commit with no earlier history. Additionally, some of the projects that had been separate before would have to be moved into the main project under a subdirectory. In this case I also wanted the history to appear as if all the files had always existed in that directory.

High level plan

  1. Use git filter-branch to rewrite the history to place code under a specific directory in a temporary branch of the child repositories
  2. Add child repositories as local file system remotes to the new main repository
  3. Merge in the filter-branched child repository histories to the main branch of the main repository

In detail

The git filter-branch script

The command for rewriting the history is quite complex, so I ended up putting the following script into a file called git-index-filter-mv on my $PATH. The script is based on the example in the git documentation at https://git-scm.com/docs/git-filter-branch (see the part about “move the whole tree into a subdirectory”). Because the file is named git-index-filter-mv, I can run it as git index-filter-mv.
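The naming trick works because git treats any executable called git-<name> that it finds on $PATH as the subcommand git <name>. The following is a minimal sketch of that lookup, using a made-up git-hello command in a temporary directory (both names are only for illustration and are not part of the original setup):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Temporary directory standing in for a personal bin directory on $PATH.
demo_bin="$(mktemp -d)"

# A hypothetical custom subcommand; the file name must start with "git-".
cat > "$demo_bin/git-hello" <<'EOF'
#!/usr/bin/env bash
echo "hello from a custom git subcommand"
EOF
chmod +x "$demo_bin/git-hello"

# Put the directory on $PATH so git can discover the executable.
PATH="$demo_bin:$PATH"

# git resolves "hello" to the git-hello executable found on $PATH.
hello_output="$(git hello)"
echo "$hello_output"
```

For the real script you would save it as git-index-filter-mv in a directory that is already on your $PATH and make it executable with chmod +x.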

#!/usr/bin/env bash

# I use macOS, which doesn't ship GNU sed, so I installed `gnu-sed` using brew, which gives me GNU sed under the name `gsed`
SED=gsed
# The new subdir to move files into is given by the first argument to this command
NEWSUBDIR=$1
# The second argument defines the initial part of the directory that must match for the file to be moved
REQUIRED_PREFIX=$2

# We assign this sed script to its own variable to make it easier to generate the full final command with all the different contexts and their required escaping
# The script uses - characters as the pattern delimiter and looks for the first match of REQUIRED_PREFIX with some additional characters and replaces the initial part with NEWSUBDIR
SED_COMMAND="s-\(\t\\\"*\)\($REQUIRED_PREFIX\)-\1$NEWSUBDIR/-"
# Next we define the whole FILTER_COMMAND as its own variable, again for easier control of the tricky shell string escaping
# git ls-files -s lists the current staged files
# next we pipe the staged files into the SED_COMMAND script to replace the paths
# then we redefine GIT_INDEX_FILE to $GIT_INDEX_FILE.new and run git update-index to update that file with the updated file names
# if the $GIT_INDEX_FILE.new file exists the previous command execution worked fine and we can overwrite the original GIT_INDEX_FILE with the contents of $GIT_INDEX_FILE.new
FILTER_COMMAND="git ls-files -s | \
    $SED \"$SED_COMMAND\" | \
    GIT_INDEX_FILE=\$GIT_INDEX_FILE.new git update-index --index-info && \
    if [ -f \"\$GIT_INDEX_FILE.new\" ]; then mv \"\$GIT_INDEX_FILE.new\" \"\$GIT_INDEX_FILE\"; fi"

# finally we set up the git params for running git, I would not have known how to do this properly without the wonderful shellcheck command/plugin
# we use filter-branch with index-filter to run the filter command
# index-filter is a faster way to rewrite git branches since it doesn't need to write out all the files from the index into the working directory
# on the other hand, it's more difficult to use, since all your commands have to operate on the index instead of normal shell commands operating on the working directory
GIT_PARAMS=(filter-branch --prune-empty --index-filter "$FILTER_COMMAND" HEAD)

# finally we run the command!
git "${GIT_PARAMS[@]}"
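To get a feel for what the sed expression does before running it on a real repository, it can help to feed it a couple of lines shaped like the output of git ls-files -s. The sketch below is only an illustration: the file names, the lib/ prefix and the rewrite_paths helper are made up, and it builds the tab character with printf instead of \t so that it does not depend on GNU sed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Illustrative helper (not part of the original script).
# $1: new subdirectory, $2: required prefix; reads `git ls-files -s` style lines on stdin.
rewrite_paths() {
  local tab
  tab="$(printf '\t')"
  # Same shape as SED_COMMAND: keep the tab (and an optional opening quote),
  # then replace the required prefix with "<new subdirectory>/".
  sed "s-\(${tab}\"*\)\($2\)-\1$1/-"
}

# A blob id used purely as sample data (it is the id of the empty blob).
blob="e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"

# Two sample index entries: "<mode> <blob> <stage><TAB><path>".
sample="$(printf '100644 %s 0\tlib/util.js\n100644 %s 0\tREADME.md' "$blob" "$blob")"

# Move everything under lib/ into common/; paths without the prefix are left alone.
rewritten="$(printf '%s\n' "$sample" | rewrite_paths common lib/)"
printf '%s\n' "$rewritten"
```

Two things stand out when trying this. The prefix is replaced rather than kept, so lib/util.js ends up as common/util.js, and the prefix needs its trailing slash to avoid a double slash in the result. Also, because - is the sed delimiter, a subdirectory or prefix containing a - would break the expression as written.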

Merging the filtered branches

Let's assume that we want to move the contents of the “master” branch of the “common” repo into a “common” subdirectory and then merge it into the “master” branch of the “main” repo. We would do the following steps.

  1. cd common to switch to the common repository
  2. git checkout master to check out our master branch
  3. git checkout -b master-moved to create a new temporary branch which we can rewrite
  4. git index-filter-mv common to run the script from the previous section, which rewrites the history of master-moved so that every file lives under the common subdirectory
  5. cd main to switch to the main repo
  6. git remote add common ~/projects/common to add a git remote that points to the common directory on the local file system
  7. git fetch common to sync up the contents of “common” to “main”
  8. git checkout master to check out the master branch of main
  9. git merge common/master-moved --allow-unrelated-histories to merge the master-moved branch of common into the master branch of main
  10. repeat the above for the rest of the child repos
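The steps above can be rehearsed end to end on throwaway repositories before touching the real ones. The following is a minimal sketch under stated assumptions, not the exact procedure used on my projects: the repositories, file names and commit messages are made up, it inlines the “move the whole tree into a subdirectory” filter from the git documentation instead of calling git-index-filter-mv, and it assumes GNU sed is available as sed (swap in gsed on macOS) and a git version of 2.9 or newer for --allow-unrelated-histories.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway identity so the demo does not depend on any global git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
# Newer git versions print an advisory and pause before filter-branch runs; this skips it.
export FILTER_BRANCH_SQUELCH_WARNING=1

work="$(mktemp -d)"

# Create a repository with a single commit on master.
# $1: directory, $2: file name, $3: commit message
make_repo() {
  git init -q "$1"
  git -C "$1" symbolic-ref HEAD refs/heads/master
  echo "content" > "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "$3"
}
make_repo "$work/main" main.txt "main: initial commit"
make_repo "$work/common" util.txt "common: initial commit"

# Create a temporary branch in the child repo and rewrite it so that
# every file lives under common/ in every commit.
cd "$work/common"
git checkout -q -b master-moved
git filter-branch --index-filter \
  'git ls-files -s | sed "s-\t\"*-&common/-" |
     GIT_INDEX_FILE=$GIT_INDEX_FILE.new git update-index --index-info &&
   mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"' HEAD > /dev/null

# Add the child repo as a local remote of main, fetch it and merge the rewritten branch.
cd "$work/main"
git remote add common "$work/common"
git fetch -q common
git merge -q --allow-unrelated-histories -m "Merge common into main" common/master-moved

# Check the result: the files sit side by side and the old commit is part of the history.
merged_files="$(git ls-files | sort)"
history_for_util="$(git log --format=%s -- common/util.txt)"
printf '%s\n' "$merged_files" "$history_for_util"
```

If the rehearsal looks right, git log -- common/ in the main repository should list the child repository's commits as if the files had always lived in that directory.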

Special considerations

  1. If you have to run the git filter-branch command more than once for some reason, you will find that the second run refuses to start. The first run saves backup refs under .git/refs/original, which you can use to get back to your original commits from before the rewrite, and git will not overwrite that backup on its own. You can either use a good old rm -rf .git/refs/original or update the script above to run with the --force flag. This tells git that you know what you are doing and that you want git to just do the work.
  2. If this is the first time you do this kind of tricky git work, it might take you some time to plan and execute this merge procedure. In the meantime other developers might be making additional changes to the child repositories in new feature branches. These branches need to be filtered in the same way and rebased onto the earlier master-moved branch. Each one can then be treated as a new feature branch in the “main” repository and be merged in like normal. Make sure you know how to use git rebase correctly.
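As an alternative to the rm -rf mentioned in the first point, the backup refs can also be removed through git itself, using the for-each-ref and update-ref pipeline that the git filter-branch documentation suggests. The sketch below simulates a backup ref in a throwaway repository instead of running a real rewrite; the repository and branch are made up for the demonstration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway identity so the demo does not depend on any global git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master-moved
git commit -q --allow-empty -m "initial commit"

# Simulate the backup ref that a filter-branch run would leave behind.
git update-ref refs/original/refs/heads/master-moved HEAD
backups_before="$(git for-each-ref --format='%(refname)' refs/original/)"

# Delete every ref under refs/original/ through git rather than the file system.
git for-each-ref --format='%(refname)' refs/original/ | xargs -n 1 git update-ref -d
backups_after="$(git for-each-ref --format='%(refname)' refs/original/)"

echo "before: ${backups_before:-none}"
echo "after: ${backups_after:-none}"
```

Going through update-ref should also cover the case where git has packed the refs into .git/packed-refs, which a plain rm -rf of the directory would not touch.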