Git has many features, but it’s likely that you only use a small subset on a daily basis. While git tends to handle things intelligently most of the time, there are situations when doing the most obvious thing doesn’t give git enough information to make informed choices – like merging files in git.
In this post, we’ll examine merging files in git. Through trial and error, I’ll show you the normal approach people take, some of the issues that occur with that approach, and a completely different approach that preserves some data lost in the first approach.
Merging Multiple Files
When dealing with an existing code project, there is a good chance that you’ll need to refactor it at some point. There are many techniques used to refactor code, but no matter which you choose, there’s a decent chance that the process will somehow involve moving pieces of code from multiple files into one or splitting code from one file into multiple files.
Let’s take a look at our example, merging multiple files into one, in the context of a git repository.
Creating Files
We’ll start with a fresh git repository.
$ git init
Then, we’ll create 3 files, each holding artists from different genres of music. To make it interesting, let’s make it so multiple different people make changes to these files.
$ >rap echo Drake
$ git add rap
$ git commit --author "Adi <adi>" -m "create rap"
$ >>rap echo Lil Wayne
$ git commit --author "Brad <brad>" -am "add Lil Wayne"
$ >>rap echo Kanye West
$ git commit --author "Casey <casey>" -am "add Kanye West"
After that, you should get a git blame that looks similar to this:
$ git blame rap
^90ad259 (Adi 2020-05-16 19:55:54 -0500 1) Drake
43b0617f (Brad 2020-05-16 19:59:20 -0500 2) Lil Wayne
92ba9683 (Casey 2020-05-16 19:59:55 -0500 3) Kanye West
The first file is made. Now, we’ll repeat this process for two more genres of music: country and alternative.
$ >country echo George Strait
$ git add country
$ git commit --author "Dallas <dallas>" -m "create country"
$ >>country echo Tim McGraw
$ git commit --author "Eric <eric>" -am "add Tim McGraw"
$ >>country echo Kenny Chesney
$ git commit --author "Gabe <gabe>" -am "add Kenny Chesney"

$ >alternative echo Red Hot Chili Peppers
$ git add alternative
$ git commit --author "Haaris <haaris>" -m "create alternative"
$ >>alternative echo Foo Fighters
$ git commit --author "Jaime <jaime>" -am "add Foo Fighters"
$ >>alternative echo Green Day
$ git commit --author "Keith <keith>" -am "add Green Day"
Again, you should end up with blames that look similar to these:
$ git blame country
8d8c3573 (Dallas 2020-05-16 20:07:06 -0500 1) George Strait
a4261c52 (Eric 2020-05-16 20:07:34 -0500 2) Tim McGraw
224ceb36 (Gabe 2020-05-16 20:08:02 -0500 3) Kenny Chesney

$ git blame alternative
05ad5847 (Haaris 2020-05-16 20:15:13 -0500 1) Red Hot Chili Peppers
d41478b5 (Jaime 2020-05-16 20:15:40 -0500 2) Foo Fighters
26e254a7 (Keith 2020-05-16 20:16:05 -0500 3) Green Day
And that’s it! Our three files are now created.
Merging the Files
Trial One
With the files successfully created, let’s tag our repo and try to merge.
$ git tag premerge
At this point in the process, we realize that three different files are a lot to handle and that all we really need is one. So, we decide to create a single file called “music,” which will contain all of the data from the rap, country, and alternative files combined. Here is a straightforward way to do that:
$ cat rap country alternative > music
$ git rm rap country alternative
$ git add music
$ git commit --author "Lauren <lauren>" -m "merge"
Seems simple enough and fairly sound, but let’s see what we maintain out of the commit information we worked so hard to create.
$ git blame music
bc43cfcc (Lauren 2020-05-16 20:27:26 -0500 1) Drake
bc43cfcc (Lauren 2020-05-16 20:27:26 -0500 2) Lil Wayne
bc43cfcc (Lauren 2020-05-16 20:27:26 -0500 3) Kanye West
bc43cfcc (Lauren 2020-05-16 20:27:26 -0500 4) George Strait
bc43cfcc (Lauren 2020-05-16 20:27:26 -0500 5) Tim McGraw
bc43cfcc (Lauren 2020-05-16 20:27:26 -0500 6) Kenny Chesney
bc43cfcc (Lauren 2020-05-16 20:27:26 -0500 7) Red Hot Chili Peppers
bc43cfcc (Lauren 2020-05-16 20:27:26 -0500 8) Foo Fighters
bc43cfcc (Lauren 2020-05-16 20:27:26 -0500 9) Green Day
Error
And now we find a problem. Even though we know a number of people were involved in creating the data in the file, it looks like Lauren did all of the work.
If you’ve ever been on a development team, you know this won’t fly. Let’s take a look at a different way to merge these files.
Trial Two
First, we’ll reset. Then, for our second attempt, let’s try only merging two files together.
$ git reset --hard premerge
$ cat rap country > music
$ git add music
$ git rm rap country
$ git commit --author "Lauren <lauren>" -m "merge 2"
Things will look a little bit better this time around, but not quite up to the standard we’re looking for.
$ git blame music
00242b52 music (Lauren 2020-05-16 20:31:41 -0500 1) Drake
00242b52 music (Lauren 2020-05-16 20:31:41 -0500 2) Lil Wayne
00242b52 music (Lauren 2020-05-16 20:31:41 -0500 3) Kanye West
8d8c3573 country (Dallas 2020-05-16 20:07:06 -0500 4) George Strait
a4261c52 country (Eric 2020-05-16 20:07:34 -0500 5) Tim McGraw
224ceb36 country (Gabe 2020-05-16 20:08:02 -0500 6) Kenny Chesney
Error
This time, we see that the history of the country file was preserved – great! However, the rap file’s history wasn’t maintained – not great.
What happened here is that git saw two files disappear and one file appear, and its rename logic was activated. Basically, from git’s perspective, all we did was rename the country file to “music,” delete the rap file, and add three new rows to the top of the music (formerly country) file.
To get around this, there are several options that we could pass to the blame command that will give us the results we want (specifically, messing with the -M and -C options should do it). However, there are still problems with this approach.
The biggest of these problems is that oftentimes, we can’t control how the blame command is run. If an IDE extension is doing the work or if it’s being done through a web interface, we probably won’t be able to control what options are used.
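For reference, here is a rough sketch of what passing -C to blame might look like for Trial Two. It rebuilds the two files in a scratch repository, so the temporary directory, the "Demo" committer identity, and the add_line helper are all assumptions made for this example. The -C5 threshold is also an assumption: git-blame documents a default of 40 alphanumeric characters for -C, which is likely too high for lines as short as ours.

```shell
#!/bin/sh
# Sketch: rebuild the Trial Two repository in a scratch directory, then
# re-run blame with copy detection enabled.
set -eu

repo=$(mktemp -d)             # hypothetical throwaway location
cd "$repo"
git init -q
git config user.name "Demo"   # committer identity assumed for this sketch
git config user.email "demo@example.com"

# add_line FILE AUTHOR TEXT: append one line and commit it as AUTHOR.
# (Hypothetical helper, not part of the original walkthrough.)
add_line() {
    printf '%s\n' "$3" >>"$1"
    git add "$1"
    git commit -q --author "$2 <$2>" -m "add $3"
}

add_line rap Adi "Drake"
add_line rap Brad "Lil Wayne"
add_line rap Casey "Kanye West"
add_line country Dallas "George Strait"
add_line country Eric "Tim McGraw"
add_line country Gabe "Kenny Chesney"

# The Trial Two merge: concatenate, then drop the originals.
cat rap country >music
git add music
git rm -q rap country
git commit -q --author "Lauren <lauren>" -m "merge 2"

# -C also looks for lines copied from other files changed in the same
# commit (here, the deleted rap file). The threshold of 5 is an assumption
# chosen because the sample lines are very short.
blame_out=$(git blame -C5 music)
printf '%s\n' "$blame_out"
```

Even if this gives the attribution we want on the command line, it only helps people who know to pass the option, which is exactly the limitation described above.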
Trial Three
So, how can we get blame to show the correct history? Let’s reset and try something else.
$ git reset --hard premerge
This time around, what if we try making our changes in separate branches and then merging?
$ git checkout -b rename-country
$ git mv country music
$ git commit --author "Lauren <lauren>" -m "rename country to music"
$ git checkout -
$ git mv rap music
$ git commit --author "Lauren <lauren>" -m "rename rap to music"
$ git merge -m "combine rap and country to music" rename-country
At first glance, it doesn’t appear to work; git doesn’t like this either.
CONFLICT (rename/rename): Rename rap->music in HEAD. Rename country->music in rename-country
Auto-merging music
Automatic merge failed; fix conflicts and then commit the result.
However, there is good news: we can definitely fix this. All we need to do is concatenate the two versions of the music file, one from each side of the merge.
$ git cat-file --filters HEAD:music >music
$ git cat-file --filters rename-country:music >>music
$ git add music
$ git merge --continue
Now, after saving our merge message, let’s check out the blame on our new music file.
$ git blame music
^90ad259 rap (Adi 2020-05-16 19:55:54 -0500 1) Drake
43b0617f rap (Brad 2020-05-16 19:59:20 -0500 2) Lil Wayne
92ba9683 rap (Casey 2020-05-16 19:59:55 -0500 3) Kanye West
8d8c3573 country (Dallas 2020-05-16 20:07:06 -0500 4) George Strait
a4261c52 country (Eric 2020-05-16 20:07:34 -0500 5) Tim McGraw
224ceb36 country (Gabe 2020-05-16 20:08:02 -0500 6) Kenny Chesney
Great! This is what we wanted to preserve.
An important note: you’ll want to make sure that you don’t change the original contents of the files in this step. Otherwise, git will have to use its “similar file” logic instead of its “rename” logic. The similar file logic is more complex, and it’s also harder to predict what the resulting outcome will look like.
So, since this approach worked to merge two files, we should just be able to follow the same logic to merge all three into one, right?
(Make sure to delete the branches we created earlier, such as rename-country, before running this!)
$ git reset --hard premerge
$ git checkout -b rename-rap
$ git mv rap music
$ git commit --author "Lauren <lauren>" -m "rename rap to music"
$ git checkout -
$ git checkout -b rename-country
$ git mv country music
$ git commit --author "Lauren <lauren>" -m "rename country to music"
$ git checkout -
$ git checkout -b rename-alternative
$ git mv alternative music
$ git commit --author "Lauren <lauren>" -m "rename alternative to music"
$ git checkout -
$ git merge rename-rap rename-country rename-alternative
Nope! Our result indicates that we are wrong … yet again.
Fast-forwarding to: rename-rap
Trying simple merge with rename-country
Simple merge did not work, trying automatic merge.
Added music in both, but differently.
fatal: unable to read blob object e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
error: Could not stat : No such file or directory
ERROR: content conflict in music
fatal: merge program failed
Automated merge did not work.
Should not be doing an octopus.
Merge with strategy octopus failed.
Error
It blew up, but why? You may notice the new term octopus here. As the git-merge documentation says, octopus is the default strategy used by git for merges with more than one branch. However, octopus can only be used in cases where there are no conflicts (hence the “should not be doing an octopus” message above).
Trial Four
So from here, let’s try a more complex strategy that will also preserve our commit history. We still have our 3 branches with the renamed music file, and we’ll still need them for the upcoming steps. In this attempt, let’s create the merge manually and then have git create the tree that would be created with a successful merge.
$ git reset --hard
$ cat rap country alternative >music
$ git add music
$ git rm rap country alternative
$ git write-tree
You may notice the command write-tree. More information about write-tree can be found in the git documentation, but basically, this command creates the tree object that would have been created if a commit were made at this point, without actually making the commit. The command prints a hash that we’ll use in the next step, where we’ll create the merge that we want.
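To make the write-tree step a little more concrete, here is a minimal sketch that captures the hash in a shell variable and inspects the resulting tree. The scratch directory, the "Demo" identity, the single setup commit, and the variable name tree are assumptions for illustration only.

```shell
#!/bin/sh
# Sketch: stage the combined file, write the index out as a tree object,
# and look at what that tree contains.
set -eu

repo=$(mktemp -d)             # hypothetical throwaway location
cd "$repo"
git init -q
git config user.name "Demo"   # committer identity assumed for this sketch
git config user.email "demo@example.com"

# A single setup commit stands in for the nine commits made earlier.
printf 'Drake\n' >rap
printf 'George Strait\n' >country
printf 'Red Hot Chili Peppers\n' >alternative
git add rap country alternative
git commit -q -m "create genre files"

# Stage the merged result, exactly as in the walkthrough.
cat rap country alternative >music
git add music
git rm -q rap country alternative

# write-tree prints the hash of a tree object built from the index.
tree=$(git write-tree)
echo "tree: $tree"
git ls-tree --name-only "$tree"    # the staged state: only music

# No commit was made, so HEAD still lists the three original files.
git ls-tree --name-only HEAD
```

Keeping the hash in a variable avoids copying it by hand into the next command.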
$ git commit-tree <tree-hash> -p HEAD -p rename-rap -p rename-country -p rename-alternative -m "combine rap, country, and alternative into music"
So, what exactly are we telling git to do here? Well, we’re telling it to …
- Create a commit whose contents are the tree we just wrote (the one holding the music file produced by the cat command), with HEAD as its first parent,
- Add rename-rap as another parent, so the history of its music file is reachable from the new commit,
- Do the same for rename-country’s music file,
- And finally, again for rename-alternative’s music file.
The command from above will also return a hash that we will use in the final step.
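Since both write-tree and commit-tree only print hashes, one way to string them together is with shell variables. The following is a sketch of the whole Trial Four sequence under some simplifying assumptions: a scratch repository, a "Demo" committer identity, one line and one commit per file instead of three, and variable names (main, tree, commit) that are ours rather than anything git requires.

```shell
#!/bin/sh
# Sketch: build a four-parent commit by hand and fast-forward to it.
set -eu

repo=$(mktemp -d)             # hypothetical throwaway location
cd "$repo"
git init -q
git config user.name "Demo"   # committer identity assumed for this sketch
git config user.email "demo@example.com"

# One commit per file keeps the sketch short; the article uses three each.
for pair in rap:Adi country:Dallas alternative:Haaris; do
    file=${pair%%:*}
    author=${pair##*:}
    echo "$file artist" >"$file"
    git add "$file"
    git commit -q --author "$author <$author>" -m "create $file"
done
main=$(git symbolic-ref --short HEAD)   # master or main, depending on config

# One branch per rename, each starting from the same commit.
for file in rap country alternative; do
    git checkout -q -b "rename-$file" "$main"
    git mv "$file" music
    git commit -q -m "rename $file to music"
done
git checkout -q "$main"

# Stage the combined result without committing it.
cat rap country alternative >music
git add music
git rm -q rap country alternative

# Capture both hashes instead of copying them by hand.
tree=$(git write-tree)
commit=$(git commit-tree "$tree" \
    -p HEAD -p rename-rap -p rename-country -p rename-alternative \
    -m "combine rap, country, and alternative into music")

git merge -q --ff-only "$commit"
git log -1 --format='parents: %p'
git blame music
```

The --ff-only flag is an addition here; it simply makes the script stop rather than create an extra merge commit if a fast-forward turns out not to be possible.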
Now, we can perform our merge.
$ git merge <commit-hash>
Updating 26e254a..b656522
Fast-forward
 alternative | 3 ---
 country | 3 ---
 music | 9 +++++++++
 rap | 3 ---
 4 files changed, 9 insertions(+), 9 deletions(-)
 delete mode 100644 alternative
 delete mode 100644 country
 create mode 100644 music
 delete mode 100644 rap

$ git blame music
^90ad259 rap (Adi 2020-05-16 19:55:54 -0500 1) Drake
43b0617f rap (Brad 2020-05-16 19:59:20 -0500 2) Lil Wayne
92ba9683 rap (Casey 2020-05-16 19:59:55 -0500 3) Kanye West
8d8c3573 country (Dallas 2020-05-16 20:07:06 -0500 4) George Strait
a4261c52 country (Eric 2020-05-16 20:07:34 -0500 5) Tim McGraw
224ceb36 country (Gabe 2020-05-16 20:08:02 -0500 6) Kenny Chesney
05ad5847 alternative (Haaris 2020-05-16 20:15:13 -0500 7) Red Hot Chili Peppers
d41478b5 alternative (Jaime 2020-05-16 20:15:40 -0500 8) Foo Fighters
26e254a7 alternative (Keith 2020-05-16 20:16:05 -0500 9) Green Day
Success!
Finally, after four tries, success! We were able to merge three files into one and maintain all the history from the original files.
I suppose we could also accomplish the three or more file merge by merging two files and then merging the next file into that file and so forth. However, this method gives us a clean history while still successfully merging multiple files.
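For comparison, here is a rough sketch of that pairwise alternative: two ordinary merges, each resolved the same way as in Trial Three. It assumes a scratch repository, a "Demo" committer identity, one-line files, and a hypothetical merge_into_music helper. The exact conflict message may differ between git versions, so the script only relies on the merge stopping.

```shell
#!/bin/sh
# Sketch: fold three files into one with two ordinary two-way merges.
set -eu

repo=$(mktemp -d)             # hypothetical throwaway location
cd "$repo"
git init -q
git config user.name "Demo"   # committer identity assumed for this sketch
git config user.email "demo@example.com"

# One line per file keeps this short; the article's files have three each.
for pair in rap:Adi country:Dallas alternative:Haaris; do
    file=${pair%%:*}
    author=${pair##*:}
    echo "$file artist" >"$file"
    git add "$file"
    git commit -q --author "$author <$author>" -m "create $file"
done
main=$(git symbolic-ref --short HEAD)   # master or main, depending on config

# Rename country and alternative on side branches, rap on the main branch.
for file in country alternative; do
    git checkout -q -b "rename-$file" "$main"
    git mv "$file" music
    git commit -q -m "rename $file to music"
done
git checkout -q "$main"
git mv rap music
git commit -q -m "rename rap to music"

# merge_into_music BRANCH: merge BRANCH, then resolve the expected conflict
# by appending BRANCH's music to ours. (Hypothetical helper.)
merge_into_music() {
    git merge -q -m "combine music with $1" "$1" || true   # conflict expected
    git cat-file --filters HEAD:music >music
    git cat-file --filters "$1:music" >>music
    git add music
    git commit -q --no-edit    # concludes the merge with the prepared message
}

merge_into_music rename-country
merge_into_music rename-alternative

git log --oneline --graph
git blame music
```

This leaves one merge commit per extra file in the log, which is the clutter the single commit-tree merge avoids.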
$ git log --pretty=oneline
b656522165b7b447723b17417d25e37d5a19c829 (HEAD -> master) combine rap, country, and alternative into music
ede97e4387b6248218651a08ab3ab1138b943baf (rename-alternative) rename alternative to music
28d6fc2cae3035e782bb5eff9100b690ea6418b3 (rename-country) rename country to music
c65626c1e5b31e20cc3854bd04ac05ee1a47c6b7 (rename-rap) rename rap to music
26e254a71ffd594d622d4e34d448ea286ad9e939 (tag: premerge) add Green Day
d41478b5c9d40e27c530b1c57ca21e79af67d143 add Foo Fighters
05ad58479fb3787bdfc502b9db182a255237dc51 create alternative
224ceb3663fa6155ee67d0cd969b009d7df7c4b5 add Kenny Chesney
a4261c5292860edcdb89393b6cc412e465c38b32 add Tim McGraw
8d8c35732f9b4730ce1fc83547e0d13c1249104d create country
92ba9683f09226e249e0965c1801ccfba292e5e7 add Kanye West
43b0617f80f58fca4b8010ec928d059cacc22ae7 add Lil Wayne
90ad259ded8bff5046a81226abc75dc1c05c168b create rap
Refactoring can be a long and tedious process, and it’s easy to get blamed for just moving code written by others around in circles. Hopefully, this will help you to avoid that and will make life a little easier for those of you currently working on a refactor.