A Better Approach to Merging Files in Git

Brice McIver

Git has many features, but it’s likely that you only use a small subset on a daily basis. While git tends to handle things intelligently most of the time, there are situations when doing the most obvious thing doesn’t give git enough information to make informed choices. Merging several files into one is one of those situations.

In this post, we’ll examine merging files in git. Through trial and error, I’ll show you the normal approach people take, some of the issues that occur with that approach, and a completely different approach that preserves some data lost in the first approach.

Merging Multiple Files

When dealing with an existing code project, there is a good chance that you’ll need to refactor it at some point. There are many refactoring techniques, but whichever you choose, the process will likely involve moving pieces of code from multiple files into one or splitting code from one file into multiple files.

Let’s take a look at our example, merging multiple files into one, in the context of a git repository.

Creating Files

We’ll start with a fresh git repository.

$ git init

Then, we’ll create 3 files, each holding artists from a different genre of music. To make it interesting, let’s have several different people make changes to these files. (Writing the redirection first, as in >rap echo Drake, is the same as writing echo Drake >rap.)

$ >rap echo Drake
$ git add rap
$ git commit --author "Adi <adi>" -m "create rap"
$ >>rap echo Lil Wayne
$ git commit --author "Brad <brad>" -am "add Lil Wayne"
$ >>rap echo Kanye West
$ git commit --author "Casey <casey>" -am "add Kanye West"

After that, you should get a git blame that looks similar to this:

$ git blame rap
^90ad259 (Adi   2020-05-16 19:55:54 -0500 1) Drake
43b0617f (Brad  2020-05-16 19:59:20 -0500 2) Lil Wayne
92ba9683 (Casey 2020-05-16 19:59:55 -0500 3) Kanye West

The first file is made. Now, we’ll repeat this process for two more genres of music: country and alternative.

$ >country echo George Strait
$ git add country
$ git commit --author "Dallas <dallas>" -m "create country"
$ >>country echo Tim McGraw
$ git commit --author "Eric <eric>" -am "add Tim McGraw"
$ >>country echo Kenny Chesney
$ git commit --author "Gabe <gabe>" -am "add Kenny Chesney"
$ >alternative echo Red Hot Chili Peppers
$ git add alternative
$ git commit --author "Haaris <haaris>" -m "create alternative"
$ >>alternative echo Foo Fighters
$ git commit --author "Jaime <jaime>" -am "add Foo Fighters"
$ >>alternative echo Green Day
$ git commit --author "Keith <keith>" -am "add Green Day"

Again, you should end up with blames that look similar to these:

$ git blame country
8d8c3573 (Dallas 2020-05-16 20:07:06 -0500 1) George Strait
a4261c52 (Eric   2020-05-16 20:07:34 -0500 2) Tim McGraw
224ceb36 (Gabe   2020-05-16 20:08:02 -0500 3) Kenny Chesney
$ git blame alternative
05ad5847 (Haaris 2020-05-16 20:15:13 -0500 1) Red Hot Chili Peppers
d41478b5 (Jaime  2020-05-16 20:15:40 -0500 2) Foo Fighters
26e254a7 (Keith  2020-05-16 20:16:05 -0500 3) Green Day

And that’s it! Our three files are now created.

Merging the Files

Trial One

With the files successfully created, let’s tag our repo and try to merge.

$ git tag premerge

At this point in the process, we realize that three different files are a lot to handle and that all we really need is one. So, we decide to create a single file called “music,” which will contain all of the data from the rap, country, and alternative files combined. Here is a straightforward way to do that:

$ cat rap country alternative > music
$ git rm rap country alternative
$ git add music
$ git commit --author "Lauren <lauren>" -m "merge"

Seems simple enough and fairly sound, but let’s see what we maintain out of the commit information we worked so hard to create.

$ git blame music
bc43cfcc (Lauren 2020-05-16 20:27:26 -0500 1) Drake
bc43cfcc (Lauren 2020-05-16 20:27:26 -0500 2) Lil Wayne
bc43cfcc (Lauren 2020-05-16 20:27:26 -0500 3) Kanye West
bc43cfcc (Lauren 2020-05-16 20:27:26 -0500 4) George Strait
bc43cfcc (Lauren 2020-05-16 20:27:26 -0500 5) Tim McGraw
bc43cfcc (Lauren 2020-05-16 20:27:26 -0500 6) Kenny Chesney
bc43cfcc (Lauren 2020-05-16 20:27:26 -0500 7) Red Hot Chili Peppers
bc43cfcc (Lauren 2020-05-16 20:27:26 -0500 8) Foo Fighters
bc43cfcc (Lauren 2020-05-16 20:27:26 -0500 9) Green Day

Error

And now we find a problem. Even though we know a number of people were involved in creating the data in the file, it looks like Lauren did all of the work.
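
One way to make this loss visible without reading the blame line by line is to tally the authors it reports. The sketch below rebuilds the same three files in a throwaway repository and then performs the single-commit merge. The add_line helper, the temporary directory, and the Demo committer identity are assumptions made for illustration, not part of the original walkthrough.

```shell
# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo" && git config user.email "demo@example.com"

# Hypothetical helper: append one line to a file and commit it as the given author.
add_line() {
  echo "$3" >>"$1"
  git add "$1"
  git commit -q --author "$2 <$2>" -m "add $3"
}

add_line rap Adi "Drake"
add_line rap Brad "Lil Wayne"
add_line rap Casey "Kanye West"
add_line country Dallas "George Strait"
add_line country Eric "Tim McGraw"
add_line country Gabe "Kenny Chesney"
add_line alternative Haaris "Red Hot Chili Peppers"
add_line alternative Jaime "Foo Fighters"
add_line alternative Keith "Green Day"

# The straightforward merge from Trial One.
cat rap country alternative >music
git rm -q rap country alternative
git add music
git commit -q --author "Lauren <lauren>" -m "merge"

# Count how many lines blame assigns to each author.
summary=$(git blame --line-porcelain music | sed -n 's/^author //p' | sort | uniq -c)
echo "$summary"
```

With the file contents used in this post, the tally should list a single author, Lauren, for all nine lines.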

If you’ve ever been on a development team, you know this won’t fly. Let’s take a look at a different way to merge these files.

Trial Two

First, we’ll reset. Then, for our second attempt, let’s try only merging two files together.

$ git reset --hard premerge
$ cat rap country > music
$ git add music
$ git rm rap country
$ git commit --author "Lauren <lauren>" -m "merge 2"

Things will look a little bit better this time around, but not quite up to the standard we’re looking for.

$ git blame music
00242b52 music   (Lauren 2020-05-16 20:31:41 -0500 1) Drake
00242b52 music   (Lauren 2020-05-16 20:31:41 -0500 2) Lil Wayne
00242b52 music   (Lauren 2020-05-16 20:31:41 -0500 3) Kanye West
8d8c3573 country (Dallas 2020-05-16 20:07:06 -0500 4) George Strait
a4261c52 country (Eric   2020-05-16 20:07:34 -0500 5) Tim McGraw
224ceb36 country (Gabe   2020-05-16 20:08:02 -0500 6) Kenny Chesney

Error

This time, we see that the history of the country file was preserved – great! However, the rap file’s history wasn’t maintained – not great.

What happened here is that git saw two files disappear and one file appear, and its rename logic was activated. Basically, from git’s perspective, all we did was rename the country file to “music,” delete the rap file, and add three new rows to the top of the music (formerly country) file.
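
You can ask git directly which rename it inferred. The following is a minimal sketch, assuming a throwaway repository that holds only the rap and country files. The add_line helper and the Demo identity are made up for the example. It repeats the two-file merge and prints the summary of a rename-aware diff.

```shell
# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo" && git config user.email "demo@example.com"

# Hypothetical helper: append one line to a file and commit it as the given author.
add_line() {
  echo "$3" >>"$1"
  git add "$1"
  git commit -q --author "$2 <$2>" -m "add $3"
}

add_line rap Adi "Drake"
add_line rap Brad "Lil Wayne"
add_line rap Casey "Kanye West"
add_line country Dallas "George Strait"
add_line country Eric "Tim McGraw"
add_line country Gabe "Kenny Chesney"
git tag premerge

# The two-file merge from Trial Two.
cat rap country >music
git add music
git rm -q rap country
git commit -q --author "Lauren <lauren>" -m "merge 2"

# -M turns on rename detection; --summary lists renames, creations, and deletions.
renames=$(git diff -M --summary premerge HEAD)
echo "$renames"
```

The summary should report country as renamed to music, with a similarity percentage, and rap as simply deleted. The country file wins because it makes up more than half of the new file, which is what git’s default rename threshold requires.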

To get around this, there are several options that we could pass to the blame command that will give us the results we want (specifically, the -M and -C options, which detect lines moved within a file and lines moved or copied between files, should do it). However, there are still problems with this approach.

The biggest of these problems is that oftentimes, we can’t control how the blame command is run. If an IDE extension is doing the work or if it’s being done through a web interface, we probably won’t be able to control what options are used.
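
For completeness, here is a hedged sketch of what tuning those options could look like. According to the git-blame documentation, -C accepts an optional number: the minimum count of alphanumeric characters that must be detected as copied, 40 by default. The three rap lines in this example are shorter than that, so the sketch lowers the bound to 10. The add_line helper, the throwaway repository, and the value 10 are assumptions chosen for this example.

```shell
# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo" && git config user.email "demo@example.com"

# Hypothetical helper: append one line to a file and commit it as the given author.
add_line() {
  echo "$3" >>"$1"
  git add "$1"
  git commit -q --author "$2 <$2>" -m "add $3"
}

add_line rap Adi "Drake"
add_line rap Brad "Lil Wayne"
add_line rap Casey "Kanye West"
add_line country Dallas "George Strait"
add_line country Eric "Tim McGraw"
add_line country Gabe "Kenny Chesney"

# The two-file merge from Trial Two.
cat rap country >music
git add music
git rm -q rap country
git commit -q --author "Lauren <lauren>" -m "merge 2"

# Distinct authors reported by a default blame and by a copy-aware blame.
plain=$(git blame --line-porcelain music | sed -n 's/^author //p' | sort -u | paste -sd ' ' -)
copies=$(git blame -C10 --line-porcelain music | sed -n 's/^author //p' | sort -u | paste -sd ' ' -)
echo "default blame: $plain"
echo "blame -C10:    $copies"
```

The copy-aware run should credit Adi, Brad, and Casey again, but only for someone who knows to pass the option and to lower its threshold, which is exactly the control we often don’t have.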

Trial Three

So, how can we get blame to show the correct history? Let’s reset and try something else.

$ git reset --hard premerge

This time around, what if we try making our changes in separate branches and then merging?

$ git checkout -b rename-country
$ git mv country music
$ git commit --author "Lauren <lauren>" -m "rename country to music"
$ git checkout -
$ git mv rap music
$ git commit --author "Lauren <lauren>" -m "rename rap to music"
$ git merge -m "combine rap and country to music" rename-country

At first glance, it doesn’t appear to work; git doesn’t like this either.

CONFLICT (rename/rename): Rename rap->music in HEAD. Rename country->music in rename-country
Auto-merging music
Automatic merge failed; fix conflicts and then commit the result.

However, there is good news: we can definitely fix this. All we need to do is concatenate the two versions of the music file, one from each side of the merge.

$ git cat-file --filters HEAD:music >music
$ git cat-file --filters rename-country:music >>music
$ git add music
$ git merge --continue

Now, after saving our merge message, let’s check out the blame on our new music file.

$ git blame music
^90ad259 rap     (Adi    2020-05-16 19:55:54 -0500 1) Drake
43b0617f rap     (Brad   2020-05-16 19:59:20 -0500 2) Lil Wayne
92ba9683 rap     (Casey  2020-05-16 19:59:55 -0500 3) Kanye West
8d8c3573 country (Dallas 2020-05-16 20:07:06 -0500 4) George Strait
a4261c52 country (Eric   2020-05-16 20:07:34 -0500 5) Tim McGraw
224ceb36 country (Gabe   2020-05-16 20:08:02 -0500 6) Kenny Chesney

Great! This is what we wanted to preserve.

An important note: you’ll want to make sure that you don’t change the original contents of the files in this step. Otherwise, git will have to use its “similar file” logic instead of its “rename” logic. The similar file logic is more complex, and it’s also harder to predict what the resulting outcome will look like.

So, since this approach worked to merge two files, we should just be able to follow the same logic to merge all three into one, right?

(Make sure to delete the branch we created earlier, for example with git branch -D rename-country, before completing this!)

$ git reset --hard premerge
$ git checkout -b rename-rap
$ git mv rap music
$ git commit --author "Lauren <lauren>" -m "rename rap to music"
$ git checkout -
$ git checkout -b rename-country
$ git mv country music
$ git commit --author "Lauren <lauren>" -m "rename country to music"
$ git checkout -
$ git checkout -b rename-alternative
$ git mv alternative music
$ git commit --author "Lauren <lauren>" -m "rename alternative to music"
$ git checkout -
$ git merge rename-rap rename-country rename-alternative

Nope! Our result indicates that we are wrong … yet again.

Fast-forwarding to: rename-rap
Trying simple merge with rename-country
Simple merge did not work, trying automatic merge.
Added music in both, but differently.
fatal: unable to read blob object e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
error: Could not stat : No such file or directory
ERROR: content conflict in music
fatal: merge program failed
Automated merge did not work.
Should not be doing an octopus.
Merge with strategy octopus failed.

Error

It blew up, but why? You may notice the new term octopus here. As the git-merge documentation says, octopus is the default strategy used by git for merges with more than one branch. However, octopus can only be used in cases where there are no conflicts (hence the “should not be doing an octopus” message above).

Trial Four

So from here, let’s try a more complex strategy that will also preserve our commit history. We still have our 3 branches with the renamed music file, and we’ll still need them for the upcoming steps. In this attempt, let’s create the merge manually and then have git create the tree that would be created with a successful merge.

$ git reset --hard
$ cat rap country alternative >music
$ git add music
$ git rm rap country alternative
$ git write-tree

You may notice the command write-tree. More information about write-tree can be found in the git documentation, but basically, this command writes the current index out as a tree object, the same tree that a commit made at this point would record. The command prints a hash that we’ll use in the next step, where we’ll create the merge that we want.

$ git commit-tree <tree-hash> -p HEAD -p rename-rap -p rename-country -p rename-alternative -m "combine rap, country, and alternative into music"

So, what exactly are we telling git to do here? Well, we’re telling it to …

  1. Create a commit from the tree we just wrote (the one holding the music file we created with the cat command), with HEAD as its first parent,
  2. Add rename-rap as a parent, so the history of its music file (formerly rap) is reachable from the new commit,
  3. Do the same for rename-country’s music file,
  4. And finally, again for rename-alternative’s music file.

The command from above will also return a hash that we will use in the final step.

Now, we can perform our merge.

$ git merge <commit-hash>
Updating 26e254a..b656522
Fast-forward
 alternative | 3 ---
 country     | 3 ---
 music       | 9 +++++++++
 rap         | 3 ---
 4 files changed, 9 insertions(+), 9 deletions(-)
 delete mode 100644 alternative
 delete mode 100644 country
 create mode 100644 music
 delete mode 100644 rap
$ git blame music
^90ad259 rap         (Adi    2020-05-16 19:55:54 -0500 1) Drake
43b0617f rap         (Brad   2020-05-16 19:59:20 -0500 2) Lil Wayne
92ba9683 rap         (Casey  2020-05-16 19:59:55 -0500 3) Kanye West
8d8c3573 country     (Dallas 2020-05-16 20:07:06 -0500 4) George Strait
a4261c52 country     (Eric   2020-05-16 20:07:34 -0500 5) Tim McGraw
224ceb36 country     (Gabe   2020-05-16 20:08:02 -0500 6) Kenny Chesney
05ad5847 alternative (Haaris 2020-05-16 20:15:13 -0500 7) Red Hot Chili Peppers
d41478b5 alternative (Jaime  2020-05-16 20:15:40 -0500 8) Foo Fighters
26e254a7 alternative (Keith  2020-05-16 20:16:05 -0500 9) Green Day
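
The hashes don’t have to be copied by hand. The sketch below runs the whole of Trial Four in a throwaway repository and captures the output of write-tree and commit-tree in shell variables. The add_line helper, the loop over genres, and the variable names are assumptions for illustration. --ff-only is added so that the merge stops rather than creating yet another merge commit if a fast-forward turns out not to be possible.

```shell
# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo" && git config user.email "demo@example.com"

# Hypothetical helper: append one line to a file and commit it as the given author.
add_line() {
  echo "$3" >>"$1"
  git add "$1"
  git commit -q --author "$2 <$2>" -m "add $3"
}

add_line rap Adi "Drake"
add_line rap Brad "Lil Wayne"
add_line rap Casey "Kanye West"
add_line country Dallas "George Strait"
add_line country Eric "Tim McGraw"
add_line country Gabe "Kenny Chesney"
add_line alternative Haaris "Red Hot Chili Peppers"
add_line alternative Jaime "Foo Fighters"
add_line alternative Keith "Green Day"
main=$(git symbolic-ref --short HEAD)

# One branch per original file, each containing only a pure rename to music.
for genre in rap country alternative; do
  git checkout -q -b "rename-$genre" "$main"
  git mv "$genre" music
  git commit -q --author "Lauren <lauren>" -m "rename $genre to music"
done
git checkout -q "$main"

# Stage the combined file, then capture the tree and commit hashes in variables.
cat rap country alternative >music
git add music
git rm -q rap country alternative
tree=$(git write-tree)
commit=$(git commit-tree "$tree" -p HEAD -p rename-rap -p rename-country \
  -p rename-alternative -m "combine rap, country, and alternative into music")
git merge -q --ff-only "$commit"

# rev-list --parents prints the commit hash followed by each parent hash.
parents=$(git rev-list --parents -n 1 HEAD | wc -w | tr -d ' ')
authors=$(git blame --line-porcelain music | sed -n 's/^author //p' | paste -sd ' ' -)
echo "hashes on the merge commit line: $parents"
echo "blame authors in line order: $authors"
```

The first number counts the new commit plus its four parents, and the author list should name the nine original contributors in file order.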

Success!

Finally, after four tries, success! We were able to merge three files into one and maintain all the history from the original files.

I suppose we could also accomplish the three or more file merge by merging two files and then merging the next file into that file and so forth. However, this method gives us a clean history while still successfully merging multiple files.

$ git log --pretty=oneline
b656522165b7b447723b17417d25e37d5a19c829 (HEAD -> master) combine rap, country, and alternative into music
ede97e4387b6248218651a08ab3ab1138b943baf (rename-alternative) rename alternative to music
28d6fc2cae3035e782bb5eff9100b690ea6418b3 (rename-country) rename country to music
c65626c1e5b31e20cc3854bd04ac05ee1a47c6b7 (rename-rap) rename rap to music
26e254a71ffd594d622d4e34d448ea286ad9e939 (tag: premerge) add Green Day
d41478b5c9d40e27c530b1c57ca21e79af67d143 add Foo Fighters
05ad58479fb3787bdfc502b9db182a255237dc51 create alternative
224ceb3663fa6155ee67d0cd969b009d7df7c4b5 add Kenny Chesney
a4261c5292860edcdb89393b6cc412e465c38b32 add Tim McGraw
8d8c35732f9b4730ce1fc83547e0d13c1249104d create country
92ba9683f09226e249e0965c1801ccfba292e5e7 add Kanye West
43b0617f80f58fca4b8010ec928d059cacc22ae7 add Lil Wayne
90ad259ded8bff5046a81226abc75dc1c05c168b create rap

Refactoring can be a long and tedious process, and it’s easy to get blamed for just moving code written by others around in circles. Hopefully, this will help you to avoid that and will make life a little easier for those of you currently working on a refactor.
