How to quickly carve out a subdirectory of a git repository and use it as a git submodule
In this Git tutorial, you will learn about git submodules and the five simple steps you need to carve out a subdirectory from one git repository and reuse it as a git submodule.
Just recently, I developed a plugin for a large, open-source home automation platform called Home Assistant. My goal was to release the plugin as a standalone git repository so that it easily integrates with other people’s home automation workflows. Shipping the plugin as an isolated piece of code was a good choice, but during development, I needed the entire Home Assistant codebase to be wrapped around my plugin code for debugging, testing, etc.
Git submodules sounded like a perfect match for separating and integrating code at the same time, but I was overlooking one critical caveat: a git submodule always pulls in the entire repository it links to, starting at that repository’s root folder. In my case, however, the root folder wasn’t where my plugin’s source code was sitting. Instead, I wanted the submodule to carve out only one specific subfolder from my plugin repository.
On Stack Overflow, I found two solutions for my subfolder problem, but I wasn’t happy with either of them. One proposed creating a symbolic link (aka symlink) from the root folder to where my source code was sitting. Since I was developing inside a Docker container, though, symlinks weren’t an option for me. The other solution suggested carving out a subfolder using git filter-branch, but I quickly decided not to go down that path. Git filter-branch can cause a lot of painful hiccups, and even git’s official documentation advises against using it.
Hence, I spent countless hours looking for other solutions and finally figured out a surprisingly simple one. It just takes the following five steps to carve out a subdirectory and use it as a git submodule:
- Make sure that you’re at the root folder of your master branch (or whatever your git repository’s main branch is called) and temporarily remove all files from the git index (the staging area) with the git command below. The --cached flag is crucial: it tells git to remove the files only from the index and to leave them untouched on disk.
~$ git rm -rf --cached ./
- Remain on the master branch and move files and folders around so that the source code sits at the root of the git repository. The file structure should now resemble the target file structure for your submodule, but once again, DO NOT DELETE ANY FILES!
- Add ONLY THE SOURCE CODE FILES back to the git index with the following git command:
~$ git add <filename>
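- Now, with all the files moved around and the source code files added back to the git index, check out a new branch off of the master branch (I’ll refer to this new branch as the “submodule” branch going forward) and commit the restructured files to it. Without this commit, the submodule branch would simply point at the same commit as master.
~$ git checkout -b submodule
~$ git commit -m "<your commit message>"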
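- Right after creating the submodule branch, switch back to the master branch, make sure all files are back in their original locations, add ALL files back to the git index and commit any remaining changes to the master branch. If git refuses to switch because “untracked working tree files would be overwritten by checkout”, the culprits are the files you left out of the submodule branch: they are still on disk as untracked copies of master’s files. Since they are unchanged, git checkout -f master can safely overwrite them (it also discards uncommitted changes, so commit your work first).
~$ git checkout master
~$ git add -A && git commit -m "<your commit message>"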
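To make the steps above easier to follow, here is a minimal sketch that replays them in a throwaway repository. Everything in it is an assumption for illustration: the hypothetical plugin code lives in custom_components/my_plugin/, README.md and tests/ are the files that should stay out of the submodule branch, the user identity is a placeholder, and `git init -b` needs git 2.28 or newer.

```shell
#!/usr/bin/env bash
# Sketch: the five steps replayed in a throwaway repository.
# Hypothetical layout: plugin code in custom_components/my_plugin/,
# README.md and tests/ should stay out of the submodule branch.
set -euo pipefail

repo="$(mktemp -d)/my-plugin"
git init -q -b master "$repo"   # -b needs git 2.28 or newer
cd "$repo"
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

# Starting point: the plugin source sits in a subfolder on master.
mkdir -p custom_components/my_plugin tests
echo 'print("hello")' > custom_components/my_plugin/plugin.py
echo '# My plugin' > README.md
echo 'def test_plugin(): pass' > tests/test_plugin.py
git add -A
git commit -q -m "Initial layout"

# Step 1: empty the index; --cached keeps every file on disk.
git rm -rf -q --cached .

# Step 2: move the source code to the root (nothing gets deleted).
mv custom_components/my_plugin/plugin.py plugin.py

# Step 3: stage only the source code files.
git add plugin.py

# Step 4: create the submodule branch and commit the new layout to it.
git checkout -q -b submodule
git commit -q -m "Move plugin source to the repository root"

# Step 5: return to master. README.md and tests/ are now untracked copies of
# master's files, so a plain checkout can refuse; -f overwrites them with the
# identical committed versions and restores the original layout.
git checkout -q -f master

git ls-tree -r --name-only submodule   # lists only plugin.py
git status --short                     # prints nothing: master is clean
```

Because git restores master’s layout during the checkout, there is usually nothing left to move back or commit in step 5 of this sketch.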
DONE! Well, almost. We’ve put everything in place and are now ready to actually create the git submodule. Push the submodule branch to your remote repository first, then run the command below from inside the parent repository. There is just one little tweak compared to the default git submodule workflow: instead of letting the submodule follow the default branch of the submodule repository, we pass the -b flag, which points the submodule to the newly created submodule branch.
~$ git submodule add -b submodule <git-repository-URI>
Remember, in the submodule branch, we’ve moved all source code files to the root folder, so the submodule’s folder inside the parent codebase contains these source files directly, without the extra subfolders.
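The following sketch shows what the -b tweak does in practice. It assumes two local throwaway repositories: a hypothetical plugin repository (built directly with git mv for brevity rather than with the five steps) and a parent project standing in for the Home Assistant checkout. The submodule path custom_components/my_plugin is an assumption, and the protocol.file.allow setting is only needed because the sketch clones from a local path, which recent git versions block for submodules by default.

```shell
#!/usr/bin/env bash
# Sketch: adding the submodule branch of a plugin repository to a parent project.
# Assumptions (hypothetical): both repositories are local throwaway repos and
# the submodule path is custom_components/my_plugin.
set -euo pipefail

work="$(mktemp -d)"
plugin="$work/my-plugin"
parent="$work/parent-project"

# Plugin repository: source in a subfolder on master, at the root on "submodule".
git init -q -b master "$plugin"
git -C "$plugin" config user.name "Example User"
git -C "$plugin" config user.email "user@example.com"
mkdir -p "$plugin/custom_components/my_plugin"
echo 'print("hello")' > "$plugin/custom_components/my_plugin/plugin.py"
echo '# My plugin' > "$plugin/README.md"
git -C "$plugin" add -A
git -C "$plugin" commit -q -m "Initial layout"
git -C "$plugin" checkout -q -b submodule
git -C "$plugin" mv custom_components/my_plugin/plugin.py plugin.py
git -C "$plugin" rm -q README.md
git -C "$plugin" commit -q -m "Move plugin source to the repository root"
git -C "$plugin" checkout -q master

# Parent repository (stands in for the Home Assistant checkout).
git init -q -b master "$parent"
cd "$parent"
git config user.name "Example User"
git config user.email "user@example.com"
echo '# Parent project' > README.md
git add README.md
git commit -q -m "Parent project"

# The actual tweak: -b makes the submodule follow the "submodule" branch.
# protocol.file.allow=always is only needed for cloning from a local path.
git -c protocol.file.allow=always submodule add -b submodule "$plugin" custom_components/my_plugin
git commit -q -m "Add my_plugin as a submodule"

cat .gitmodules                   # includes "branch = submodule"
ls custom_components/my_plugin    # shows plugin.py, no nested subfolders
```

With a real remote you would pass its URL instead of "$plugin" and drop the protocol.file.allow override.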
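With this setup in place, you can now commit your code changes straight to the submodule branch, just as you’d otherwise commit them to a master, development, or any other branch. Once the code is ready to be released, you bring those changes back into the master branch. Git detects renamed files when it merges, so changes to files that sit in different locations on the two branches can usually be applied cleanly. Keep in mind, though, that the submodule branch no longer contains the files you left out of it, so a plain merge into master can remove them there as well; review the result before you commit it.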
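One way to release work from the submodule branch without carrying over those deletions is sketched below. This is my substitution, not the workflow described above: it uses git cherry-pick for individual commits instead of merging the whole branch, and relies on git’s rename detection to map plugin.py at the root onto its original location on master. The repository layout, file names and commit messages are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: bringing a fix from the submodule branch back to master.
# Assumption: cherry-pick instead of a full merge (a substitution), so the
# files left out of the submodule branch are not deleted on master.
set -euo pipefail

repo="$(mktemp -d)/my-plugin"
git init -q -b master "$repo"
cd "$repo"
git config user.name "Example User"
git config user.email "user@example.com"

# master: plugin code in a subfolder, plus a file that stays out of the submodule.
mkdir -p custom_components/my_plugin
echo 'print("hello")' > custom_components/my_plugin/plugin.py
echo '# My plugin' > README.md
git add -A
git commit -q -m "Initial layout"

# submodule branch: only the source code, moved to the root.
git checkout -q -b submodule
git mv custom_components/my_plugin/plugin.py plugin.py
git rm -q README.md
git commit -q -m "Move plugin source to the repository root"

# Day-to-day work happens on the submodule branch.
echo 'print("hello, world")' > plugin.py
git commit -q -am "Improve greeting"

# Release: apply just that commit to master. Rename detection maps the
# root-level plugin.py to custom_components/my_plugin/plugin.py on master.
git checkout -q master
git cherry-pick submodule

cat custom_components/my_plugin/plugin.py   # the updated greeting
ls                                          # README.md is still there
```

For several commits, a range such as master-side cherry-picks of each commit works the same way; the trade-off is that master’s history records copies of the commits rather than a merge.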
It’s as simple as that. And while you might think that the above example sounds like a very specific use case, the approach actually applies to any project where you need to integrate bits of code from one git repository into another.
Let me know whether you found this tutorial helpful, and if you did, please give it a hand. Just in case you’re working with git submodules for the first time and run into trouble, here’s a very useful description of how to safely remove a (messed-up) submodule from git.