Convert git repository file encoding
You can do this with git filter-branch. The idea is that you have to change the encoding of the files in every commit, rewriting each commit as you go.
First, write a script that changes the encoding of every file in the repository. It could look like this:
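#!/bin/sh
# Re-encode every file in the current tree from ISO-8859-1 to UTF-8.
# Caution: this touches all files, binary ones included.
find . -type f -print | while IFS= read -r f; do
    mv "$f" "$f.recode.$$"    # no -i: a prompt would read from the piped file list
    iconv -f iso-8859-1 -t utf-8 < "$f.recode.$$" > "$f"
    rm -f "$f.recode.$$"
done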
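The script above converts every file unconditionally, so a file that is already UTF-8 would be encoded a second time. Below is a minimal sketch of a more cautious per-file step, assuming that "already decodes cleanly as UTF-8" is a good enough test for your repository. The function name recode_file and the demo file names are made up for illustration; they are not part of the original answer.

```shell
#!/bin/sh
# Hypothetical helper: convert one file from ISO-8859-1 to UTF-8,
# unless it already decodes cleanly as UTF-8.
recode_file() {
    f=$1
    # iconv exits non-zero when the input is not valid UTF-8.
    if iconv -f utf-8 -t utf-8 < "$f" > /dev/null 2>&1; then
        return 0    # valid UTF-8 (or plain ASCII): leave the file untouched
    fi
    tmp="$f.recode.$$"
    # Write back with cat so the original file mode (e.g. the executable bit) is kept.
    iconv -f iso-8859-1 -t utf-8 < "$f" > "$tmp" && cat "$tmp" > "$f"
    rm -f "$tmp"
}

# Demonstration in a throwaway directory.
demo=$(mktemp -d)
printf 'caf\351\n' > "$demo/latin1.txt"      # "café" in ISO-8859-1 (byte 0xE9)
printf 'caf\303\251\n' > "$demo/utf8.txt"    # "café" already in UTF-8
recode_file "$demo/latin1.txt"
recode_file "$demo/utf8.txt"
```

After this runs, both demo files should hold the same UTF-8 bytes. Binary files are still not detected by this check, so you may want to restrict the find pattern to known text extensions.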
Then use git filter-branch to run this script over and over again, once per commit:
git filter-branch --tree-filter /tmp/recode-all-files HEAD
where /tmp/recode-all-files is the above script, saved as an executable file.
Right after the repository is freshly converted from CVS, you probably have just one branch in git with a linear history back to the beginning. If you have several branches, you may need to extend the git filter-branch command so that it edits all the commits.
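For the several-branches case, one common way to extend the command is to pass -- --all in place of HEAD and add --tag-name-filter cat so that tags follow the rewritten commits. The following is only a sketch in a throwaway repository: the file note.txt, the branch side, the tag v1 and the script location are invented for the demo, and FILTER_BRANCH_SQUELCH_WARNING is assumed to be honoured by your git version (older versions simply ignore it).

```shell
#!/bin/sh
# Sketch: rewrite every branch and tag of a throwaway repository.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'caf\351\n' > note.txt               # "café" in ISO-8859-1
git add note.txt
git commit -q -m "first commit"
git tag v1                                   # lightweight tag on the first commit
git checkout -q -b side
printf 'na\357ve\n' >> note.txt             # "naïve" in ISO-8859-1
git commit -q -a -m "commit on a second branch"

# Stand-in for /tmp/recode-all-files, kept outside the repository
# so the working tree stays clean.
script="$repo.recode.sh"
cat > "$script" <<'EOF'
#!/bin/sh
find . -type f -print | while IFS= read -r f; do
    mv "$f" "$f.recode.$$"
    iconv -f iso-8859-1 -t utf-8 < "$f.recode.$$" > "$f"
    rm -f "$f.recode.$$"
done
EOF
chmod +x "$script"

# "-- --all" rewrites every ref; "--tag-name-filter cat" keeps tag names
# and points them at the rewritten commits.
FILTER_BRANCH_SQUELCH_WARNING=1 \
    git filter-branch --tree-filter "$script" --tag-name-filter cat -- --all
```

git filter-branch keeps the old refs under refs/original/, so you can compare the result with the previous history before deleting the backup.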
Bertram Nudelbach
Updated on July 14, 2022
Comments
-
Bertram Nudelbach almost 2 years
I have a large CVS repository containing files in ISO-8859-1 and want to convert this to git.
Sure, I can configure git to use ISO-8859-1 for encoding, but I would like to have it in utf8.
Now, with tools such as iconv or recode I can convert the encoding of the files in my working tree. I could commit this with a message like "converted encoding".
My question now is: is there a possibility to convert the complete history, either when converting from CVS to git or afterwards? My idea would be to write a script that reads each commit in the git repository, converts it to utf8 and commits it in a new git repository.
Is this possible? (I am unsure about the hash codes and how to walk through the commits, branches and tags.) Or is there a tool that can handle something like this?
-
KingCrunch about 12 years
Yes, you can rewrite the history, but you probably shouldn't: you should never rewrite a repository that you have already pushed somewhere. In my opinion, iconv and a normal commit is the way to go.
-
Bertram Nudelbach about 12 years
Okay, thanks @KingCrunch. But since I am creating the git repository from scratch, it has not been pushed anywhere. I would also accept creating a second repository with the utf8 encoding, based on the history of the first, which is basically the same except that I wouldn't modify the existing repo.
-