Removing sensitive data from a repository on GitHub

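I did something dumb and pushed an API key to a public repository, and soon got an email telling me the API key was exposed.

To delete that commit I searched Baidu and then Google, but searching for "delete commit" turned up nothing useful. None of the results could delete the initial commit directly, and just my luck, I had uploaded the API key in the initial commit, yet I was too lazy to delete the whole repository. So I changed the keywords to "remove sensitive data" and tried again.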


Starting today: goodbye Command Prompt, hello PowerShell 😂

Notes on removing sensitive files from Git

Removing a file from Git*

Solution: [5]

git rm -r -n --cached "FILE OR FOLDER TO BE REMOVED" # -n: dry run; no file is removed, the command only previews the list of files it would remove. Example of the real command: git rm -r --cached "bin/"

git rm -r --cached "FILE OR FOLDER TO BE REMOVED" # the actual removal

git commit -m "TYPE COMMIT MESSAGE HERE" # commit

git push origin master # push to the remote server
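To see what these commands actually change, here is a small sketch that runs them in a throwaway repository. The folder name config/, the file api_key.txt and the key value are made up for illustration. It shows that `git rm --cached` only stops tracking the file from now on: the file stays on disk, and the old commit still contains it, which is why this alone does not fix a leaked key.

```shell
#!/bin/sh
# Sketch: what `git rm --cached` does and does not do, in a throwaway repo.
# config/api_key.txt and its content are hypothetical, for illustration only.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

mkdir config
echo "API_KEY=not-a-real-key" > config/api_key.txt
echo "hello" > README.md
git add .
git commit -q -m "Initial commit"

# Dry run: lists what would be removed from the index, changes nothing.
git rm -r -n --cached "config/"

# Real run: untrack the folder but leave the files on disk.
git rm -r -q --cached "config/"
git commit -q -m "Stop tracking config/"

# Nothing under config/ is tracked any more...
tracked_now=$(git ls-files config/)
# ...but the previous commit still holds the file content.
old_copy=$(git show HEAD~1:config/api_key.txt)

echo "tracked now: '${tracked_now}'"
echo "still in history: ${old_copy}"
```

So this method is fine for files that should simply stop being tracked, but a secret that was already committed needs one of the history-rewriting methods further down.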

Option reference:

-n, --dry-run

Don’t actually remove any file(s). Instead, just show if they exist in the index and would otherwise be removed by the command.

-r

Allow recursive removal when a leading directory name is given.

--cached

Use this option to unstage and remove paths only from the index. Working tree files, whether modified or not, will be left alone.

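Addendum

On .gitignore files, here are two more sites: [6], [7].

I added the files I wanted to ignore in Android Studio, but it did not seem to have any effect... I'm pasting what I found here in case I need it later.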
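One likely explanation for the ignore rule having no effect (a guess, since it is not clear which files were involved): .gitignore only applies to untracked files, so a file that was already committed keeps being tracked until it is removed from the index. The sketch below reproduces that in a throwaway repository; local.properties is just a hypothetical example file.

```shell
#!/bin/sh
# Sketch: a .gitignore rule does nothing for a file that is already tracked.
# local.properties and its content are hypothetical.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

echo "sdk.dir=/made/up/path" > local.properties
git add local.properties
git commit -q -m "Accidentally track local.properties"

# Adding the rule afterwards does not untrack the file.
echo "local.properties" > .gitignore
git add .gitignore
git commit -q -m "Add .gitignore"
still_tracked=$(git ls-files local.properties)

# Untrack it; from now on the ignore rule applies.
git rm -q --cached local.properties
git commit -q -m "Stop tracking local.properties"
echo "changed" >> local.properties
status_after=$(git status --porcelain)

# Ask Git which rule ignores the file.
ignored_by=$(git check-ignore -v local.properties)

echo "tracked despite the rule: ${still_tracked}"
echo "status after untracking: '${status_after}'"
echo "ignored by: ${ignored_by}"
```

`git check-ignore -v` is handy for checking which line of which ignore file matches a path.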

BFG

How to use BFG [4]

First clone a fresh copy of your repo, using the --mirror flag:

$ git clone --mirror git://example.com/some-big-repo.git

This is a bare repo, which means your normal files won’t be visible, but it is a full copy of the Git database of your repository, and at this point you should make a backup of it to ensure you don’t lose anything.

Now you can run the BFG to clean your repository up:

$ java -jar bfg.jar --strip-blobs-bigger-than 100M some-big-repo.git

The BFG will update your commits and all branches and tags so they are clean, but it doesn’t physically delete the unwanted stuff. Examine the repo to make sure your history has been updated, and then use the standard git gc command to strip out the unwanted dirty data, which Git will now recognise as surplus to requirements:

$ cd some-big-repo.git
$ git reflog expire --expire=now --all && git gc --prune=now --aggressive

Finally, once you’re happy with the updated state of your repo, push it back up (note that because your clone command used the --mirror flag, this push will update all refs on your remote server):

$ git push

At this point, you’re ready for everyone to ditch their old copies of the repo and do fresh clones of the nice, new pristine data. It’s best to delete all old clones, as they’ll have dirty history that you don’t want to risk pushing back into your newly cleaned repo.
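The instructions above say to "examine the repo" before running git gc, without saying how. One possible check, sketched here on a throwaway repository with a made-up key, is Git's pickaxe search (`git log -S`), which lists commits whose changes add or remove a given string. The sketch shows the situation before any cleanup: the latest version of the file looks clean, yet the search still finds the key in history.

```shell
#!/bin/sh
# Sketch: check whether a secret string still appears anywhere in history.
# The secret value and settings.txt are hypothetical.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

secret="not-a-real-key-123"
echo "key=$secret" > settings.txt
git add settings.txt
git commit -q -m "Add settings"

# "Fix" the leak only in the latest commit.
echo "key=REDACTED" > settings.txt
git commit -q -a -m "Redact key in latest commit"

# The current file no longer contains the secret...
in_worktree=$(grep -c "$secret" settings.txt || true)

# ...but commits on any ref that add or remove the string still show up.
hits=$(git log --all --oneline -S"$secret" | wc -l | tr -d ' ')

echo "occurrences in current file: ${in_worktree}"
echo "commits touching the secret: ${hits}"
```

Running the same `git log --all -S"..."` inside the cleaned mirror clone and getting no output would suggest that no commit reachable from any ref still touches the string.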

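At this point the files filtered out of the historical commits should all have been replaced by a string of digits and letters (files that were only recently removed from Git version control do not seem to be processed by BFG). However, using BFG made Git awkward to use afterwards, with both Update and Merge failing: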

Could Not Merge origin/master
You have not concluded your merge (MERGE_HEAD exists).
Please, commit your changes before you merge.

Error merging
You have not concluded your merge (MERGE_HEAD exists).
Please, commit your changes before you merge.

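In short, these are branch-merge problems. The likely cause is that BFG rewrote the history, so my old local clone no longer matched the remote, which is why the BFG instructions above recommend fresh clones. I forced my way through by allowing the merge of unrelated histories, resolving the conflicts in the IDE, and committing again. (This is probably not the proper way to fix it 😂)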

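Results of the BFG experiment

The contents of every commit before the latest one were replaced by a single line of something I could not identify.

The commit records themselves are still there.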

Using filter-branch to purge a sensitive file from all Git history and tags*

This method gave me the result I wanted, and it did not cause any conflicts.

$ git filter-branch --force --index-filter "git rm --cached --ignore-unmatch PATH-TO-YOUR-FILE-WITH-SENSITIVE-DATA" --prune-empty --tag-name-filter cat -- --all

$ git push origin --force --all

These two commands are basically enough to solve the problem. For detailed steps see [2] (Chinese) and [1] (English).
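To try the filter-branch command safely before touching a real repository, here is a sketch that builds a throwaway repo with a made-up secrets.txt in its initial commit, rewrites the history, and counts the commits that still reference the file. `FILTER_BRANCH_SQUELCH_WARNING` is set only to skip the notice (and pause) that newer Git versions print before filter-branch runs.

```shell
#!/bin/sh
# Sketch: remove one file from every commit with filter-branch, then verify.
# secrets.txt and its content are hypothetical.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

echo "API_KEY=not-a-real-key" > secrets.txt
echo "hello" > README.md
git add .
git commit -q -m "Initial commit"
echo "more" >> README.md
git commit -q -a -m "Second commit"

before=$(git log --all --oneline -- secrets.txt | wc -l | tr -d ' ')

# Same command as above, with the placeholder path filled in.
git filter-branch --force \
  --index-filter "git rm --cached --ignore-unmatch secrets.txt" \
  --prune-empty --tag-name-filter cat -- --all >/dev/null

# filter-branch keeps the pre-rewrite refs under refs/original/ as a backup,
# so only branches and tags are inspected here.
after=$(git log --branches --tags --oneline -- secrets.txt | wc -l | tr -d ' ')
commits=$(git rev-list --count HEAD)

echo "commits touching secrets.txt before: ${before}, after: ${after}"
echo "commits left on the branch: ${commits}"
```

The initial commit survives because it still contains README.md; `--prune-empty` only drops commits that become empty after the file is removed. If the repository has tags, the GitHub guide [1] also force-pushes them with `git push origin --force --tags`.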

Error on Update

Error merging: refusing to merge unrelated histories

Cause: the two branches have no history in common [3]

Solution

$ git merge master --allow-unrelated-histories
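The error and the flag can be reproduced without any remote. The sketch below creates two histories with no common ancestor in a throwaway repository (the branch name rewritten is made up), shows that a plain merge is refused, and then merges with `--allow-unrelated-histories`.

```shell
#!/bin/sh
# Sketch: reproduce "refusing to merge unrelated histories" and the workaround.
# Branch and file names are hypothetical.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

git init -q .
git symbolic-ref HEAD refs/heads/master   # make the branch name predictable
git config user.name "Demo"
git config user.email "demo@example.com"

echo "one" > a.txt
git add a.txt
git commit -q -m "History A"

# An orphan branch starts a second history with no shared ancestor,
# much like a rewritten remote branch next to an old local one.
git checkout -q --orphan rewritten
git rm -q -rf .
echo "two" > b.txt
git add b.txt
git commit -q -m "History B"

git checkout -q master

# A plain merge is refused.
if git merge -q -m "plain merge" rewritten 2>/dev/null; then
  plain_merge=ok
else
  plain_merge=refused
fi

# With the flag, Git joins the two histories in a merge commit.
git merge -q --allow-unrelated-histories -m "Merge unrelated histories" rewritten
parents=$(git rev-list --parents -n 1 HEAD | wc -w | tr -d ' ')

echo "plain merge: ${plain_merge}"
echo "fields in merge commit line (hash + parents): ${parents}"
```

Note that such a merge keeps both histories, so merging an old local branch into a cleaned one brings the old commits back along with it. After a deliberate rewrite, a fresh clone, or `git fetch origin` followed by `git reset --hard origin/master` (which discards local commits), avoids that.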

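Afterword

With that, I finally got GitGuardian to turn green. These should be all the problems I ran into today, along with their solutions.

I wasted another day on these problems (more like an afternoon, including the time I spent watching shows and eating).

While working through them I kept wondering whether I had actually learned anything in this semester's "Software Configuration Management" course. Some of these problems came up in the lab reports, and the others, though not covered, are still common version-control problems.

The course used SVN, while I use Git. This episode made me notice a few differences between SVN and Git (superficial, surface-level ones only): for example, SVN depends on a repository on the server while Git does not seem to have that dependency, and TortoiseSVN's GUI seems to offer more options than the Git plugin in the IDE... One thing is the same, though: with merge problems I just fumble my way through either way 😂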

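References

  1. Removing sensitive data from a repository
  2. github删除敏感信息 (Removing sensitive data on GitHub)
  3. 解决Git中fatal: refusing to merge unrelated histories (Fixing "fatal: refusing to merge unrelated histories" in Git)
  4. BFG Repo-Cleaner
  5. git如何移除某文件夹的版本控制 (How to remove a folder from Git version control)
  6. Git忽略提交规则 - .gitignore配置运维总结 (Git ignore rules: notes on configuring .gitignore)
  7. 忽略特殊文件 (Ignoring special files)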