
I am trying to use JGit to get all commits in a repository: not just the ones reachable from heads or tags, but every commit that has not yet been garbage collected. Is there an efficient way to do this with JGit?

Update to better describe the actual use-case

I am working on a FUSE-based filesystem which provides a filesystem view of the Git history; see https://github.com/centic9/JGitFS/ for a first version (Linux/Mac only).

With this I provide "virtual" sub-directories for commits, i.e. a directory structure like the following:

/commit
   00
     abcd..
     bcde..
   ae
     bdas..

Beneath each commit id, the virtual filesystem provides the source files as of that commit.
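The layout above could be produced by a small mapping from commit id to path. This is only a sketch: the class and method names are hypothetical, and it assumes that the directory name is the first two hex characters of the id and that the entry beneath it is the remaining 38 characters, which is how the listing reads.

```java
/** Hypothetical helper that maps a full commit id to its place in the virtual tree. */
final class CommitPaths {
    private CommitPaths() {}

    /** Returns e.g. "/commit/00/abcd..." for an id starting with "00abcd". */
    static String pathFor(String commitId) {
        // Assumption: ids are full 40-character lowercase hex SHA-1 strings.
        if (commitId == null || !commitId.matches("[0-9a-f]{40}")) {
            throw new IllegalArgumentException("expected a 40-character hex SHA-1: " + commitId);
        }
        return "/commit/" + commitId.substring(0, 2) + "/" + commitId.substring(2);
    }

    /** Relative symlink target as seen from a sibling directory of /commit. */
    static String linkTargetFor(String commitId) {
        return ".." + pathFor(commitId);
    }
}
```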

Branches and tags are provided as symbolic links to the commit that the ref or tag currently points to:

/branch
   master -> ../commit/00/abcd...
   bugfix -> ../commit/ae/bdas...
/tag
   version_1 -> ../commit/00/bcde...

In order to make this filesystem fast, I need a way to iterate over all commits in a repository very quickly. Looking at each tag and ref separately, as I do now, is sub-optimal because I visit the same commits many times whenever refs share a common history (which they almost always do).

Preferably I would like to get a simple list of all commits that are still available, not just the ones that are part of a branch, so that you can even look at versions that are no longer reachable from any ref or tag.
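To illustrate the cost of walking each ref separately, here is a minimal sketch of walking from all tips at once with a shared "seen" set, so that common history is visited only once. It is plain Java on a hypothetical in-memory graph (a map from commit id to parent ids), not JGit code. In JGit, a single `RevWalk` with `markStart` called once per tip should give the same effect, but note that this only covers commits reachable from refs, not unreferenced ones.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: one traversal over all tips instead of one per ref. */
final class SharedHistoryWalk {
    private SharedHistoryWalk() {}

    /**
     * @param parents assumed commit graph: commit id -> ids of its parents
     * @param tips    commits that branches and tags point to
     * @return every commit reachable from any tip, each listed once
     */
    static Set<String> reachableFrom(Map<String, List<String>> parents,
                                     Collection<String> tips) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>(tips);
        while (!todo.isEmpty()) {
            String commit = todo.pop();
            if (!seen.add(commit)) {
                continue; // already visited through another ref
            }
            for (String parent : parents.getOrDefault(commit, List.of())) {
                todo.push(parent);
            }
        }
        return seen;
    }
}
```

With two branches that share most of their history, the shared commits are expanded once rather than once per branch.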

centic
  • possible duplicate of [Git - get all commits and blobs they created](http://stackoverflow.com/questions/1314950/git-get-all-commits-and-blobs-they-created) – squadette Jun 18 '13 at 21:36
  • Thanks for the link. I am looking for a solution using JGit, not command-line tools, though. – centic Jun 20 '13 at 05:27

1 Answer


If finding commits that are referenced via reflog is enough, use ReflogCommand (I recommend using JGit 3.0 once it's released, which should be on 2013-06-26).

If you want to also find commits that are not referenced by reflog anymore, you need something like git fsck. JGit does not yet have an implementation of that. It does have an implementation of git gc though, which also has to find unreferenced objects.

See the source code of GC.java in the JGit repository. What you could do is call GC#repack(), after which all referenced objects should be in pack files. Then you could do something similar to GC#prune, which finds the loose objects that are unreferenced. Please note that GC is currently internal (not API), so don't rely on it staying like this.

robinst
  • Hmm, for ReflogCommand I still need to specify some "startRef()" to parse anything other than HEAD, so that is not much better than doing a RevWalk from the head of each branch/tag. I will look at GC.java, but that sounds quite complicated :( – centic Jun 24 '13 at 16:18
  • Walking the reflog and walking the commits of a branch are not the same, as the reflog also includes commits that were discarded and are no longer part of the history. By the way, it would help if you described the use case you are trying to solve in your question; maybe there is a better solution. – robinst Jun 24 '13 at 17:37
  • yeah, you are right, I have described what I am actually trying to do here – centic Jun 25 '13 at 05:57
  • I have now tried something similar to what GC.java does, looking at the object directory directly and reading the ids from the pack files. However, as I am only interested in commits and not other object types, I have to read the type of each object, which makes it approximately 10 times slower than reading the commit log of each ref, so it's not a viable solution performance-wise... – centic Jun 25 '13 at 20:11
  • This limitation is inherent in the way Git stores data; there is no fast way here. See also [Git Internals - Git Objects](http://git-scm.com/book/en/Git-Internals-Git-Objects). – robinst Jun 26 '13 at 09:29
  • Ok, thanks for the help, I ended up using some advanced features of JGit and some internal tuning to make the access quite fast, the limitation with not being able to access unreferenced commits remains... – centic Jun 27 '13 at 12:25
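To make the cost discussed in the comments concrete: a loose Git object is a zlib-deflated file whose content starts with a header of the form `<type> <size>\0`, so the type is only known after inflating at least the start of each file. The sketch below reads that type using only the JDK. The class name is hypothetical, it does not use JGit, and it covers loose objects only (objects inside pack files are stored differently).

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

/** Hypothetical sketch: read the type ("commit", "tree", "blob", "tag") of a loose object. */
final class LooseObjectType {
    private LooseObjectType() {}

    /** @param compressed raw bytes of a file under .git/objects/xx/ */
    static String typeOf(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            // The type name is at most 6 bytes, so a small buffer is enough.
            byte[] header = new byte[32];
            int n = inflater.inflate(header);
            for (int i = 0; i < n; i++) {
                if (header[i] == ' ') {
                    return new String(header, 0, i, StandardCharsets.US_ASCII);
                }
            }
            throw new IllegalArgumentException("no object header found");
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not a zlib stream", e);
        } finally {
            inflater.end(); // release native zlib resources
        }
    }
}
```

Filtering for commits then means calling something like this once per object, which is consistent with the slowdown reported above.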