Why are git objects taking up so much space? I only want a specific file

Question

I download and update the https://github.com/gmatuz/inthewilddb repository using the following commands:

git clone https://github.com/gmatuz/inthewilddb ./inthewilddb/
git -C ./inthewilddb/ pull

The problem is that I only need the inthewild.db file from the repo, which is about 90 megabytes, but the clone downloads about 5 gigabytes. It turns out that the .git/objects/ directory accounts for most of that space.

Since I don't use git very often, I'm not sure why it takes up so much space. Can someone explain why this is the case?

Do I understand correctly that git has no concept of working with a single file? In other words, you can't download one file and "follow" it, because git works with directories. Or is there some way to download and track updates for only the required file?

Of course I can use curl instead of git and just download the file I need; but then I have to download it each time instead of only when it's updated.

`git -C ./inthewilddb/ log --all` to see what is there. Most probably the binary file is committed on every update. Git stores all commits. — phd, Apr 12 '22 at 14:45
Please see [ask] and revise your post title to ask a clear, specific question in sentence format. — isherwood, Apr 12 '22 at 14:48
This recent question is very closely related: https://stackoverflow.com/q/71831522/157957 — IMSoP, Apr 12 '22 at 14:59
You can, theoretically at least, use the newfangled "promisor remote" *partial clone* feature to get what you want. It's not set up for normal people to use yet, though. You're probably best off just using `git archive` or curl as you suggest here, with some way of knowing when to obtain an updated version. — torek, Apr 13 '22 at 03:42

score 2 · Answer 1 · answered Apr 12 '22 at 14:47

This database file is a binary file: https://github.com/gmatuz/inthewilddb/commits/master/inthewild.db.

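Git keeps every committed version of a file in the repository history, and a normal clone downloads all of them. Git can delta-compress similar versions when it packs objects, but that often saves little for a binary file like this database, so each committed update can add close to a full copy to .git/objects/.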
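To see the effect on a small scale, here is a rough sketch that builds a throwaway repository, commits three versions of a random binary file, and compares a normal clone with a shallow one. Every name in it (origin, full, shallow, data.bin) is made up for the demo and has nothing to do with the real repository; it assumes a reasonably recent git and a POSIX shell.

```shell
#!/bin/sh
set -eu

# Scratch area; all names below are hypothetical and used only for this demo.
work=$(mktemp -d)
cd "$work"

git init -q origin
cd origin
git config user.email demo@example.com
git config user.name demo

# Commit three different versions of a random (incompressible) 1 MiB binary file.
for i in 1 2 3; do
    head -c 1048576 /dev/urandom > data.bin
    git add data.bin
    git commit -q -m "update $i"
done
cd ..

# A normal clone receives every committed version of the file.
git clone -q "file://$work/origin" full
# A shallow clone receives only the objects reachable from the newest commit.
# (file:// is used because git ignores --depth for plain local paths.)
git clone -q --depth 1 "file://$work/origin" shallow

# Count the blob (file content) objects present in a clone.
count_blobs() {
    git -C "$1" cat-file --batch-all-objects --batch-check | grep -c ' blob '
}
full_blobs=$(count_blobs full)
shallow_blobs=$(count_blobs shallow)

echo "full clone blobs:    $full_blobs"
echo "shallow clone blobs: $shallow_blobs"
du -sh full/.git shallow/.git
```

The full clone holds one blob per committed version, while the shallow clone holds only the latest one. Applied to the question, the same idea would be `git clone --depth 1 https://github.com/gmatuz/inthewilddb ./inthewilddb/`. Note that a shallow clone will still grow as later pulls bring in new versions, so this reduces the initial download rather than solving the tracking problem completely.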

TheIceBear
