'loose object file': bad file descriptor while doing git pull

Question

I am using Bitbucket. I am trying to pull the master branch using `git pull origin master`, but I am getting the error quoted in the title.

I saw that a similar issue was asked on SO, so I tried the following methods, but none of them worked.

git config --global pack.packSizeLimit 50m
git config --global pack.windowMemory 50m
git config --global core.compression 9

Another method I tried:

git gc

I also tried changing the buffer limit, but that did not work either.
```
git config --global http.postBuffer 524288000
```

Why does this issue occur? I am currently on the master branch.

That's some sort of weird internal error having to do with an `fsync` system call that your local Git ran. "Bad file descriptor" indicates that your local Git ran `fsync` on an invalid file descriptor. This is certainly a bug, and not something that any of the operations you ran will fix. You can, however, turn it *off* by running `git config core.fsyncObjectFiles false`. This may let you proceed in spite of whatever this bug is. — torek, Jun 03 '21 at 11:42
Note that `core.fsyncObjectFiles` *defaults* to `false`, so you must have something that has set it to `true`. — torek, Jun 03 '21 at 11:46
What's the result of `git config --get --bool core.fsyncObjectFiles`? — torek, Jun 04 '21 at 08:09
That means you didn't configure `core.fsyncObjectFiles` to `false`. It should *default* to false though. What Git build are you using? Perhaps someone changed the source code. — torek, Jun 04 '21 at 08:39

score 4 · Accepted Answer · answered Jun 14 '21 at 07:20

As per comments, you can run:

git config core.fsyncObjectFiles false

to force Git to stop calling `fsync` for objects in this particular repository's database. It seems odd that this should make any difference though, as the Git source code has `false` as the default; something in your particular Git installation must have changed it to `true`. It may be worth investigating what changed it and why. The code to call `fsync` at all was new in Git 1.6.0, and the default has been `false` ever since it was introduced in commit aafe9fbaf4f1d1f27a6f6e3eb3e246fff81240ef. Whoever set it to `true` on your system must have had some reason for doing so.
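A minimal sketch of that investigation, assuming Git 2.8 or later for `--show-origin`. The scratch repository and the simulated `true` setting are only there so the commands can be tried safely; in the affected repository you would run just the `git config` lines.

```shell
# Scratch repository, purely for illustration; in practice run the
# `git config` commands inside the affected repository instead.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Simulate the problematic state: something has set the option to true.
git config core.fsyncObjectFiles true

# List every configuration file that sets the option, with the value it sets.
git config --show-origin --get-all core.fsyncObjectFiles

# Override it for this repository only, then confirm the effective value.
git config core.fsyncObjectFiles false
effective=$(git config --get --bool core.fsyncObjectFiles)
echo "core.fsyncObjectFiles = $effective"
```

If the first `git config` query lists a system-wide or global file, that file is the place to look for whoever enabled the setting.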

VonC · Answer 2 · 2022-06-05T13:37:20.363

Note that since Git 2.36 (Q2 2022), core.fsyncObjectFiles is deprecated, and replaced with two new configuration variables, core.fsync and core.fsyncMethod.

See commit b9f5d03 (15 Mar 2022), and commit ba95e96, commit 844a8ad, commit 020406e, commit abf38ab, commit 19d3f22 (10 Mar 2022) by Neeraj Singh (neerajsi-msft).
(Merged by Junio C Hamano -- gitster -- in commit eb804cd, 25 Mar 2022)

core.fsyncmethod: add writeout-only mode

Signed-off-by: Neeraj Singh

This commit introduces the core.fsyncMethod configuration knob, which can currently be set to fsync or writeout-only.

The new writeout-only mode attempts to tell the operating system to flush its in-memory page cache to the storage hardware without issuing a CACHE_FLUSH command to the storage controller.

Writeout-only fsync is significantly faster than a vanilla fsync on common hardware, since data is written to a disk-side cache rather than all the way to a durable medium.
Later changes in this patch series will take advantage of this primitive to implement batching of hardware flushes.

And:

core.fsync: introduce granular fsync control infrastructure

Helped-by: Patrick Steinhardt
Signed-off-by: Neeraj Singh

This commit introduces the infrastructure for the core.fsync configuration knob.
The repository components we want to sync are identified by flags so that we can turn on or off syncing for specific components.

If core.fsyncObjectFiles is set and the core.fsync configuration also includes FSYNC_COMPONENT_LOOSE_OBJECT, we will fsync any loose objects.
This picks the strictest data integrity behavior if core.fsync and core.fsyncObjectFiles are set to conflicting values.

The warning message is (see commit f12f3b9 (30 Mar 2022), and commit e5ec440 (29 Mar 2022) by Neeraj Singh (neerajsi-msft); merged by Junio C Hamano -- gitster -- in commit 27dd460, 04 Apr 2022; Reported-by: Jiang Xin; Signed-off-by: Neeraj Singh):

Warning: core.fsyncObjectFiles is deprecated; use core.fsync instead
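If that warning shows up, one possible migration is sketched below. It assumes the legacy setting was enabled in order to harden loose objects, which is my reading rather than something the commits state; the scratch repository is only for illustration.

```shell
# Scratch repository standing in for one that still carries the legacy key.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config core.fsyncObjectFiles true

# Drop the deprecated key from this repository's config...
git config --unset core.fsyncObjectFiles

# ...and request the same kind of hardening through the new variable.
# (Assumption: hardening loose objects was the intent of the old setting.)
git config core.fsync loose-object

# Show what is now configured locally.
git config --local --get-regexp '^core\.fsync'
```

If the legacy key was set to `false`, as in the question, removing it is enough, since the platform default already leaves loose objects unsynced.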

So:

git config now includes in its man page:

core.fsync

A comma-separated list of components of the repository that should be hardened via the core.fsyncMethod when created or modified.

You can disable hardening of any component by prefixing it with a '-'.

Items that are not hardened may be lost in the event of an unclean system shutdown. Unless you have special requirements, it is recommended that you leave this option empty or pick one of committed, added, or all.

When this configuration is encountered, the set of components starts with the platform default value, disabled components are removed, and additional components are added. none resets the state so that the platform default is ignored.

The empty string resets the fsync configuration to the platform default. The default on most platforms is equivalent to core.fsync=committed,-loose-object, which has good performance, but risks losing recent work in the event of an unclean system shutdown.

none clears the set of fsynced components.

loose-object hardens objects added to the repo in loose-object form.

pack hardens objects added to the repo in packfile form.

pack-metadata hardens packfile bitmaps and indexes.

commit-graph hardens the commit graph file.

index hardens the index when it is modified.

objects is an aggregate option that is equivalent to loose-object,pack.

derived-metadata is an aggregate option that is equivalent to pack-metadata,commit-graph.

committed is an aggregate option that is currently equivalent to objects. This mode sacrifices some performance to ensure that work that is committed to the repository with git commit or similar commands is hardened.

added is an aggregate option that is currently equivalent to committed,index. This mode sacrifices additional performance to ensure that the results of commands like git add and similar operations are hardened.

all is an aggregate option that syncs all individual components above.
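As a rough illustration of the values described above (Git 2.36 or later is assumed for them to have any effect; older versions simply ignore the key), using a scratch repository and a hypothetical `file.txt`:

```shell
# Scratch repository, purely for illustration.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Harden the results of `git commit` and `git add` (objects plus the index).
git config core.fsync added

# Or spell out the documented platform default explicitly.
git config core.fsync committed,-loose-object

# A one-off override for a single command, without touching the config file.
echo hello > file.txt
git -c core.fsync=all add file.txt

# Show the value stored for this repository.
git config --get core.fsync
```

Writing the value as `committed,-loose-object` rather than a bare `-loose-object` also keeps `git config` from mistaking a leading dash for a command-line option.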

It comes with one more option:

With Git 2.36 (Q2 2022), updates to refs traditionally weren't fsync'ed, but we can configure using core.fsync variable to do so.

See commit bc22d84 (11 Mar 2022) by Patrick Steinhardt (pks-t).
See commit 0099792 (15 Mar 2022) by Junio C Hamano (gitster).
(Merged by Junio C Hamano -- gitster -- in commit 6e1a895, 25 Mar 2022)

core.fsync: new option to harden references

Signed-off-by: Patrick Steinhardt

When writing both loose and packed references to disk we first create a lockfile, write the updated values into that lockfile, and on commit we rename the file into place.
According to filesystem developers, this behaviour is broken because applications should always sync data to disk before doing the final rename to ensure data consistency (see, for example, "What are the crash guarantees of overwrite-by-rename" and the ext4 documentation on `auto_da_alloc`).
If applications fail to do this correctly, a hard crash of the machine can easily result in corrupted on-disk data.

This kind of corruption can in fact be easily observed with Git when the machine hard-resets shortly after writing references to disk.
On machines with ext4, this will likely lead to the "empty files" problem: the file has been renamed, but its data has not been synced to disk.
The result is that the reference is corrupt, and in the worst case this can lead to data loss.

Implement a new option to harden references so that users and admins can avoid this scenario by syncing locked loose and packed references to disk before we rename them into place.

git config now includes in its man page:

reference hardens references modified in the repo.
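A short sketch of turning this on, assuming Git 2.36 or later. The scratch repository, the file name `f`, and the example identity are all placeholders.

```shell
# Scratch repository, purely for illustration.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Harden committed objects and, in addition, reference updates.
git config core.fsync committed,reference

# Creating a commit updates the branch ref; with the setting above, Git
# versions that know the `reference` component sync the ref before the rename.
echo data > f
git add f
git -c user.name=Example -c user.email=example@example.com \
    commit -q -m "initial commit"

git config --get core.fsync
```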

With Git 2.37 (Q3 2022), introduce a filesystem-dependent mechanism to optimize the way the bits for many loose object files are ensured to hit the disk platter.

git config --global core.fsyncMethod batch

See commit 112a9fe, commit 5dccd91, commit d42bab4, commit fb2d0db, commit 8a94d83, commit 425d290, commit 23a3a30, commit b4a0c6d, commit 4d33e2b, commit c0f4752, commit 2c23d1b, commit 897c9e2 (04 Apr 2022) by Neeraj Singh (neerajsi-msft).
See commit fca8598 (06 Apr 2022) by Junio C Hamano (gitster).
(Merged by Junio C Hamano -- gitster -- in commit 83937e9, 03 Jun 2022)

core.fsyncmethod: batched disk flushes for loose-objects

Signed-off-by: Neeraj Singh

When adding many objects to a repo with core.fsync=loose-object, the cost of fsync'ing each object file can become prohibitive.

One major source of the cost of fsync is the implied flush of the hardware writeback cache within the disk drive.

This commit introduces a new core.fsyncMethod=batch option that batches up hardware flushes.

It hooks into the bulk-checkin odb-transaction functionality, takes advantage of tmp-objdir, and uses the writeout-only support code.

When the new mode is enabled, we do the following for each new object:

1a. Create the object in a tmp-objdir.

2a. Issue a pagecache writeback request and wait for it to complete.

At the end of the entire transaction when unplugging bulk checkin:

1b. Issue an fsync against a dummy file to flush the log and hardware writeback cache, which should by now have seen the tmp-objdir writes.

2b. Rename all of the tmp-objdir files to their final names.

3b. When updating the index and/or refs, we assume that Git will issue another fsync internal to that operation. This is not the default today, but the user now has the option of syncing the index and there is a separate patch series to implement syncing of refs.

On a filesystem with a singular journal that is updated during name operations (e.g. create, link, rename, etc), such as NTFS, HFS+, or XFS we would expect the fsync to trigger a journal writeout so that this sequence is enough to ensure that the user's data is durable by the time the git command returns.
This sequence also ensures that no object files appear in the main object store unless they are fsync-durable.

Batch mode is only enabled if core.fsync includes loose-objects.
If the legacy core.fsyncObjectFiles setting is enabled, but core.fsync does not include loose-objects, we will use file-by-file fsyncing.
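In configuration terms, the condition above means both variables have to be set for batch mode to apply. A minimal sketch, assuming Git 2.37 or later and using a scratch repository (note that the component is spelled `loose-object` in the `core.fsync` documentation):

```shell
# Scratch repository, purely for illustration.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Batch mode only matters when loose objects are among the hardened components.
git config core.fsync loose-object

# Select the batched flush strategy for those loose objects.
git config core.fsyncMethod batch

# Show both settings as stored for this repository.
git config --local --get-regexp '^core\.fsync'
```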

In step (1a) of the sequence, the tmp-objdir is created lazily to avoid work if no loose objects are ever added to the ODB.
We use a tmp-objdir to maintain the invariant that no loose-objects are visible in the main ODB unless they are properly fsync-durable.
This is important since future ODB operations that try to create an object with specific contents will silently drop the new data if an object with the target hash exists without checking that the loose-object contents match the hash.
Only a full `git fsck` would restore the ODB to a functional state where data loss doesn't occur.

In step (1b) of the sequence, we issue a fsync against a dummy file created specifically for the purpose.
This method has a little higher cost than using one of the input object files, but makes adding new callers of this mechanism easier, since we don't need to figure out which object file is "last" or risk sharing violations by caching the fd of the last object file.

Performance numbers:

Linux - Hyper-V VM running Kernel 5.11 (Ubuntu 20.04) on a fast SSD.
Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD.
Windows - Same host as Linux, a preview version of Windows 11.

Adding 500 files to the repo with `git add`. Times reported in seconds.
object file syncing | Linux | Mac   | Windows 
--------------------|-------|-------|--------
    disabled        | 0.06  |  0.35 | 0.61
       fsync        | 1.88  | 11.18 | 2.47
       batch        | 0.15  |  0.41 | 1.53
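To get comparable numbers on your own machine, something like the following could be used. It is a rough sketch, not the benchmark used in the commit: the `bench` helper and the file names are made up here, the file contents are arbitrary, and it relies on `GIT_TRACE_PERFORMANCE`, whose last output line normally reports the total time of the command.

```shell
# Scratch directory holding 500 small, distinct files.
src=$(mktemp -d)
i=1
while [ "$i" -le 500 ]; do
    echo "file $i" > "$src/file-$i.txt"
    i=$((i + 1))
done

# Time `git add` in a fresh repository for one pair of settings.
# Arguments: label, core.fsync value, core.fsyncMethod value.
bench() {
    repo=$(mktemp -d)
    git init -q "$repo"
    cp "$src"/*.txt "$repo"/
    printf '%s: ' "$1"
    # Git prints its own timings on stderr; keep only the final line.
    GIT_TRACE_PERFORMANCE=1 git -C "$repo" -c core.fsync="$2" \
        -c core.fsyncMethod="$3" add . 2>&1 | tail -n 1
}

bench disabled -loose-object fsync
bench fsync    loose-object  fsync
bench batch    loose-object  batch
```

Results will depend heavily on the filesystem and storage hardware, and Git versions older than 2.37 will not apply the `batch` setting.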

git config now includes in its man page:

batch enables a mode that uses writeout-only flushes to stage multiple updates in the disk writeback cache and then does a single full fsync of a dummy file to trigger the disk cache flush at the end of the operation.

Currently batch mode only applies to loose-object files.

Other repository data is made durable as if fsync was specified.
This mode is expected to be as safe as fsync on macOS for repos stored on HFS+ or APFS filesystems and on Windows for repos stored on NTFS or ReFS filesystems.
