Content-defined chunking
Files are split into content-defined chunks and deduplicated chunk-by-chunk — even across repositories. Register the same content twice and the bytes are stored once.
Bale for Git splits large files into content-defined chunks and stores them in a content-addressed server — so only the bytes that actually changed ever move. Chunk-level dedup, a native Git filter driver, and a server you can host yourself.
Bale's already enabled on our demo Gitea forge at demo.baleforgit.com — clone a repo and push.
$ git-bale track '*.psd' '*.bin'
tracking 2 patterns · filter.bale wired into .gitattributes
$ git add hero-scene.psd # 2.4 GB
chunked → 18,204 chunks staged
$ git commit -m "new hero composite"
[main 9f3a1c0] new hero composite
$ git push
bale ▸ dedup against server… 94% already present
bale ▸ uploading 1,140 new chunks (148 MB)
✓ pushed. 2.4 GB on disk → 148 MB on the wire
Only changed chunks travel — ~94% deduped
Chunk-level: dedup, not whole-file
Zero-network: cached-checkout hot path
Native filter: speaks Git's own protocol
Self-hosted: your storage, your forge
// What's in the bale
Git LFS treats a binary as one opaque blob. Bale looks inside it — chunking by content so a one-pixel edit re-uploads kilobytes, not gigabytes.
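To make "chunking by content" concrete, here is a minimal sketch of content-defined chunking with a gear rolling hash. It is an illustration under stated assumptions, not Bale's implementation: the size parameters, the gear table, and the function names `chunk` and `chunk_ids` are all hypothetical, and SHA-256 stands in for whatever chunk hash Bale really uses.

```python
import hashlib
import random

# Illustrative parameters only; Bale's real chunk sizes are not stated here.
AVG_BITS = 11            # cut when the top 11 hash bits are zero (~2 KiB apart)
MIN_SIZE = 256           # never cut before this many bytes
MAX_SIZE = 16 * 1024     # always cut at this many bytes
U64 = (1 << 64) - 1

# Gear table: one fixed pseudo-random 64-bit value per possible byte.
_table_rng = random.Random(0)
GEAR = [_table_rng.getrandbits(64) for _ in range(256)]


def chunk(data: bytes) -> list[bytes]:
    """Split data at boundaries chosen by a gear rolling hash of the content."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Older bytes shift out of the 64-bit state, so h depends only on
        # the last 64 bytes once a chunk is at least that long.
        h = ((h << 1) + GEAR[byte]) & U64
        size = i - start + 1
        at_boundary = size >= MIN_SIZE and (h >> (64 - AVG_BITS)) == 0
        if at_boundary or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def chunk_ids(data: bytes) -> list[str]:
    """Content address of every chunk, in file order."""
    return [hashlib.sha256(c).hexdigest() for c in chunk(data)]


# Demo: insert one byte into the middle of 200 kB of pseudo-random data.
_data_rng = random.Random(1)
original = bytes(_data_rng.getrandbits(8) for _ in range(200_000))
edited = original[:100_000] + b"!" + original[100_000:]

old_ids = set(chunk_ids(original))
new_ids = chunk_ids(edited)
changed = [cid for cid in new_ids if cid not in old_ids]
print(f"{len(changed)} of {len(new_ids)} chunks are new after the edit")
```

Because boundaries follow the bytes rather than fixed offsets, the inserted byte disturbs only the chunks around it; everything downstream lines up again and keeps its old content address.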
git-bale implements Git's long-running filter protocol directly — no wrapper around Git, no shelling out. Install it per-user or per-repo and it just works inside git add / checkout.
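For a feel of what speaking that protocol involves: Git's long-running filter protocol frames every message as a pkt-line, a four-digit hex length (which counts the four header bytes) followed by the payload, with `0000` as a flush packet. The sketch below encodes the opening handshake described in Git's gitattributes documentation. The helper names are ours, and nothing here reflects how git-bale is written internally.

```python
import io

MAX_PAYLOAD = 65516  # pkt-line limit: 65520 bytes including the 4-byte header


def write_pkt(out, payload: bytes) -> None:
    """Write one pkt-line: 4 hex digits of total length, then the payload."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload too large for one pkt-line")
    out.write(b"%04x" % (len(payload) + 4))
    out.write(payload)


def write_flush(out) -> None:
    """Write a flush packet, which ends a list of pkt-lines."""
    out.write(b"0000")


def read_pkt(inp):
    """Read one pkt-line; return None for a flush packet."""
    header = inp.read(4)
    if len(header) != 4:
        raise EOFError("truncated pkt-line header")
    length = int(header, 16)
    if length == 0:
        return None
    return inp.read(length - 4)


# What Git sends first when it starts a filter process.
greeting = io.BytesIO()
write_pkt(greeting, b"git-filter-client\n")
write_pkt(greeting, b"version=2\n")
write_flush(greeting)
wire = greeting.getvalue()

# What a filter answers before capabilities are negotiated.
reply = io.BytesIO()
write_pkt(reply, b"git-filter-server\n")
write_pkt(reply, b"version=2\n")
write_flush(reply)

# Decode the greeting the way a filter would.
reader = io.BytesIO(wire)
received = []
while (pkt := read_pkt(reader)) is not None:
    received.append(pkt)
print(received)
```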
At git push, a pre-push hook dedups against the server and uploads just the chunks it's missing — never the whole file. git-bale gc later reclaims staging for changes you abandon.
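The push-time exchange can be pictured as "ask what's missing, send only that". The sketch below uses an in-memory stand-in for the server; `ChunkServer`, `missing`, `put`, and `push` are hypothetical names for illustration, not Bale's wire API, and SHA-256 is an assumed chunk hash.

```python
import hashlib


def chunk_id(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class ChunkServer:
    """Hypothetical in-memory stand-in for a content-addressed chunk server."""

    def __init__(self):
        self.blobs = {}

    def missing(self, ids: list[str]) -> list[str]:
        """Return the subset of ids the server does not hold yet."""
        return [cid for cid in ids if cid not in self.blobs]

    def put(self, cid: str, data: bytes) -> None:
        if chunk_id(data) != cid:
            raise ValueError("chunk does not match its content address")
        self.blobs[cid] = data


def push(server: ChunkServer, staged: list[bytes]) -> int:
    """Upload only the chunks the server lacks; return the bytes sent."""
    by_id = {chunk_id(c): c for c in staged}
    sent = 0
    for cid in server.missing(list(by_id)):
        server.put(cid, by_id[cid])
        sent += len(by_id[cid])
    return sent


server = ChunkServer()
v1 = [b"layer-a" * 100, b"layer-b" * 100, b"layer-c" * 100]
v2 = [b"layer-a" * 100, b"layer-B" * 100, b"layer-c" * 100]  # one chunk edited

first = push(server, v1)
second = push(server, v2)
print(first, second)  # 2100 700
```

The second push sends a single chunk because the other two are already on the server under the same content address.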
A cached manifest plus a shared chunk cache means a warm checkout makes no reconstruction calls to the server. Switch branches at local-disk speed.
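As a rough model of that hot path, treat a manifest as an ordered list of chunk IDs and the cache as a local map from ID to bytes. This is an assumption for illustration: Bale's real manifest and cache formats are not described here, and `CountingServer` and `checkout` are hypothetical names.

```python
import hashlib


def cid_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class CountingServer:
    """Hypothetical server stub that counts how often it is asked for a chunk."""

    def __init__(self, blobs):
        self.blobs = {cid_of(b): b for b in blobs}
        self.calls = 0

    def fetch(self, cid: str) -> bytes:
        self.calls += 1
        return self.blobs[cid]


def checkout(manifest: list[str], cache: dict, server: CountingServer) -> bytes:
    """Rebuild a file from its manifest, touching the server only on cache misses."""
    parts = []
    for cid in manifest:
        if cid not in cache:
            cache[cid] = server.fetch(cid)
        parts.append(cache[cid])
    return b"".join(parts)


chunks = [b"head", b"body", b"tail"]
server = CountingServer(chunks)
manifest = [cid_of(c) for c in chunks]
cache = {}

cold = checkout(manifest, cache, server)
cold_calls = server.calls
warm = checkout(manifest, cache, server)
warm_calls = server.calls - cold_calls
print(cold_calls, warm_calls)  # 3 0
```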
baleforgit-server is a single binary with swappable backends — filesystem or S3 for storage, SQLite or Postgres for metadata — plus pluggable auth and per-owner quotas.
No shared secrets. Your forge owns users and mints short-lived JWTs; the server answers one access callback. Implement the Bale forge protocol and clients auto-resolve the server.
// The flow
Bale lives inside the commands you already type. There's no separate tool to remember and no special syntax — just point it at the file patterns you want baled.
git-bale track '*.bin' writes a filter rule into .gitattributes.
As you git add, the clean filter splits each tracked file into chunks under .git/bale/staging/.
The pre-push hook dedups against the server and uploads just the chunks it's missing.
# one-time, per repo
$ git-bale install --local
$ git-bale track '*.bin'
$ git add .gitattributes && git commit -m 'enable bale'
# from here on, plain git
$ git add render.bin && git commit -m 'add render'
$ git push
add: chunks the file
push: dedup + upload
gc: reclaims orphans
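The gc step can be read as a mark-and-sweep over staging. The sketch below assumes an "orphan" is a staged chunk that no surviving manifest references; that definition, the dict-based staging area, and the `gc` function are assumptions made for illustration, not a description of git-bale gc internals.

```python
def gc(staging: dict, live_manifests: list[list[str]]) -> list[str]:
    """Delete staged chunks that no live manifest references; return their ids."""
    live = {cid for manifest in live_manifests for cid in manifest}
    orphans = sorted(cid for cid in staging if cid not in live)
    for cid in orphans:
        del staging[cid]
    return orphans


# Chunks "c3" and "c4" were staged for a change that was later abandoned.
staging = {"c1": b"...", "c2": b"...", "c3": b"...", "c4": b"..."}
live_manifests = [["c1", "c2"], ["c2"]]

removed = gc(staging, live_manifests)
print(removed, sorted(staging))  # ['c3', 'c4'] ['c1', 'c2']
```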
// No checkout required
Browse a tag, an old commit, or another branch as a read-only directory — without ever checking it out. mount-diff goes further and exposes both sides of a diff at once, with the revision folded into each filename. Pipe it straight into Beyond Compare, Meld, or vimdiff.
mount-diff takes git diff-style arguments: revs, -- paths, and custom labels.
$ git-bale mount-diff main feature-x --mount /tmp/diff
mounted · Ctrl-C to unmount
$ ls /tmp/diff/src/
foo__main.rs foo__feature-x.rs
bar__feature-x.rs (added)
$ git-bale mount v1.0.0 --mount /tmp/v1
v1.0.0 tree, as committed — no checkout
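The naming in the mount-diff listing above, with the revision folded in before the extension, can be reproduced with a few lines. This is a guess at the rule based on that listing alone; `fold_rev` and `diff_view` are hypothetical helpers, and the sketch ignores revisions containing "/" (which would need a label).

```python
from pathlib import PurePosixPath


def fold_rev(path: str, rev: str) -> str:
    """Insert '__<rev>' between a file's stem and its extension."""
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}__{rev}{p.suffix}"))


def diff_view(left_rev, left_files, right_rev, right_files) -> list[str]:
    """List both sides of a diff side by side, one entry per existing side."""
    names = []
    for path in sorted(set(left_files) | set(right_files)):
        if path in left_files:
            names.append(fold_rev(path, left_rev))
        if path in right_files:
            names.append(fold_rev(path, right_rev))
    return names


view = diff_view(
    "main", {"src/foo.rs"},
    "feature-x", {"src/foo.rs", "src/bar.rs"},
)
print(view)
# ['src/bar__feature-x.rs', 'src/foo__main.rs', 'src/foo__feature-x.rs']
```

A file added on one side shows up once, which matches the lone bar__feature-x.rs entry in the listing.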
$ git-bale init-local # or: init-local --shared
bale ▸ local mode on · store at .git/bale/store
$ git-bale track '*.bin'
$ git add big.bin && git commit -m "add"
bale ▸ 9,540 chunks written to store · stays on disk
$ git push
✓ pushed pointer commit · large-file data never left the machine
// No server required
Don't want to stand up a server? git-bale init-local keeps every chunk on disk. git add and checkout work with no network, and git push moves only the small pointer commits — your large-file data never leaves the machine.
Per-repo store
Objects live in .git/bale/store, reclaimed by gc. Perfect for one working copy — note it isn't carried by git clone.
Shared store
--shared puts the store at ~/bale-local, reused and deduped across every local repo on the machine. git-bale prune --shared compacts it.
// Run your own bale
baleforgit-server puts everything that touches storage, metadata, or authorization behind a pluggable backend — so the same server runs on a laptop's filesystem with SQLite, or an S3 bucket with Postgres, with auth that's static or delegated to your forge.
Storage backend
Swap the storage backend that holds your content-addressed chunks — on-disk blobs for a single box, or an S3 bucket for scale.
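One way to picture a swappable storage backend is a small interface that every backend satisfies. The sketch below is an assumption about the shape, not baleforgit-server's actual code: `ChunkStorage`, `MemoryStorage`, `FilesystemStorage`, `store_chunk`, and the two-character fan-out directory layout are all hypothetical, and an S3 backend would simply be a third class with the same three methods.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Protocol


class ChunkStorage(Protocol):
    """Hypothetical interface that every storage backend would implement."""

    def has(self, cid: str) -> bool: ...
    def put(self, cid: str, data: bytes) -> None: ...
    def get(self, cid: str) -> bytes: ...


class MemoryStorage:
    def __init__(self):
        self._blobs = {}

    def has(self, cid: str) -> bool:
        return cid in self._blobs

    def put(self, cid: str, data: bytes) -> None:
        self._blobs[cid] = data

    def get(self, cid: str) -> bytes:
        return self._blobs[cid]


class FilesystemStorage:
    def __init__(self, root: Path):
        self.root = Path(root)

    def _path(self, cid: str) -> Path:
        # Assumed layout: fan out on the first two hex digits.
        return self.root / cid[:2] / cid[2:]

    def has(self, cid: str) -> bool:
        return self._path(cid).is_file()

    def put(self, cid: str, data: bytes) -> None:
        path = self._path(cid)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, cid: str) -> bytes:
        return self._path(cid).read_bytes()


def store_chunk(storage: ChunkStorage, data: bytes) -> str:
    """Store a chunk under its content address; skip the write if present."""
    cid = hashlib.sha256(data).hexdigest()
    if not storage.has(cid):
        storage.put(cid, data)
    return cid


with tempfile.TemporaryDirectory() as tmp:
    results = []
    for backend in (MemoryStorage(), FilesystemStorage(Path(tmp))):
        cid = store_chunk(backend, b"chunk bytes")
        results.append(backend.get(cid))

print(results)  # [b'chunk bytes', b'chunk bytes']
```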
Metadata backend
Files, chunks, and per-owner / per-repo accounting live in the metadata backend — SQLite to get started, Postgres for multi-tenant scale. Soft quotas and usage endpoints included.
Auth backend
HS256 tokens with scope enforcement and per-repo read checks. Use static tokens, or delegate the user model to your forge over a single callback.
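For readers who want to see what an HS256 token with a scope and per-repo check looks like, here is a standard-library sketch. The JWT structure (base64url header, payload, and HMAC-SHA256 signature) is standard, but the claim names `repo` and `scope`, the space-separated scope format, and the `mint` and `verify` helpers are assumptions for illustration, not Bale's token schema.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def mint(secret: bytes, claims: dict) -> str:
    """Build a compact HS256 JWT: header.payload.signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    sig = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(sig)


def verify(secret: bytes, token: str, repo: str, scope: str, now=None) -> bool:
    """Check signature, algorithm, expiry, repo, and scope (hypothetical claims)."""
    try:
        head, body, sig = token.split(".")
        expected = hmac.new(
            secret, f"{head}.{body}".encode("ascii"), hashlib.sha256
        ).digest()
        if not hmac.compare_digest(expected, _b64url_decode(sig)):
            return False
        if json.loads(_b64url_decode(head)).get("alg") != "HS256":
            return False
        claims = json.loads(_b64url_decode(body))
    except ValueError:
        return False  # malformed token, bad base64, or bad JSON
    now = time.time() if now is None else now
    return (
        claims.get("exp", 0) > now
        and claims.get("repo") == repo
        and scope in claims.get("scope", "").split()
    )


secret = b"demo-secret"
token = mint(secret, {"repo": "baleforgit/demo", "scope": "read write", "exp": 2000})

print(verify(secret, token, "baleforgit/demo", "write", now=1000))  # True
print(verify(secret, token, "baleforgit/demo", "write", now=3000))  # False (expired)
print(verify(secret, token, "someone/else", "read", now=1000))      # False (wrong repo)
```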
// Nothing to set up
We run a Bale-enabled Gitea instance at demo.baleforgit.com. Clone the sample repo, or sign up and create your own: the client auto-resolves the server for your big files, so there is no token wrangling and no server to stand up.
Curious how much you saved? Each repo's Settings → Bale page shows live usage and dedup stats.
$ git clone https://demo.baleforgit.com/baleforgit/demo.git
$ cd demo && git-bale install --local
# big files smudge straight from the demo CAS
$ git add scene.bin && git commit -m 'edit'
$ git push # only new chunks upload
# → see savings in Settings → Bale
// In the lineage of Git LFS
| | Git LFS | Bale for Git |
|---|---|---|
| Dedup granularity | Whole file | Content-defined chunks |
| Small edit to a big file | Re-uploads the whole file | Uploads only changed chunks |
| Cross-repo reuse | Stored per repo | Shared chunk dedup |
| Browse a rev without checkout | — | Mount as a filesystem |
| Server | Self-host or hosted | Self-host: filesystem or S3 |
| No-server mode | — | Fully local: per-repo or shared store |
On-disk format and chunk-dedup algorithm derived from Hugging Face's open-source Xet project.
Grab the client for your platform, drop git-bale on your PATH, and tell it what to track.