// Technology

How Bale keeps big files small.

No magic — just one idea applied well: store every unique piece of your files once, and only ever move the pieces that actually changed. Here's how that works, in plain English.

Why big files break Git

Git was built for source code: thousands of small text files. Drop in a 2 GB design file, model, or dataset and each new version is typically stored again at close to full size, and every teammate's full clone has to download all of those versions. Storage balloons and clones crawl. Bale fixes that by never storing or sending the same bytes twice.

// The big idea

Three steps, one payoff.

1 Chunking

Files become pieces.

Bale slices each file into smaller pieces, and the cuts follow the file's own contents — like splitting a book at paragraph breaks instead of every 100 words. Edit one line and the surrounding pieces stay identical; only the piece you touched is new.

Content-defined chunking — where Bale cuts depends on what's inside, so small edits stay small.
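To make the paragraph-break idea concrete, here is a minimal Python sketch of content-defined chunking. It is an illustration, not Bale's actual chunker: the rolling "gear" hash, the `chunk` function, and the `MIN_SIZE`, `MAX_SIZE`, and `MASK` values are all assumptions chosen to keep the example small.

```python
import hashlib
import random

# Illustrative parameters only; Bale's real chunk sizes are not stated here.
MIN_SIZE = 16            # never cut a piece shorter than this
MAX_SIZE = 256           # always cut once a piece reaches this
MASK = (1 << 6) - 1      # cut when the low 6 hash bits are all zero

# One fixed pseudo-random 32-bit number per possible byte value.
GEAR = [int.from_bytes(hashlib.sha256(bytes([b])).digest()[:4], "big")
        for b in range(256)]


def chunk(data: bytes) -> list[bytes]:
    """Split data wherever a rolling hash of the most recent bytes hits a pattern."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Older bytes shift out of the low bits, so the cut test below depends
        # only on the last few bytes, not on the position within the file.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if size >= MAX_SIZE or (size >= MIN_SIZE and (h & MASK) == 0):
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # whatever is left becomes the last piece
    return chunks


# Demo: insert a few bytes into the middle of a file and compare the pieces.
rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(20_000))
edited = original[:10_000] + b"a small edit" + original[10_000:]

old, new = chunk(original), chunk(edited)
reused = len(set(old) & set(new))
print(f"{reused} of {len(new)} pieces in the edited file already existed")
```

Every piece that ends before the edit is guaranteed to be unchanged. Pieces after the edit usually line up again within a piece or two, because the cut points follow the content rather than fixed offsets.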

2 Fingerprints

Each piece gets an ID.

Every piece is hashed to produce a fingerprint derived from its contents. Identical pieces get identical fingerprints, so Bale recognizes a piece it has seen before and keeps just one copy, whether it shows up in another file, an older version, or a different project.

Content-addressed storage (CAS) — pieces are filed by what they contain, not where they sit. Duplicates collapse automatically.
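A content-addressed store can be sketched in a few lines of Python. This is an illustration under assumptions, not Bale's storage layer: the `ChunkStore` class is hypothetical, and SHA-256 stands in for whatever fingerprint function Bale actually uses.

```python
import hashlib


class ChunkStore:
    """Hypothetical in-memory store that files each piece under its fingerprint."""

    def __init__(self) -> None:
        self._pieces: dict[str, bytes] = {}

    def put(self, piece: bytes) -> str:
        # The key is computed from the contents, so identical pieces
        # always land on the same key and are kept only once.
        fp = hashlib.sha256(piece).hexdigest()
        self._pieces.setdefault(fp, piece)
        return fp

    def get(self, fp: str) -> bytes:
        return self._pieces[fp]

    def __len__(self) -> int:
        return len(self._pieces)


store = ChunkStore()
first = store.put(b"same bytes")        # stored
second = store.put(b"same bytes")       # duplicate: nothing new is stored
other = store.put(b"different bytes")   # stored

print(first == second, len(store))
```

The store never has to search for duplicates. Two identical pieces produce the same key, so the second `put` finds the first copy already in place.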

3 Sync

Only the difference moves.

Because every piece has a fingerprint, Bale knows exactly which pieces the other side already has. A push or pull moves only the genuinely new pieces — so a small change to a huge file goes over the network in kilobytes, not gigabytes.

Deduplication — the same bytes are never stored or sent twice, across files, versions, and repos.
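The push step can be sketched the same way. In this simplified Python example both sides are plain dictionaries keyed by fingerprint, and a file version is a list of fingerprints (a "manifest"). The names `fingerprint`, `add_file`, and `push` are hypothetical, SHA-256 is an assumed fingerprint function, and a real client would ask the server which pieces it has rather than read its storage directly.

```python
import hashlib


def fingerprint(piece: bytes) -> str:
    return hashlib.sha256(piece).hexdigest()


def add_file(pieces: list[bytes], store: dict[str, bytes]) -> list[str]:
    """Record a file's pieces in the local store and return its manifest."""
    manifest = []
    for piece in pieces:
        fp = fingerprint(piece)
        store.setdefault(fp, piece)
        manifest.append(fp)
    return manifest


def push(manifest: list[str], local: dict[str, bytes],
         remote: dict[str, bytes]) -> int:
    """Send only the pieces the remote lacks; return the number of bytes sent."""
    missing = [fp for fp in dict.fromkeys(manifest) if fp not in remote]
    for fp in missing:
        remote[fp] = local[fp]
    return sum(len(local[fp]) for fp in missing)


local: dict[str, bytes] = {}
remote: dict[str, bytes] = {}

# Version 1: three pieces of 1,000 bytes each. Everything is new to the remote.
v1 = add_file([b"A" * 1000, b"B" * 1000, b"C" * 1000], local)
sent_v1 = push(v1, local, remote)

# Version 2: only the middle piece changed, so only that piece moves.
v2 = add_file([b"A" * 1000, b"B" * 999 + b"!", b"C" * 1000], local)
sent_v2 = push(v2, local, remote)

# Pushing the same version again moves nothing.
sent_again = push(v2, local, remote)

print(sent_v1, sent_v2, sent_again)  # 3000 1000 0
```

The cost of a push tracks the size of the change, not the size of the file, which is the whole point of comparing fingerprints before sending bytes.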

// The whole picture

Unique pieces, shared by everyone.

Your file

A 2 GB asset is sliced into fingerprinted pieces on your machine.

The store (CAS)

Each unique piece is kept exactly once. Duplicates — across files, versions, repos — collapse into a single copy.

Everyone else

Teammates and CI fetch only the pieces they're missing — and reuse a local cache for the rest.

That local cache is why it feels fast: switching branches or checking out an old version usually means reassembling files from pieces you already have — no download, no waiting.
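Here is a small Python sketch of why the cache makes checkouts quick. A checkout rebuilds a file by walking its list of fingerprints and downloading only the pieces the cache lacks. The `checkout` function, the dictionary-based cache, and the `fetch` callback are hypothetical stand-ins, and SHA-256 is an assumed fingerprint function.

```python
import hashlib
from typing import Callable


def fingerprint(piece: bytes) -> str:
    return hashlib.sha256(piece).hexdigest()


def checkout(manifest: list[str], cache: dict[str, bytes],
             fetch: Callable[[str], bytes]) -> bytes:
    """Rebuild a file from its pieces, downloading only what the cache lacks."""
    parts = []
    for fp in manifest:
        if fp not in cache:
            cache[fp] = fetch(fp)   # the only step that touches the network
        parts.append(cache[fp])
    return b"".join(parts)


# A pretend remote holding two versions of a file that share two of three pieces.
pieces_v1 = [b"intro ", b"draft body ", b"outro"]
pieces_v2 = [b"intro ", b"final body ", b"outro"]
remote = {fingerprint(p): p for p in pieces_v1 + pieces_v2}
manifest_v1 = [fingerprint(p) for p in pieces_v1]
manifest_v2 = [fingerprint(p) for p in pieces_v2]

downloads: list[str] = []


def fetch(fp: str) -> bytes:
    downloads.append(fp)            # record every simulated download
    return remote[fp]


cache: dict[str, bytes] = {}
file_v1 = checkout(manifest_v1, cache, fetch)
after_v1 = len(downloads)           # 3: the cache started empty
file_v2 = checkout(manifest_v2, cache, fetch)
after_v2 = len(downloads)           # 4: only the changed piece was fetched
file_v1_again = checkout(manifest_v1, cache, fetch)
after_switch_back = len(downloads)  # still 4: rebuilt entirely from cache
```

Switching back to the first version downloads nothing at all, because every piece it needs is already on disk.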

// The engine

Built on Xet.

Bale's piece-and-fingerprint engine is built on Xet, an open-source storage technology originally developed at XetHub and now maintained by Hugging Face, home to one of the world's largest collections of AI models and datasets. It's the same approach proven on files measured in terabytes, now wired natively into the Git commands your team already uses.

See it for yourself.

Clone the demo repo and watch a big file move in kilobytes — or grab the client and try it on your own.

Install the client

Try the live demo