The Debate That Misses the Point
In early 2024, MinIO published a provocative article titled "Filesystem on Object Store is a Bad Idea". JuiceFS responded with their rebuttal, defending POSIX filesystems on object storage.
Both make valid points. Both miss the bigger picture. There's a third approach that sidesteps the entire FUSE debate: native ZFS integration with object storage via objbacker.io.
MinIO's Argument: FUSE Is Fundamentally Broken
MinIO's Position
MinIO argues that layering POSIX over object storage creates fundamental incompatibilities:
- Performance degradation — POSIX is "IOPS-centric, chatty, expensive and hard to scale"
- Semantic incompatibility — Object storage relies on atomic, immutable operations
- Data integrity risks — Uncommitted data can be lost during crashes
- Security gaps — POSIX permissions can't map to S3 IAM
To prove their point, MinIO benchmarked s3fs-fuse copying a 10GB file. The result?
The copy took over 5 minutes and threw I/O errors along the way.
Here's the problem: they generalized about an entire architectural approach from ONE implementation, and they picked the worst one. s3fs-fuse is widely known to be rudimentary. Dismissing a whole category based on its weakest example is a rookie mistake.
They also never showed their own mc cp timing for comparison. A classic strawman.
The Fundamental Flaw in MinIO's Argument
MinIO claims: "There is simply no need for a filesystem utility in the middle of MinIO and your application!"
This ignores a fundamental reality: Linux syscalls are POSIX. read(), write(), open(), stat() are POSIX syscalls. Until Linux provides native S3 syscalls (it doesn't), applications need a POSIX interface. Period.
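You can watch this for yourself: trace an ordinary file utility and the POSIX syscalls appear immediately (a minimal illustration; the exact syscall mix varies by program and libc):
# Trace the POSIX file syscalls behind a simple read
strace -e trace=openat,read,write,close cat /etc/hosts > /dev/null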
Not every application can be rewritten to use S3 SDKs. Consider:
- HPC workloads — Supercomputers and scientific computing run on Lustre, GPFS, BeeGFS—all POSIX
- Legacy applications — Millions of lines of code expecting file I/O
- Databases — PostgreSQL, MySQL, Oracle all use file I/O
- ML frameworks — PyTorch, TensorFlow checkpoint to files
- Analytics tools — Spark, Hadoop expect HDFS/POSIX interfaces
- Media workflows — Video editing, rendering pipelines use files
The world's fastest supercomputers—Frontier, Aurora, LUMI—all use POSIX filesystems. National labs, research institutions, and enterprises have decades of code built on POSIX I/O. Telling them to "just use S3" is disconnected from reality.
And what about decades of data management practices?
- Snapshots — Instant, space-efficient point-in-time copies
- Clones — Writable snapshots for dev/test
- Replication — Efficient block-level sync
- Compression — Transparent, algorithm-selectable
- Checksums — End-to-end data integrity
- Scrubbing — Proactive corruption detection
"But S3 has versioning!" Yes—at the object level, not the bucket level.
| Capability | S3 Versioning | ZFS Snapshots |
|---|---|---|
| Scope | Per-object | Entire filesystem (atomic) |
| Point-in-time consistency | No — each object versioned independently | Yes — all files consistent at snapshot moment |
| Rollback entire dataset | No — must restore objects individually | Yes — instant rollback |
| Space efficiency | Full copy per version | Copy-on-write (only deltas stored) |
| Writable clones | No | Yes — instant, space-efficient |
| Send/Receive replication | No | Yes — incremental block-level sync |
With S3 versioning, you can't say "restore my bucket to how it was at 3pm yesterday." You'd have to script through every object, find the right version, restore each one—and hope nothing was added or deleted in between. That's not data management. That's data archaeology.
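With ZFS, the same operation is a handful of commands (a minimal sketch; the pool, dataset, and snapshot names are placeholders):
# Atomic, point-in-time snapshot of the whole dataset
zfs snapshot tank/projects@3pm-yesterday

# Roll the entire dataset back to that moment, instantly
zfs rollback tank/projects@3pm-yesterday

# Writable, space-efficient clone for dev/test
zfs clone tank/projects@3pm-yesterday tank/projects-devtest

# Incremental, block-level replication to another system
zfs send -i tank/projects@monday tank/projects@3pm-yesterday | ssh backup zfs receive tank/projects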
Object storage has none of this natively. MinIO's answer? "Rewrite your applications." That's not a solution—it's an abdication.
JuiceFS's Rebuttal: Implementation Matters
JuiceFS's Position
JuiceFS correctly points out that s3fs-fuse is not a real filesystem—it's a protocol converter. Their approach:
- Separate metadata — Redis/TiKV for fast metadata operations
- Intelligent chunking — Optimized data splitting and caching
- POSIX compatibility — Built from the ground up, not bolted on
JuiceFS ran the fair comparison MinIO avoided:
| Method | 10GB Write Time | Notes |
|---|---|---|
| MinIO mc cp | 27.65s | Native S3 multipart upload |
| JuiceFS POSIX | 28.10s | FUSE + Redis metadata |
| s3fs-fuse | 3m 6s (6x slower) | Temp file + upload |
JuiceFS proved that a well-implemented POSIX layer can match native S3 performance. But they're still using FUSE. They still need external metadata (Redis). They're still working around the fundamental architecture rather than solving it.
The Third Way: Native ZFS Integration
MayaNAS + objbacker.io
What if you didn't need FUSE at all? What if the filesystem natively understood object storage as a block device tier?
- No FUSE — objbacker.io is a native ZFS VDEV (kernel-level)
- No external metadata — ZFS special VDEV keeps metadata on SSD
- Already POSIX — ZFS is battle-tested POSIX, not retrofitted
- Hybrid architecture — Hot data on NVMe, cold data on object storage
MayaNAS doesn't put a filesystem on top of object storage. It extends ZFS to use object storage as a block device tier. The distinction is fundamental.
Why objbacker.io Is Different
┌─────────────────────────────────────────────────────────────────┐
│ APPLICATION LAYER │
│ (NFS/SMB Clients) │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ ZFS (Kernel) │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │
│ │ ARC Cache │ │ L2ARC SSD │ │ ZIL (SLOG on SSD) │ │
│ └──────────────┘ └──────────────┘ └──────────────────────┘ │
│ │ │
│ ┌───────────────────────────┴───────────────────────────────┐ │
│ │ VDEV Layer │ │
│ │ ┌─────────────────┐ ┌─────────────────────────┐ │ │
│ │ │ Special VDEV │ │ objbacker.io VDEV │ │ │
│ │ │ (NVMe SSD) │ │ (Object Storage) │ │ │
│ │ │ │ │ │ │ │
│ │ │ • Metadata │ │ • 1MB blocks │ │ │
│ │ │ • Small blocks │ │ • Large files │ │ │
│ │ │ • Dedup tables │ │ • Cold data │ │ │
│ │ └────────┬────────┘ └────────────┬────────────┘ │ │
│ └───────────┼───────────────────────────────┼───────────────┘ │
└──────────────┼───────────────────────────────┼─────────────────┘
│ │
▼ ▼
┌──────────────┐ ┌──────────────────┐
│ Local NVMe │ │ S3/GCS/Azure │
│ SSD │ │ Object Storage │
└──────────────┘ └──────────────────┘
Key Architectural Differences
| Aspect | s3fs-fuse | JuiceFS | MayaNAS + objbacker.io |
|---|---|---|---|
| FUSE Required | Yes | Yes | No (native kernel VDEV) |
| Metadata Storage | Object storage | Redis/TiKV | ZFS special VDEV on SSD (integrated) |
| External Dependencies | None | Redis/TiKV cluster | None (self-contained) |
| POSIX Compliance | Partial | High (worked for it) | Full (it's ZFS) |
| Data Integrity | Limited | Good | ZFS checksums + CoW |
| Caching | Basic | Client-side | ARC + L2ARC (kernel) |
| Snapshots | No | Yes | ZFS native snapshots (instant) |
The Metadata Problem—Solved by ZFS Architecture
MinIO's core criticism is valid: metadata operations on object storage are slow.
Every ls, every stat, every directory traversal becomes an HTTP request.
JuiceFS solved this by adding Redis. Now you need a Redis cluster. More infrastructure. More failure modes.
MayaNAS needs neither. The ZFS special VDEV stores:
- All metadata — Directory entries, file attributes, extended attributes
- Small blocks — Files smaller than a configurable threshold (e.g., 64KB)
- Deduplication tables — If dedup is enabled
Only large data blocks (1MB recordsize) go to object storage via objbacker.io. This means a 10GB file becomes ~10,000 objects—but metadata operations hit local NVMe at microsecond latency.
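Setting this up is standard ZFS administration (a sketch with placeholder pool and device names; the objbacker.io-backed devices are configured per that product's documentation):
# Add a mirrored NVMe special VDEV for metadata and small blocks
zpool add tank special mirror /dev/nvme0n1 /dev/nvme1n1

# Keep files up to 64KB on the special VDEV instead of object storage
zfs set special_small_blocks=64K tank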
Performance: Real Numbers
Understanding the Benchmark Methodology
MinIO's 10GB file copy test is equivalent to running:
fio --name=test --size=10G --numjobs=1 --rw=write --bs=1M --direct=1
That's a single-threaded sequential write. It's the simplest possible storage benchmark. And here's the critical limitation they don't mention:
Real workloads don't operate this way. They have concurrent I/O, multiple files, parallel threads. This is where architecture matters.
MayaNAS Validated Throughput
- 3.7 GB/s read (AWS, 6 buckets)
- 2.5 GB/s write (AWS, 6 buckets)
- 8.14 GB/s read (GCP, 20 buckets)
- 6.2 GB/s write (GCP, 20 buckets)
Test configurations:
- AWS: c5n.9xlarge, 6 S3 buckets, 1MB recordsize, special VDEV on NVMe
- GCP: n2-standard-48, 20 GCS buckets, 75 Gbps TIER_1 networking
Test 1: 10GB File Copy (mc cp equivalent)
MinIO's benchmark used mc cp with a 10GB taxi dataset CSV. We replicated this
with a simple Linux cp command—the most straightforward comparison possible:
# Create 10GB test file (random data, incompressible)
dd if=/dev/urandom of=/tmp/10gb-testfile bs=1M count=10240

# Copy to MayaNAS (ZFS + objbacker.io)
time cp /tmp/10gb-testfile /minio-pool/testfs/
| Method | Time | Throughput | How It Works |
|---|---|---|---|
| MinIO mc cp | ~28s | ~360 MB/s | Multipart to single object in single bucket |
| JuiceFS POSIX | ~28s | ~360 MB/s | FUSE + chunking + Redis coordination |
| s3fs-fuse | 3m 6s | ~55 MB/s | Temp file → single upload |
| MayaNAS cp | 3.97s (7x faster) | 2.52 GB/s | Parallel 1MB blocks across multiple buckets |
Same test with fio for precise measurement:
fio --name=test --size=10G --numjobs=1 --rw=write --bs=1M \
--ioengine=psync --directory=/pool/test --end_fsync=1
# Result: 10GB in 5.13s = 2.09 GB/s (with explicit fsync)
The difference? objbacker.io writes to multiple buckets in parallel. Each bucket handles ~400 MB/s, and they aggregate. No FUSE context switches. No userspace overhead. No Redis round-trips.
The Scaling Advantage mc cp Can Never Match
Here's what MinIO can't do: mc cp writes a single object to a single bucket.
Even with multipart upload, it's still one destination. You can't stripe across buckets.
You can't use multiple prefixes in parallel. The architecture has a hard ceiling.
| Configuration | mc cp | MayaNAS objbacker.io |
|---|---|---|
| 1 bucket | ~360 MB/s (max) | ~400 MB/s |
| 6 buckets | ~360 MB/s (can't use them) | 2.5 GB/s (6x parallel) |
| 20 buckets | ~360 MB/s (can't use them) | 6.2 GB/s (20x parallel) |
With objbacker.io, adding more buckets directly increases throughput. The ZFS VDEV layer stripes I/O across all configured buckets automatically. This is the architectural advantage of treating object storage as block devices rather than as a file destination.
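Conceptually, each bucket-backed device is just another top-level VDEV, and ZFS stripes writes across top-level VDEVs automatically. A hypothetical sketch (the device paths are placeholders; actual objbacker.io device setup follows its own documentation):
# Hypothetical: each objbacker.io device maps to one bucket (placeholder paths)
zpool create tank \
    /dev/objbk/bucket-01 /dev/objbk/bucket-02 /dev/objbk/bucket-03 \
    special mirror /dev/nvme0n1 /dev/nvme1n1

# ZFS stripes I/O across all three bucket-backed VDEVs; adding a fourth
# bucket later raises aggregate throughput
zpool add tank /dev/objbk/bucket-04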
Read Performance: Where ZFS Really Shines
What about reading? S3 supports byte-range GET requests, and mc cp can use
~4 parallel connections for downloads. But again—it's reading one object from one bucket.
ZFS has aggressive read-ahead prefetching for sequential workloads:
- Pattern detection — ZFS detects sequential access and prefetches ahead
- Parallel bucket reads — With objbacker.io, prefetch triggers concurrent GETs across all buckets
- ARC cache — Hot data stays in memory, eliminating repeated object fetches
- L2ARC on SSD — Warm data cached on local NVMe for sub-millisecond access
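On OpenZFS you can confirm prefetch is active and watch cache behavior while a read runs (a small sketch; arcstat ships with OpenZFS on Linux):
# 0 means prefetch is enabled
cat /sys/module/zfs/parameters/zfs_prefetch_disable

# Per-second ARC hit/miss statistics, including demand vs. prefetch reads
arcstat 1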
10GB Cold Read Benchmark (cache dropped)
# Cache dropped before test to ensure cold read from object storage:
echo 3 | sudo tee /proc/sys/vm/drop_caches
fio --name=test --size=10G --numjobs=1 --rw=read --bs=1M \
--ioengine=psync --directory=/pool/test
| Method | 10GB Cold Read | Throughput |
|---|---|---|
| mc cp (S3 GET) | ~28s | ~360 MB/s |
| MayaNAS + objbacker.io | 8.4s | 1.27 GB/s (3.5x faster) |
Peak throughput reached 1.98 GB/s before the 10GB file ended—ZFS prefetch was still ramping up. With larger files, throughput stabilizes even higher as prefetch fully engages across all buckets.
| Read Behavior | mc cp | MayaNAS objbacker.io |
|---|---|---|
| Parallel connections | ~4 (single object) | Unlimited (across all buckets) |
| Prefetching | None | ZFS automatic read-ahead |
| Caching | None (re-fetch every time) | ARC (memory) + L2ARC (SSD) |
| Repeat reads | Full object fetch | Cache hit (microseconds) |
For sustained sequential reads with larger datasets, ZFS prefetch combined with multi-bucket striping delivers 8.14 GB/s on GCP—far beyond what any single-object download can achieve.
Why ZFS 1MB Recordsize Matters
ZFS with objbacker.io uses 1MB recordsize. A 10GB file becomes ~10,000 S3 PUT operations. This is optimal for object storage because:
- Reduced API costs — Fewer PUT requests than smaller blocks
- Better throughput — Each PUT transfers meaningful data
- TRIM support — Deleted blocks are removed from object storage (cost savings)
- Compression efficiency — 1MB gives LZ4/ZSTD good compression ratios
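These are ordinary dataset and pool properties (a sketch with a placeholder pool name):
# 1MB records so each PUT carries a full, meaningful block
zfs set recordsize=1M tank/data

# Transparent compression; zstd or lz4 both work well at 1MB records
zfs set compression=zstd tank/data

# Return deleted blocks to the backing store automatically (TRIM)
zpool set autotrim=on tank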
Test 2: Pandas Small-File Iterations (JuiceFS benchmark)
JuiceFS also tested a pandas workload: 100 iterations of read-modify-write on a small CSV file. This tests metadata operations and small-file handling:
# 100 iterations: read CSV → append row → write CSV
import pandas as pd

test_file = "test.csv"                    # small seed CSV (placeholder name)
new_row = pd.DataFrame({"value": [1]})    # illustrative row to append
for i in range(100):
    df = pd.read_csv(test_file)
    df = pd.concat([df, new_row], ignore_index=True)
    df.to_csv(test_file, index=False)
| Method | 100 Iterations | Per Operation |
|---|---|---|
| MinIO direct | 0.83s | 8.3ms |
| s3fs-fuse | 0.78s | 7.8ms |
| JuiceFS POSIX | 0.43s | 4.3ms |
| MayaNAS | 0.12s (3.6x faster) | 1.2ms |
Why is MayaNAS so fast? ZFS ARC (Adaptive Replacement Cache):
100 pandas iterations (0.12s total):
↓
All reads/writes hit ZFS ARC (memory)
↓
TXG commit at end → single write to storage
↓
File size: 1,508 bytes → tiny, unnoticeable commit
ZFS keeps hot data in ARC during the 100 iterations. The final commit is fast because of layered intelligence:
- ZFS ARC — Hot data stays in memory during iterations
- ZFS Special VDEV — If configured, small files commit directly to local NVMe SSD
- objbacker.io — Even without special VDEV, handles small/unaligned writes efficiently
Not 100 round-trips to object storage. No API cost explosion.
This is intelligent tiering. MinIO's approach (direct S3 writes) means every iteration hits object storage—100 PUT operations, 100 latency penalties, 100x the API cost. ZFS + objbacker.io batches intelligently at multiple layers. You get POSIX semantics without the overhead.
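You can watch this batching happen: OpenZFS exposes per-pool transaction-group history (a sketch; tank is a placeholder pool name, and the commit interval is tunable):
# Recent transaction groups: how much data each commit flushed, and when
cat /proc/spl/kstat/zfs/tank/txgs

# Default commit interval is 5 seconds; writes in between coalesce in memory
cat /sys/module/zfs/parameters/zfs_txg_timeout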
When Each Approach Makes Sense
Use MinIO Directly When:
- Your application is S3-native (data lakes, analytics)
- You don't need POSIX semantics
- Simple archival storage
Use JuiceFS When:
- You need POSIX but can manage Redis infrastructure
- Kubernetes-native deployments
- You're already invested in their ecosystem
Use MayaNAS + objbacker.io When:
- You need maximum performance with POSIX
- You want no external dependencies (no Redis, no FUSE)
- You need enterprise features: snapshots, clones, replication, compression
- You have mixed workloads: hot data on SSD, cold data on object storage
- You want data integrity: ZFS checksums, scrubbing, self-healing
- You're running NFS/SMB for traditional file sharing
Conclusion: Architecture Matters
MinIO's argument boiled down to: "Don't put a filesystem between your application and object storage." But here's what our benchmarks prove—having a filesystem in between doesn't break anything. It makes everything better.
Pandas doesn't slow down with MayaNAS between it and object storage. It speeds up. cp doesn't become a bottleneck. It becomes 7x faster than mc cp. Why? Because an intelligent filesystem layer does what applications cannot:
- Intelligent caching: Hot data stays in ARC, cold data lives in object storage
- Transaction batching: 100 writes become a handful of efficient TXG commits
- Parallel streaming: Multi-bucket striping saturates network bandwidth
- Aggressive read-ahead: ZFS prefetches what your application will need next
- Write coalescing: Small random writes become large sequential objects
MinIO generalized from one bad implementation (s3fs-fuse) to condemn all filesystems. That's like condemning all databases because one SQL implementation was slow.
The question was never "filesystem or object storage?" The question is "Why would you talk directly to object storage when such an intelligent filesystem layer is available?"
objbacker.io answers that question.
Try MayaNAS
Deploy MayaNAS on AWS, Azure, or GCP with Terraform. Full ZFS functionality with object storage economics.