Unleashing 13 GB/s from GCS on a Single VM — The Power of ZFS on Object Storage

March 4, 2026 10 min read Sam Sammandam

The Untapped Goldmine

Every hyperscaler has abundant object storage at $0.02/GB/month with 11 nines of durability. Petabytes of AI training data, genomic datasets, media archives — all sitting in GCS, S3, and Azure Blob. Cheap. Durable. Available.

And largely untapped for high-performance workloads. Because applications need POSIX file access, and object storage doesn't speak POSIX.

The POSIX Bridge Problem

The industry agrees a bridge is needed. Google routes even its premium storage tiers through a filesystem mount so that "common AI frameworks such as TensorFlow and PyTorch can access object storage without having to modify any code."

The question is what kind of bridge. Today, the common answer is FUSE — Filesystem in Userspace. Every major cloud offers one: mount a bucket, get a filesystem. Simple in concept.

FUSE was the simple, knee-jerk choice, and it shows. Performance is the first casualty: kernel-to-userspace context switches on every I/O operation are an architectural tax you can't optimize away. Linus Torvalds was direct about this on linux-fsdevel:

"People who think that userspace filesystems are realistic for anything but toys are just misguided."

A USENIX FAST '17 study ("To FUSE or Not to FUSE") confirmed: FUSE overhead caused up to 83% performance degradation, with 31% higher CPU utilization — even when optimized.

But performance is only the start. The deeper problem is that a FUSE bridge never becomes a real filesystem; it just accumulates caveats. No hard links. No file locking. Non-atomic renames. Whole-object rewrites for any modification. No POSIX access control. Concurrent writers get stale file handle errors instead of serialized I/O. The Cloud Storage FUSE documentation itself recommends "retry strategies" for transient errors.

Full list of FUSE-on-object-storage limitations:
  • No metadata preservation — object metadata is not transferred on upload (mtime and symlinks excepted)
  • Concurrent write conflicts — two mounts writing the same object: first to flush wins, others get ESTALE. Not serialized, not locked — just broken.
  • No hard links — not supported
  • No file locking or patching — version control systems (git) cannot safely operate on FUSE mounts
  • Non-atomic directory renames — unless you pay for a special hierarchical namespace tier
  • Whole-object rewrites — modify one byte, reupload the entire file. No partial writes.
  • No POSIX access control — authorization is bucket-level IAM, not file permissions
  • High latency — documentation warns it "shouldn't be used as the backend for storing a database"
  • Transient errors expected — documentation recommends implementing "retry strategies"
  • No object versioning support — using versioned buckets produces "unpredictable behavior"
  • Compressed object hazards — reading or modifying gzip-encoded objects produces "unpredictable behavior"
  • No retention policy writes — buckets with retention policies are read-only via FUSE
  • Local temp storage required — new/modified files stored entirely in local temp before upload
  • 1,024 file handle limit — the Linux kernel default; concurrent connections must stay under this
  • Slow list operations: ls on large directories paginates via the API and becomes "expensive and slow"
  • Single-file rsync — FUSE latency cripples rsync, which processes one file at a time

Every item on this list is something a proper kernel filesystem handles correctly from day one. Source: Cloud Storage FUSE documentation.

Cloud providers know the performance gap. Their response has been to build faster backends — premium storage tiers, NVMe caching layers, managed parallel filesystems — so the backend is fast enough that FUSE overhead becomes tolerable. Impressive engineering. But it solves the performance problem from the wrong end, and none of it fixes the limitations above.

A Different Approach: ZFS as the Bridge

What if you fix the access layer instead of retrofitting the backend?

ZFS is a battle-tested kernel filesystem trusted for two decades. It natively stripes data across multiple block devices. The insight: make each object storage bucket a block device. This is what MayaNAS does — a cloud-native ZFS storage solution that presents object storage as a standard filesystem. Deploy via Terraform on GCP, AWS, or Azure — or self-serve from cloud marketplaces.

Hybrid Storage Architecture

  • ZFS special vdev (local NVMe) — metadata and small blocks
  • Object storage vdevs (GCS/S3/Azure Blob buckets) — bulk data, striped across multiple buckets

This is not NVMe caching with copy-in/copy-out. ZFS places data automatically based on block size — metadata lands on NVMe, large data blocks land directly on object storage. One filesystem, one namespace, one mount point.
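The placement rule described above can be sketched in a few lines. This is a simplified illustration, not MayaNAS or OpenZFS source: the `Block` type, the function name, and the returned labels are hypothetical. The threshold parameter mirrors the role of the OpenZFS `special_small_blocks` dataset property, whose default of 0 sends only metadata to the special vdev.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical stand-in for a block that ZFS is about to allocate."""
    size: int          # logical block size in bytes
    is_metadata: bool  # True for filesystem metadata blocks


def choose_vdev_class(block: Block, small_block_threshold: int = 0) -> str:
    """Return the device class a block would be allocated from.

    A threshold of 0 means "metadata only" on the special vdev; a larger
    value also routes data blocks at or below that size to local NVMe.
    """
    if block.is_metadata or block.size <= small_block_threshold:
        return "special (local NVMe)"
    return "object storage"


if __name__ == "__main__":
    print(choose_vdev_class(Block(size=1024 * 1024, is_metadata=False)))
    print(choose_vdev_class(Block(size=16 * 1024, is_metadata=True)))
    # Illustrative 64 KiB threshold; the post does not state a value.
    print(choose_vdev_class(Block(size=4096, is_metadata=False), 64 * 1024))
```

Because the decision is made per block at allocation time, nothing has to be copied between tiers later, which is the contrast with a copy-in/copy-out cache.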

What you get — the whole platter:

  • Real POSIX: open(), read(), lseek(), stat(). Every application works unmodified.
  • End-to-end checksums — every block verified on read. Silent corruption caught automatically.
  • Snapshots and clones — instant, zero-cost, backed by object storage durability.
  • In-kernel I/O — ZFS prefetch, ARC caching, and I/O scheduling all run in the kernel. No FUSE tax.
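To make "every block verified on read" concrete, here is a toy sketch of checksum-on-read. It is not ZFS code: the dictionary store and the function names are hypothetical, and SHA-256 stands in for whichever checksum the pool is configured to use. What it does share with ZFS is the idea that the checksum is kept apart from the data it protects, so a corrupted block cannot vouch for itself.

```python
import hashlib


class ChecksumError(Exception):
    """Raised when a block's contents do not match its recorded checksum."""


def write_block(store: dict, key: str, data: bytes) -> bytes:
    """Store a block and return its checksum for the caller to keep separately."""
    store[key] = data
    return hashlib.sha256(data).digest()


def read_block(store: dict, key: str, expected_checksum: bytes) -> bytes:
    """Fetch a block and verify it before handing it to the application."""
    data = store[key]
    if hashlib.sha256(data).digest() != expected_checksum:
        raise ChecksumError(f"checksum mismatch on block {key!r}")
    return data


if __name__ == "__main__":
    backend = {}  # stands in for an object storage bucket
    checksum = write_block(backend, "blk0", b"training-sample")
    print(read_block(backend, "blk0", checksum))

    backend["blk0"] = b"training-sampIe"  # simulate silent corruption
    try:
        read_block(backend, "blk0", checksum)
    except ChecksumError as err:
        print("detected:", err)
```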

Test Configuration

Hardware Specification

Compute Instance:  c4-highcpu-144 (72 cores / 144 threads, Intel Xeon 6985P-C, 282 GB RAM)
Network:           150 Gbps Tier_1 networking, MTU 8896 (jumbo frames)
Storage Backend:   12 GCS Standard buckets ($0.02/GB/month), striped
Special Vdev:      10 GB local NVMe persistent disk (metadata only)
Zone:              GCP us-central1

ZFS Configuration

Recordsize:             1 MB (aligned with GCS object size for optimal throughput)
Compression:            Disabled (compression breaks 1 MB alignment)
ARC Max:                80 GB (dataset is 210 GB = 2.6× ARC, ensuring reads hit GCS)
Prefetch Distance:      512 MB (default 8 MB; 64× deeper read-ahead)
Async Read Max Active:  96 per vdev (default 3; 32× more concurrent reads per bucket)
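The ratios quoted in this configuration are easy to re-derive. The short script below does that, and also computes two numbers the table implies but does not state: the largest share of the dataset the ARC could ever hold, and how many 1 MB records fit in the read-ahead window. Variable names are illustrative, and binary units are assumed throughout (the ratios are the same either way).

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

recordsize = 1 * MIB
arc_max = 80 * GIB
dataset_size = 210 * GIB
prefetch_distance, prefetch_default = 512 * MIB, 8 * MIB
async_reads, async_reads_default = 96, 3

dataset_to_arc = dataset_size / arc_max                 # 2.625, quoted as 2.6x
max_cached_fraction = arc_max / dataset_size            # at most ~38% can be cached
prefetch_gain = prefetch_distance // prefetch_default   # 64
concurrency_gain = async_reads // async_reads_default   # 32
records_in_window = prefetch_distance // recordsize     # 512 records of read-ahead

if __name__ == "__main__":
    print(f"dataset / ARC:         {dataset_to_arc:.3f}x")
    print(f"max cached fraction:   {max_cached_fraction:.1%}")
    print(f"prefetch depth gain:   {prefetch_gain}x")
    print(f"read concurrency gain: {concurrency_gain}x")
    print(f"records in window:     {records_in_window}")
```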

The Proof: 13 GB/s from Standard GCS

13.1 GB/s: cold read (zero cache)
10.7 GB/s: sustained 5-minute read

Cold Read: 13.1 GB/s

Zero cache — every byte from GCS. 210 GB in 17 seconds:

READ: bw=12.2GiB/s (13.1GB/s), io=210GiB, run=17s
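fio reports bandwidth in both binary (GiB/s) and decimal (GB/s) units, which is why one line carries two figures. A quick conversion shows how they relate and confirms that the reported bandwidth and the 17-second run time agree. The helper name is illustrative.

```python
GIB = 1024 ** 3  # bytes in one gibibyte
GB = 1000 ** 3   # bytes in one gigabyte


def gib_to_gb(value_gib: float) -> float:
    """Convert a quantity in GiB (or GiB/s) to GB (or GB/s)."""
    return value_gib * GIB / GB


cold_bw_gib_s = 12.2   # from the fio line
cold_io_gib = 210      # total data read

cold_bw_gb_s = gib_to_gb(cold_bw_gib_s)        # about 13.1 GB/s
elapsed_seconds = cold_io_gib / cold_bw_gib_s  # about 17.2 s, reported as run=17s

if __name__ == "__main__":
    print(f"{cold_bw_gib_s} GiB/s = {cold_bw_gb_s:.1f} GB/s")
    print(f"{cold_io_gib} GiB at that rate takes {elapsed_seconds:.1f} s")
```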

dstat (1-second intervals) confirms pure network, zero local disk:

-net/total-  -dsk/total-
 recv  send | read  writ
  13G   28M |   0     0
  15G   32M |   0     0    ← peak
  14G   31M |   0     0
  14G   31M |   0     0
  13G   28M |   0     0

13–15 GB/s network receive, zero disk reads. Every byte is live from GCS. (GCS Cloud Monitoring has a 60-second minimum granularity — a 17-second burst can't be captured cleanly, so dstat network telemetry is the proof for the cold run.)

Sustained 5-Minute Read: 10.7 GB/s — with Google's Own Proof

Continuous read for 5 minutes, 3 TB total:

READ: bw=10.0GiB/s (10.7GB/s), io=3000GiB, run=300s

For this run, we have third-party proof. Google Cloud Monitoring — Google's own infrastructure telemetry — confirms per-minute GCS throughput:

Time (UTC)   GB/s from GCS   Requests/sec
06:13        10.7            10,954
06:14        9.8             10,000
06:15        9.6             9,810
06:16        9.3             9,493

Key findings:

  • Each request is exactly 1 MB — one ZFS block, one GCS object
  • 95.3% load balance across 12 buckets (~900 MB/s each)
  • Zero NVMe reads — no pre-caching, no local disk activity
  • 2.9 TB total transferred from GCS in 5 minutes

Not /dev/null. Real files on a real POSIX filesystem with checksums on every block.
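The monitoring table can be cross-checked against itself: request rate times request size should equal throughput. The sketch below assumes each request is 1 MiB and reads the throughput column in binary units (GiB/s). Both are interpretations rather than something the table states, but under them every row matches to one decimal place.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
NUM_BUCKETS = 12

# (time UTC, reported throughput, requests per second) from the table above
samples = [
    ("06:13", 10.7, 10954),
    ("06:14", 9.8, 10000),
    ("06:15", 9.6, 9810),
    ("06:16", 9.3, 9493),
]


def implied_throughput(requests_per_s: int, request_bytes: int = MIB) -> float:
    """Throughput in GiB/s implied by a request rate at a fixed request size."""
    return requests_per_s * request_bytes / GIB


if __name__ == "__main__":
    for time_utc, reported, rps in samples:
        implied = implied_throughput(rps)
        per_bucket = reported / NUM_BUCKETS
        print(f"{time_utc}  reported {reported:>4}  implied {implied:.2f}  "
              f"per bucket {per_bucket:.2f}")
```

The per-bucket column comes out at roughly 0.8 to 0.9, in line with the "~900 MB/s each" figure in the findings.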

Figure: Google Cloud Monitoring, GCS egress throughput. GCS data egress: ~10 GiB/s sustained throughput from object storage.

Figure: Google Cloud Monitoring, GCS ReadObject request rate. ~11,000 ReadObject requests/sec, each exactly 1 MB (= ZFS recordsize).

Figure: Google Cloud Monitoring, VM network bandwidth (us-central1). VM network: ~80 Gbps sustained from GCS against 200 Gbps capacity ceiling.

The Key: ZFS Prefetch

A single GCS bucket delivers 300–400 MB/s for sequential 1 MB reads. Twelve buckets should give ~4 GB/s. We got 13 GB/s — over 3× that ceiling.

ZFS prefetch detects sequential patterns and issues read-ahead across all 12 buckets simultaneously. While the application reads block N, ZFS is already fetching N+1 through N+96. The GCS connections stay continuously saturated.
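The read-ahead behavior described above can be modeled with a toy prefetcher. This is not the ZFS prefetch algorithm, which tracks multiple streams and scales its distance dynamically; the class and its fields are hypothetical, and round-robin placement of blocks across buckets is a simplifying assumption. The window of 96 blocks is taken from the description above.

```python
from collections import defaultdict


class ToyPrefetcher:
    """Toy sequential read-ahead over blocks striped across buckets."""

    def __init__(self, num_buckets: int = 12, window: int = 96):
        self.num_buckets = num_buckets
        self.window = window              # how many blocks to fetch ahead
        self.requested = set()            # blocks already requested
        self.issued = defaultdict(list)   # bucket index -> blocks fetched from it
        self.last_block = None

    def _fetch(self, block: int) -> None:
        """Record a fetch for a block unless it was already requested."""
        if block not in self.requested:
            self.requested.add(block)
            self.issued[block % self.num_buckets].append(block)

    def read(self, block: int) -> None:
        """Application read: fetch the block, and read ahead if sequential."""
        sequential = self.last_block is not None and block == self.last_block + 1
        self._fetch(block)
        if sequential:
            for ahead in range(block + 1, block + 1 + self.window):
                self._fetch(ahead)
        self.last_block = block


if __name__ == "__main__":
    prefetcher = ToyPrefetcher()
    prefetcher.read(0)   # first read: no pattern yet, one fetch
    prefetcher.read(1)   # sequential: the window opens across all buckets
    for bucket in sorted(prefetcher.issued):
        print(bucket, len(prefetcher.issued[bucket]))
```

After the second read every bucket has several requests outstanding, and each further sequential read adds only one new block at the far edge of the window. That is the mechanism that keeps all connections busy rather than waiting on one request at a time.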

Parameter                        Default   Tuned    Effect
zfetch_max_distance              8 MB      512 MB   64× deeper read-ahead
zfs_vdev_async_read_max_active   3         96       32× more concurrent reads per bucket

Without tuning: 3–4 GB/s. With tuning: 13 GB/s. Same code, same buckets — just telling ZFS to think bigger.
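For readers who want to try the same settings, here is a minimal sketch of applying them. It assumes a Linux host where OpenZFS exposes module parameters under `/sys/module/zfs/parameters`, as stock OpenZFS does; whether a MayaNAS deployment sets them this way is not stated in the post. It also assumes "512 MB" means 512 MiB. The function names are illustrative, and writing to sysfs requires root.

```python
from pathlib import Path

# Standard location of OpenZFS module parameters on Linux.
ZFS_PARAM_DIR = Path("/sys/module/zfs/parameters")

# Tuned values from the table above; 512 MB is assumed to mean 512 MiB.
TUNING = {
    "zfetch_max_distance": 512 * 1024 * 1024,  # bytes of read-ahead per stream
    "zfs_vdev_async_read_max_active": 96,      # concurrent async reads per vdev
}


def render_commands(tuning: dict = TUNING, param_dir: Path = ZFS_PARAM_DIR) -> list:
    """Return the equivalent shell commands, for review before applying."""
    return [f"echo {value} > {param_dir / name}" for name, value in tuning.items()]


def apply_tuning(tuning: dict = TUNING, param_dir: Path = ZFS_PARAM_DIR) -> None:
    """Write each value to its parameter file (needs root on a real system)."""
    for name, value in tuning.items():
        (param_dir / name).write_text(f"{value}\n")


if __name__ == "__main__":
    # Print only; call apply_tuning() deliberately, as root, on a test system.
    for command in render_commands():
        print(command)
```

Values written this way do not survive a reboot or module reload; persisting them is typically done through a modprobe options file.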

How It Compares

Approach                                                        Throughput      Real POSIX  Checksums  Data Goes To
CLI tools (gcloud storage, gsutil)                              ~660 MB/s       No          No         Disk files
FUSE mounts (gcsfuse, s3fs, goofys)                             ~100–200 MB/s   Partial     No         FUSE mount
DIY parallel Range GETs, 176 workers, ramdisk (beginswithdata)  9 GB/s          No          No         /dev/null
ZFS on object storage (MayaNAS)                                 13.1 GB/s       Yes         Yes        Real filesystem

Ideal Use Cases

Perfect Fit

  • AI/ML Training: 500 GB ImageNet dataset streamed at 13 GB/s = 38 seconds per epoch. Keep $100K of GPUs fully utilized.
  • Media Production: 4K/8K video editing, rendering farms — stream 120+ concurrent 4K streams from object storage
  • Scientific Computing: Genome sequencing, climate models, simulation data — 1 TB in 77 seconds
  • Big Data Analytics: Parquet file reads, data lake processing at wire speed
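The timing figures in this list are straightforward division, and it is worth running the same numbers at the sustained rate as well as the peak. The helper below is illustrative; it assumes purely sequential streaming with no other bottleneck, which real training or analytics pipelines may not achieve.

```python
def seconds_to_stream(size_gb: float, throughput_gb_s: float) -> float:
    """Time to read a dataset once at a constant throughput."""
    if throughput_gb_s <= 0:
        raise ValueError("throughput must be positive")
    return size_gb / throughput_gb_s


PEAK_GB_S = 13.0       # cold-read figure used in the list above
SUSTAINED_GB_S = 10.7  # 5-minute sustained figure from the benchmark

if __name__ == "__main__":
    for label, size_gb in [("500 GB dataset", 500), ("1 TB dataset", 1000)]:
        peak = seconds_to_stream(size_gb, PEAK_GB_S)
        sustained = seconds_to_stream(size_gb, SUSTAINED_GB_S)
        print(f"{label}: {peak:.0f} s at peak, {sustained:.0f} s sustained")
```

For datasets larger than a few hundred gigabytes, the sustained figure is the more realistic planning number.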

Not Ideal For

  • Random I/O workloads: Databases with small random reads (use MayaScale instead)
  • Ultra-low latency: Sub-millisecond requirements (object storage adds latency)

Closing Thought

When Amazon launched S3 in March 2006, as one of the very first AWS services, object storage was meant to be the primary storage tier of the cloud. Cheap, durable, infinitely scalable.

Twenty years later, it's largely relegated to backups and data lakes. Active workloads run on block storage and managed filesystems at 10–50× the cost — because applications need POSIX, and nobody built a proper bridge. FUSE was the wrong shovel for the goldmine.

ZFS is the right one.

What's Next

This post covers single-node local filesystem performance. In Part 2, we add NFS to the picture.

MayaNAS supports Active-Active deployment — two nodes, each serving its own NFS share from the same object storage backend. At 10+ GB/s per node, a single client mounting both shares gets 20+ GB/s aggregate — from standard $0.02/GB object storage over NFS. That's managed Lustre territory, without Lustre pricing or complexity.

Getting Started

GCP Marketplace

One-click deployment from Google Cloud Marketplace with pre-configured performance tiers. Automated setup of compute instances, GCS buckets, and networking.

Open Source Terraform

Deploy using our open-source Terraform modules. Full control over configuration with automated testing framework included.

Need help? Contact us for architecture design, deployment, and performance optimization.

Conclusion

13 GB/s cold read, 10.7 GB/s sustained for 5 minutes — from standard GCS object storage at $0.02/GB. No premium tiers. No NVMe caching layers. No FUSE.

ZFS striped across 12 GCS buckets, with prefetch tuning that unlocks 3× the naive per-bucket throughput ceiling. Google Cloud Monitoring independently confirms every byte came from GCS.

All numbers independently verifiable. GCP c4-highcpu-144, us-central1, 12 GCS Standard buckets, OpenZFS with object storage backend. February 2026.
