Sky Computing: Rethinking How We Use the Cloud

Cloud computing has been the dominant model for nearly two decades, but its limitations have become increasingly obvious: vendor lock-in, fragmented APIs, inconsistent services, and price differences that force teams into long-term dependencies they’d prefer to avoid. Over the last few years, a new idea has started to emerge — first in academia, and now slowly in industry: Sky Computing.

Sky Computing isn’t another layer in the cloud-native buzzword stack. It’s a shift in how we think about where workloads run and how they move. Instead of optimizing inside a single provider, Sky Computing treats the cloud ecosystem as a loosely connected “sky” of compute infrastructure, with a broker deciding where workloads execute based on cost, performance, and availability.

This post introduces the core concepts, explains how Sky Computing differs from multicloud, and looks at early tools and research shaping the space.

Why Sky Computing Exists

The idea stems from two real-world pressures:

  1. Vendor lock-in hurts flexibility
    Each provider exposes its own storage APIs, networking semantics, and managed services. Moving workloads between clouds is usually expensive and operationally painful.

  2. Regulation and reliability push workloads across borders
    Data residency laws and large-scale outages (which are no longer theoretical) create pressure to run workloads across independent trust and failure domains.

The research community, most visibly at UC Berkeley, observed that while cloud services differ, the interfaces used by modern applications are converging. Kubernetes, containers, object stores, ML frameworks, databases, workflow engines: most now exist as open-source projects or portable abstractions.

This creates a kind of limited compatibility: not a full standard, but enough overlap to build on.

The question becomes: how do we turn this partial compatibility into a usable platform?

Cloud vs. Sky: The Conceptual Difference

Traditional cloud computing assumes:

  • You choose a provider
  • You deploy workloads inside that provider
  • Moving providers is a manual, expensive event
  • Pricing and performance differences create lock-in

Sky Computing introduces two major shifts:

  1. Workloads target a broker, not a single cloud
    Instead of deploying “on AWS” or “on GCP,” you submit a job or pipeline to an intercloud broker, which evaluates pricing, availability, regulations, and hardware availability.

  2. Placement becomes dynamic and programmable
    The broker decides where each task runs, much as interdomain routing on the Internet moves packets across independently operated networks.

This is not multicloud as practiced today (“we use both AWS and Azure”).
It’s closer to:

Write once, run anywhere — automatically — with the ability to split or migrate workloads across clouds on demand.

Why This Isn’t Just Multicloud

Multicloud means:
“Different teams or services happen to use different providers.”

Sky Computing means:
“A single application or pipeline can span or migrate across clouds automatically.”

Multicloud:

  • Manual placement
  • Provider-specific APIs
  • No automatic fallback
  • Limited workload mobility

Sky Computing:

  • Automatic placement by a broker
  • Optimization for price/performance/availability
  • Built-in fault tolerance across clouds
  • Fine-grained portability across environments

Academic papers describe this as creating a two-sided market for compute:
users supply tasks and constraints, while clouds supply hardware. A broker matches the two.
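To make the market metaphor concrete, here is a deliberately tiny placement sketch. Everything in it is hypothetical: the Offer type, the place function, and the prices are illustrative stand-ins, and a real broker would also weigh data gravity, egress fees, and preemption risk.

```python
from dataclasses import dataclass

@dataclass
class Offer:
    cloud: str             # provider ("aws", "gcp", ...)
    region: str
    accelerator: str       # e.g. "A100"
    price_per_hour: float  # illustrative numbers, not real quotes
    available: bool        # does the provider have capacity right now?

def place(allowed_regions: set, accelerator: str, offers: list) -> Offer:
    """Return the cheapest offer that satisfies the task's constraints."""
    feasible = [
        o for o in offers
        if o.available and o.accelerator == accelerator and o.region in allowed_regions
    ]
    if not feasible:
        raise RuntimeError("no cloud currently satisfies the constraints")
    return min(feasible, key=lambda o: o.price_per_hour)

offers = [
    Offer("aws",   "eu-west-1",    "A100", 4.10, True),
    Offer("gcp",   "europe-west4", "A100", 3.67, True),
    Offer("azure", "westeurope",   "A100", 3.40, False),  # out of capacity
]
# An EU-constrained task lands on GCP: Azure is cheaper but has no capacity.
print(place({"eu-west-1", "europe-west4", "westeurope"}, "A100", offers))
```

The point is the shape of the decision, not the solver: users express constraints, clouds express supply, and placement falls out of the match.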

Core Components of a Sky Computing Architecture

1. Compatibility Layer

Abstracts cloud differences using open-source standards:

  • Kubernetes
  • Container runtimes
  • Object storage interfaces
  • ML and data frameworks
  • Workflow engines

This works similarly to how Linux abstracts away hardware differences.
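For a concrete taste of that abstraction: with the official kubernetes Python client, the same API call runs against any conformant cluster, and only the kubeconfig context changes. A minimal sketch, assuming a kubeconfig that already contains both clusters; the context names are placeholders.

```python
from kubernetes import client, config  # pip install kubernetes

# The same call works against any conformant cluster, on any cloud;
# only the kubeconfig context changes. Context names are placeholders.
for context in ("gke-prod", "eks-prod"):
    config.load_kube_config(context=context)
    pods = client.CoreV1Api().list_pod_for_all_namespaces(limit=5)
    print(context, [p.metadata.name for p in pods.items])
```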

2. Intercloud Broker

The core of Sky Computing. A broker:

  • Catalogs hardware, pricing, and availability
  • Understands accelerators, VM types, regions, spot markets
  • Applies constraints (latency, locality, regulatory boundaries)
  • Optimizes placement
  • Provisions and executes tasks across clouds
  • Handles failover when capacity is unavailable

SkyPilot is currently the most mature example. Built by the UC Berkeley group behind the Sky Computing papers, it has reported real-world cost and availability benefits for ML training and data workloads (see the NSDI 2023 paper "SkyPilot: An Intercloud Broker for Sky Computing").
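As a sketch of what this looks like in practice, here is a minimal task using SkyPilot's documented Python API (the training script is a placeholder, and the interface may shift between releases). Note what is absent: no provider name. The resource spec is a constraint, and the broker resolves it.

```python
import sky  # pip install skypilot

# Describe what the task needs, not where it runs.
task = sky.Task(
    setup="pip install torch",   # environment setup on the provisioned VM
    run="python train.py",       # placeholder workload
)
task.set_resources(sky.Resources(accelerators="A100:1", use_spot=True))

# SkyPilot compares the clouds it has credentials for and launches on the
# cheapest one with a spot A100, falling back to other regions and clouds
# when capacity is unavailable.
sky.launch(task, cluster_name="train")
```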

3. Peering Layer

A potential future step where clouds provide reciprocal data peering to reduce cross-cloud data transfer costs. Egress fees today (often in the range of $0.05–$0.09 per GB at the major providers) are one of the biggest economic barriers to splitting workloads across clouds; reciprocal peering would lower that barrier, mirroring the peering agreements that knit together the Internet.

Why Sky Computing Matters

1. Reduced Lock-In

Workloads can move at the task level, not through massive migrations.

2. Better Price/Performance

Different clouds specialize in different hardware: Google's TPUs, AWS Trainium, AMD MI300-class accelerators, spot GPU markets, and so on. A broker can arbitrage these differences per task rather than per company.

3. Higher Availability

Cloud outages rarely align. Brokers can route around failures.

4. Compliance and Locality

Placement can follow data residency or enterprise compliance rules automatically.

5. Rise of Specialized Clouds

Sky Computing allows niche clouds (HPC, GPUs, confidential compute) to compete on equal footing without requiring proprietary APIs.

Tools in the Ecosystem

SkyPilot

Reference implementation of an intercloud broker.
Supports GPU scheduling, DAG-based pipelines, spot fallback, multi-cloud runs, and data movement.

Ray / Airflow / Prefect / Flyte

Distributed execution frameworks and workflow engines that can handle task orchestration under the Sky Computing model.

S3-Compatible Storage Layers

S3Proxy, MinIO, and native cloud S3 endpoints act as part of the compatibility layer.
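Because the S3 wire protocol has become a de facto standard, one client can target any of these backends by switching the endpoint. A minimal boto3 sketch, assuming a local MinIO server with its default development credentials; the bucket and object names are made up.

```python
import boto3  # pip install boto3

# Point the standard S3 client at a MinIO endpoint instead of AWS.
# Swap endpoint_url and credentials to target any S3-compatible store.
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",   # local MinIO (placeholder)
    aws_access_key_id="minioadmin",         # MinIO's default dev credentials
    aws_secret_access_key="minioadmin",
)
s3.upload_file("model.ckpt", "checkpoints", "run-42/model.ckpt")
```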

Kubernetes Hybrid Platforms

Solutions like Anthos and OpenShift unify cluster management across clouds but do not make dynamic, cost-driven placement decisions.

ML Platforms

Run:ai, MosaicML, Lightning AI — early examples of multi-cloud GPU scheduling and marketplace-like systems.

Where This Is Heading

Sky Computing is still early. Like the early Internet, it exists but is fragmented, research-driven, and limited by current economics.

But the trajectory is clear:

  • The tooling exists.
  • The need (GPU supply, regulation, outages) is increasing.
  • The incentives for portability are stronger every year.

Sky Computing won’t replace the cloud — it reframes it as a globally addressable marketplace rather than a set of isolated silos.

Summary

Sky Computing is a natural evolution of cloud computing as workloads become more portable and specialized hardware becomes more fragmented. By introducing a compatibility layer, an intercloud broker, and potentially future data-peering agreements, Sky Computing enables dynamic multi-cloud execution with better cost, availability, and flexibility.

It’s early, but the research and tooling already show real momentum — and for engineers building distributed systems, it’s an area worth understanding now.