InfoQ Architecture·May 23, 2026

Google Cloud Enhances BigQuery with Cross-Engine Apache Iceberg Support for Lakehouses

Google Cloud has added cross-engine support for Apache Iceberg to BigQuery. Teams can now run compute engines such as Spark, Flink, and Trino against the same Iceberg tables that BigQuery uses, without duplicating data or committing to a proprietary format. The update aims to reduce the operational complexity and cost of building data lakehouses by offering managed metadata, table maintenance, and centralized access controls.

The Rise of Apache Iceberg and Data Lakehouses

Apache Iceberg has emerged as a critical open table format for building data lakehouses, gaining popularity for its ability to support multiple compute engines accessing the same data. This addresses a common challenge in modern data architectures where different workloads (e.g., analytics, ETL, AI/ML) require diverse processing tools, but all need a consistent view of the underlying data lake. Iceberg offers features like ACID transactions, hidden partitioning, and time travel, which are crucial for data reliability and governance in complex environments.
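Time travel follows from how the format tracks table state as a sequence of immutable snapshots. The sketch below is a toy model of that idea in plain Python, not the Iceberg API: the `Snapshot` and `ToyTable` classes, the file names, and the timestamps are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    timestamp_ms: int
    data_files: tuple  # file paths visible in this snapshot (never mutated)


@dataclass
class ToyTable:
    snapshots: list = field(default_factory=list)

    def commit_append(self, new_files, timestamp_ms):
        """Add files by creating a new snapshot; earlier snapshots stay untouched."""
        current = self.snapshots[-1].data_files if self.snapshots else ()
        snap = Snapshot(len(self.snapshots) + 1, timestamp_ms, current + tuple(new_files))
        self.snapshots.append(snap)
        return snap.snapshot_id

    def files_as_of(self, timestamp_ms):
        """Time travel: files of the latest snapshot committed at or before the timestamp."""
        visible = [s for s in self.snapshots if s.timestamp_ms <= timestamp_ms]
        return visible[-1].data_files if visible else ()


table = ToyTable()
table.commit_append(["a.parquet"], timestamp_ms=1000)
table.commit_append(["b.parquet"], timestamp_ms=2000)

print(table.files_as_of(1500))  # ('a.parquet',)
print(table.files_as_of(2500))  # ('a.parquet', 'b.parquet')
```

Because a reader only ever resolves one snapshot, a query that started before the second commit keeps seeing a consistent set of files, which is the property the different engines rely on.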

Google Cloud's Interoperability Push

Google Cloud's new BigQuery features focus on increasing interoperability with Iceberg. By introducing a preview of a serverless Iceberg REST catalog and expanding managed support, Google aims to simplify data management. This allows teams to create, update, and query Iceberg tables directly from BigQuery while also enabling other engines like Spark, Flink, and Trino to work on the exact same datasets. This eliminates the need for data duplication and mitigates the challenges associated with proprietary formats and manual metadata management.
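Several engines can share one table safely because every writer commits through the catalog, which swaps the table's metadata pointer only if nobody else has committed in the meantime. The following is a minimal in-memory sketch of that compare-and-swap idea, assuming a single process. It is not the Iceberg REST catalog protocol or any BigQuery API, and `ToyCatalog`, `CommitConflict`, the table name, and the metadata paths are hypothetical.

```python
import threading


class CommitConflict(Exception):
    """Raised when another writer committed first."""


class ToyCatalog:
    def __init__(self):
        self._lock = threading.Lock()
        self._pointers = {}  # table name -> current metadata location

    def load(self, table):
        with self._lock:
            return self._pointers.get(table)

    def commit(self, table, expected, new):
        """Swap the metadata pointer only if it still equals `expected`."""
        with self._lock:
            if self._pointers.get(table) != expected:
                raise CommitConflict(f"{table}: expected {expected!r}")
            self._pointers[table] = new


catalog = ToyCatalog()
catalog.commit("sales.orders", expected=None, new="metadata/v1.json")

# Two hypothetical engines read the same starting state.
seen_by_engine_a = catalog.load("sales.orders")
seen_by_engine_b = catalog.load("sales.orders")

# Engine A commits first and wins.
catalog.commit("sales.orders", expected=seen_by_engine_a, new="metadata/v2.json")

# Engine B's commit is based on stale state, so the catalog rejects it.
try:
    catalog.commit("sales.orders", expected=seen_by_engine_b, new="metadata/v2-other.json")
    conflict = False
except CommitConflict:
    conflict = True  # engine B must reload the table and retry
```

The rejected writer loses no data; it reloads the current pointer, reapplies its change on top, and tries again, which is why no engine needs a private copy of the table.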

Key Architectural Benefits

Data Portability: Open formats like Iceberg provide long-term optionality by ensuring data is not locked into a specific vendor or tool.
Operational Simplicity: Managed services for metadata, table maintenance, and synchronization reduce the overhead typically associated with self-managed Iceberg deployments.
Cross-Cloud Strategy: Support for querying Iceberg catalogs across AWS, Azure, Databricks, and Snowflake facilitates multi-cloud data strategies.

Addressing Data Lakehouse Challenges

Before these announcements, customers often had to choose between Google-managed Iceberg REST Catalog tables and BigQuery-managed tables based on their primary ETL engine. The result was fragmented workflows and missed opportunities to use BigQuery's storage management. The new integration addresses this by extending BigQuery's infrastructure to fully support Iceberg tables, including managed metadata, automatic table maintenance, transactions, and change data replication.

Centralized Access Controls: Consistent permission management across different query engines.
Multimodal Analysis: Combining structured Iceberg data with unstructured files in Cloud Storage using BigQuery ObjectRefs for AI/ML workflows.
Enhanced Governance: Knowledge Catalog (formerly Dataplex) provides a unified governance layer for metadata, lineage, and access controls across systems.

These features reduce the "hidden tax" of Iceberg adoption, which often manifests as friction around compaction, metadata management, and orchestration. By streamlining these aspects, Google Cloud aims to make data lakehouses more cost-effective and operationally efficient, allowing enterprises to focus on extracting intelligence from their data, especially for AI initiatives.
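To make the compaction part of that "hidden tax" concrete, here is a rough sketch of the planning step a self-managed deployment would otherwise have to run: grouping many small data files into rewrite batches near a target size. The greedy strategy, the `plan_compaction` name, and the 128 MB target are assumptions chosen for illustration, not a description of how BigQuery or Iceberg schedules maintenance.

```python
def plan_compaction(file_sizes_mb, target_mb=128):
    """Greedily group small files into batches of at most target_mb each."""
    batches, current, current_size = [], [], 0
    for size in sorted(file_sizes_mb, reverse=True):
        if size >= target_mb:
            continue  # already large enough; leave the file as it is
        if current and current_size + size > target_mb:
            batches.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        batches.append(current)
    # Rewriting a single file gains nothing, so keep only multi-file batches.
    return [b for b in batches if len(b) > 1]


plan = plan_compaction([8, 16, 200, 64, 64, 32])
print(plan)  # [[64, 64], [32, 16, 8]]
```

A real maintenance job would also have to rewrite the files, commit a new snapshot, and expire old ones, which is the orchestration a managed service takes over.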
