Today, many organizations are moving to lakehouse architectures so they can keep a single copy of their data and run multiple engines against it for different workloads, without copying or moving the data. However, managing a data lakehouse can be complex, often requiring custom pipelines that are hard to operate and that aren’t interoperable between query engines. Governance is also challenging when independent systems sit in separate, local silos.
One way to succeed with a lakehouse architecture is to implement a metadata layer across your data engines. BigLake metastore is Google Cloud’s fully managed, serverless, and scalable runtime metastore. It is based on the industry-standard Apache Iceberg REST catalog specification, which gives open-source engines like Apache Spark and Google Cloud native engines such as BigQuery a standard REST interface for broader compatibility and interoperability. Today, we’re excited to announce that support for the Iceberg REST Catalog is now generally available.
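To make the “standard REST interface” more concrete: the Iceberg REST catalog specification defines fixed HTTP routes that every compliant client calls, such as `GET /v1/config`, `GET /v1/{prefix}/namespaces`, and `GET /v1/{prefix}/namespaces/{namespace}/tables/{table}`. The minimal sketch below only builds those URLs. The base URI `https://catalog.example.com`, the prefix value, and the helper names are hypothetical placeholders rather than BigLake metastore’s actual endpoint, and authentication is deliberately left out.

```python
from typing import Tuple
from urllib.parse import quote, urlencode

# Per the Iceberg REST spec, multi-level namespaces are joined with the
# ASCII unit separator (0x1F) before being URL-encoded into the path.
NAMESPACE_SEPARATOR = "\x1f"


def _route(base_uri: str, prefix: str, *segments: str) -> str:
    """Join base URI, the /v1 version segment, an optional prefix, and path segments."""
    parts = [base_uri.rstrip("/"), "v1"]
    if prefix:
        parts.append(prefix.strip("/"))
    parts.extend(segments)
    return "/".join(parts)


def config_url(base_uri: str, warehouse: str) -> str:
    """GET /v1/config: clients call this first to fetch catalog defaults and overrides."""
    return _route(base_uri, "", "config") + "?" + urlencode({"warehouse": warehouse})


def list_namespaces_url(base_uri: str, prefix: str = "") -> str:
    """GET /v1/{prefix}/namespaces: list namespaces in the catalog."""
    return _route(base_uri, prefix, "namespaces")


def load_table_url(base_uri: str, namespace: Tuple[str, ...], table: str, prefix: str = "") -> str:
    """GET /v1/{prefix}/namespaces/{namespace}/tables/{table}: load table metadata."""
    encoded_ns = quote(NAMESPACE_SEPARATOR.join(namespace), safe="")
    return _route(base_uri, prefix, "namespaces", encoded_ns, "tables", quote(table, safe=""))


if __name__ == "__main__":
    base = "https://catalog.example.com"  # hypothetical placeholder, not BigLake's URI
    print(config_url(base, "gs://my-bucket"))
    print(load_table_url(base, ("analytics", "sales"), "orders"))
```

Because every engine targets these same routes, a Spark job, a Trino cluster, or any other Iceberg REST client resolves the same table metadata from one catalog, which is what lets them share a single copy of the data.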
Your users can now query data with their engine of choice, whether that is an open-source engine such as Apache Spark or Trino or a native engine like BigQuery, all backed by Google Cloud’s enterprise security. For example, Spark users can use BigLake metastore as a serverless Iceberg catalog and share the same copy of data with other engines, including BigQuery.
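As a rough illustration of what “using BigLake metastore as an Iceberg catalog” means on the Spark side, the sketch below assembles the standard Apache Iceberg Spark properties for a REST catalog (`spark.sql.catalog.<name>`, `.type=rest`, `.uri`, `.warehouse`). The catalog name, URI, and warehouse values are hypothetical placeholders, and the BigLake-specific endpoint and authentication settings are intentionally not filled in; take those from the BigLake metastore documentation.

```python
from typing import Dict, Optional


def iceberg_rest_catalog_conf(
    catalog: str,
    uri: str,
    warehouse: str,
    vended_credentials: bool = True,
    extra_headers: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Build Spark properties that register an Iceberg REST catalog under `catalog`.

    `uri` and `warehouse` are placeholders here; BigLake-specific values and
    authentication properties must come from the product documentation.
    """
    key = f"spark.sql.catalog.{catalog}"
    conf = {
        # Enables Iceberg's SQL extensions (stored procedures, extra DDL).
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Registers an Iceberg catalog and tells it to talk to a REST catalog server.
        key: "org.apache.iceberg.spark.SparkCatalog",
        f"{key}.type": "rest",
        f"{key}.uri": uri,
        f"{key}.warehouse": warehouse,
    }
    if vended_credentials:
        # Iceberg's REST client sends "header.*" properties as HTTP headers;
        # this one asks the catalog to vend storage credentials for each table.
        conf[f"{key}.header.X-Iceberg-Access-Delegation"] = "vended-credentials"
    for name, value in (extra_headers or {}).items():
        conf[f"{key}.header.{name}"] = value
    return conf


if __name__ == "__main__":
    # Hypothetical placeholder values, not a working BigLake configuration.
    props = iceberg_rest_catalog_conf("lake", "https://catalog.example.com", "gs://my-bucket")
    for k, v in sorted(props.items()):
        print(f"{k}={v}")
```

To apply the result, each pair can be passed to `SparkSession.builder.config(key, value)` before calling `getOrCreate()`, with the Iceberg Spark runtime and a GCS-capable FileIO on the classpath. Tables are then addressed as `lake.<namespace>.<table>` in Spark SQL.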
BigLake metastore also supports key authorization mechanisms such as credential vending, in which the catalog issues temporary storage credentials on the user’s behalf, so users can access their tables without having direct access to the files in the underlying Cloud Storage bucket. Finally, BigLake metastore is integrated with Dataplex Universal Catalog, giving you end-to-end governance, including lineage, data quality, and discoverability, for BigLake Iceberg tables in BigQuery. Because it runs on Google’s planet-scale metadata management infrastructure based on Spanner, BigLake metastore removes the need to manage custom metastore deployments, giving you the benefits of an open and flexible lakehouse with the performance and interoperability of an enterprise-grade managed service.
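Credential vending is part of the Iceberg REST specification. A client sends the `X-Iceberg-Access-Delegation: vended-credentials` header when loading a table, and the catalog returns credentials in the load-table response. Clients check the `storage-credentials` list first, preferring the entry with the longest matching location prefix, and only then fall back to the table-level `config` map. The sketch below shows that client-side selection logic. It assumes GCS credentials appear under Iceberg’s `gcs.*` FileIO properties (such as `gcs.oauth2.token`); the helper names and the example response are hypothetical, and the exact fields BigLake metastore returns are not described in this post.

```python
from typing import Any, Dict, Optional

ACCESS_DELEGATION_HEADER = "X-Iceberg-Access-Delegation"


def load_table_headers(bearer_token: str) -> Dict[str, str]:
    """Request headers for a load-table call that asks for vended credentials."""
    return {
        "Authorization": f"Bearer {bearer_token}",
        ACCESS_DELEGATION_HEADER: "vended-credentials",
    }


def select_vended_credentials(
    load_table_result: Dict[str, Any], location: str
) -> Optional[Dict[str, str]]:
    """Pick the credential config that applies to `location`, or None if there is none.

    Checks `storage-credentials` first and uses the entry with the longest
    matching prefix; otherwise falls back to `gcs.*` keys in `config`.
    """
    best_config: Optional[Dict[str, str]] = None
    best_len = -1
    for cred in load_table_result.get("storage-credentials", []):
        prefix = cred.get("prefix", "")
        if location.startswith(prefix) and len(prefix) > best_len:
            best_config, best_len = cred.get("config", {}), len(prefix)
    if best_config is not None:
        return best_config
    # Fallback: table-level config map (assumed to carry gcs.* FileIO properties).
    config = load_table_result.get("config", {})
    gcs_props = {k: v for k, v in config.items() if k.startswith("gcs.")}
    return gcs_props or None


if __name__ == "__main__":
    # Hypothetical response shape for illustration only.
    response = {
        "config": {},
        "storage-credentials": [
            {"prefix": "gs://bucket/", "config": {"gcs.oauth2.token": "tok-a"}},
            {"prefix": "gs://bucket/warehouse/t/", "config": {"gcs.oauth2.token": "tok-b"}},
        ],
    }
    print(select_vended_credentials(response, "gs://bucket/warehouse/t/data/f.parquet"))
```

The longest-prefix rule lets a catalog scope each credential narrowly, for example to a single table’s directory, so the client never needs a credential for the whole bucket.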
Leading organizations building their lakehouses with Google’s Data Cloud are already seeing the benefits of BigLake metastore.
“Spotify is leveraging BigLake and BigLake metastore as part of our efforts to build a modern lakehouse platform. By utilizing open formats and open APIs, this platform provides an interoperable and abstracted storage interface for our data. BigLake helps us make our data accessible for processing by BigQuery, Dataflow and open-source, Iceberg-compatible engines.” – Ed Byne, Product Manager, Spotify
Simplify data management and unify governance
BigLake metastore has a new console experience where you can create and update your Iceberg catalog. From the console, you can reach all of your Cloud Storage and BigQuery storage data across multiple runtimes, including BigQuery and open-source, Iceberg-compatible engines such as Spark and Trino. For example, a data engineer can create Iceberg tables in Spark, and a data analyst can query the same data in BigQuery. This gives you a single view of all your Iceberg tables across Google Cloud, whether they’re managed by BigLake or self-managed in Cloud Storage.
Source Credit: https://cloud.google.com/blog/products/data-analytics/biglake-metastore-now-supports-iceberg-rest-catalog/
