
BigQuery DataFrames adoption
We launched BigQuery DataFrames last year as an open-source Python library that scales Python data processing without requiring new infrastructure or APIs. It does this by transpiling common Python data science APIs from pandas and scikit-learn into BigQuery SQL operators. Since launch, the volume of data it processes has grown more than 30X, and today thousands of customers use it to process more than 100 PB every month.
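To make the idea of transpiling concrete, here is a toy sketch in plain Python. It is not the library's implementation or API: `ToyFrame`, `where`, and `group_mean` are hypothetical names invented for this illustration, and the public penguins table is used only as an example. The sketch records dataframe-style operations lazily and compiles them into a single SQL string, which is the general pattern behind pushing dataframe work down to a SQL engine.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ToyFrame:
    """Hypothetical lazy frame: it records operations and runs nothing."""

    table: str
    filters: Tuple[str, ...] = ()

    def where(self, column: str, op: str, value: float) -> "ToyFrame":
        # Record the predicate instead of evaluating it in Python.
        return ToyFrame(self.table, self.filters + (f"{column} {op} {value}",))

    def group_mean(self, key: str, column: str) -> str:
        # Compile the recorded plan into one SQL statement.
        sql = f"SELECT {key}, AVG({column}) AS mean_{column} FROM `{self.table}`"
        if self.filters:
            sql += " WHERE " + " AND ".join(self.filters)
        return sql + f" GROUP BY {key}"


# Example table and column names, chosen for illustration only.
frame = ToyFrame("bigquery-public-data.ml_datasets.penguins")
sql = frame.where("body_mass_g", ">", 3000).group_mean("species", "body_mass_g")
print(sql)
```

With the real library, the same logic would be written against its pandas-style API (imported as `bigframes.pandas`), and BigQuery would execute the generated SQL rather than the Python process.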
During the last year we evolved our library significantly across 50+ releases and worked closely with thousands of users. Here’s how a couple of early BigQuery DataFrames customers use this library in production.
Deutsche Telekom has standardized on BigQuery DataFrames for its ML platform.
“With BigQuery DataFrames, we can offer a scalable and managed ML platform to our data scientists with minimal upskilling.” – Ashutosh Mishra, Vice President – Data Architecture & Governance, Deutsche Telekom
Trivago, meanwhile, migrated its PySpark transformations to BigQuery DataFrames.
“With BigQuery DataFrames, data science teams focus on business logic and not on tuning infrastructure.” – Andrés Sopeña Pérez, Head of Data Infrastructure, Trivago
What’s new in BigQuery DataFrames 2.0?
This release is packed with features designed to streamline your AI and machine learning pipelines:
Working with multimodal data and generative AI techniques
- Multimodal DataFrames (Preview): BigQuery DataFrames 2.0 introduces a unified dataframe that can handle text, images, audio, and more alongside traditional structured data, breaking down the barriers between structured and unstructured data. This is powered by BigQuery’s multimodal capabilities, enabled by ObjectRef, which help ensure scalability and governance for even the largest datasets.
When working with multimodal data, BigQuery DataFrames also abstracts away many of the details of working with multimodal tables and processing multimodal data, leveraging BigQuery features such as embedding generation, vector search, and Python UDFs behind the scenes.
Source Credit: https://cloud.google.com/blog/products/data-analytics/a-closer-look-at-bigquery-dataframes-2-0/