Advanced capabilities
For advanced users, Python UDFs provide a set of capabilities for tuning performance and monitoring usage. Here are some examples.
Vectorized processing with Pandas PyArrow
To maximize throughput, the GA release can pass vectorized input directly to your function as PyArrow RecordBatches. Processing columns of data in bulk rather than row by row avoids per-row Python serialization and conversion overhead, improving performance by up to 10x for data-intensive calculations.
Configurable container resources
For heavy-duty data science and ML data preparation, you can now provision container memory (up to 16 GB) and CPU (up to 4 vCPUs) per function. This enables memory-intensive workloads (such as loading large serialized models or geospatial datasets) to run directly within the sandbox.
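A common way to take advantage of a larger memory allocation is to deserialize a large object once per container process and reuse it across calls, rather than reloading it for every row. The sketch below does this with functools.lru_cache and pickle. It assumes the runtime keeps the Python process alive between invocations; the model path, the dictionary "model" format, and the score handler are hypothetical placeholders.

```python
import functools
import pickle

# Hypothetical location of a serialized model bundled with the function.
MODEL_PATH = "/tmp/model.pkl"


@functools.lru_cache(maxsize=1)
def load_model(path: str = MODEL_PATH):
    """Deserialize the model once per process and keep it in memory.

    Subsequent calls with the same path return the cached object instead of
    reading and unpickling the file again.
    """
    with open(path, "rb") as f:
        return pickle.load(f)


def score(value: float, path: str = MODEL_PATH) -> float:
    """Hypothetical handler: applies a cached linear 'model' to one value."""
    model = load_model(path)
    return model["weight"] * value + model["bias"]
```

Only the first call pays the deserialization cost; later calls in the same process hit the cache. In practice the cached object would be a real serialized artifact (for example, a trained model or a geospatial index) rather than a two-field dictionary.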
Customizable concurrency
Tune throughput and resource efficiency by configuring the number of concurrent requests each container handles (up to 1,000). Serving many requests per container reduces the number of containers needed for scale-out execution, which keeps costs down and sustains performance under heavy parallel loads.
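To see why per-container concurrency affects cost, consider a simplified model in which each container serves at most a fixed number of requests at once and load is spread evenly. The helper below is hypothetical and does not describe how BigQuery's scheduler actually places requests; it only shows the arithmetic, using the 1,000-request ceiling from the text.

```python
import math

MAX_CONCURRENCY = 1000  # per-container ceiling stated in the text


def containers_needed(in_flight_requests: int, concurrency_per_container: int) -> int:
    """Estimate containers required under a simplified, evenly packed load model.

    Hypothetical helper: assumes every container accepts exactly
    `concurrency_per_container` simultaneous requests before another is needed.
    """
    if not 1 <= concurrency_per_container <= MAX_CONCURRENCY:
        raise ValueError("concurrency must be between 1 and 1,000")
    if in_flight_requests < 0:
        raise ValueError("in_flight_requests must be non-negative")
    # Round up: a partially filled container still has to exist.
    return math.ceil(in_flight_requests / concurrency_per_container)
```

With 5,000 requests in flight, a concurrency of 1 implies 5,000 containers, 80 implies 63, and 1,000 implies 5. Higher concurrency mostly pays off for I/O-bound functions; a CPU-bound function limited to a few vCPUs may gain little from accepting more simultaneous requests.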
Streaming logs and real-time metrics
Easily debug and monitor your production workloads. The BigQuery console now features a direct link from your query results to real-time CPU, memory, and concurrency metrics in Cloud Monitoring.
Billing
BigQuery managed Python UDFs are billed under the BigQuery Services SKU. This SKU is fully eligible for BigQuery spend-based committed use discounts (CUDs), allowing you to maximize budget efficiency.
You can also monitor costs through INFORMATION_SCHEMA.JOBS and through the billing labels MANAGED_ROUTINE_EXECUTION and MANAGED_ROUTINE_BUILD.
See more details in the Pricing section of the documentation.
Getting started
To get started with BigQuery Python UDFs, first check out the product documentation.
Then, try out the functions published in the BigQuery public dataset. For example, run the following code in a BigQuery project to tokenize country names from BigQuery public data. Under the hood, the token UDF uses the o200k_base tokenizer.
Source Credit: https://cloud.google.com/blog/products/data-analytics/python-udf-in-bigquery-now-generally-available/
