
Analyze AI coding assistance adoption and impact for your team with Gemini Code Assist metadata logs and BigQuery

Organizations that are starting with AI coding assistance often look for ways to extract usage data to track adoption, impact on developer productivity, and also combine these data with other sources into a broader evaluation framework.
Gemini Code Assist provides usage data through Metrics and Logs. At time of writing, logs provide the most complete set of information. Logs are generated for requests, responses and metadata.
This medium article, from my teammate Daniel Strebel, provides an easy to follow walkthrough that will help you to enable logging and configure a log analytics dashboard to get a view on important metrics such as code exposures, acceptance rate, lines of code accepted.
In addition to using log analytics, you can also export Code Assist logs to BigQuery to analyze usage data, that can be helpful to:
- Join Code Assist usage data with other metrics from different tools used in your software development life cycle. For example, you want to track which percentage of the total code produced by your team is generated by AI.
- Perform more complex queries that might go beyond the scope or performance of Log Analytics for massive datasets.
- Build highly customized BI Dashboards and Reports using your currently adopted BI tool
- Use Machine Learning on Log Data through BigQuery ML
In this article I provide some example queries you can use on your BigQuery dataset after you have configured export of Code Assist logs to it.
Replace @project_id with your project id and @dataset_id with your BigQuery dataset name.
This is not intended to be an exhaustive list, you can further customize the queries or create new ones if you need to access different data or with different aggregation.
You can use the examples below to get code exposure, acceptance rate and lines of code accepted data from the BigQuery table hosting your metadata logs.
-- Metadata logs: Acceptance rate with lines count by user
WITH
Acceptance AS (
SELECT
labels.user_id AS user_id,
COUNT(DISTINCT jsonpayload_v1_metadatalog.codeacceptance.originalrequestid) AS acceptance_count,
SUM(jsonpayload_v1_metadatalog.codeacceptance.linescount) AS total_lines_accepted
FROM
`@project_id.@dataset_id.cloudaicompanion_googleapis_com_metadata_*`
WHERE
jsonpayload_v1_metadatalog.codeacceptance.originalrequestid IS NOT NULL
GROUP BY
user_id
),
Exposure AS (
SELECT
labels.user_id AS user_id,
COUNT(DISTINCT jsonpayload_v1_metadatalog.codeexposure.originalrequestid) AS exposure_count
FROM
`@project_id.@dataset_id.cloudaicompanion_googleapis_com_metadata_*`
WHERE
jsonpayload_v1_metadatalog.codeexposure.originalrequestid IS NOT NULL
GROUP BY
user_id
)
SELECT
e.user_id,
e.exposure_count,
COALESCE(a.acceptance_count, 0) AS acceptance_count,
CAST(COALESCE(a.acceptance_count, 0) AS NUMERIC) / NULLIF(e.exposure_count, 0) AS acceptance_rate,
COALESCE(a.total_lines_accepted, 0) AS total_lines_accepted
FROM
Exposure e
LEFT JOIN
Acceptance a ON e.user_id = a.user_id
WHERE e.user_id IS NOT NULL;
-- Metadata logs: Acceptance rate with lines count by user and date
WITH
Acceptance AS (
SELECT
DATE(timestamp) AS event_date,
labels.user_id AS user_id,
COUNT(DISTINCT jsonpayload_v1_metadatalog.codeacceptance.originalrequestid) AS acceptance_count,
SUM(jsonpayload_v1_metadatalog.codeacceptance.linescount) AS total_lines_accepted
FROM
`@project_id.@dataset_id.cloudaicompanion_googleapis_com_metadata_*`
WHERE
jsonpayload_v1_metadatalog.codeacceptance.originalrequestid IS NOT NULL
GROUP BY
event_date,
user_id
),
Exposure AS (
SELECT
DATE(timestamp) AS event_date,
labels.user_id AS user_id,
COUNT(DISTINCT jsonpayload_v1_metadatalog.codeexposure.originalrequestid) AS exposure_count
FROM
`@project_id.@dataset_id.cloudaicompanion_googleapis_com_metadata_*`
WHERE
jsonpayload_v1_metadatalog.codeexposure.originalrequestid IS NOT NULL
GROUP BY
event_date,
user_id
)
SELECT
e.event_date,
e.user_id,
e.exposure_count,
COALESCE(a.acceptance_count, 0) AS acceptance_count,
CAST(COALESCE(a.acceptance_count, 0) AS NUMERIC) / NULLIF(e.exposure_count, 0) AS acceptance_rate,
COALESCE(a.total_lines_accepted, 0) AS total_lines_accepted
FROM
Exposure e
LEFT JOIN
Acceptance a ON e.user_id = a.user_id AND e.event_date = a.event_date
WHERE e.user_id IS NOT NULL;
-- Metadata logs: Acceptance rate with lines count by user, client and programming language
WITH
Acceptance AS (
SELECT
labels.user_id AS user_id,
jsonpayload_v1_metadatalog.clientname AS client_name,
jsonpayload_v1_metadatalog.codeacceptance.programminglanguage AS programming_language,
COUNT(DISTINCT jsonpayload_v1_metadatalog.codeacceptance.originalrequestid) AS acceptance_count,
SUM(jsonpayload_v1_metadatalog.codeacceptance.linescount) AS total_lines_accepted
FROM
`@project_id.@dataset_id.cloudaicompanion_googleapis_com_metadata_*`
WHERE
jsonpayload_v1_metadatalog.codeacceptance.originalrequestid IS NOT NULL
GROUP BY
user_id,
client_name,
programming_language
),
Exposure AS (
SELECT
labels.user_id AS user_id,
jsonpayload_v1_metadatalog.clientname AS client_name,
jsonpayload_v1_metadatalog.codeexposure.programminglanguage AS programming_language,
COUNT(DISTINCT jsonpayload_v1_metadatalog.codeexposure.originalrequestid) AS exposure_count
FROM
`@project_id.@dataset_id.cloudaicompanion_googleapis_com_metadata_*`
WHERE
jsonpayload_v1_metadatalog.codeexposure.originalrequestid IS NOT NULL
GROUP BY
user_id,
client_name,
programming_language
)
SELECT
e.user_id,
e.client_name,
e.programming_language,
e.exposure_count,
COALESCE(a.acceptance_count, 0) AS acceptance_count,
CAST(COALESCE(a.acceptance_count, 0) AS NUMERIC) / NULLIF(e.exposure_count, 0) AS acceptance_rate,
COALESCE(a.total_lines_accepted, 0) AS total_lines_accepted
FROM
Exposure e
LEFT JOIN
Acceptance a ON e.user_id = a.user_id AND e.client_name = a.client_name AND e.programming_language = a.programming_language
WHERE e.user_id IS NOT NULL;
Since code exposures, acceptance and lines of code in metadata logs currently don’t include code generated by chat responses, while developers also use chat to get code suggestions, I put together the following query to count code characters and lines produced by chat interactions from data in the table hosting response logs.
Code in chat responses is identified by looking at the backticks ““`” separating code blocks and line counts are calculated by counting the “\n” newline characters, this is not always perfectly accurate but currently provides a reasonable approximation.
-- Total Code chars and lines in Chat response
WITH DATA AS (
SELECT
DATE(timestamp) AS use_date,
labels.user_id AS user,
message.content AS message_content
FROM
`@project_id`.`@dataset_id`.`cloudaicompanion_googleapis_com_response_*`,
UNNEST(jsonpayload_v1_responselog.taskcompletionresponse.output.messages) AS message -- replace with your date/period
WHERE
jsonpayload_v1_responselog.taskcompletionresponse.output.messages IS NOT NULL ),
ExtractedCode AS (
SELECT
use_date,
user,
message_content,
REGEXP_EXTRACT_ALL(message_content, r'```(?:\w+)?\n([\s\S]+?)\n```') AS extracted_code
FROM
`DATA` ),
UnnestedCode AS (
SELECT
use_date,
user,
message_content,
code
FROM
`ExtractedCode`,
UNNEST(extracted_code) AS code )
SELECT
use_date,
user,
SUM(LENGTH(code)) AS chat_code_chars,
SUM(ARRAY_LENGTH(REGEXP_EXTRACT_ALL(code, r'\n')) + 1) AS chat_code_lines
FROM
`UnnestedCode`
GROUP BY
use_date,
user
ORDER BY
use_date;
Source Credit: https://medium.com/google-cloud/analyze-gemini-code-assist-usage-with-bigquery-b821ed4f63fe?source=rss—-e52cf94d98af—4