To effectively operate and troubleshoot applications, developers and site reliability engineers (SREs) need to understand the full context of their system’s behavior, typically as part of their logging and observability tooling. Today, we’re excited to announce a variety of new capabilities in our Google Cloud Observability suite:
- Log Analytics is now Observability Analytics.
- Trace data within Observability Analytics is generally available (GA).
- The Observability API for management and configuration is GA.
Together, these capabilities unify logs and traces in a single experience, helping you go from viewing high-level trends to deep, contextual root-cause analysis for agentic as well as traditional workloads, and to configure and manage those workloads programmatically, as part of observability buckets.
Further, support for SQL in Cloud Trace is an important new tool in your toolbelt. You can, for instance, write a single SQL query that joins your application logs with your distributed trace spans to find any checkout requests that took longer than 5 seconds and see which internal microservice spent the most time processing them. Or, for AI agents, you can analyze telemetry across thousands of runs to identify which tool calls fail most frequently, or calculate the aggregated P95 response time for all external tool executions to pinpoint performance bottlenecks. The possibilities are endless!
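To make the checkout example concrete, here is a minimal, self-contained sketch that runs the same kind of join locally with Python's built-in `sqlite3` module. It is not the Observability Analytics schema: the `spans` and `logs` tables, their columns, and the sample rows are all hypothetical, chosen only to show the shape of the query. It also assumes a SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# Hypothetical schema for illustration only; real trace and log tables differ.
SLOW_CHECKOUT_SQL = """
WITH slow AS (              -- root checkout spans slower than the threshold
    SELECT trace_id, duration_ms AS request_ms
    FROM spans
    WHERE name = 'checkout'
      AND parent_span_id IS NULL
      AND duration_ms > :threshold_ms
),
per_service AS (            -- time spent per downstream service in those traces
    -- Simplification: summing span durations overcounts if spans nest.
    SELECT s.trace_id, s.service, SUM(s.duration_ms) AS service_ms
    FROM spans AS s
    JOIN slow ON slow.trace_id = s.trace_id
    WHERE s.parent_span_id IS NOT NULL
    GROUP BY s.trace_id, s.service
),
ranked AS (                 -- rank services within each trace, slowest first
    SELECT trace_id, service, service_ms,
           ROW_NUMBER() OVER (PARTITION BY trace_id
                              ORDER BY service_ms DESC) AS rn
    FROM per_service
)
SELECT r.trace_id, slow.request_ms, r.service, r.service_ms,
       COUNT(l.message) AS error_logs      -- ERROR logs from that service
FROM ranked AS r
JOIN slow ON slow.trace_id = r.trace_id
LEFT JOIN logs AS l
       ON l.trace_id = r.trace_id
      AND l.service = r.service
      AND l.severity = 'ERROR'
WHERE r.rn = 1
GROUP BY r.trace_id, slow.request_ms, r.service, r.service_ms
ORDER BY slow.request_ms DESC
"""


def build_demo_db():
    """Create an in-memory database with a few made-up spans and logs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE spans (trace_id TEXT, span_id TEXT, parent_span_id TEXT,"
        " service TEXT, name TEXT, duration_ms INTEGER)"
    )
    conn.execute(
        "CREATE TABLE logs (trace_id TEXT, service TEXT, severity TEXT,"
        " message TEXT)"
    )
    conn.executemany(
        "INSERT INTO spans VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("t1", "a", None, "frontend", "checkout", 6200),
            ("t1", "b", "a", "payments", "charge_card", 4800),
            ("t1", "c", "a", "inventory", "reserve_items", 900),
            ("t2", "d", None, "frontend", "checkout", 1200),
            ("t2", "e", "d", "payments", "charge_card", 700),
        ],
    )
    conn.executemany(
        "INSERT INTO logs VALUES (?, ?, ?, ?)",
        [
            ("t1", "payments", "ERROR", "card processor timeout, retrying"),
            ("t1", "payments", "INFO", "charge accepted"),
            ("t2", "payments", "INFO", "charge accepted"),
        ],
    )
    return conn


def slow_checkout_report(conn, threshold_ms=5000):
    """Return (trace_id, request_ms, slowest_service, service_ms, error_logs)."""
    return conn.execute(SLOW_CHECKOUT_SQL, {"threshold_ms": threshold_ms}).fetchall()
```

Calling `slow_checkout_report(build_demo_db())` returns one row per slow checkout trace, naming the service that accounted for the most time and how many ERROR logs that service wrote in the same trace. The point of the sketch is that the trace ID is the join key between the two datasets.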
In this blog, let’s take a closer look at Observability Analytics, and a few key use cases leveraging traces and logs, so you can put these new capabilities to work in your environment right away.
What is Observability Analytics?
Observability Analytics, formerly Log Analytics, brings the power of BigQuery and SQL to your telemetry data directly within Cloud Observability. It allows you to run complex analytical queries that join high-volume log and trace data to identify patterns, troubleshoot issues, and generate insights into the health and performance of your agents and applications, without having to move or duplicate data. This brings a number of important benefits:
- Unified telemetry: Run SQL queries to analyze and JOIN high-volume log and trace data in a single place.
- Business correlation: Join your observability datasets with business-critical data stored in BigQuery (e.g., conversion rates, revenue, operational costs) to quantify the business impact of technical issues.
- In-place analysis: Analyze your data where it’s already stored (in Cloud Logging and Cloud Trace), reducing duplicate export storage costs and complexity.
For instance, with Cloud Observability, you can analyze how application latency impacts conversion rates or identify the financial implications of service outages, transforming raw telemetry into actionable business intelligence.
Unlock deeper insights with traces and logs
Correlating logs and traces in a single analytics view breaks down data silos and accelerates troubleshooting. You can now analyze performance trends from trace data and directly correlate them with corresponding application or infrastructure logs to understand the “why” behind the “what.” Let’s take a couple of examples.
Use case 1: AI agent optimization (analyzing tool failures and latency at scale)
AI agents often perform complex, multi-step tasks by executing various external tools (e.g., database queries, web searches, API calls). When optimizing agents at scale, inspecting individual trace graphs in a UI often isn’t enough. You need to answer systemic questions like “Which tools are failing most frequently?” and “Which ones are causing latency bottlenecks?”
With Observability Analytics, you can run aggregate queries across millions of span events to calculate failure rates and latency percentiles (like P95) for every tool in your system.
Example query: Rank agent tools by failure rate and 95th percentile latency over the last 7 days.
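The sketch below illustrates the kind of aggregation this describes; it is an illustrative stand-in, not the query from the original post. It runs locally with Python's built-in `sqlite3` module against a hypothetical `tool_spans` table (the table, its columns, and the sample data are assumptions). Because SQLite has no percentile function, P95 is computed with a nearest-rank window-function pattern, which needs SQLite 3.25 or later; in BigQuery SQL you would more likely reach for a built-in such as `APPROX_QUANTILES`.

```python
import sqlite3

# Hypothetical table: one row per tool-call span emitted by an agent.
RANK_TOOLS_SQL = """
WITH ranked AS (
    SELECT tool,
           duration_ms,
           is_error,
           ROW_NUMBER() OVER (PARTITION BY tool ORDER BY duration_ms) AS rn,
           COUNT(*)     OVER (PARTITION BY tool)                      AS n
    FROM tool_spans
    WHERE start_unix >= :now - 7 * 86400        -- last 7 days only
)
SELECT tool,
       COUNT(*)      AS calls,
       AVG(is_error) AS failure_rate,           -- is_error is 0 or 1
       -- Nearest-rank P95: the value at the smallest rank rn >= 0.95 * n.
       MIN(CASE WHEN rn * 100 >= n * 95 THEN duration_ms END) AS p95_ms
FROM ranked
GROUP BY tool
ORDER BY failure_rate DESC, p95_ms DESC
"""


def build_tool_db(now_unix):
    """Create an in-memory database with made-up tool-call spans."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tool_spans (tool TEXT, start_unix INTEGER,"
        " duration_ms INTEGER, is_error INTEGER)"
    )
    rows = []
    # 20 web_search calls, 100..2000 ms, every fifth one fails.
    for i in range(1, 21):
        rows.append(("web_search", now_unix - i * 3600, i * 100, int(i % 5 == 0)))
    # 10 db_query calls, 10..100 ms, one failure.
    for i in range(1, 11):
        rows.append(("db_query", now_unix - i * 60, i * 10, int(i == 3)))
    # An 8-day-old failure that the 7-day window should exclude.
    rows.append(("web_search", now_unix - 8 * 86400, 9999, 1))
    conn.executemany("INSERT INTO tool_spans VALUES (?, ?, ?, ?)", rows)
    return conn


def rank_tools(conn, now_unix):
    """Return (tool, calls, failure_rate, p95_ms), worst failure rate first."""
    return conn.execute(RANK_TOOLS_SQL, {"now": now_unix}).fetchall()
```

Passing the current time in as a parameter, rather than reading the clock inside the query, keeps the sketch deterministic; a production query would typically filter on the span's own timestamp column relative to the current time.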
Source Credit: https://cloud.google.com/blog/products/management-tools/query-logs-and-traces-with-sql-in-observability-analytics/
