
From standard data warehouse to an all-in-one data platform.
If you’ve ever tried to trace connections across 5 or 6 “handshakes” in a relational database, you know the pain. Standard SQL, with its recursive CTEs (WITH RECURSIVE) and dozens of JOINs, quickly turns such tasks into slow, unreadable monsters.
But what if your data warehouse could natively understand complex relationship chains?
In this article, we’ll explore how the new capabilities of BigQuery Graph and Vector Search (AI) are changing the game. Using a practical Anti-Money Laundering (AML) scenario, we’ll unravel a complex cryptocurrency laundering scheme and catch a hacker using just a couple of SQL queries.
The Tech Dictionary: Explaining the Complex Simply
Before we dive into the code, let’s sync up on the terminology:
- Property Graphs: A way of representing data where the relationships are just as important as the data itself. Think of a social network: people are the Nodes, and their friendships are the Edges. In graphs, finding a path from A to B is intuitive and fast.
- Vector Search: While classic SQL search looks for exact text matches (like LIKE '%text%'), vector search looks for matches based on meaning or behavior.
- The AI.EMBED Function (and the broader AI ecosystem): AI.EMBED is just one of the many powerful AI functions recently introduced in BigQuery (alongside tools for text generation, translation, and entity extraction). For vector search to work, data must be converted into an array of numbers (an embedding). Previously, data engineers had to build complex Python pipelines for this. Now, BigQuery handles it natively during query execution by calling Vertex AI models.
Where Else Does This Work?
This technology is incredibly versatile. It’s used in Logistics (finding bottlenecks in supply chains), Cybersecurity (tracking a hacker’s lateral movement across a corporate network), and E-commerce (building recommendation engines based on co-purchases).
Our Story: The Hunt for Stolen Crypto
Let’s imagine a detective scenario. A malicious actor has stolen funds and is trying to launder them through a chain of wallets before cashing out into fiat currency on a major exchange.
The Actors (Node Types):
- Hacker: The initial wallet holding the stolen crypto.
- Mixer: A crypto-blender service that mixes funds from thousands of users to obscure the trail.
- Mule: A burner wallet used solely to receive funds and pass them along.
- Exchange: The final destination — a centralized exchange where the hacker cashes out.
Our mission: write an algorithm that uncovers this laundering path.
Step 1: Laying the Foundation (Tables & AI Magic)
First, we create standard relational tables for wallets and transactions. In the wallets table, we add an embedding column—this will store the "digital footprint" of the wallet's behavior.
CREATE SCHEMA IF NOT EXISTS crypto_aml_demo OPTIONS(location="US");
-- Nodes Table (Wallets)
CREATE OR REPLACE TABLE crypto_aml_demo.wallets (
wallet_id STRING,
wallet_type STRING, -- Roles: hacker, mixer, personal, exchange
owner_name STRING,
behavior_description STRING,
embedding ARRAY<FLOAT64>,
PRIMARY KEY (wallet_id) NOT ENFORCED
);
-- Edges Table (Transactions)
CREATE OR REPLACE TABLE crypto_aml_demo.transactions (
tx_hash STRING,
from_wallet STRING,
to_wallet STRING,
amount_usd FLOAT64,
block_timestamp TIMESTAMP,
PRIMARY KEY (tx_hash) NOT ENFORCED
);
Now, let’s load our demo data. Pay close attention to how we generate the embeddings. We simply describe the wallet’s behavior in plain text, and the AI.EMBED function converts it into a mathematical vector on the fly.
INSERT INTO crypto_aml_demo.wallets
(wallet_id, wallet_type, owner_name, behavior_description, embedding)
WITH raw_wallets AS (
SELECT
'0x_HACKER_1' AS id, 'hacker' AS type, 'Lazarus Group' AS owner,
'Receives massive single deposits and immediately splits them into hundreds of rapid, small transactions to unknown addresses.' AS _desc
UNION ALL
SELECT
'0x_MIXER_SERVICE', 'mixer', 'Tornado Clone',
'Acts as a dark pool. Receives funds from various sources and sends out mixed funds at random intervals.' AS _desc
UNION ALL
SELECT
'0x_MULE_WALLET', 'personal', 'John Doe (Mule)',
'Receives medium-sized transfers from mixing services and forwards exactly 99% of the balance to major exchanges within 24 hours.' AS _desc
UNION ALL
SELECT
'0x_EXCHANGE_HOT', 'exchange', 'Binance Central',
'Centralized exchange hot wallet. High volume of incoming and outgoing transactions round the clock.' AS _desc
UNION ALL
SELECT
'0x_SUSPICIOUS_ALGO', 'personal', 'Unknown Entity',
'Receives large deposits and rapidly fragments them into micro-transactions, mimicking automated laundering scripts.' AS _desc
)
SELECT
id, type, owner, _desc,
-- The magic of vector generation in a single line:
AI.EMBED(_desc, endpoint => 'text-embedding-005').result
FROM raw_wallets;
-- Insert Edges (Transactions)
INSERT INTO crypto_aml_demo.transactions (tx_hash, from_wallet, to_wallet, amount_usd, block_timestamp)
VALUES
('tx_1', '0x_HACKER_1', '0x_MIXER_SERVICE', 500000, CURRENT_TIMESTAMP()),
('tx_2', '0x_MIXER_SERVICE', '0x_MULE_WALLET', 150000, CURRENT_TIMESTAMP()),
('tx_3', '0x_MULE_WALLET', '0x_EXCHANGE_HOT', 145000, CURRENT_TIMESTAMP()),
('tx_4', '0x_SUSPICIOUS_ALGO', '0x_EXCHANGE_HOT', 10000, CURRENT_TIMESTAMP());
Key Takeaway: The AI.EMBED function eliminates the need to spin up external Python microservices. Vectorization happens directly inside the data warehouse.
Step 2: Putting on the “Graph Glasses”
We don’t need to copy, duplicate, or transform our data. We simply tell BigQuery how to logically link these two tables using the CREATE PROPERTY GRAPH statement.
CREATE OR REPLACE PROPERTY GRAPH crypto_aml_demo.CryptoGraph
NODE TABLES (
-- The wallets table becomes the Nodes
crypto_aml_demo.wallets KEY(wallet_id) LABEL Wallet
)
EDGE TABLES (
-- The transactions table becomes the Edges
crypto_aml_demo.transactions
KEY(tx_hash)
SOURCE KEY(from_wallet) REFERENCES wallets(wallet_id)
DESTINATION KEY(to_wallet) REFERENCES wallets(wallet_id)
LABEL Transfer
);
Now, BigQuery understands that from_wallet and to_wallet aren't just strings, but directed bridges between entities.
Step 3: Following the Trail (Visualizing the Path)
Let’s find the shortest path from the hacker to the exchange. In the real world, hackers generate thousands of micro-transactions (often just 1 cent) to confuse tracking algorithms. To prevent our query from blowing up with useless data, we use Inline Edge Filtering — discarding tiny transactions right at the moment of path traversal.
GRAPH crypto_aml_demo.CryptoGraph
MATCH p = ANY SHORTEST (start_wallet:Wallet)
-- Dynamic pattern: follow the Transfer edge (if amount > $100k) from 1 to 5 times
( -[e:Transfer WHERE e.amount_usd > 100000]-> (n) ){1, 5} (end_wallet:Wallet)
WHERE start_wallet.wallet_id IN ("0x_HACKER_1")
-- Ensure the final node 'n' in our chain is an exchange
AND n[OFFSET(ARRAY_LENGTH(n) - 1)].wallet_type = 'exchange'
RETURN
TO_JSON(p) AS paths, -- Return the serialized path for rendering
ARRAY_LENGTH(e) AS hop_count; -- Count the number of hops
Breaking Down the Code:
- ANY SHORTEST: A smart optimization. As soon as the engine finds the shortest path, it stops searching that branch.
- ( -[e…]-> (n) ){1, 5}: The heart of graph traversal (Quantified Path Patterns). We tell the algorithm to look for a chain of 1 to 5 hops. All traversed edges are automatically collected into array e, and the nodes into array n.
When you execute this Standalone Graph query in the Google Cloud Console, the built-in UI automatically parses TO_JSON(p) and draws a visual schema right in your browser:

Step 4: The Smart Ambush (Vector Search + Graph)
Now for the best part — hybrid analysis. Imagine we know the hacker’s behavioral footprint (exactly how they split up their funds).
We can combine AI and Graphs in a single query:
- The AI Step: Find all wallets in the database whose behavior mathematically resembles the hacker’s (even if they aren’t directly connected to them).
- The Graph Step: Check if these “suspicious” wallets have a direct route to cash out at known exchanges.
WITH SimilarBehaviors AS (
-- 1. AI Step: Find wallets with similar behavior
SELECT base.wallet_id, distance
FROM VECTOR_SEARCH(
TABLE crypto_aml_demo.wallets, 'embedding',
(SELECT embedding FROM crypto_aml_demo.wallets WHERE wallet_id = '0x_HACKER_1'),
top_k => 3,
distance_type => 'COSINE'
)
-- Exclude the original hacker from the search results
WHERE base.wallet_id != '0x_HACKER_1'
)
-- 2. Graph Step: Check the connections of the found wallets
SELECT
sb.wallet_id AS suspicious_wallet,
ROUND(sb.distance, 4) AS ai_similarity_score,
gt.destination_exchange,
gt.hops_to_exchange
FROM SimilarBehaviors sb
JOIN GRAPH_TABLE(
crypto_aml_demo.CryptoGraph
-- Check if there is a short path (1-2 transfers) to an exchange
MATCH p = (suspect:Wallet)-[:Transfer]->{1,2}(exchange:Wallet {wallet_type: 'exchange'})
RETURN
suspect.wallet_id AS start_node,
exchange.owner_name AS destination_exchange,
ARRAY_LENGTH(EDGES(p)) AS hops_to_exchange
) gt ON sb.wallet_id = gt.start_node
ORDER BY sb.distance ASC;
(Note: In BigQuery Graph, to correctly count the hops of a path variable p, we use the EDGES(p) function, which extracts the array of edges from the path object).
What Are the Results?

We caught 0x_SUSPICIOUS_ALGO & 0x_MIXER_SERVICE .
For 0x_SUSPICIOUS_ALGO Vector search discovered that its behavior is practically identical to the known hacker group (a score of 0.1005 indicates a highly similar Cosine Distance). The graph search then confirmed that this wallet is already funneling funds to a centralized exchange in just 1 hop.
Security teams no longer need to sift through terabytes of raw logs. They get actionable, ready-to-use insights for immediate blocking.
Conclusion
BigQuery Graph and built-in AI functions erase the boundaries between the classic data warehouse, Data Science environments, and graph analytics.
Analyses that just a year ago required extracting data, spinning up a Neo4j cluster, and writing complex Python pipelines can today be executed with a few elegant SQL queries right where your data already lives. And for data engineers and analysts, that is a true revolution.
Beyond SQL: How BigQuery Evolved into a Complete AI and Graph Platform was originally published in Google Cloud – Community on Medium, where people are continuing the conversation by highlighting and responding to this story.
Source Credit: https://medium.com/google-cloud/beyond-sql-how-bigquery-evolved-into-a-complete-ai-and-graph-platform-335a72c70789?source=rss—-e52cf94d98af—4
