Declarative LangGraph Agents from YAML — Schema, Graph Construction, and LLM/Tool Abstraction. Stop writing LangGraph boilerplate. Declare your agent in YAML and let the builder wire the graph.
The Problem: LangGraph Is Powerful but Verbose
LangGraph gives you fine-grained control over agentic workflows. But building even a simple tool-using agent means writing a lot of code every time: defining state, constructing the graph, wiring conditional edges, instantiating LLM providers, registering tools, and managing the tool-call routing loop.
Every agent ends up with the same structural scaffolding. When you want to swap from OpenAI to Vertex AI, or add a new tool, you’re back editing Python. When a non-developer wants to create an agent, they’re locked out entirely.
A declarative YAML approach solves this. You describe what the agent does — its LLM, its tools, its workflow topology — and the builder handles how to wire it into a LangGraph `StateGraph`.
The Architecture

The YAML Schema
The YAML is the entire contract. An agent author only needs to write this — no code required:
```yaml
metadata:
  name: research-assistant
  version: 1.0.0
  description: Searches the web and summarises results
  author: platform-team
  tags:
    - research
    - web-search

spec:
  llms:
    - id: primary-llm
      provider: vertexai
      model: gemini-2.0-flash
      temperature: 0.3
      project_id: my-gcp-project
      location: us-central1
  tools:
    - id: web-search
      type: search
      config:
        max_results: 5
  observability:
    log_level: INFO
    trace_enabled: true

workflow:
  nodes:
    - id: start
      type: start
    - id: researcher
      type: llm
      config:
        llm_id: primary-llm
        tool_ids:
          - web-search
        system_prompt: |
          You are a research assistant. Use the web search tool to find
          accurate, up-to-date information. Summarise your findings clearly.
    - id: end
      type: end
  edges:
    - from: start
      to: researcher
    - from: researcher
      to: end
```
Three top-level sections govern everything:
- metadata — identity and versioning
- spec — the resources the agent can use (LLMs, tools, observability settings)
- workflow — the graph topology (nodes and edges)
Part 1: Schema Validation — Catching Problems Before Runtime
The first step when loading any agent is parsing the YAML and validating its structure before any graph construction, LLM connection, or tool instantiation begins. Every section of the YAML maps to a typed schema model.
What each model validates
LLM config — each entry under `spec.llms`:
```
LLMConfig:
    id          → string, required
    provider    → one of: openai | vertexai | anthropic
    model       → string, required
    temperature → float, 0.0–2.0, default 0.7
    max_tokens  → integer or null
    api_key_env → env-var name holding the API key (openai, anthropic)
    project_id  → GCP project ID (vertexai only, required)
    location    → GCP region, default "us-central1"
```
Tool config — each entry under `spec.tools`:
```
ToolConfig:
    id     → string, required (used as reference in node config)
    type   → string, must match a registered tool type
    config → key-value pairs passed to the tool factory
```
Node config — each entry under `workflow.nodes`:
```
NodeConfig:
    id   → string, unique across the workflow
    type → one of: start | end | llm | tool | custom
    config:
        llm_id        → references an id in spec.llms
        tool_ids      → list of ids from spec.tools
        system_prompt → freeform string
        inputs        → list of declared input parameters
        outputs       → list of declared output parameters
```
Edge config — each entry under `workflow.edges`:
```
EdgeConfig:
    from → source node id (YAML keyword; accessed as .source in code)
    to   → target node id (YAML keyword; accessed as .target in code)
```
Note: `from` is a reserved keyword in Python and many other languages, but it is the natural word in YAML.
The schema maps `from` to an internal `source` field, so authors write readable YAML while the builder works with unambiguous field names.
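One way to implement that mapping, sketched with Pydantic v2 (the class and field names here are illustrative, not the platform's actual code):

```python
from pydantic import BaseModel, Field

class EdgeConfig(BaseModel):
    # YAML authors write `from:` / `to:`; code reads .source / .target
    source: str = Field(alias="from")
    target: str = Field(alias="to")

    # allow construction by either the alias or the field name
    model_config = {"populate_by_name": True}

edge = EdgeConfig.model_validate({"from": "start", "to": "researcher"})
print(edge.source, edge.target)  # start researcher
```

Validation accepts the YAML-friendly keys, while the rest of the builder never touches the reserved word.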
Root config — the full document:
```
AgentConfig:
    metadata → MetadataConfig
    spec     → SpecConfig (llms, tools, knowledge, secrets, observability)
    workflow → WorkflowConfig (nodes, edges)
```
The validation call
```
function validate_yaml(content):
    if content is a string:
        parse it as YAML into a dictionary
    validate the dictionary against the AgentConfig schema
    # on any violation, raise a descriptive error, e.g.:
    #   "spec.llms[0].provider: value 'gpt' is not one of [openai, vertexai, anthropic]"
    #   "workflow.nodes[2].config: llm_id is required for llm node type"
    return a typed AgentConfig object
```
Every field violation produces a structured error pointing to exactly which section and field failed. The agent never starts with a broken or ambiguous configuration.
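As an illustration, here is a minimal Python version of this step using Pydantic v2 and PyYAML; `AgentConfig` below is a stripped-down stand-in for the full schema, not the platform's actual models:

```python
from typing import Literal

import yaml
from pydantic import BaseModel, Field, ValidationError

class LLMConfig(BaseModel):
    id: str
    provider: Literal["openai", "vertexai", "anthropic"]
    model: str
    temperature: float = Field(default=0.7, ge=0.0, le=2.0)

class SpecConfig(BaseModel):
    llms: list[LLMConfig]

class AgentConfig(BaseModel):
    spec: SpecConfig

def validate_yaml(content) -> AgentConfig:
    """Parse (if given a string) and validate, returning a typed config."""
    data = yaml.safe_load(content) if isinstance(content, str) else content
    # On any violation Pydantic raises a ValidationError that names the
    # exact path of each bad field, e.g. spec.llms.0.provider
    return AgentConfig.model_validate(data)

config = validate_yaml(
    "spec:\n"
    "  llms:\n"
    "    - id: primary-llm\n"
    "      provider: vertexai\n"
    "      model: gemini-2.0-flash\n"
)
print(config.spec.llms[0].temperature)  # 0.7 (default applied)
```

A bad value such as `provider: gpt` raises a `ValidationError` whose message points at `spec.llms.0.provider` — the structured-error behaviour the builder relies on.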
Part 2: Topology Validation — Before Any LLM or Tool Is Touched
Schema validity is necessary but not sufficient. A YAML file can be structurally correct and still describe a graph that can never run. Topology validation is a second, independent pass:
```
function validate_topology(config):
    count start nodes → must be exactly 1
    count end nodes   → must be exactly 1
    build a set of all declared node IDs
    for each edge:
        assert edge.source exists in node IDs
        assert edge.target exists in node IDs
        add edge.target to adjacency list of edge.source
    reachable   = depth_first_search(start_node, adjacency_map)
    unreachable = all_node_ids - reachable
    if unreachable is not empty:
        raise error: "Node(s) not reachable from start: <list>"
```
This catches authoring mistakes before a single network connection is made:
- A node declared in YAML but with no path leading to it from start
- An edge whose source or target references a node ID that was mistyped
- Two `start` nodes (the builder would not know which to use as the entry point)
- An agent with no `end` node (the graph would never terminate)
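A plain-Python sketch of this pass (the function and dict shapes are illustrative; it needs nothing beyond the parsed config):

```python
def validate_topology(nodes: list[dict], edges: list[dict]) -> None:
    """nodes: [{"id": ..., "type": ...}]; edges: [{"source": ..., "target": ...}]."""
    node_ids = {n["id"] for n in nodes}
    starts = [n["id"] for n in nodes if n["type"] == "start"]
    ends = [n["id"] for n in nodes if n["type"] == "end"]
    if len(starts) != 1 or len(ends) != 1:
        raise ValueError(f"need exactly 1 start and 1 end node, got {len(starts)} / {len(ends)}")

    # Build adjacency, checking every edge endpoint as we go
    adjacency: dict[str, list[str]] = {nid: [] for nid in node_ids}
    for e in edges:
        if e["source"] not in node_ids or e["target"] not in node_ids:
            raise ValueError(f"edge references unknown node: {e['source']} -> {e['target']}")
        adjacency[e["source"]].append(e["target"])

    # Iterative depth-first search from the single start node
    reachable, stack = set(), [starts[0]]
    while stack:
        nid = stack.pop()
        if nid in reachable:
            continue
        reachable.add(nid)
        stack.extend(adjacency[nid])

    unreachable = node_ids - reachable
    if unreachable:
        raise ValueError(f"node(s) not reachable from start: {sorted(unreachable)}")

nodes = [{"id": "start", "type": "start"},
         {"id": "researcher", "type": "llm"},
         {"id": "end", "type": "end"}]
edges = [{"source": "start", "target": "researcher"},
         {"source": "researcher", "target": "end"}]
validate_topology(nodes, edges)  # passes silently
```

Adding a node with no incoming edge, or mistyping an edge endpoint, makes the same call raise before anything expensive happens.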
Part 3: LLM Abstraction — The Factory Pattern
Once the config is validated, the builder needs a live LLM instance for each `llm` node.
The `provider` field drives which backend is instantiated:
```
function create_llm(llm_config):
    if provider is "openai":
        read API key from environment variable named by api_key_env
        return OpenAI chat model (model, api_key, temperature, max_tokens)
    if provider is "vertexai":
        require project_id → raise error if missing
        return Vertex AI chat model (model, project, location, temperature,
                                     max_tokens, content safety settings)
    if provider is "anthropic":
        read API key from environment variable named by api_key_env
        return Anthropic chat model (model, api_key, temperature, max_tokens)
    raise error: "Unknown provider: <value>"
```
All provider-specific initialisation — safety settings, regional endpoints, authentication mechanisms — lives inside this one function. Everything else in the builder treats the result as a generic “chat model that accepts messages and returns messages”.
Switching an agent from OpenAI to Vertex AI is a single YAML change:
```yaml
# Before
- id: primary-llm
  provider: openai
  model: gpt-4o

# After
- id: primary-llm
  provider: vertexai
  model: gemini-2.0-flash
  project_id: my-gcp-project
```
No code changes. No redeployment of shared infrastructure. The rest of the workflow YAML, including tool bindings and system prompts, stays exactly the same.
LLM instances are cached by `id` inside the builder. If two different `llm` nodes declare the same `llm_id`, they reuse one instance rather than opening duplicate connections.
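A sketch of the factory plus cache in Python, assuming the LangChain provider packages (`langchain-openai`, `langchain-google-vertexai`, `langchain-anthropic`); exact constructor arguments vary between package versions, so treat these calls as indicative rather than the platform's actual code:

```python
import os

def create_llm(cfg: dict):
    """cfg mirrors one entry under spec.llms."""
    provider = cfg["provider"]
    if provider == "openai":
        from langchain_openai import ChatOpenAI  # imported only when needed
        return ChatOpenAI(model=cfg["model"],
                          api_key=os.environ[cfg["api_key_env"]],
                          temperature=cfg.get("temperature", 0.7),
                          max_tokens=cfg.get("max_tokens"))
    if provider == "vertexai":
        if not cfg.get("project_id"):
            raise ValueError("project_id is required for provider 'vertexai'")
        from langchain_google_vertexai import ChatVertexAI
        return ChatVertexAI(model_name=cfg["model"],
                            project=cfg["project_id"],
                            location=cfg.get("location", "us-central1"),
                            temperature=cfg.get("temperature", 0.7))
    if provider == "anthropic":
        from langchain_anthropic import ChatAnthropic
        return ChatAnthropic(model=cfg["model"],
                             api_key=os.environ[cfg["api_key_env"]],
                             temperature=cfg.get("temperature", 0.7))
    raise ValueError(f"Unknown provider: {provider!r}")

# Cache by id: two nodes sharing an llm_id reuse one instance
_llm_cache: dict = {}

def get_llm(cfg: dict):
    if cfg["id"] not in _llm_cache:
        _llm_cache[cfg["id"]] = create_llm(cfg)
    return _llm_cache[cfg["id"]]
```

Deferring each provider import into its branch means only the SDK for the configured provider needs to be installed.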
Part 4: Tool Registry — Register Once, Resolve by Type
Tools follow a registry pattern. The registry is populated once at application startup. At build time, the builder resolves `tool_ids` from node configs into live tool instances.
```
class ToolRegistry:
    factories: map of tool_type → factory_function

    register(tool_type, factory_function):
        factories[tool_type] = factory_function

    get_tool(tool_config):
        factory = factories.get(tool_config.type)
        if factory not found:
            raise ToolNotFoundError(tool_config.type, available_types)
        return factory(tool_config)
```
At startup, each supported tool type is registered alongside a factory that knows how to build it from a `ToolConfig`:
```
registry.register("search",      → factory that builds a web search tool)
registry.register("web_browser", → factory that builds a browser navigation tool)
registry.register("code_runner", → factory that builds a sandboxed code executor)
registry.register("openapi",     → factory that builds an OpenAPI-driven HTTP tool)
```
When the builder processes a node with `tool_ids: [web-search]`, it:
- Finds the `ToolConfig` with `id: web-search` in `spec.tools`
- Passes that config to `registry.get_tool()` — the `type` field selects the factory
- Caches the resulting tool instance by its `id`
- Binds it to the LLM node so the model can call it during inference
An agent author never writes tool instantiation code. They declare the tool type and its configuration options in YAML, and the registry handles the rest. Adding a new tool type to the platform means registering one new factory — existing agents remain unchanged.
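A compact Python version of the registry; for brevity this sketch folds the per-`id` instance cache into the registry itself, and the registered factories are stand-in lambdas rather than real tool builders:

```python
class ToolNotFoundError(KeyError):
    def __init__(self, tool_type: str, available: list[str]):
        super().__init__(f"No factory for tool type {tool_type!r}; available: {available}")

class ToolRegistry:
    def __init__(self):
        self._factories = {}   # tool_type -> factory(tool_config) -> tool
        self._instances = {}   # tool id   -> built tool (cache)

    def register(self, tool_type: str, factory) -> None:
        self._factories[tool_type] = factory

    def get_tool(self, tool_config: dict):
        # Return the cached instance if this id was already built
        if tool_config["id"] in self._instances:
            return self._instances[tool_config["id"]]
        factory = self._factories.get(tool_config["type"])
        if factory is None:
            raise ToolNotFoundError(tool_config["type"], sorted(self._factories))
        tool = factory(tool_config)
        self._instances[tool_config["id"]] = tool
        return tool

registry = ToolRegistry()
registry.register("search", lambda cfg: f"search-tool(max={cfg['config']['max_results']})")
tool = registry.get_tool({"id": "web-search", "type": "search", "config": {"max_results": 5}})
print(tool)  # search-tool(max=5)
```

Registering a new tool type is one `register()` call; every existing agent YAML keeps working unchanged.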
Part 5: Graph Construction — Walking Nodes and Edges
With topology validated, LLMs instantiated, and tools resolved, the builder constructs the LangGraph `StateGraph`. It walks the validated config in two passes: nodes first, then edges.
Pass 1 — Add nodes
```
function build(config):
    validate_topology(config)
    graph = new StateGraph(AgentState)
    tool_executor_map = {}   ← tracks which LLM nodes got a hidden executor

    for each node in config.workflow.nodes:
        if node.type is "start":
            graph.add_node(node.id, start_node_function)
        if node.type is "end":
            graph.add_node(node.id, end_node_function)
        if node.type is "llm":
            graph.add_node(node.id, llm_node_function(node))
            tool_ids      = node.config.tool_ids (default [])
            knowledge_ids = node.config.knowledge_ids (default [])
            if tool_ids or knowledge_ids are not empty:
                executor_id = node.id + "_tool_executor"
                graph.add_node(executor_id,
                               tool_executor_function(node.id, tool_ids, knowledge_ids))
                tool_executor_map[node.id] = executor_id
```
Pass 2 — Wire edges
```
    for each edge in config.workflow.edges:
        if edge.source has an entry in tool_executor_map:
            executor_id = tool_executor_map[edge.source]
            graph.add_conditional_edges(
                source    = edge.source,
                condition = "route to executor if last message has tool_calls, else next",
                routes    = { "tools": executor_id, "next": edge.target }
            )
            graph.add_edge(executor_id → edge.source)   ← loop back after tool runs
        else:
            graph.add_edge(edge.source → edge.target)

    set entry point = start node
    connect end node → graph END terminal
    return graph.compile()
```
The hidden routing loop
The key insight: the agent author never declares the tool-call routing loop. The builder detects that an `llm` node has `tool_ids` and silently:
- Creates a companion `<node_id>_tool_executor` node
- Adds a conditional edge — if the model’s last message contains tool calls, route to the executor; otherwise proceed to the declared next node
- Adds a return edge from the executor back to the LLM node
In the YAML, this entire mechanism is invisible. The author simply writes:
```yaml
- id: researcher
  type: llm
  config:
    llm_id: primary-llm
    tool_ids:
      - web-search
```
Internally, the compiled graph contains the full loop: the researcher node runs, the model decides whether to call a tool, the executor runs it and feeds the result back, and the model runs again — all from a few lines of YAML.
Part 6: AgentState — The Shared Execution Contract
Every agent built from YAML uses the same state shape. All nodes read from and write to this shared structure as the graph executes:
```
AgentState:
    messages     → full conversation history (append-only, accumulates across turns)
    inputs       → structured input parameters declared in node YAML
    outputs      → structured output results populated after execution
    tool_secrets → per-tool resolved secrets, injected at execution time
    file_uploads → list of uploaded file references available to tools
    credentials  → runtime-injected platform credentials for enterprise integrations
```
The first three fields are what agent authors interact with indirectly through their YAML definitions. The last three are platform-managed — injected by the execution layer before the graph runs and never visible in the YAML.
The append-only `messages` field is what makes multi-turn tool use work without any special wiring: each time the tool executor adds a tool result message and the graph loops back to the LLM node, the model sees the full history and decides what to do next.
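In LangGraph terms, this state is typically a `TypedDict` whose `messages` field carries a reducer annotation. The sketch below uses plain `operator.add` to show the semantics; a real build would likely use LangGraph's `add_messages` reducer instead:

```python
import operator
from typing import Annotated, Any, TypedDict

class AgentState(TypedDict):
    # The Annotated reducer tells the runtime to concatenate each node's
    # returned messages onto the history rather than overwrite it
    messages: Annotated[list[Any], operator.add]
    inputs: dict[str, Any]
    outputs: dict[str, Any]
    tool_secrets: dict[str, Any]
    file_uploads: list[Any]
    credentials: dict[str, Any]

# The reducer semantics, demonstrated outside any graph: old + new, never replacement
merged = operator.add(["user: find X"], ["tool: result for X"])
print(merged)  # ['user: find X', 'tool: result for X']
```

Because the reducer appends, each node returns only its new messages and the runtime accumulates the conversation.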
What You Get
Two calls turn a YAML string into a running LangGraph agent:
```
config = validate_yaml(yaml_content)
graph = build(config)

# Single-turn invocation
result = graph.invoke(initial_state)

# Streaming
for each chunk in graph.stream(initial_state):
    process(chunk)
```
The compiled graph is a standard LangGraph object. It works with any LangGraph-compatible checkpointer for persistence, any memory store for long-term recall, and any observability integration — including the OpenTelemetry tracing setup covered in Blog 1 of this series.
No special wrappers needed.
Key Takeaways
- Declare, don’t code. A YAML file is the complete agent definition — topology, LLM choice, tool bindings, and observability settings in one place. No Python required to create or modify an agent.
- Validate in two stages. Schema validation (are all fields correct?) and topology validation (is the graph actually runnable?) are separate concerns. Both must pass before any LLM or tool is touched.
- The factory and registry patterns isolate provider-specific code. Everything outside the factory treats LLMs and tools as generic interfaces. Switching providers or adding new tool types requires no changes to the builder or the state machine.
- The tool-call routing loop is auto-injected. Any `llm` node with `tool_ids` gets a hidden executor node and conditional edges wired in automatically. Agent authors never write routing logic.
- A single shared state schema is the execution contract. It connects the YAML declaration, the graph runtime, and the platform’s credential and secret injection. Stability here is what makes the whole system composable.
- The output is a plain LangGraph compiled graph. It inherits full LangGraph compatibility — checkpointers, streaming, and any instrumentation layer you already have.
Declarative LangGraph Agents from YAML and deployment to GCP Agent Engine was originally published in Google Cloud – Community on Medium.
Source Credit: https://medium.com/google-cloud/declarative-langgraph-agents-from-yaml-and-deployment-to-gcp-agent-engine-4ca05803f93d
