AI SecOps: A Weekend’s Hackathon against the MCP tools and reflections on the findings & Secure AI Solution Architecture planning.

I’ve been raving about the ‘Applied AI Engineers’ for a wee while, but I was wrong. AI skills are actually the Platform Engineer’s new skill, the Security Engineer’s new skill, and so on. It’s just the new normal. Now that the record is set straight, I can calmly recap what I’ve been up to lately.

Be warned: we may get sidetracked first, before we come back to the main point. Bear with me, as there is so much to talk about.

You see, one recent Friday evening I wrapped up my day and closed the laptop on the day job (another week of client architecture reviews, threat-model whiteboards, identity-controls conversations), then opened my own system a minute later to commence my new experiment. This weekend it was to seek answers and prove an idea about a question that conversations had been circling for weeks:

How much of what I was advising would actually hold up if someone serious pointed a (proverbial) gun at it, in this case a very smart adversarial agent?
(Thanks, Zahid Saleem, for the idea, and for the lack of sleep that followed.)

I was curious how much attack surface these new agentic designs expose, and I wanted to see vulnerabilities in action. And obviously there is nothing like learning by doing.

So by Sunday evening the harness had brought back findings.

These included MCP tools with READ on resources the USER should not have had access to, and WRITE on areas they should not have had access to either. We’ll come back to this.

Well, none of these were what I expected. You see, once again I built something ADK-based: this time a security ‘system’, a multi-agent audit/scanner/evaluation/dedupe/researcher/thinker.

[Image: a taster of the findings. More at the end.]

This security system, the Harness, required way more tuning: more tools to install, more MCP capabilities to register, and so on. But long story short, to my surprise, ‘we’ DID find vulnerabilities with proven impact, and the system paid out. Hmm, interesting (and super timely, much appreciated for paying my mounting Cloud bill, btw).

By the end of the weekend, the Harness, the System (because I DID start with skills) had found issues and bugs, some of which were not even in the hypothesis space I had set the auditor running against. They were adjacent paths the reasoning loop wandered into.

And I really wanted to figure out vulnerabilities in MCP/AI systems. Don’t worry, it was all under Safe Harbor, so I’m sharing my experience, but alas no fine details of the vulnerabilities.

I brought receipts, though I am still under non-disclosure about vendor vulnerability details. (None of that ‘I make $14k/month with my silly little app’ here, but I hope you accept the real deal.)

You see, I am just some AI Solution Architect, certainly not a red teamer by trade, but I do know how to ask THE RIGHT QUESTIONS.

It’s all about the good old #ContextEngineering principles you may have read about many times over by now.
And while I do not own a Burp Suite licence, what I do own is my very own agentic way of working, a Harness (the same hyper-personalised Claude / Gemini / ADK bootstrap I seed every new project with), and a weekend.

But wait, let me tell you about the reflections from this experience, and this is not even the key point I want to make in this blog post. 😅

The observation from different personas

I wear two hats on this agentic piece of work, and they pull in slightly different directions, in terms of what I focus on.

By day I am an enterprise agentic solution architect.

I sit in client conversations, work through AI Platform Governance models, walk teams through controls matrices, and ensure I ship architectures that will survive procurement, security review, compliance gates, and the inevitable Day 2 incident response. This work is high-level context: comprehensive, but a few engineering steps removed and abstracted, as the architecture stops at the diagram boundary.

On the other hand, like many of you, I am very much a guns-blazing agentic hackathon participant, an agentic solution builder.

It’s hard not to get building, and as a Google Developer Expert (GDE) I’d be remiss to promote and extol the virtues of our work without getting my hands dirty. In this second role, I push the boundaries of what these primitives can actually do, build the systems I am writing and consulting about, and stress-test them in every creative way possible, living the ideas and dreams out in real time, thanks to AI empowerment.

These are personal projects: solutions based on ADK agents, MCP servers, deployment pipelines on Cloud Run and Agent Runtime, Workload Identity Federation configurations, the lot.

As such, the architect (hat, POV) asks “how do I design this so it is secure?”. The builder/hackathon participant asks “how would I build it, test it, even break it?”

And what I’m realising is that we tend to tread AI solution designs quite carefully and cautiously, naturally. What this all means is that we’re simply not adversarial enough, as my colleague Zahid says: we play the happy path (intentionally or not) to meet one standard or another, but the person who built or architected the solution rarely has the full focus on, or view of, the full solution stack. Basically, the point above 👆
Personally, I see this as an ongoing challenge of AI adoption (AI Ready vs AI First vs AI Native): the AI maturity gap is still massive.

Many enterprise architectures risk failing in the gap between those two perspectives, and that gap is where the most interesting work I have done so far sits.

So, ‘What’s the point, JP? Where are you going with this?’

Well, if you are a Google Cloud engineer, AI Solution Architect, Applied AI Solution Developer, or SecOps practitioner shipping agentic systems, the rest of this post is for you.

What I find and reflect on — the identity gap

88% of organisations had a confirmed or suspected AI agent security incident in the past twelve months. Old news, but the Gravitee State of AI Agent Security 2026 report still reads the room as it is today.

We credential humans to the millisecond, yet we may be handing agents the over-privileged key. Why? Because we may still be evolving the agent, still validating things, and perhaps ‘permission may be required to do x, y, z in the future’. Sure.

The other factor is that IAM service account keys, while relevant for a deterministic app or service which always performs things a certain way (ideally well-scoped), are no longer fit for AI agent purposes.

This is the New Production Risk. I tried thinking through this recently:

The user works with an AI assistant/agent to do XYZ. So is the user delegating to the agent to do a certain activity with the user’s own delegated access credentials? (Think Gemini Enterprise accessing data through a federated data access control plane, from SharePoint, say.) Sure. That’s still user access, through and through. But:
What if the user asks the agent to go operate, fetch, or ‘figure out’ and then generate/write/output something? Then it’s the agent, with its OWN identity, that will need to authenticate, get authorised, and perform READ and eventually WRITE activities as itself: almost like a user, but not a human user. Things get a bit more complicated from here.

Why does this identity really matter? Why is it not just all-SA?

Well, I have some bad news and good news.

Rewind. Remember those vulnerabilities I discovered with the BugBountyADK setup? Oh, they were quite real, all right. Now imagine that one such vulnerability involved MCP tools which had READ permission on resources the USER should not have had access to, and WRITE permission on areas the USER should not have had access to either.

Well, the agent DID have those permissions, intentionally or not, and this just multiplied the attack vectors and, ergo, the headache for your SecOps team.
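
To make that gap concrete, here is a minimal sketch of the kind of check that would flag it: compare what the agent's tools are granted against what the requesting user is entitled to. The `Grant` type, the resource names, and the `excess_grants` helper are all hypothetical, made up for illustration; this is not how any particular vendor or the harness itself models permissions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """One permission: an action on a resource (names are illustrative)."""
    resource: str
    action: str  # "READ" or "WRITE"


def excess_grants(user_grants, agent_grants):
    """Return the grants the agent holds that the requesting user does not."""
    extra = set(agent_grants) - set(user_grants)
    return sorted(extra, key=lambda g: (g.resource, g.action))


# Hypothetical entitlements: the user may only read CRM accounts.
user = {Grant("crm/accounts", "READ")}

# Hypothetical tool grants: the agent's MCP tools can do rather more.
agent = {
    Grant("crm/accounts", "READ"),
    Grant("crm/accounts", "WRITE"),
    Grant("hr/payroll", "READ"),
}

findings = excess_grants(user, agent)
for g in findings:
    print(f"EXCESS: agent can {g.action} {g.resource}, user cannot")
```

Anything this prints is a path where the agent can act beyond the user who asked, which is exactly the shape of finding described above.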

Traditional application security gives you a relatively clean per-product threat model. Agentic systems do not work that way. I give you three basic reasons.

Agents reason. A traditional application follows coded paths. The developer decided which database queries to run, which APIs to call, which data to return. An agent is different: non-deterministic, goal-directed, unpredictable. A defect in the MCP boundary that traditional AppSec might rate “exploitable in narrow conditions” becomes “exploitable by any agent that decides the path advances its task.”
Agents chain. A user triggers Agent A. Then Agent A delegates to Agent B. Agent B invokes Agent C. And Agent C reads the system of record using its service account, and so on. The user’s original permission scope is now three identity boundaries away from the data access. The difference matters when an agent decides at runtime to invoke a sub-agent that the system designer may not have anticipated.
Agents scale. When you deploy one MCP-integrated agent, you are not deploying one client. You are deploying N clients, one per agent instance, each capable of triggering the same defect through normal operation. The blast radius is not one user; it is every active session.
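
The chaining problem can be sketched in a few lines. Assuming each hop carries a set of scopes (the scope strings and agent names below are invented for illustration), one common mitigation is to down-scope at every hop so that the effective permission is the intersection of the user's scope and every agent's scope along the chain. The naive alternative is that the last agent simply acts with whatever its own service account holds.

```python
def effective_scope(user_scope, chain):
    """Down-scope at every hop: keep only what the user AND each agent hold."""
    scope = set(user_scope)
    for _agent_name, agent_scope in chain:
        scope &= set(agent_scope)
    return scope


def naive_scope(chain):
    """What the final hop can reach if it just uses its own service account."""
    _agent_name, agent_scope = chain[-1]
    return set(agent_scope)


# Hypothetical user entitlement and a three-hop delegation chain.
user_scope = {"tickets:read"}
chain = [
    ("agent_a", {"tickets:read", "tickets:write"}),
    ("agent_b", {"tickets:read", "ledger:read"}),
    ("agent_c", {"tickets:read", "ledger:read", "ledger:write"}),
]

safe = effective_scope(user_scope, chain)
gap = naive_scope(chain) - safe

print("effective scope:", sorted(safe))
print("reachable beyond the user's scope:", sorted(gap))
```

The `gap` set is the part of the system of record that Agent C can touch even though the original user never could. Multiply that by every active session and you have the blast radius.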

You get the point. Now, onto the good news.

Gemini Agent Platform & the controls to reduce attack vectors

The good news is that Google has shipped the Gemini Agent Platform, which comes with capabilities that get you thinking about and tackling these risks, and help ensure that the architecture you build out for production (enterprise-purposed or not) actually makes it into production Day 2 Ops.

This really is the crux. We’ve had the AI demo/PoC/pilot/‘you name it’ banquet over the last 2 years and, frankly, I’m feeling a bit sick of it.

And my personal hands-on discovery in the MCP vulnerability space tells me that when you go into production with this shiny new AI and agentic ecosystem, the risk is not just ‘more of the same’ but multiplied many times over, as per the agentic (and exponential) use cases highlighted earlier.

The list below, of services and capabilities available out of the box with the Google Gemini Agent Platform, is a good starting point. It is not exhaustive and will be subject to change, as always.

The shape of the mapping is simple: every agentic-era attack vector lands on one or more Google Cloud Agent Platform controls, and these controls map back to the same enterprise SecOps requirements the architecture review board already enforces.

How do we get the new named primitives wired into the existing review gates? Well, you have a few considerations to mull over. See the handy visual table for your perusal. Heck, save it, print it or share it with your organisation.

It is intentionally high-level, since each row represents a *named architectural decision that needs to be made*.

If you have a plan for your own weekend agent project, I highly recommend getting started with a codelab or two.

Govern Agentic Workloads with Agent Gateway — is a good start
Securing Cross-Cloud Agentic Deployments — is surely next
READ: Agent / MCP Registry, if you want to govern and track agent reusability within your organisation, and ensure you track the MCP servers and endpoints alike (and want to promote production-grade, compliant MCP tool use within the organisation or team).

I want to be real about the architectural discipline I would recommend to a Google Cloud customer building on MCP in 2026. This is what I think ‘good’ looks like, what the gaps are, and what risks you should still consider when designing your very own enterprise solution architecture, ready for production. These, of course, will evolve over time.

What good looks like

Per-agent identity, not shared service accounts. Agent Identity issues a first-class platform identity per agent instance. Combined with Workload Identity Federation, this eliminates long-lived keys and gives you per-instance attribution in Cloud Audit Logs.
Centralised tool-call enforcement. Agent Gateway intermediates every agent-to-tool call, validates parameters against declared schemas, strips and reissues tokens, and integrates Model Armor inline. Agents never talk to MCP servers directly; they talk to the gateway.
Defence in depth at the MCP boundary. Authentication at the connection, authorization at the row, descriptor signing for the tool manifest, tenant-scope re-validation on every call. Each layer assumes the layers above it have failed.
Disclosure feeds wired into the security stack. Library-level disclosures are treated as same-week incidents, not quarterly housekeeping. The lead time between the first vendor disclosure of an upstream defect and the wave of follow-ups across vendors who share the library is your patch window. Six to ten weeks, typically. Use it.
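
As a rough illustration of the enforcement and defence-in-depth points above, here is a minimal sketch of what a gateway-style check on a single tool call could look like. To be clear, this is not the Agent Gateway API: `REGISTRY`, `ToolSpec`, `check_call`, and the `read_invoice` tool are hypothetical names invented for this example, and a real gateway would also handle tokens, descriptor signatures, and content screening.

```python
from dataclasses import dataclass


class Denied(Exception):
    """Raised when a tool call fails one of the gateway checks."""


@dataclass(frozen=True)
class ToolSpec:
    name: str
    params: dict  # parameter name -> expected Python type


# Hypothetical registry of declared tool schemas.
REGISTRY = {
    "read_invoice": ToolSpec("read_invoice", {"tenant_id": str, "invoice_id": int}),
}


def check_call(agent_tenant, tool_name, args):
    """Validate one agent-to-tool call; return the args if every layer passes."""
    spec = REGISTRY.get(tool_name)
    if spec is None:
        raise Denied(f"unknown tool: {tool_name}")

    # Layer 1: parameters must match the declared schema exactly.
    if set(args) != set(spec.params):
        raise Denied("unexpected or missing parameters")
    for key, expected in spec.params.items():
        if type(args[key]) is not expected:
            raise Denied(f"bad type for parameter: {key}")

    # Layer 2: re-validate tenant scope on every call, regardless of
    # what any earlier layer (authentication, routing) already decided.
    if args.get("tenant_id") != agent_tenant:
        raise Denied("cross-tenant access")

    return args


ok = check_call("acme", "read_invoice", {"tenant_id": "acme", "invoice_id": 7})
print("allowed:", ok)

try:
    check_call("acme", "read_invoice", {"tenant_id": "globex", "invoice_id": 7})
except Denied as err:
    print("denied:", err)
```

The design choice worth noting is that the tenant check runs even after schema validation succeeds: each layer behaves as if the ones before it might have been bypassed.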

TL;DR: How to reduce vulnerability scope with Google Cloud

Agent Gateway as the single enforcement point for MCP tool calls.
Agent Identity + Workload Identity Federation for per-instance attribution.
Model Armor + Agent Anomaly Detection for runtime screening and behavioural deviation alerts.
VPC Service Controls as the data-exfiltration boundary.
Cloud Audit Logs + AI-BOM for provenance and post-incident forensics.
Disclosure feed subscriptions (CrowdStream, HackerOne hacktivity, GitHub Advisories) as your leading-indicator threat intelligence layer.

These are good to have, especially the disclosure subscriptions or schemes. We’re simply moving too fast, thanks to the pace of vulnerabilities, the complexity of exploits, and time-to-attack (as opposed to time-to-market).

While architecture design is something you may have input, say, or influence over, governance and security posture at the enterprise level will certainly need a leg up to raise its game: not just ticking the controls boxes, but proactively seeking and remediating vulnerabilities before they wreak havoc, which would have significant commercial and reputational ramifications.

Imagine that what I’ve conducted above (as an experiment, and not as a security researcher by day) became your team’s default adversarial gate. Suddenly the value and priority of security assessment is not just the SecOps team’s area of interest.

And if you’re not on Google Cloud, or want to keep things all in-house, Mandiant is the one I’d point at for actionable threat intelligence.

What’s Next & MultiAgent Security Vuln Scanner…

So. This was fun, right? Perhaps there will be more talks or a book on this later. TBC. Follow, like, or comment if you have any questions.

And read my latest TL Post on Agentic AI Hybrid Delivery teams — The New Ways of working.

And if you’re still here and remember that multi-agent vulnerability scanning architecture I casually name-checked at the top: is this something of interest?

Sneak Preview. Multi-Agent Bug Discovery Project.

Now that I have battle-tested the system’s efficacy, I hope it has got your attention enough to warrant a full architectural deep-dive. 👇

Drop a comment, hit a like, or message me if you have questions. And if the community wants the long cut, that’s next.

Regs,
JP

#ThisIsTheWay
#AgentPlatform #Gemini #Governance #SecOps #Production

AI SecOps: A Weekend’s Hackathon against the MCP tools and reflections on the findings & Secure AI… was originally published in Google Cloud – Community on Medium, where people are continuing the conversation by highlighting and responding to this story.

Source Credit: https://medium.com/google-cloud/ai-secops-a-weekends-hackathon-against-the-mcp-tools-and-reflections-on-the-findings-secure-ai-60eb648509aa?source=rss—-e52cf94d98af—4