Automate software supply chain auditing with Gemini’s semantic reasoning and the large open-source deps_dev_v1 dataset in BigQuery.
Whenever we start a new project, we add many library dependencies to it. There is usually no time to check the license and compliance terms of every one of them. The same holds when adding a new library to an existing project, where compliance issues are sometimes missed entirely. For a large project, manual checking is tedious and time-consuming.
Security scanning (VAPT) is standard practice, but license checking is still often done by hand. The licenses of “transitive dependencies” (the libraries your libraries use) usually go unchecked. A single “copyleft” license hidden deep inside a transitive dependency can create serious legal risk for proprietary software. Commercial tools exist for this, but not every team can afford them.
To solve this, I built the License Compliance Agent, which combines Google Open Source Insights (deps.dev) data in BigQuery with the reasoning power of Gemini on Vertex AI.
Development Spotlight: From Requirements to Reality with Antigravity
To build this agent, I leveraged Antigravity to move from idea to execution at lightning speed.
- Verified Coding: I provided the requirements, and Antigravity handled the implementation, checking the logic and the recursive scans along the way.
- Browser-Led Debugging: Instead of standard log-chasing, the agent used its built-in browser tools to interact with the UI and backend, catching complex cloud integration errors in real time.
- Reduced Friction: It turned a complex multi-file architecture into a working reality, allowing me to focus on the overall strategy while it handled the technical heavy lifting.

Tech Stack & Tools
Technical Stack:
- Runtime: Python 3.12, Flask, Hypercorn (HTTP/2)
- Frontend: Vanilla JS, HTML5, CSS3 (No frameworks)
- AI Engine: Google Vertex AI (Gemini 3.0 Flash — Preview)
- Data Source: Google BigQuery (Public Dataset: deps_dev_v1) & Cloud Datastore for caching
- Infrastructure: Cloud Run (Serverless), GitHub Actions, Cloud Build, Artifact Registry
- Security (Optional): Hybrid Model (IAP + API Key), Secret Manager, and Workload Identity Federation (Keyless Auth)
Development Tools:
- Antigravity: The agentic IDE used for autonomous planning, multi-file coding, and browser-based debugging.
- IntelliJ IDEA: The primary IDE for Java/Python development and project management.
Project Architecture: How the Agent Works
The system is designed to be a “Sentinel” that watches over your code. Users can interact with the system via a standalone web dashboard to perform deep analysis of their projects.
- A Strict Policy Enforcement Checker:
– A central policy.json file acts as a strict gatekeeper. It contains a list of allowed and banned licenses.
– Users can add more license types into the respective categories as per their requirements. The code relies on this list to decide whether to process the build, ensuring non-compliant code never reaches production.
– Cost Optimization: By keeping this file, the system reduces AI costs by skipping known license types.
- Using BigQuery to Solve the Dependency Problem:
– The system uses the deps_dev_v1 dataset in BigQuery. In the code written for bq_service.py, batch queries find the full list of dependencies, including transitive ones. This removes the need for manual checking.
- Gemini + Vertex AI as a “Legal Brain”:
– For “Unknown” or “non-standard” licenses, Gemini 3.0 Flash reads messy legal text and identifies the correct license name using smart reasoning. It includes a Self-Healing feature to fix its own data if the format is wrong.
- A Special Scanner for “Fat JARs”:
– Java developers often pack hundreds of libraries inside a single “Fat JAR,” which standard tools miss. A special scanner in jar_checker.py goes deep into hidden folders like BOOT-INF/lib/ to find the correct metadata.
- Secure and Fast GCP Infrastructure:
– The system is hosted on Google Cloud Run for serverless scaling and uses Workload Identity Federation (WIF) for a “keyless” and secure connection.
– A caching layer using Cloud Datastore remembers results for 30 days to keep the system lightning-fast.
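The Fat JAR scan described above can be sketched with the Python standard library alone. This is a minimal illustration and not the contents of jar_checker.py: the function names are hypothetical, and it assumes the nested JARs were built with Maven and therefore carry a META-INF/maven/<groupId>/<artifactId>/pom.properties file.

```python
import io
import zipfile

NESTED_LIB_PREFIX = "BOOT-INF/lib/"  # folder named in the article


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            props[key.strip()] = value.strip()
    return props


def read_nested_coordinates(fat_jar_bytes: bytes) -> dict[str, dict[str, str]]:
    """Map each JAR under BOOT-INF/lib/ to its pom.properties values, if present."""
    coordinates = {}
    with zipfile.ZipFile(io.BytesIO(fat_jar_bytes)) as outer:
        for jar_name in outer.namelist():
            if not (jar_name.startswith(NESTED_LIB_PREFIX) and jar_name.endswith(".jar")):
                continue
            # A nested JAR is itself a zip archive, so open it from memory.
            with zipfile.ZipFile(io.BytesIO(outer.read(jar_name))) as inner:
                for entry in inner.namelist():
                    if entry.startswith("META-INF/maven/") and entry.endswith("/pom.properties"):
                        text = inner.read(entry).decode("utf-8", errors="replace")
                        coordinates[jar_name] = parse_properties(text)
    return coordinates
```

The groupId, artifactId, and version recovered this way are the kind of coordinates that could then be looked up in a license dataset. Nested JARs without Maven metadata are simply absent from the result and would need another strategy.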
Deployment Options
The License Compliance Agent supports a variety of deployment methods to fit any environment:
- CI/CD Integration: Seamlessly integrate with GitHub Actions, Docker, or Cloud Build to block non-compliant releases.
- Standalone Web Dashboard: Host a visual interface for manual uploads and quick audits.
- Local Use: Run locally via localhost or Docker for pre-commit verification.
- Interactive Deployment: Utilize a “Zero-Config” script for rapid setup and configuration.
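To show how the CI/CD option might block a release, here is a minimal sketch of a gate step. The article says policy.json lists allowed and banned licenses but does not show its schema, so the keys "allowed" and "banned" are assumptions, as are the function name and the shape of the scan result (a mapping from dependency name to license identifier).

```python
import sys


def gate(dependency_licenses: dict[str, str], policy: dict) -> int:
    """Return a process exit code: 0 if the build may proceed, 1 if it must stop."""
    # Compare case-insensitively so "gpl-3.0" and "GPL-3.0" are treated alike.
    allowed = {name.lower() for name in policy.get("allowed", [])}
    banned = {name.lower() for name in policy.get("banned", [])}
    exit_code = 0
    for dependency, license_id in sorted(dependency_licenses.items()):
        key = license_id.lower()
        if key in banned:
            print(f"BLOCKED  {dependency}: {license_id} is banned", file=sys.stderr)
            exit_code = 1
        elif key not in allowed:
            # Neither allowed nor banned: report it, but do not fail the build here.
            print(f"REVIEW   {dependency}: {license_id} is not in the policy", file=sys.stderr)
    return exit_code
```

A pipeline step would pass the return value to sys.exit; a non-zero exit code fails the step in GitHub Actions and Cloud Build, which is what stops the release. Whether unlisted licenses should also fail the build is a policy decision this sketch leaves open.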
Key Use Cases
- SBOM Generation: Automatically generates a full Software Bill of Materials for any project.
- Legacy Auditing: Deep-scan older projects or compiled JARs to identify forgotten risks.
- Pre-Release Compliance: Acts as a final CI/CD checkpoint to prevent restricted licenses from entering production builds.
- Policy Control: Enforce uniform licensing standards across the entire organization.
A Note on AI Reliability
While AI makes a capable “Legal Brain,” LLMs can occasionally struggle with highly ambiguous legal language. Because BigQuery supplies the vast majority of verified license data and AI is used only for “edge cases,” this tool reduces a manual audit to a manageable final human review.
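The division of labor described here, verified data first and AI only for edge cases, can be illustrated with a short sketch. Everything in it is hypothetical: the Finding record, the resolve function, and the classify callable, which stands in for a call to Gemini and is passed in so the example runs without any cloud dependency.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Finding:
    dependency: str
    license_id: Optional[str]
    source: str               # "dataset", "ai", or "unresolved"
    needs_human_review: bool


def resolve(
    dependency: str,
    dataset_license: Optional[str],
    license_text: str,
    known_ids: set[str],
    classify: Callable[[str], str],
) -> Finding:
    """Prefer the dataset value; fall back to the model only when it is unusable."""
    # 1. A recognised identifier from the dataset needs no AI call at all.
    if dataset_license in known_ids:
        return Finding(dependency, dataset_license, "dataset", False)
    # 2. Edge case: ask the model, but accept only an answer from the known set.
    guess = classify(license_text).strip()
    if guess in known_ids:
        return Finding(dependency, guess, "ai", True)
    # 3. The model's answer is unusable, so leave the decision to a person.
    return Finding(dependency, None, "unresolved", True)
```

Marking every AI-derived result for human review is one conservative design choice; a team could relax it once it trusts the model's answers for a given license family.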
Future Improvements
- Enhanced Memory Banking: Implementing a vector-based memory bank to “remember” and associate similar license snippets without re-running full analysis.
- Cost-Optimized Predictive Pre-Fetching: Implementing logic to scan top-level manifests early, allowing for batch processing and early-exit strategies that reduce overall API and compute costs.
- Enhanced PDF/Excel Reporting: Automated generation of formal compliance certificates and legal reports.
- Real-time IDE Integration: Developing plugins for IntelliJ IDEA or VS Code to flag licenses during active coding.
- Security Scanning Integration: Merging license compliance with CVE vulnerability data for holistic risk analysis.
Conclusion
By combining BigQuery’s massive data with Gemini’s intelligence and Antigravity’s rapid development, this project turns a difficult manual task into a smooth, automated part of the development workflow.
Explore the project on GitHub: arunshinde/license-compliance-agent
This project was architected and built with the assistance of Antigravity, an advanced AI coding agent.
How Gemini, Antigravity, and BigQuery Help Create a License Compliance Checker was originally published in Google Cloud – Community on Medium.
Source Credit: https://medium.com/google-cloud/license-compliance-agent-gemini-bigquery-5b9d2fa3a3ba?source=rss----e52cf94d98af---4
