

In today’s data-driven world, effective data governance isn’t just a buzzword; it’s a critical process. Organizations are grappling with ever-increasing volumes of data scattered across diverse systems, making it a challenge to discover, understand, and govern their data assets effectively. This is where robust metadata management comes into play, and Google Cloud’s Dataplex is continually evolving to meet these complex demands.
I am excited to highlight one of the newest and most impactful additions to GCP’s data governance services within Dataplex: Aspects and Aspect Types. These features significantly enhance Dataplex’s metadata capabilities, giving data professionals a finer-grained understanding of, and control over, their data.
At its core, Dataplex aims to unify and manage distributed data across your organization. With the introduction of aspect and aspect.type, Dataplex is taking metadata enrichment to a new level of granularity and flexibility.
- Aspect Type: Think of an Aspect Type as a blueprint or a template for a specific category of metadata attributes. These types define the structure and fields that a particular piece of metadata will contain. Dataplex provides several system-defined aspect types (e.g., for data quality or data governance policies), but crucially, you can also define your own custom aspect types to capture unique business or technical metadata relevant to your organization. For example, an Aspect Type could be “Data Quality Metrics” or “Compliance Information.”
- Aspect: An Aspect is an actual instance of metadata that adheres to an Aspect Type. It’s a specific attribute that describes a data entry in Dataplex. So, if “Data Quality Metrics” is an Aspect Type, an “Aspect” under it might be {"data_freshness_score": 95, "data_completeness_rate": 99}. Similarly, under a “Compliance Information” Aspect Type, an Aspect might be {"pii_classification": "confidential", "gdpr_compliant": true}. These aspects can be associated with an entire data entry (like a table) or even individual columns within that entry.
For those familiar with Dataplex’s existing metadata capabilities, you can think of Aspects and Aspect Types as a significant evolution of the earlier Catalog Tags and Tag Templates. While tags provided a basic way to categorize and label entries with simple key-value pairs, Aspects offer a more structured, flexible, and programmatic approach to attaching rich metadata directly to your data assets: similar in concept, but more powerful in execution.
One of the strengths of Google Cloud is the seamless integration between its powerful APIs and the intuitive user interface. Dataplex Aspects and Aspect Types are no exception, offering both programmatic control and visual management capabilities.
Effortless Exploration via the GCP Console UI
Getting started with understanding Aspects and Aspect Types is incredibly easy through the Google Cloud Console:
- Navigate to the Dataplex service in the Google Cloud Console.
- From the left-hand navigation menu, click on Catalog.
- Here, you’ll see tabs like “Entries,” “Entry groups,” and “Aspect types & tag templates.” Click on the “Aspect types & tag templates” tab.
- You can explore the existing system-provided Aspect Types (e.g., bigquery-table, data-quality-scan-result) to see the metadata schemas they offer.
- You can also create your custom Aspect Types directly from this UI, defining the fields and their types (string, integer, enum, etc.) that you want to capture. This visual interface makes it incredibly straightforward to design your metadata schema without writing a single line of code initially.
Once data entries (like BigQuery tables or Cloud Storage filesets) are ingested and cataloged by Dataplex, you can view the attached Aspects for those entries directly in the “Entries” tab, providing an at-a-glance view of their enriched metadata.
Programmatic Power with REST APIs
While the UI is great for exploration and initial setup, the true power of Aspects and Aspect Types shines when integrating them into your automated data pipelines and governance workflows using APIs. Here are simplified examples demonstrating how you might interact with these features using curl (which represents the underlying REST API calls):
1. Creating a Custom Aspect Type:
You can define your own Aspect Type to capture specific business metadata, like “DataStewardshipInfo”.
curl -X POST \
  "https://dataplex.googleapis.com/v1/projects/YOUR_PROJECT_ID/locations/YOUR_LOCATION/aspectTypes?aspectTypeId=data-stewardship-info" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d '{
    "displayName": "Data Stewardship Information",
    "description": "Metadata related to data ownership and stewardship.",
    "metadataTemplate": {
      "name": "DataStewardshipTemplate",
      "type": "record",
      "recordFields": [
        {
          "name": "data_owner_email",
          "type": "string",
          "annotations": {
            "displayName": "Data Owner Email",
            "description": "Email address of the data owner."
          },
          "index": 1,
          "constraints": { "required": true }
        },
        {
          "name": "steward_team",
          "type": "string",
          "annotations": {
            "displayName": "Stewardship Team",
            "description": "Team responsible for data stewardship."
          },
          "index": 2
        },
        {
          "name": "last_reviewed_date",
          "type": "datetime",
          "annotations": {
            "displayName": "Last Reviewed Date",
            "description": "Date when the data asset was last reviewed for governance."
          },
          "index": 3
        }
      ]
    }
  }'
- YOUR_PROJECT_ID: Your Google Cloud project ID.
- YOUR_LOCATION: The GCP region (e.g., us-central1, global). Aspect Types can be regional or global.
- aspectTypeId: A unique ID for your new Aspect Type (e.g., data-stewardship-info).
- The metadataTemplate defines the JSON schema for the aspects belonging to this type. Here, we define fields like data_owner_email, steward_team, and last_reviewed_date.
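If you would rather assemble this request in a script than hand-edit JSON, the following Python sketch shows one possible way to build the same URL and payload from the parameters described above. The helper names (record_field, build_create_aspect_type_request) are hypothetical and exist only for this illustration; the sketch deliberately stops short of sending anything, so it needs no credentials.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://dataplex.googleapis.com/v1"


def record_field(name, field_type, display_name, description, index, required=False):
    """Build one entry of recordFields, mirroring the curl payload's shape."""
    field = {
        "name": name,
        "type": field_type,
        "annotations": {"displayName": display_name, "description": description},
        "index": index,
    }
    if required:
        field["constraints"] = {"required": True}
    return field


def build_create_aspect_type_request(project_id, location, aspect_type_id):
    """Return (url, json_body) for the create call; nothing is sent."""
    query = urlencode({"aspectTypeId": aspect_type_id})
    url = f"{BASE_URL}/projects/{project_id}/locations/{location}/aspectTypes?{query}"
    body = {
        "displayName": "Data Stewardship Information",
        "description": "Metadata related to data ownership and stewardship.",
        "metadataTemplate": {
            "name": "DataStewardshipTemplate",
            "type": "record",
            "recordFields": [
                record_field("data_owner_email", "string", "Data Owner Email",
                             "Email address of the data owner.", 1, required=True),
                record_field("steward_team", "string", "Stewardship Team",
                             "Team responsible for data stewardship.", 2),
                record_field("last_reviewed_date", "datetime", "Last Reviewed Date",
                             "Date when the data asset was last reviewed for governance.", 3),
            ],
        },
    }
    return url, json.dumps(body, indent=2)


url, body = build_create_aspect_type_request(
    "YOUR_PROJECT_ID", "YOUR_LOCATION", "data-stewardship-info"
)
print(url)
print(body)
```

The printed body can be saved to a file and passed to curl with -d @file.json, which keeps the schema definition under version control separately from the call itself.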
Below is a sample request with a concrete project ID and location filled in:
curl --request POST \
  'https://dataplex.googleapis.com/v1/projects/meghagd-test/locations/us-central1/aspectTypes?aspectTypeId=data-stewardship-info&key=[YOUR_API_KEY]' \
  --header 'Authorization: Bearer [YOUR_ACCESS_TOKEN]' \
  --header 'Accept: application/json' \
  --header 'Content-Type: application/json' \
  --data '{
    "displayName": "Data Stewardship Information",
    "description": "Metadata related to data ownership and stewardship.",
    "metadataTemplate": {
      "name": "DataStewardshipTemplate",
      "type": "record",
      "recordFields": [
        {
          "name": "data_owner_email",
          "type": "string",
          "annotations": {
            "displayName": "Data Owner Email",
            "description": "Email address of the data owner."
          },
          "index": 1,
          "constraints": { "required": true }
        },
        {
          "name": "steward_team",
          "type": "string",
          "annotations": {
            "displayName": "Stewardship Team",
            "description": "Team responsible for data stewardship."
          },
          "index": 2
        },
        {
          "name": "last_reviewed_date",
          "type": "datetime",
          "annotations": {
            "displayName": "Last Reviewed Date",
            "description": "Date when the data asset was last reviewed for governance."
          },
          "index": 3
        }
      ]
    }
  }' \
  --compressed
To run the above command easily, you can use this link:
Once you execute the above command, you will see a success message in the response.
In the GCP Console UI, the new aspect type then appears under the “Aspect types & tag templates” tab.
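Besides checking the Console, the new aspect type could also be read back over REST. The Python sketch below prepares such a GET request using only the standard library. It assumes the read endpoint follows the same resource path as the create call, with the aspect type ID appended; the function name and placeholder values are hypothetical. The request is built but not sent, since sending requires network access and a valid access token.

```python
import urllib.request


def build_get_aspect_type_request(project_id, location, aspect_type_id, access_token):
    """Prepare (but do not send) a GET request for a single aspect type."""
    # Assumed resource path: same pattern as the create URL, plus the ID.
    url = (
        "https://dataplex.googleapis.com/v1"
        f"/projects/{project_id}/locations/{location}/aspectTypes/{aspect_type_id}"
    )
    return urllib.request.Request(
        url,
        method="GET",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
        },
    )


request = build_get_aspect_type_request(
    "YOUR_PROJECT_ID", "YOUR_LOCATION", "data-stewardship-info", "YOUR_ACCESS_TOKEN"
)
print(request.get_method(), request.full_url)

# To actually send it (needs network access and a real token):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

A token for the placeholder can be obtained the same way as in the first curl example, with gcloud auth print-access-token.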
Source Credit: https://medium.com/google-cloud/gcp-data-governance-with-dataplex-aspects-a-taste-of-experimentation-with-rest-apis-f2775783a8bf?source=rss—-e52cf94d98af—4