
Managing vast amounts of data in cloud storage can be a challenge. While Google Cloud Storage offers strong scalability and durability, storage admins sometimes struggle with questions like:
- What’s driving my storage spend?
- Where is all my data in Cloud Storage, and how is it distributed?
- How can I search across my data for specific metadata, such as age or size?
Indeed, to achieve cost optimization, security, and compliance, you need to understand what data you have, where it is, and how it’s being used. That’s where Storage Insights datasets, a feature of Storage Intelligence for Cloud Storage, come in. Storage Intelligence is a unified management product that offers multiple powerful capabilities for analyzing large storage estates and easily taking action. It helps you explore your data, optimize costs, enforce security, and implement governance policies. Storage Insights datasets help you deeply analyze your storage footprint, and you can use Gemini Cloud Assist for quick analysis in natural language. Based on these analyses, you can take action, such as relocating buckets or performing large-scale batch operations.
In this blog, we focus on how you can use Insights datasets for cost management and visibility, exploring a variety of common use cases. This is especially useful for cloud administrators and FinOps teams performing cloud cost allocation, monitoring, and forecasting.
What are Storage Insights datasets?
Storage Insights datasets provide a powerful, automated way to gain deep visibility into your Cloud Storage data. Instead of writing manual scripts, building custom one-off reports for buckets, or managing your own collection pipelines, you get comprehensive reports about your Cloud Storage objects and their activity, placed directly in a BigQuery linked dataset.
Think of it as X-ray vision for your Cloud Storage buckets. It transforms raw storage metadata into structured, queryable data that you can analyze with familiar BigQuery tools to gain crucial insights, with automatic data refreshes delivered every 24 hours (after the initial setup, where the first load can take up to 48 hours).
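For example, once the linked dataset is available, you can query it like any other BigQuery dataset. The minimal sketch below uses the google-cloud-bigquery Python client to total object counts and logical bytes per bucket; the project and dataset names, the `object_attributes_view` view, and columns such as `bucket` and `size` are placeholders based on common object-metadata fields, so check the schema of your own linked dataset before running it.

```python
# A minimal sketch: object count and total logical bytes per bucket,
# queried from the BigQuery linked dataset that Storage Insights populates.
# NOTE: "my-project", "storage_insights_dataset", "object_attributes_view",
# and the column names are placeholders -- substitute your own linked
# dataset's names and schema.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project ID

QUERY = """
SELECT
  bucket,
  COUNT(*) AS object_count,
  SUM(size) AS total_bytes
FROM `my-project.storage_insights_dataset.object_attributes_view`
GROUP BY bucket
ORDER BY total_bytes DESC
LIMIT 20
"""

for row in client.query(QUERY).result():
    print(f"{row.bucket}: {row.object_count} objects, {row.total_bytes} bytes")
```

Because the output is ordinary BigQuery rows, the same query can just as easily feed a Looker Studio dashboard or a scheduled query for ongoing cost reporting.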
Key features
- Customizable scope: Set the dataset scope at the level of the organization, a folder containing projects, a project or set of projects, or a specific bucket.
- Metadata dataset: Provides a queryable dataset that contains bucket and object metadata directly in BigQuery (see the sketch after this list).
- Regular updates and retention: After the first load, datasets refresh with new metadata every 24 hours and can retain data for up to 90 days.
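Putting these features together, a common cost-management question is "how much of my data is cold?" The sketch below is one way to answer it, again assuming hypothetical project, dataset, view, and column names (`object_attributes_view`, `size`, and an `updated` timestamp); it sums the bytes in each bucket that haven't been modified in 180 days, a useful starting point for lifecycle or storage-class decisions.

```python
# A minimal sketch: find cold data -- bytes per bucket for objects whose
# metadata shows no update in the last 180 days.
# NOTE: the view and column names (object_attributes_view, bucket, size,
# updated) are assumptions; verify them against your linked dataset's schema.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project ID

QUERY = """
SELECT
  bucket,
  SUM(size) / POW(1024, 3) AS cold_gib
FROM `my-project.storage_insights_dataset.object_attributes_view`
WHERE updated < TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 180 DAY)
GROUP BY bucket
ORDER BY cold_gib DESC
"""

for row in client.query(QUERY).result():
    print(f"{row.bucket}: {row.cold_gib:.1f} GiB untouched for 180+ days")
```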
Source Credit: https://cloud.google.com/blog/products/storage-data-transfer/storage-insights-datasets-optimizes-storage-footprint/