
TPUs are foundational to Google’s AI capabilities and can be equally transformative for your projects. However, keeping track of a niche technology like Tensor Processing Units amidst the rapid evolution of AI can be challenging. In this installment of TPU Mythbusting, I tackle two common misconceptions about their cost and usage. If you are new to TPUs, check out the previous post for an introduction to these application-specific integrated circuits (ASICs).
Myth 3: You need to have lots of money to start using TPUs
If you are curious about TPU performance, want to learn how to program applications that use them, or simply want to test a concept, you don’t need a deep wallet to get started. TPUs are available, in a limited capacity, for free on two popular platforms.
- Google Colab — You can configure your runtime to use a single v5e TPU. This environment is ideal for familiarizing yourself with the required libraries, structuring your application, and running basic benchmarks. While a single accelerator won’t tackle massive problems, it’s the perfect first step before moving to a paid solution (a quick sanity check is sketched right after this list).
- Kaggle Notebooks — Kaggle provides access to an instance with 8 v5e chips, which is significantly more powerful than Colab and sufficient for running many mainstream LLMs. The primary restriction is the quota: 20 hours per month with a 9-hour daily limit, which cannot be increased.
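Once your notebook is running on one of these TPU runtimes, it is worth confirming that the accelerator is actually visible before you start benchmarking. Below is a minimal sanity-check sketch using JAX, which typically comes preinstalled on both Colab and Kaggle TPU images; the exact device names and counts you see depend on the TPU generation of the runtime.

```python
import jax
import jax.numpy as jnp

# List the accelerators visible to JAX; on a TPU runtime this should
# print TpuDevice entries rather than CPU devices.
print(jax.devices())

# A trivial jit-compiled computation that runs on the default backend (the TPU).
@jax.jit
def matmul(a, b):
    return a @ b

x = jnp.ones((2048, 2048), dtype=jnp.bfloat16)
print(matmul(x, x).block_until_ready().shape)
```

If jax.devices() only lists CPU devices, double-check that the notebook’s accelerator type is actually set to TPU.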
With these free options, you can experiment with TPUs before spending anything on Google Cloud Platform!
If you are a student or researcher, you can also apply for Google Cloud for Education credits. This way, you can access the power of TPUs on Google Cloud Platform without the tight limitations imposed by Colab or Kaggle.
Myth 4: You can use TPUs only through Compute Engine and GKE
Using TPUs is getting friendlier over time. It’s no longer true that you can only reach them through a manually managed Compute Engine instance or through Google Kubernetes Engine. Today, the main managed way to use TPUs is Vertex AI, through three of its capabilities:
- Vertex AI Training: You can submit “Custom Training Jobs” that run on TPU workers. You simply select the TPU type (e.g., v5e, v4) in your job configuration. The service provisions the TPUs, runs your code, and shuts them down automatically (a minimal job-submission sketch follows this list).
- Vertex AI Pipelines: You can define pipeline steps (components) that specifically request TPU accelerators. This is ideal for MLOps workflows where training is just one step in a larger process.
- Vertex AI Prediction (Online Inference): You can deploy trained models to endpoints backed by TPU nodes. This is one of the few ways to get “serverless-like” real-time inference on TPUs without managing a permanent VM, although you are billed for the node while the endpoint is active.
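To make the Training path more concrete, here is a minimal sketch of submitting a Custom Job with a TPU worker pool via the google-cloud-aiplatform Python SDK. The project, region, bucket, container image, and TPU type below are placeholders and assumptions on my part; the machine and accelerator values actually available to you depend on your region and quota, so verify them against the current Vertex AI documentation.

```python
from google.cloud import aiplatform

# Placeholder project, region, and staging bucket -- replace with your own.
aiplatform.init(
    project="my-project",
    location="us-central1",
    staging_bucket="gs://my-staging-bucket",
)

# One worker pool whose machines are Cloud TPU hosts. The container image
# is a placeholder for your own training image.
worker_pool_specs = [
    {
        "machine_spec": {
            "machine_type": "cloud-tpu",
            "accelerator_type": "TPU_V3",
            "accelerator_count": 8,
        },
        "replica_count": 1,
        "container_spec": {
            "image_uri": "us-docker.pkg.dev/my-project/my-repo/trainer:latest",
        },
    }
]

job = aiplatform.CustomJob(
    display_name="tpu-training-sketch",
    worker_pool_specs=worker_pool_specs,
)

# Vertex AI provisions the TPU workers, runs the container, and tears
# everything down when the job finishes.
job.run()
```

Deploying a trained model for online inference follows the same managed pattern: you upload the model to Vertex AI and call Model.deploy() with a TPU-capable machine type for your region, and the service manages the serving nodes for you.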
These managed solutions minimize expenditure by charging only for the resources consumed, unlike GCE or GKE where infrastructure can sit idle and generate unnecessary cost. Furthermore, Vertex AI simplifies operations management, substantially reducing the human-hours (and therefore cost) required to run and maintain your ML tasks.
Coming next
I’m not done with the myths surrounding TPUs. I still want to tackle vendor lock-in and the claim that developing for TPUs makes your application incompatible with other platforms. The times of incompatibility are gone, as software libraries abstract away the differences between accelerator platforms.
To stay up to date with everything happening in the Google Cloud ecosystem, keep an eye on the official Google Cloud blog and GCP YouTube channel!
