This script provides a simple example of generating Text-To-Speech (TTS) audio using the Gemini API with Google Apps Script. The Gemini API returns audio data in the audio/L16;codec=pcm;rate=24000 format (raw 16-bit PCM at 24,000 Hz), which is not directly playable. Since there's no built-in method to convert this to the standard audio/wav format, this sample script includes a custom function to handle the conversion.
- The provided convertL16ToWav_ function is specifically designed for the audio/L16;codec=pcm;rate=24000 MIME type. Using it with other audio formats will result in an error.
- The script uses a hardcoded WAV header. This header assumes specific audio parameters (e.g., sample rate, bit depth, number of channels) that match the Gemini API's output for this format. If the Gemini API's output format changes, this header might need adjustment. The header's size fields are also fixed values rather than computed from the audio data, so some players may misreport the duration or truncate playback.
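As a possible alternative to the hardcoded header, the following is a minimal sketch that builds the 44-byte WAV header from the actual data length and the rate parameter in the MIME type. The helper names parseSampleRate_ and buildWavHeader_ are hypothetical and not part of the original script, and the sketch assumes the API keeps returning mono, 16-bit, little-endian PCM.

```javascript
// Hypothetical helper (not in the original script): read the "rate" parameter
// from a MIME type such as "audio/L16;codec=pcm;rate=24000".
// Assumption: falls back to 24000 when no rate parameter is present.
function parseSampleRate_(mimeType) {
  const match = /(?:^|;)\s*rate=(\d+)/i.exec(mimeType);
  return match ? Number(match[1]) : 24000;
}

// Hypothetical helper (not in the original script): build a 44-byte PCM WAV
// header as an array of signed bytes (-128 to 127), matching the byte form
// the original script passes to Utilities.newBlob.
// Assumption: the audio is mono, 16-bit PCM unless other values are passed in.
function buildWavHeader_(dataLength, sampleRate = 24000, channels = 1, bitsPerSample = 16) {
  const blockAlign = (channels * bitsPerSample) / 8;
  const byteRate = sampleRate * blockAlign;
  const view = new DataView(new ArrayBuffer(44));
  const writeAscii = (offset, s) =>
    [...s].forEach((c, i) => view.setUint8(offset + i, c.charCodeAt(0)));

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataLength, true); // total size minus the 8-byte RIFF preamble
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // audio format 1 = linear PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeAscii(36, "data");
  view.setUint32(40, dataLength, true); // size of the PCM payload in bytes

  // Reinterpret the buffer as signed bytes and copy them into a plain array.
  return [...new Int8Array(view.buffer)];
}
```

With these helpers, the last line of convertL16ToWav_ could become `return [...buildWavHeader_(data.length, parseSampleRate_(mimeType)), ...data];`. For a payload of 373,006 bytes at 24,000 Hz, this produces the same bytes as the hardcoded header string.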
Before running, replace "###" with your actual Gemini API key in myFunction.
/**
 * Convert "audio/L16;codec=pcm;rate=24000" generated by Gemini API to "audio/wav" using Google Apps Script.
 * This can be used for only the mimeType of "audio/L16;codec=pcm;rate=24000". Please be careful about this.
 *
 * @param {Byte[]} data Byte array of the input data "audio/L16;codec=pcm;rate=24000"
 * @param {String} mimeType MimeType of the input data.
 * @returns {Byte[]} Converted data.
 */
function convertL16ToWav_(data, mimeType) {
  if (mimeType != "audio/L16;codec=pcm;rate=24000") {
    throw new Error(
      `Sorry. As a simple sample, this can be used for only "audio/L16;codec=pcm;rate=24000".`
    );
  }
  // This header is for a 24000 Hz, 16-bit, mono PCM WAV file.
  const headerData =
    "5249464632B1050057415645666D74201000000001000100C05D000080BB000002001000646174610EB10500";
  const array = [...headerData];
  // Split the hex string into two-character pairs, then convert each pair
  // to a signed byte (-128 to 127), as expected for an Apps Script Byte[].
  const head = [...Array(Math.ceil(array.length / 2))]
    .map((_) => array.splice(0, 2).join(""))
    .map((e) =>
      parseInt(e[0], 16).toString(2).length == 4
        ? parseInt(e, 16) - 256
        : parseInt(e, 16)
    );
  // Prepend the WAV header to the raw PCM data.
  return [...head, ...data];
}

function myFunction() {
  const apiKey = "###"; // Please set your API key here.
  const text = [
    "Create Text-To-Speech the following conversation.",
    "User A: Hey there! How are you doing today?",
    "User B: Hi! I'm doing well, thanks. How about you?",
    "User A: I'm good too, thanks for asking!",
  ].join("\n");
  const url = `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-preview-tts:generateContent?key=${apiKey}`;
  const payload = {
    contents: [{ role: "user", parts: [{ text }] }],
    generationConfig: {
      responseModalities: ["AUDIO"],
      speechConfig: {
        multiSpeakerVoiceConfig: {
          speakerVoiceConfigs: [
            {
              speaker: "User A",
              voiceConfig: { prebuiltVoiceConfig: { voiceName: "Kore" } },
            },
            {
              speaker: "User B",
              voiceConfig: { prebuiltVoiceConfig: { voiceName: "Leda" } },
            },
          ],
        },
      },
    },
  };
  const res = UrlFetchApp.fetch(url, {
    contentType: "application/json",
    payload: JSON.stringify(payload),
  });
  const obj = JSON.parse(res.getContentText());
  const { data, mimeType } = obj.candidates[0].content.parts[0].inlineData;
  const blob = Utilities.newBlob(
    convertL16ToWav_(Utilities.base64Decode(data), mimeType),
    "audio/wav",
    "sample.wav"
  );
  DriveApp.createFile(blob);
}
Upon successful execution of myFunction, an audio file named sample.wav will be created in the root folder of your Google Drive. You can then click this file to play the generated speech.
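The script reads the audio directly from obj.candidates[0].content.parts[0].inlineData, which throws an unhelpful TypeError if the response contains no audio. The following is a minimal sketch of a more defensive lookup. The helper name extractInlineAudio_ is hypothetical and not part of the original script, and the use of the finishReason field in the error message is an assumption about what the response contains.

```javascript
// Hypothetical helper (not in the original script): return { data, mimeType }
// from the first part of the first candidate that carries inline audio,
// or throw an error describing what was missing.
function extractInlineAudio_(obj) {
  const candidate = obj && obj.candidates && obj.candidates[0];
  if (!candidate) {
    // Include a short excerpt of the response to help with debugging.
    throw new Error(
      "No candidates in response: " + String(JSON.stringify(obj)).slice(0, 300)
    );
  }
  const parts = (candidate.content && candidate.content.parts) || [];
  const part = parts.find((p) => p.inlineData && p.inlineData.data);
  if (!part) {
    // Assumption: finishReason, when present, hints at why no audio was returned.
    throw new Error(
      "No inline audio in response. finishReason: " + candidate.finishReason
    );
  }
  return part.inlineData;
}
```

In myFunction, the destructuring line could then be written as `const { data, mimeType } = extractInlineAudio_(obj);`.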
Source Credit: https://medium.com/google-cloud/text-to-speech-tts-using-gemini-api-with-google-apps-script-6ece50a617fd