March 2024 ~ kutil.org

Sunday, March 31, 2024

Smartly replacing images in Google Slides with the Gemini Pro API and Vertex AI

8:02 AM cloud platform, gemini, google apps script, machine learning

Surely you have also had a presentation in which you needed to replace old content with new. Replacing text is very simple: you just use the Replace function, right in the Google Slides user interface.

The problem arises when you need to replace one image with another, for example, when your corporate logo gets a new graphic design or one of your favorite cloud services updates its icons, as Gmail did ;-) That is still bearable for one presentation, but what do you do when, like me, you have thousands of Google Slides files in your Google Drive?

Fortunately, there are large language models, and specifically multimodal models, whose input prompts can include images in addition to text. With Gemini Pro Vision, a single prompt can include up to 16 images. And as the old saying goes, a picture is worth a thousand words :)

I used Gemini Pro in the Vertex AI service for exactly this use case, integrated into a Google Apps Script that connects to my presentation, goes through all the slides, and replaces every image containing the old logo with the new logo. I will show you how to replicate this procedure yourself; all you need is a Google Cloud account.

1) Create a new Google Apps Script project: https://script.new/

2) Go to Project Settings and check the box Show "appsscript.json" manifest file in editor

3) Copy the manifest below into the appsscript.json file

4) Prepare a Google Cloud project; if you don't have one, create one here: https://console.cloud.google.com/projectcreate

Then enable the Vertex AI API: https://console.cloud.google.com/marketplace/product/google/aiplatform.googleapis.com
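As a sketch of what the manifest in step 3 might contain, assume the script edits presentations with SlidesApp, reads the old and new logo images from Drive, and calls Vertex AI through UrlFetchApp with the user's OAuth token. The scope list below is an assumption rather than the author's file, and the time zone is a placeholder. To keep all examples in one language it is written as a JavaScript object; the printed JSON is what goes into appsscript.json.

```javascript
// Hypothetical appsscript.json contents (an assumption, not the original manifest).
const MANIFEST = {
  timeZone: "Etc/UTC", // placeholder: use your own IANA time zone
  dependencies: {},
  exceptionLogging: "STACKDRIVER",
  runtimeVersion: "V8",
  oauthScopes: [
    // Open and edit Google Slides presentations (SlidesApp)
    "https://www.googleapis.com/auth/presentations",
    // Read the old/new logo images stored in Drive (DriveApp)
    "https://www.googleapis.com/auth/drive.readonly",
    // Call external HTTP endpoints (UrlFetchApp)
    "https://www.googleapis.com/auth/script.external_request",
    // Authorize Vertex AI REST calls with ScriptApp.getOAuthToken()
    "https://www.googleapis.com/auth/cloud-platform",
  ],
};

// Print the manifest as JSON, ready to paste into appsscript.json.
console.log(JSON.stringify(MANIFEST, null, 2));
```

If you add features later (for example, going through all Drive files), Apps Script will ask you to re-authorize whenever the declared scopes change.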

The Gemini Pro Vision API takes input made up of parts (an array named parts), where each item is either text or binary data, embedded directly in the request or referenced by a URI.

https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/gemini
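A minimal sketch of that structure follows. The helper names are hypothetical; the field names follow the Vertex AI REST request format (inlineData with mimeType and base64 data, fileData with mimeType and fileUri, such as a gs:// Cloud Storage path). It also builds the generateContent URL and the options object that UrlFetchApp.fetch accepts. The us-central1 region and the gemini-1.0-pro-vision model ID in the wiring comment are assumptions, and PROJECT_ID is a placeholder for your own project.

```javascript
// Hypothetical helpers for calling Gemini on Vertex AI over REST.

/** A plain text part. */
function textPart(text) {
  return { text };
}

/** An image embedded as base64 bytes (in Apps Script: Utilities.base64Encode(blob.getBytes())). */
function inlineImagePart(base64Data, mimeType = "image/png") {
  return { inlineData: { mimeType, data: base64Data } };
}

/** An image referenced by URI, e.g. a gs:// Cloud Storage path. */
function fileImagePart(fileUri, mimeType = "image/png") {
  return { fileData: { mimeType, fileUri } };
}

/** generateContent endpoint for a Google publisher model in a given region. */
function buildEndpoint(projectId, location, modelId) {
  return `https://${location}-aiplatform.googleapis.com/v1/projects/${projectId}` +
    `/locations/${location}/publishers/google/models/${modelId}:generateContent`;
}

/** Options for UrlFetchApp.fetch(url, options) with an OAuth bearer token. */
function buildFetchOptions(accessToken, requestBody) {
  return {
    method: "post",
    contentType: "application/json",
    headers: { Authorization: `Bearer ${accessToken}` },
    payload: JSON.stringify(requestBody),
    muteHttpExceptions: true, // return error responses instead of throwing
  };
}

// Apps Script wiring (not runnable outside Apps Script; PROJECT_ID and b64 are placeholders):
//   const body = { contents: [{ role: "user", parts: [textPart("Describe this image."), inlineImagePart(b64)] }] };
//   const url = buildEndpoint(PROJECT_ID, "us-central1", "gemini-1.0-pro-vision");
//   const res = UrlFetchApp.fetch(url, buildFetchOptions(ScriptApp.getOAuthToken(), body));
//   const json = JSON.parse(res.getContentText());
```

Setting muteHttpExceptions makes it easier to log the API's error message (for example, when the Vertex AI API is not yet enabled) instead of getting a generic exception.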

We compose the prompt as usual, except that we also load two images and tell the model which one is the old logo and which one is the new logo. This is few-shot learning: the two labeled images serve as examples.
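A minimal sketch of such a few-shot request body, assuming the two reference logos and the candidate image are already base64-encoded and share one MIME type. The instruction wording, the OLD/NEW/OTHER answer labels, and the generation settings are assumptions chosen to make the answer easy to parse, not the author's original prompt; buildLogoRequest is a hypothetical name.

```javascript
/**
 * Hypothetical few-shot request: two labeled example images, then the image to classify.
 * All image arguments are base64 strings; mimeType is assumed to be the same for all three.
 */
function buildLogoRequest(oldLogoB64, newLogoB64, candidateB64, mimeType = "image/png") {
  // Wraps base64 bytes as an inline image part.
  const image = (data) => ({ inlineData: { mimeType, data } });
  return {
    contents: [{
      role: "user",
      parts: [
        { text: "You compare images of company logos." },
        { text: "Example 1: this is the OLD logo." },
        image(oldLogoB64),
        { text: "Example 2: this is the NEW logo." },
        image(newLogoB64),
        { text: "Does the following image show the OLD logo, the NEW logo, or something else? " +
                "Answer with exactly one word: OLD, NEW or OTHER." },
        image(candidateB64),
      ],
    }],
    // Low temperature and a tiny token budget keep the answer short and stable.
    generationConfig: { temperature: 0, maxOutputTokens: 5 },
  };
}
```

The request uses three images in total, well within the 16-image limit, so the same pattern leaves room for more labeled examples (for example, the old logo on a dark background) if the model confuses variants.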

Finally, all that's left is a function that loads all the slides in a presentation, collects the images on each slide, and sends each image to the Gemini Pro API to check whether it shows the old or the new logo. If it is the old one, the function replaces it directly in the presentation with the new image.
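A hedged sketch of that function follows. To keep it testable outside Apps Script, the classifier is passed in as a callback (hypothetical classifyImage) instead of calling Vertex AI directly. In Apps Script, slides would come from SlidesApp, where Slide.getImages(), Image.getBlob() and Image.replace(blob) exist. parseVerdict assumes the generateContent response shape (candidates[0].content.parts[0].text) and a prompt that asks the model to answer OLD, NEW or OTHER.

```javascript
/** Extracts a one-word verdict from a Vertex AI generateContent response. */
function parseVerdict(response) {
  const candidate = response && response.candidates && response.candidates[0];
  const parts = candidate && candidate.content && candidate.content.parts;
  const text = parts && parts[0] && parts[0].text ? parts[0].text : "";
  // Normalize answers like " old.\n" to "OLD".
  const word = text.trim().toUpperCase().replace(/[^A-Z]/g, "");
  return word === "OLD" || word === "NEW" ? word : "OTHER";
}

/**
 * Replaces every image classified as the old logo with newLogoBlob.
 * slides: objects with getImages() (SlidesApp Slide objects in Apps Script).
 * classifyImage: hypothetical callback(blob) -> "OLD" | "NEW" | "OTHER".
 * Returns the number of replaced images.
 */
function replaceOldLogos(slides, classifyImage, newLogoBlob) {
  let replaced = 0;
  for (const slide of slides) {
    for (const image of slide.getImages()) {
      if (classifyImage(image.getBlob()) === "OLD") {
        // SlidesApp scales the new image into the existing image's frame.
        image.replace(newLogoBlob);
        replaced++;
      }
    }
  }
  return replaced;
}

// Apps Script wiring (sketch; PRESENTATION_ID, callGemini and newLogoBlob are hypothetical):
//   const slides = SlidesApp.openById(PRESENTATION_ID).getSlides();
//   const n = replaceOldLogos(slides, (blob) => parseVerdict(callGemini(blob)), newLogoBlob);
//   console.log(`Replaced ${n} image(s)`);
```

Treating anything that is not a clear OLD or NEW answer as OTHER errs on the side of leaving an image untouched, which is the safer failure mode when editing thousands of presentations.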

And that's all. Now just run the getSlides() function, which replaces all the old Gmail logos with the new ones. Of course, the script can be modified to go through all your files, or better yet, through all the files in your company using domain-wide delegation.
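To go through all your files instead of one presentation, one possible sketch walks Drive with DriveApp.getFilesByType(MimeType.GOOGLE_SLIDES) and stops after a fixed batch. It saves the iterator's continuation token so a later run (for example, a time-driven trigger) can resume with DriveApp.continueFileIterator(token), which helps stay within Apps Script's per-execution time limit. The batch size, the property key, and the processPresentation callback are assumptions. The iterator is passed in so the logic can be tested with a mock.

```javascript
/**
 * Processes up to maxFiles files from a Drive FileIterator-like object
 * (hasNext(), next(), getContinuationToken()).
 * Returns a continuation token if files remain, otherwise null.
 */
function processBatch(iterator, processFile, maxFiles = 50) {
  let count = 0;
  while (count < maxFiles && iterator.hasNext()) {
    processFile(iterator.next());
    count++;
  }
  return iterator.hasNext() ? iterator.getContinuationToken() : null;
}

// Apps Script wiring (sketch; "SLIDES_TOKEN" and processPresentation are hypothetical):
//   const props = PropertiesService.getUserProperties();
//   const saved = props.getProperty("SLIDES_TOKEN");
//   const it = saved ? DriveApp.continueFileIterator(saved)
//                    : DriveApp.getFilesByType(MimeType.GOOGLE_SLIDES);
//   const token = processBatch(it, (file) => processPresentation(file.getId()));
//   if (token) props.setProperty("SLIDES_TOKEN", token);
//   else props.deleteProperty("SLIDES_TOKEN");
```

Domain-wide delegation is a separate setup (a service account impersonating each user) and is not covered by this sketch.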

Google Cloud credits are provided for this project
#GeminiSprint