Monday, June 30, 2025

Vibe Scraping with Google Apps Script and Gemini's URL Context

Nine years ago, I wrote an article describing how to scrape data from the internet in about five minutes. It featured a Google Apps Script library that allowed you to specify what to download from a webpage by identifying the text surrounding your target information. This became one of my most-read articles, and the library itself saved me a significant amount of time.

With the advent of large language models like Gemini, this entire paradigm is shifting. Just as "vibe coding" describes a more intuitive approach to programming, I'd say we're now entering an era of "vibe-scraping." 

You simply define what information you want and URL, and the Gemini API handles the retrieval.The new features available in the Gemini API through Google AI Studio take this concept even further. 



Let's explore this with a practical example I have recently wanted to solve..

I maintain a list of movies I'm interested in watching in a Google Sheet. I want this sheet to include details like current ratings, genre, movie length, and other information typically found on ČSFD (a popular movie database for Czech users, similar to IMDb).

It occurred to me: what if I could simply tell a model what information to fetch, and it would automatically populate the data, structured, into the respective cells in my spreadsheet?


1. This function is dedicated to interacting with the Gemini API. This refers to the API endpoint accessible through Google AI Studio


2. Extracting Structured Data from the URL.
Next second function then calls the Gemini API, utilizing the powerful URL Context parameter.https://ai.google.dev/gemini-api/docs/url-context
This parameter instructs Gemini to ground its responses on the actual content of the provided URL and significantly decreate the likelihood of the hallucinations.

I found that for simple text work, the Gemini 2.5 Flash model is sufficient.


4. To make this solution as universal as possible, I decided to define what information to extract using prompts in the first row of the Google Sheet.


For example, I'd have column headers like "Name", "Rating," "Genre," "Runtime," etc. This means a user can easily customize the data they want to pull by simply changing these header texts, without needing to modify any code. The script then reads these headers and instructs Gemini to find and place the corresponding information into the cells below for each movie.

In my case, I listed several pieces of information in the header row. The script then processes each movie title, finds its page, extracts the specified details using Gemini, and neatly places them into the correct cells in the Google Sheet. This approach elegantly combines the power of Gemini with the flexibility of Google Sheets for efficient, targeted web data extraction.






This approach elegantly combines the power of Gemini with the flexibility of Google Sheets for efficient, targeted web data extraction. You simply enter corresponding URL and the system works to populate your sheet. The script processes each movie entry, leveraging Gemini to extract the specified details, and automatically organizes them into your spreadsheet, streamlining what used to be a time-consuming manual or complex coding task.