Sunday, January 24, 2016

Easy data scraping with Google Apps Script in 5 minutes

I'm using Google Apps Script for a lot of things - from automate tasks to data analysis. I have discovered, that there was repetitive use-case: scrape data from web  and parse exact value from HTML source code. If you are novice in programming, you probably know, that's difficult to write and use regular expresion. For me too :) I have written Google Apps Script library, which helps you to parse data in 5 minutes.

Let's create a small example. In our company we have created a Google Apps application SignatureSatori to create and setup email signatures for all users in domain. Like a good growth hacker I benchmark the competitors how quickly get new users. There is a Google Apps Marketplace, which estimate number of users. I need save that numbers each day.

1) Create a new Google Apps Script and insert a new library (Resources -> Library)

Parser library takes three parameters - input text and pattern which bounds desired text.
Parser // name of library
    .data(content) // input text
    .from(fromText) // from text pattern
    .to(toText) // to text pattern
    .build(); // run parser and return value

2) Now open desired web-page (e.g Chrome Webstore in our case). Click on specify HTML element by right mouse button and select Inspect element.

3) Find the right part of HTML and copy fromText and toText

4) Now we have all required information to complete script
function getData() {
    var url = "";
    var fromText = '<span class="e-f-ih" title="';
    var toText = '">';
    var content = UrlFetchApp.fetch(url).getContentText();
    var scraped = Parser
    return scraped;

5) The last and the easiest step is copy parsed data into Spreadsheet
function SAVE_DATA() {
 var sheet = SpreadsheetApp.openById(spreadsheetId).getSheetByName(sheetName); // insert Spreadsheet Id and Sheet name
 sheet.appendRow([ new Date(), getData() ]);

6) If you want to log during scraping (e.g. if you want to debug wrong value), call .setLog() function before final .build() function:
Completed code of Parser library Enjoy!