Sunday, January 24, 2016

Easy data scraping with Google Apps Script in 5 minutes

I use Google Apps Script for a lot of things, from automating tasks to data analysis. I discovered one repetitive use case: scraping data from the web and parsing an exact value out of the HTML source code. If you are a novice in programming, you probably know that regular expressions are difficult to write and use. For me too :) So I have written a Google Apps Script library that helps you parse data in 5 minutes.

Let's create a small example. In our company we have created a Google Apps application, SignatureSatori, to create and set up email signatures for all users in a domain. Like a good growth hacker, I benchmark how quickly our competitors gain new users. The Google Apps Marketplace shows an estimated number of users for each application. I need to save those numbers each day.

1) Create a new Google Apps Script and add the Parser library (Resources -> Libraries)

The Parser library takes three parameters: the input text and the two patterns that bound the desired text.
Parser // name of library
    .data(content) // input text
    .from(fromText) // from text pattern
    .to(toText) // to text pattern
    .build(); // run parser and return value
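
To make the idea concrete, here is a minimal plain-JavaScript sketch of what extracting the text between a "from" pattern and a "to" pattern amounts to. The helper name extractBetween and the sample HTML are hypothetical and only illustrate the technique. This is not the Parser library's actual implementation, which may behave differently (for example when a pattern is missing).

```javascript
// Hypothetical helper, not part of the Parser library: returns the text
// between the first occurrence of fromText and the next occurrence of toText.
function extractBetween(content, fromText, toText) {
    var start = content.indexOf(fromText);
    if (start === -1) {
        return null; // from pattern not found
    }
    start += fromText.length; // skip past the from pattern itself
    var end = content.indexOf(toText, start);
    if (end === -1) {
        return null; // to pattern not found after the from pattern
    }
    return content.substring(start, end);
}

// Made-up HTML fragment, for illustration only.
var sampleHtml = '<div><span class="users" title="1,234 users">1,234</span></div>';
var sampleValue = extractBetween(sampleHtml, '<span class="users" title="', '">');
// sampleValue is '1,234 users'
```

Both patterns are plain strings, so no regular expression is needed. The only requirement in this sketch is that fromText is specific enough for its first occurrence to be the one you want.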

2) Now open the desired web page (e.g. the Chrome Web Store in our case). Right-click the specific HTML element and select Inspect element.

3) Find the right part of the HTML and copy the text just before the value (fromText) and just after it (toText)

4) Now we have all the required information to complete the script
function getData() {
    var url = "";
    var fromText = '<span class="e-f-ih" title="';
    var toText = '">';
    var content = UrlFetchApp.fetch(url).getContentText();
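    var scraped = Parser
        .data(content) // input text
        .from(fromText) // from text pattern
        .to(toText) // to text pattern
        .build(); // run parser and return value
    return scraped;
}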

5) The last and easiest step is to copy the parsed data into a Spreadsheet
function SAVE_DATA() {
 var sheet = SpreadsheetApp.openById(spreadsheetId).getSheetByName(sheetName); // insert Spreadsheet Id and Sheet name
 sheet.appendRow([ new Date(), getData() ]); // append a timestamped row with the scraped value
}
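
Since the goal is to save the number each day, one option is a time-driven trigger. The sketch below is an assumption about how you might schedule it, not part of the original script: createDailyTrigger is a hypothetical function name and the hour is arbitrary. It uses the ScriptApp trigger builder and would be run once by hand from the script editor.

```javascript
// Hypothetical one-time setup function: schedules SAVE_DATA to run once a day.
// The function name and the chosen hour (6) are assumptions for illustration.
function createDailyTrigger() {
    return ScriptApp.newTrigger('SAVE_DATA') // name of the function to run
        .timeBased()   // time-driven trigger
        .everyDays(1)  // repeat every day
        .atHour(6)     // run in the hour starting at 6, in the script's time zone
        .create();
}
```

Run it only once. Each call creates another trigger, so repeated runs would append duplicate rows every day.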

6) If you want to log during scraping (e.g. to debug a wrong value), call the .setLog() function before the final .build() function.
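
A sketch of how the logging call might slot into the chain from step 4. The wrapper name getDataWithLog is hypothetical, and the code assumes the Parser library has been added to the project under the identifier Parser. Placing .setLog() directly before .build() simply follows the description in this step.

```javascript
// Hypothetical wrapper; assumes the Parser library is added as "Parser".
function getDataWithLog(content, fromText, toText) {
    return Parser
        .data(content)   // input text
        .from(fromText)  // from text pattern
        .to(toText)      // to text pattern
        .setLog()        // enable logging, as described in step 6
        .build();        // run parser and return value
}
```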
Complete code of the Parser library. Enjoy!

Are you interested in this topic? Follow me on Twitter or Google+, or subscribe via RSS.


  1. Thanks for sharing such a great post.

  2. Google has made so many technical things so much easier to handle and understand. I really appreciate this effort. You should write something for students, because they really follow this link. Thanks for this post.

  3. Google changed CSS, now from/to must be:

    var fromText = 'div class="LCopac qlfxzd">';
    var toText = "

    But anyway I'm getting error:
    ReferenceError: "Parser" is not defined.

  4. It is true that day by day the demand for the MBA course is really rising, and why not? People love to build a better career through this. homepage will give you some amazing ideas and is very effective for paper writing.

  5. You can be sure that you will only get the best online copy watch. Remember that buying fake watches is very similar to buying any other type.

  6. Amazing info for IT students. They will gain information regarding data scraping with Google Apps. Well, here I'm going to introduce our recent work, just check it once.

  7. How can I scrape the headings?

  8. The new Google app makes data scraping so easy, and you can really do your data scraping tasks with it.

  9. To get a new job, resume and cover letter writing is very important. A good resume and cover letter help you get a job quickly. Your ideas about cover letter writing are very unique. I found something new in your writing. Keep writing like this. useful link for the students that is very helpful for the writing services.

  10. Why does Google Drive ask for permission when adding the library?

    1. Hi, the library contains another function for better data caching, which uses a Google Drive file as the cache. Check the code above (a link is provided).

  11. The best and easiest option is provided by your article. Thank you for sharing this article.

  12. I am receiving the following error: ReferenceError: "Parser" is not defined.

    Any thoughts?