Sunday, January 24, 2016

Easy data scraping with Google Apps Script in 5 minutes

I'm using Google Apps Script for a lot of things - from automate tasks to data analysis. I have discovered, that there was repetitive use-case: scrape data from web  and parse exact value from HTML source code. If you are novice in programming, you probably know, that's difficult to write and use regular expresion. For me too :) I have written Google Apps Script library, which helps you to parse data in 5 minutes.

Let's create a small example. In our company we have created a Google Apps application SignatureSatori to create and setup email signatures for all users in domain. Like a good growth hacker I benchmark the competitors how quickly get new users. There is a Google Apps Marketplace, which estimate number of users. I need save that numbers each day.


1) Create a new Google Apps Script and insert a new library (Resources -> Library)
M1lugvAXKKtUxn_vdAG9JZleS6DrsjUUV

Parser library takes three parameters - input text and pattern which bounds desired text.
Parser // name of library
    .data(content) // input text
    .from(fromText) // from text pattern
    .to(toText) // to text pattern
    .build(); // run parser and return value

2) Now open desired web-page (e.g Chrome Webstore in our case). Click on specify HTML element by right mouse button and select Inspect element.


3) Find the right part of HTML and copy fromText and toText

4) Now we have all required information to complete script
function getData() {
    var url = "https://chrome.google.com/webstore/detail/signaturesatori-central-s/fejomcfhljndadjlojamaklegghjnjfn?hl=en";
    var fromText = '<span class="e-f-ih" title="';
    var toText = '">';
  
    var content = UrlFetchApp.fetch(url).getContentText();
    var scraped = Parser
                    .data(content)
                    .from(fromText)
                    .to(toText)
                    .build();
    Logger.log(scraped);
    return scraped;
}

5) The last and the easiest step is copy parsed data into Spreadsheet
function SAVE_DATA() {
 var sheet = SpreadsheetApp.openById(spreadsheetId).getSheetByName(sheetName); // insert Spreadsheet Id and Sheet name
 sheet.appendRow([ new Date(), getData() ]);
}

6) If you want to log during scraping (e.g. if you want to debug wrong value), call .setLog() function before final .build() function:
 
Parser
    .data(content)
    .setLog()
    .from(fromText)
    .to(toText)
    .build();
Completed code of Parser library Enjoy!

Are you interested in this topic? Follow me on Twitter, Google+ or subscribe RSS

8 comments:

  1. Thanks for sharing such a great post.

    ReplyDelete
  2. Google has made so many technical things so much easy to handle and understand.I really appreciate this effort.Your should write something for www.summarizing.biz/auto-summarize-online/ because students really follow this link.Thanks for this post.

    ReplyDelete
  3. Google changed CSS, now from/to must be:

    var fromText = "div class="LCopac qlfxzd">";
    var toText = "

    But anyway I'm getting error:
    ReferenceError: "Parser" is not defined.

    ReplyDelete
  4. It is true that day by day the demand for the MBA course is really rising and why not? Poeple love to make better career through this. homepage will give you some amazing idea and very effective for the papers writing.

    ReplyDelete
  5. You can be sure that you will only get the best online copy watch. Remember that buying fake watches is very similar to buying any other typeReplica watchesReplica Watches Uk

    ReplyDelete
  6. Amazing info for IT students. They will gain information regarding data scrapping with google apps. Well, here i'm going to introduced our recent work just check it once https://www.bestghostwriters.net/academic-papers-ghostwriting

    ReplyDelete
  7. How can I scrap the headings ?

    ReplyDelete