Thursday, May 28, 2020

Machine learning in Google Sheet with Tensorflow.js and Google Apps Script


This article will show you how you can setup, train, and predict spreadsheet data with deep-learning framework Tensorflow.js. You don't need to call REST APIs or use other 3rd parties storage and algorithm. All your data stay in your secure Google Sheet.

There is a Google Spreadsheet with demo dataset + full Apps Script inside at the end of article.



Intro

Google has recently introduced a new JavaScript runtime (V8 engine) into Google Apps Script. It enhances G Suite platform for new use-cases of automation. It replaces the old Mozilla's Rhino JavaScript interpreter and allows you to include modern JavaScript libraries.

TensorFlow was originally for Python, but Google added support for more programming languages later. (nodejs. JavaScript, Swift,..). Keras is high-level neural networks API that is on top of TensorFlow. It is appropriate for beginners and helps you to build neural networks. Tensorflow.js is JavaScript-based framework for building neural networks and syntax is similar to Keras.

The whole machine learning topic is very complex. It contains a lot of use-cases, design of architectures, setups, and tiny tweaking. My aim is not to show you step-by-step tutorial which would cover machine learning, but inspire you and show you another point-of-view about the power of Google Sheets with Google Apps Script.

Disclaimer: I have used to small hack, to included Tensorflow.js library. I cannot guarantee, that you get 100% accuracy of result.

Use case

I guess that you have plenty of data in your Google Sheets. Imagine the scenario, that based on some multiple columns (with numbers) you want to predict the value in the last column. It is useful if you want to forecast future values from past values or some values are missing and you can fill the gaps. The scenario is named Multivariate Regression.


Deploy Tensorflow.js in Google Apps Script

I copied the whole Tensorflow.js library into one-file code into Google Apps Script project as file tf-js.gs. 


I had to prepare Tensorflow.js library before training and predicting. First, the library uses the name global for the global variable. It was an easier part because I only defined a new variable and added a new line of code:



Second, the Tensorflow.js library uses native APIs  "measuring time" - specifically Performance.now()  or  process.hrtime() 

Performance.now() is available only in browser API (Chrome) and process.hrtime() is available only in backend language API (node.js). I got an error "Cannot measure time in this environment. You should run tf.js in the browser or in Node.js" in Google Apps Script, because I could not use first and second method.

I did not fully reverse engineered library, but I think measuring time is used to for yielding  main thread for other tasks. For this reason I setup yieldEvery as "never" during model compilation. (https://js.tensorflow.org/api/latest/)

If you have more elegant solutions, ping me on Twitter or email.

Data

Boston Housing Prices dataset is "hello world" entry task in the machine learning world. It is a collection of 500 simple real-estate records collected in Boston (Massachusetts) in the late 1970s. Each row includes numeric measurements of a Boston neighborhood (e.g. crime rate, the typical size of homes, how far the region is from the closest highway, whether the area has waterfront property..). 

These columns are named as a features (=inputs into the machine learning model).


We want to predict the price of the home according to this dataset. This column is one and its name is a target (=output from machine learning model).

This prepared function download dataset from Google Cloud Storage into Google Sheet directly.

Data preparation

We have to divide data into training and testing dataset. A variable rowSplit defines row number for this splitting. In our case rows from 2 to 336 will be used as training dataset. The remain rows (from 337 to 507) as a testing dataset. The variables FEATURE_COLUMN_FROM and FEATURE_COLUMN_TO define features columns for training, testing and predicting.   In our case features data are loaded from 1 - 12 columns.


As a last step, select range in Google Sheet. We want to estimates values in column M according to values A- L in selected rows 7-9.

Tensorflow does not work with Array data structure, but with Tensors. Tensors are multi-dimensional data structures. Function createTensor() creates 2D tensor for us.

Several features (columns) contain values in different scale (e.g. tax values 187 - 711) than others (e.g crime rate 0.01 - 88.98). We have to normalize and transform values that improve the performance and training stability of model.


Building the model and training

As you have already known, that deep-learning networks contains more layers with neurons. We need to define the architecture of neural network layer by layer. The syntax is similar metioned Keras. We have an architecture with two layers and each of them contains 50 neurons.


There are activation functions (Sigmoid) in every hidden layer.


The last layer contains only one neuron with default (linear) activation function. It is linear, because our example is regression use-case.

Next step is compilation. In this step, we need to setup
  • optimizer (Stochastic gradient descent), how to find the best solution and neuron's weights
  • loss function (meanSquaredError), how measure optimal solution

Now it is time for training. Method .fit() trains model from data over several iterations (=EPOCHS)

These values like number of epochs, number of layers, number of neurons, type of activation function are hyperparameters. Data scientists around the world tune these values and compare it with previous settings. More info about settings in Tensorflow.JS API https://js.tensorflow.org/api/latest/


Evaluation allows you to check the accuracy of your model. Less loss value is better. You should compare Train loss vs. Test loss. Bigger train loss means Overfitting and it is not ideal.

Prediction

When we are satisfied with quality of our model and loss value is optimal. Now you can predict futures values.  We also need to convert Array prediction values into Tensors and normalized it as well.

In our code snippet predicted values are saved into cell Notes and you can compare it with original values.


Here is a main function, which load, prepare, train the data.



Try it yourself!

If you want to test and play with it, full dataset + Google Apps Script code is available in this Google Sheet

1. Create a copy of spreadsheet
2. Select any of rows (last value will be skipped during training)
3. Open menu Tools --> Script editor
4. Select menu Run --> Run function and choose Main
5. The predicted values will be saved as a notes 


If you like Google Apps Script, folks like you are in this Google Groups community
https://developers.google.com/apps-script/community


Monday, February 24, 2020

Extract archived files directly from Google Drive with serverless tool Google Colab

Today I will show you how you can extract files from zip archives directly in Google Drive without any external 3rd tools. Everything is completed serverless and in da cloud :-)

Google Colab is a pretty neat tool for data scientists, who operate and manage Jupyter notebooks in the cloud. I prefer Google Colab because supports Python 3 runtime with CPU, GPU or TPU processors. You can also mount your Google Drive as an external drive.



Colab prepares the virtual machine for you with Jupyter notebook. Notebook contains several cells, where you can write Python code and run it with "Play" button.

There is also the magic behind "magic syntax sugar", which allows you to use more advanced commands. One of them is %%bash, which converts the rest of the commands into bash commands.



When we combine the above knowledge, we will get a powerful tool like an army knife:

1. Open Google Colab https://colab.research.google.com/notebooks/intro.ipynb#recent=true and create a new notebook.

2. Create a new notebook and rename it. At the right top click to button Connect, which starts your virtual machine. 

3.Now it is time to mount your Google Drive. Click to button Mount Drive. You have to authorize Colab application to your account. The path to your files is drive/My Drive/





Copy this snippet of code int your Colab cell


There are two - change directory and unzip desired files.

Click to "play" button to run the selected cell (=snippet of code). Files will be extracted into same locations.

In my example, I did Google Takeout to my Google Drive and backup zip files were saved into Takeout folder to my root folder. 



Note
Bash run cell in a subprocess, so it could take some time to unzip a lot of files



Monday, June 3, 2019

How to measure latency between Google Apps Script project and Google Cloud Platform regions

If you are debugging your Google Apps Script application, you find useful to measure time spent on requests. There are several regions (now 20) and zones (now 61) in Google Cloud Platform. (source)

G Suite and their smart-and-powerfull tool App Script are running on the same infrastructure. Integration between Apps Script and some GCP service (e.g. Vision API, Natural Language API, Cloud Functions) is not through Google private network, but through public internet (in other words: you are calling REST APIs with UrlFetchApp). That means, it is good to care about your region.


If you would like to get more insights and find out where your application is "hosted", you can use code snippet.


It measures latency between your Apps Script application and several GCP regions around the world. UrlFetchApp makes HTTP requests to VMs running in each region.

Here is my result:
(Apps Script Project Timezone: Paris, Europe, computer located: Prague, Czech republic)

[19-06-03 13:09:05:115 PDT] us-east1: 8.0 ms
[19-06-03 13:09:05:116 PDT] global: 10.0 ms
[19-06-03 13:09:05:116 PDT] us-east4: 33.0 ms
[19-06-03 13:09:05:117 PDT] northamerica-northeast1: 60.0 ms
[19-06-03 13:09:05:118 PDT] us-central1: 86.0 ms
[19-06-03 13:09:05:118 PDT] us-west1: 142.0 ms
[19-06-03 13:09:05:119 PDT] us-west2: 144.0 ms
[19-06-03 13:09:05:119 PDT] europe-west2: 183.0 ms
[19-06-03 13:09:05:120 PDT] europe-west1: 195.0 ms
[19-06-03 13:09:05:120 PDT] europe-west4: 197.0 ms
[19-06-03 13:09:05:121 PDT] europe-west3: 204.0 ms
[19-06-03 13:09:05:121 PDT] southamerica-east1: 246.0 ms
[19-06-03 13:09:05:122 PDT] europe-north1: 258.0 ms
[19-06-03 13:09:05:122 PDT] asia-northeast1: 319.0 ms
[19-06-03 13:09:05:123 PDT] europe-west6: 343.0 ms
[19-06-03 13:09:05:124 PDT] asia-east1: 389.0 ms
[19-06-03 13:09:05:124 PDT] australia-southeast1: 421.0 ms
[19-06-03 13:09:05:125 PDT] asia-east2: 433.0 ms
[19-06-03 13:09:05:125 PDT] asia-southeast1: 463.0 ms
[19-06-03 13:09:05:125 PDT] asia-south1: 563.0 ms

Inspiration and browser version is located here: http://www.gcping.com/

Tuesday, January 22, 2019

Deploy Google Apps Script web app as an Android application

The main advantage of serverless Google Apps Script against of serverless Cloud Functions is online editor (IDE) not only for writing code for server-side, but even for client-side. You can create HTML web application with HTMLService


Google has recently introduced (G Suite update 12/2018) option to deploy web application as a native Android application for G Suite users. If you are reading this, you are probably power-user so you can imagine what you are able to do now! Yes, deploy native Android application with Google Apps Script.

Notes:
-To use web apps, your user’s devices must have Google Chrome installed

1) Create a new Script at https://script.google.com

2) Insert code snippet into code.gs


3) Create a new HTML file and rename it to "view.html"



4) Edit HTML for your future über-cool-application (just for now, insert Hello World!)

5) Open menu Publish --> Deploy as web app and copy URL address



6) It is time setup in G Suite. Open Google Admin console (https://admin.google.com) and pick category Device management

7) Then choose App Management from left panel.

 8) In App Management you will see section in the middle of page - Manage apps for Android devices. This section contains button Manage whitelisted apps. Click it.



9) Now you are in mobile application management. First, you have to Add a new whitelisted application (with yellow circle button)


10) You will see Google Play directory. Change category (left panel) to Web Apps.



11) For now, you have to again click at Add button (circle button at right bottom of page.



12) Now you can define behaviour of your Android application.
- Title is name of application on home screen
- URL of your Google Apps Script (from step 5)
- I recommend you for display choose Full screen or Standalone
- Icon of your Android application


13) After you click at Create button it will take about 10 minutes to prepare your application.
(There is a small text Not available yet below during preparation )

14) When your application is successfully prepared, then you can click at desired icon



15) During distribution strategy, you can distribute application for all G Suite users or subset according to Organization unit.




16) Last step contains option to automatically install application on all devices (This option doesn't work for me)



17) When you visit Google Play at your phone, then navigate to section Work Apps. Click to application and install it.

The was a final step. Your Google Apps Script application is on mobile's homescreen



Friday, March 9, 2018

Analyse emotion in OSCAR movies 2018 with Google Apps Script and Google Natural Language API

And OSCAR goes to...

There are a lot of indicators, who predict the winner's movie. One of them, no doubt, is emotions. When the screenplay is written in a way to change emotion from positive to negative and vice versa, it is creating a big impact on peoples mind.

Nowadays you can you Natural Language Processing without deep knowledge of all things like syntax analysis, grammar, external libraries. Especially big companies like Google, Amazon or Microsoft invest a huge amount of money to improve their algorithm and provide as service/API.

I am always interested in easily and fast proof-of-concept, so today I will show you how I put these things together.




I have chosen a category for Best picture (90th Academy Awards) and these movies
  • Call Me by Your Name – Peter Spears, Luca Guadagnino, Emilie Georges, and Marco Morabito
  • Darkest Hour – Tim Bevan, Eric Fellner, Lisa Bruce, Anthony McCarten, and Douglas Urbanski
  • Dunkirk – Emma Thomas and Christopher Nolan
  • Get Out – Sean McKittrick, Jason Blum, Edward H. Hamm Jr., and Jordan Peele
  • Lady Bird – Scott Rudin, Eli Bush, and Evelyn O'Neill
  • Phantom Thread – JoAnne Sellar, Paul Thomas Anderson, Megan Ellison and Daniel Lupi
  • The Post – Amy Pascal, Steven Spielberg, and Kristie Macosko Krieger
  • The Shape of Water – Guillermo del Toro and J. Miles Dale
  • Three Billboards Outside Ebbing, Missouri – Graham Broadbent, Pete Czernin, and Martin McDonagh

1. I downloaded subtitles from server Opensubtitles and saved these files (.srt) to my Google Drive.

2. I created a new project in Google Apps Script and setup Google Natural Language API endpoint in menu Resources -> Cloud Platform -> left menu API & Services -> Library -> Cloud Natural Language API.
Note: Google Apps Script creates a new Google Cloud project for you, so you don't have to create it yourself.



In Google Cloud platform dashboard I also got API key, which identifies applications.

API & Services -> Credentials > Create credentials -> API key
(You should setup Application restrictions for HTTP referrer as a minimum)




2. First I did preprocess. I borrow a term from machine learning - bucketing. I aggregated multiple lines of subtitles into time-framed text "window" of length 2 minutes. Here is a code:





3. The rest is simple - iteration over all "buckets" and send each grouped text to Google Natural Language. This API response with two numbers - sentiment and magnitude. In my case, I used sentiment number in the range <-1;1>. Everything is saved in Google Spreadsheet and charts are rendered directly from there.
4. I put together two functions from above:
You can see how sentiment changes during the movie (time is on the x-axis) for the best OSCAR movie of the year 2018:

Rest nominated movies:










Thursday, October 6, 2016

Find the best location for your next shop in Google Spreadsheet (Voronoi Algorithm)



About 14 months ago I wondered how a Czech retail market is divided into regions according to their locations. Where is the best place to build next shop? I remembered to a Voronoi algoritm from university that could divide desire an area into small parts along the nearest spot.

Let's check a definition from Wikipedia
Voronoi diagram is a partitioning of a plane into regions based on distance to points in a specific subset of the plane. That set of points (called seeds, sites, or generators) is specified beforehand, and for each seed there is a corresponding region consisting of all points closer to that seed than to any other.


Why should you upload your a dataset into an unknown online web application when you want to visualize data? There were a lot of examples on the internet, but I wanted to create as easy as possible for everyone. Finally I found article by Chris Zettter, where code was written over Leaflet and Open Street map layer. It inspired me to rewrite as Google Add-on  to Google Spreadsheet with Google Maps underlay (I love Google Maps SDK)

I will introduce you shortly how to use Voronoi Map from scratch in lesss than 5 minutes.

1) Create a new Google Spreadsheet. Then insert data into Sheet 

For this demo I've downloaded data of California retailers from this Fusion Tables and filtered only Walgreens stores in San Francisco (my favorite place :-). 

You need to prepare data as separate columns for Name, Latitude, Longitude nad Type of point of interest. I've added a new column "brand" (green color) as a Type and filled it with text "Walgreens"


2) Open menu Add-ons and select Get add-ons. Find voronoi map and than click at + FREE button for install.

3) After installation open menu Add-ons -> Voroni Map and select Create Voronoi Map

4) Select appropriate colums and click Create map


Now you see a result. The blue points are point-of-interest (Walgreens stores) and regions along the nearest point. If you have more type of points-of-interest, you can filter them in right top panel.


Probably you are thinking where you should built you next shop? I would recommend you find intersection of region and think about this location :-)

Tuesday, February 23, 2016

Analyse city traffic from webcams through Google Cloud Vision API


The Google has recently introduced Cloud Vision API, which allows you to analyse image data through API instead of creating custom algorithms.

As Google Developer Expert I was involed in tester program of this API during alpha stage. My GDE's friend +Riël Notermans inspired me to connect Vision API into my favorite Google Apps Script. I was thinking how to use it and suddenly I found out useful something-like smart city solution.

Intro

I'm living in Prague. The municipal transportation provides almost real-time records of traffic from webcams The images are available on web (http://www.dpp.cz/en/webcams/). Anyone could check traffic before heading out to work.  I am able to connect, download and send these images to any API, because each image has own public URL

The maps of webcams in Prague

I have created Google Spreadsheet  as a database with new a Google Apps Script project.
Next step was activated required API in Google Developer Console. OAuth 2.0 dance was managed by great cGoa library by +Bruce Mcpherson. Cloud Vision "client" library was generated from Cloud Endpoints by +Spencer Easton's code.

My script fetched the image from URL as a Blob, converted into base64 and sent to Cloud Vision API with parameter for LABEL_DETECTION (Label detection tutorial available here)


As a JSON response I received description of image as labels with confidence score (min 0 - max 1).
The last step was setup Trigger for automatic run Apps Script every 5 minutes.

The result

I have run my script on webcam at Argentinska street at February 17th between 12am-12pm. The all collected data is available as interactive visualization. http://data.kutil.org/visualization/analysis-of-traffic-by-cloud-vision-api

The label "road" means, that higher value looks like only street without street, so traffic during lunch was low.

The chart for label traffic looks like inverse function of road label. The higher value means, that Google's recognizes traffic on the image with greater confidence.


It absolutely not 100% accurate, but it could inspire you how to use Google Cloud Vision API.