Thursday, May 28, 2020

Machine learning in Google Sheet with Tensorflow.js and Google Apps Script


This article will show you how to set up, train, and predict on spreadsheet data with the deep-learning framework TensorFlow.js. You don't need to call REST APIs or use third-party storage and algorithms. All your data stays in your secure Google Sheet.

There is a Google Spreadsheet with the demo dataset + full Apps Script code at the end of the article.



Intro

Google has recently introduced a new JavaScript runtime (the V8 engine) into Google Apps Script. It enhances the G Suite platform for new automation use cases. It replaces the old Mozilla Rhino JavaScript interpreter and allows you to include modern JavaScript libraries.

TensorFlow was originally built for Python, but Google later added support for more programming languages (Node.js, JavaScript, Swift, ...). Keras is a high-level neural networks API that sits on top of TensorFlow; it is appropriate for beginners and helps you build neural networks. TensorFlow.js is a JavaScript-based framework for building neural networks, and its syntax is similar to Keras.

Machine learning as a whole is a very complex topic. It involves many use cases, architecture designs, setups, and tiny tweaks. My aim is not to give you a step-by-step tutorial that covers machine learning, but to inspire you and show you another point of view on the power of Google Sheets with Google Apps Script.

Disclaimer: I used a small hack to include the TensorFlow.js library. I cannot guarantee that you will get 100% accurate results.

Use case

I guess you have plenty of data in your Google Sheets. Imagine a scenario where, based on several numeric columns, you want to predict the value in the last column. This is useful when you want to forecast future values from past values, or when some values are missing and you want to fill the gaps. This scenario is called multivariate regression.


Deploy TensorFlow.js in Google Apps Script

I copied the whole TensorFlow.js library as a single file, tf-js.gs, into the Google Apps Script project.


I had to prepare the TensorFlow.js library before training and predicting. First, the library uses the name global for the global object. This was the easier part, because I only had to define a new variable and add a new line of code:
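A minimal sketch of that extra line, assuming the library file is named tf-js.gs and that a variable called global is all TensorFlow.js expects (this is my reconstruction, not the exact line from the demo):

// At the very top of tf-js.gs: map "global" to the script's global object.
var global = this;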



Second, the TensorFlow.js library uses native APIs for measuring time, specifically performance.now() or process.hrtime().

performance.now() is available only in the browser API (Chrome) and process.hrtime() is available only in the backend API (Node.js). In Google Apps Script I got the error "Cannot measure time in this environment. You should run tf.js in the browser or in Node.js", because neither method is available.

I did not fully reverse engineer the library, but I think the time measurement is used for yielding the main thread to other tasks. For this reason I set yieldEvery to "never" in the training configuration. (https://js.tensorflow.org/api/latest/)
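For illustration, a hedged sketch of that training configuration; the helper name trainModel and the epoch count are placeholders, not the exact code from the demo spreadsheet:

// Sketch: train with yieldEvery set to 'never' so TF.js never needs to measure time.
function trainModel(model, trainFeatures, trainTarget) {
  return model.fit(trainFeatures, trainTarget, {
    epochs: 100,          // illustrative hyperparameter
    yieldEvery: 'never'   // avoids "Cannot measure time in this environment"
  });
}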

If you have more elegant solutions, ping me on Twitter or email.

Data

The Boston Housing Prices dataset is the "hello world" entry task of the machine learning world. It is a collection of 506 simple real-estate records collected in Boston (Massachusetts) in the late 1970s. Each row includes numeric measurements of a Boston neighborhood (e.g. crime rate, the typical size of homes, how far the region is from the closest highway, whether the area has waterfront property, ...).

These columns are called features (= inputs to the machine learning model).


We want to predict the price of a home from this dataset. This single column is called the target (= output of the machine learning model).

This prepared function downloads the dataset from Google Cloud Storage directly into the Google Sheet.
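A rough sketch of what such a download function can look like; the URL below is a placeholder, the demo spreadsheet uses its own Cloud Storage location and sheet layout:

// Sketch: download a CSV dataset and write it into the active sheet.
function downloadDataset() {
  var url = 'https://storage.googleapis.com/your-bucket/boston-housing.csv'; // placeholder URL
  var rows = Utilities.parseCsv(UrlFetchApp.fetch(url).getContentText());
  var sheet = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet();
  sheet.getRange(1, 1, rows.length, rows[0].length).setValues(rows);
}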

Data preparation

We have to divide the data into a training and a testing dataset. The variable rowSplit defines the row number for this split. In our case rows 2 to 336 will be used as the training dataset and the remaining rows (337 to 507) as the testing dataset. The variables FEATURE_COLUMN_FROM and FEATURE_COLUMN_TO define the feature columns for training, testing, and predicting. In our case the feature data is loaded from columns 1 to 12.
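A hedged sketch of how these constants can be used to read the training and testing data from the sheet (the constant names follow the article, the helper itself is only illustrative):

var FEATURE_COLUMN_FROM = 1;   // column A
var FEATURE_COLUMN_TO = 12;    // column L
var TARGET_COLUMN = 13;        // column M (assumed target column)
var rowSplit = 336;            // last row of the training dataset

// Sketch: load features and target for the training and testing rows as 2D arrays.
function loadData(sheet) {
  var numFeatures = FEATURE_COLUMN_TO - FEATURE_COLUMN_FROM + 1;
  var lastRow = sheet.getLastRow();
  return {
    trainFeatures: sheet.getRange(2, FEATURE_COLUMN_FROM, rowSplit - 1, numFeatures).getValues(),
    trainTarget:   sheet.getRange(2, TARGET_COLUMN, rowSplit - 1, 1).getValues(),
    testFeatures:  sheet.getRange(rowSplit + 1, FEATURE_COLUMN_FROM, lastRow - rowSplit, numFeatures).getValues(),
    testTarget:    sheet.getRange(rowSplit + 1, TARGET_COLUMN, lastRow - rowSplit, 1).getValues()
  };
}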


As a last step, select a range in the Google Sheet. We want to estimate the values in column M from the values in columns A to L of the selected rows 7 to 9.

TensorFlow does not work with the Array data structure but with tensors, which are multi-dimensional data structures. The function createTensor() creates a 2D tensor for us.
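A minimal sketch of such a helper; the name createTensor() comes from the article, the body is my approximation:

// Sketch: wrap a 2D JavaScript array (rows x columns) in a 2D tensor.
function createTensor(data) {
  return tf.tensor2d(data, [data.length, data[0].length]);
}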

Several features (columns) contain values on a different scale (e.g. tax values 187 to 711) than others (e.g. crime rate 0.01 to 88.98). We have to normalize and transform the values, which improves the performance and training stability of the model.
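One common approach, sketched here under the assumption that standard (z-score) normalization is used: subtract the per-column mean and divide by the per-column standard deviation.

// Sketch: z-score normalization of a 2D tensor, column by column.
function normalize(dataTensor) {
  var mean = dataTensor.mean(0);                            // per-column mean
  var std = dataTensor.sub(mean).square().mean(0).sqrt();   // per-column standard deviation
  return {
    values: dataTensor.sub(mean).div(std),
    mean: mean,   // keep these to normalize prediction inputs the same way
    std: std
  };
}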


Building the model and training

As you probably already know, deep-learning networks consist of multiple layers of neurons. We need to define the architecture of the neural network layer by layer. The syntax is similar to the already mentioned Keras. Our architecture has two layers, each containing 50 neurons.


There is an activation function (sigmoid) in every hidden layer.


The last layer contains only one neuron with the default (linear) activation function. It is linear because our example is a regression use case.
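A sketch of this architecture in TensorFlow.js syntax, assuming the 12 input features described above:

// Sketch: two hidden layers with 50 sigmoid neurons each and one linear output neuron.
var model = tf.sequential();
model.add(tf.layers.dense({units: 50, activation: 'sigmoid', inputShape: [12]}));
model.add(tf.layers.dense({units: 50, activation: 'sigmoid'}));
model.add(tf.layers.dense({units: 1}));   // default linear activation, suitable for regression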

The next step is compilation. In this step we need to set up (a short sketch follows the list):
  • the optimizer (stochastic gradient descent): how to search for the best neuron weights
  • the loss function (meanSquaredError): how to measure how close we are to the optimal solution
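In TensorFlow.js this can look roughly like the following; the learning rate is only an illustrative value:

// Sketch: compile with stochastic gradient descent and mean squared error.
model.compile({
  optimizer: tf.train.sgd(0.01),   // 0.01 is an illustrative learning rate
  loss: 'meanSquaredError'
});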

Now it is time for training. The method .fit() trains the model on the data over several iterations (= epochs); this is also the call where yieldEvery is set to "never", as mentioned above.

Values like the number of epochs, the number of layers, the number of neurons, and the type of activation function are hyperparameters. Data scientists around the world tune these values and compare the results with previous settings. More info about the settings is in the TensorFlow.js API: https://js.tensorflow.org/api/latest/


Evaluation allows you to check the accuracy of your model. A lower loss value is better. You should compare the train loss vs. the test loss. A test loss that is much bigger than the train loss means overfitting, which is not ideal.
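A short sketch of how the test loss can be computed and logged, assuming the test data has already been turned into normalized tensors as in the earlier steps:

// Sketch: evaluate the trained model on the testing dataset and log the loss.
// testFeatures/testTarget are tensors built with createTensor() and normalized as above.
var testLoss = model.evaluate(testFeatures, testTarget);
Logger.log('Test loss: ' + testLoss.dataSync()[0]);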

Prediction

When we are satisfied with the quality of our model and the loss value is acceptable, we can predict future values. We also need to convert the prediction input arrays into tensors and normalize them as well.

In our code snippet the predicted values are saved into cell notes, so you can compare them with the original values.
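Roughly like this, assuming the selected range covers the rows to predict and the mean/std tensors from the normalization step are available (all names here are illustrative):

// Sketch: predict the target for the selected rows and save it into the notes of column M.
function predictSelection(model, mean, std) {
  var sheet = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet();
  var range = sheet.getActiveRange();
  var firstRow = range.getRow();
  var numRows = range.getNumRows();
  var features = sheet.getRange(firstRow, 1, numRows, 12).getValues();   // columns A-L
  var input = tf.tensor2d(features).sub(mean).div(std);                  // normalize like the training data
  var predictions = model.predict(input).dataSync();
  for (var i = 0; i < numRows; i++) {
    sheet.getRange(firstRow + i, 13).setNote('Predicted: ' + predictions[i]);   // column M
  }
}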


Here is the main function, which loads, prepares, and trains on the data.
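A condensed sketch of how such a main function can be wired together; the helper names (downloadDataset, loadData, createTensor, normalize, trainModel, predictSelection) follow the sketches above and are not necessarily the names used in the demo spreadsheet:

// Sketch: orchestrate loading, preparation, training and prediction.
// 'model' is the compiled model built layer by layer above.
async function main() {
  var sheet = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet();
  var data = loadData(sheet);                                  // training/testing arrays from the sheet
  var train = normalize(createTensor(data.trainFeatures));     // features -> normalized tensor
  var trainTarget = createTensor(data.trainTarget);
  await trainModel(model, train.values, trainTarget);          // fit with yieldEvery: 'never'
  predictSelection(model, train.mean, train.std);              // write predictions as cell notes
}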



Try it yourself!

If you want to test it and play with it, the full dataset + Google Apps Script code is available in this Google Sheet.

1. Create a copy of the spreadsheet
2. Select any rows (the last value will be skipped during training)
3. Open the menu Tools --> Script editor
4. Select the menu Run --> Run function and choose Main
5. The predicted values will be saved as notes


If you like Google Apps Script, you will find folks like you in this Google Groups community:
https://developers.google.com/apps-script/community


Monday, February 24, 2020

Extract archived files directly from Google Drive with serverless tool Google Colab

Today I will show you how you can extract files from zip archives directly in Google Drive without any external third-party tools. Everything is done serverless and in da cloud :-)

Google Colab is a pretty neat tool for data scientists who run and manage Jupyter notebooks in the cloud. I prefer Google Colab because it supports a Python 3 runtime with CPU, GPU, or TPU processors. You can also mount your Google Drive as an external drive.



Colab prepares a virtual machine with a Jupyter notebook for you. The notebook contains several cells where you can write Python code and run it with the "Play" button.

There is also "magic" syntactic sugar, which allows you to use more advanced commands. One of them is %%bash, which runs the rest of the cell as bash commands.



When we combine the above knowledge, we get a powerful tool, like a Swiss Army knife:

1. Open Google Colab https://colab.research.google.com/notebooks/intro.ipynb#recent=true and create a new notebook.

2. Rename the notebook. At the top right, click the Connect button, which starts your virtual machine.

3. Now it is time to mount your Google Drive. Click the Mount Drive button. You have to authorize the Colab application to access your account. The path to your files is drive/My Drive/





Copy this snippet of code into your Colab cell:
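Roughly, the cell contains something like this; the Takeout path below is just an example and depends on where your archives are stored:

%%bash
# Sketch: move into the mounted Drive folder and extract every archive in it.
cd "drive/My Drive/Takeout"
unzip -o "*.zip"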


There are two commands: change the directory and unzip the desired files.

Click the "Play" button to run the selected cell (= snippet of code). The files will be extracted into the same location.

In my example, I exported my data with Google Takeout to Google Drive, and the backup zip files were saved into the Takeout folder in my root folder.



Note:
Bash runs the cell in a subprocess, so it could take some time to unzip a lot of files.



Monday, June 3, 2019

How to measure latency between Google Apps Script project and Google Cloud Platform regions

If you are debugging your Google Apps Script application, you may find it useful to measure the time spent on requests. There are several regions (currently 20) and zones (currently 61) in Google Cloud Platform. (source)

G Suite and its smart and powerful tool Apps Script run on the same infrastructure. Integration between Apps Script and GCP services (e.g. Vision API, Natural Language API, Cloud Functions) does not go through Google's private network but through the public internet (in other words: you are calling REST APIs with UrlFetchApp). That means it is good to care about your region.


If you would like to get more insight and find out where your application is "hosted", you can use this code snippet.


It measures the latency between your Apps Script application and several GCP regions around the world. UrlFetchApp makes HTTP requests to VMs running in each region.
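A simplified sketch of such a measurement; the endpoint URLs below are placeholders, you would point them at VMs (or ping endpoints) running in the regions you care about:

// Sketch: measure round-trip time from Apps Script to a few regional endpoints.
function measureLatency() {
  var endpoints = {
    'us-east1': 'https://us-east1.example.com/ping',          // placeholder URL
    'europe-west1': 'https://europe-west1.example.com/ping',  // placeholder URL
    'asia-east1': 'https://asia-east1.example.com/ping'       // placeholder URL
  };
  for (var region in endpoints) {
    var start = Date.now();
    UrlFetchApp.fetch(endpoints[region], {muteHttpExceptions: true});
    Logger.log(region + ': ' + (Date.now() - start) + ' ms');
  }
}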

Here is my result:
(Apps Script project timezone: Paris, Europe; computer located in: Prague, Czech Republic)

[19-06-03 13:09:05:115 PDT] us-east1: 8.0 ms
[19-06-03 13:09:05:116 PDT] global: 10.0 ms
[19-06-03 13:09:05:116 PDT] us-east4: 33.0 ms
[19-06-03 13:09:05:117 PDT] northamerica-northeast1: 60.0 ms
[19-06-03 13:09:05:118 PDT] us-central1: 86.0 ms
[19-06-03 13:09:05:118 PDT] us-west1: 142.0 ms
[19-06-03 13:09:05:119 PDT] us-west2: 144.0 ms
[19-06-03 13:09:05:119 PDT] europe-west2: 183.0 ms
[19-06-03 13:09:05:120 PDT] europe-west1: 195.0 ms
[19-06-03 13:09:05:120 PDT] europe-west4: 197.0 ms
[19-06-03 13:09:05:121 PDT] europe-west3: 204.0 ms
[19-06-03 13:09:05:121 PDT] southamerica-east1: 246.0 ms
[19-06-03 13:09:05:122 PDT] europe-north1: 258.0 ms
[19-06-03 13:09:05:122 PDT] asia-northeast1: 319.0 ms
[19-06-03 13:09:05:123 PDT] europe-west6: 343.0 ms
[19-06-03 13:09:05:124 PDT] asia-east1: 389.0 ms
[19-06-03 13:09:05:124 PDT] australia-southeast1: 421.0 ms
[19-06-03 13:09:05:125 PDT] asia-east2: 433.0 ms
[19-06-03 13:09:05:125 PDT] asia-southeast1: 463.0 ms
[19-06-03 13:09:05:125 PDT] asia-south1: 563.0 ms

The inspiration and a browser version are located here: http://www.gcping.com/

Tuesday, January 22, 2019

Deploy Google Apps Script web app as an Android application

The main advantage of serverless Google Apps Script over serverless Cloud Functions is the online editor (IDE), not only for writing server-side code but also client-side code. You can create an HTML web application with HtmlService.


Google has recently introduced (G Suite update 12/2018) an option to deploy a web application as a native Android application for G Suite users. If you are reading this, you are probably a power user, so you can imagine what you are able to do now! Yes, deploy a native Android application with Google Apps Script.

Note:
- To use web apps, your users' devices must have Google Chrome installed

1) Create a new Script at https://script.google.com

2) Insert this code snippet into Code.gs
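The snippet is essentially the standard HtmlService boilerplate; a minimal sketch, assuming the HTML file created in the next step is named view:

// Code.gs: serve the HTML file "view" as a web app.
function doGet() {
  return HtmlService.createHtmlOutputFromFile('view')
      .addMetaTag('viewport', 'width=device-width, initial-scale=1');   // mobile-friendly scaling
}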


3) Create a new HTML file and rename it to "view.html"



4) Edit the HTML for your future über-cool application (for now, just insert Hello World!)
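For example, view.html can stay this minimal for now:

<!DOCTYPE html>
<html>
  <head>
    <base target="_top">
  </head>
  <body>
    Hello World!
  </body>
</html>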

5) Open the menu Publish --> Deploy as web app and copy the URL address



6) It is time to set things up in G Suite. Open the Google Admin console (https://admin.google.com) and pick the category Device management

7) Then choose App Management from the left panel.

8) In App Management you will see a section in the middle of the page: Manage apps for Android devices. This section contains the button Manage whitelisted apps. Click it.



9) Now you are in mobile application management. First, you have to add a new whitelisted application (with the yellow circle button).


10) You will see the Google Play directory. Change the category (left panel) to Web Apps.



11) For now, you have to click the Add button again (the circle button at the bottom right of the page).



12) Now you can define the behaviour of your Android application:
- Title: the name of the application on the home screen
- URL: the URL of your Google Apps Script web app (from step 5)
- Display: I recommend choosing Full screen or Standalone
- Icon: the icon of your Android application


13) After you click the Create button, it will take about 10 minutes to prepare your application.
(There is a small text Not available yet below it during preparation.)

14) When your application is successfully prepared, you can click the desired icon.



15) As the distribution strategy, you can distribute the application to all G Suite users or to a subset according to organizational unit.




16) The last step contains an option to automatically install the application on all devices (this option didn't work for me).



17) When you visit Google Play on your phone, navigate to the Work Apps section. Click the application and install it.

That was the final step. Your Google Apps Script application is now on your mobile home screen.