Monday, February 24, 2020

Extract archived files directly from Google Drive with the serverless tool Google Colab

Today I will show you how to extract files from zip archives directly in Google Drive without any external third-party tools. Everything runs serverless and in the cloud :-)

Google Colab is a pretty neat tool for data scientists who run and manage Jupyter notebooks in the cloud. I prefer Google Colab because it supports a Python 3 runtime on CPU, GPU or TPU hardware. You can also mount your Google Drive as an external drive.

Colab prepares a virtual machine with a Jupyter notebook for you. A notebook contains several cells, where you can write Python code and run it with the "Play" button.

There are also the so-called "magic" commands, a bit of syntactic sugar that lets you use more advanced features. One of them is %%bash, which runs the rest of the cell as bash commands.

When we combine the above knowledge, we get a tool as powerful as a Swiss Army knife:

1. Open Google Colab and create a new notebook.

2. Rename the notebook. In the top right corner, click the Connect button, which starts your virtual machine.

3. Now it is time to mount your Google Drive. Click the Mount Drive button. You have to authorize the Colab application to access your account. The path to your files is drive/My Drive/
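If you prefer to mount the drive from code instead of the button, a minimal sketch could look like the one below. It assumes the usual Colab mount point /content/drive (not stated above), and the helper name mount_drive is made up for illustration. The google.colab module exists only inside a Colab runtime, so the function simply returns False anywhere else.

```python
# Sketch: mount Google Drive from a Colab cell instead of clicking the button.
# MOUNT_POINT is an assumption (Colab's conventional location), and
# mount_drive is a hypothetical helper name used only in this example.
MOUNT_POINT = "/content/drive"


def mount_drive(mount_point=MOUNT_POINT):
    """Mount Google Drive when running in Colab; return False elsewhere."""
    try:
        from google.colab import drive  # only available inside Colab
    except ImportError:
        return False
    drive.mount(mount_point)  # asks you to authorize access to your account
    return True
```

After calling mount_drive() in a cell, your files should be reachable under /content/drive/My Drive/ (depending on the Colab version, the folder may appear as MyDrive instead).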

Copy this snippet of code into your Colab cell

There are two commands: change directory and unzip the desired files.
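For comparison, the same two steps can be sketched in plain Python rather than bash. This is only an illustration, not the bash snippet itself: the function name extract_archives and the example path are assumptions based on the Takeout example in this post.

```python
import zipfile
from pathlib import Path


def extract_archives(folder):
    """Extract every .zip file in `folder` into that same folder.

    Returns the names of the archives that were extracted.
    """
    folder = Path(folder)
    extracted = []
    for archive in sorted(folder.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(folder)  # files land next to the archive
        extracted.append(archive.name)
    return extracted


# Hypothetical usage in a Colab cell after mounting the drive:
# extract_archives("/content/drive/My Drive/Takeout")
```

The bash version is shorter, but the Python version gives you the list of processed archives back, which is handy when there are many of them.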

Click the "Play" button to run the selected cell (= snippet of code). The files will be extracted into the same location as the archives.

In my example, I ran Google Takeout with my Google Drive as the destination, and the backup zip files were saved into the Takeout folder in my root folder.

The %%bash magic runs the cell in a subprocess, so it can take some time to unzip a lot of files.