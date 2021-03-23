



Boost Colab Notebook GPU

Photo by Sigmund on Unsplash

Before delving into the tutorial, let’s talk a little about Jupyter Notebooks.

Jupyter is a free, open source, sharable, interactive web application that allows users to combine code, computational output, visualization, text, and media.

Jupyter notebooks have become the computing tool of choice for data scientists. The name is derived from Julia, Python, and R, three of the more than 40 programming languages currently supported by Jupyter.

Notebook problems

Not everything about notebooks is great. Some issues related to using notebooks are:

Running long asynchronous tasks causes problems in notebooks. Training a model for hours or days, as is common in deep learning, can be tedious, especially when the display times out. The workaround is to checkpoint the model weights regularly and write log files to disk.

Errors are returned when a notebook fails (note that this is a user issue).

Notebooks foster poor software engineering practices, for example a lack of unit tests and not writing code in reusable chunks.

However, notebooks aren’t designed to address these, so many argue that none of these should be considered a problem. Notebooks are great for quick experimentation and prototyping. For anything beyond that, you can move the code into a script and call it from within your notebook.
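The checkpoint-and-log workaround from the list above can be sketched in plain Python. This is only an illustration under stated assumptions: `train_one_epoch`, the `checkpoints` directory, and the file names are hypothetical placeholders, and a real project would more likely rely on its framework's own checkpoint mechanism.

```python
import json
import logging
import tempfile
from pathlib import Path


def train_one_epoch(weights, epoch):
    """Hypothetical stand-in for a real training step; returns new weights and a loss."""
    new_weights = [w * 0.9 for w in weights]
    loss = sum(abs(w) for w in new_weights)
    return new_weights, loss


def train(epochs, checkpoint_dir, log_file):
    """Train for `epochs` epochs, writing a checkpoint and a log line after each one."""
    checkpoint_dir = Path(checkpoint_dir)
    checkpoint_dir.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger("training")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_file)
    logger.addHandler(handler)

    weights = [1.0, -2.0, 0.5]  # made-up initial weights
    try:
        for epoch in range(1, epochs + 1):
            weights, loss = train_one_epoch(weights, epoch)
            checkpoint = checkpoint_dir / f"epoch_{epoch:03d}.json"
            checkpoint.write_text(json.dumps({"epoch": epoch, "weights": weights}))
            logger.info("epoch=%d loss=%.4f checkpoint=%s", epoch, loss, checkpoint)
    finally:
        # Release the log file even if training is interrupted.
        logger.removeHandler(handler)
        handler.close()
    return weights


# Demo run in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    final_weights = train(3, Path(tmp) / "checkpoints", Path(tmp) / "training.log")
    saved = sorted(p.name for p in (Path(tmp) / "checkpoints").iterdir())
    print(saved)
```

Because each epoch leaves a file on disk, a run that outlives the notebook display can be inspected or resumed from the latest checkpoint.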

Jupyter Notebook is a tool for exploration, not production

Google Colaboratory is a free Jupyter-based notebook environment that runs on Google’s cloud servers, allowing users to leverage the hardware provided by Google (CPU, GPU, TPU). The last two are very useful for data scientists and machine learning engineers. Another benefit of using Colab is that you can mount Google Drive and browse it through the file explorer. Colab also comes pre-installed with suites of deep learning libraries such as TensorFlow and Keras.

Google Colab offers a 12GB NVIDIA Tesla K80 GPU with NVIDIA compute capability 3.7 that can be used continuously for up to 12 hours. This is great for experimentation and prototyping, but when working with large datasets and large networks, you quickly run into limitations.

Before getting into the tutorial, it’s important to note that this applies not only to Amazon’s EC2 instances, but also to Google Cloud’s Compute Engine, Microsoft Azure’s virtual machines, and even local setups.

By connecting an external runtime to Colab, you can maintain the Colab interface while using GPU acceleration other than that provided by Colab by default.

With AWS, many people prefer to use SageMaker because it does the essential work of creating an EC2 instance in the background and setting up a Jupyter notebook connection. However, creating an EC2 instance yourself and connecting it to a Colab notebook is cheaper than using SageMaker, and it keeps the Colab interface that many people prefer.

Comparing the costs, an EC2 p3.2xlarge instance in the Ohio region costs $3.06 per hour, while the same instance through SageMaker costs $3.825 per hour in the same region.
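To put those two hourly rates side by side, the arithmetic can be worked through in a few lines. The rates are the ones quoted above; the 12-hour session length is a made-up example, not a figure from AWS.

```python
EC2_HOURLY_USD = 3.06         # p3.2xlarge on EC2, Ohio region (figure from the text)
SAGEMAKER_HOURLY_USD = 3.825  # same instance through SageMaker (figure from the text)


def compare(hours):
    """Return (EC2 cost, SageMaker cost, saving) in USD for a session of `hours` hours."""
    ec2 = EC2_HOURLY_USD * hours
    sagemaker = SAGEMAKER_HOURLY_USD * hours
    return ec2, sagemaker, sagemaker - ec2


# Relative premium of SageMaker over plain EC2, in percent.
premium_pct = (SAGEMAKER_HOURLY_USD / EC2_HOURLY_USD - 1) * 100

# Hypothetical 12-hour training session.
ec2_cost, sagemaker_cost, saving = compare(hours=12)
print(f"SageMaker premium: {premium_pct:.0f}%")
print(f"12 hours: EC2 ${ec2_cost:.2f}, SageMaker ${sagemaker_cost:.2f}, saving ${saving:.2f}")
```

At these rates SageMaker costs $0.765 more per hour, a 25% premium over the bare EC2 instance.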

Let’s dive in.

Log in to the AWS Management Console.

In the upper left corner, click [Services] and choose [EC2].

EC2 dashboard

In the left pane, select [Instances] and choose [Launch Instances].

EC2 left pane

Select an image type. This determines the OS and software versions that will be pre-installed. Here, select an image with Ubuntu 18.04 and CUDA installed alongside TensorFlow 2.4.1.

AMI images

Then select the instance type. This determines the type of hardware you will get. For this tutorial, select a p3.2xlarge instance. This includes a single NVIDIA V100 GPU, 8 vCPUs, and 61GB of RAM.

If you are launching a GPU instance for the first time, keep in mind that GPU instances are not available by default. See the included link for information on how to increase your GPU instance limit on AWS. In addition, GPU instances are not free. See this link for on-demand pricing. Please note that prices vary by region.

If you want to set additional details such as storage and network options, select [Configure Instance Details]. To keep the defaults, choose [Review and Launch].

Instance type

Select an existing key pair or create a new key pair. This acts as the credential used to log on to the server. If a new one is created, the .pem file is downloaded to your local computer.

Check the details and click [Launch].

Key pair

In the left pane, click [Instances] to return to the instance dashboard. This lists all instances, including the one you just created. The created instance looks like this. Wait until the instance state is displayed as running and the status check shows 2/2 checks passed. This means that the instance is up and running.

Click the instance ID. The following screen will be displayed. Next, click [Connect].

Connect via SSH

Click the [SSH client] tab to view the instance’s SSH login details.

SSH details

However, in this tutorial, you will use the PuTTY SSH client to log in to your instance. You can download PuTTY here. The main reason I used PuTTY is to be able to configure SSH tunneling. This is required to connect Colab to the EC2 runtime.

Before setting up PuTTY, recall that the key file downloaded from AWS a few steps ago is in .pem format. You need to convert this to a format that PuTTY can recognize. The PuTTY installation includes another program called PuTTYgen. This tool can convert .pem files to .ppk files. Open PuTTYgen and select [Load] to load a private key.

PuTTYgen

Select the .pem file to load. To save the loaded .pem file in .ppk format, choose [File] > [Save private key].

PuTTYgen save as .ppk

Now that the key file is in the correct format, you can set up an SSH connection with PuTTY.

A few steps ago, the [SSH client] tab displayed the SSH login information for the EC2 instance. You will need it now. (Note that the SSH login information changes each time you restart the instance, so be sure to copy it again.)

Specify a name in the [Saved Sessions] text box and click [Save] to save the details in a profile. In the example below, aws was chosen as the profile name.

PuTTY should be configured as follows:

PuTTY profile

After saving, click on the profile name and click [Load].

SSH tunneling

In the navigation tree on the left, choose [Connection] > [SSH] > [Tunnels].

Enter 8888 for the source port (this is the port on which Jupyter will be served, as you will specify later). Then enter 127.0.0.1:8888 as the destination and click [Add]. The final result looks like this:

SSH tunneling

This can also be saved in the profile so you don’t have to enter it every time. Go to [Session], select the profile, and click [Save].

You are now ready to log in to your EC2 instance using PuTTY. In the navigation tree on the left, go to [Connection] > [SSH] > [Auth], click [Browse] to select the .ppk file, and click [Open].

PuTTY authentication

If you log in successfully, PuTTY will look like this. You now have a command line to control your EC2 instance. As long as the SSH terminal window is open, all traffic from the source port will be forwarded to the destination port.

EC2 terminal
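For readers who are not on PuTTY, the same port forwarding can be expressed with the OpenSSH client and its `-L` option, which works directly with the .pem file. The sketch below only builds and prints the command. The key file name, the `ubuntu` user, and the host name are hypothetical placeholders; the real values are the ones shown on the [SSH client] tab.

```python
import shlex


def build_tunnel_command(key_path, user, host, local_port=8888, remote_port=8888):
    """Build an OpenSSH command that forwards local_port to remote_port on the instance."""
    forward = f"{local_port}:127.0.0.1:{remote_port}"
    return ["ssh", "-i", key_path, "-L", forward, f"{user}@{host}"]


# Hypothetical values; copy the real user and host from the EC2 SSH client tab.
command = build_tunnel_command("my-key.pem", "ubuntu", "ec2-host.example.com")
print(shlex.join(command))
```

As with PuTTY, the forwarding lasts only as long as that SSH session stays open.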

If you don’t already have Jupyter installed, you’ll need to install it from the command line. Follow this link.

To connect to Colab, you next need to install the jupyter_http_over_ws extension created by Google Colaboratory. Run the following commands on the command line:

pip install jupyter_http_over_ws
jupyter serverextension enable --py jupyter_http_over_ws

Start the Jupyter Notebook service on your EC2 instance using the following command:

jupyter notebook --NotebookApp.allow_origin='https://colab.research.google.com' --port=8888 --NotebookApp.port_retries=0 --NotebookApp.disable_check_xsrf=True

Port 8888 is specified here because it is the same port that was specified during the SSH tunneling configuration.

If all goes well, the server’s address will be printed on the terminal screen as a URL. Copy this address, as you will need it in the next step.

Notebook terminal
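Before switching to Colab, it can help to check from the local machine that something is listening on port 8888. The helper below is a small sketch using only the standard library. Note its limits: a successful connection shows that the local end of the SSH tunnel accepts connections, and it does not by itself prove that Jupyter is answering on the instance.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


print("Port 8888 reachable:", is_port_open("127.0.0.1", 8888))
```

If this prints False, the usual suspects are a closed PuTTY window or a tunnel configured on a different port.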

Open Google Colab and, in the upper right corner, choose [Connect to a local runtime].

Colab local runtime

Now paste the address copied from the terminal and click [Connect].

Colab backend URL
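The address Jupyter prints normally takes the form of a URL carrying a `token` query parameter. As a sanity check before pasting it into Colab, the URL can be split into its parts with the standard library. The sample URL below is made up for illustration; the real one comes from the terminal.

```python
from urllib.parse import parse_qs, urlparse


def describe_backend_url(url):
    """Split a Jupyter server URL into host, port, and access token (None if absent)."""
    parts = urlparse(url)
    token = parse_qs(parts.query).get("token", [None])[0]
    return {"host": parts.hostname, "port": parts.port, "token": token}


# Made-up example URL; the real one is printed by `jupyter notebook` on startup.
sample_url = "http://localhost:8888/?token=abc123"
info = describe_backend_url(sample_url)
print(info)
```

The port should match the 8888 used for the tunnel, and a missing token usually means the URL was copied only in part.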

The final result looks like this. Notice the Local next to Connected. This means that Colab is connected to a runtime other than Colab’s default. You can also run some tests to confirm that the GPU has been detected.

Colab notebook and device list

You can see below that the GPU detected is the Tesla V100, which is the GPU provided by the AWS p3.2xlarge instance type.

Colab Tesla V100
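A GPU check of this kind can be sketched as follows, assuming the TensorFlow 2.x installation that ships with the chosen image. This is not necessarily the exact test shown in the screenshots. The helper returns an empty list when TensorFlow is not installed, so it can be run anywhere.

```python
def list_gpu_names():
    """Return names of GPUs visible to TensorFlow, or [] if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpu_names = list_gpu_names()
print(gpu_names if gpu_names else "No GPU visible to TensorFlow")
```

Run in a Colab cell while connected to the local runtime, a non-empty list indicates that the EC2 instance's GPU is visible to TensorFlow.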

A quick note: after connecting to the local runtime, Colab will not be able to access Google Drive. One possible workaround is PyDrive.

Don’t forget to stop your EC2 instance when you’re done using it.

