Accelerate machine learning model inference with Google Cloud Dataflow using NVIDIA GPUs


Today, Google Cloud announced that, in partnership with NVIDIA, Dataflow is bringing GPUs to big data processing. Dataflow GPUs let users apply NVIDIA GPU acceleration to machine learning inference workflows. This post shows how to get these performance benefits with BERT.

Google Cloud Dataflow is a managed service for running a variety of data processing patterns, including both streaming and batch analytics. GPU support was recently added to speed up machine learning inference workflows that run in Dataflow pipelines.

Check out the Google Cloud launch post for more new features. In this post, we show the performance benefits and total cost of ownership (TCO) improvement of NVIDIA GPU acceleration by deploying a Bidirectional Encoder Representations from Transformers (BERT) model, fine-tuned for a question answering task, on Dataflow. We cover TensorFlow inference in Dataflow on CPUs, how to run the same code on GPUs for a significant performance improvement, and the best performance obtained after converting the model with NVIDIA TensorRT and deploying it through the TensorRT Python API with Dataflow. Check out the NVIDIA sample code to try it now.

Figure 1. Dataflow architecture and GPU runtime.

This post walks through several steps. We start by creating an environment on the local machine from which to launch all of the Dataflow jobs. See the Dataflow Python quickstart guide for more information.

Creating an environment

We recommend that you create a virtual environment for Python. Here we use virtualenv.

virtualenv -p <python-version> <env-name>

If you use Dataflow, you need to match the Python version of your development environment with the Python version of the Dataflow runtime. Specifically, when running the Dataflow pipeline, you should use the same Python version and Apache Beam SDK version to avoid unexpected errors.
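The version check described above can be automated before a job is launched. The following is a minimal sketch, not part of the sample code: the helper name python_matches is hypothetical, and the expected version (3.6) is an assumption inferred from the apache/beam_python3.6_sdk image that the Dockerfile in this post copies from.

```python
import sys

# Assumption: the Dataflow workers run Python 3.6, inferred from the
# apache/beam_python3.6_sdk image referenced in this post's Dockerfile.
EXPECTED_PYTHON = (3, 6)


def python_matches(expected, actual=None):
    """Return True when the major.minor version equals `expected`."""
    if actual is None:
        actual = sys.version_info[:2]
    return tuple(actual[:2]) == tuple(expected)


if __name__ == "__main__":
    if python_matches(EXPECTED_PYTHON):
        print("Local Python matches the Dataflow runtime.")
    else:
        found = "%d.%d" % tuple(sys.version_info[:2])
        print("Version mismatch: local Python is", found)
```

Running a check like this in the launch script turns a confusing runtime failure on the workers into an early, local message.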

Then activate the virtual environment.

source <env-name>/bin/activate

Before activating a virtual environment, make sure that you are not already working inside another one, because nesting virtual environments usually causes problems.

After activating the virtual environment, install the required packages. Although the jobs run on Dataflow, a few packages are needed locally so that Python can import them when you launch the code from your machine.

pip install apache-beam[gcp]
pip install tensorflow==2.3.1

You can try a different version of TensorFlow, but the key is to match the local version with the version used in your Dataflow environment. You also need Apache Beam and its Google Cloud components, which the apache-beam[gcp] package provides.
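One way to confirm that the local packages match the versions used on Dataflow is to compare installed versions against a set of pins. The sketch below is illustrative only: the function name find_mismatches is hypothetical, the pins are the versions mentioned in this post, and importlib.metadata assumes Python 3.8 or later (on older interpreters the importlib_metadata backport offers the same calls).

```python
from importlib import metadata  # standard library since Python 3.8

# Pins taken from this post; adjust them to whatever your Dataflow image uses.
PINS = {"apache-beam": "2.26.0", "tensorflow": "2.3.1"}


def find_mismatches(pins, get_version=metadata.version):
    """Map each mismatched package to (expected, installed or None)."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed locally
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in find_mismatches(PINS).items():
        print("%s: expected %s, found %s" % (name, expected, installed))
```

The get_version parameter exists only so the lookup can be replaced in tests; in normal use the default is enough.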

Get a fine-tuned BERT model

NVIDIA NGC offers a wealth of resources, from GPU-optimized containers to fine-tuned models. This post uses several of them.

The first resource is a BERT large model, with 340 million parameters, fine-tuned for the SQuAD v2 question answering task. The following command downloads the BERT model.

wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/bert_tf_savedmodel_large_qa_squad2_amp_384/versions/19.03.0/zip -O bert_tf_savedmodel_large_qa_squad2_amp_384_19.03.0.zip

The downloaded BERT model was trained with automatic mixed precision (AMP) and has a sequence length of 384.

You also need a vocabulary file, which is included in a BERT checkpoint that you can download from NGC with the following command:

wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/bert_tf_ckpt_large_qa_squad2_amp_128/versions/19.03.1/zip -O bert_tf_ckpt_large_qa_squad2_amp_128_19.03.1.zip

Once you have these resources, unzip them and place them in your working folder. Because we use a custom Docker container, these models are included in the image.
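The unzip step can also be scripted. This is a small sketch using only the Python standard library; the function name extract_archives and the choice to unpack each archive into a folder named after the zip file are assumptions, so check the layout that the sample code expects before relying on it.

```python
import zipfile
from pathlib import Path


def extract_archives(archives, workdir):
    """Unpack each zip archive into its own subfolder of `workdir`."""
    workdir = Path(workdir)
    extracted = []
    for archive in archives:
        archive = Path(archive)
        target = workdir / archive.stem  # folder named after the archive
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append(target)
    return extracted


if __name__ == "__main__":
    # File names as downloaded by the wget commands in this post.
    extract_archives(
        [
            "bert_tf_savedmodel_large_qa_squad2_amp_384_19.03.0.zip",
            "bert_tf_ckpt_large_qa_squad2_amp_128_19.03.1.zip",
        ],
        ".",
    )
```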

Custom Dockerfile

We use a custom Dockerfile derived from a GPU-optimized NGC TensorFlow container. The NGC TensorFlow (TF) containers are a great option for accelerating TF models with NVIDIA GPUs.

We then add a few more steps to copy these models and files into the image. The full Dockerfile is part of the NVIDIA sample code; a snapshot follows.

FROM nvcr.io/nvidia/tensorflow:20.11-tf2-py3
RUN pip install --no-cache-dir apache-beam[gcp]==2.26.0 ipython pytest pandas && \
    mkdir -p /workspace/tf_beam
COPY --from=apache/beam_python3.6_sdk:2.26.0 /opt/apache/beam /opt/apache/beam
ADD . /workspace/tf_beam
WORKDIR /workspace/tf_beam
ENTRYPOINT ["/opt/apache/beam/boot"]

The next step is to build the Docker image and push it to Google Container Registry (GCR). You can do this with the following commands. Alternatively, you can use the build_and_push.sh script from the repository by running bash build_and_push.sh.

project_id=""
docker build . -t "gcr.io/${project_id}/tf-dataflow-${USER}:latest"
docker push "gcr.io/${project_id}/tf-dataflow-${USER}:latest"

Running the job

If you have already authenticated your Google account, you can run the provided Python files by calling run_cpu.sh. The run_gpu.sh script is available in the same repository.

CPU TensorFlow inference in Dataflow (TF-CPU)

The bert_squad2_qa_cpu.py file in the repository answers questions based on a descriptive text document. The batch size is 16, meaning that each inference call answers 16 questions. The job processes 16,000 questions in total (1,000 batches). Keep in mind that BERT can be fine-tuned for other tasks, depending on your use case.
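To make the batching arithmetic concrete, the sketch below splits 16,000 placeholder questions into batches of 16. It illustrates the grouping only and is not how the sample pipeline batches its input; the helper name make_batches and the placeholder questions are hypothetical.

```python
def make_batches(items, batch_size=16):
    """Split `items` into consecutive lists of at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Placeholder questions; the real job reads its own questions and context.
questions = ["question %d" % i for i in range(16000)]
batches = make_batches(questions)

# 16,000 questions at 16 per inference call give 1,000 calls.
print(len(batches), len(batches[0]))  # prints "1000 16"
```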

When you run a job on Dataflow, it autoscales by default based on real-time CPU utilization. To disable this feature, set autoscaling_algorithm to NONE, which lets you fix the number of workers used for the duration of the job. Alternatively, you can let Dataflow autoscale the job and cap the number of workers by setting the max_num_workers parameter.

We recommend setting the job_name parameter instead of relying on the auto-generated name, which makes the job easier to track. The job name becomes the prefix of the names of the compute instances that run the job.
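The scaling and naming options above are passed to the pipeline as command-line flags. The following sketch only assembles such a flag list; the helper dataflow_args and the job name are hypothetical, and a real job needs further flags (for example project, region, and temp_location) that are omitted here.

```python
def dataflow_args(job_name, num_workers=None, max_num_workers=None):
    """Build command-line flags for a Dataflow job.

    Pass num_workers to pin a fixed worker count (autoscaling off), or
    max_num_workers to keep autoscaling on but capped.
    """
    args = ["--runner=DataflowRunner", "--job_name=%s" % job_name]
    if num_workers is not None:
        args += ["--autoscaling_algorithm=NONE", "--num_workers=%d" % num_workers]
    elif max_num_workers is not None:
        args.append("--max_num_workers=%d" % max_num_workers)
    return args


# Hypothetical job name; two fixed workers, as in the TF-CPU run of this post.
print(dataflow_args("bert-qa-tf-cpu", num_workers=2))
```

A list built this way can be handed to the pipeline script as its argument vector, so the fixed and autoscaled variants differ only in one call.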

Acceleration by GPU (TF-GPU)

To run the same TensorFlow inference job on Dataflow with GPU support, set the following parameter. See the Dataflow GPU documentation for more information.

--experiment "worker_accelerator=type:nvidia-tesla-t4;count:1;install-nvidia-driver"

The preceding parameter attaches an NVIDIA T4 Tensor Core GPU to the Dataflow worker VM, which also appears as a Compute Engine VM instance running the job. Dataflow automatically installs the required NVIDIA drivers, which support CUDA 11.
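The value of the experiment flag is a semicolon-separated list of settings, which is easy to mistype. As a hedged sketch, the helper below (a hypothetical name, not part of Beam or the sample code) assembles the flag from its parts; only the T4 type, the count of 1, and the driver option come from this post.

```python
def worker_accelerator_args(gpu_type="nvidia-tesla-t4", count=1, install_driver=True):
    """Return the --experiment flag and value that request GPUs on workers."""
    if count < 1:
        raise ValueError("count must be at least 1")
    parts = ["type:%s" % gpu_type, "count:%d" % count]
    if install_driver:
        parts.append("install-nvidia-driver")
    return ["--experiment", "worker_accelerator=" + ";".join(parts)]


print(worker_accelerator_args())
```

With the defaults, the returned value matches the parameter shown above.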

The bert_squad2_qa_gpu.py file is almost identical to the bert_squad2_qa_cpu.py file, which means that you can run the job on an NVIDIA GPU with little or no code change. This example adds some GPU setup, such as enabling memory growth with the following code.

import tensorflow as tf

physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], True)

Inference with NVIDIA optimized libraries

NVIDIA TensorRT optimizes deep learning models for inference, delivering low latency and high throughput. Here, we apply TensorRT optimization to the BERT model and use it to answer questions in a Dataflow pipeline on GPUs. You can follow the TensorRT demo BERT GitHub repository.

We also use Polygraphy, a high-level Python API for TensorRT, to load the TensorRT engine file and run inference. In the Dataflow code, the TensorRT model is encapsulated in a shared utility class so that all threads in a Dataflow worker process can use it.
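The idea of sharing one loaded model among all threads of a worker process can be illustrated without Beam or TensorRT. The sketch below is a stand-alone illustration of the pattern using a lock and lazy initialization. It is not the shared utility class used in the sample code, and load_engine is a hypothetical stand-in for loading a TensorRT engine.

```python
import threading


class SharedResource:
    """Create an expensive object once and hand the same instance to all threads."""

    def __init__(self, factory):
        self._factory = factory
        self._lock = threading.Lock()
        self._instance = None

    def get(self):
        with self._lock:  # only one thread at a time may run the factory
            if self._instance is None:
                self._instance = self._factory()
            return self._instance


load_count = 0


def load_engine():
    """Stand-in for deserializing a TensorRT engine (hypothetical)."""
    global load_count
    load_count += 1
    return object()


shared_engine = SharedResource(load_engine)
results = []
threads = [
    threading.Thread(target=lambda: results.append(shared_engine.get()))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Eight threads asked for the engine, but it was loaded once and shared.
print(load_count, len(set(map(id, results))))  # prints "1 1"
```

Loading once per process matters for large models: each extra copy would consume GPU memory and repeat the load time.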

Comparison of CPU and GPU execution

Table 1 shows the total run time and the resources used for the sample runs. The final cost of a Dataflow job is a linear combination of total vCPU time, total memory time, and total hard disk (HDD PD) usage. For GPU jobs, there is also a GPU component.

Framework | Machine | Workers count | Total execution time | Total vCPU time (vCPU hr) | Total memory time (GB hr) | Total HDD PD time (GB hr) | TCO improvement
TF-CPU | n1-standard-8 | 2 | 2:46:00 | 43.5 | 163.13 | 1359.4 | 1x
TF-GPU | n1-standard-4 + T4 | 1 | 0:35:51 | 2.25 | 8.44 | 140.64 | 9.2x
TensorRT | n1-standard-4 + T4 | 1 | 0:09:51 | 0.53 | 1.99 | 33.09 | 38x

Table 1. Total run time and resource usage for sample TF-CPU, TF-GPU, and TensorRT runs.

The numbers in the table come from actual runs and may vary slightly from run to run, but in our experiments the ratios did not change much.
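The linear cost model mentioned before the table can be written as a short function. This is a sketch under loud assumptions: the class and field names are hypothetical, and the unit prices below are made-up placeholders rather than Google Cloud prices. Only the TF-CPU usage figures are taken from the table.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    vcpu_hr: float
    memory_gb_hr: float
    disk_gb_hr: float
    gpu_hr: float = 0.0  # zero for CPU-only jobs


@dataclass
class Prices:
    per_vcpu_hr: float
    per_memory_gb_hr: float
    per_disk_gb_hr: float
    per_gpu_hr: float


def job_cost(usage, prices):
    """Linear combination of resource usage and unit prices."""
    return (
        usage.vcpu_hr * prices.per_vcpu_hr
        + usage.memory_gb_hr * prices.per_memory_gb_hr
        + usage.disk_gb_hr * prices.per_disk_gb_hr
        + usage.gpu_hr * prices.per_gpu_hr
    )


# TF-CPU usage as read from the table; the prices are HYPOTHETICAL placeholders.
tf_cpu = Usage(vcpu_hr=43.5, memory_gb_hr=163.13, disk_gb_hr=1359.4)
placeholder_prices = Prices(
    per_vcpu_hr=0.05, per_memory_gb_hr=0.005,
    per_disk_gb_hr=0.0001, per_gpu_hr=0.5,
)
print("Illustrative cost: %.2f" % job_cost(tf_cpu, placeholder_prices))
```

Substituting current prices for your region gives an estimate that can be compared across the three configurations.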

Accelerating the model with an NVIDIA GPU (TF-GPU) instead of a CPU (TF-CPU) shortens the run from 2:46:00 to 0:35:51, about 4.6 times faster, and improves TCO by 9.2x according to the table. In short, using NVIDIA GPUs for inference on this task reduces both execution time and cost compared to running the model on CPUs only.

NVIDIA-optimized inference libraries such as TensorRT enable users to run larger and more complex models on Dataflow GPUs. TensorRT makes the same job 3.6 times faster and 4.2 times cheaper than TF-GPU. Compared with TF-CPU, TensorRT takes about 1/17 of the execution time and about 1/38 of the cost.
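The speedup figures quoted above can be rechecked from the execution times in the table. The sketch below uses the run times as read from the table; the helper names are hypothetical.

```python
def to_seconds(hms):
    """Convert an H:MM:SS string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Total execution times as read from the table in this post.
RUN_TIMES = {"TF-CPU": "2:46:00", "TF-GPU": "0:35:51", "TensorRT": "0:09:51"}


def speedup(baseline, candidate):
    """How many times faster `candidate` ran than `baseline`."""
    return to_seconds(RUN_TIMES[baseline]) / to_seconds(RUN_TIMES[candidate])


print(round(speedup("TF-CPU", "TF-GPU"), 1))    # 4.6
print(round(speedup("TF-GPU", "TensorRT"), 1))  # 3.6
print(round(speedup("TF-CPU", "TensorRT")))     # 17
```

The cost ratios cannot be recomputed the same way without unit prices. Dividing the rounded TCO improvements (38 by 9.2) gives about 4.1, which is close to the quoted 4.2 given the rounding.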

Summary

In this post, we compared the performance of TF-CPU, TF-GPU, and TensorRT inference for question answering tasks running on Google Cloud Dataflow. Dataflow users can benefit greatly from leveraging GPU workers and NVIDIA-optimized libraries.

Speeding up deep learning inference with NVIDIA GPUs and NVIDIA software is straightforward: you can run the model with TF-GPU or TensorRT by adding or changing a few lines. For reference, the scripts and source files are provided in the NVIDIA sample code.

Acknowledgments

We thank Shan Kulandaivel, Valentyn Tymofieiev, and Reza Rokni of the Google Cloud Dataflow team, and Jill Milton and Fraser Gardiner of NVIDIA, for their support and valuable feedback.

Source

https://developer.nvidia.com/blog/accelerating-machine-learning-model-inference-on-google-cloud-dataflow-with-nvidia-gpus/
