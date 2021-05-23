



Ahead of Google I/O, Google Research released MoveNet, a new pose detection model, for TensorFlow.js. The ultra-fast, accurate model detects 17 keypoints on the human body. MoveNet is currently available on TF Hub in two variants, Lightning and Thunder.

Lightning is intended for latency-critical applications, while Thunder is intended for applications that require higher accuracy. Google says both models run faster than real time (30+ frames per second, or FPS) on most modern desktops, laptops, and phones.
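Reading the "30+ FPS" claim is simple arithmetic: to sustain 30 FPS, each frame must be fully processed in roughly 33 ms. The helpers below are a minimal sketch of that calculation; the function names are hypothetical, and the sketch assumes frames are processed back to back, ignoring camera capture and rendering time, which also consume part of the budget.

```python
def frame_budget_ms(target_fps: float) -> float:
    """Milliseconds available per frame to sustain target_fps."""
    if target_fps <= 0:
        raise ValueError("target_fps must be positive")
    return 1000.0 / target_fps


def fps_from_latency(latency_ms: float) -> float:
    """Frames per second implied by a per-frame latency, if frames run back to back."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return 1000.0 / latency_ms


def is_real_time(latency_ms: float, target_fps: float = 30.0) -> bool:
    """True if one inference fits inside the per-frame budget for target_fps."""
    return latency_ms <= frame_budget_ms(target_fps)


if __name__ == "__main__":
    print(f"30 FPS budget: {frame_budget_ms(30):.1f} ms per frame")
    print(f"25 ms per inference -> {fps_from_latency(25):.0f} FPS")
```

In practice the model's latency has to sit comfortably below the budget, since the rest of the application pipeline shares the same frame time.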

The model runs entirely in the browser with TensorFlow.js: after the initial page load, it needs no server calls and no external packages. A live demo is available here.

MoveNet tracks keypoints through fast movements and asymmetrical poses. (Source: TensorFlow)

Currently, MoveNet detects a single person in the camera's field of view. Google Research plans to extend MoveNet to the multi-person domain so that developers can support applications involving multiple people.

How is MoveNet different from other models?

OpenPose, VIBE, and Adobe's BodyNet are other major players in this area. Human pose estimation has come a long way in the last five years, but it has not yet surfaced in many applications. That is because most of the effort has gone into making pose models larger and more accurate, rather than into the engineering work needed to make them fast and deployable everywhere.

Google designed MoveNet to leverage the best aspects of state-of-the-art architectures while keeping inference time as short as possible. As a result, the model can deliver accurate keypoints across a wide variety of poses, environments, and hardware setups.

Recently, Google Research worked with IncludeHealth, an Ohio-based medical technology company, to provide remote care to patients. Using MoveNet, the company developed an interactive web application that guides patients through a variety of routines on a laptop, smartphone, or tablet. The routines are built digitally and prescribed by physical therapists to test balance, strength, and range of motion.

Visual of the IncludeHealth demo application running in the browser, assessing balance and motion through keypoint estimation with TensorFlow.js and MoveNet. (Source: TensorFlow)

Google gave IncludeHealth access to MoveNet through the new pose detection API, and IncludeHealth integrated the model into its application.

Ryan Eder, founder and CEO of IncludeHealth, said MoveNet combines the speed and accuracy needed to deliver prescriptive care. While other models trade one for the other, Eder said, this unique balance has unlocked the next generation of care delivery.

By tracking users in real time (30+ FPS), MoveNet lets instructors and experts bring fitness, dance, physiotherapy, and yoga sessions online and give personalized feedback.

While many pose detectors are good enough for simple movements, complex poses remain challenging, even for state-of-the-art detectors trained on the wrong data. Google claims MoveNet delivers fast, accurate results regardless of body posture.

Comparison of traditional detectors (top) and MoveNet (bottom) on difficult poses and routines. (Source: TensorFlow)

MoveNet Architecture

MoveNet is a bottom-up estimation model that uses heatmaps to localize human keypoints. The architecture consists of a feature extractor and a set of prediction heads. The prediction scheme loosely follows CenterNet, with notable changes that improve both speed and accuracy. All models are trained with the TensorFlow Object Detection API.
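At its simplest, heatmap-based localization means finding the highest-scoring cell in a keypoint's heatmap and mapping that cell back to image coordinates. The sketch below illustrates the idea in plain Python; it is not MoveNet's code, and both the output stride of 4 and the convention of mapping a cell to the center of its image patch are illustrative assumptions.

```python
from typing import List, Tuple


def heatmap_peak(heatmap: List[List[float]]) -> Tuple[int, int, float]:
    """Return (row, col, score) of the highest-scoring cell in a 2D heatmap."""
    best = (0, 0, float("-inf"))
    for r, row in enumerate(heatmap):
        for c, score in enumerate(row):
            if score > best[2]:
                best = (r, c, score)
    return best


def peak_to_pixels(row: int, col: int, stride: int = 4) -> Tuple[float, float]:
    """Map a heatmap cell to (y, x) image pixels.

    Assumption: each cell covers a stride x stride patch of the input image,
    and the keypoint is placed at the center of that patch.
    """
    return ((row + 0.5) * stride, (col + 0.5) * stride)


if __name__ == "__main__":
    hm = [
        [0.0, 0.1, 0.0],
        [0.2, 0.9, 0.3],
        [0.0, 0.1, 0.0],
    ]
    r, c, score = heatmap_peak(hm)
    print(r, c, score, peak_to_pixels(r, c))
```

A pure argmax can only be as precise as the heatmap grid, which is why heatmap-based models typically pair it with a sub-pixel offset prediction.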

The feature extractor is MobileNet V2 with an attached feature pyramid network (FPN), which produces a high-resolution, semantically rich feature map. Four prediction heads are attached to the feature extractor, and they predict:

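- Person center heatmap: the geometric center of each person instance
- Keypoint regression field: the full set of keypoints for a person, used to group keypoints into instances
- Person keypoint heatmap: the location of all keypoints, independent of person instances
- 2D per-keypoint offset field: local offsets from each feature map pixel to the precise sub-pixel location of each keypoint

MoveNet architecture (Source: TensorFlow)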
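The four heads only become keypoints after a decoding step. The sketch below shows one plausible single-person decoding that uses all four outputs: pick the strongest person center, read a coarse keypoint guess from the regression field, favor keypoint-heatmap peaks close to that guess, and refine the result with the offset field. It is a simplified illustration in plain Python, not MoveNet's production post-processing; the nested-list layouts, the inverse-distance weighting, and the `eps` parameter are all assumptions made for this example, and outputs are in feature-map grid units (multiply by the output stride for pixels).

```python
import math
from typing import List, Tuple

Grid = List[List[float]]
Vec = Tuple[float, float]


def argmax2d(grid: Grid) -> Tuple[int, int]:
    """Return (row, col) of the largest value in a 2D grid."""
    best_r, best_c, best = 0, 0, float("-inf")
    for r, row in enumerate(grid):
        for c, v in enumerate(row):
            if v > best:
                best_r, best_c, best = r, c, v
    return best_r, best_c


def decode_single_person(
    center: Grid,                       # H x W person center heatmap
    regression: List[List[List[Vec]]],  # H x W x K (dy, dx) from a cell to each keypoint
    kp_heatmaps: List[Grid],            # K x H x W keypoint heatmaps
    offsets: List[List[List[Vec]]],     # K x H x W sub-cell (dy, dx) refinements
    eps: float = 1.0,                   # hypothetical smoothing term for the weighting
) -> List[Tuple[float, float, float]]:
    """Return (y, x, score) in grid units for each of the K keypoints."""
    # 1. Pick the most confident person center.
    cr, cc = argmax2d(center)
    results = []
    for k, heat in enumerate(kp_heatmaps):
        # 2. Coarse keypoint guess, regressed from the center cell.
        dy, dx = regression[cr][cc][k]
        ry, rx = cr + dy, cc + dx
        # 3. Down-weight heatmap peaks far from the guess so that a stronger
        #    but distant peak does not win.
        weighted = [
            [v / (math.hypot(r - ry, c - rx) + eps) for c, v in enumerate(row)]
            for r, row in enumerate(heat)
        ]
        kr, kc = argmax2d(weighted)
        # 4. Refine to sub-cell precision with the local offset.
        oy, ox = offsets[k][kr][kc]
        results.append((kr + oy, kc + ox, heat[kr][kc]))
    return results
```

On a toy 3×3 grid where a stronger distractor peak sits in one corner and the true keypoint sits near the regressed guess, the weighting in step 3 selects the nearby peak even though a plain argmax over the keypoint heatmap would pick the distractor.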

MoveNet was trained on two datasets: COCO and an internal Google dataset called Active. COCO is the standard benchmark for detection, but it is not well suited to fitness and dance applications, which involve challenging poses and significant motion blur. Active was created by labeling keypoints (using COCO's standard 17 body keypoints) on yoga, fitness, and dance videos from YouTube. Details of the model can be found here.
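Because Active reuses COCO's 17 body keypoints, downstream code can refer to MoveNet's outputs by COCO keypoint name. The sketch below assumes the single-pose output layout described on the TF Hub model page, a `[1, 1, 17, 3]` tensor of normalized `(y, x, score)` triplets in COCO keypoint order, passed here as nested Python lists so the example has no TensorFlow dependency. Check that layout against the model card before relying on it; the `parse_keypoints` helper and its 0.3 confidence threshold are hypothetical choices for illustration.

```python
from typing import Dict, List, Tuple

# COCO's standard 17 body keypoints, in COCO order.
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def parse_keypoints(
    output: List[List[List[List[float]]]],
    width: int,
    height: int,
    min_score: float = 0.3,  # hypothetical confidence cut-off
) -> Dict[str, Tuple[float, float, float]]:
    """Map a [1, 1, 17, 3] output of normalized (y, x, score) rows to pixel (x, y, score)."""
    rows = output[0][0]
    if len(rows) != len(COCO_KEYPOINTS):
        raise ValueError(f"expected {len(COCO_KEYPOINTS)} keypoints, got {len(rows)}")
    parsed = {}
    for name, (y, x, score) in zip(COCO_KEYPOINTS, rows):
        if score >= min_score:  # drop low-confidence keypoints
            parsed[name] = (x * width, y * height, score)
    return parsed
```

Returning a dictionary keyed by name keeps application logic (for example, measuring the angle between `left_hip`, `left_knee`, and `left_ankle`) independent of the keypoint index order.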

Images from the Active keypoint dataset (Source: TensorFlow)

MoveNet browser performance

To quantify MoveNet's inference speed, the model was benchmarked on multiple devices. Speed, in FPS, was measured both on the GPU with WebGL and with WebAssembly (WASM), the typical backend for devices with lower-end GPUs or none at all.
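Google's benchmarking code is not part of this article. The harness below is a generic sketch of how such FPS numbers are usually obtained: run a few untimed warm-up calls first, since the first inferences often include one-time setup such as shader compilation on GPU backends, then time many calls and derive FPS from the mean latency. The `infer` callable is a hypothetical stand-in for one model call, and the warm-up and run counts are arbitrary defaults.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(infer: Callable[[], object], warmup: int = 5, runs: int = 50) -> Dict[str, float]:
    """Time repeated calls to `infer` and report latency (ms) and FPS."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    # Warm-up calls are not timed: they absorb one-time setup costs.
    for _ in range(warmup):
        infer()
    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    mean_ms = statistics.mean(latencies_ms)
    return {
        "mean_ms": mean_ms,
        "median_ms": statistics.median(latencies_ms),
        # FPS assumes frames are processed sequentially, one call per frame.
        "fps": 1000.0 / mean_ms if mean_ms > 0 else float("inf"),
    }
```

Reporting the median alongside the mean helps spot runs skewed by occasional slow frames, for example from garbage collection or thermal throttling on phones.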

The figure below shows the results on several devices, including a MacBook Pro, an iPhone, a Pixel phone, and a desktop PC.

MoveNet performance metrics in the browser. The first number in each cell is for Lightning, and the second is for Thunder. (Source: TensorFlow)

To speed up inference, Google Research used several techniques, including a WebGL kernel for depthwise separable convolutions and improved GL scheduling for mobile Chrome. More broadly, the TensorFlow.js team continuously optimizes its backends so that models run fast on all supported devices, through iterative benchmarking and backend optimization.
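Depthwise separable convolutions are the core building block of MobileNet-style backbones such as MobileNet V2, which is why a dedicated kernel for them pays off. The sketch below counts multiply-accumulate operations (MACs) for a standard convolution versus a depthwise convolution followed by a 1×1 pointwise convolution. It assumes stride 1, "same" padding, and no bias; the layer shape in the usage note is illustrative, and the arithmetic says nothing about how the WebGL kernel itself is implemented.

```python
def standard_conv_macs(h: int, w: int, c_in: int, c_out: int, k: int) -> int:
    """MACs for a k x k convolution producing an h x w x c_out output."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_macs(h: int, w: int, c_in: int, c_out: int, k: int) -> int:
    """MACs for a depthwise k x k conv (one filter per input channel)
    followed by a 1 x 1 pointwise conv that mixes channels."""
    depthwise = h * w * c_in * k * k
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    std = standard_conv_macs(48, 48, 64, 64, 3)
    sep = depthwise_separable_macs(48, 48, 64, 64, 3)
    print(f"standard: {std:,} MACs, separable: {sep:,} MACs, ratio {std / sep:.2f}x")
```

For a hypothetical 48×48 feature map with 64 input and output channels and 3×3 kernels, the separable version needs about 10.8 million MACs instead of about 84.9 million, roughly a 7.9× reduction.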

Amit Raja Naik is a senior writer for Analytics India Magazine and delves into the latest innovations. He is also a professional bassist.

