



Posted by Paul Ruiz, Developer Relations Engineer, and Kris Tonthat, Technical Writer

MediaPipe Solutions is now available in preview

This week at Google I/O 2023, we introduced MediaPipe Solutions, a new collection of on-device machine learning tools that simplify the development process. It consists of MediaPipe Studio, MediaPipe Tasks, and MediaPipe Model Maker. These tools give mobile, web, desktop, and IoT developers no-code to low-code solutions for common on-device machine learning tasks such as audio classification, segmentation, and text embedding.

New solutions

In December 2022, we launched a MediaPipe preview with five tasks: gesture recognition, hand landmarks, image classification, object detection, and text classification. Today, we are happy to announce that we have launched nine additional tasks for Google I/O. More tasks will be added in the future. These new tasks include:

Face Landmarker detects facial landmarks and blendshapes to determine human expressions such as smiles, raised eyebrows, and blinks. This task is also useful for applying effects to a 3D face that follow the user’s movements.

Image Segmenter divides an image into regions based on predefined categories. You can use it to identify humans or multiple objects and then apply visual effects such as background blur.

Interactive Segmenter takes a region of interest in an image, estimates the object boundaries at that location, and returns the object segmentation as image data.

Image Generator (upcoming) will allow developers to apply diffusion models within their apps to create visual content.

Face Stylizer takes an existing style reference and applies it to the user’s face.

MediaPipe Studio

Our first MediaPipe tool, MediaPipe Studio, lets you view and test MediaPipe-compatible models on the web without writing your own custom test application. MediaPipe Studio is available now in preview, and you can use it to try out the new tasks and all the additional features described here by visiting the MediaPipe Studio page.

We also plan to extend MediaPipe Studio with a no-code model training solution, so that you can create new models without a lot of overhead.

MediaPipe Tasks

MediaPipe Tasks simplifies on-device ML deployment for web, mobile, IoT, and desktop developers with low-code libraries. You can integrate on-device machine learning solutions like the examples above into your application with just a few lines of code, without having to learn all the implementation details behind them. The tasks currently fall into three categories: vision, audio, and text.

To better understand how to use MediaPipe Tasks, let’s take a look at an Android app that performs gesture recognition.

The following code creates a GestureRecognizer object from a built-in machine learning model. That object can then be used repeatedly to return a list of recognition results for an input image.

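    import com.google.mediapipe.framework.image.BitmapImageBuilder
    import com.google.mediapipe.tasks.core.BaseOptions
    import com.google.mediapipe.tasks.vision.gesturerecognizer.GestureRecognizer
    import com.google.mediapipe.tasks.vision.gesturerecognizer.GestureRecognizer.GestureRecognizerOptions

    // `context` is an Android Context and `bitmap` is the input Bitmap,
    // both supplied by the surrounding app code.

    // Step 1: Build a gesture recognizer
    val baseOptions = BaseOptions.builder()
        .setModelAssetPath("gesture_recognizer.task")
        .build()
    val gestureRecognizerOptions = GestureRecognizerOptions.builder()
        .setBaseOptions(baseOptions)
        .build()
    val gestureRecognizer =
        GestureRecognizer.createFromOptions(context, gestureRecognizerOptions)

    // Step 2: Prepare the image
    val mpImage = BitmapImageBuilder(bitmap).build()

    // Step 3: Run inference
    val result = gestureRecognizer.recognize(mpImage)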

As you can see, just a few lines of code are enough to implement seemingly complex functionality in your application. Combined with other Android features such as CameraX, this lets you provide an enjoyable experience for your users.

Beyond simplicity, another big advantage of MediaPipe Tasks is that your code looks similar across platforms, regardless of which task you’re using. This speeds up development because you can reuse the same logic in each application.
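As a rough illustration of that similarity, here is a sketch of the same three steps from the Android snippet, written against the MediaPipe Tasks Python API. It assumes the `mediapipe` package is installed and that a `gesture_recognizer.task` model bundle and an input image exist at the paths you pass in. The helper names `recognize_gesture` and `top_gesture` are ours for illustration and are not part of the library.

```python
def top_gesture(result):
    """Return (label, score) for the best gesture of the first detected hand, or None.

    Assumes `result.gestures` holds one list per detected hand, and that each
    entry in that list has `category_name` and `score` attributes.
    """
    if not result.gestures or not result.gestures[0]:
        return None
    best = max(result.gestures[0], key=lambda category: category.score)
    return best.category_name, best.score


def recognize_gesture(image_path, model_path="gesture_recognizer.task"):
    """Run gesture recognition on a single image file and return the raw result."""
    # Imported lazily so top_gesture() can be used without mediapipe installed.
    import mediapipe as mp
    from mediapipe.tasks import python
    from mediapipe.tasks.python import vision

    # Step 1: Build a gesture recognizer
    base_options = python.BaseOptions(model_asset_path=model_path)
    options = vision.GestureRecognizerOptions(base_options=base_options)
    recognizer = vision.GestureRecognizer.create_from_options(options)

    # Step 2: Prepare the image
    mp_image = mp.Image.create_from_file(image_path)

    # Step 3: Run inference
    return recognizer.recognize(mp_image)
```

Calling `top_gesture(recognize_gesture("hand.jpg"))` would then give the most likely gesture for the first hand in a (hypothetical) `hand.jpg`, or `None` if no hand was found.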

MediaPipe Model Maker

Being able to recognize and use gestures in your app is great, but what if you need to recognize custom gestures beyond those the built-in model provides? That’s where MediaPipe Model Maker comes in. With Model Maker, you can retrain the built-in model on a dataset of just a few hundred examples of new hand gestures and quickly create a brand new model tailored to your needs. For example, with just a few lines of code you can customize a model to play rock-paper-scissors.

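    from mediapipe_model_maker import gesture_recognizer

    # Directory for checkpoints and the exported model (example name).
    export_dir = 'exported_model'

    # Load the images; each subfolder of 'images' is one gesture label.
    data = gesture_recognizer.Dataset.from_folder(dirname='images')

    # Split into 80% training, 10% validation, and 10% test data.
    train_data, rest_data = data.split(0.8)
    validation_data, test_data = rest_data.split(0.5)

    # Retrain the model on the new gestures.
    options = gesture_recognizer.GestureRecognizerOptions(
        hparams=gesture_recognizer.HParams(export_dir=export_dir)
    )
    model = gesture_recognizer.GestureRecognizer.create(
        train_data=train_data,
        validation_data=validation_data,
        options=options,
    )

    # Evaluate on the held-out test data, then export the model bundle.
    metric = model.evaluate(test_data)
    model.export_model(model_name='rock-paper-scissors.task')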
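Before running a training script like the one above, it can help to sanity-check the image folder. The sketch below assumes that `Dataset.from_folder` expects one subfolder per gesture label with the images inside, and that one label is named `none` for examples that show no target gesture. Both points should be verified against the Model Maker documentation for your version. The helper names, the file suffixes, and the `min_per_label` threshold are hypothetical choices for illustration.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed image file types


def count_images_per_label(dirname):
    """Map each label subfolder of `dirname` to the number of image files in it."""
    counts = {}
    for label_dir in sorted(Path(dirname).iterdir()):
        if label_dir.is_dir():
            counts[label_dir.name] = sum(
                1
                for f in label_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


def check_dataset(dirname, min_per_label=100):
    """Return a list of problems found; an empty list means the layout looks fine."""
    counts = count_images_per_label(dirname)
    problems = []
    if "none" not in counts:
        problems.append("missing 'none' label folder")
    for label, n in counts.items():
        if n < min_per_label:
            problems.append(f"label '{label}' has only {n} images")
    return problems
```

Running `check_dataset('images')` before training gives a quick list of labels that are missing or thin on examples.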

After retraining your model, you can use it in your app with MediaPipe Tasks for an even more versatile experience.
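For instance, once a retrained model returns a label for each player’s hand, the game logic itself is small. This is a minimal sketch, assuming the model’s labels are `rock`, `paper`, and `scissors` plus a `none` label for anything else. The actual label names come from your dataset’s folder names, and `decide_round` is a hypothetical helper, not part of MediaPipe.

```python
# Each gesture maps to the gesture it defeats.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def decide_round(player_label, opponent_label):
    """Return 'win', 'lose', 'draw', or 'invalid' from the player's point of view."""
    if player_label not in BEATS or opponent_label not in BEATS:
        return "invalid"  # e.g. the "none" label or an unrecognized gesture
    if player_label == opponent_label:
        return "draw"
    return "win" if BEATS[player_label] == opponent_label else "lose"
```

Keeping the rules in a lookup table means the same logic can be ported to any platform where the recognizer runs.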

Getting started

To learn more, watch our I/O 2023 sessions: Easy On-Device ML with MediaPipe, Supercharging Web Apps with Machine Learning and MediaPipe, and What’s New in Machine Learning. Also, check out the official documentation at developers.google.com/mediapipe.

What’s next?

We will continue to improve MediaPipe Solutions and deliver new features, including new MediaPipe Tasks and no-code training with MediaPipe Studio. To stay up to date, join the MediaPipe Solutions Announcements Group, where we post announcements when new features become available.

We look forward to seeing the exciting things you build, so please share them with @googledevs and the developer community.

