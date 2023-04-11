



Google Cloud Dataflow provides a serverless architecture that can be used to shard and process very large batch datasets or large live streams of data in parallel. This short tutorial will show you how.

Many companies are leveraging Google Cloud Platform (GCP) for their data processing needs. Every day, millions, if not billions, of new data points are generated at the edge or in the cloud in a variety of formats. A scalable platform is required to handle this large amount of data.

Google Cloud Dataflow is a fully managed service that uses the Apache Beam software development kit (SDK), with Java and Python APIs, to transform and enrich data either as a stream (in real time) or in batch mode (for historical data). Dataflow provides a serverless architecture that can shard very large batch datasets or massive live streams of data and process them in parallel.

A Dataflow template is an Apache Beam pipeline written in Java or Python. Dataflow templates let you run pre-built pipelines while specifying your own data, environment, or parameters. You can choose a template provided by Google or customize your own template. Google Cloud Dataflow prebuilt templates let you stream or bulk load data from one source to another (Pub/Sub, Cloud Storage, Spanner, SQL, Bigtable, BigQuery, etc.) through an easy-to-use interface that can be accessed from the Google Cloud Console.

Redis Enterprise is used extensively across Google Cloud’s customer base for many purposes including real-time transactions, chat/messaging, gaming leaderboards, medical claims processing, real-time inventory, geospatial applications, and media streaming. As an in-memory database, Redis Enterprise consistently delivers millions of operations per second with sub-millisecond latency. Redis Enterprise is therefore a perfect complement to many of the native Google Cloud managed services that facilitate real-time user experiences.

As a practical introduction, we present a custom template built for Google Cloud Dataflow to ingest data into a Redis Enterprise database via Google Cloud Pub/Sub. The template is a streaming pipeline that reads messages from a Pub/Sub subscription and writes them as key-value strings to a Redis Enterprise database. Support for other data types such as lists, hashes, sets, and sorted sets will be built over time by experts from Redis and Google, and possibly by contributors from the open source community.

Our Motivation: Let’s Make This Easy

We want developers to have a great experience with Google Cloud Dataflow and Redis Enterprise.

Using pre-made templates has many advantages.

- You can run pipelines without the development environment and associated dependencies that are common in non-templated deployments. This is useful for scheduling regular batch jobs.
- Runtime parameters let you customize how your pipeline runs.
- The template separates building the pipeline (which the developer does) from running the pipeline (which may be the responsibility of someone else), so you don’t have to recompile your code every time the pipeline runs.
- Non-technical users can run templates using the Google Cloud Console, Google Cloud Command Line Interface, or REST API.
- You can extend the template with user-defined functions.

How to Use Dataflow Templates: Step by Step

It helps to understand how the process works. Here is a high-level workflow for configuring a Dataflow pipeline with a custom template.

In this example, we will process a message that arrives at a predefined Pub/Sub subscription and insert the message as a key-value pair into a Redis Enterprise database.
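The transformation described above can be sketched in a few lines of Python. This is not the template's actual code: a plain dictionary stands in for the Redis Enterprise database, the `PubSubMessage` class is a hypothetical stand-in for a real Pub/Sub message, and taking the Redis key from a message attribute named `key` is an assumption made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class PubSubMessage:
    """Hypothetical stand-in for a Pub/Sub message: a payload plus attributes."""
    data: bytes
    attributes: Dict[str, str] = field(default_factory=dict)


def to_key_value(message: PubSubMessage) -> Tuple[str, str]:
    """Map one message to a (key, value) pair of strings.

    Assumption: the key travels in an attribute named "key" and the value
    is the UTF-8 payload. The real template may choose the key differently.
    """
    key = message.attributes.get("key")
    if key is None:
        raise ValueError("message has no 'key' attribute")
    return key, message.data.decode("utf-8")


def write_to_store(store: Dict[str, str], message: PubSubMessage) -> None:
    """Rough equivalent of a Redis SET, applied to the in-memory stand-in."""
    key, value = to_key_value(message)
    store[key] = value


# One message in, one key-value pair out.
store: Dict[str, str] = {}
write_to_store(store, PubSubMessage(b"hello redis", {"key": "greeting"}))
print(store)  # {'greeting': 'hello redis'}
```

In the real pipeline, Dataflow applies this kind of per-message step continuously and in parallel across workers, which is why the mapping is kept free of shared state.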

In the Dataflow section of the Google Cloud console, enter your pipeline name and regional endpoint, then choose Custom Template. Enter gs://redis-field-engineering/redis-field-engineering/pubsub-to-redis/flex/Cloud_PubSub_to_Redis for the template path.

Then enter the Pub/Sub subscription name that holds the incoming messages. Add Redis Enterprise database parameters such as Redis database host, Redis database port, and Redis default user authentication password.
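The same settings can in principle be supplied through the Dataflow REST API instead of the console. The sketch below only builds the request URL and JSON body for the Flex Template launch method and does not send anything. Two things are assumptions: that this template is a Flex Template (inferred from the `flex` segment of its path), and the parameter names `inputSubscription`, `redisHost`, `redisPort`, and `redisPassword`, which should be checked against the template's own documentation. The project, region, job, and host values in the usage lines are placeholders.

```python
import json
from typing import Any, Dict, Tuple

# Template path as given in the article.
TEMPLATE_PATH = (
    "gs://redis-field-engineering/redis-field-engineering/"
    "pubsub-to-redis/flex/Cloud_PubSub_to_Redis"
)


def build_launch_request(
    project: str,
    region: str,
    job_name: str,
    subscription: str,
    redis_host: str,
    redis_port: int,
    redis_password: str,
) -> Tuple[str, Dict[str, Any]]:
    """Return (url, body) for a Dataflow flexTemplates:launch call."""
    url = (
        "https://dataflow.googleapis.com/v1b3/"
        f"projects/{project}/locations/{region}/flexTemplates:launch"
    )
    body = {
        "launchParameter": {
            "jobName": job_name,
            "containerSpecGcsPath": TEMPLATE_PATH,
            # Parameter names below are assumed, not confirmed by the article.
            # Template parameters are passed as strings.
            "parameters": {
                "inputSubscription": (
                    f"projects/{project}/subscriptions/{subscription}"
                ),
                "redisHost": redis_host,
                "redisPort": str(redis_port),
                "redisPassword": redis_password,
            },
        }
    }
    return url, body


# Placeholder values for illustration only.
url, body = build_launch_request(
    "my-project", "us-central1", "pubsub-to-redis-demo",
    "my-subscription", "redis.example.com", 12000, "secret",
)
print(url)
print(json.dumps(body, indent=2))
```

Sending the request would additionally require an OAuth 2.0 access token in the `Authorization` header. In practice, keep the password out of source control and read it from a secret store rather than hard-coding it as shown here.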

Choose Create Pipeline. The pipeline is now set up to receive incoming messages. If you like celebrating small victories, this is a good moment to cheer.

You are now ready to publish a sample message to your Pub/Sub topic. Enter the sample data and choose Publish.
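Publishing can also be scripted rather than done by hand. As a rough sketch, the function below builds the URL and JSON body for the Pub/Sub REST `publish` method, which expects the payload to be base64-encoded. It does not send the request. The project and topic names are placeholders, and the `key` attribute is a hypothetical example, since the article does not say how the template derives the Redis key.

```python
import base64
import json
from typing import Any, Dict, Tuple


def build_publish_request(
    project: str,
    topic: str,
    payload: str,
    attributes: Dict[str, str],
) -> Tuple[str, Dict[str, Any]]:
    """Return (url, body) for a Pub/Sub topics.publish REST call."""
    url = (
        "https://pubsub.googleapis.com/v1/"
        f"projects/{project}/topics/{topic}:publish"
    )
    message = {
        # The REST API carries the message payload as a base64 string.
        "data": base64.b64encode(payload.encode("utf-8")).decode("ascii"),
        "attributes": attributes,
    }
    return url, {"messages": [message]}


# Placeholder project, topic, and attribute values for illustration only.
publish_url, publish_body = build_publish_request(
    "my-project", "my-topic", "hello redis", {"key": "greeting"}
)
print(publish_url)
print(json.dumps(publish_body, indent=2))
```

As with any Google Cloud REST call, actually sending this body requires an authenticated POST with an OAuth 2.0 access token.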

Make sure the sample data was published, just for peace of mind. To verify that the data has been inserted into your Redis Enterprise database, you can use Redis Insight, a Redis GUI that also supports command-line operations in a desktop client.
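The same check can be done from a script. The helper below is a minimal sketch that accepts any client object exposing a `get` method, which is the shape of the redis-py `Redis` client. To keep the example runnable without a database, a small `FakeRedis` class (hypothetical, for illustration only) stands in for the real connection, and the key and value are made-up sample data.

```python
from typing import Dict, Optional, Protocol, Union


class KeyValueReader(Protocol):
    """Anything with a Redis-style get(); redis-py's Redis client fits."""

    def get(self, name: str) -> Optional[Union[bytes, str]]: ...


def verify_key(client: KeyValueReader, key: str, expected: str) -> bool:
    """Return True if `key` exists and holds exactly `expected`."""
    raw = client.get(key)
    if raw is None:  # key was never written
        return False
    # Clients may return bytes or str depending on their configuration.
    actual = raw.decode("utf-8") if isinstance(raw, bytes) else raw
    return actual == expected


class FakeRedis:
    """In-memory stand-in so the sketch runs without a database."""

    def __init__(self, data: Dict[str, bytes]) -> None:
        self._data = data

    def get(self, name: str) -> Optional[bytes]:
        return self._data.get(name)


client = FakeRedis({"greeting": b"hello redis"})
print(verify_key(client, "greeting", "hello redis"))  # True
print(verify_key(client, "missing", "hello redis"))   # False
```

With the redis-py package installed, a client created as `redis.Redis(host=..., port=..., password=...)` using your database's connection details could be passed in place of `FakeRedis`.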

It’s Just the Beginning

Support for this custom Dataflow template is currently community-based. This means it is supported by the open source community through our GitHub repository.

Check out the open source code and give us your feedback. We encourage you to add new features. When you’re ready to contribute, fork the GitHub repository and create a pull request. Your support will make this project more successful and sustainable.

