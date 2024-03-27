



Key-value stores such as Bigtable can process hundreds of thousands of events per second with very low latency and are recommended for these workloads. However, performing key-value lookups from a pipeline often requires a lot of carefully written production and scaling code to ensure that processing runs with low latency and good operational performance.

With the new Apache Beam Enrichment transform, this process is reduced to just a few lines of code, allowing you to process events from messaging systems such as Pub/Sub or Apache Kafka, enrich them with data in Bigtable, and then send them on for further processing.

This is important for streaming applications because streaming joins enrich the data and give meaning to streaming events. For example, knowing what's in a user's shopping cart or whether they've viewed similar products before can bring valuable context to the clickstream data that feeds recommendation models. Identifying fraudulent in-store credit card transactions requires much more information than the current transaction, such as previous purchase locations, number of recent transactions, and whether there is a travel notification. Similarly, enriching telemetry data from factory floor hardware with historical signals from the same devices and fleet-wide statistics can help machine learning (ML) models predict failures before they occur.
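Conceptually, the enrichment described above is a keyed lookup performed for each event, with the result merged into that event. The following is a minimal plain-Python sketch of that idea only, not the Beam API: the `enrich` function, the `PROFILE_STORE` dictionary (a stand-in for a Bigtable table), and the field names are all hypothetical and chosen for illustration.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical stand-in for a Bigtable table: row key -> column values.
PROFILE_STORE: Dict[str, Dict[str, object]] = {
    "user-1": {"cart_items": 3, "viewed_similar": True},
    "user-2": {"cart_items": 0, "viewed_similar": False},
}


def enrich(
    events: Iterable[dict], store: Dict[str, dict], key_field: str
) -> Iterator[dict]:
    """Yield each event merged with the row stored under event[key_field]."""
    for event in events:
        # Events whose key is not in the store pass through unchanged.
        row = store.get(event[key_field], {})
        yield {**event, **row}


# Two clickstream events; only the first has a matching row in the store.
clicks = [
    {"user_id": "user-1", "page": "/shoes"},
    {"user_id": "user-3", "page": "/hats"},
]
enriched = list(enrich(clicks, PROFILE_STORE, key_field="user_id"))
```

In a real pipeline the lookup is a network call per event or batch, which is why throttling, retries, and scaling matter.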

The Apache Beam Enrichment transform can optionally handle client-side throttling to rate-limit the number of requests sent to your Bigtable instance. It retries failed requests using a configurable retry strategy, which defaults to exponential backoff. When combined with autoscaling, Bigtable and Dataflow can scale up and down in tandem and reach equilibrium automatically. Beam 2.54.0 supports exponential backoff, which can be disabled or replaced with a custom implementation.
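To make the retry behavior concrete, here is a small plain-Python sketch of exponential backoff with jitter. It illustrates the general technique and is not Beam's implementation: `call_with_backoff`, its parameters, and the `flaky_lookup` function are hypothetical names, and the delay values are arbitrary assumptions.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def call_with_backoff(
    request: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(), retrying failures with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return request()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # The delay ceiling doubles each attempt, capped at max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: a random delay in [0, cap] spreads out retries.
            sleep(random.uniform(0, cap))
    raise ValueError("max_attempts must be at least 1")


# Hypothetical lookup that fails twice before succeeding.
attempts = {"count": 0}


def flaky_lookup() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporarily unavailable")
    return "row-data"


# Record the delays instead of actually sleeping.
delays: List[float] = []
result = call_with_backoff(flaky_lookup, sleep=delays.append)
```

The `sleep` parameter is injectable so the example runs instantly. Randomizing each delay keeps many workers from retrying at the same moment, which matters when a large number of pipeline workers share one backing store.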

Let's see this in action:

