Export your datastore to BigQuery using Google Dataflow | by Jakub Krajniak | April 2021


How to use Google Dataflow to export Datastore to BigQuery with additional filtering of entities

Puerto de la Cruz (by author)

The previous story showed how to build a serverless solution that exports all kinds from Datastore to BigQuery. The approach presented in that article is perfectly valid and works well even with large datastores. However, its main drawback is that every run exports every row from Datastore to BigQuery. For large datastores, this can incur unnecessary costs and take longer than necessary.

One way to solve this would be a stream of updates from the database. For example, AWS DynamoDB provides Streams, which you can easily connect to AWS Lambda functions. Google Firestore (described as the next generation of Datastore) has a very similar feature that triggers a Cloud Function when a document is modified. Please refer to the documentation.

Datastore does not provide streaming capabilities, but you can try to solve the problem using queries. The native Datastore import/export does not support filtering of entities, so you have to do it manually. The procedure is as follows:

1. Filter the entities, export them to JSON, and save them to Cloud Storage.
2. Load the JSON from Cloud Storage into BigQuery.

Let's use Google Dataflow for this task.

Google Dataflow is a managed service for executing various data processing patterns such as ETL, batch processing, and stream processing. Google Dataflow is, however, only one possible implementation of the dataflow model. The SDK used to describe the processing is provided by the Apache Beam framework.

The dataflow model is organized around a pipeline, which is a data processing workflow from start to finish. Two objects are important in a pipeline: a PCollection represents a distributed dataset, and a PTransform represents a processing operation on a PCollection.

Overview of PCollection / PTransform (by author)

We use Python as the programming language, although pipelines can also be built in Java and Go. A fully functional example is available in the GitHub project (https://github.com/jkrajniak/demo-datastore-export-filtering). Only the important code blocks are discussed here.

Pipeline

Let’s start building the pipeline:

    import apache_beam as beam

    with beam.Pipeline(options=pipeline_options) as p:
        # Create queries and filters

This creates a pipeline p configured with the options stored in pipeline_options. Next, the | operator is used to chain the PTransform blocks:

    rows = p | 'Get all types' >> GetAllKinds(project_id, to_ignore)

This is the first step: it reads all kinds from Datastore for a given project and produces a PCollection from that list. Internally, this block implements the expand method. It also filters out the kinds that you do not want to export. Finally, the Create transform is used to build a PCollection from the list of kinds.
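The filtering done inside expand can be sketched in plain Python as follows. This is only an illustration: the helper name filter_kinds and the rule that skips Datastore's internal kinds (names starting with a double underscore) are assumptions, not code taken from the project.

```python
def filter_kinds(all_kinds, to_ignore):
    """Return the kinds to export, in sorted order.

    Hypothetical helper: drops kinds listed in to_ignore and, as an
    assumption, internal kinds whose names start with '__'.
    """
    ignored = set(to_ignore)
    return sorted(
        kind for kind in all_kinds
        if kind not in ignored and not kind.startswith('__')
    )


kinds = filter_kinds(['Order', 'User', '__Stat_Kind__', 'Session'], ['Session'])
print(kinds)  # ['Order', 'User']
```

In the real transform, a list like this would be passed to beam.Create to produce the PCollection of kind names.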

Next, a query has to be created for each kind. This is done by the following PTransform block, "Create query":

    rows = (
        p
        | 'Get all types' >> GetAllKinds(project_id, to_ignore)
        | 'Create query' >> beam.ParDo(CreateQuery(project_id, param))
    )

Here we use ParDo, a general-purpose transform for parallel processing. It accepts an object derived from the beam.DoFn class, which must implement the method process(self, *args, **kwargs). Below is the implementation of the process method of the CreateQuery class.

    def process(self, kind_name, **kwargs):
        """
        :param **kwargs:
        :param kind_name: a kind name
        :return: Query
        """
        logging.info(f'CreateQuery.process {kind_name} {kwargs}')

        # Build a query for this kind; attach filters only if configured.
        q = Query(kind=kind_name, project=self.project_id)
        if kind_name in self.entity_filtering:
            q.filters = self.entity_filtering[kind_name].get_filter()

        logging.info(f'Query for kind {kind_name}: {q}')

        yield q

The method above generates a query that fetches entities according to the filtering parameters. The filtering options are defined in a simple YAML configuration file.

One important note about this solution: Datastore entities need a field that can be used to select a subset of records. In this example, the timestamp field is used for that purpose. If the pipeline runs once a day, the query matches records for which (endTime-24h) <= timestamp.
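The daily window can be sketched with the standard library as below. The helper name daily_window_filters is hypothetical, and the upper bound (timestamp < endTime) is an assumption added here so that consecutive daily runs do not overlap. The (property, operator, value) tuple shape follows the form that Beam's Datastore Query takes in its filters argument.

```python
from datetime import datetime, timedelta, timezone


def daily_window_filters(end_time, field='timestamp', hours=24):
    """Build (property, operator, value) filters for the last `hours` hours.

    Hypothetical helper. The upper bound (< end_time) is an assumption,
    not something stated in the article.
    """
    start_time = end_time - timedelta(hours=hours)
    return [
        (field, '>=', start_time),  # (endTime - 24h) <= timestamp
        (field, '<', end_time),     # assumed upper bound
    ]


end = datetime(2021, 4, 2, tzinfo=timezone.utc)
for prop, op, value in daily_window_filters(end):
    print(prop, op, value.isoformat())
```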

Then add three more elements to your pipeline.

1. Apply the query and fetch the entities.
2. Convert each entity to JSON.
3. Save the JSON in BigQuery.

The first two stages of this part of the pipeline are straightforward.

The query created in the previous step is used to read entities from Datastore, which results in a PCollection of entities. Each entity is then converted to a JSON representation in beam.Map(entity_to_json). beam.Map is a special case of beam.ParDo: it takes a single element from the PCollection and produces a single element.
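A conversion of this kind might look like the sketch below. It is not the project's entity_to_json: the function name to_json_row is hypothetical, and the entity is modelled as a plain dict of properties rather than a Beam Datastore entity. The only conversion shown is turning datetimes into ISO 8601 strings so that the row can be serialized.

```python
import json
from datetime import datetime, timezone


def to_json_row(properties):
    """Return a JSON-serializable copy of an entity's properties.

    Hypothetical stand-in for entity_to_json; only datetime values
    are converted, everything else is passed through unchanged.
    """
    row = {}
    for name, value in properties.items():
        if isinstance(value, datetime):
            row[name] = value.isoformat()
        else:
            row[name] = value
    return row


row = to_json_row({
    'name': 'abc',
    'timestamp': datetime(2021, 4, 1, 12, 0, tzinfo=timezone.utc),
})
print(json.dumps(row))
```

Because the function returns exactly one output per input, it fits beam.Map rather than a full beam.ParDo.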

The last element of the pipeline is the output PTransform. Entities of kinds that are not filtered are written to a table that is emptied first. The others, which are subject to filtering, are appended to the existing table. To send the elements to these two outputs, we use the tagging feature, which can produce multiple PCollections.

If the kind name is included in the filtering options, the element is tagged with write_append. Otherwise, it is tagged with write_truncate.
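The routing decision itself is small enough to sketch in plain Python. The function name output_tag and the example filtering options are hypothetical. In a Beam DoFn, the chosen tag would typically be attached to the element with beam.pvalue.TaggedOutput, and the two outputs declared with with_outputs on the ParDo.

```python
def output_tag(kind_name, entity_filtering):
    """Pick the output tag for an element of the given kind.

    Kinds with filtering options are appended to the existing table,
    all other kinds replace the table contents.
    """
    if kind_name in entity_filtering:
        return 'write_append'
    return 'write_truncate'


# Hypothetical filtering options keyed by kind name.
entity_filtering = {'Order': {'field': 'timestamp'}}

print(output_tag('Order', entity_filtering))  # write_append
print(output_tag('User', entity_filtering))   # write_truncate
```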

Then the two split collections are written to BigQuery.

Each write uses the SCHEMA_AUTODETECT option. The output table name is derived dynamically from the kind name, and the table is created if needed.

When you run the pipeline in Google Dataflow, the entire job is visualized as follows:

Data pipeline (by author)
