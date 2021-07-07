



The SkillPractical Google Cloud Professional Data Engineer Certification Test is for data scientists, solutions architects, DevOps engineers, and anyone who wants to move into machine learning and data engineering on Google Cloud. Students should have some familiarity with the basics of GCP (such as storage, compute, and security), some basic coding skills (for example, in Python), and a good understanding of databases. You don’t have to have a background in data engineering or machine learning, but GCP experience is essential.

Students are strongly recommended to take the SkillPractical Google Certified Associate Cloud Engineer exam in advance.

For reference, 87% of Google Cloud certified users are confident in their cloud skills.

Course learning objectives

Design a data processing system

Build and maintain data structures and databases

Analyze data and enable machine learning

Optimize data representation, data infrastructure performance, and cost

Ensure the reliability of your data processing infrastructure

Visualize data

Design a secure data processing system

Description of the course syllabus:

1. Data processing system design

1.1 Choosing the right storage technology. Here are some considerations:

Map storage systems to business requirements

Data modeling

Trade-offs including latency, throughput and transactions

Distributed systems

Schema design

1.2 Data pipeline design. Here are some considerations:

Data publishing and visualization (BigQuery, etc.)

Batch and streaming data (e.g., Cloud Dataflow, Cloud Dataproc, Apache Beam, Apache Spark and the Hadoop ecosystem, Cloud Pub/Sub, Apache Kafka)

Online (interactive) and batch predictions

Job automation and orchestration (e.g., Cloud Composer)

1.3 Data processing solution design. Here are some considerations:

Infrastructure selection

System availability and fault tolerance

Use of distributed systems

Capacity planning

Hybrid cloud and edge computing

Architecture options (e.g., message brokers, message queues, middleware, service-oriented architecture, serverless functions)

At-least-once, in-order, and exactly-once event processing, etc.
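
As an illustration of the delivery semantics listed above, the following is a minimal sketch of how a consumer can get an effectively exactly-once result on top of at-least-once delivery by ignoring redelivered events. The class and function names (IdempotentConsumer, process) are hypothetical, and the in-memory set is an assumption made for brevity; a real pipeline would keep the seen IDs in a durable store.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentConsumer:
    """Applies each event at most once, even if it is delivered repeatedly."""

    # Assumption: IDs are kept in memory; production systems need durable state.
    seen_ids: set = field(default_factory=set)
    total: int = 0

    def handle(self, event_id: str, amount: int) -> bool:
        """Return True if the event was applied, False if it was a duplicate."""
        if event_id in self.seen_ids:
            return False  # redelivered event: skip it
        self.seen_ids.add(event_id)
        self.total += amount
        return True


def process(deliveries):
    """Feed (event_id, amount) pairs to a fresh consumer and return the total."""
    consumer = IdempotentConsumer()
    for event_id, amount in deliveries:
        consumer.handle(event_id, amount)
    return consumer.total
```

Calling process([("a", 10), ("b", 5), ("a", 10)]) counts event "a" only once, even though the broker delivered it twice.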

1.4 Data warehousing and data processing migration. Here are some considerations:

Awareness of the current state and how to move the design to a future state

Migrating from on-premises to the cloud (data transfer services, transfer appliances, cloud networking)

Migration verification

2. Building and operationalizing data processing systems

2.1 Building and operationalizing storage systems. Here are some considerations:

Effective use of managed services (Cloud Bigtable, Cloud Spanner, Cloud SQL, BigQuery, Cloud Storage, Cloud Datastore, Cloud Memorystore)

Storage cost and performance

Data lifecycle management

2.2 Building and operationalizing pipelines. Here are some considerations:

2.3 Building and operationalizing processing infrastructure. Here are some considerations:

3. Operationalizing machine learning models

3.1 Leveraging pre-built ML models as a service. Here are some considerations:

ML API (Vision API, Speech API, etc.)

ML API customization (e.g., AutoML Vision, AutoML Text)

Conversational experiences (e.g., Dialogflow)

3.2 ML pipeline deployment. Here are some considerations:

Ingesting appropriate data

Machine learning model retraining (Cloud Machine Learning Engine, BigQuery ML, Kubeflow, Spark ML)

Continuous evaluation

3.3 Choosing the right training and serving infrastructure. Here are some considerations:

Distributed vs. single machine

Use of edge computing

Hardware accelerator (GPU, TPU, etc.)

3.4 Measuring, monitoring and troubleshooting machine learning models. Here are some considerations:

Machine learning terms (e.g., features, labels, models, regression, classification, recommendations, supervised and unsupervised learning, metrics)

Impact of machine learning model dependencies

Common causes of errors (e.g., data assumptions)
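
To make the terms "labels" and "metrics" above concrete, here is a small sketch that computes precision and recall for a binary classifier. The function name precision_recall and the 0/1 encoding of labels are assumptions made for illustration, not part of the course material.

```python
def precision_recall(labels, predictions):
    """Return (precision, recall) for binary labels and predictions (0 or 1)."""
    pairs = list(zip(labels, predictions))
    true_pos = sum(1 for y, p in pairs if y == 1 and p == 1)
    false_pos = sum(1 for y, p in pairs if y == 0 and p == 1)
    false_neg = sum(1 for y, p in pairs if y == 1 and p == 0)

    # Guard against division by zero when a class is never predicted or present.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall
```

Precision answers "of the items predicted positive, how many were right?", while recall answers "of the truly positive items, how many were found?". Tracking both over time is one simple way to notice when a model's data assumptions stop holding.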

4. Ensuring the quality of the solution

4.1 Designing for security and compliance. Here are some considerations:

Identity and access management (e.g., Cloud IAM)

Data security (encryption, key management)

Ensuring privacy (e.g., Data Loss Prevention API)

Legal compliance (e.g., Health Insurance Portability and Accountability Act (HIPAA), Children’s Online Privacy Protection Act (COPPA), FedRAMP, General Data Protection Regulation (GDPR))

4.2 Ensuring scalability and efficiency. Here are some considerations:

Build and run test suites

Pipeline monitoring (e.g., Stackdriver)

Evaluate, troubleshoot, and improve your data representation and data processing infrastructure

Resource resizing and autoscaling

4.3 Ensuring reliability and fidelity. Here are some considerations:

Perform data preparation and quality control (e.g., Cloud Dataprep)

Verification and monitoring

Data recovery planning, execution, and stress testing (fault tolerance, rerunning failed jobs, performing retroactive reanalysis)

Choose between ACID, idempotent, and eventually consistent requirements
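
As a rough illustration of what the ACID requirement mentioned above means in practice, the sketch below uses Python's built-in sqlite3 module to make a two-step update atomic: either both rows change or neither does. The table layout, account names, and helper functions (make_db, transfer, balances) are hypothetical and chosen only for the example.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two example accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move amount from src to dst atomically; return False if it was rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

A failed transfer leaves both balances exactly as they were. An eventually consistent store trades this guarantee away, typically for availability or latency, which is the kind of requirement trade-off this topic asks you to weigh.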

4.4 Ensuring flexibility and portability. Here are some considerations:

Mapping to current and future business requirements

Designing for data and application portability (e.g., multi-cloud, data residency requirements)

Data staging, cataloging, and discovery

