Date: 2021-10-09



Since releasing Databricks on Google Cloud earlier this year, we have been excited to see stories about the value this collaborative solution has brought to data teams around the world. One of our favorite comments comes from Douglas Mettenburg, Vice President of Analysis at JB Hunt, who says that Databricks on Google Cloud has become JB Hunt's trusted source of information and that, by creating more AI solutions that have a significant impact on the business, his team is demonstrating the true value of the data it brings to the entire company.

As Douglas explains, Databricks on Google Cloud is designed to store all your data on a simple, open lakehouse platform that unifies all your analytics and AI workloads. It facilitates data-driven decision making within your organization by enhancing collaboration between data engineering, data science, and analytics teams on a cloud-based lakehouse architecture. And for even easier access, the solution can be used alongside your other infrastructure from within the Google Cloud console.

It's easy to take the first step with Databricks on Google Cloud. Follow the onboarding guide below, which outlines the procedure step by step. You can also watch these steps in the demo video.

1. Subscribe to Databricks from the GCP Marketplace

Start by logging in to the Google Cloud Platform. If you are a new user, you will need to create an account before you can subscribe to Databricks. Once in the console, select an existing Google Cloud project or create a new one, and check the Google Cloud Identity organization object defined within the Google Cloud console. You will also need permission from your billing administrator to set up a Google billing account or to select an existing account that you can use for Databricks. This can be done from the Billing tab in the left navigation bar of the GCP console.

Find Databricks under Partner Solutions in the GCP console, or search for it in the Marketplace. Now you are ready to subscribe.

After reviewing the terms of use, you can log in using the familiar blue Google SSO. Tight integration with Google IAM makes it easy to authenticate Databricks workspace users with a Google Cloud Identity account through Google’s OAuth 2.0 implementation. This means that Databricks will not have access to your login information, eliminating the risks associated with storing or protecting your credentials at Databricks.

2. Prerequisites for configuring Databricks on GCP

You're almost ready to create your first Databricks workspace, but check the following prerequisites first.

Ensure proper resource allocation

The target Google Cloud region where your Databricks clusters run needs a minimum resource quota. If your project has less than the GCP defaults, it's a good idea to review the full list of required quotas in the user documentation.

Determine the size of the network

Next, configure the GKE subnets used by the Databricks workspace. This can be done only once, before you create your first workspace, so it is important to get it right: the workspace needs enough IP space for Databricks jobs to run successfully. For convenience, Databricks provides a calculator to help you determine whether the default IP range of your subnet meets your needs.
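To get a feel for how much room a given range provides, here is a minimal sketch using Python's standard `ipaddress` module. It is not the Databricks calculator and does not know how many addresses a workspace actually requires. The `required_addresses` argument and the CIDR ranges are hypothetical inputs chosen for illustration.

```python
import ipaddress


def subnet_capacity(cidr: str) -> int:
    """Return the total number of IP addresses in a CIDR range."""
    return ipaddress.ip_network(cidr, strict=True).num_addresses


def fits(cidr: str, required_addresses: int) -> bool:
    """Check whether a CIDR range holds at least `required_addresses` IPs.

    `required_addresses` is a hypothetical figure; take the real one from
    the Databricks subnet calculator or documentation.
    """
    return subnet_capacity(cidr) >= required_addresses


if __name__ == "__main__":
    # Example ranges only; substitute the ranges planned for your project.
    for cidr in ("10.0.0.0/16", "10.1.0.0/20", "10.2.0.0/24"):
        print(cidr, subnet_capacity(cidr))
```

Each step down in prefix length doubles the available addresses, which is why sizing has to be decided before the workspace is created.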

Check session length constraints

Databricks will not work properly if your IT administrator has set a global constraint on the session length of logged-in users. In that case, ask your administrator to add Databricks to the Trusted Apps list in Google Workspace.

3. Create your first workspace

Now you are ready to create your Databricks workspace. With the prerequisites in place, create your first workspace in the Databricks account console by providing a workspace name, a region, and a Google Cloud project ID.

4. Add users to the workspace

Databricks administrators can manage user accounts in the Administration Console. As an administrator, you can:

Invite new users or delete existing ones.

Assign other users as administrators, or grant them permission to create clusters.

Create groups for role-based access control (RBAC) so that you can set different permissions for each user group.

Again, native IAM integration makes user authentication much easier.

5. Run the first Databricks job

The fun begins now! Create a new cluster in the new Databricks workspace to provide a compute engine instance for running queries and jobs. The first time you create a new cluster, Databricks bootstraps the GKE cluster. This can take up to 20 minutes. Subsequent Databricks clusters only take a few minutes.
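If you prefer to script this step rather than click through the workspace, the sketch below builds a request body of the kind accepted by the `clusters/create` endpoint of the Databricks Clusters API. It only assembles and prints the JSON and sends nothing. The helper name `build_cluster_spec` and every value passed to it (cluster name, runtime version, node type, worker count) are placeholders for illustration, not recommendations from this guide.

```python
import json


def build_cluster_spec(name, spark_version, node_type_id, num_workers,
                       autotermination_minutes=60):
    """Assemble a cluster definition as a plain dict (hypothetical helper)."""
    if num_workers < 0:
        raise ValueError("num_workers must be zero or greater")
    return {
        "cluster_name": name,
        "spark_version": spark_version,  # Databricks runtime version string
        "node_type_id": node_type_id,    # machine type for the nodes
        "num_workers": num_workers,
        # Shut the cluster down after this many idle minutes to save cost.
        "autotermination_minutes": autotermination_minutes,
    }


if __name__ == "__main__":
    # All values below are placeholders; check what your workspace offers.
    spec = build_cluster_spec("quickstart-cluster", "9.1.x-scala2.12",
                              "n1-standard-4", 2)
    print(json.dumps(spec, indent=2))
```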

Check out the Quick Start Tutorial Notebook to make sure everything is working. A notebook is a collection of cells that run computations on a Databricks cluster. Once you attach your notebook to the cluster, you can start executing queries in any of the supported languages, such as Python, SQL, R, and Scala, and switch between them in the same notebook.

Here we create a table using data from a sample CSV file available in Databricks datasets, a collection of datasets mounted on the Databricks File System (DBFS). DBFS is a distributed file system installed on Databricks clusters.

To create a Delta table, we write the CSV data out in Delta Lake format. Delta Lake is an open table format that brings reliability, security, and performance to your data lake. The format consists of Parquet files plus a transaction log, and using it gives the best performance for future operations on your tables.

Next, load the CSV data into a DataFrame and write it out in Delta Lake format. This command uses the Python language magic command (`%python`), which lets you interleave commands in languages other than the notebook's default language (SQL).
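The load-and-write step can be sketched as follows. This is an illustration under stated assumptions rather than the exact notebook cell: `spark` is the SparkSession that a Databricks notebook provides, the CSV path is assumed to be the diamonds sample in Databricks datasets, and the output location `/delta/diamonds` and the function name `csv_to_delta` are hypothetical.

```python
# Sketch only: both paths are assumptions for illustration.
CSV_PATH = "/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv"
DELTA_PATH = "/delta/diamonds"  # hypothetical output location


def csv_to_delta(spark, csv_path=CSV_PATH, delta_path=DELTA_PATH):
    """Read a CSV file into a DataFrame and save it in Delta Lake format."""
    diamonds = (
        spark.read.format("csv")
        .option("header", "true")       # first row holds the column names
        .option("inferSchema", "true")  # let Spark infer the column types
        .load(csv_path)
    )
    # Overwrite any earlier output so the cell can be re-run safely.
    diamonds.write.format("delta").mode("overwrite").save(delta_path)
    return diamonds
```

In a notebook whose default language is SQL, the cell would begin with `%python` and then call `csv_to_delta(spark)`.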

Now you are ready to create the Delta table in the saved location, execute the SQL statement, and query the table for the average diamond price by color. Click the bar chart icon to see a graph of average diamond prices by color.
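As a rough sketch of this last step, the function below registers a Delta table over a saved location and runs the aggregation through `spark.sql`. It assumes `spark` is the SparkSession available in a Databricks notebook. The table name, the location `/delta/diamonds`, and the function name are hypothetical, and the `color` and `price` columns are assumed to exist in the data.

```python
def average_price_by_color(spark, delta_path="/delta/diamonds", table="diamonds"):
    """Create a Delta table over `delta_path` and query average price by color."""
    # Register a table that points at the files already saved in Delta format.
    spark.sql(
        f"CREATE TABLE IF NOT EXISTS {table} USING DELTA LOCATION '{delta_path}'"
    )
    # Aggregate: one row per color with the mean price.
    return spark.sql(
        f"SELECT color, avg(price) AS avg_price "
        f"FROM {table} GROUP BY color ORDER BY color"
    )
```

In a notebook, passing the returned DataFrame to `display(...)` renders the result grid, from which the chart view can be selected.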

That's it! You have now seen how to get started as a user: setting up Databricks in your Google Cloud account, creating a workspace, a cluster, and a notebook, then running SQL commands and viewing the results.

Do you have any questions?

Sign up for an instructor-led live hands-on workshop to get answers to your questions and learn how to get started with Databricks on Google Cloud. There are multiple dates to choose from – sign up now!

