Date: 2021-02-17



Today, we are announcing the availability of Databricks on Google Cloud. This jointly developed service provides a simple, open lakehouse platform for data engineering, data science, analytics, and machine learning. It combines the Databricks features that customers love with the data analytics solutions and global scale of Google Cloud.

Open data platform meets open cloud

Databricks and Google Cloud share a common vision for open data platforms built on open standards, open APIs, and open infrastructure. This partnership gives organizations the choice and flexibility to manage their infrastructure and access their data with the tools they need, across cloud and on-premises environments. By adopting open frameworks and APIs, customers get the benefits of open source combined with managed cloud analytics and AI products.

What does our new partnership mean to our customers? Enterprises can now implement the Databricks Lakehouse platform on Google Cloud, enabled by Databricks’ Delta Lake. Delta Lake adds data reliability to your data lake with ACID transactions and version control, improving data governance and query performance for Google Cloud Storage data. This announcement paves the way for one simple, integrated architecture for all data applications on Google Cloud, including real-time streaming, SQL workloads, business intelligence, data science, machine learning, and graph analytics.
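To make "ACID transactions and version control" concrete, here is a minimal sketch in plain Python. It is a toy model, not Delta Lake's implementation: the `VersionedTable` class, its methods, and the sample rows are all hypothetical names invented for illustration. It only shows the two ideas the paragraph relies on: a commit either publishes a complete new version or changes nothing, and older versions stay readable.

```python
import copy
from typing import Dict, List


class VersionedTable:
    """Toy table: each commit appends a full snapshot to an in-memory log."""

    def __init__(self) -> None:
        # Version 0 is the empty table.
        self._log: List[List[Dict]] = [[]]

    @property
    def version(self) -> int:
        return len(self._log) - 1

    def commit(self, new_rows: List[Dict]) -> int:
        """Validate every row, then publish a new version in one step."""
        for row in new_rows:
            if "id" not in row:
                # Raising before the log is touched means nothing is half-written.
                raise ValueError(f"row without id: {row!r}")
        snapshot = copy.deepcopy(self._log[-1]) + copy.deepcopy(new_rows)
        self._log.append(snapshot)
        return self.version

    def read(self, version: int = -1) -> List[Dict]:
        """Return the rows as of a version (default: the latest)."""
        return copy.deepcopy(self._log[version])


table = VersionedTable()
table.commit([{"id": 1, "amount": 10}])
table.commit([{"id": 2, "amount": 25}])
try:
    table.commit([{"id": 3}, {"amount": 99}])  # second row is invalid
except ValueError:
    pass  # the whole batch was rejected, including the valid first row

print(table.version)       # 2
print(len(table.read()))   # 2
print(len(table.read(1)))  # 1
```

Delta Lake itself keeps its log as commit files stored next to the data rather than in memory, and it exposes earlier versions through the `versionAsOf` read option.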

The open cloud approach also improves interoperability and portability for companies that want to use multiple public clouds for their analytics applications. According to a recent Gartner survey, at least 80% of businesses are adopting multi-cloud strategies in multiple regions. Databricks’ multi-cloud capabilities enable customers to improve the efficiency and productivity of their data processes, improve the customer experience, and create new revenue opportunities, even when data is spread across multiple clouds. For example, a major global fast food company (and Google Cloud customer) is looking to build and deploy marketing solutions such as churn reduction, behavioral segmentation, and lifetime value in about 12 global markets by the end of 2021. A global data platform built on Databricks gives the business in each region a choice of public cloud platform.

Streamlined integration

Databricks is tightly integrated with Google Cloud’s computing, storage, analytics and management products to provide our customers with a simple, unified experience with high performance and enterprise security.

Computing and Storage: Built on Google Kubernetes Engine (GKE), Databricks on Google Cloud is the first fully container-based Databricks runtime in any cloud. It takes advantage of GKE’s managed services to deliver the portability, security, and scalability that developers know and love. Read/write access from Databricks to Google Cloud Storage (GCS) allows customers to run their workloads faster and at lower cost.

Analytics: Databricks has an optimized connector for Google BigQuery that reads BigQuery data directly through the Storage API for high-performance queries. The connector supports additional predicate pushdown, queries against named tables and views, and running SQL directly in BigQuery and loading the results into an Apache Spark DataFrame. In addition, Looker’s integration with Databricks, SQL Analytics support, and an open API environment on Google Cloud complement the open multi-cloud architecture. This integration allows Looker users to query the data lake directly, providing a whole new visualization experience.
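As a rough illustration of reading a BigQuery table with a pushed-down predicate, the sketch below uses the `bigquery` data source name and the `table` and `filter` options of the open source Spark BigQuery connector. It is an assumption that the Databricks connector accepts the same names. The helper functions `bigquery_read_options` and `read_bigquery` are hypothetical, and `spark` stands for an existing SparkSession passed in by the caller.

```python
from typing import Dict, Optional


def bigquery_read_options(table: str, row_filter: Optional[str] = None) -> Dict[str, str]:
    """Build the option map for a BigQuery read."""
    options = {"table": table}
    if row_filter:
        # The filter is handed to BigQuery, so only matching rows are transferred.
        options["filter"] = row_filter
    return options


def read_bigquery(spark, table: str, row_filter: Optional[str] = None):
    """Return a Spark DataFrame backed by a BigQuery table.

    `spark` is assumed to be a SparkSession on a cluster where the
    BigQuery connector is available.
    """
    reader = spark.read.format("bigquery")
    for key, value in bigquery_read_options(table, row_filter).items():
        reader = reader.option(key, value)
    return reader.load()


# Example call, shown as a comment because it needs a live cluster:
# df = read_bigquery(spark, "bigquery-public-data.samples.shakespeare",
#                    row_filter="word_count > 100")
```

Keeping the option map in its own function makes it easy to inspect what will be sent to the connector before any data is read.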

Security and Management: Experience a simplified deployment from the Google Cloud Marketplace with unified billing and one-click setup within the Google Cloud console. The integration of Databricks and Google Cloud Identity allows customers to use Google Cloud credentials for single sign-on and user provisioning with Databricks.

Run Databricks on Google Cloud

The most innovative use cases for Databricks on Google Cloud span retail, telco, media and entertainment, manufacturing, and financial services. In all industries, data is driving digital transformation initiatives. With the Lakehouse architecture, Databricks and Google Cloud customers are finding new ways to accelerate data-driven innovation.

Here are some of the most popular workloads customers are running on Databricks today. For more information on industry-specific use cases, please visit our Industry Solutions page.

Data lake modernization

Delta Lake on Databricks provides a modern foundation for migrating from expensive, difficult-to-scale on-premises systems to well-designed data lakes based on Google Cloud Storage. Even cloud-based Hadoop services lack the performance benefits of modern cloud-native data platforms. In fact, companies migrating from cloud-based Hadoop services to Databricks are seeing up to 50% better data processing performance and 40% lower monthly infrastructure costs. By migrating to Databricks on Google Cloud, customers can reduce administrative overhead, scale computing resources up or down quickly, and reduce operational costs with autoscaling and job termination.

Scalable data processing to prepare data for analysis

Databricks simplifies the ETL architecture and reduces the cost of ingesting and processing data by using a high-performance runtime on clusters optimized for large-scale data processing. With Delta Lake, all data (structured, semi-structured, unstructured) can be stored in raw format and then moved step by step, through transformation stages, into an aggregated, BI-ready layer with ACID guarantees.
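The staged flow described above (raw data, a conversion step, then an aggregated layer for BI) can be sketched in a few lines of plain Python. This is only an illustration of the shape of such a pipeline: lists and dicts stand in for Spark jobs and Delta tables, and the `clean` and `aggregate` functions, the field names, and the sample records are all made up for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def clean(raw_rows: Iterable[Dict[str, str]]) -> List[Dict]:
    """Conversion stage: normalize names, cast types, drop rows that fail to parse."""
    cleaned = []
    for row in raw_rows:
        try:
            cleaned.append({
                "store": row["store"].strip().lower(),
                "amount": float(row["amount"]),
            })
        except (KeyError, ValueError):
            continue  # malformed rows remain in the raw layer only
    return cleaned


def aggregate(cleaned_rows: Iterable[Dict]) -> Dict[str, float]:
    """Aggregated stage: total amount per store, ready for a BI tool."""
    totals: Dict[str, float] = defaultdict(float)
    for row in cleaned_rows:
        totals[row["store"]] += row["amount"]
    return dict(totals)


# Raw layer: records kept exactly as they arrived.
raw = [
    {"store": " Tokyo ", "amount": "12.50"},
    {"store": "tokyo", "amount": "7.50"},
    {"store": "Paris", "amount": "n/a"},  # fails the numeric cast
    {"amount": "3.00"},                    # missing store
]

print(aggregate(clean(raw)))  # {'tokyo': 20.0}
```

Because the raw records are never modified, the later stages can be rerun from scratch whenever the cleaning rules change.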

Reliable analysis of data lakes

Customers use Delta Lake on top of a data lake based on Google Cloud Storage for reliability, performance, and lifecycle management. Delta Lake helps prevent data corruption, speed up queries, improve data freshness, and reproduce ML models. This allows customers to always trust their data for analytical insights. In addition, Databricks provides Delta Engine to significantly accelerate query performance on data lakes, especially those enabled by Delta Lake.

Data science and machine learning

Managed MLflow on Databricks allows data teams to track all experiments and models in one place, publish dashboards, and easily hand work off to colleagues and stakeholders throughout the workflow, from raw data to insights. Databricks collaborative workspaces allow data teams to explore data, share insights, run experiments, and build ML models faster, making them more productive.
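For readers who have not used MLflow tracking, the following minimal sketch shows what recording one experiment run can look like. It relies on the `start_run`, `log_params`, and `log_metrics` functions of the open source MLflow API. The wrapper `log_experiment` is a hypothetical helper, and the run name, parameters, and metric values in the commented example are placeholders, not results from any real model.

```python
from typing import Any, Dict


def log_experiment(tracker: Any, run_name: str,
                   params: Dict[str, Any], metrics: Dict[str, float]) -> None:
    """Record one training run's parameters and metrics.

    `tracker` is expected to be the imported `mlflow` module. Passing it in
    keeps this sketch importable on machines without MLflow installed.
    """
    with tracker.start_run(run_name=run_name):
        tracker.log_params(params)    # for example, hyperparameters
        tracker.log_metrics(metrics)  # for example, validation scores


# Example call, shown as a comment because it needs MLflow installed:
# import mlflow
# log_experiment(mlflow, "baseline", {"max_depth": 5}, {"rmse": 0.42})
```

Each call creates one run, so repeated experiments show up side by side in the tracking UI and can be compared by colleagues.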

Getting started

The release of Databricks on Google Cloud benefits both parties. The tight integration of Databricks with Google Cloud’s analytics and AI products will provide even more functionality in the future. Together, we will continue to innovate and support our customers in building intelligent applications that solve difficult data problems.

If you are interested in Databricks on Google Cloud, please request access from the product page. For more information, join the launch event hosted by TechCrunch, where Ali Ghodsi and Thomas Kurian share the vision behind this partnership and its benefits for customers.

Sign up for public preview

