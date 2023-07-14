



Saber is a leading software and technology company in the global travel industry. With decades of groundbreaking firsts, the company's team of experts drives innovation and ingenuity across the travel ecosystem. Saber partners with airlines, hoteliers, agents, and other travel partners to retail, distribute, and fulfill travel. In this blog post, we highlight Saber's DevOps achievements, which earned it the Nurturing Team Culture award at the 2022 DevOps Awards. If you want to learn more about our winners and how they used DORA metrics and practices to grow their businesses, start here.

As a leader in the travel technology industry, Saber continues to innovate and grow, but its legacy systems had prevented it from realizing the full benefits of the cloud. Our largest product, Air Offer, is a robust platform that processes millions of flight calculations and shopping requests per second, generates trillions of flight solutions each month, and integrates 7.5 petabytes of data into our big data platform. Air Offer was first built in the 1990s and later migrated to the cloud, but the system's inherent complexity could make it cumbersome and unreliable. The system was too big to fail, but that also meant it was too big to change.

The biggest challenge this created was scalability. As shopping algorithms have become more complex, services such as Air Shopping have seen traffic double year over year, and so has the computational capacity needed to keep adapting to changing business dynamics and airline algorithms. Keeping up with these changes was a team effort, but continuous improvement had to be paired with better infrastructure, tools, and best practices.

Multiple optimization problems

The company determined that it needed a solution that could improve reliability, optimize costs, and increase agility while keeping performance and quality consistent and competitive. Our customers expect fast results when searching for travel deals, and they wanted enough stability to confidently launch marketing campaigns such as flash sales. Delivering this required both mastering the cloud and adopting DevOps.

The three main high-level goals the company sought to achieve in this digital transformation were:

Improve the speed and quality of the software and services we provide to our customers

Become more innovative by partnering with Google Cloud from a technology and travel perspective

Improve reliability and security while reducing operating costs

Resolution

To achieve these goals, we worked closely with Google Cloud to transform our systems and corporate culture to make better use of the cloud. With Google Cloud's help, we explored our core network and security design and found options that gave our product teams a flexible foundation to build on. Next, we worked out specific designs and implementations for key products such as Air Shopping, choosing Google Cloud services that meet their scalability requirements.

The solution we built combined public cloud and data center hosting for our applications. This approach accounts for application dependencies and for the latency of communication between applications, ensuring a better customer experience. For example, we moved Air Offer from two data centers, over-provisioned for redundancy, to a distributed four-region model that uses data insights and cloud flexibility to manage capacity and distribution in real time. This allowed us to optimize performance and cost with greater confidence.

Scale was a big issue in this digital transformation, so we decided to focus on autoscaling. With standard autoscaling, a managed instance group (MIG) monitors an autoscaling signal (CPU utilization, in this case) to detect when demand exceeds current capacity and launches more servers to handle it. However, this was not a panacea. For more complex applications like Air Shopping, where data caching requirements mean a server can take minutes to start, we turned to Compute Engine's predictive autoscaling. This feature lets us scale out ahead of periods of high traffic and optimize compute usage without sacrificing customer experience. Capabilities like these are invaluable during Black Friday and Cyber Monday, when businesses need to prepare for the highest demand of the year but don't want to waste money on over-provisioning.
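To show why server start-up time matters, here is a small Python simulation comparing a reactive CPU-based scaler with a predictive one. It is a sketch under simplifying assumptions, not Compute Engine's actual algorithm. Each server is assumed to provide one unit of CPU capacity, servers take a fixed `startup` number of minutes to become ready (standing in for cache warm-up), and the predictive policy is given a perfect forecast. The demand curve and all function names are hypothetical.

```python
# Hypothetical sketch: reactive vs. predictive CPU-based autoscaling.
# Assumptions: 1 server = 1 unit of CPU capacity, fixed start-up delay,
# and a perfect load forecast for the predictive policy.


def servers_for(load: int, target_pct: int) -> int:
    """Smallest fleet that keeps average CPU utilization at or below target_pct."""
    return -(-load * 100 // target_pct)  # integer ceiling division


def fleet_over_time(load: list[int], startup: int, target_pct: int,
                    predictive: bool) -> list[int]:
    """Number of ready servers at each minute under the chosen policy."""
    fleet = []
    for minute in range(len(load)):
        decided_at = minute - startup  # when the servers ready now were requested
        if decided_at < 0:
            # Assume a warm initial fleet sized for the starting load.
            fleet.append(servers_for(load[0], target_pct))
        elif predictive:
            # Predictive: at decided_at, size for the load forecast for `minute`.
            fleet.append(servers_for(load[minute], target_pct))
        else:
            # Reactive: at decided_at, size for the load observed at that moment.
            fleet.append(servers_for(load[decided_at], target_pct))
    return fleet


def overloaded_minutes(load: list[int], fleet: list[int]) -> int:
    """Minutes in which demand exceeded total fleet capacity."""
    return sum(1 for demand_now, servers in zip(load, fleet) if demand_now > servers)


# Hypothetical demand: flat, then a steep ramp (e.g. a flash sale), then flat.
demand = [30] * 10 + [30 + 10 * k for k in range(1, 10)] + [120] * 10

for predictive in (False, True):
    fleet = fleet_over_time(demand, startup=5, target_pct=60, predictive=predictive)
    name = "predictive" if predictive else "reactive"
    print(name, overloaded_minutes(demand, fleet))
# reactive 7
# predictive 0
```

With a 60% utilization target and a five-minute start-up, the reactive policy still falls behind during the ramp despite its headroom, because each decision is based on load that is already five minutes old. A real forecast is imperfect, so in practice predictive autoscaling narrows this gap rather than eliminating it.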

We also optimized our Compute Engine usage with Spot and preemptible VMs. These let us adjust the mix of standard and flexible (Spot and preemptible) instances across regions for optimal pricing.
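The effect of changing that mix can be sketched with simple cost arithmetic. The Python below is a hypothetical illustration: the region names, per-server prices (in cents per hour), and the single `spot_pct` parameter that caps the Spot share are all assumptions, not Saber's actual pricing or policy. Because Compute Engine can reclaim Spot VMs, in practice the Spot share is limited by how much capacity a workload can afford to lose at short notice.

```python
# Hypothetical sketch: choosing a region for a fleet with a given Spot share.
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionPricing:
    # Hypothetical hourly prices per server, in cents. Real prices vary by
    # machine type and region, and Spot prices can change over time.
    standard_cents: int
    spot_cents: int


def blended_cost_cents(servers: int, spot_pct: int, price: RegionPricing) -> int:
    """Hourly cost of a fleet where spot_pct percent of servers run as Spot VMs."""
    spot = servers * spot_pct // 100  # interruptible share
    standard = servers - spot         # share that must not be preempted
    return standard * price.standard_cents + spot * price.spot_cents


def cheapest_region(regions: dict[str, RegionPricing], servers: int, spot_pct: int) -> str:
    """Region with the lowest blended hourly cost for this fleet."""
    return min(regions, key=lambda name: blended_cost_cents(servers, spot_pct, regions[name]))


regions = {
    "region-a": RegionPricing(standard_cents=100, spot_cents=30),
    "region-b": RegionPricing(standard_cents=110, spot_cents=25),
}

print(cheapest_region(regions, servers=200, spot_pct=40))  # region-a (14400 vs 15200 cents)
print(cheapest_region(regions, servers=200, spot_pct=80))  # region-b (8800 vs 8400 cents)
```

The cheapest region depends on the blend: at a 40% Spot share the region with cheaper standard VMs wins, while at 80% the region with cheaper Spot VMs does.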

The first workload migration, Air Shopping, took about 15 months. During this time, we learned new ways of working, such as implementing secure CI/CD pipelines and adopting cloud-native Infrastructure as Code (IaC) practices alongside autoscaling and cost controls. Once the team got used to the process and began establishing best practices, bringing in additional regions to handle compute became much quicker and smoother.

A cross-functional team of site reliability engineers, platform engineers, and software engineers was formed to migrate the workloads to Google Cloud. Following Westrum's definition of a generative culture, shared goals removed workflow bottlenecks, blockers were addressed directly at stand-ups, and miscommunication decreased. This collaborative approach increased the team's pride and efficiency. Ultimately, we built a team of experts across our organization to assist other teams with similar migrations. Success breeds success!

Result

This combination of multi-region deployment for high reliability, predictive autoscaling for more efficient resource consumption, and Spot and preemptible VMs for cost optimization gives us the flexibility we need to meet our customers' needs while delivering sustainability and continuity. Meeting those needs has also produced quantifiable improvements in our own business.

Specifically, predictive autoscaling delivers about 10% more savings than basic autoscaling, which, based on projected costs, is expected to amount to about $3 million in savings in 2023. Because the predictive autoscaling logic knows how long a server takes to start, it can launch servers a little before they are needed. Without autoscaling, the business would need to run about 50% more servers than necessary throughout the day to cover peak traffic; predictive autoscaling also eliminates the additional 10% of capacity that basic autoscaling required. Additionally, by moving all workloads to Google Cloud and delivering changes through a single CI/CD pipeline, Saber has significantly reduced deployment times and improved cycle times for new feature releases.
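The relationship between these percentages can be made concrete with a small server-hours calculation. The 24-hour demand curve below is hypothetical and was chosen only so the ratios match the figures quoted above (about 50% extra for static peak provisioning and about 10% extra headroom for basic autoscaling). Treating predictive autoscaling as needing no extra margin is an idealization, and the sketch does not reproduce the $3 million estimate, which depends on Saber's actual fleet and prices.

```python
# Hypothetical worked example: server-hours under three provisioning strategies.


def server_hours(demand: list[int], headroom_pct: int) -> int:
    """Each hour, run the servers needed plus headroom_pct extra, rounded up."""
    return sum(-(-hourly_need * (100 + headroom_pct) // 100) for hourly_need in demand)


def static_server_hours(demand: list[int]) -> int:
    """No autoscaling: size the fleet for the daily peak and keep it all day."""
    return max(demand) * len(demand)


def extra_pct(used: int, needed: int) -> int:
    """Percentage of server-hours above what the demand strictly required."""
    return (used - needed) * 100 // needed


# Hypothetical servers needed per hour: night, daytime peak, evening.
demand = [40] * 8 + [100] * 8 + [60] * 8

needed = server_hours(demand, headroom_pct=0)       # 1600 server-hours
static = static_server_hours(demand)                # 2400 server-hours
basic = server_hours(demand, headroom_pct=10)       # 1760 server-hours
predictive = server_hours(demand, headroom_pct=0)   # idealized: no extra margin

print(f"static: {extra_pct(static, needed)}% over need")             # static: 50% over need
print(f"basic autoscaling: {extra_pct(basic, needed)}% over need")   # basic autoscaling: 10% over need
print(f"predictive: {extra_pct(predictive, needed)}% over need")     # predictive: 0% over need
```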

Before the workload was migrated to Google Cloud, deploying one of our large applications took about 8-10 hours per release per location, and we operated across four locations. After adopting a single CI/CD pipeline, we reduced this by 50%, to 4 hours per release per region.

Between the technologies that Google Cloud helped us deploy and the principles from DORA research, we saw significant improvements in five key characteristics of cloud computing:

Rapid elasticity: To handle traffic spikes, managed instance groups (MIGs) can scale to hundreds of servers in minutes.

On-demand self-service: Saber's teams can reserve capacity at the project or organization level and adjust allocations to meet changing demand with the click of a button.

Broad network access: Google Cloud simplifies infrastructure by giving teams easy access to computing resources, reports, and alerts across devices.

Measured service: Saber can use Google Cloud's operations suite and other tools to measure and scale resources while tracking costs, optimizing the user experience.

Resource pooling: Using cloud-native features such as autoscaling and resource pools isolated by client type, Saber enables customers to scale and consume resources within their contracted volume without human intervention.
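One way to picture resource pooling with per-client isolation is a set of pools, each with its own autoscaling bounds. In the hypothetical Python sketch below, the client types, the `min_servers` and `max_servers` values, and the assumption that the upper bound comes from contracted volume are all illustrative; in Compute Engine the closest real settings would be a MIG autoscaler's minimum and maximum number of instances. A spike in one pool is capped at that pool's limit and never draws capacity from another.

```python
# Hypothetical sketch: per-client-type resource pools with contract-based limits.
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientPool:
    client_type: str
    min_servers: int  # floor kept warm for this client type
    max_servers: int  # ceiling assumed to be derived from the contracted volume


def servers_for(load: int, target_pct: int) -> int:
    """Autoscaler-style recommendation: keep CPU utilization at or below target."""
    return -(-load * 100 // target_pct)  # integer ceiling division


def scale_pool(pool: ClientPool, load: int, target_pct: int = 60) -> int:
    """Clamp the recommendation to the pool's bounds, with no manual step."""
    return max(pool.min_servers, min(pool.max_servers, servers_for(load, target_pct)))


pools = [
    ClientPool("agency", min_servers=10, max_servers=200),
    ClientPool("airline", min_servers=20, max_servers=500),
]
current_load = {"agency": 150, "airline": 90}  # hypothetical CPU units per pool

for pool in pools:
    print(pool.client_type, scale_pool(pool, current_load[pool.client_type]))
# agency 200   (wants 250, capped at its contracted maximum)
# airline 150
```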

Stay tuned for the rest of this series on our DevOps Award winners. And read the 2022 State of DevOps report to dig deeper into the DORA research.

