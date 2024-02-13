



Improving data performance with Google Cloud and AlloyDB

As an AI company, we depend on processing large amounts of data efficiently to accelerate time to market and build differentiated algorithms. That's why we were initially drawn to Google Cloud's Tensor Processing Units (TPUs) and graphics processing units (GPUs), such as the NVIDIA L4 GPU. Then, as we prototyped the service and started building the consumer application, a managed Google Cloud database became important for scaling the application with a skeleton team.

When we discovered AlloyDB for PostgreSQL, we were in a bind. Usage of our services was growing rapidly, putting unusual stress on various parts of our infrastructure, especially our database. Initially, we were able to meet the increased demand by scaling up to larger machines, but as the weeks went on, we realized that even the largest machines could not reliably meet customer demand, and we were running out of room. Because of the time pressure, we needed a solution that could be implemented within a few days, so a major refactoring to a sharded architecture or a move to a proprietary database engine was out of the question. AlloyDB promised improved performance and scalability with full PostgreSQL compatibility, but could we migrate in the time frame we needed?

Achieving 150% growth with AlloyDB scalability

We chose a replication strategy to ease the migration process. We ran two replication sets from the source database to the destination AlloyDB database and operated in change data capture (CDC) mode for 10 days. This allowed us to prepare the environment for cutover. As a precaution, we provisioned a fallback instance on the source database in case we needed to roll back the migration. Because AlloyDB is fully compatible with PostgreSQL, the migration process was smooth and did not require any changes to the application code.
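
To make the idea of change data capture concrete, here is a toy sketch of replaying a stream of change events onto a destination copy. It is not the replication tooling used in the migration; `ChangeEvent`, `apply_changes`, and the `lsn` field are hypothetical names chosen for illustration, and the destination is just an in-memory dictionary.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One captured change from the source database (hypothetical shape)."""
    lsn: int                              # position in the source change log
    op: str                               # "insert", "update", or "delete"
    key: str                              # primary key of the affected row
    row: Optional[Dict[str, Any]] = None  # new row contents, if any


def apply_changes(
    replica: Dict[str, Dict[str, Any]],
    events: Iterable[ChangeEvent],
    last_applied_lsn: int = 0,
) -> int:
    """Replay events in log order and return the new high-water mark."""
    for event in sorted(events, key=lambda e: e.lsn):
        if event.lsn <= last_applied_lsn:
            continue  # already applied, so replaying a batch twice is safe
        if event.op in ("insert", "update"):
            replica[event.key] = dict(event.row or {})
        elif event.op == "delete":
            replica.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op}")
        last_applied_lsn = event.lsn
    return last_applied_lsn


# Destination starts from a snapshot, then catches up from the change stream.
destination = {"u1": {"name": "Ada"}}
stream = [
    ChangeEvent(lsn=2, op="update", key="u1", row={"name": "Ada L."}),
    ChangeEvent(lsn=1, op="insert", key="u2", row={"name": "Grace"}),
    ChangeEvent(lsn=3, op="delete", key="u2"),
]
high_water_mark = apply_changes(destination, stream)
```

Tracking the last applied position is what lets the destination keep following the source until cutover, and lets a batch be retried without applying a change twice.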

Since migrating to AlloyDB, we can confidently segment our read traffic into read pools to support continued growth in user activity. AlloyDB's replication lag is consistently less than 100ms, allowing us to scale reads to 20x our previous capacity. This improvement lets us handle spikes in demand more effectively and serve a higher volume of queries, increasing queries processed per second by 150%.
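
As an illustration of segmenting read traffic, the sketch below follows the shape of a Django database router (`db_for_read`, `db_for_write`, `allow_relation`, `allow_migrate`). It is only an assumption about how such routing could look, not our actual configuration: the class name `ReadPoolRouter` and the connection aliases `default` and `read_pool` are hypothetical, and the class is plain Python so it runs without Django installed.

```python
class ReadPoolRouter:
    """Send reads to a read pool alias and writes to the primary.

    The aliases are assumptions for this sketch; in a Django project they
    would need to match keys in the DATABASES setting.
    """

    primary_alias = "default"
    read_alias = "read_pool"

    def db_for_read(self, model, **hints):
        # Reads tolerate a small replication lag, so serve them from the pool.
        return self.read_alias

    def db_for_write(self, model, **hints):
        # All writes go to the primary instance.
        return self.primary_alias

    def allow_relation(self, obj1, obj2, **hints):
        # Both aliases hold the same data, so relations are always allowed.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Schema changes run only against the primary; replicas follow it.
        return db == self.primary_alias


router = ReadPoolRouter()
read_target = router.db_for_read(model=None)
write_target = router.db_for_write(model=None)
```

In a Django project, a router like this would be listed in the `DATABASE_ROUTERS` setting. A code path that must read its own write immediately may still need to query the primary, since the read pool can lag slightly behind it.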

As a direct result, we have significantly improved our service and are able to provide a better user experience with better uptime. AlloyDB's full compatibility with PostgreSQL and low-latency read pool gave us a solid foundation to continue scaling. But we didn't stop there.

Extending our infrastructure with AlloyDB and Spanner

We knew that at our current growth rate, we would eventually run into scaling issues with the original monolith architecture. Needless to say, we needed room to scale to 1 billion daily active users.

To address this, we identified the fastest growing parts of the Django monolith and refactored them into their own standalone microservices. This allowed us to isolate the growth of this particular part of the system and manage it independently from the rest of the monolith, which we had already migrated to AlloyDB.

Currently, AlloyDB plays a key role in powering our engagement systems. For front-end chats in particular, real-time performance is essential for a responsive user interface, and the data requires the highest level of consistency and availability as users interact with the chatbot.

However, those interactions are mostly ephemeral, and the profile and reference data are relatively small and bounded by the business model (for example, one user profile per user). With this in mind, we refactored the second part of our chat stack into a microservice written in Go and backed by Spanner. With Spanner's industry-leading high-availability (HA) story and virtually unlimited scale, we were able to significantly improve the scalability and performance of our refactored chat stack.

Spanner now powers our chat history recording system. The front end sends chat requests to the backend, where they are recorded and sent on to the AI magic. The backend asynchronously captures the response, logs it, and sends it back to the user's front end. This bidirectional system allows both databases to work together across front-end chats, ensuring the highest level of data consistency and availability and providing a superior experience for end users. With Spanner, we can now process terabytes of data every day without worrying about site stability.
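
The request flow described above (record the request, call the model, record the response, return it to the front end) can be sketched roughly as follows. This is a minimal illustration and not the production service, which is written in Go: `ChatHistory` is an in-memory stand-in for the Spanner-backed store, and `fake_model` and `handle_chat` are hypothetical names.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, List, Tuple


@dataclass
class ChatHistory:
    """In-memory stand-in for the chat history store."""
    entries: List[Tuple[str, str, str]] = field(default_factory=list)

    async def record(self, chat_id: str, role: str, text: str) -> None:
        # A real store would perform a database write here.
        self.entries.append((chat_id, role, text))


async def fake_model(prompt: str) -> str:
    """Placeholder for the model call; yields control, then echoes the prompt."""
    await asyncio.sleep(0)
    return f"echo: {prompt}"


async def handle_chat(
    history: ChatHistory,
    chat_id: str,
    message: str,
    model: Callable[[str], Awaitable[str]] = fake_model,
) -> str:
    # 1. Record the incoming request before doing any work on it.
    await history.record(chat_id, "user", message)
    # 2. Await the model asynchronously so other chats can be served meanwhile.
    reply = await model(message)
    # 3. Log the response, then hand it back to the caller (the front end).
    await history.record(chat_id, "ai", reply)
    return reply


history = ChatHistory()
reply = asyncio.run(handle_chat(history, "chat-1", "hello"))
```

Recording the request before calling the model means a chat turn is not lost if the model call fails partway through.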

We are confident that our chat application is future-proofed and able to handle any spike in user activity. We also reduced operational costs by moving to a managed database service. The biggest cost right now is opportunity cost.

As we grow, we continue to evolve our architecture and look for ways to improve scalability, performance, and reliability. By adopting an architecture that leverages the strengths of both AlloyDB and Spanner, we have built a system that meets the needs of our users and that we believe can support our growth, which is predicted to reach 10x over the next 12 months.

