How Pinhome Successfully Migrated to ScyllaDB for a High-Performance and Cost-Efficient Recommendation Engine Infrastructure

5 min readMay 25, 2023

written by Muhammad Ghiyast Farisi, Principal Engineer at Pinhome

As our company continues to grow rapidly, we’re committed to improving the user experience and staying ahead of the competition. To achieve this goal, we’re focusing on the importance of user engagement and simplifying the exploration process. One way we’re doing this is by implementing a recommendation engine in our application.

A recommendation engine allows us to analyze user behavior and provide tailored suggestions for products and services. By doing so, we can simplify the exploration process for our users and increase user interaction with our application, ultimately leading to greater user satisfaction.

As we have limited resources and time, we’re focused on building a fast and reliable recommendation engine in an MVP way. We’re taking a strategic approach to development by decoupling several steps and workarounds into independent concurrent asynchronous jobs and events running in the background.

Our approach is based on solving key problems sets such as optimizing algorithms for generating recommendations, minimizing latency by using efficient data structures and caching techniques, and ensuring fault tolerance. By doing so, we can process large amounts of data efficiently, provide real-time personalized suggestions to users, and meet the core needs of our users in a cost-effective way. We focus on these problems set:

Gathering data to be analyzed
Processing and analyzing the data
Serving a post processed data to be consumed

Despite the challenges that come with building a recommendation engine in an MVP way, we’re confident that our strategic approach will help us develop a recommendation engine that provides a competitive advantage for our company while ensuring a great user experience.

This simple diagram shows the big picture of our recommendation engine with the current relational database that we have and a NoSQL document store database to store all data needed, the event data, processed recommendations data and so on. We use Postgresql within Google CloudSQL as the relational SQL database to store post processed data, and Mongodb as a NoSQL document storage for storing gathered event or activity data.

Problem

As our user base continues to grow rapidly, we see a significant increase in web traffic requests. Unfortunately, our previous technology stack struggled to keep up, and we experienced degraded performance with our recommendation engine. This led to instability of our services and inaccurate recommendation results. As a result, we explored new technologies to help us scale our platform and provide a better user experience.

Solution

After evaluating various options, we decided to switch to ScyllaDB as our database technology for the recommendation engine. Through our tests, ScyllaDB offered superior performance and scalability compared to the database we used before, Cassandra. With ScyllaDB, we could handle a larger volume of data and requests while still maintaining fast response times. This not only improved the stability of our recommendation engine but also allowed us to provide more accurate recommendations to our users.

Result and Analysis

We’re excited to keep using ScyllaDB as we grow our business and enhance user experience. To make things simpler, we’ve decided to restructure our infrastructure, as illustrated in the diagram below:

By consolidating our storage infrastructure and switching to ScyllaDB, we were able to achieve significant cost savings. We were pleasantly surprised to discover that for the same cost as our previous document-based NoSQL storage solution, we were able to deploy a node cluster with higher specifications for each node.

Our primary focus in this migration was to improve the performance of our recommendation engine, and the results exceeded our expectations. We have three endpoints that are responsible for serving and processing recommendations, and all of them have shown significant performance improvements since the switch to ScyllaDB.

Impact

On the codebase itself, we reduced code complexity on querying and storing user event data, because for the first time (after using ScyllaDB), we were able to design a bucket model through the document-based NoSQL storage, since it was modeled as a time-series data model.

The most important impact was seen on the performance level. We have 3 critical components as an API to handle most actions of the recommendation engine, such as pre-populating, constructing pre-calculated data and delivering a result.

Previously, we have 2 APIs that has a high read throughput process and traffic, as can be seen in the graph below:

After switching to ScyllaDB, the API became 42x faster with 97%+ latency decrement

The API on the metric above became 23x faster with 95%+ latency decrement.

Now let us take a look at an API with high write throughput at the graphic below:

The API on the metric above became 9x faster with latency 89%+ decrement

Cost-wise, with the same cost average of document-based NoSQL storage that we previously used, we got almost 2x resource infrastructure specification increment and more than 40x faster performance with 97%+ latency reduction.

Not only reducing the latency and increasing the performance, it also eliminates a lot of problems that occurred previously. These problems are marked by the yellow and red lines shown on all of the graphs above. The occurrence of the yellow and red lines indicates the degradation of performance and user experience through the recommendation.

On the infrastructure side, we saved a lot of storage size by only storing the latest version of the user processed data AI model. This is convenient as previously we weren’t able to clean up or permanently delete data on postgres because it was not designed to have high Write-Delete execution due to its nature.

Summarizing the cost over optimization that we have, we can conclude that after using ScyllaDB, we achieved almost 60% cost reduction, with 97% decreased latency and 90% storage suppressed.

We were also pleasantly surprised to find that even several months after migrating to ScyllaDB, we have not experienced any issues with our cluster. Not a single CPU load spike, lack of memory, or performance degradation.

How Pinhome Successfully Migrated to ScyllaDB for a High-Performance and Cost-Efficient Recommendation Engine Infrastructure

Problem

Solution

Result and Analysis

Impact

Written by Engineering at Pinhome

No responses yet