What is Feed Re-ranking?

Online shopping platforms face intense competition, where the ability to deliver the most relevant products to customers is crucial. A well-organized and relevant product feed, which is inherently a list of products, forms the backbone of any E-commerce platform. However, simply presenting products based on traditional metrics such as relevance or popularity does not always lead to the best outcomes. Feed re-ranking addresses this limitation by adjusting the product display order based on multiple factors, including price, inventory status, seller credibility, seasonal relevance, search query matching, shipping speed, and product quality, which ultimately provides a better experience for the user.

This blog focuses primarily on how we at Meesho incorporate the quality of a product as a ranking factor, its implementation methods, and its long-term advantages. Considering product quality when re-ranking the feed ensures that products are not only relevant but also meet a certain standard of quality, thereby improving customer experience and, ultimately, user retention. Retention is one of the foremost metrics for any online shopping platform: it measures the platform's ability to keep customers coming back and making repeat purchases over time. Feed re-ranking with a quality-first approach goes beyond driving sales; it is about building long-term customer satisfaction, promoting repeat purchases, and enhancing the overall reputation of an E-commerce platform.

ML in Ranking

In the field of machine learning, the ranking systems that are designed to re-order the feed are commonly referred to as Learning to Rank (LTR) models. LTR is a machine learning approach that trains models to predict the optimal ranking of items, such as search results or recommendations, based on their relevance to a specific query or context. The primary goal of an LTR system is to boost sales. Mathematically, it aims to increase the likelihood that a user will purchase after viewing a product in their feed. This can be modeled directly, or in two separate stages: the first part focuses on maximizing the click-through rate (CTR), while the second part optimizes for conversion rates, assuming the user clicks on the product, which we refer to as the CVR.
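A plausible form of this two-stage decomposition, written in the notation used throughout this post (this is a paraphrase of the idea described above, not the exact equation from the text), is:

```latex
S_{ij} \;=\; \underbrace{P(\text{click} \mid u_i,\, p_j)}_{\text{CTR model}} \;\times\; \underbrace{P(\text{purchase} \mid \text{click},\, u_i,\, p_j)}_{\text{CVR model}}
```

Ranking by this product of probabilities orders products by the expected likelihood of a purchase given an impression.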

Equation 1 represents a fundamental ranking equation, where the ranking score for a product p_j viewed by user u_i is given by S_ij. Incorporating the quality of a product into the ranking framework can be done through several strategies. The first option is to integrate quality into the LTR model; however, the objectives of LTR and quality-aware ranking differ. LTR is primarily concerned with maximizing conversions, whereas a quality-focused approach seeks to enhance the overall user experience, even if that means sacrificing some immediate sales in favor of better customer satisfaction, which can be compensated for through retention and repeat purchases over time. Hence, we model the quality factor explicitly. The ranking equation can be modified as

S_ij = S^LTR_ij × f_quality(p_j)  (Equation 2)

where S^LTR_ij is the relevance score produced by the LTR model.

Since quality is a property of the product itself, f_quality depends only on p_j. Throughout the next sections, we focus on building the f_quality(.) function.

How do we Measure Quality?

To assess the quality of a product, we rely on the ratings it receives from customers. The most straightforward metric for this is the average rating, which represents the mean of all customer ratings. However, the average rating has its limitations, as it can sometimes mask important signals of dissatisfaction. For instance, a product with numerous 5-star ratings and a few 1- or 2-star ratings may still have a high average, despite a significant number of unhappy customers. This suggests that the average rating might not fully capture the extent of customer dissatisfaction or underlying quality issues. To address this limitation, we introduce the Net Quality Detractor (NQD) ratio. Mathematically, we can represent NQD as

NQD_j = (n^(1)_j + n^(2)_j) / N_j  (Equation 3)

For any product p_j, NQD is the proportion of 1- and 2-star ratings relative to the total count of ratings received, as shown in Equation 3, where n^(k)_j is the count of rating k ∈ {1, 2, 3, 4, 5} received by product p_j and N_j represents the total rating count received by the product. Unlike the average rating, which can be skewed by a large number of positive reviews, the NQD ratio offers a more precise insight into customer dissatisfaction, serving as a more actionable metric for identifying potential quality concerns in a product. Figure 1 shows a scatter plot with the product's average rating on the x-axis and its NQD on the y-axis. From the plot, we observe that NQD can vary significantly, even for products with the same average rating. In fact, some products with higher average ratings may have a higher NQD than products with lower ratings. An example of this is highlighted in orange in the plot, where a product with an average rating of 4.0 has 33% customer dissatisfaction, while a product with an average rating of 3.0 has 0% dissatisfaction.
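Equation 3 is straightforward to compute from a product's rating histogram. The following is a minimal sketch; the rating counts are hypothetical, chosen to reproduce the orange example above (an average of 4.0 with 33% dissatisfaction versus an average of 3.0 with none):

```python
def avg_rating(counts):
    """Mean star rating from a {star: count} histogram."""
    total = sum(counts.values())
    return sum(star * n for star, n in counts.items()) / total

def nqd(counts):
    """Net Quality Detractor ratio: share of 1- and 2-star ratings (Equation 3)."""
    total = sum(counts.values())
    return (counts.get(1, 0) + counts.get(2, 0)) / total

# Hypothetical rating histograms for two products.
product_a = {2: 10, 5: 20}   # average 4.0, yet every third customer is dissatisfied
product_b = {3: 30}          # average only 3.0, yet zero dissatisfaction

print(avg_rating(product_a), round(nqd(product_a), 2))  # 4.0 0.33
print(avg_rating(product_b), nqd(product_b))            # 3.0 0.0
```

Both products would look very different to a ranker that only sees the average rating, which is exactly the gap NQD closes.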

With this motivation, we use NQD as a measure of quality to re-rank our feed. Therefore, we can modify Equation 1 to:

S_ij = S^LTR_ij × f_quality(NQD_j)  (Equation 4)

Figure 1: Product avg rating vs product NQD.

NQD Modeling

The NQD ratio offers a more effective measure of customer dissatisfaction than the average rating. A simple approach is to use a product's lifetime NQD, i.e., the NQD calculated over all the ratings it has received to date, when re-ranking the feed. However, relying solely on a product's lifetime NQD presents several challenges, which are outlined below.

  • While lifetime NQD offers an aggregate view, it often misses seasonal or immediate trends. A static NQD cannot adapt to these dynamics, making it inadequate for timely decisions.
  • Heuristic NQD-based formulations designed to account for trends and seasonality fail to generalize across diverse products. Crafting distinct rules for each category would be impractical, as rating patterns can vary significantly.
  • For new or low-volume products, NQD is often volatile due to limited ratings, leading to unreliable decisions. Incorporating seller-level quality metrics offers a more stable estimate, as sellers’ past performance can be considered a good indicator of the product quality for new products.
  • Ranking decisions are believed to be more effective when based on a product’s future behavior rather than past performance. Machine learning models can forecast NQD by analyzing historical data, providing accurate and scalable predictions that capture product trends and seller behavior.

Hence, we use ML models to forecast a product's likely future NQD, which is then used in the re-ranking equation.

NQD Forecasting

With the objective of forecasting the future NQD of products, we identified a window of 90 days as the most appropriate for this task, as it provides sufficient rating data for the majority of products. Since NQD is a ratio that ranges between 0 and 1, we formulate this as a regression problem. The prediction is based on various input signals, including product-specific features, represented as [P], supplier characteristics as [S], and category-level information (e.g., electronics, apparel, etc.), represented as [C]. Additionally, we incorporate seller-category features, given by [SC], that capture seller behavior within specific categories, especially when sellers operate across multiple domains.

For all of these signals, we also incorporate their values at different recency windows so as to capture trends, if any exist. For instance, one of the important signals we use is the seller's historic NQD, alongside which we incorporate the seller's NQD over the past x days, where x ∈ {7, 14, 28, 56}. After experimenting with various architectures, including boosting models and deep learning approaches such as fully connected networks, we found that LightGBM delivered the best results for our task with a very lightweight model. We can represent the NQD prediction function as

\hat{NQD}_j = f([P], [S], [C], [SC])
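As an illustration of the recency-window features, here is a minimal sketch that derives a seller's NQD over each trailing window; the function and field names are hypothetical and not taken from our production pipeline:

```python
from datetime import date, timedelta

def seller_nqd_windows(ratings, today, windows=(7, 14, 28, 56)):
    """Compute a seller's NQD over several trailing windows.

    ratings: list of (rating_date, star) pairs for one seller.
    Returns a dict like {"seller_nqd_7d": 0.5, ...}; None where a window is empty.
    """
    features = {}
    for w in windows:
        cutoff = today - timedelta(days=w)
        recent = [star for d, star in ratings if d >= cutoff]
        if recent:
            # Share of 1- and 2-star ratings within the window.
            features[f"seller_nqd_{w}d"] = sum(1 for s in recent if s <= 2) / len(recent)
        else:
            features[f"seller_nqd_{w}d"] = None
    return features

# Synthetic rating history for one seller.
ratings = [
    (date(2024, 1, 30), 1),
    (date(2024, 1, 25), 5),
    (date(2024, 1, 10), 2),
    (date(2023, 12, 20), 5),
]
print(seller_nqd_windows(ratings, today=date(2024, 1, 31)))
```

Feeding the same signal at several horizons lets the model see whether dissatisfaction is rising or falling, rather than only its lifetime level.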

In the current approach, the predicted NQD is used as a proxy for quality in the re-ranking process.

Evaluation

We trained a LightGBM model using the standard mean squared error (MSE) loss. However, for better interpretability, we evaluated the mean absolute error (MAE) of our predictions against two baselines: the product's lifetime NQD and the supplier's lifetime NQD in the respective product category. The results are presented in Table 1.
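The comparison in Table 1 boils down to computing MAE per rating bucket against each proxy. A toy sketch with made-up numbers (none of these values are from Table 1):

```python
def mae(y_true, y_pred):
    """Mean absolute error between actual and predicted 90-day NQD."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical future-NQD values for four products in one rating bucket.
actual        = [0.10, 0.25, 0.05, 0.40]
model_pred    = [0.12, 0.22, 0.07, 0.35]  # model forecasts
baseline_prod = [0.30, 0.10, 0.05, 0.20]  # Baseline 1: product lifetime NQD
baseline_sc   = [0.15, 0.20, 0.10, 0.30]  # Baseline 2: seller x category lifetime NQD

print(mae(actual, model_pred))     # model error
print(mae(actual, baseline_sc))    # seller x category baseline error
print(mae(actual, baseline_prod))  # product lifetime baseline error
```

In this toy bucket the model's MAE is lowest, mirroring the ordering reported below for low-rating-count products.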

Table 1: Mean absolute error of the model calculated at different rating buckets and compared against the baselines. Here, Baseline 1 indicates product NQD lifetime, and Baseline 2 indicates seller x category NQD lifetime.

In Table 1, we observe that for products with lower rating counts, the seller's category NQD baseline serves as a more reliable proxy than the product's lifetime NQD.

This supports our hypothesis that NQD values are highly volatile and unreliable in low-rating scenarios. The trained LightGBM model significantly beats both baselines, producing a more reliable NQD forecast for new products. As a product accumulates more ratings, its lifetime NQD stabilizes and becomes a comparably effective proxy for its future NQD. Nonetheless, our model consistently outperforms both baselines across all rating-count buckets, demonstrating its robustness across these scenarios thanks to the diverse signals it incorporates as features.

Final Ranking

In this section, we present the final ranking equation. As shown in Equation 4, the final score comprises two key components: the relevance score, learned through the LTR model, and the quality component, represented by the f_quality(.) function, which is based on the predicted NQD of the product, as discussed in the previous section. Since NQD reflects negative user experiences, we do not directly multiply the predicted NQD score with the LTR score. Instead, we define it as follows:

f_quality(p_j) = (1 − \hat{NQD}_j)^α

Here, the use of 1 − \hat{NQD} emphasizes the goal of minimizing negative user experiences, and α serves as a tuning factor, providing flexibility in adjusting the importance of quality during the re-ranking process. Finally, the overall ranking equation is formulated as

S_ij = S^LTR_ij × (1 − \hat{NQD}_j)^α

where α has been finalized to 0.4 through rigorous A/B testing.
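Putting the pieces together, here is a minimal scoring sketch. The exact functional form combining the LTR score with 1 − \hat{NQD} is not spelled out in this post; treating α as an exponent on the quality term is one plausible reading, used here purely for illustration, and the product scores are hypothetical:

```python
ALPHA = 0.4  # quality weight, per the A/B-tested value above

def f_quality(nqd_pred, alpha=ALPHA):
    """Quality multiplier from predicted NQD (assumed exponent form)."""
    return (1.0 - nqd_pred) ** alpha

def final_score(ltr_score, nqd_pred, alpha=ALPHA):
    """Combine LTR relevance with the quality term."""
    return ltr_score * f_quality(nqd_pred, alpha)

# Hypothetical products: A is slightly more relevant, B has far better quality.
score_a = final_score(ltr_score=0.80, nqd_pred=0.30)
score_b = final_score(ltr_score=0.75, nqd_pred=0.05)
print(score_a < score_b)  # B is promoted above A
```

With α = 0, the quality term is 1 for every product and the feed falls back to pure LTR ordering; larger α trades more relevance for quality.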

Figure 2: Feed re-ranking example of the top 8 products for a query "Jacket for men" as searched by a user.

Figure 2 illustrates an example of feed re-ordering, showing how the products initially in the second and third positions, and those in the fourth and fifth positions, swap places as higher-quality products are promoted. Re-ordering is not limited to consecutive swaps; Figure 2 shows just one example. A natural question is why the best-quality product wasn't moved to the top position. The reason is that maintaining relevance between the query and the displayed feed remains a priority: the re-ranking process aims to elevate better-quality products while ensuring they stay relevant to the user's intent. Internal evaluations have indicated a measurable improvement, with a reduction in platform-level NQD of nearly 1% in A/B test environments.

Ongoing Improvements

There are several areas for further enhancement. One key area is the personalization of the NQD score. The current model assumes a generalized view of product dissatisfaction, but in reality, different users may perceive product quality differently. For instance, a product that one user rates a 1 or 2 might be rated more favorably by another user, depending on their expectations, preferences, and past experiences. These nuances highlight opportunities to improve quality modeling, especially in user-specific contexts. Personalizing the NQD score would enable a more tailored and accurate estimation, enhancing downstream applications such as product recommendations and feed ranking.

Another significant direction involves moving away from using NQD as a proxy for customer satisfaction to directly modeling user retention. Ultimately, the goal of integrating NQD into the ranking equation is to minimize dissatisfaction and improve user retention on the platform. An area of exploration is the use of machine learning techniques to predict user retention directly. These models would assess whether a user is likely to return and make future purchases based on their historical behavior, interactions, and satisfaction levels. This approach may allow the incorporation of a broader set of signals, which can result in more holistic ranking strategies aligned with user experience objectives.