New users rarely give second chances. If their first experience on the app, especially the homepage product feed, feels irrelevant, they leave quickly. At Meesho, we noticed that our static homepage product feed for new users was causing high bounce rates and a poor first impression.
But with zero browsing history, how do you show the right products from the very first scroll?
In this blog, we share how we reengineered Meesho’s homepage feed to deliver personalized product recommendations right from a user’s very first session. 🚀

The Problem
There are two core challenges in personalizing the feed for new users: first, the lack of behavioral signals early on; second, incorporating a new user's in-app interactions in real time to improve personalization.
Journey of New User Personalization at Meesho

▶️ Demographic Recommender (DemoRec):
Our early approach to solving the cold start problem relied on basic demographic segmentation.
Since we didn’t have behavioral data for new users, we grouped them into broad cohorts based on attributes like gender, location, etc. We then surfaced popular products within each cohort to offer some level of personalization. While this strategy helped avoid completely generic feeds, it had its limitations, especially in capturing the unique and evolving preferences of each user.
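To make this concrete, here is a minimal sketch of a DemoRec-style cohort lookup: count orders per demographic cohort, then serve each new user the most popular products in their cohort. The attribute set (gender, region) and the data shapes are illustrative assumptions, not our production pipeline.

```python
# A minimal sketch of cohort-based popularity recommendation (DemoRec-style).
# Cohort keys and attribute names are illustrative assumptions.
from collections import Counter, defaultdict

def build_cohort_popularity(orders):
    """orders: iterable of (gender, region, product_id) tuples."""
    cohort_counts = defaultdict(Counter)
    for gender, region, product_id in orders:
        cohort_counts[(gender, region)][product_id] += 1
    return cohort_counts

def recommend(cohort_counts, gender, region, k=10):
    """Return the k most popular products for the user's demographic cohort."""
    return [pid for pid, _ in cohort_counts[(gender, region)].most_common(k)]

# Usage: feed for a brand-new user with only signup attributes available.
counts = build_cohort_popularity([("F", "KA", "p1"), ("F", "KA", "p2"), ("F", "KA", "p1")])
print(recommend(counts, "F", "KA"))  # ['p1', 'p2']
```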
▶️ Early Signal Personalization Model (ESPM):
To improve on demographic-only personalization, we began blending in early user behavior.
We combined basic demographic signals with a user's initial interactions on the app, such as product clicks, time spent on listings, and scroll patterns, to estimate their likelihood of purchasing from different categories. Using this predicted purchase probability, we identified the top categories for each user and curated a personalized product feed tailored to their predicted interests. This hybrid approach gave us a more dynamic way to serve relevant content, even in a user's first session.
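A minimal sketch of the ESPM idea, assuming one binary classifier per category over demographic and early-session features; the feature set, category list, and choice of logistic regression are illustrative assumptions rather than the production model.

```python
# A sketch of ESPM-style category scoring: per-category classifiers predict
# purchase probability from demographic + early-session features.
from sklearn.linear_model import LogisticRegression

CATEGORIES = ["sarees", "footwear", "home_decor"]  # illustrative category list

def train_category_models(X, y_by_category):
    """X: (n_users, n_features) demographic + early-session features
    (e.g. clicks, dwell time, scroll depth).
    y_by_category: {category: 0/1 purchase labels}. One classifier per category."""
    return {c: LogisticRegression().fit(X, y_by_category[c]) for c in CATEGORIES}

def top_categories(models, x, k=2):
    """Rank categories by predicted purchase probability for one user x."""
    scores = {c: m.predict_proba(x.reshape(1, -1))[0, 1] for c, m in models.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```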
These two approaches addressed the first challenge, but we were still looking for a new architecture that would keep improving the feed as the user generated new interactions.
▶️ Cold-Warm Net (CWN):
The goal of modeling cold-start users is to learn effective user representations from both cold and warm user behaviors and to build models that adapt as users evolve. Cold users are users with no interaction history; warm users are new users with some interaction history.
Cold-Warm Net uses expert towers for the cold state and warm-up state of users, combined via a gate network that adapts based on user behavior. A dynamic teacher selector guides learning through knowledge distillation, ensuring high-quality personalization from the start. We'll discuss this in detail below.
Let’s Deep Dive into the Cold-Warm Net model:


Our model consists of two experts: a cold expert and a warm expert. User demographic features X_demo are passed to the cold expert to get the cold embedding e_cold, and the user's interaction sequence X_seq is passed to the warm expert to get the warm embedding e_warm. A gating network combines e_cold and e_warm to get the final user embedding e_user. We get the item embedding e_item from an item lookup table whose embeddings are randomly initialised.
e_cold = f_cold(X_demo)
e_warm = f_warm(X_seq)
e_item = Lookup(i_item)
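Below is a minimal PyTorch sketch of the two expert towers and the item lookup table. The layer sizes, the embedding dimension, and the mean-pooling of the interaction sequence in the warm tower are our assumptions for illustration; the production towers may differ.

```python
import torch.nn as nn

EMB_DIM = 64  # assumed embedding size

class ColdExpert(nn.Module):
    """e_cold = f_cold(X_demo): an MLP over demographic features."""
    def __init__(self, demo_dim):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(demo_dim, 128), nn.ReLU(), nn.Linear(128, EMB_DIM)
        )

    def forward(self, x_demo):          # x_demo: (batch, demo_dim)
        return self.mlp(x_demo)

class WarmExpert(nn.Module):
    """e_warm = f_warm(X_seq): mean-pool embeddings of the interaction sequence."""
    def __init__(self, num_items):
        super().__init__()
        self.seq_emb = nn.Embedding(num_items, EMB_DIM, padding_idx=0)
        self.proj = nn.Linear(EMB_DIM, EMB_DIM)

    def forward(self, x_seq):           # x_seq: (batch, seq_len) item ids, 0 = padding
        mask = (x_seq != 0).unsqueeze(-1).float()
        pooled = (self.seq_emb(x_seq) * mask).sum(1) / mask.sum(1).clamp(min=1.0)
        return self.proj(pooled)

# e_item = Lookup(i_item): a randomly initialised item embedding table.
item_lookup = nn.Embedding(100_000, EMB_DIM)
```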
Gating network
User state features X_state, such as login state, activity level, and lifecycle stage, are passed to the gate network to get the cold expert weight w_cold and the warm expert weight w_warm.
w_warm, w_cold = f_gate(X_state)
e_user = w_warm · e_warm + w_cold · e_cold
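A minimal sketch of the gate, assuming a single linear layer with a softmax that produces the two expert weights from the user-state features:

```python
import torch
import torch.nn as nn

class GateNetwork(nn.Module):
    """w_warm, w_cold = f_gate(X_state): softmax weights over the two experts."""
    def __init__(self, state_dim):
        super().__init__()
        self.fc = nn.Linear(state_dim, 2)   # one logit per expert

    def forward(self, x_state):             # x_state: (batch, state_dim)
        w = torch.softmax(self.fc(x_state), dim=-1)
        return w[:, :1], w[:, 1:]           # (w_cold, w_warm), each (batch, 1)

def fuse(e_cold, e_warm, w_cold, w_warm):
    """e_user = w_warm * e_warm + w_cold * e_cold."""
    return w_warm * e_warm + w_cold * e_cold
```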
Let's say y and ŷ are the actual and predicted labels for each sample. We then optimise the whole network by minimizing the binary cross-entropy loss L between them:
ŷ = sigmoid(cosine_similarity(e_user, e_item))
L = BinaryCrossEntropy(ŷ, y)
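In code, the scoring and training objective look roughly like this (a sketch using standard PyTorch ops):

```python
import torch
import torch.nn.functional as F

def predict(e_user, e_item):
    """y_hat = sigmoid(cosine_similarity(e_user, e_item))."""
    return torch.sigmoid(F.cosine_similarity(e_user, e_item, dim=-1))

def main_loss(y_hat, y):
    """L = BinaryCrossEntropy(y_hat, y); y holds float 0/1 labels."""
    return F.binary_cross_entropy(y_hat, y)
```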
Dynamic knowledge distillation:
Cold-start experts often underfit due to limited information during the cold state of users, so we use Dynamic Knowledge Distillation (DKD) to transfer knowledge from the warm expert to the cold expert when needed. An auxiliary distillation loss L_d is added to the main loss L to guide learning.
Let ŷ_cold and ŷ_warm denote the predicted labels of the cold expert and warm expert respectively. For each sample, we compare the binary cross-entropy losses of both experts.
If the cold expert performs worse, i.e. L(ŷ_cold, y) > L(ŷ_warm, y), then L_d is added to L so that the cold expert learns from the warm expert. The distillation loss L_d and the overall loss L_o of the network are defined as
L_d = CrossEntropy(ŷ_cold, ŷ_warm)
L_o = L + α · L_d (where α = 0 if L(ŷ_cold, y) ≤ L(ŷ_warm, y))
Here, α determines the strength of distillation from the warm-up expert.
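A sketch of the per-sample distillation term, assuming the binary cross-entropy form of L_d with the warm expert's (detached) predictions as soft targets; the per-sample zeroing of α is our reading of the rule above.

```python
import torch.nn.functional as F

def distillation_loss(y_hat_cold, y_hat_warm, y, alpha=1.0):
    """Return the alpha * L_d term to add to the main loss L, with alpha
    zeroed per sample where the cold expert already matches or beats the warm expert."""
    loss_cold = F.binary_cross_entropy(y_hat_cold, y, reduction="none")
    loss_warm = F.binary_cross_entropy(y_hat_warm, y, reduction="none")
    # L_d: distil with the warm expert's predictions as (detached) soft targets.
    l_d = F.binary_cross_entropy(y_hat_cold, y_hat_warm.detach(), reduction="none")
    gate = (loss_cold > loss_warm).float()   # alpha = 0 where L(cold) <= L(warm)
    return (alpha * gate * l_d).mean()
```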
Why SUB_CATEGORY and PRICE_DECILE prediction auxiliary tasks were added
1. Enforcing Hierarchical Learning: In our use case, there’s a natural hierarchy: e.g., sub_category → price_decile → catalog. Auxiliary tasks help the model capture and align with this structure.
2. Improved Gradient Flow / Optimization: Auxiliary tasks add additional loss signals at intermediate layers, helping stabilize training by improving gradient flow in deep networks.
3. Better Representation Learning: By encouraging the network to solve sub-problems, it learns richer representations that can improve performance on the main task.
4. Faster Convergence: Training with auxiliary tasks can accelerate convergence by guiding early layers to learn useful features more quickly.
Mathematical formulation: We add auxiliary losses L_sub_category and L_price_decile to the overall loss L_o of the network to get L_total:
L_sub_category = CrossEntropy(ŷ_sub_category, y_sub_category)
L_price_decile = CrossEntropy(ŷ_price_decile, y_price_decile)
L_total = L_o + L_price_decile + L_sub_category
where,
ŷ_sub_category and y_sub_category are the predicted and actual sub_category labels respectively,
ŷ_price_decile and y_price_decile denote the predicted and actual price_decile labels respectively, and
L_total is the total loss used to optimize the network.
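A sketch of how the auxiliary heads plug into training, assuming simple linear classification heads on the user embedding; the class counts and head shapes are illustrative assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

NUM_SUB_CATEGORIES, NUM_PRICE_DECILES, EMB_DIM = 500, 10, 64  # illustrative sizes
sub_category_head = nn.Linear(EMB_DIM, NUM_SUB_CATEGORIES)
price_decile_head = nn.Linear(EMB_DIM, NUM_PRICE_DECILES)

def total_loss(l_o, e_user, y_sub_category, y_price_decile):
    """L_total = L_o + L_sub_category + L_price_decile."""
    l_sub = F.cross_entropy(sub_category_head(e_user), y_sub_category)
    l_price = F.cross_entropy(price_decile_head(e_user), y_price_decile)
    return l_o + l_sub + l_price
```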
Backtesting Results:


🚀 Impact!
We rolled out these model enhancements on the Homepage Product Feed and saw a notable uplift in feed engagement metrics like CTR, CVR, and O/Vi, highlighting stronger engagement and conversion within the feed. Order contribution from FY increased significantly, reinforcing its central role in driving purchases. The FY feed improvements also lifted Search and other RE surfaces through stronger user intent. Additionally, we saw a sharp drop in bounce rate, indicating that users are finding relevant content earlier in the feed, leading to quicker conversions and higher-quality interactions.
Overall, we saw a notable rise in new user activation along with a sharp drop in bounce rate, a game-changing impact!
🎉 Shoutouts
Special thanks to Pukhraj Baraskar, Divay Jindal, Devashish Gupta for working closely on the project and Madhurita Mahapatra, Vinit Rongata, Ravindra Kumar Yadav, Debdoot Mukherjee, Anmol Verma and Milan Partani for their guidance.
🗒️Reference
1. https://arxiv.org/pdf/1808.09781
2. https://arxiv.org/pdf/2106.03819
3. https://arxiv.org/pdf/2205.04507
4. https://arxiv.org/pdf/2309.15646

