
How to Define the Number and Types of Clustering (1) - Building Segmentation Hypotheses with Business Logic

2026-05-21

Before performing clustering, we can use a business logic definition method to establish an initial classification direction. This method starts from the business objective and clearly defines the problem the analysis should solve, such as “identifying which users can increase their usage.” We then define the target users, select observable behavioral signals, and draw on product experience and operational knowledge to group users into several possible user types.

These classifications are not the final answer; they are business hypotheses formed before clustering. Their value is that they give the team a simple, intuitive analysis direction, which makes the subsequent data validation more focused. For example, if the goal is to increase app usage, we might hypothesize that users fall into high-active users, medium-active users, deal-oriented users, content browsers, and at-risk users. We can then use K-means, the Elbow Method, and the Silhouette Score to check whether these groups actually exist in the data. The final segments should be valid in the data, interpretable for the business, and actionable for the product.

How to Decide the Number of Clusters

The number of clusters should not be decided by simply saying, “I want 6 groups.” Instead, we should first ask:

Can the resulting groups help me make decisions?

So:

  • If K=6 but only 3 groups can be clearly explained, then K=6 is not a good choice.
  • If K=4 and every group is clear, strategic, and measurable with KPIs, then K=4 may be better.
  • If K=5 can clearly identify a growth-potential user group, then K=5 has product value.
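The review rules above can be written down as a small checklist. This is a minimal Python sketch, not a standard tool: the `KReview` record and its fields are hypothetical names for whatever notes a team keeps when it reviews each candidate K.

```python
from dataclasses import dataclass


@dataclass
class KReview:
    """Hypothetical record of the team's business review of one candidate K."""
    k: int
    explainable_groups: int  # groups the team can clearly describe
    groups_with_kpi: int     # groups with a strategy and a measurable KPI

    def is_usable(self) -> bool:
        # A K is only useful if every group is both explainable and measurable.
        return self.explainable_groups == self.k and self.groups_with_kpi == self.k


def usable_k_values(reviews: list[KReview]) -> list[int]:
    """Return the candidate K values whose groups all pass the business review."""
    return [r.k for r in reviews if r.is_usable()]


reviews = [
    KReview(k=6, explainable_groups=3, groups_with_kpi=3),  # only 3 of 6 explainable
    KReview(k=4, explainable_groups=4, groups_with_kpi=4),
    KReview(k=5, explainable_groups=5, groups_with_kpi=5),
]
print(usable_k_values(reviews))  # [4, 5]
```

Choosing between the remaining candidates (here K=4 and K=5) is still a product decision, for example whether K=5 isolates a growth-potential group worth targeting.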

In short:

The product question determines the direction of clustering; the features determine what the model can see; the K value determines the level of segmentation detail; and UX / business interpretation determines whether clustering is truly useful.


How to Define the Number of Clusters, or the K Value

“First understand the approximate business-defined scope, then use data to identify reasonable candidate values, and finally use business judgment to decide the final number, naming, and product strategy.”

The K value should not be decided by intuition alone, nor by mathematical scores alone. We should first use business understanding to define an approximate range, then use data to identify reasonable candidates, and finally evaluate which K is most useful from a UX / business perspective, for example by checking whether each resulting segment can be clearly interpreted.

K-means requires us to specify K, the number of clusters, in advance. It divides an unlabeled dataset into K clusters by repeatedly assigning each data point to its nearest centroid (based on distance) and then recomputing each centroid as the mean of the points assigned to it. One limitation of K-means is that the right K value is not always easy to identify.
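To make the assignment and update steps concrete, here is a minimal pure-Python sketch of K-means (Lloyd's algorithm) that also reports inertia, the within-cluster sum of squared distances that the Elbow Method plots against K. For reproducibility it assumes a deterministic farthest-first initialization; this is a simplification, since libraries such as scikit-learn default to k-means++ seeding. The two features in the toy data (active days per week, deal clicks per week) are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Deterministic seeding (an assumption): start from the first point, then
    keep adding the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means. Returns (centroids, labels, inertia)."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        # Update step: each centroid moves to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
    # Inertia: within-cluster sum of squared distances (what the Elbow Method plots).
    inertia = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia


# Toy users: (active days per week, deal clicks per week), three obvious groups.
users = [
    (1, 0), (1, 1), (2, 0),  # low activity, few deal clicks
    (3, 8), (4, 9), (3, 9),  # deal-oriented
    (6, 1), (7, 1), (6, 2),  # highly active
]

for k in range(1, 6):
    _, _, inertia = kmeans(users, k)
    print(f"K={k}: inertia={inertia:.2f}")
```

On this toy data, inertia falls from about 166 at K=1 to 43 at K=2 and 4 at K=3, while K=4 only brings it to about 3.2. That bend at K=3 is the "elbow"; on real user data the bend is usually much less obvious, which is why business judgment is still needed.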


1. Use the Business Question to Define the Approximate Scope

Do not start by asking:

How many clusters should we create?

Instead, ask:

What problem do I want to solve with clustering?

For example:

Which users are most likely to increase their usage through product design, content recommendation, or promotional incentives?

This question is not simply about finding “low-usage users.” It is about finding:

User groups whose usage has not yet peaked, but who still show clear potential for growth.


How to Set the Direction

If the goal is to “increase usage,” the segmentation should focus on:

Activity level, return visits, interaction depth, and cross-feature usage.


2. Who: Define the Target Users

Confirm which group of users you want to analyze.

For example:

Target Users | Description
All app users | Suitable for overall segmentation
Active users in the last 30 days | Suitable for analyzing usage growth
Low-activity users | Suitable for re-engagement analysis
Jetso / Reward users | Suitable for analyzing deal-oriented behavior
Community interaction users | Suitable for analyzing UGC / community growth

3. What: Define Behavioral Signals

Translate business understanding into observable data signals.

For example:

Business Concept | Measurable Metrics
Activity level | App opens, session count, active days
Content interest | Page views, article views, category views
Deal interest | Jetso clicks, Reward clicks, redemptions
Interaction depth | Likes, comments, shares, follows, saves
Search demand | Search count, AI Search usage
Churn risk | Days since last visit, inactive days
Conversion behavior | Registration, coupon claims, mission completion
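As an illustration of turning raw logs into these signals, the sketch below aggregates hypothetical `(user_id, event_date, event_type)` rows into per-user metrics. The event type names (`app_open`, `jetso_click`, and so on) are assumptions rather than a real tracking schema, and any logged event is treated as a visit when computing days since the last visit.

```python
from collections import defaultdict
from datetime import date

# Hypothetical event types grouped by the business concepts in the table above.
DEAL_EVENTS = {"jetso_click", "reward_click"}
INTERACTION_EVENTS = {"like", "comment", "share", "follow", "save"}


def build_features(events, today):
    """Turn raw (user_id, event_date, event_type) rows into per-user signals."""
    raw = defaultdict(lambda: {"dates": set(), "sessions": 0, "article_views": 0,
                               "deal_clicks": 0, "interactions": 0})
    for user_id, day, event_type in events:
        r = raw[user_id]
        r["dates"].add(day)
        if event_type == "app_open":
            r["sessions"] += 1
        elif event_type == "article_view":
            r["article_views"] += 1
        elif event_type in DEAL_EVENTS:
            r["deal_clicks"] += 1
        elif event_type in INTERACTION_EVENTS:
            r["interactions"] += 1

    features = {}
    for user_id, r in raw.items():
        features[user_id] = {
            "active_days": len(r["dates"]),                           # activity level
            "sessions": r["sessions"],                                 # activity level
            "article_views": r["article_views"],                       # content interest
            "deal_clicks": r["deal_clicks"],                           # deal interest
            "interactions": r["interactions"],                         # interaction depth
            "days_since_last_visit": (today - max(r["dates"])).days,  # churn risk
        }
    return features


events = [
    ("u1", date(2026, 5, 18), "app_open"),
    ("u1", date(2026, 5, 18), "article_view"),
    ("u1", date(2026, 5, 19), "app_open"),
    ("u1", date(2026, 5, 19), "like"),
    ("u2", date(2026, 5, 2), "app_open"),
    ("u2", date(2026, 5, 2), "jetso_click"),
]
features = build_features(events, today=date(2026, 5, 21))
print(features["u1"])
print(features["u2"])
```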

4. How Often: Define High / Medium / Low Thresholds

Use simple rules to create an initial classification.

For example:

Level | Initial Definition
High activity | Uses the app 5+ days per week / high session count
Medium activity | Uses the app 2–4 days per week
Low activity | Uses the app 0–1 day per week
Churn risk | No return visit for 14 (or 30) days
High deal interest | Jetso / Reward clicks above average
High content interest | Article views above average

These thresholds do not need to be very precise at the beginning. They can be based on experience or percentiles, such as top 25%, middle 50%, and bottom 25%.
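A minimal sketch of both options: fixed rules taken from the threshold table, and quartile-based tiers computed with Python's standard `statistics.quantiles`. The function names and the 14-day churn window are illustrative choices, not fixed definitions.

```python
import statistics


def activity_level(days_per_week):
    """Fixed rules from the threshold table: 5+ high, 2-4 medium, 0-1 low."""
    if days_per_week >= 5:
        return "high"
    if days_per_week >= 2:
        return "medium"
    return "low"


def is_churn_risk(days_since_last_visit, window_days=14):
    """No return visit within the chosen window (14 days here; 30 is the other option)."""
    return days_since_last_visit >= window_days


def quartile_tiers(values):
    """Label each value as top 25%, middle 50%, or bottom 25% using quartile cut points."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    tiers = []
    for v in values:
        if v >= q3:
            tiers.append("top 25%")
        elif v <= q1:
            tiers.append("bottom 25%")
        else:
            tiers.append("middle 50%")
    return tiers


print([activity_level(d) for d in (6, 3, 1)])  # ['high', 'medium', 'low']
print(is_churn_risk(20))                        # True
deal_clicks = [3, 0, 7, 1, 5, 2, 6, 4]
print(quartile_tiers(deal_clicks))
```

Percentile tiers adapt automatically as the user base changes, while fixed rules are easier to explain to stakeholders; starting with either is fine because clustering will later test the boundaries.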


5. So What: Define Business Value

Each segment should be able to answer:

What can I do after identifying this segment?

For example:

Initial User Type | Action Value
Highly active loyal users | Maintain loyalty, promote member missions, improve retention
Medium-active potential users | Most suitable target for increasing usage
Deal-oriented users | Use offers to drive content and community usage
Content browsing users | Use recommendations, AI Search, and save features to increase return visits
Low-activity / churn-risk users | Use re-engagement campaigns to bring them back

If a segment does not have a clear action, it is usually not worth keeping as an independent business segment.
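The initial user types and their actions can be combined into a simple rule-based labeling; this labeling is the business hypothesis that clustering later tests. In this sketch the rule order, the field names, and the extra `weekend_only` segment are hypothetical, and "above average" is interpreted as above the mean of the analyzed users.

```python
# Hypothetical action playbook, one entry per initial (pre-clustering) user type.
ACTIONS = {
    "highly_active_loyal": "Maintain loyalty, promote member missions, improve retention",
    "medium_active_potential": "Primary target for increasing usage",
    "deal_oriented": "Use offers to drive content and community usage",
    "content_browsing": "Recommendations, AI Search, and save features",
    "low_activity_or_churn_risk": "Re-engagement campaigns",
}


def initial_segment(user, avg_deal_clicks, avg_article_views):
    """Assign one hypothesis segment per user; earlier rules take priority."""
    if user["days_since_last_visit"] >= 14 or user["days_per_week"] < 2:
        return "low_activity_or_churn_risk"
    if user["days_per_week"] >= 5:
        return "highly_active_loyal"
    if user["deal_clicks"] > avg_deal_clicks:
        return "deal_oriented"
    if user["article_views"] > avg_article_views:
        return "content_browsing"
    return "medium_active_potential"


def segments_without_action(segments):
    """Segments with no entry in ACTIONS are candidates for merging or dropping."""
    return sorted(set(segments) - set(ACTIONS))


users = [
    {"days_per_week": 6, "deal_clicks": 1, "article_views": 9, "days_since_last_visit": 0},
    {"days_per_week": 3, "deal_clicks": 8, "article_views": 2, "days_since_last_visit": 1},
    {"days_per_week": 2, "deal_clicks": 0, "article_views": 12, "days_since_last_visit": 2},
    {"days_per_week": 3, "deal_clicks": 1, "article_views": 3, "days_since_last_visit": 1},
    {"days_per_week": 0, "deal_clicks": 0, "article_views": 0, "days_since_last_visit": 30},
]
avg_deal = sum(u["deal_clicks"] for u in users) / len(users)
avg_views = sum(u["article_views"] for u in users) / len(users)
segments = [initial_segment(u, avg_deal, avg_views) for u in users]
print(segments)
print(segments_without_action(segments + ["weekend_only"]))  # ['weekend_only']
```

Because a user can match several rules (for example, a highly active user who also clicks many deals), the rule order is itself a business decision; clustering avoids that choice by grouping on all features at once.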


Target Groups Worth Prioritizing

If the business question is “increase usage,” the most valuable groups to focus on are:

1. Medium-active Potential Users

They already have a usage habit, but they have not yet developed high-frequency behavior.

For example, they may use the app two to four days a week, but not every day.

Strategy direction: 
Push personalized content, mission systems, daily check-ins, save reminders, and related article recommendations.


2. Deal-oriented Users

They are clearly motivated by Jetso, rewards, and coupons, but they may only open the app when offers are available.

Strategy direction: 
Use offer pages to guide them toward articles, lifestyle content, community sharing, and member missions to increase cross-feature usage.


3. Content Browsing Users

They are willing to consume content, but their interaction depth is still low.

Strategy direction: 
Strengthen related content, AI Search, topic following, author / topic tracking, and personalized homepages.


Groups Not Recommended as the First Priority

Highly Active Loyal Users

They already have high usage. The main goal for this group should be retention, not usage growth.

Extremely Low-activity Users

They may no longer have a clear need, and the cost of reactivating them may be high. Reactivation can still be attempted, but they may not be the most effective target group in the first stage.


Conclusion

Business logic definition is a simple and intuitive method, and it is very suitable as the starting point for clustering. It can first divide users into several possible types based on business goals, product experience, and user behavior understanding, helping the team quickly establish an analysis direction.

However, this type of classification is essentially a business hypothesis. It does not mean that the data will naturally form the same groups. Clustering, in contrast, is a form of unsupervised learning: it automatically discovers hidden natural groupings in unlabeled data based on the similarity between data points.

Therefore, a more reasonable process is:

First use business logic to define the approximate direction, then use clustering methods to validate it.

For example, we can first hypothesize 5 user types from a business perspective, then use K-means, the Elbow Method, and the Silhouette Score to check whether the data supports these classifications. If the data shows that K=4 is more reasonable, consider merging the most similar groups. If K=5 has a slightly lower score but each group has clear characteristics, sufficient user volume, and different strategic value, then K=5 can still be kept.
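To illustrate how the Silhouette Score can check a business hypothesis directly, the sketch below scores two labelings of the same toy data: a 5-type hypothesis in which one natural group is split into two types, and a 4-type version with those two types merged. It is a pure-Python implementation of the standard silhouette coefficient with Euclidean distance (singleton clusters score 0, the convention scikit-learn also uses); the data points and segment names are made up for illustration.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least 2 clusters")

    def mean_dist(p, label, skip=None):
        # Average distance from p to every point labeled `label`, skipping index `skip`.
        dists = [math.dist(p, q) for j, (q, lab) in enumerate(zip(points, labels))
                 if lab == label and j != skip]
        return sum(dists) / len(dists) if dists else None

    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        a = mean_dist(p, lab, skip=i)  # cohesion: own cluster, excluding p itself
        if a is None:
            scores.append(0.0)  # singleton cluster scores 0
            continue
        b = min(mean_dist(p, other) for other in clusters if other != lab)  # separation
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def blob(cx, cy):
    """Four users on a unit square: one tight, natural group."""
    return [(cx, cy), (cx, cy + 1), (cx + 1, cy), (cx + 1, cy + 1)]


points = blob(0, 0) + blob(10, 0) + blob(0, 10) + blob(10, 10)

# 5-type hypothesis: the first blob is split into two types the data does not separate.
hypothesis_k5 = (["medium_active"] * 2 + ["content_browsing"] * 2
                 + ["deal_oriented"] * 4 + ["at_risk"] * 4 + ["loyal"] * 4)
merged_k4 = ["medium_active" if lab == "content_browsing" else lab for lab in hypothesis_k5]

print("K=5 hypothesis:", round(silhouette_score(points, hypothesis_k5), 3))
print("K=4 merged:    ", round(silhouette_score(points, merged_k4), 3))
```

Because the two hypothesized types inside the first blob are not separated in the data, the merged K=4 labeling scores higher, which is the signal to consider merging them. A lower score alone does not force the merge; as noted above, a K=5 split can still be kept if each group carries distinct strategic value.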

The final number and types of clusters should not be decided only by mathematical scores or only by business intuition. They should balance three things:

Data validity, business interpretability, and product actionability.


Related Articles

How Clustering Supports UX and Product Design: From User Segmentation to Product Strategy

How to Define the Number and Types of Clustering (2): Using Data Methods to Find a Reasonable Number of Clusters