Week 3 (16/07/18 - 20/07/18)

In week 3, I was introduced to a new unsupervised learning algorithm: K-means. I watched the lecture video from the ISB course, discussed it with my peers and teacher, and then implemented it in Python in a Jupyter notebook. Later I applied it in other settings, such as clustering news articles for a recommendation system using a TF-IDF vectorizer. To understand it better, I implemented it in two ways: with a for-loop and without one.
Along with it, I was introduced to visualization, which I will cover in my next blog post.

K-means


K-means clustering is a type of unsupervised learning, which is used when you have unlabeled data (i.e., data without defined categories or groups). The goal of this algorithm is to find groups in the data, with the number of groups represented by the variable K. The algorithm works iteratively to assign each data point to one of K groups based on the features that are provided. Data points are clustered based on feature similarity. The results of the K-means clustering algorithm are:
  1. The centroids of the K clusters, which can be used to label new data
  2. Labels for the training data (each data point is assigned to a single cluster)
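
The iterative procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code from my notebook: the function names (`assign_loop`, `assign_vectorized`, `kmeans`), the random initialization from data points, the iteration cap, and the toy data are all assumptions made for the example. The assignment step is written twice, once with explicit for-loops and once without, to show that both produce the same labels.

```python
import numpy as np

def assign_loop(X, centroids):
    """Assignment step with explicit for-loops: nearest centroid per point."""
    labels = np.empty(len(X), dtype=int)
    for i, point in enumerate(X):
        best_j, best_dist = 0, np.inf
        for j, centroid in enumerate(centroids):
            dist = np.sum((point - centroid) ** 2)  # squared Euclidean distance
            if dist < best_dist:
                best_j, best_dist = j, dist
        labels[i] = best_j
    return labels

def assign_vectorized(X, centroids):
    """Same assignment step without loops, using NumPy broadcasting."""
    # Shape (n_samples, k): squared distance from every point to every centroid.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, n_iters=100, seed=0):
    """Return (centroids, labels) for an (n_samples, n_features) array X."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumed initialization: k distinct data points picked at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        labels = assign_vectorized(X, centroids)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # leave a centroid in place if its cluster is empty
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving
        centroids = new_centroids
    return centroids, assign_vectorized(X, centroids)

# Toy data (made up for illustration): two blobs in 2-D.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])
centroids, labels = kmeans(X, k=2)
```

Here `centroids` and `labels` correspond to the two results listed above. The outcome depends on the random starting centroids, so in practice it can be worth running the algorithm several times with different seeds.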

Business Use

The K-means clustering algorithm is used to find groups which have not been explicitly labeled in the data. This can be used to confirm business assumptions about what types of groups exist or to identify unknown groups in complex data sets. Once the algorithm has been run and the groups are defined, any new data point can be assigned to the group whose centroid is nearest to it.
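
Assigning new data to an existing group only needs the centroids. The sketch below assumes Euclidean distance; the helper name `assign_to_nearest` and the centroid and point values are made up for illustration.

```python
import numpy as np

def assign_to_nearest(new_points, centroids):
    """Return, for each new point, the index of the closest centroid."""
    # Pairwise differences have shape (n_points, k, n_features).
    diffs = new_points[:, None, :] - centroids[None, :, :]
    return np.linalg.norm(diffs, axis=2).argmin(axis=1)

# Hypothetical centroids from an earlier K-means run with K = 2.
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
new_points = np.array([[0.3, -0.2], [4.6, 5.1], [2.0, 2.0]])

new_labels = assign_to_nearest(new_points, centroids)
print(new_labels)  # [0 1 0]
```

The point (2.0, 2.0) lands in group 0 because it is about 2.83 away from the first centroid and about 4.24 away from the second.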
This is a versatile algorithm that can be applied to many kinds of grouping problems, as long as the data can be described by numeric features. Some examples of use cases are:
  • Behavioral segmentation:
    • Segment by purchase history
    • Segment by activities on application, website, or platform
    • Define personas based on interests
    • Create profiles based on activity monitoring
  • Inventory categorization:
    • Group inventory by sales activity
    • Group inventory by manufacturing metrics
  • Sorting sensor measurements:
    • Detect activity types in motion sensors
    • Group images
    • Separate audio
    • Identify groups in health monitoring
  • Detecting bots or anomalies:
    • Separate valid activity groups from bots
    • Group valid activity to clean up outlier detection
