Topic: This assignment covers the Mean Shift algorithm, which belongs to the family of clustering algorithms.

Exercise: Create a Python script file and perform the following tasks:

  1. Import libraries:
    • numpy: For array calculations.
    • cdist (from scipy.spatial.distance): To calculate Euclidean distances.
    • make_blobs (from sklearn.datasets): To create a dataset of points scattered around the given centers with a given standard deviation.
    • matplotlib: To plot points in a 2-dimensional figure.
  2. Create a Mean Shift class named Mean_Shift which contains a constructor __init__ and a function fit. The constructor takes radius as a parameter, with a default value of 3 in case the user doesn't give a value. The function fit takes data as a parameter. Radius is an int and data is a 2-dimensional numpy.ndarray.
  3. Use sklearn's make_blobs to create a dataset of 100 samples, passing [[2, 2], [4, 7], [7, 7], [5, 13]] as the centers parameter and 1 as the cluster_std parameter. Use matplotlib to plot the points.
  4. The algorithm starts by making every point a centroid. Use matplotlib to plot the points, and plot the centroids with the marker 'x'.
  5. Calculate the new centroids: for each centroid, find the points inside the circle centered on that centroid with the given radius (the given value or 3) and compute their mean point, which becomes the new centroid. Repeat this for each centroid.
  6. Remove duplicate centroids.
  7. Repeat the centroid update and the duplicate removal until the new centroids are the same as the previous ones. Use a bool variable to end the while loop.
  8. Plot the points and centroids with a unique color for each cluster. Create a .gif file showing how the centroids and the point assignments change during the process.
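
As a starting point, the steps above can be sketched roughly as follows. This is a minimal sketch and not the official solution: the attribute names centroids and labels, the max_iter safety cap, the rounding to 6 decimals before removing duplicates, and random_state=0 are assumptions made for illustration. Plotting and the .gif are left out.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.datasets import make_blobs


class Mean_Shift:
    def __init__(self, radius=3, max_iter=300):
        self.radius = radius
        # max_iter is a hypothetical safety cap, not part of the exercise text.
        self.max_iter = max_iter

    def fit(self, data):
        # Every data point starts as a centroid.
        centroids = data.copy()
        converged = False
        iteration = 0
        while not converged and iteration < self.max_iter:
            iteration += 1
            # Euclidean distances: one row per centroid, one column per point.
            distances = cdist(centroids, data)
            shifted = []
            for centroid, row in zip(centroids, distances):
                members = data[row <= self.radius]
                # Move the centroid to the mean of the points inside its circle;
                # keep it unchanged if the circle happens to be empty.
                shifted.append(members.mean(axis=0) if len(members) else centroid)
            # Round so that float noise does not hide duplicates (assumption),
            # then drop duplicate centroids. np.unique also sorts the rows.
            shifted = np.unique(np.round(np.array(shifted), 6), axis=0)
            # Stop when the centroids no longer change.
            converged = (shifted.shape == centroids.shape
                         and np.allclose(shifted, centroids))
            centroids = shifted
        self.centroids = centroids
        # Assign each point to its nearest final centroid.
        self.labels = cdist(data, centroids).argmin(axis=1)
        return self


# Dataset as described in the exercise; random_state is added for repeatability.
data, _ = make_blobs(n_samples=100,
                     centers=[[2, 2], [4, 7], [7, 7], [5, 13]],
                     cluster_std=1,
                     random_state=0)
model = Mean_Shift(radius=3).fit(data)
print(model.centroids)
```

The labels array can then be passed as the c argument of matplotlib's scatter to color each cluster, and the centroids can be drawn on top with marker='x'.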

Help (if needed): Click Pdf to download the exercise help.

Lecture for better understanding: http://icarus.csd.auth.gr/data-clustering-lecture/

This exercise was developed by G.R.Pegias.

————————————————————————————————————–

For the solutions to the exercises, please contact koroniioanna@csd.auth.gr