Introduction
The idea of creating machines that learn by themselves has driven humans for decades. Unsupervised learning and clustering are central to fulfilling that dream. Unsupervised learning offers more flexibility, but it is also more challenging.
Clustering plays an important role in drawing insights from unlabeled data. It groups similar data points together, which improves various business decisions by providing a meta-level understanding of the data.
In this skill test, we tested our community on clustering techniques. A total of 1566 people registered for this skill test. If you missed taking the test, here is your opportunity to find out how many questions you could have answered correctly.
If you are just getting started with unsupervised learning, here are some comprehensive resources to help you in your journey:
- Machine Learning Certification Course for Beginners
- The Most Comprehensive Guide to K-Means Clustering You'll Ever Need
- Certified AI & ML Blackbelt+ Program
Overall Results
Below is the distribution of scores; this will help you evaluate your performance:
You can access your performance here. More than 390 people participated in the skill test and the highest score was 33. Here are a few statistics about the distribution.
Overall distribution
Mean Score: 15.11
Median Score: 15
Mode Score: 16
Helpful Resources
An Introduction to Clustering and different methods of clustering
Getting your clustering right (Part I)
Getting your clustering right (Part II)
Questions & Answers
Q1. Movie recommendation systems are an example of:
- Classification
- Clustering
- Reinforcement Learning
- Regression
Options:
A. 2 only
B. 1 and 2
C. 1 and 3
D. 2 and 3
E. 1, 2 and 3
F. 1, 2, 3 and 4
Solution: (E)
Generally, movie recommendation systems cluster the users into a finite number of similar groups based on their previous activities and profile. Then, at a fundamental level, people in the same cluster are made similar recommendations.
In some scenarios, this can also be approached as a classification problem for assigning the most appropriate movie class to the user of a specific group of users. Also, a movie recommendation system can be viewed as a reinforcement learning problem where it learns from its previous recommendations and improves its future recommendations.
Q2. Sentiment analysis is an example of:
- Regression
- Classification
- Clustering
- Reinforcement Learning
Options:
A. 1 only
B. 1 and 2
C. 1 and 3
D. 1, 2 and 3
E. 1, 2 and 4
F. 1, 2, 3 and 4
Solution: (E)
Sentiment analysis at the fundamental level is the task of classifying the sentiments represented in an image, text or speech into a set of defined sentiment classes like happy, sad, excited, positive, negative, etc. It can also be viewed as a regression problem for assigning a sentiment score of, say, 1 to 10 to a corresponding image, text or speech.
Another way of looking at sentiment analysis is from a reinforcement learning perspective, where the algorithm constantly learns from the accuracy of past sentiment analysis to improve its future performance.
Q3. Can decision trees be used for performing clustering?
A. True
B. False
Solution: (A)
Decision trees can also be used to form clusters in the data, but clustering often generates natural clusters and is not dependent on any objective function.
Q4. Which of the following is the most appropriate strategy for data cleaning before performing clustering analysis, given a less than desirable number of data points:
- Capping and flooring of variables
- Removal of outliers
Options:
A. 1 only
B. 2 only
C. 1 and 2
D. None of the above
Solution: (A)
Removal of outliers is not recommended if the data points are few in number. In this scenario, capping and flooring of variables is the most appropriate strategy.
Q5. What is the minimum no. of variables/features required to perform clustering?
A. 0
B. 1
C. 2
D. 3
Solution: (B)
At least a single variable is required to perform clustering analysis. Clustering analysis with a single variable can be visualized with the help of a histogram.
Q6. For two runs of K-Means clustering, is it expected to get the same clustering results?
A. Yes
B. No
Solution: (B)
The K-Means clustering algorithm converges on local minima, which might also correspond to the global minima in some cases but not always. Therefore, it's advised to run the K-Means algorithm multiple times before drawing inferences about the clusters.
However, note that it's possible to receive the same clustering results from K-Means by setting the same seed value for each run, since that makes the algorithm choose the same set of random numbers on each run.
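To make this concrete, here is an illustrative scikit-learn sketch (the toy data is made up, not from the test): fixing `random_state` reproduces a run exactly, and `n_init` re-runs the algorithm from several initializations and keeps the best result.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated toy blobs (hypothetical data).
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

# Different seeds may converge to different local minima; the same seed
# reproduces the run exactly. n_init=10 keeps the best of ten restarts.
run_a = KMeans(n_clusters=2, random_state=42, n_init=10).fit_predict(X)
run_b = KMeans(n_clusters=2, random_state=42, n_init=10).fit_predict(X)
```

With the same seed, `run_a` and `run_b` are identical label arrays.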
Q7. Is it possible that the assignment of observations to clusters does not change between successive iterations in K-Means?
A. Yes
B. No
C. Can't say
D. None of these
Solution: (A)
When the K-Means algorithm has reached a local or global minimum, it will not change the assignment of data points to clusters between two successive iterations.
Q8. Which of the following can act as possible termination conditions in K-Means?
- A fixed number of iterations has been reached.
- Assignment of observations to clusters does not change between iterations (except for cases with a bad local minimum).
- Centroids do not change between successive iterations.
- Terminate when RSS falls below a threshold.
Options:
A. 1, 3 and 4
B. 1, 2 and 3
C. 1, 2 and 4
D. All of the above
Solution: (D)
All four conditions can be used as possible termination conditions in K-Means clustering:
- This condition limits the runtime of the clustering algorithm, but in some cases the quality of the clustering will be poor because of an insufficient number of iterations.
- Except for cases with a bad local minimum, this produces a good clustering, but runtimes may be unacceptably long.
- This also ensures that the algorithm has converged at the minimum.
- This criterion ensures that the clustering is of a desired quality after termination. Practically, it's good practice to combine it with a bound on the number of iterations to guarantee termination.
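The termination conditions above can be sketched with a minimal pure-NumPy Lloyd's loop (a hypothetical toy setup, not code from the test): the loop stops at an iteration cap, when assignments freeze, or when the centroid shift falls below a tolerance, and the RSS is available for a threshold check.

```python
import numpy as np

rng = np.random.default_rng(1)
# Two synthetic blobs around (0, 0) and (5, 5).
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])
k, max_iter, tol = 2, 100, 1e-4          # condition 1: a fixed iteration cap
centroids = X[rng.choice(len(X), k, replace=False)]
labels = np.full(len(X), -1)

for it in range(max_iter):
    new_labels = np.linalg.norm(X[:, None] - centroids, axis=2).argmin(axis=1)
    new_centroids = np.array([X[new_labels == j].mean(axis=0) for j in range(k)])
    rss = ((X - new_centroids[new_labels]) ** 2).sum()  # condition 4: compare rss to a threshold
    shift = np.linalg.norm(new_centroids - centroids)
    # condition 2 (assignments frozen) or condition 3 (centroids frozen):
    converged = (new_labels == labels).all() or shift < tol
    labels, centroids = new_labels, new_centroids
    if converged:
        break
```

On this clean data the loop converges long before the iteration cap is reached.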
Q9. Which of the following clustering algorithms suffers from the problem of convergence at local optima?
- K-Means clustering algorithm
- Agglomerative clustering algorithm
- Expectation-Maximization clustering algorithm
- Diverse clustering algorithm
Options:
A. 1 only
B. 2 and 3
C. 2 and 4
D. 1 and 3
E. 1, 2 and 4
F. All of the above
Solution: (D)
Out of the options given, only the K-Means clustering algorithm and the EM clustering algorithm have the drawback of converging at local minima.
Q10. Which of the following algorithms is most sensitive to outliers?
A. K-means clustering algorithm
B. K-medians clustering algorithm
C. K-modes clustering algorithm
D. K-medoids clustering algorithm
Solution: (A)
Out of all the options, the K-Means clustering algorithm is most sensitive to outliers as it uses the mean of the cluster data points to find the cluster center.
Q11. After performing K-Means clustering analysis on a dataset, you observed the following dendrogram. Which of the following conclusions can be drawn from the dendrogram?
A. There were 28 data points in the clustering analysis
B. The best no. of clusters for the analyzed data points is 4
C. The proximity function used is average-link clustering
D. The above dendrogram interpretation is not possible for K-Means clustering analysis
Solution: (D)
A dendrogram is not possible for K-Means clustering analysis. However, one can create a clustergram based on K-Means clustering analysis.
Q12. How can clustering (unsupervised learning) be used to improve the accuracy of a linear regression model (supervised learning):
- Creating different models for different cluster groups.
- Creating an input feature for cluster ids as an ordinal variable.
- Creating an input feature for cluster centroids as a continuous variable.
- Creating an input feature for cluster size as a continuous variable.
Options:
A. 1 only
B. 1 and 2
C. 1 and 4
D. 3 only
E. 2 and 4
F. All of the above
Solution: (F)
Creating an input feature for cluster ids as an ordinal variable or creating an input feature for cluster centroids as a continuous variable might not convey any relevant information to the regression model for multidimensional data. But for clustering in a single dimension, all of the given methods are expected to convey meaningful information to the regression model. For example, to cluster people into two groups based on their hair length, storing the cluster id as an ordinal variable and the cluster centroids as continuous variables will convey meaningful information.
Q13. What could be the possible reason(s) for producing two different dendrograms using the agglomerative clustering algorithm for the same dataset?
A. Proximity function used
B. No. of data points used
C. No. of variables used
D. B and C only
E. All of the above
Solution: (E)
A change in any of the proximity function, the no. of data points or the no. of variables will lead to different clustering results and hence different dendrograms.
Q14. In the figure below, if you draw a horizontal line on the y-axis at y=2, what will be the number of clusters formed?
A. 1
B. 2
C. 3
D. 4
Solution: (B)
Since the number of vertical lines intersecting the red horizontal line at y=2 in the dendrogram is 2, two clusters will be formed.
Q15. What is the most appropriate no. of clusters for the data points represented by the following dendrogram:
A. 2
B. 4
C. 6
D. 8
Solution: (B)
The number of clusters that best describes the different groups can be chosen by observing the dendrogram. The best choice of the no. of clusters is the no. of vertical lines in the dendrogram cut by a horizontal line that can traverse the maximum distance vertically without intersecting a cluster.
In the above example, the best choice of no. of clusters will be 4, as the red horizontal line in the dendrogram below covers the maximum vertical distance AB.
Q16. In which of the following cases will K-Means clustering fail to give good results?
- Data points with outliers
- Data points with different densities
- Data points with round shapes
- Data points with non-convex shapes
Options:
A. 1 and 2
B. 2 and 3
C. 2 and 4
D. 1, 2 and 4
E. 1, 2, 3 and 4
Solution: (D)
The K-Means clustering algorithm fails to give good results when the data contains outliers, when the density spread of data points across the data space varies, and when the data points follow non-convex shapes.
Q17. Which of the following metrics do we have for finding dissimilarity between two clusters in hierarchical clustering?
- Single-link
- Complete-link
- Average-link
Options:
A. 1 and 2
B. 1 and 3
C. 2 and 3
D. 1, 2 and 3
Solution: (D)
All three methods, i.e. single-link, complete-link and average-link, can be used for finding dissimilarity between two clusters in hierarchical clustering.
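As a sketch of how these linkages are used in practice (hypothetical data; SciPy's `scipy.cluster.hierarchy` is one common implementation):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Five toy points: two tight pairs plus one isolated point.
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 5.2], [9.0, 0.0]])
d = pdist(X)  # condensed pairwise distance matrix

# The three dissimilarity definitions from the question:
Z_single = linkage(d, method='single')      # min pairwise distance
Z_complete = linkage(d, method='complete')  # max pairwise distance
Z_average = linkage(d, method='average')    # mean pairwise distance

# Cut the single-link tree into three flat clusters.
labels = fcluster(Z_single, t=3, criterion='maxclust')
```

Each linkage matrix encodes a full dendrogram; `fcluster` flattens it at a chosen level.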
Q18. Which of the following are true?
- Clustering analysis is negatively affected by multicollinearity of features
- Clustering analysis is negatively affected by heteroscedasticity
Options:
A. 1 only
B. 2 only
C. 1 and 2
D. None of them
Solution: (A)
Clustering analysis is not negatively affected by heteroscedasticity, but the results are negatively impacted by multicollinearity of the features/variables used in clustering, as a correlated feature/variable carries more weight in the distance calculation than desired.
Q19. Given six points with the following attributes:
Which of the following clustering representations and dendrograms depicts the use of the MIN or single-link proximity function in hierarchical clustering:
A.
B.
C.
D.
Solution: (A)
For the single-link or MIN version of hierarchical clustering, the proximity of two clusters is defined as the minimum of the distances between any two points in the two clusters. For instance, from the table, we see that the distance between points 3 and 6 is 0.11, and that is the height at which they are joined into one cluster in the dendrogram. As another example, the distance between clusters {3, 6} and {2, 5} is given by dist({3, 6}, {2, 5}) = min(dist(3, 2), dist(6, 2), dist(3, 5), dist(6, 5)) = min(0.1483, 0.2540, 0.2843, 0.3921) = 0.1483.
Q20. Given six points with the following attributes:
Which of the following clustering representations and dendrograms depicts the use of the MAX or complete-link proximity function in hierarchical clustering:
A.
B.
C.
D.
Solution: (B)
For the complete-link or MAX version of hierarchical clustering, the proximity of two clusters is defined as the maximum of the distances between any two points in the two clusters. Similarly, here points 3 and 6 are merged first. However, {3, 6} is merged with {4}, instead of {2, 5}. This is because dist({3, 6}, {4}) = max(dist(3, 4), dist(6, 4)) = max(0.1513, 0.2216) = 0.2216, which is smaller than dist({3, 6}, {2, 5}) = max(dist(3, 2), dist(6, 2), dist(3, 5), dist(6, 5)) = max(0.1483, 0.2540, 0.2843, 0.3921) = 0.3921 and dist({3, 6}, {1}) = max(dist(3, 1), dist(6, 1)) = max(0.2218, 0.2347) = 0.2347.
Q21. Given six points with the following attributes:
Which of the following clustering representations and dendrograms depicts the use of the group average proximity function in hierarchical clustering:
A.
B.
C.
D.
Solution: (C)
For the group average version of hierarchical clustering, the proximity of two clusters is defined as the average of the pairwise proximities between all pairs of points in the two clusters. This is an intermediate approach between MIN and MAX. This is expressed by the following equation:
Here are the distances between some of the clusters: dist({3, 6, 4}, {1}) = (0.2218 + 0.3688 + 0.2347)/(3 ∗ 1) = 0.2751. dist({2, 5}, {1}) = (0.2357 + 0.3421)/(2 ∗ 1) = 0.2889. dist({3, 6, 4}, {2, 5}) = (0.1483 + 0.2843 + 0.2540 + 0.3921 + 0.2042 + 0.2932)/(3 ∗ 2) = 0.2637. Because dist({3, 6, 4}, {2, 5}) is smaller than dist({3, 6, 4}, {1}) and dist({2, 5}, {1}), these two clusters are merged at the fourth stage.
Q22. Given six points with the following attributes:
Which of the following clustering representations and dendrograms depicts the use of Ward's method proximity function in hierarchical clustering:
A.
B.
C.
D.
Solution: (D)
Ward's method is a centroid method. The centroid method calculates the proximity between two clusters by calculating the distance between the centroids of the clusters. For Ward's method, the proximity between two clusters is defined as the increase in the squared error that results when the two clusters are merged. Option D shows the results of applying Ward's method to the sample data set of six points. The resulting clustering is somewhat different from those produced by MIN, MAX, and group average.
Q23. What should be the best choice of the number of clusters based on the following results:
A. 1
B. 2
C. 3
D. 4
Solution: (C)
The silhouette coefficient is a measure of how similar an object is to its own cluster compared to other clusters. The number of clusters for which the silhouette coefficient is highest represents the best choice of the number of clusters.
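A common way to apply this criterion (an illustrative sketch using scikit-learn and synthetic blobs, not data from the test) is to score several candidate values of k and keep the one with the highest silhouette coefficient:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Three clearly separated synthetic blobs.
X = np.vstack([rng.normal(c, 0.3, (30, 2)) for c in (0, 5, 10)])

scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)  # in [-1, 1], higher is better

best_k = max(scores, key=scores.get)  # k with the highest silhouette coefficient
```

On this data the silhouette peaks at the true number of blobs.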
Q24. Which of the following is/are valid iterative strategies for treating missing values before clustering analysis?
A. Imputation with mean
B. Nearest neighbor assignment
C. Imputation with Expectation-Maximization algorithm
D. All of the above
Solution: (C)
All of the mentioned techniques are valid for treating missing values before clustering analysis, but only imputation with the EM algorithm is iterative in its functioning.
Q25. The K-Means algorithm has some limitations. One of its limitations is that it makes hard assignments of points to clusters (a point either completely belongs to a cluster or does not belong at all).
Note: A soft assignment can be considered as the probability of being assigned to each cluster: say K = 3 and for some point xn, p1 = 0.7, p2 = 0.2, p3 = 0.1.
Which of the following algorithm(s) allows soft assignments?
- Gaussian mixture models
- Fuzzy K-means
Options:
A. 1 only
B. 2 only
C. 1 and 2
D. None of these
Solution: (C)
Both Gaussian mixture models and Fuzzy K-means allow soft assignments.
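For example (an illustrative sketch with synthetic 1-D data, assuming scikit-learn is available), a Gaussian mixture model exposes the per-cluster membership probabilities directly:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two overlapping 1-D Gaussian populations.
X = np.vstack([rng.normal(0, 1, (50, 1)), rng.normal(6, 1, (50, 1))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
probs = gmm.predict_proba(X)  # soft assignment: one probability per component
```

Each row of `probs` is a probability distribution over the two components, which is exactly the soft assignment described in the note above.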
Q26. Assume you want to cluster 7 observations into 3 clusters using the K-Means clustering algorithm. After the first iteration, clusters C1, C2, C3 have the following observations:
C1: {(2,2), (4,4), (6,6)}
C2: {(0,4), (4,0)}
C3: {(5,5), (9,9)}
What will be the cluster centroids if you want to proceed to the second iteration?
A. C1: (4,4), C2: (2,2), C3: (7,7)
B. C1: (6,6), C2: (4,4), C3: (9,9)
C. C1: (2,2), C2: (0,0), C3: (5,5)
D. None of these
Solution: (A)
Finding the centroid for the data points in cluster C1 = ((2+4+6)/3, (2+4+6)/3) = (4, 4)
Finding the centroid for the data points in cluster C2 = ((0+4)/2, (4+0)/2) = (2, 2)
Finding the centroid for the data points in cluster C3 = ((5+9)/2, (5+9)/2) = (7, 7)
Hence, C1: (4,4), C2: (2,2), C3: (7,7)
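The same arithmetic in NumPy (the arrays simply mirror the clusters above):

```python
import numpy as np

C1 = np.array([[2, 2], [4, 4], [6, 6]])
C2 = np.array([[0, 4], [4, 0]])
C3 = np.array([[5, 5], [9, 9]])

# A centroid is the per-coordinate mean of the cluster's points.
centroids = [C.mean(axis=0) for C in (C1, C2, C3)]
# -> [(4, 4), (2, 2), (7, 7)]
```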
Q27. Assume you want to cluster 7 observations into 3 clusters using the K-Means clustering algorithm. After the first iteration, clusters C1, C2, C3 have the following observations:
C1: {(2,2), (4,4), (6,6)}
C2: {(0,4), (4,0)}
C3: {(5,5), (9,9)}
What will be the Manhattan distance of observation (9, 9) from cluster centroid C1 in the second iteration?
A. 10
B. 5*sqrt(2)
C. 13*sqrt(2)
D. None of these
Solution: (A)
Manhattan distance between centroid C1, i.e. (4, 4), and (9, 9) = |9-4| + |9-4| = 10
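In code (mirroring the numbers above):

```python
import numpy as np

c1 = np.array([4, 4])  # centroid of C1 after the first iteration
p = np.array([9, 9])

# Manhattan (L1) distance: sum of absolute coordinate differences.
manhattan = np.abs(p - c1).sum()  # |9-4| + |9-4| = 10
```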
Q28. If two variables V1 and V2 are used for clustering, which of the following are true for K-Means clustering with k = 3?
- If V1 and V2 have a correlation of 1, the cluster centroids will be in a straight line
- If V1 and V2 have a correlation of 0, the cluster centroids will be in a straight line
Options:
A. 1 only
B. 2 only
C. 1 and 2
D. None of the above
Solution: (A)
If the correlation between the variables V1 and V2 is 1, then all the data points will lie on a straight line. Hence, all three cluster centroids will form a straight line as well.
Q29. Feature scaling is an important step before applying the K-Means algorithm. What is the reason behind this?
A. In distance calculation it will give the same weight to all features
B. You always get the same clusters whether or not you use feature scaling
C. In Manhattan distance it is an important step but in Euclidean it is not
D. None of these
Solution: (A)
Feature scaling ensures that all the features get the same weight in the clustering analysis. Consider a scenario of clustering people based on their weight (in kg), with range 55-110, and height (in feet), with range 5.6 to 6.4. In this case, the clusters produced without scaling can be very misleading, as the range of weight is much higher than that of height. Therefore, it's necessary to bring them to the same scale so that they have equal weight in the clustering result.
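A minimal standardization sketch in NumPy (the weight/height values are made up to match the scenario): after z-scoring, each feature has mean 0 and standard deviation 1, so both contribute equally to Euclidean distances.

```python
import numpy as np

# Columns: weight in kg and height in feet - wildly different ranges.
X = np.array([[55.0, 5.6],
              [70.0, 5.9],
              [90.0, 6.1],
              [110.0, 6.4]])

# Z-score standardization: subtract the column mean, divide by the column std.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
```

Clustering `X_scaled` instead of `X` prevents the weight column from dominating the distance calculation.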
Q30. Which of the following methods is used for finding the optimal number of clusters in the K-Means algorithm?
A. Elbow method
B. Manhattan method
C. Euclidean method
D. All of the above
E. None of these
Solution: (A)
Out of the given options, only the elbow method is used for finding the optimal number of clusters. The elbow method looks at the percentage of variance explained as a function of the number of clusters: one should choose a number of clusters such that adding another cluster doesn't give much better modeling of the data.
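The elbow method can be sketched with scikit-learn on synthetic data (an illustrative example, not from the test): inertia (within-cluster sum of squares) drops sharply until the true number of clusters and only marginally after it.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three synthetic blobs, so the true k is 3.
X = np.vstack([rng.normal(c, 0.4, (40, 2)) for c in (0, 4, 8)])

# Inertia for k = 1..6; plotting k vs inertia shows the "elbow" at k = 3.
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in range(1, 7)]
```

The drop from k=2 to k=3 is large, while the drop from k=3 to k=4 is marginal, which is exactly the elbow one looks for.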
Q31. What is true about K-Means clustering?
- K-means is extremely sensitive to cluster center initialization
- Bad initialization can lead to poor convergence speed
- Bad initialization can lead to bad overall clustering
Options:
A. 1 and 3
B. 1 and 2
C. 2 and 3
D. 1, 2 and 3
Solution: (D)
All three of the given statements are true. K-means is extremely sensitive to cluster center initialization. Bad initialization can lead to poor convergence speed as well as bad overall clustering.
Q32. Which of the following can be applied to get good results for the K-means algorithm corresponding to global minima?
- Try to run the algorithm for different centroid initializations
- Adjust the number of iterations
- Find out the optimal number of clusters
Options:
A. 2 and 3
B. 1 and 3
C. 1 and 2
D. All of the above
Solution: (D)
All of these are standard practices used to obtain good clustering results.
Q33. What should be the best choice for the number of clusters based on the following results:
A. 5
B. 6
C. 14
D. Greater than 14
Solution: (B)
Based on the above results, the best choice for the number of clusters using the elbow method is 6.
Q34. What should be the best choice for the number of clusters based on the following results:
A. 2
B. 4
C. 6
D. 8
Solution: (C)
Generally, a higher average silhouette coefficient indicates better clustering quality. In this plot, the optimal number of clusters appears to be 2, at which the value of the average silhouette coefficient is highest. However, the SSE of this clustering solution (k = 2) is too large. At k = 6, the SSE is much lower. In addition, the value of the average silhouette coefficient at k = 6 is also very high, only slightly lower than at k = 2. Thus, the best choice is k = 6.
Q35. Which of the following sequences is correct for the K-Means algorithm using the Forgy method of initialization?
- Specify the number of clusters
- Assign cluster centroids randomly
- Assign each data point to the nearest cluster centroid
- Re-assign each point to the nearest cluster centroid
- Re-compute cluster centroids
Options:
A. 1, 2, 3, 5, 4
B. 1, 3, 2, 4, 5
C. 2, 1, 3, 4, 5
D. None of these
Solution: (A)
The methods used for initialization in K-means are Forgy and Random Partition. The Forgy method randomly chooses k observations from the data set and uses these as the initial means. The Random Partition method first randomly assigns a cluster to each observation and then proceeds to the update step, thus computing the initial mean as the centroid of the cluster's randomly assigned points.
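Both initialization schemes can be sketched in a few lines of NumPy (hypothetical data and variable names):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))  # toy data
k = 3

# Forgy: pick k distinct observations and use them as the initial means.
forgy_means = X[rng.choice(len(X), size=k, replace=False)]

# Random Partition: randomly label every point, then average each group.
partition = rng.integers(0, k, size=len(X))
partition_means = np.array([X[partition == j].mean(axis=0) for j in range(k)])
```

Forgy spreads the initial means across actual data points, while Random Partition's initial means all start near the overall centroid of the data.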
Q36. If you are using multinomial mixture models with the expectation-maximization algorithm for clustering a set of data points into two clusters, which of the following assumptions is important:
A. All the data points follow two Gaussian distributions
B. All the data points follow n Gaussian distributions (n > 2)
C. All the data points follow two multinomial distributions
D. All the data points follow n multinomial distributions (n > 2)
Solution: (C)
In the EM algorithm for clustering, it is essential to choose the same no. of clusters as the no. of different distributions the data points are expected to be generated from, and the distributions must be of the same type.
Q37. Which of the following is/are not true about the centroid-based K-Means clustering algorithm and the distribution-based expectation-maximization clustering algorithm:
- Both start with random initializations
- Both are iterative algorithms
- Both have strong assumptions that the data points must fulfill
- Both are sensitive to outliers
- The expectation-maximization algorithm is a special case of K-Means
- Both require prior knowledge of the no. of desired clusters
- The results produced by both are non-reproducible
Options:
A. 1 only
B. 5 only
C. 1 and 3
D. 6 and 7
E. 4, 6 and 7
F. None of the above
Solution: (B)
All of the above statements are true except the 5th; instead, K-Means is a special case of the EM algorithm in which only the centroids of the cluster distributions are calculated at each iteration.
Q38. Which of the following is/are not true about the DBSCAN clustering algorithm:
- For data points to be in a cluster, they must be within a distance threshold of a core point
- It has strong assumptions for the distribution of data points in the data space
- It has a substantially high time complexity of order O(n^3)
- It does not require prior knowledge of the no. of desired clusters
- It is robust to outliers
Options:
A. 1 only
B. 2 only
C. 4 only
D. 2 and 3
E. 1 and 5
F. 1, 3 and 5
Solution: (D)
- DBSCAN can form a cluster of any arbitrary shape and does not have strong assumptions for the distribution of data points in the data space.
- DBSCAN has a low time complexity of order O(n log n) only.
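A short scikit-learn sketch (synthetic data) illustrates points 4 and 5: no number of clusters is specified up front, and the isolated point is labeled as noise (-1) rather than distorting a cluster.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Two dense blobs plus one far-away outlier.
X = np.vstack([rng.normal(0, 0.2, (30, 2)),
               rng.normal(5, 0.2, (30, 2)),
               [[20.0, 20.0]]])

# eps is the distance threshold around core points; no k is given.
labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)
```

DBSCAN discovers the two blobs on its own and marks the outlier with the special noise label -1.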
Q39. Which of the following are the upper and lower bounds of the F-score?
A. [0,1]
B. (0,1)
C. [-1,1]
D. None of the above
Solution: (A)
The lowest and highest possible values of the F-score are 0 and 1, with 1 representing that every data point is assigned to the correct cluster and 0 representing that the precision and/or recall of the clustering analysis are 0. In clustering analysis, a high F-score is desired.
Q40. Following are the results observed for clustering 6000 data points into 3 clusters: A, B and C:
What is the F1-score with respect to cluster B?
A. 3
B. 4
C. 5
D. 6
Solution: (D)
Here,
True Positives, TP = 1200
True Negatives, TN = 600 + 1600 = 2200
False Positives, FP = 1000 + 200 = 1200
False Negatives, FN = 400 + 400 = 800
Therefore,
Precision = TP / (TP + FP) = 0.5
Recall = TP / (TP + FN) = 0.6
Hence,
F1 = 2 * (Precision * Recall) / (Precision + Recall) = 0.545 ≈ 0.5
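The same computation in Python, using the counts from the solution above:

```python
# Confusion counts for cluster B, taken from the worked solution.
TP, TN, FP, FN = 1200, 2200, 1200, 800

precision = TP / (TP + FP)                            # 1200 / 2400 = 0.5
recall = TP / (TP + FN)                               # 1200 / 2000 = 0.6
f1 = 2 * precision * recall / (precision + recall)    # harmonic mean ~ 0.545
```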
End Notes
I hope you enjoyed taking the test and found the solutions helpful. The test focused on conceptual as well as practical knowledge of clustering fundamentals and its various techniques.
I tried to clear all your doubts through this article, but if we have missed out on something then let us know in the comments below. Also, if you have any suggestions or improvements you think we should make in the next skill test, you can let us know by dropping your feedback in the comments section.
Learn, compete, hack and get hired!
Source: https://www.analyticsvidhya.com/blog/2017/02/test-data-scientist-clustering/