daisy() is a function from the R cluster package that computes all pairwise dissimilarities (distances) between the observations in a data set.
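As a rough illustration of the description above, here is a minimal sketch of calling daisy() on a small mixed-type data frame. The data frame `df` and its column names are made up for this example; only `daisy()`, its `metric` argument, and `as.matrix()` come from R and the cluster package.

```r
library(cluster)  # recommended package, shipped with standard R installations

# Hypothetical toy data: one numeric, one nominal and one ordinal column
df <- data.frame(
  height = c(1.60, 1.75, 1.82, 1.68),
  colour = factor(c("red", "blue", "red", "green")),
  size   = factor(c("S", "M", "L", "M"),
                  levels = c("S", "M", "L"), ordered = TRUE)
)

# Mixed column types call for Gower's coefficient
d <- daisy(df, metric = "gower")

# d is a "dissimilarity" object; convert it to a full symmetric matrix to inspect it
m <- as.matrix(d)
print(round(m, 3))
```

Character columns would need to be converted to factors first, which is why the nominal and ordinal columns in this sketch are built with `factor()`.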
Questions tagged [r-daisy]
39 questions
14 votes, 2 answers
Python equivalent of daisy() in the cluster package of R
I have a dataset that contains both categorical (nominal and ordinal) and numerical attributes. I want to calculate the (dis)similarity matrix across my observations using these mixed attributes. Using the daisy() function of the cluster package in…

Zhubarb
- 11,432
- 18
- 75
- 114
8 votes, 2 answers
R: RStudio: How to get silhouette plot working?
Today I have realised that the silhouette plot in the cluster package doesn't display properly in RStudio. A Google search revealed that someone else had had a problem with…

user32259
- 1,113
- 3
- 13
- 21
6 votes, 1 answer
Weighted Euclidean Distance in R
I'd like to create a distance matrix of weighted Euclidean distances from a data frame. The weights will be defined in a vector. Here's an example:
library("cluster")
a <- c(1,2,3,4,5)
b <- c(5,4,3,2,1)
c <- c(5,4,1,2,3)
df <-…

h7681
- 355
- 4
- 13
6 votes, 1 answer
R Cluster Package Error Daisy() function long vectors (argument 11) are not supported in .C
Trying to convert a data.frame with numeric, nominal, and NA values to a dissimilarity matrix using the daisy function from the cluster package in R. My purpose involves creating a dissimilarity matrix before applying k-means clustering for customer…

Scott Davis
- 983
- 6
- 22
- 43
4 votes, 0 answers
Compute dissimilarity matrix on parallel cores
I'm trying to compute a dissimilarity matrix based on a big data frame. Because my features are a mix of categorical and numerical variables, I need to use the daisy function in the cluster package.
Any idea how I can run this on parallel cores? Below an…

Codutie
- 1,055
- 13
- 25
3 votes, 1 answer
Getting "invalid type character" error with daisy
I have a data frame with mixed data types (integer, character, and logical) which I'm trying to cluster with daisy.
I'm using:
gower_dist <- daisy(relchoice, metric = "gower")
and getting:
Error in daisy(relchoice, metric = "gower") :
invalid…

Gilad Brandes
- 31
- 1
- 3
3 votes, 1 answer
Compute dissimilarity matrix for large data
I'm trying to compute a dissimilarity matrix based on a big data frame with both numerical and categorical features. When I run the daisy function from the cluster package I get the error message:
Error: cannot allocate vector of size X.
In my…

Codutie
- 1,055
- 13
- 25
3 votes, 4 answers
Cluster Analysis in R with missing data
So I spent a good amount of time trying to find the answer on how to do this. The only answer I have found so far is here: How to perform clustering without removing rows where NA is present in R
Unfortunately, this is not working for me.
So here is…

akvallejos
- 329
- 5
- 11
3 votes, 2 answers
Determining optimal number of clusters with Daisy function and Gower Similarity
I am attempting to cluster the behavioral traits of 250 species into life-history strategies. The trait data consists of both numerical and nominal variables. I am relatively new to R and to cluster analysis, but I believe the best option to find…

user2639963
- 31
- 2
2 votes, 1 answer
Clustering using daisy and pam in R
I'm trying to perform a pretty straightforward clustering analysis but can't get the results right. My question for a large dataset is "Which diseases are frequently reported together?". The simplified data sample below should result in 2 clusters:…

Joep_S
- 481
- 4
- 22
2 votes, 1 answer
X axis label is not showing in clustering dendrogram in ggplot
I have made a clustering dendrogram following some code I found online, but the x-axis is not being shown in the graph. I would like to have the dissimilarity value shown on the x-axis, but I have not been…
2 votes, 1 answer
R - Different results gower.dist and daisy(...,metric="gower")
I want to calculate the distances (dissimilarities) between the rows of two data frames in order to find the closest cluster for each observation. Because I have factors and numerical variables, I'm using Gower distance. As I want to compare two…

Vanessa
- 33
- 1
- 6
2 votes, 0 answers
Computing Silhouette Width - special case
I am completely redrafting this question following the advice of @MrFlick.
Assume I have a data.frame like the following
set.seed(1)
group<-(rep(1:10, sample(50:200, 10, replace=T)))
gender<-factor((sample(0:1, 1328, replace=T, prob=c(0.55,…

Riccardo
- 743
- 2
- 5
- 14
2 votes, 1 answer
computing the dot product between all column pairs in a data frame
I have an R data frame whose columns are logical variables.
I need to compute some kind of dot product between all possible pairs of columns.
This arises from text corpus analysis, where the data frame indicates which terms (rows) are present in which…

Marc G.
- 141
- 1
- 9
1 vote, 1 answer
How to make sample vs features clustering heatmap using daisy (gower) in R?
I am still learning clustering methods.
I have a dataset with mixed types: continuous, binary, categorical. I read in some articles that 'gower' is a good distance for clustering mixed-type data. So I would like to try it out and make an…

WenliL
- 419
- 2
- 14