Machine learning - Python - Intra similarity
I'm trying to code intra similarity in Python on the iris data set, that is, the distance between elements of the same class. An example on a small set (the value after | is the class):
1 2 3 4 |0
5 6 7 8 |0
1 3 5 6 |1
11 12 13 14 |0
10 2 4 6 |1

distance1 = (1-5)^2 + (2-6)^2 + (3-7)^2 + (4-8)^2
distance1 = sqrt(distance1)
distance2 = (1-11)^2 + (2-12)^2 + (3-13)^2 + (4-14)^2
distance2 = sqrt(distance2)
similarityclass0 = (distance1 + distance2) / 2
And I have to do the same for classes 1, 2, 3, and so on.
As for my code, I think it is functional but pretty ugly.
As input I have x and y. When I finish the computation for tab0, I do the same for tab1, tab2, etc.
My question is: how can I write this code for n classes? The goal is to have, for each line, a measure of intra similarity.
from sklearn import datasets
import numpy as np

iris = datasets.load_iris()
iris.data.shape, iris.target.shape
x = iris.data
# 0 = setosa // 1 = versicolor // 2 = virginica
y = iris.target

# at first, retrieve the indexes of each class
# for example, if class 0 is on lines 1, 2, 6, tab0 stores 1, 2, 6
tab0 = list()
tab1 = list()
tab2 = list()
j = 0
for output in y:
    if output == 0:
        tab0.append(j)
    if output == 1:
        tab1.append(j)
    if output == 2:
        tab2.append(j)
    j = j + 1

########################################################################
# computation of intra similarity #
import math
sim0_intra = list()
sim1_intra = list()
sim2_intra = list()
# classes stores 0, 1, 2 (3 classes), count stores the number of elements in each class
classes, count = np.unique(y, return_counts=True)
temp = 0
for i in tab0:
    temp = 0
    for j in tab0:
        for k in range(len(x[0])):
            temp = temp + np.square(x[i][k] - x[j][k])
    sim0_intra.append(np.sqrt(temp / (count[0] - 1)))
You can use sklearn.metrics.pairwise.pairwise_distances.
It returns a distance matrix and, by default, uses the 'euclidean' metric (the function you computed in your example).
You'll find it here: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances.html
And here is the code ;)
import numpy as np
from sklearn import datasets
from sklearn.metrics import pairwise

iris = datasets.load_iris()
x = iris.data
y = iris.target

# dividing x by classes {0,1,2} to perform intra-distances
x0 = x[np.where(y==0)]
x1 = x[np.where(y==1)]
x2 = x[np.where(y==2)]

sim0_intra = pairwise.pairwise_distances(x0, metric='euclidean')
sim1_intra = pairwise.pairwise_distances(x1, metric='euclidean')
sim2_intra = pairwise.pairwise_distances(x2, metric='euclidean')
As the documentation states, pairwise_distances returns "a distance matrix D such that D_{i, j} is the distance between the ith and jth vectors of the given matrix X".
So, in our case, for example:
sim0_intra[0][1] --> 0.53851648071346281
is the distance between the first and second elements of class 0. And, no surprise, if you ask for
sim0_intra[5][5] --> 0.0
you observe that the distance is 0, as you are asking for the distance from an element to itself :)
And finally, you ask for the mean value of each matrix, and that gives you the intra-similarity:
similarityclass0 = sim0_intra.sum()/(50*50-50)  # output: 0.69812194319103826
similarityclass1 = sim1_intra.sum()/(50*50-50)  # output: 0.99736067331161615
similarityclass2 = sim2_intra.sum()/(50*50-50)  # output: 1.1767808010528609
I'm calculating the mean myself (there should be prettier ways to do it). I'm adding up all the distances (which, this way, are each added twice) and dividing by the total number of elements (50*50) minus the ones on the diagonal (50), which are all 0.
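For n classes, one possible generalization is to loop over np.unique(y) instead of writing x0, x1, x2 by hand. This is only a sketch: the function name intra_class_distances and the toy arrays are my own choices, and the distance matrix is built with plain NumPy broadcasting (pairwise_distances on each class subset should be usable in its place). It returns one value per line, as asked in the question, plus one mean per class.

```python
import numpy as np

def intra_class_distances(X, y):
    """Per-row and per-class mean Euclidean distance to same-class samples."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    per_row = np.full(len(X), np.nan)   # one value per line of X
    per_class = {}                      # one value per class label
    for label in np.unique(y):          # works for any number of classes
        mask = (y == label)
        Xc = X[mask]
        n = len(Xc)
        if n < 2:
            continue                    # a lone sample has no partner: row stays NaN
        # n x n matrix of Euclidean distances inside this class
        diff = Xc[:, None, :] - Xc[None, :, :]
        D = np.sqrt((diff ** 2).sum(axis=-1))
        # the diagonal is 0, so divide by the number of *other* samples
        per_row[mask] = D.sum(axis=1) / (n - 1)
        per_class[label.item()] = D.sum() / (n * n - n)
    return per_row, per_class

# toy data from the question: 4 features, the last column was the class
X_toy = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [1, 3, 5, 6],
                  [11, 12, 13, 14], [10, 2, 4, 6]])
y_toy = np.array([0, 0, 1, 0, 1])

per_row, per_class = intra_class_distances(X_toy, y_toy)
print(per_row[0])   # 14.0, i.e. (8 + 20) / 2 as in the question
print(per_class)
```

With the iris arrays from above, intra_class_distances(x, y) should give the per-class values without hard-coding 50 or the number of classes.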
Note: I've tried several things, like np.triu, which gives you the upper part of the matrix, and then tried to call mean on it, but the mean takes the lower part of the matrix into account as well, which is all 0. So... if prettier ways come up, please share! :)
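One candidate for a prettier way might be np.triu_indices(n, k=1), which returns only the index pairs strictly above the diagonal, so the zeros never enter the mean. A small sketch on the class-0 rows of the toy example from the question (the names X0, D, mean_upper and mean_manual are just for illustration):

```python
import numpy as np

# class-0 rows of the toy example from the question
X0 = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [11, 12, 13, 14]], dtype=float)

# distance matrix via broadcasting (a stand-in for pairwise_distances(X0))
diff = X0[:, None, :] - X0[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))

n = D.shape[0]
rows, cols = np.triu_indices(n, k=1)    # index pairs with row < col
mean_upper = D[rows, cols].mean()       # each distance counted once
mean_manual = D.sum() / (n * n - n)     # the sum-and-divide version

print(mean_upper, mean_manual)          # the two values should agree
```

Since the matrix is symmetric, both give the same number; the indexed version just avoids counting every pair twice and hard-coding the matrix size.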