machine learning - Python - Intra similarity


I'm trying to code intra similarity in Python on the iris data set, i.e. the distance between elements of the same class. An example on a small set:

     1  2  3  4  |0
     5  6  7  8  |0
     1  3  5  6  |1
    11 12 13 14  |0
    10  2  4  6  |1

    distance1 = (1-5)^2 + (2-6)^2 + (3-7)^2 + (4-8)^2
    distance1 = sqrt(distance1)
    distance2 = (1-11)^2 + (2-12)^2 + (3-13)^2 + (4-14)^2
    distance2 = sqrt(distance2)
    similarityclass0 = (distance1 + distance2) / 2
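As a quick check of the arithmetic above, here is a minimal NumPy sketch of the same computation for the first row. The variable names (`X`, `y`, `same_class`, `similarity_row0`) are illustrative and not taken from the original code.

```python
import numpy as np

# Toy data from the example: four features per row, class label after the bar.
X = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [1, 3, 5, 6],
    [11, 12, 13, 14],
    [10, 2, 4, 6],
], dtype=float)
y = np.array([0, 0, 1, 0, 1])

# Keep only the rows of class 0 (original rows 0, 1 and 3).
same_class = X[y == 0]

# Euclidean distance from the first class-0 row to each of the other two.
distance1 = np.sqrt(np.sum((same_class[0] - same_class[1]) ** 2))
distance2 = np.sqrt(np.sum((same_class[0] - same_class[2]) ** 2))

# Mean distance from the first row to the other members of its class.
similarity_row0 = (distance1 + distance2) / 2

print(distance1, distance2, similarity_row0)  # 8.0 20.0 14.0
```

Each squared difference is 16 for the first pair and 100 for the second, so the distances come out as 8 and 20, and their mean is 14.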

And I have the same for class 1, 2, 3, and so on.

As for the code, I think it is functional but pretty ugly.
As input I have x and y. When I finish computing for tab0, I do the same for tab1, tab2, etc.

My question is: how can I write this code for n classes? The goal is to have, for each line, a measure of its intra similarity.

    from sklearn import datasets
    import numpy as np

    iris = datasets.load_iris()
    iris.data.shape, iris.target.shape

    x = iris.data
    # 0 = setosa // 1 = versicolor // 2 = virginica
    y = iris.target

    # At first, retrieve the indexes of each class.
    # For example, if class 0 is on lines 1, 2, 6, tab0 stores 1, 2, 6.
    tab0 = list()
    tab1 = list()
    tab2 = list()

    j = 0
    for output in y:
        if output == 0:
            tab0.append(j)
        if output == 1:
            tab1.append(j)
        if output == 2:
            tab2.append(j)
        j = j + 1

    ########################################################################
    # computation of the intra similarity #
    import math

    sim0_intra = list()
    sim1_intra = list()
    sim2_intra = list()

    # classes stores 0, 1, 2 (3 classes); count is the number of elements in each class
    classes, count = np.unique(y, return_counts=True)

    temp = 0
    for i in tab0:
        temp = 0
        for j in tab0:
            for k in range(len(x[0])):
                temp = temp + np.square(x[i][k] - x[j][k])
        sim0_intra.append(np.sqrt(temp / (count[0] - 1)))

You can use sklearn.metrics.pairwise.pairwise_distances, which returns a distance matrix and, by default, uses the 'euclidean' metric (the function you computed in your example).

You'll find the documentation here: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances.html

And here is the code ;)

    import numpy as np
    from sklearn import datasets
    from sklearn.metrics import pairwise

    iris = datasets.load_iris()

    x = iris.data
    y = iris.target

    # dividing x by classes {0,1,2} to perform intra-distances
    x0 = x[np.where(y == 0)]
    x1 = x[np.where(y == 1)]
    x2 = x[np.where(y == 2)]

    # one square distance matrix per class
    sim0_intra = pairwise.pairwise_distances(x0, metric='euclidean')
    sim1_intra = pairwise.pairwise_distances(x1, metric='euclidean')
    sim2_intra = pairwise.pairwise_distances(x2, metric='euclidean')

As the documentation states, pairwise_distances returns "a distance matrix D such that D_{i, j} is the distance between the ith and jth vectors of the given matrix X".

So, in our case, for example: sim0_intra[0][1] --> 0.53851648071346281 is the distance between the first and second elements of class 0. And, no surprise, if you ask for sim0_intra[5][5] --> 0.0 you observe that the distance is 0, since you are asking for the distance from an element to itself :)
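To make the shape of such a distance matrix concrete, here is a small sketch on the class-0 rows of the toy example from the question. To keep it dependency-light it builds the matrix with plain NumPy broadcasting rather than scikit-learn; `euclidean_distance_matrix` is a hypothetical helper written for this illustration, not a library function.

```python
import numpy as np

def euclidean_distance_matrix(points):
    """Return D where D[i, j] is the Euclidean distance between points[i] and points[j]."""
    # Broadcasting yields an (n, n, d) array holding every pairwise difference.
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Class-0 rows of the toy example from the question.
class0 = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [11, 12, 13, 14],
], dtype=float)

D = euclidean_distance_matrix(class0)
print(D)

# The matrix is symmetric and its diagonal is zero,
# because every element is at distance 0 from itself.
print(np.allclose(D, D.T), np.allclose(np.diag(D), 0.0))
```

Here D[0, 1] is 8 and D[0, 2] is 20, matching distance1 and distance2 from the question, and each of them appears twice in the matrix (once above and once below the diagonal).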

And finally, you can ask for the mean value in each matrix, which gives you the intra-similarity:

    similarityclass0 = sim0_intra.sum()/(50*50-50)
    # output: 0.69812194319103826
    similarityclass1 = sim1_intra.sum()/(50*50-50)
    # output: 0.99736067331161615
    similarityclass2 = sim2_intra.sum()/(50*50-50)
    # output: 1.1767808010528609

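I'm calculating the mean myself (there should be prettier ways to do it). I'm adding up all the distances (which this way are each added twice, since the matrix is symmetric) and dividing by the total number of elements (50*50) after subtracting the ones on the diagonal (50), which are always 0.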

Note: I've tried several things, like np.triu, which gives you the upper part of the matrix, and then tried to call mean on it, but mean takes the lower part of the matrix into account as well, which is all 0. So... if prettier ways come up, please share! :)
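One possible tidier route, sketched here under some assumptions: np.triu_indices(n, k=1) returns the index pairs strictly above the diagonal, so taking the mean over just those entries avoids both the zeros and the double counting, and gives the same value as sum()/(n*n-n). Looping over np.unique(y) then handles any number of classes. The names `euclidean_distance_matrix` and `intra_class_similarity` are hypothetical helpers for this illustration. The distance matrix is built with NumPy broadcasting so the sketch runs without scikit-learn; pairwise_distances could be used in its place.

```python
import numpy as np

def euclidean_distance_matrix(points):
    """Return D where D[i, j] is the Euclidean distance between points[i] and points[j]."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def intra_class_similarity(X, y):
    """Return (per_class, per_row) mean intra-class distances for any number of classes.

    per_class maps each label to the mean distance over all distinct pairs of that class.
    per_row[i] is the mean distance from row i to the other rows of its own class.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    per_class = {}
    per_row = np.full(len(y), np.nan)
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        n = len(idx)
        if n < 2:
            continue  # a single element has no pair; its score stays NaN
        D = euclidean_distance_matrix(X[idx])
        # Entries strictly above the diagonal: each pair exactly once, no zeros.
        per_class[label] = D[np.triu_indices(n, k=1)].mean()
        # The diagonal contributes 0 to each row sum, so divide by n - 1.
        per_row[idx] = D.sum(axis=1) / (n - 1)
    return per_class, per_row

# Toy data from the question: labels are the values after the bar.
X = [[1, 2, 3, 4], [5, 6, 7, 8], [1, 3, 5, 6], [11, 12, 13, 14], [10, 2, 4, 6]]
y = [0, 0, 1, 0, 1]

per_class, per_row = intra_class_similarity(X, y)
print(per_class)
print(per_row)
```

On the toy data, the first row gets a per-row score of 14, the value of similarityclass0 worked out by hand in the question. The per-class value for class 0 is the mean of the three pair distances 8, 20 and 12.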

