Cancelled

Find a practical example with data based on the hierarchical clustering algorithm of S. Dasgupta - 02/09/2018 02:53 EDT -- 2

The goal of the project is to produce an example, on metric data, based on the article "Performance guarantees for hierarchical clustering" by Dasgupta and Long.

The article is also available online: [login to view URL]~dasgupta/papers/[login to view URL]

The deadline is 7 September 2018.

This job suits data miners, data scientists, machine learning specialists, statisticians, etc.

The goal of the project is to find metric data and apply Dasgupta's hierarchical clustering algorithm to it. The first step is to apply the farthest-first traversal algorithm (due to Gonzalez) to the metric data, which yields a tree. The second step is to apply Dasgupta's algorithm to this farthest-first traversal tree. The result must be a better, or optimal, clustering, meaning that Dasgupta's algorithm gives a shorter distance.
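The first step can be sketched as follows. This is a minimal illustration of farthest-first traversal with a pluggable distance, not the article's code: the function names, the choice of the Canberra distance and the four sample points are all assumptions made for the example.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Point = Sequence[float]


def canberra(x: Point, y: Point) -> float:
    """Canberra distance; coordinates where both values are 0 contribute 0."""
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total


def farthest_first(
    points: Sequence[Point],
    dist: Callable[[Point, Point], float],
    start: int = 0,
) -> Tuple[List[int], List[Optional[int]], List[float]]:
    """Return (order, parent, radius).

    order[i]  : index in `points` of the i-th point picked
    parent[i] : position in `order` of the closest earlier point (None for i = 0)
    radius[i] : distance from order[i] to that closest earlier point
    """
    n = len(points)
    order: List[int] = [start]
    parent: List[Optional[int]] = [None]
    radius: List[float] = [float("inf")]
    # nearest[j] = (distance to the closest picked point, its position in order)
    nearest = [(dist(points[j], points[start]), 0) for j in range(n)]
    picked = {start}
    while len(order) < n:
        # Pick the unpicked point that is farthest from all picked points.
        nxt = max((j for j in range(n) if j not in picked),
                  key=lambda j: nearest[j][0])
        order.append(nxt)
        parent.append(nearest[nxt][1])
        radius.append(nearest[nxt][0])
        picked.add(nxt)
        pos = len(order) - 1
        for j in range(n):
            d = dist(points[j], points[nxt])
            if d < nearest[j][0]:
                nearest[j] = (d, pos)
    return order, parent, radius


# Hypothetical sample data, for illustration only.
sample = [(1, 1), (1, 2), (9, 9), (10, 9)]
order, parent, radius = farthest_first(sample, canberra)
print(order, parent, radius)
```

The `parent` list is the tree mentioned above: each point is linked to the closest point picked before it.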

For this project, any distance can be used except the Euclidean distance.

The distances that can be used are the Lorentzian, Jaccard or Canberra distance, or any other distance. The most important thing is to get a shorter distance when applying Dasgupta's algorithm to the farthest-first traversal tree. If the resulting distances are greater than or equal to those of the farthest-first traversal, the results do not show that Dasgupta's algorithm is better than the farthest-first traversal. When applying Dasgupta's algorithm (the second step), it is mandatory to get a shorter distance between two points of the metric space. (Please also see the attached txt file: important notice goal [login to view URL].)
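For reference, the three distances named above can be written as in the sketch below. The formulas are the usual textbook definitions as far as I know them; the function names and the brute-force triangle-inequality check are assumptions added for illustration, and the check only tests the sample passed in, it is not a proof.

```python
import math
from itertools import product


def canberra(x, y):
    """Sum of |a - b| / (|a| + |b|); a 0/0 coordinate contributes 0."""
    return sum(abs(a - b) / (abs(a) + abs(b))
               for a, b in zip(x, y) if abs(a) + abs(b) > 0)


def lorentzian(x, y):
    """Sum of ln(1 + |a - b|) over the coordinates."""
    return sum(math.log1p(abs(a - b)) for a, b in zip(x, y))


def jaccard(a, b):
    """Jaccard distance between two sets: 1 - |A n B| / |A u B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention: two empty sets are at distance 0
    return 1.0 - len(a & b) / len(union)


def satisfies_triangle_inequality(items, dist, tol=1e-12):
    """Check d(x, z) <= d(x, y) + d(y, z) for every triple of the sample."""
    return all(dist(x, z) <= dist(x, y) + dist(y, z) + tol
               for x, y, z in product(items, repeat=3))


# Hypothetical sample points, for illustration only.
sample = [(0, 0), (1, 3), (2, 2), (5, 1)]
print(satisfies_triangle_inequality(sample, lorentzian))
print(satisfies_triangle_inequality(sample, canberra))
```

Running the check on the actual data set before clustering is a cheap way to confirm that the chosen distance behaves like a metric on it.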

The attached files contain a bad example ([login to view URL] preliminaire, or new rapport preliminaire): in these examples, the distances are greater than or equal to the distances obtained with the farthest-first traversal algorithm. These documents are written in French; if needed, I can translate them into English.

Could you produce a practical example where the distances are shorter when Dasgupta's algorithm is applied to the result of the farthest-first traversal algorithm?

I look forward to hearing from you.

Have a nice day

Sezbil

The new deadline is 6 September 2018 (instead of 7 September 2018).

There is also a short algorithmic proof to finish. The goal is to argue that p is a tree:

- Start from a function p from {2,...,n} to {1,...,n} such that p(i) < i. In particular, p(2) = 1.
- Argue that p is a tree.
- For k <= n, if we delete the edges (2,1), (3,p(3)), ..., (k,p(k)), we get k connected components
in which the points 1, 2, ..., k are pairwise disconnected. These points are chosen as the k centers.
If m > k, the path from m to its center is
m -> p(m) -> p(p(m)) -> p(p(p(m))) -> ... until we reach a point of {1,2,...,k}.
- By the triangle inequality, the distance from m to its center is bounded above by
d(m,p(m)) + d(p(m),p(p(m))) + ...
- Etc.

This job is to argue that p is a tree and to finish the proof.
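The steps of this argument can be checked numerically with a small sketch. Everything in it is hypothetical: the parent map `p`, the positions used to build the distance `d`, and the helper names are made up for illustration and do not come from the article.

```python
def is_rooted_tree(p, n):
    """True if p maps every i in 2..n to some index in 1..i-1.

    Following p then strictly decreases the index, so every point reaches 1:
    the n - 1 edges (i, p(i)) form a connected graph, hence a tree.
    """
    return all(1 <= p[i] < i for i in range(2, n + 1))


def center_and_bound(m, k, p, d):
    """Follow m -> p(m) -> p(p(m)) -> ... until an index <= k is reached.

    Returns (center, sum of the edge lengths along the path). By the
    triangle inequality the sum is an upper bound on d(m, center).
    """
    total = 0.0
    while m > k:
        total += d(m, p[m])
        m = p[m]
    return m, total


def clusters(n, k, p):
    """Components left after deleting the edges (2, p(2)), ..., (k, p(k))."""
    groups = {c: [] for c in range(1, k + 1)}
    for m in range(1, n + 1):
        c = m
        while c > k:
            c = p[c]
        groups[c].append(m)
    return groups


# Hypothetical example: 6 points on a line, with a made-up parent map.
p = {2: 1, 3: 1, 4: 2, 5: 4, 6: 3}
pos = {1: 0, 2: 10, 3: 5, 4: 8, 5: 9, 6: 4}


def d(i, j):
    return abs(pos[i] - pos[j])


print(is_rooted_tree(p, 6))
print(clusters(6, 3, p))
center, bound = center_and_bound(5, 3, p, d)
print(center, bound, d(5, center))
```

For point 5 and k = 3, the path 5 -> 4 -> 2 has length 3 while the direct distance to center 2 is 1, which illustrates that the path sum is only an upper bound.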

Please be aware that the most important requirement is this: when applying Dasgupta's algorithm, at least one of the clusterings must give a shorter distance than the one obtained with the farthest-first traversal algorithm. For example, this could be the 2-clustering (k=2), the 3-clustering (k=3), the 4-clustering (k=4), etc.
Applying Dasgupta's algorithm to a data set takes two steps:
The first step is to apply the farthest-first traversal algorithm, using any distance except the Euclidean distance.
The second step is to apply Dasgupta's algorithm to the result of the first step (the farthest-first traversal), so the distances can change. To show that Dasgupta's algorithm is better than the farthest-first traversal algorithm, for at least one of the clusterings (k=2, k=3, ...), at least one distance obtained with Dasgupta's algorithm must be shorter than the distance obtained in the same cluster with the farthest-first traversal algorithm.
For example, for the 2-clustering (k=2), the distance obtained with Dasgupta's algorithm between two points of the metric space must be shorter than the distance obtained for the same clustering (k=2) with the farthest-first traversal algorithm.
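The two steps can be sketched on a distance matrix as below. The second step follows my reading of the level-based construction in the cited article (points are grouped into levels by their farthest-first radius, and each point is re-attached to its closest point at a strictly lower level); this reading, the function names and the cost convention (distance to the induced center) are assumptions to verify against the article before relying on them. The sketch assumes all points are distinct.

```python
import math


def farthest_first(D, start=0):
    """Step 1: farthest-first numbering on a distance matrix D."""
    n = len(D)
    order, parent, radius = [start], [None], [math.inf]
    near = [(D[j][start], 0) for j in range(n)]  # (distance, position in order)
    left = set(range(n)) - {start}
    while left:
        nxt = max(sorted(left), key=lambda j: near[j][0])
        left.remove(nxt)
        order.append(nxt)
        parent.append(near[nxt][1])
        radius.append(near[nxt][0])
        pos = len(order) - 1
        for j in left:
            if D[j][nxt] < near[j][0]:
                near[j] = (D[j][nxt], pos)
    return order, parent, radius


def level_parents(D, order, radius):
    """Step 2 (assumed reading): level j holds radii in (R / 2^j, R / 2^(j-1)]."""
    R = radius[1]
    level = [0] + [math.floor(math.log2(R / r)) + 1 for r in radius[1:]]
    new_parent = [None]
    for i in range(1, len(order)):
        lower = [q for q in range(i) if level[q] < level[i]]
        new_parent.append(min(lower, key=lambda q: D[order[i]][order[q]]))
    return level, new_parent


def tree_cost(D, order, parent, k):
    """Cost of the k-clustering obtained by cutting the first k - 1 edges:
    largest distance from a point to the center (position < k) it hangs under."""
    worst = 0.0
    for i in range(len(order)):
        c = i
        while c >= k:
            c = parent[c]
        worst = max(worst, D[order[i]][order[c]])
    return worst


# Hypothetical data: Lorentzian distance on six made-up 1-D values.
xs = [0, 1, 3, 7, 15, 16]
D = [[math.log1p(abs(a - b)) for b in xs] for a in xs]
order, parent, radius = farthest_first(D)
level, new_parent = level_parents(D, order, radius)
for k in range(1, len(xs) + 1):
    print(k, tree_cost(D, order, parent, k), tree_cost(D, order, new_parent, k))
```

Printing both costs for every k makes it easy to see for which k, if any, the second tree gives a smaller value than the first on a given data set.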

One more thing: at the end, I also need a table (for an example, see page 14 of the attached file Old rapport preliminaire.pdf) comparing the cost of Dasgupta's algorithm, the cost of the farthest-first traversal and the optimal cost. I need these three costs in one table; in this case the costs are distances. In this table you have to calculate the optimal cost, the cost of Dasgupta's algorithm
and the cost of the farthest-first traversal algorithm, and in the last column the maximal factor for each clustering: k=2, k=3, k=4, etc., up to the last clustering.

The maximal factor is the cost of the clustering obtained with Dasgupta's algorithm divided by the optimal cost:

Maximal factor = (cost of Dasgupta's algorithm) / (optimal cost)


Definition of the optimal cost: a k-clustering is optimal if it minimizes the cost. In other words, a k-clustering is optimal if no other k-clustering has a strictly smaller cost. In the article, the cost of a k-clustering is defined as the maximum radius of its k clusters.
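For small data sets, the optimal cost and the maximal factor can be computed by brute force, as in the sketch below. It assumes that cluster centers are restricted to data points and that a 0/0 factor counts as 1; both conventions, the function names and the sample matrix are assumptions made for illustration.

```python
import math
from itertools import combinations


def optimal_cost(D, k):
    """Smallest possible maximum radius over all choices of k centers.

    Assumption: centers are data points. Exhaustive search, so only
    practical for small n.
    """
    n = len(D)
    return min(
        max(min(D[i][c] for c in centers) for i in range(n))
        for centers in combinations(range(n), k)
    )


def maximal_factor(algorithm_cost, optimal):
    """Algorithm cost divided by optimal cost (0/0 treated as 1 by convention)."""
    if optimal == 0:
        return 1.0 if algorithm_cost == 0 else math.inf
    return algorithm_cost / optimal


# Hypothetical distance matrix built from four made-up 1-D values.
xs = [0, 1, 10, 11]
D = [[abs(a - b) for b in xs] for a in xs]
for k in range(1, len(xs) + 1):
    print(k, optimal_cost(D, k))
```

Each row of the requested table would then hold, for one value of k, the two algorithm costs, `optimal_cost(D, k)` and the resulting factor.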

For example, see the attached file Old rapport preliminaire.pdf:
it contains a bad example, because the distances obtained with Dasgupta's algorithm are greater than the distances obtained with the farthest-first traversal algorithm.
On page 18 there is a figure for the k=4 clustering.
It compares the results of the farthest-first traversal algorithm and of Dasgupta's algorithm:
- Farthest-first traversal algorithm: the distance between point 4 and point 6 is 2
point 4: (10,5)
point 6: (8,5)
Euclidean distance between (8,5) and (10,5) = 2
- Dasgupta's algorithm: the distance between point 2 and point 6 is 3
point 6: (8,5)
point 2: (8,8)
Euclidean distance between (8,8) and (8,5) = 3
In conclusion, this example is not good: the distance obtained with Dasgupta's algorithm (3) is greater than the distance obtained with the farthest-first traversal (2), so it does not show that Dasgupta's algorithm is better than the farthest-first traversal algorithm.
Another case that would also be a bad example: getting the same distance between two points in the same cluster with Dasgupta's algorithm and with the farthest-first traversal algorithm.
Another problem with this example (in Old rapport preliminaire.pdf) is that it uses the Euclidean distance, which is not allowed in this project.
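The two distances quoted in this bad example can be re-checked in a few lines. The Canberra values are added only to illustrate what the same pairs give under one of the allowed distances; the helper names are assumptions.

```python
import math


def euclidean(x, y):
    """Euclidean distance in the plane (used only to re-check the bad example)."""
    return math.hypot(x[0] - y[0], x[1] - y[1])


def canberra(x, y):
    """Canberra distance; a 0/0 coordinate contributes 0."""
    return sum(abs(a - b) / (abs(a) + abs(b))
               for a, b in zip(x, y) if abs(a) + abs(b) > 0)


point_2, point_4, point_6 = (8, 8), (10, 5), (8, 5)

# Farthest-first traversal pair (points 4 and 6), then Dasgupta pair (points 2 and 6).
print(euclidean(point_4, point_6), euclidean(point_2, point_6))
print(canberra(point_4, point_6), canberra(point_2, point_6))
```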




Skills: Algorithms, Data Mining, Data Science, Machine Learning, R Programming Language


About the employer:
( 0 reviews ) Belgium

Project ID: #17690997

7 freelancers are bidding an average of €195 for this job

liveexperts123

....................................................................................................................................................................

€555 EUR in 3 days
(9 reviews)
4.9
umg536

........................................................................................................................................v

€277 EUR in 3 days
(3 reviews)
4.0
ExperSolutions

ML DS , experts................................................................................................................................................................

€222 EUR in 3 days
(13 reviews)
3.7
Statxn

As: - a graduate in applied mathematics; - an engineer in statistics and applied economics; - a specialized Microsoft Office technician (certified by Microsoft: MOS diploma). I am

€110 EUR in 4 days
(8 reviews)
3.4
stats4data

Data science, Python, R, Name : Snoussi Khalil age : 27 website : [login to view URL] Summary: I’m a data scientist with a Master’s degree in Statistics, Data science and applied economics fr

€50 EUR in 3 days
(3 reviews)
1.8
dannygist

A dynamic, self-motivated, and experienced statistician with diverse knowledge of statistical theories, applications, and methodology. I am looking forward to creating a mutual relationship with my clients by solving the

€30 EUR in 1 day
(0 reviews)
0.0
iduyuncu

Hello. This is an academic project, I think. I took machine learning and MATLAB while I was in my department, so I am familiar with this kind of problem. However, I do not have MATLAB on my machine at the moment. Would it be a problem if it were implemented in Python? Thanks

€122 EUR in 3 days
(0 reviews)
0.0