# cluster-density-reducing unsupervised feature selection algorithm

(1) Given a distance measure between two vectors X and Y,

D(X,Y) = 1⁄2 { (var(X) + var(Y))2 - √[ (var(X) + var(Y))2 - 4 var(X) var(Y) (1-ρ(X,Y)2 ] }

ρ(x,y) = cov(X,Y)/√(var(X)var(Y)).

var(X) = (1/n) (∑(X-meanX)2), where n is the dimensionality of vector X.

cov(X,Y) = (1/n) (∑(X-meanX)(Y-meanY)), where both X and Y have the same dimensionality, n.

(2) Given a number of features (n), there exists a feature set O = {Fi, i= 1, 2... n}, which may be reducible. For the sake of this test, the features are represented by ‘feature vectors’ (e.g., X=<FV1, FV2, ... FVn>), where the dimensions of each and every vector is equal to the number of randomly-picked instances of a dataset used to construct the feature vectors (say 10% of the available instances).

(3) Implement the following cluster-density-reducing unsupervised feature selection algorithm (or CDR)

For every feature vector, compute the D distance between the vector and all other vectors; Find the smallest distance (>0) between any two vectors: call that D_min_global;

Let e = k x D_min_global (where k is a real-valued constant =>1 entered by the user);

Repeat {

For all vectors whose minimum D distance to another vector =~ D_min_global Do

{

Find a nearest vector (this will give you a pair of vectors, Va and Vb); Discard one of the pair of vectors (Va, Vb); // function description below

}

Re-compute D_min_global; // as the elimination of features will alter it }

Until (D_min_global > e)

Output retained features; // just a list of feature indexes, such as FV1, FV3, FV14.

Discard one of the pair of vectors (Va, Vb) { Create a vector_cluster = {Va,Vb};

Let k=2; // meaning 2nd nearest neighbour

10: If there are no more nearest neighbours Then { Randomly discard Va or Vb; Flip switch; Exit function };

}

Find the k-nearest_neighbour of Va and of Vb; // they may be two distinct vectors or the same one Let vector_cluster = vector_cluster + k-nearest_neighbour of Va and of Vb;

Calculate D_average of Va and of Vb to the other vectors in the vector_cluster;

If D_average of (Va>Vb) OR (Va<Vb) Do

{

If (switch = true) Then discard Va or Vb with lowest D_average Else discard the other; Switch = NOT(switch); // flip the Switch, where switch is a global Boolean variable

}

Else { k++; go to 10 } // in order to increase the size of the vector_cluster

(4) To solve the test, you will need to:

- Implementing & debugging the distance measure (described above)

- Possibly, adapt a nearest neighbor classifier to the needs of applying CDR to one of the data set below.

Description: [login to view URL]

Example code in Java: [login to view URL]

- Design & Implement a function to read from one of the data sets at:

[login to view URL] (e.g., REALDISP, Image Segmentation and Robot Execution

Failures datasets).

- Incrementally Implement & Debug the algorithm (CDR): first to compute D, then the Discard function and

finally, to implement the whole algorithm.

- Provide a theoretical assessment of the time complexity of CDR as a function of n (the initial number of

features); test your prediction empirically by running your program on sub-sets with different n values.

- Present the program & output file and a written report on the results- especially, theoretical & empirical

time scalability results.

Marks are out of 100% and they are assigned as follows: 35% accurate & efficient Java or C++ program implementation of the CDR algorithm (USB key); 15% a complete output file exhibiting feature reduction for one dataset (USB key); 15% correct theoretical analysis of the order of time complexity of your implementation (paper); 35% correct analysis of the empirical time-performance of your program, with justification for any divergence between it and the theoretical analysis (on paper). Attach a completed & signed statement of originality form to your submission.

Kemahiran: Kejuruteraan, Java

Tentang Majikan:
( 10 ulasan ) Montreal, Canada

ID Projek: #6677637

## 2 pekerja bebas membida secara purata \$84 untuk pekerjaan ini

Yknox

Hello I'm interesting your project very well I'm a Good C/C++, Java, Math, Algorithm expert. I understand your req exactly. I m quite well experienced in these assignment jobs. Let's go ahead with me I want t Lagi

\$90 CAD dalam sehari
(436 Ulasan)
8.5
MukrramBajwa

Dear i have done a lot of work in it and also currently working on these type of projects. I will deliver you best quality work on time.

\$77 CAD dalam sehari
(8 Ulasan)
2.8