Breast cancer is one of the most serious threats to women's lives. The disease has many contributing factors, but genetic factors have a particularly strong influence on its development, which makes early diagnosis and prevention essential. Many approaches have been discussed in the literature, but identifying and selecting the set of genes that influence the disease remains difficult. We propose a multi-variant approach to gene selection based on high-dimensional subspace clustering. From a given data set, the method generates a set of rules; unlike generic fuzzy rules, it splits the range of values into a number of parts and generates the rules from those parts. From the different value ranges, the method also builds a multi-gene impact matrix that stores the frequency of the range values of each rule. The data set is clustered according to the generated rules, and gene selection is performed from those rules. For gene selection, we compute a multi-gene frequency measure, which represents how strongly a gene affects the classification of the disease. The proposed method efficiently classifies the genes that influence breast cancer.
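The abstract gives no formulas for the range splitting, the multi-gene impact matrix, or the multi-gene frequency measure. The following is therefore only a minimal sketch of one possible reading, not the authors' method. It assumes equal-width ranges, two classes, and a hypothetical score: half the summed absolute difference between the two classes' range distributions for a gene. All function names, the toy data, and the 0.5 threshold are illustrative assumptions.

```python
from collections import defaultdict


def bin_index(value, lo, hi, parts):
    """Map a value to one of `parts` equal-width ranges over [lo, hi]."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * parts)
    return min(idx, parts - 1)  # the maximum value falls in the last range


def impact_matrix(samples, labels, parts=3):
    """Count, per gene and per range, how many samples of each class fall there."""
    n_genes = len(samples[0])
    bounds = [
        (min(s[g] for s in samples), max(s[g] for s in samples))
        for g in range(n_genes)
    ]
    matrix = [[defaultdict(int) for _ in range(parts)] for _ in range(n_genes)]
    for sample, label in zip(samples, labels):
        for g, value in enumerate(sample):
            lo, hi = bounds[g]
            matrix[g][bin_index(value, lo, hi, parts)][label] += 1
    return matrix


def gene_frequency_scores(matrix, labels):
    """Assumed score in [0, 1]: how differently the two classes fill a gene's ranges."""
    class_a, class_b = sorted(set(labels))  # assumption: exactly two classes
    n_a, n_b = labels.count(class_a), labels.count(class_b)
    scores = []
    for gene_ranges in matrix:
        diff = sum(abs(r[class_a] / n_a - r[class_b] / n_b) for r in gene_ranges)
        scores.append(0.5 * diff)
    return scores


def select_genes(scores, threshold=0.5):
    """Keep the indices of genes whose score reaches the (assumed) threshold."""
    return [g for g, score in enumerate(scores) if score >= threshold]


# Toy data: gene 0 separates the classes, gene 1 does not.
samples = [[0.1, 5.0], [0.2, 1.0], [0.9, 5.1], [0.8, 1.1]]
labels = ["benign", "benign", "malignant", "malignant"]

matrix = impact_matrix(samples, labels, parts=3)
scores = gene_frequency_scores(matrix, labels)
selected = select_genes(scores)
print(scores, selected)
```

On this toy input, gene 0 places each class in a different range and so receives the highest possible score, while gene 1 distributes both classes identically and is not selected.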
High-dimensional clustering methods have been applied to a variety of problems. For decision support systems, a few approaches have been discussed previously, but they suffer from a high false indexing ratio, poor clustering accuracy, and high time complexity. To overcome the problem of poor clustering accuracy, this paper presents a novel Kn Fast Clustering algorithm. The method generates rule sets from the data records in the data set. First, the number of dimensions N is identified, and the range of values is identified for each dimension. From the identified fuzzy values, the method computes a disease impact factor for each dimension (symptom) with respect to each disease class. Based on the impact factors and the data points, we generate a rule set that consists of a single rule for each disease class. The Kn Fast Clustering algorithm uses the generated fuzzy rule sets: for each data point in the data set, it computes a KN-dimensional similarity measure. Based on the computed similarity measure, each data point is assigned a class, and the method reduces false indexing, overlapping, and the time complexity of clustering.
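The abstract does not define the disease impact factor or the KN-dimensional similarity measure, so the sketch below is only a hypothetical illustration of the pipeline's shape (ranges per dimension, one rule per class, weighted similarity, class assignment), not the Kn Fast Clustering algorithm itself. It assumes that a rule is the observed (low, high) range of each dimension within a class, that a dimension's impact factor is the share of other-class records lying outside that range, and that similarity is the impact-weighted fraction of dimensions whose range contains the point. All names and the toy records are assumptions.

```python
def build_rules(records, labels):
    """One rule per class: a (low, high) range for every dimension."""
    rules = {}
    for cls in sorted(set(labels)):
        members = [r for r, l in zip(records, labels) if l == cls]
        rules[cls] = [(min(col), max(col)) for col in zip(*members)]
    return rules


def impact_factors(records, labels, rules):
    """Assumed weight: share of other-class records outside the class's range."""
    factors = {}
    for cls, ranges in rules.items():
        others = [r for r, l in zip(records, labels) if l != cls]
        weights = []
        for d, (lo, hi) in enumerate(ranges):
            inside = sum(1 for r in others if lo <= r[d] <= hi)
            weights.append(1.0 - inside / len(others) if others else 1.0)
        factors[cls] = weights
    return factors


def similarity(point, ranges, weights):
    """Impact-weighted fraction of dimensions whose range contains the point."""
    total = sum(weights)
    if total == 0:
        return 0.0
    matched = sum(
        w for x, (lo, hi), w in zip(point, ranges, weights) if lo <= x <= hi
    )
    return matched / total


def assign(point, rules, factors):
    """Assign the class whose rule is most similar (ties go to the first name)."""
    return max(sorted(rules), key=lambda c: similarity(point, rules[c], factors[c]))


# Toy records with two dimensions (symptoms) and two disease classes.
records = [[36.5, 1.0], [37.0, 2.0], [39.0, 1.5], [40.0, 2.5]]
labels = ["class_a", "class_a", "class_b", "class_b"]

rules = build_rules(records, labels)
factors = impact_factors(records, labels, rules)
print(assign([39.4, 1.8], rules, factors))
print(assign([36.8, 1.2], rules, factors))
```

In this toy setup the first dimension separates the classes cleanly and receives full weight, whereas the second dimension overlaps between classes and is weighted down, so the overlapping symptom contributes less to the assignment.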