Volume 16, Issue 1, June 2015, Pages 252–259
Urmila Mahor1 and Sujoy Das2
1 Master of Computer Application, Maulana Azad National Institute of Technology, Bhopal, Madhya Pradesh, India
2 Master of Computer Application, Maulana Azad National Institute of Technology, Bhopal, Madhya Pradesh, India
Original language: English
Copyright © 2015 ISSR Journals. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Authorship attribution aims to identify the original author of an unattributed text or document. It is a challenging task because the original author is difficult to identify automatically. The terms stylometry, authorship recognition, and authorship attribution are used interchangeably. Authorship attribution is normally performed on the basis of the lexical, syntactic, and semantic features of a document. Recently, the problem has attracted wide attention in fields such as forensic analysis and electronic commerce. In this paper, various feature selection, feature reduction, and classification techniques are compared for attributing the authorship of documents in the PAN CLEF 2012 data set. LDA performed 12% better than all other classifiers.
Author Keywords: machine learning, attribute selection, classification, WEKA 3.6.
How to Cite this Article
Urmila Mahor and Sujoy Das, “Performance Evaluation of Various Feature Extraction and Classification Techniques for Authorship Attribution,” International Journal of Innovation and Scientific Research, vol. 16, no. 1, pp. 252–259, June 2015.