An Approach to Uncover Hidden Topics in Short and Sparse Web Documents

Sumayya Sameena and Rehana

Volume 11, Issue 2, November 2014, Pages 259–263

@article{IJISR-14-244-03,
author = {Sumayya Sameena and Rehana},
title = {{An Approach to Uncover Hidden Topics in Short and Sparse Web Documents}},
journal = {International Journal of Innovation and Scientific Research},
volume = {11},
year = {2014},
pages = {259--263},
issue = {2},
number = {2},
issn = {2351-8014},
url = {http://www.ijisr.issr-journals.org/abstract.php?article=IJISR-14-244-03},
abstract_html_url = {http://www.ijisr.issr-journals.org/abstract.php?article=IJISR-14-244-03},
pdf_url = {http://www.issr-journals.org/links/papers.php?journal=ijisr&application=pdf&article=IJISR-14-244-03},
document_type={Article},
source={www.issr-journals.org}
}

Sumayya Sameena¹ and Rehana²

¹ M.Tech, Department of CSE, Nimra College of Engg. & Tech, Vijayawada, Andhra Pradesh, India
² Assistant Professor, Department of CSE, Nimra College of Engg. & Tech, Vijayawada, Andhra Pradesh, India

Original language: English

Copyright © 2014 ISSR Journals. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

This work introduces a hidden topic-based framework for processing short and sparse documents on the Web (e.g., search result snippets, product descriptions, book/movie summaries, and advertising messages). The framework addresses two main challenges posed by such documents: 1) data sparseness and 2) synonyms/homonyms. The former leaves documents with few shared words and contexts, while the latter are major linguistic obstacles in natural language processing (NLP) and information retrieval (IR). The underlying idea is that common hidden topics discovered from large external data sets (universal data sets), when added to short documents, make them less sparse and more topic-oriented. Hidden topics from universal data sets also help handle unseen data better. The proposed framework can be applied to different natural languages and data domains. We evaluated the framework in two experiments on two important online applications (Web search result classification and matching/ranking for contextual advertising) with large-scale universal data sets, and achieved significant results.

Author Keywords: Web mining, matching, Natural Language Processing, classification, clustering.
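
The core idea of the abstract, that topic information learned elsewhere can make two short documents comparable even when they share no words, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `WORD_TOPICS` table is hand-written and merely stands in for a topic model estimated on a universal data set, and the helper names `plain`, `enrich`, and `cosine` are hypothetical rather than taken from the article.

```python
from collections import Counter
from math import sqrt

# Hypothetical word-to-topic table. In the framework this knowledge would come
# from a topic model estimated on a large universal data set; the entries here
# are hand-written purely for illustration.
WORD_TOPICS = {
    "laptop": "topic:computing",
    "notebook": "topic:computing",
    "processor": "topic:computing",
    "film": "topic:cinema",
    "movie": "topic:cinema",
    "director": "topic:cinema",
}


def plain(snippet: str) -> Counter:
    """Bag of words for a snippet, with no topic information."""
    return Counter(snippet.lower().split())


def enrich(snippet: str) -> Counter:
    """Bag of words plus one pseudo-token for every word that maps to a topic."""
    bag = plain(snippet)
    for word in snippet.lower().split():
        topic = WORD_TOPICS.get(word)
        if topic is not None:
            bag[topic] += 1
    return bag


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


query = "cheap laptop deals"
ad = "notebook with fast processor"

# The raw snippets share no words, so their similarity is zero.
print(cosine(plain(query), plain(ad)))
# After enrichment both contain the same topic token, so similarity is positive.
print(cosine(enrich(query), enrich(ad)))
```

In this toy setting the synonyms "laptop" and "notebook" never match as words, yet the enriched vectors overlap through the shared topic token. A real system would replace the lookup table with topic distributions inferred by a trained model.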

How to Cite this Article

Sumayya Sameena and Rehana, “An Approach to Uncover Hidden Topics in Short and Sparse Web Documents,” International Journal of Innovation and Scientific Research, vol. 11, no. 2, pp. 259–263, November 2014.
