Open Access Article

A Supervised Forum Crawler

Sreeja S R, Sangita Chaudhari

Section: Research Paper, Product Type: Conference Paper
Volume-04, Issue-02, Page no. 43-47, Apr-2016

Online published on May 10, 2016

Copyright © Sreeja S R, Sangita Chaudhari. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Style Citation: Sreeja S R, Sangita Chaudhari, “A Supervised Forum Crawler,” International Journal of Computer Sciences and Engineering, Vol.04, Issue.02, pp.43-47, 2016.

MLA Style Citation: Sreeja S R, Sangita Chaudhari "A Supervised Forum Crawler." International Journal of Computer Sciences and Engineering 04.02 (2016): 43-47.

APA Style Citation: Sreeja S R, Sangita Chaudhari, (2016). A Supervised Forum Crawler. International Journal of Computer Sciences and Engineering, 04(02), 43-47.

BibTex Style Citation:
@article{R_2016,
author = {Sreeja S R and Sangita Chaudhari},
title = {A Supervised Forum Crawler},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {4 2016},
volume = {04},
Issue = {02},
month = {4},
year = {2016},
issn = {2347-2693},
pages = {43-47},
url = {https://www.ijcseonline.org/full_spl_paper_view.php?paper_id=50},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY - JOUR
UR - https://www.ijcseonline.org/full_spl_paper_view.php?paper_id=50
TI - A Supervised Forum Crawler
T2 - International Journal of Computer Sciences and Engineering
AU - Sreeja S R
AU - Sangita Chaudhari
PY - 2016
DA - 2016/05/10
PB - IJCSE, Indore, INDIA
SP - 43-47
IS - 02
VL - 04
SN - 2347-2693
ER -

           

Abstract

Web Forums, also called Internet Forums, provide a space for users to share, discuss, and request information. They are sources of a huge amount of structured information that changes rapidly, so crawling them requires specialized software; a generic Deep Web Crawler or a Focused Crawler cannot be used for this purpose. In this paper, we propose an effective Web Crawler designed specifically for Internet Forums. This Forum Crawler overcomes the drawbacks of many existing Forum Crawlers. Given any page of a Forum site, it can detect the site's Entry URL (Uniform Resource Locator), and starting the crawl from the Entry URL increases coverage. The URLs found in Web Forums are classified into four categories. The crawling process is divided into a learning part and an online crawling part: the learning part creates regular expressions based on URLs, and the online crawling part crawls the Web pages.
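The split between a learning part and an online crawling part can be illustrated with a minimal Python sketch. It is not the method of the paper: the abstract does not name the four URL categories or describe how the regular expressions are built, so the category names (`index`, `thread`, `page_flip`, `other`), the example URLs on `forum.example.com`, and the digit-generalization rule below are all assumptions made for illustration.

```python
import re
from typing import Dict, List
from urllib.parse import urlparse


def url_tail(url: str) -> str:
    """Return the path plus query string, the part of a URL that varies within one site."""
    parts = urlparse(url)
    return parts.path + ("?" + parts.query if parts.query else "")


def url_to_pattern(url: str) -> str:
    """Generalize one URL into a regex by replacing each run of digits with \\d+.

    This rule is an assumption for illustration, not the paper's learning algorithm.
    """
    pieces = re.split(r"(\d+)", url_tail(url))
    return "".join(r"\d+" if p.isdigit() else re.escape(p) for p in pieces)


def learn_patterns(examples: Dict[str, List[str]]) -> Dict[str, List["re.Pattern[str]"]]:
    """Learning part: build compiled regexes for each labeled URL category."""
    learned = {}
    for label, urls in examples.items():
        unique = sorted({url_to_pattern(u) for u in urls})
        learned[label] = [re.compile(p) for p in unique]
    return learned


def classify(url: str, learned: Dict[str, List["re.Pattern[str]"]]) -> str:
    """Online part: return the first category whose regex matches the whole URL tail."""
    tail = url_tail(url)
    for label, patterns in learned.items():
        if any(p.fullmatch(tail) for p in patterns):
            return label
    return "other"  # hypothetical fallback category


# Hypothetical training URLs; the category names are assumed, not taken from the paper.
EXAMPLES = {
    "index": ["http://forum.example.com/forumdisplay.php?f=2"],
    "thread": ["http://forum.example.com/showthread.php?t=101"],
    "page_flip": ["http://forum.example.com/showthread.php?t=101&page=2"],
}
LEARNED = learn_patterns(EXAMPLES)
```

With these assumed examples, `classify("http://forum.example.com/showthread.php?t=9876", LEARNED)` falls into the `thread` category, while a URL that matches no learned pattern is labeled `other` and could be skipped by the crawler. Matching the whole URL tail rather than a prefix keeps a thread URL from absorbing its page-flipping variants.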

Keywords / Index Terms

Forum Crawling; URL Type; Page Classification; Crawling Strategy; JavaScript-based URLs
