Open Access Article

Pragmatic Aspects of Token-based Technique in Detecting Source Code Duplicates

S. Bharti, H. Singh

Section: Research Paper, Product Type: Journal Paper
Volume 06, Issue 05, Pages 43-49, Jun-2018

Online published on Jun 30, 2018

Copyright © S. Bharti, H. Singh. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Style Citation: S. Bharti, H. Singh, “Pragmatic Aspects of Token-based Technique in Detecting Source Code Duplicates,” International Journal of Computer Sciences and Engineering, Vol.06, Issue.05, pp.43-49, 2018.

MLA Style Citation: S. Bharti, H. Singh "Pragmatic Aspects of Token-based Technique in Detecting Source Code Duplicates." International Journal of Computer Sciences and Engineering 06.05 (2018): 43-49.

APA Style Citation: S. Bharti, H. Singh, (2018). Pragmatic Aspects of Token-based Technique in Detecting Source Code Duplicates. International Journal of Computer Sciences and Engineering, 06(05), 43-49.

BibTex Style Citation:
@article{Bharti_2018,
author = {S. Bharti and H. Singh},
title = {Pragmatic Aspects of Token-based Technique in Detecting Source Code Duplicates},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {6 2018},
volume = {06},
Issue = {05},
month = {6},
year = {2018},
issn = {2347-2693},
pages = {43-49},
url = {https://www.ijcseonline.org/full_spl_paper_view.php?paper_id=418},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY - JOUR
UR - https://www.ijcseonline.org/full_spl_paper_view.php?paper_id=418
TI - Pragmatic Aspects of Token-based Technique in Detecting Source Code Duplicates
T2 - International Journal of Computer Sciences and Engineering
AU - Bharti, S.
AU - Singh, H.
PY - 2018
DA - 2018/06/30
PB - IJCSE, Indore, INDIA
SP - 43
EP - 49
IS - 05
VL - 06
SN - 2347-2693
ER -

           

Abstract

The clone research community has described several techniques for detecting code duplicates in a code base. These fall into four main classes: textual (text-based) techniques, lexical (token-based) techniques, syntactic techniques (including tree-based and metrics-based approaches), and semantic techniques. The literature lists various clone detection tools in each category, capable of detecting clones in batch mode as well as in a real-time development environment. However, most of these tools use tokens as the intermediate representation of the source code to which their clone detection algorithms are applied. This paper therefore focuses on the token-based intermediate representation and its pragmatic aspects in code duplication detection. By discussing the practical process of converting source code into tokens and how code duplicates are then detected, the authors shed light on the less obvious pros and cons of the token-based approach, to help researchers decide whether to select and implement it, or reject it, as the intermediate representation for their duplication detection algorithms.
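As a rough illustration of the process the abstract describes (source code converted to tokens, then duplicates detected on the token stream), the following minimal Python sketch may help. It is not the authors' method or any cited tool's algorithm. The keyword set, the `ID`/`NUM` normalization, the fixed window of 8 tokens, and the names `tokenize` and `find_clones` are all assumptions made for this example.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately small keyword set for a C-like language.
KEYWORDS = {"int", "for", "while", "if", "else", "return"}

# Identifiers/keywords, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(source):
    """Return a list of (normalized_token, line_number) pairs."""
    tokens = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for lexeme in TOKEN_RE.findall(line):
            if lexeme in KEYWORDS:
                kind = lexeme            # keywords are kept as they are
            elif lexeme[0].isalpha() or lexeme[0] == "_":
                kind = "ID"              # every identifier becomes ID
            elif lexeme[0].isdigit():
                kind = "NUM"             # every integer literal becomes NUM
            else:
                kind = lexeme            # punctuation and operators
            tokens.append((kind, line_no))
    return tokens


def find_clones(tokens, window=8):
    """Map each repeated run of `window` tokens to the lines where it starts."""
    seen = defaultdict(list)
    for i in range(len(tokens) - window + 1):
        key = tuple(kind for kind, _ in tokens[i:i + window])
        seen[key].append(tokens[i][1])
    return {key: lines for key, lines in seen.items() if len(lines) > 1}


SAMPLE = """\
int sum = 0;
for (i = 0; i < n; i++) { sum += a[i]; }
int total = 0;
for (j = 0; j < m; j++) { total += b[j]; }
"""

if __name__ == "__main__":
    for key, lines in find_clones(tokenize(SAMPLE)).items():
        print(lines, " ".join(key))
```

Because identifiers and literals are normalized before comparison, the two loops in `SAMPLE` are reported as duplicates even though their variable names differ. The same normalization can also match fragments that are not meaningfully related, which is one of the trade-offs of a token-based representation. Tools built on this idea typically replace the fixed-window lookup with more scalable structures such as suffix trees.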

Keywords / Index Terms

Code Clone Detection, Clone Detection Techniques, Token-based Clone Detection Technique
