Abstract
Source code search is frequently needed and important in software development. Keyword search for source code is a widely used but a limited approach. This paper presents CodeKoan, a scalable engine for searching millions of online code examples written by the worldwide programmers' community which uses data parallel processing to achieve horizontal scalability. The search engine relies on a token-based, programming language independent algorithm and, as a proof-of-concept, indexes all code examples from Stack Overflow for two programming languages: Java and Python. This paper demonstrates the benefits of extracting crowd knowledge from Stack Overflow by analyzing well-known open source repositories such as OpenNLP and Elasticsearch: Up to one third of the source code in the examined repositories reuses code patterns from Stack Overflow. It also shows that the proposed approach recognizes similar source code and is resilient to modifications such as insertion, deletion and swapping of statements. Furthermore, evidence is given that the proposed approach returns very few false positives among the search results.
Item Type: | Journal article |
---|---|
Faculties: | Mathematics, Computer Science and Statistics > Computer Science |
Subjects: | 000 Computer science, information and general works > 004 Data processing computer science |
Language: | English |
Item ID: | 66418 |
Date Deposited: | 19. Jul 2019 12:19 |
Last Modified: | 04. Nov 2020 13:47 |