Sara Ersson

Code Clone Detection for Equivalence Assurance

Abstract

It has become commonplace to offer application programming interfaces (APIs) in several programming languages. However, this brings the challenge of ensuring that the APIs' interfaces are equivalent. To achieve this, code clone detection techniques were adapted to match similar function declarations across the APIs. First, existing code clone detection tools were investigated. As they did not perform well, a tree-based syntactic approach was used instead, in which all header files were compiled with Clang. The abstract syntax trees obtained during compilation were then traversed to locate the function declaration nodes and to store function names and parameter variable names. The function names were matched with a textual approach, which transforms the function names according to a set of implemented rules.

A strict rule compares transformations of full function names in a precise way, whereas a loose rule compares transformations of only parts of the function names and matches anything for the remainder. An example of a strict rule is the word separator convention rule, which converts the function names to lower case and removes underscores, in order to eliminate language-specific conventions for separating words. An example of a loose rule is one that considers only the first word of the function names. The rules were applied both on their own and in different combinations, starting with the strictest rule, followed by the second strictest rule, and so forth.
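The two example rules and the strictest-first combination can be illustrated with a minimal Python sketch. It is not the thesis implementation: the helper names (`word_separator_key`, `first_word_key`, `match_functions`), the greedy first-candidate pairing, and the function names in the demo are all assumptions made for illustration.

```python
import re
from typing import Callable, List, Tuple

Rule = Callable[[str], str]


def word_separator_key(name: str) -> str:
    """Strict rule: lower-case the full name and remove underscores."""
    return name.lower().replace("_", "")


def first_word_key(name: str) -> str:
    """Loose rule: keep only the first word, ignoring the remainder."""
    # Split on underscores and on camelCase boundaries (assumed word model).
    words = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return words[0].lower() if words else ""


def match_functions(
    api_a: List[str], api_b: List[str], rules: List[Rule]
) -> Tuple[List[Tuple[str, str]], List[str], List[str]]:
    """Apply the rules in the given order (strictest first).

    Each rule only sees the functions left unmatched by earlier rules.
    Within a rule, a name is paired with the first remaining candidate
    that has the same key, so loose rules depend on function order.
    """
    matches: List[Tuple[str, str]] = []
    remaining_a, remaining_b = list(api_a), list(api_b)
    for rule in rules:
        unmatched_a = []
        for name_a in remaining_a:
            key = rule(name_a)
            partner = next((b for b in remaining_b if rule(b) == key), None)
            if partner is None:
                unmatched_a.append(name_a)
            else:
                matches.append((name_a, partner))
                remaining_b.remove(partner)
        remaining_a = unmatched_a
    return matches, remaining_a, remaining_b


if __name__ == "__main__":
    # Hypothetical function names, not taken from any real API.
    api_a = ["session_start", "session_stop", "track_purchase"]
    api_b = ["sessionStop", "sessionStart", "trackIapPurchase"]
    for pair in match_functions(api_a, api_b, [word_separator_key, first_word_key])[0]:
        print(pair)
```

In this sketch the strict rule pairs the two session functions correctly regardless of their order, and the loose rule then only has to resolve the single leftover pair. Applying the first-word rule on its own to the same input would instead pair `session_start` with `sessionStop`, because both candidates share the first word and the earlier one in the list wins.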

The best-matching rules proved to be those that are strict and are not affected by the order in which the functions are matched. These rules also proved to be very robust to API evolution. Rules that are less strict and less stable, and that are not robust to API evolution, can still be used, for example matching functions on the first or last word of the function names, but preferably as a complement to the stricter and more stable rules, once most of the functions have already been matched.

The tool was evaluated on the two APIs of King’s software development kit, where it covered 94% of the 124 available function matches.