Research

Columbus - Reverse Engineering Tool and Schema for C++

Authors: Rudolf Ferenc, Árpád Beszédes, Mikko Tarkiainen and Tibor Gyimóthy

In Proceedings of the 18th International Conference on Software Maintenance (ICSM 2002), Montréal, Canada, pages 172-181, October 3-6, 2002. Published by IEEE Computer Society.

Abstract:
One of the most critical issues in large-scale software development and maintenance is the rapidly growing size and complexity of software systems. As a result of this rapid growth there is a need to better understand the relationships between the different parts of a large software system. In this paper we present a reverse engineering framework called Columbus that is able to analyze large C++ projects, and a schema for C++ that prescribes the form of the extracted data. The flexible architecture of the Columbus system with a powerful C++ analyzer and schema makes it a versatile and readily extendible toolset for reverse engineering. This tool is free for scientific and educational purposes and we fervently hope that it will assist academic persons in any research work related to C++ re- and reverse engineering.


Journal/Proceedings/Conference: http://csdl.computer.org/comp/proceedings/icsm/2002/1819/00/1819toc.htm


Download full paper: http://csdl.computer.org/dl/proceedings/icsm/2002/1819/00/18190172.pdf


Empirical Validation of Object-Oriented Metrics on Open Source Software for Fault Prediction

Authors: Tibor Gyimóthy, Rudolf Ferenc and István Siket

In IEEE Transactions on Software Engineering, Vol. 31, No. 10, October 2005, pages 897-910. Published by IEEE Computer Society.

Abstract:
Open source software systems are becoming increasingly important these days. Many companies are investing in open source projects and lots of them are also using such software in their own work. But, because open source software is often developed with a different management style than the industrial ones, the quality and reliability of the code needs to be studied. Hence, the characteristics of the source code of these projects need to be measured to obtain more information about it. This paper describes how we calculated the object-oriented metrics given by Chidamber and Kemerer to illustrate how fault-proneness detection of the source code of the open source Web and e-mail suite called Mozilla can be carried out. We checked the values obtained against the number of bugs found in its bug database—called Bugzilla—using regression and machine learning methods to validate the usefulness of these metrics for fault-proneness prediction. We also compared the metrics of several versions of Mozilla to see how the predicted fault-proneness of the software system changed during its development cycle.
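The following is a minimal illustrative sketch of three of the Chidamber and Kemerer metrics mentioned in the abstract (WMC, DIT and NOC), computed by introspection on a small Python class hierarchy. The hierarchy and the helper names `wmc`, `dit` and `noc` are hypothetical and exist only for this example. The paper itself measured the C++ source code of Mozilla, and this sketch does not reproduce that tooling. WMC is computed here under the assumption that every method has weight 1.

```python
import inspect


# Hypothetical toy hierarchy, used only to have something to measure.
class Widget:
    def draw(self):
        pass

    def resize(self):
        pass


class Button(Widget):
    def click(self):
        pass


class Label(Widget):
    def set_text(self):
        pass


class IconButton(Button):
    def set_icon(self):
        pass

    def draw(self):
        pass


def wmc(cls):
    """Weighted Methods per Class, assuming a weight of 1 per method.

    Only methods defined directly in the class body are counted.
    """
    return sum(1 for member in vars(cls).values() if inspect.isfunction(member))


def dit(cls):
    """Depth of Inheritance Tree: longest path to a root class.

    The implicit Python base class `object` is not counted.
    """
    bases = [base for base in cls.__bases__ if base is not object]
    return 0 if not bases else 1 + max(dit(base) for base in bases)


def noc(cls):
    """Number of Children: direct subclasses only."""
    return len(cls.__subclasses__())


metrics = {
    cls.__name__: (wmc(cls), dit(cls), noc(cls))
    for cls in (Widget, Button, Label, IconButton)
}
```

In a fault-proneness study, per-class values such as these would form the input features that are then compared against the number of bugs recorded for each class.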


Journal/Proceedings/Conference: http://csdl2.computer.org/persagen/DLAbsToc.jsp?resourcePath=/dl/trans/ts/&toc=c...


Download full paper: http://csdl.computer.org/dl/trans/ts/2005/10/e0897.pdf


Data Exchange with the Columbus Schema for C++

Authors: Rudolf Ferenc and Árpád Beszédes

In Proceedings of the 6th European Conference on Software Maintenance and Reengineering (CSMR 2002), Budapest, Hungary, pages 59-66, March 11-13, 2002. Published by IEEE Computer Society.

Abstract:
To successfully carry out a software maintenance or reengineering task, a suitably assembled set of tools is required, which interoperate seamlessly. To achieve this goal, an exchange format is needed that can be used to represent the facts extracted from a software system in a standardized way, serving as an output of one tool and as an input for other tools. In this paper we propose a modular schema for C++, called the Columbus Schema. The schema has been implemented in the Columbus/CAN front end framework tool and is already used in several ways, one of which is its representation in the GXL form.


Journal/Proceedings/Conference: http://csdl.computer.org/comp/proceedings/csmr/2002/1438/00/1438toc.htm


Download full paper: http://csdl.computer.org/dl/proceedings/csmr/2002/1438/00/14380059.pdf


Columbus Schema for C/C++ Preprocessing

Authors: László Vidács, Árpád Beszédes and Rudolf Ferenc

In Proceedings of the 8th European Conference on Software Maintenance and Reengineering (CSMR 2004), Tampere, Finland, pages 75-84, March 24-26, 2004. Published by IEEE Computer Society.

Abstract:
File inclusion, conditional compilation and macro processing have made the C/C++ preprocessor a powerful tool for programmers. However, program code with lots of directives often causes difficulties in program understanding and maintenance. The main source of the problem is the difference between the code that the programmer sees and the preprocessed code that the compiler gets. To aid program comprehension we designed a C/C++ preprocessor schema (supplementing the Columbus Schema for C++) and implemented a preprocessor which produces both preprocessed files and schema instances. The instances of the schema may be used to model: (1) preprocessor constructs in the original source code, (2) the preprocessed compilation unit, and (3) the transformations made by the preprocessor.
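To make the gap between the code the programmer sees and the code the compiler gets more concrete, here is a toy sketch of a preprocessor that returns both the preprocessed text and a log of the transformations it applied. It is not the Columbus preprocessor and does not follow its schema. The function `preprocess` and the record layout `(line number, macro name, replacement)` are assumptions made for this example, and only object-like `#define`, `#ifdef` and `#endif` are handled.

```python
import re


def preprocess(source):
    """Toy preprocessor: returns (preprocessed_text, expansions).

    `expansions` records each macro replacement as
    (original line number, macro name, replacement text).
    """
    macros = {}
    output = []
    expansions = []
    active = [True]  # stack of conditional-compilation states

    for number, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#define"):
            if all(active):
                parts = stripped.split(None, 2)
                macros[parts[1]] = parts[2] if len(parts) > 2 else ""
        elif stripped.startswith("#ifdef"):
            active.append(stripped.split()[1] in macros)
        elif stripped.startswith("#endif"):
            active.pop()
        elif all(active):
            def expand(match, number=number):
                name = match.group(0)
                if name in macros:
                    expansions.append((number, name, macros[name]))
                    return macros[name]
                return name

            # Replace every identifier that names a defined macro.
            output.append(re.sub(r"[A-Za-z_]\w*", expand, line))

    return "\n".join(output), expansions


original = "\n".join([
    "#define SIZE 16",
    "#ifdef DEBUG",
    'log("debug");',
    "#endif",
    "int buf[SIZE];",
])

preprocessed, expansions = preprocess(original)
```

Here `original` stands for the programmer's view, `preprocessed` for the compilation unit the compiler would receive, and `expansions` for a record of the transformations in between, loosely mirroring the three kinds of information listed in the abstract.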


Journal/Proceedings/Conference: http://csdl.computer.org/comp/proceedings/csmr/2004/2107/00/2107toc.htm


Download full paper: http://csdl.computer.org/dl/proceedings/csmr/2004/2107/00/21070075.pdf


Clone Smells in Software Evolution

Authors: Tibor Bakota, Rudolf Ferenc, Tibor Gyimóthy

In Proceedings of the IEEE International Conference on Software Maintenance (ICSM 2007), pages 24-33, October 2-5, 2007. DOI: 10.1109/ICSM.2007.4362615.

Abstract:
Although source code cloning (copy&paste programming) represents a significant threat to the maintainability of a software system, problems usually start to arise only when the system evolves. Most of the related research papers tackle the question of finding code clones in one particular version of the software only, leaving the dynamic behavior of the clones out of consideration. Eliminating these clones in large software systems often seems absolutely hopeless, as there might exist several thousands of them. Alternatively, tracking the evolution of individual clones can be used to identify those occurrences that could really cause problems in the future versions. In this paper we present an approach for mapping clones from one particular version of the software to another one, based on a similarity measure. This mapping is used to define conditions under which clones become suspicious (or "smelly") compared to their other occurrences. Accordingly, these conditions introduce the notion of dynamic clone smells. The usefulness of these smells is validated on the Mozilla Firefox internet browser, where the approach was able to find specific bugs that resulted from neglecting earlier copy&paste activities.
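The idea of mapping clones between two versions by a similarity measure can be sketched as follows. The abstract does not specify the measure, so this example substitutes Jaccard similarity over token sets as a stand-in. The names `jaccard` and `map_clones`, the threshold of 0.5, and the sample token lists are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two token sequences, treated as sets."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def map_clones(old, new, threshold=0.5):
    """Map each clone instance of the old version to its most similar
    counterpart in the new version, or to None if nothing reaches
    the similarity threshold."""
    mapping = {}
    for old_id, old_tokens in old.items():
        best_id, best_score = None, 0.0
        for new_id, new_tokens in new.items():
            score = jaccard(old_tokens, new_tokens)
            if score > best_score:
                best_id, best_score = new_id, score
        mapping[old_id] = best_id if best_score >= threshold else None
    return mapping


# Hypothetical clone instances, given as token lists per version.
old_version = {
    "a1": ["if", "x", "==", "null", "return"],
    "a2": ["for", "i", "in", "items", "print"],
}
new_version = {
    "b1": ["if", "x", "==", "null", "return", "0"],
    "b2": ["while", "queue", "pop"],
}

mapping = map_clones(old_version, new_version)
unmapped = [old_id for old_id, new_id in mapping.items() if new_id is None]
```

Instances that have no counterpart in the next version, or whose counterpart changed differently from its siblings, are the kind of candidates that conditions over such a mapping could flag for inspection.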


Journal/Proceedings/Conference: http://ieeexplore.ieee.org/xpl/RecentCon.jsp?punumber=4362596


Download full paper: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&isnumber=&arnumber=4362615


Mining design patterns from C++ source code

Authors: Zsolt Balanyi, Rudolf Ferenc

In Proceedings of the International Conference on Software Maintenance (ICSM 2003), pages 305-314, September 22-26, 2003. DOI: 10.1109/ICSM.2003.1235436.

Abstract:
Design patterns are micro architectures that have proved to be reliable, easy to implement and robust. There is a need in science and industry for recognizing these patterns. We present a new method for discovering design patterns in the source code. This method provides a precise specification of how the patterns work by describing basic structural information like inheritance, composition, aggregation and association, and as an indispensable part, by defining call delegation, object creation and operation overriding. We introduce a new XML-based language, the Design Pattern Markup Language (DPML), which provides an easy way for the users to modify pattern descriptions to suit their needs, or even to define their own patterns or just classes in certain relations they wish to find. We tested our method on four open-source systems, and found it effective in discovering design pattern instances.


Journal/Proceedings/Conference: http://ieeexplore.ieee.org/xpl/RecentCon.jsp?punumber=8742


Download full paper: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&isnumber=&arnumber=1235436


Evaluating C++ Design Pattern Miner Tools

Authors: Lajos Jenő Fülöp, Tamás Gyovai, Rudolf Ferenc

In Proceedings of the Sixth IEEE International Workshop on Source Code Analysis and Manipulation (SCAM 2006), pages 127-138, 2006. ISBN: 0-7695-2353-6.

Abstract:
Many articles and tools have been proposed over the years for mining design patterns from source code. These tools differ in several aspects, thus their fair comparison is hard. Besides the basic methodology, the main differences are that the tools operate on different representations of the subject system and that the pattern definitions differ as well. In this paper we first provide a common measurement platform for three well-known pattern mining systems, Columbus, Maisa and CrocoPat. Then we compare these tools on four C++ open-source systems: DC++, WinMerge, Jikes and Mozilla. Columbus can discover patterns from the C++ source code itself, while Maisa and CrocoPat require the representation of a software system in a special textual format, so we extended Columbus to provide the common input for the two other tools. We compared these tools in terms of speed, memory consumption and the differences between the hits. While the first two aspects showed comparable results, the recognition capabilities were quite diverse. This is probably due to the significant difference in how the patterns to be recognized are formalized by the tools. Therefore we conclude that a more precise and formal description of design patterns would be desirable.


Journal/Proceedings/Conference: http://www.dcs.kcl.ac.uk/staff/mark/scam2006/


Download full paper: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&isnumber=&arnumber=4026862


Calculating Metrics from Large C++ Programs

Authors: István Siket, Rudolf Ferenc

In Proceedings of the 6th International Conference on Applied Informatics, Eger, Hungary, January 27–31, 2004.

Abstract:
In this work we present a new method called compiler wrapping for extracting information from the source code of large software systems written in the C++ language. This new method can be used without having to modify the analyzed source code in any way. With the extracted information we can calculate different object oriented metrics and characterize the analyzed system. For source code analysis and metrics calculation we employed the Columbus reverse engineering framework. To demonstrate the operability of our new approach we tested it on the open source internet suite Mozilla and found it very effective in obtaining the desired metrics.


Journal/Proceedings/Conference:


Download full paper: http://www.inf.u-szeged.hu/~ferenc/research/siketi_calculating.pdf


Extracting Facts with Columbus from C++ Code

Authors: Rudolf Ferenc, Árpád Beszédes, Tibor Gyimóthy

In Tool Demonstrations of the 8th European Conference on Software Maintenance and Reengineering (CSMR 2004)

Abstract:
Fact extraction from software systems is the fundamental building block in the process of understanding the relationships among the system’s elements. It is evident that in real life situations manual fact extraction must be supported by software tools which are able to analyze the subject system and provide useful information about it in various forms. These forms are most useful if they adhere to prescribed schemas and this way promote tool interoperability. In this work we outline our solution to tool supported fact extraction, which is built upon the reverse engineering framework Columbus and is supported by schemas for the C++ language. We describe the extraction process in detail and show how the extracted facts can be used in practice by processing the schema instances. We also introduce new features of the Columbus system not published previously, which among others include compiler wrapping and source code auditing.


Journal/Proceedings/Conference: http://www.cs.tut.fi/~csmr2004/


Download full paper: http://www.inf.u-szeged.hu/~beszedes/research/ferencr_extracting.pdf