
Using the Enhancing Software Development Process Repository for Better User Recommendations

by Ziaur Rahman (Author) Md. Kamrul Hasan (Author)

Project Report 2015 6 Pages

Computer Science - Programming

Excerpt

Better User Recommendations using Enhancing Software Development Process Repository

Ziaur Rahman, Md. Kamrul Hasan
Department of Computer Science and Engineering, Systems and Software Lab (SSL), Islamic University of Technology (IUT), Bangladesh

Abstract

Reusing previously completed software repositories to enhance the development process is common practice. Developers benefit considerably when suggestions drawn from existing projects match what they need while coding. The strategies available in this field are changing rapidly. A number of efforts have focused on the mining process and on constructing the repository. Some emphasize web-based code searching, while others integrate web-based code searching into a customized tool. Web-based approaches, however, are inefficient, especially in building the repository on which the mining techniques are applied. To search for code snippets in response to a user query, we need an enriched repository with better representation and abstraction. To provide such a repository before the mining process, we have developed a concept called Enhancing Software Development Process (ESDP). In the ESDP approach, multiple sources of code from both online and offline storage are used to construct a central repository with an XML representation, and the mining techniques are applied on the client side. Our evaluation shows that the ESDP approach achieves better response time and performance than many existing approaches.

Keywords—Repository; Mining; ESDP; Searching;

I. INTRODUCTION

Data mining has great advantages and potential in software development. A software developer can be guided by knowledge extracted from previously completed software projects and artifacts. Source files, documentation and associated files can be a rich source for building a repository. To extract knowledge, snippets or any other recommendation, a number of similar projects should be mined following a proper strategy. Before the mining algorithm is applied, the repository building process should be given the highest priority. The recommendations will be more accurate and relevant if the repository on which the data mining technique is applied is up to date and enriched. In some approaches, repository building is avoided [1] or an external web repository [2,3] is used before the mining process. If a recommendation system or tool depends only on a web-based code search system, a number of issues need to be resolved. The storage behind the web search tool is not well documented or structured. In most cases the abstraction of the sources is not well represented. The search algorithms used by code search systems are almost the same as those of general search engines. A general search engine often returns very large files that are inconvenient for developers looking for code, and it is difficult for the developer to decide which result is most relevant. General search techniques such as PageRank, spiders and crawlers are not sufficient to meet the developer's needs. The search results are normally indexed or ranked by the title, headings and meta-information of the source files, irrespective of what the files actually contain. Web-based searching also depends on a server. Even in a distributed system it depends on multiple servers, which often causes failures and consumes the developers' valuable time.

The existing approaches [3,1] use a code search engine to retrieve the desired item for a search query. In some cases searching happens before instant mining, which we call the post-mining strategy. This makes the search space excessively large, which is a clear hindrance to finding an exact match. Conversely, if the search domain is restricted to the repository of the code search engine, an exact match is also unlikely to be found. Searching online repeatedly has drawbacks of its own. It sometimes takes longer because of the request and response latency of the server, which wastes the programmers' valuable time. Security, authentication and connection issues are also important, because they are exposed to threats over the Internet. How willing a developer is to connect to an unknown server while working under a version control system also matters.

Considering these issues, we propose the concept of a system called Enhancing Software Development Process (ESDP), whose enriched repository and better code abstraction can help developers on the client side. The paper shows that development performance is strongly influenced by using the ESDP repository.

II. BACKGROUND

Data mining techniques have great advantages and potential in software development. Software developers often need to search existing project repositories. Using a code search tool is one of the existing approaches that can guide a developer by providing related code snippets and patterns. A number of efforts have influenced the software development process. PR-Miner [4], the taxonomy approach to mining repositories [5], Perracotta [6], MAPO [2,3], XSnippet [7], Mining API Pattern [8], PARSEWeb [9], MAC [1] and the Scenario Based API Recommendation System [10] are some of the popular efforts in this area. Mining API Usages from Open Source Repositories (MAPO) [2,3] is one of the earliest and MAC [1] is one of the more recent efforts to mine API usage patterns. Below we describe the existing approaches that are widely used in mining software repositories (MSR).

PR-Miner [4] extracts programming rules in general form and proposes algorithms to detect rule violations. The taxonomy approach [5] deals with software artifacts and temporal information; it derives an expressive taxonomy from an analysis of the literature and presents the work along four dimensions. Perracotta [6] scales dynamic inference techniques to large programs and copes with imperfect traces by using an approximate inference algorithm.

MAPO [2,3] identifies call patterns from the API usages of an existing project. It works on a query that describes a method, class or package for a particular API. MAC [1] mines API code snippets for code reuse. It first forms a transaction database and then derives a pattern database from it. Given an initial statement, MAC predicts useful related API code snippets. It thus guides developers through related API usage patterns by evaluating the support, confidence and rank list of the frequent items.
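The support and confidence measures mentioned above can be illustrated with a small sketch. It assumes a toy transaction database in which every transaction is the set of API calls found in one method body; the data, the function names and the ranking rule are illustrative assumptions and do not reproduce the MAC implementation.

```python
# Hypothetical transaction database: each transaction is the set of API
# calls that occur together in one method body (illustrative data only).
transactions = [
    {"File.new", "FileReader.new", "BufferedReader.new", "BufferedReader.readLine"},
    {"File.new", "FileReader.new", "BufferedReader.new"},
    {"File.new", "File.exists"},
    {"FileReader.new", "BufferedReader.new", "BufferedReader.readLine"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent) divided by support(antecedent)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / base


def recommend(statement, transactions):
    """Rank the other API calls by confidence given the initial statement."""
    candidates = set().union(*transactions) - {statement}
    scored = [(c, confidence({statement}, {c}, transactions)) for c in candidates]
    scored = [pair for pair in scored if pair[1] > 0]
    # Highest confidence first; ties are broken alphabetically.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


suggestions = recommend("FileReader.new", transactions)
```

With this toy data the call that always accompanies `FileReader.new` is ranked first, which mirrors the idea of predicting related snippets from an initial statement.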

Strathcona [11] returns a number of relevant snippets by matching the structure of the code under development against the snippets in the repository. CodeBroker [12] is a tool similar to Strathcona; it automatically searches the repository using the comments written by the developer. CodeFinder [13] uses a query browser to help the developer construct queries that can be sent to the repository.

XSnippet [7] was developed by Tansalarak and Claypool. It extends Prospector with additional queries, ranking heuristics and mining algorithms to query a code snippet repository for snippets relevant to the task at hand. PARSEWeb [9], developed by Thummalapenta and Xie, uses Google code search to collect relevant code snippets and mines the returned snippets to find the solution.

Saul proposed an approach [14] that finds API methods closely related to a query API method of interest by discovering API methods that share a caller or a callee with the query method.
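The notion of methods that share a caller or a callee with a query method can be sketched as a simple lookup on a call graph. The call graph and the function name below are hypothetical, and the sketch only collects the neighbouring methods; it does not attempt to reproduce the ranking used in the cited approach.

```python
from collections import defaultdict

# Hypothetical call graph: caller -> set of callees (illustrative data only).
calls = {
    "main": {"open", "read", "close"},
    "copy": {"open", "write", "close"},
    "read": {"fill_buffer"},
    "write": {"flush_buffer"},
}


def related_methods(query, calls):
    """Return the methods that share a caller or a callee with the query."""
    # Invert the graph so the callers of each method can be looked up.
    callers_of = defaultdict(set)
    for caller, callees in calls.items():
        for callee in callees:
            callers_of[callee].add(caller)

    related = set()
    # Methods called by the same caller as the query method.
    for caller in callers_of.get(query, ()):
        related |= calls[caller]
    # Methods that call one of the query method's callees.
    for callee in calls.get(query, ()):
        related |= callers_of[callee]
    related.discard(query)
    return related


neighbours = related_methods("open", calls)
```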

Another attempt is GrouMiner [15,16], a novel graph-based approach for mining the usage patterns of one or multiple objects. The GrouMiner approach includes a graph-based representation of multiple object usages, a pattern mining algorithm and an anomaly detection technique that are efficient, accurate and resilient to software changes. Hipikat [17] is a tool that automatically builds queries to send to the repository. It creates links between different sources of information in a project, including source files, CVS commits, bug reports, newsgroup postings and web articles.

Most of the related works either build a customized repository from a single source of projects or use a separate code search system outside their framework. The ESDP approach has its own repository on the client side, built from different sources of projects and files. The difference lies in the use of a particular source extraction technique before the mining technique is applied. We keep the repository on the client side to avoid request-response latency and thereby accelerate the development process.

Figure not included in this excerpt

Fig. 1. Enhancing Software Development Process (ESDP) Framework. Authors' own figure.

III. PROPOSED IDEA

ESDP has an enriched and regularly updated repository building strategy. We apply the ESDP repository to recommend related snippets in response to user queries. We initially implemented the system as an Integrated Development Environment (IDE) plug-in, and we are now developing our own working environment. The system has three heuristics, which we call Building Repository, Source Abstraction for Data Mining, and Searching and Recommendation. All three steps belong to the ESDP framework, which is shown in Figure 1. We designed the ESDP system without server dependency. Even for storing the mined sources we use an XML-based system [18] on the client. It can therefore handle multiple search queries simultaneously without concurrency problems, and the ESDP repository is not affected by server failures. The following subsections explain the three heuristics of the framework in turn.

A. Building Repository

Repository building is important in mining software repositories. If the repository is up to date and enriched, the mined system will be able to recommend relevant suggestions and patterns. A number of issues should be considered before building a repository. The two most fundamental ones are listed below.

1) What are the sources of code that are used to build the repository?
2) How often is the repository updated?

If the sources are limited, the mined repository will be comparatively lightweight. Before proposing the ESDP system we investigated different data-mining-based API recommendation systems such as MAPO [2,3], GrouMiner [19,20] and MAC [1]. MAPO takes its sources from open source repositories. It does not use any other sources, such as regular updates, external APIs or standard libraries. GrouMiner [19,20] also builds its repository from open source projects available on the Internet. MAC, in contrast, mines sources on the fly according to the query given by the user, following the post-mining approach. Before mining, it builds the repository dynamically using Koders.

Some of these works do not consider APIs from the standard library of a particular platform such as Java or C, APIs found through regular searches, or external APIs in the system. If open source projects and code search engines such as Koders.com or Google code search [21] are the only sources for building the repository, the mined repository will become obsolete after a certain period of time. We use five different sources that provide APIs and class sources to form the ESDP central repository, as shown in Figure 1.

Considering the usability of previously completed repositories, we have also taken the sources of some successfully completed projects from a software company [22]. Because instantaneous searching has drawbacks, we collect the Trending Search Terms of a code search engine and store them in the ESDP repository. If the repository is not updated after a certain period, however, the mined repository becomes too outdated to provide an exact match. We therefore initially update our central repository at three-month intervals. Lastly, the ESDP API developer will write and add newer APIs to keep up with the substantially changed API patterns that programmers encounter.

B. Source Abstraction for Data Mining

Two things happen in the data mining heuristic. In the first step, the central repository is preprocessed into an XML repository following a special XML conversion strategy. In the second step, a data mining algorithm is applied to the XML repository to build the mined API XML repository. In the central repository the API and class files are stored as .java or .jar files. We first extract these into readable code files and then translate the code into an abstract form. The common item form is shown in Tables I and II.

In the second real example shown in Tables I and II, an invocation of the method methodA, with a parameter of type java.lang.String and return type void, appears in the method methodC() inside the class ClassB, which is in the package com.

The ESDP tool considers 17 types of items for the purpose of research evaluation. For brevity, only a few items are shown in Figure 2.

We then wrap the abstracted code in XML meta tags. An example is given later in Figure 4. The transaction there represents the field declaration javax.swing.JButton of the class classB in the package pkga, together with its position.
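As a rough illustration of wrapping an abstracted item in XML, the sketch below builds one transaction that holds the field declaration described above. The tag names, attribute names and the position value are assumptions made for this example; the actual ESDP schema is the one shown in Figure 4, which is not reproduced here.

```python
import xml.etree.ElementTree as ET


def item_to_xml(kind, package, cls, signature, position):
    """Wrap one abstracted source item in a hypothetical <item> element."""
    item = ET.Element("item", {"kind": kind})
    ET.SubElement(item, "package").text = package
    ET.SubElement(item, "class").text = cls
    ET.SubElement(item, "signature").text = signature
    ET.SubElement(item, "position").text = str(position)
    return item


# One transaction with the field declaration from the text; the position
# value 12 is made up for illustration.
transaction = ET.Element("transaction", {"id": "1"})
transaction.append(
    item_to_xml("field-declaration", "pkga", "classB", "javax.swing.JButton", 12)
)

xml_text = ET.tostring(transaction, encoding="unicode")
```

Keeping each item as a small element with its package, class and position makes it possible to query the stored abstraction later without reparsing the original .java or .jar files.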

TABLE I. SIMULATION PARAMETERS IN CLOSE LOOK.

Table not included in this excerpt

TABLE II. DETAIL OF SOURCE ABSTRACTION.

Table not included in this excerpt

A transaction is the set of entities used together in a block, such as a class block or a method block. ESDP recommends sequential API code snippets, and each recommendation includes several statements. A sequence containing k statements is called a k-sequence. We use the product of k and the support value of the sequence to rank the API patterns in the mined XML repository. We use a recently proposed sequential pattern mining algorithm called PrefixSpan (Prefix-Projected Sequential Pattern Mining) [23]. The algorithm follows the pattern growth method, which does not require candidate generation. It mines the complete set of patterns while greatly reducing the effort of candidate generation; it also reduces the size of the projected repository, which leads to efficient processing.
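The pattern growth idea and the ranking by k times support can be sketched as follows. This is a minimal PrefixSpan variant that assumes every sequence element is a single statement (no itemsets) and uses plain list copies for the projected databases; the sequences, the statement labels and the minimum support are illustrative assumptions, not ESDP data.

```python
def prefixspan(sequences, min_support):
    """Return {pattern: support count} for all frequent sequential patterns."""
    results = {}

    def mine(prefix, projected):
        # Count each item once per projected sequence.
        counts = {}
        for seq in projected:
            for item in set(seq):
                counts[item] = counts.get(item, 0) + 1
        for item, count in sorted(counts.items()):
            if count < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = count
            # Project: keep only what follows the first occurrence of item.
            next_projected = [
                seq[seq.index(item) + 1:] for seq in projected if item in seq
            ]
            mine(pattern, next_projected)

    mine((), [list(seq) for seq in sequences])
    return results


def rank_patterns(patterns):
    """Order patterns by k * support, where k is the pattern length."""
    return sorted(patterns.items(), key=lambda kv: (-len(kv[0]) * kv[1], kv[0]))


# Hypothetical statement sequences taken from three method blocks.
sequences = [
    ["File", "FileReader", "BufferedReader", "readLine"],
    ["File", "FileReader", "readLine"],
    ["File", "exists"],
]

patterns = prefixspan(sequences, min_support=2)
ranked = rank_patterns(patterns)
```

In this toy run the 3-sequence File, FileReader, readLine has support 2 and score 6, so it is ranked above the single statement File, whose support is 3 but whose score is only 3. Longer patterns are grown only from prefixes that are already frequent, which is why no separate candidate generation step is needed.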

C. Searching and Recommendation

The Searching and Recommendation heuristic involves several steps to obtain a code skeleton. Once the mined XML repository has been built, searching with a particular user query statement is straightforward. The search returns a set of code fragments within a method block. The code snippets are stored in the Mined API XML Repository according to their frequency, ranking, support and confidence, together with the necessary method and field snippets, as shown in Figure 4. To look for a suggestion, developers type a user query, which is matched against the Mined API XML Repository by querying the sequential pattern rules with the statement. Figure 3 shows an example of a user query issued while the user is writing a class called SearchTest.

While typing the line that serves as the user query, the developer wants suggestions from the ESDP repository. The statement marked as the user query is sent to the Mined API XML Repository to find a match, because the statements and their associated methods and attributes are already stored there with their support, confidence, rank and sequence number.
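The lookup described above can be sketched as a query against a small XML document held on the client. The element names, attributes and stored statements are hypothetical stand-ins for the Mined API XML Repository, and matching is done by simple substring search, which is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical mined repository: each pattern stores its statements with a
# sequence number, plus support, confidence and rank (illustrative values).
MINED_XML = """
<patterns>
  <pattern support="2" confidence="0.67" rank="2">
    <statement seq="1">FileReader fr = new FileReader(file)</statement>
    <statement seq="2">fr.close()</statement>
  </pattern>
  <pattern support="3" confidence="0.90" rank="1">
    <statement seq="2">BufferedReader br = new BufferedReader(fr)</statement>
    <statement seq="1">FileReader fr = new FileReader(file)</statement>
    <statement seq="3">String line = br.readLine()</statement>
  </pattern>
  <pattern support="2" confidence="0.50" rank="3">
    <statement seq="1">JButton button = new JButton(label)</statement>
  </pattern>
</patterns>
"""


def suggest(query, xml_text):
    """Return the statement sequences containing the query, best rank first."""
    root = ET.fromstring(xml_text.strip())
    matches = []
    for pattern in root.findall("pattern"):
        ordered = sorted(pattern.findall("statement"), key=lambda s: int(s.get("seq")))
        statements = [s.text.strip() for s in ordered]
        if any(query in statement for statement in statements):
            matches.append((int(pattern.get("rank")), statements))
    matches.sort(key=lambda match: match[0])
    return [statements for _, statements in matches]


suggestions = suggest("new FileReader", MINED_XML)
```

Because the mined patterns are parsed from local XML, the lookup involves no server round trip, which is the property the client-side design relies on.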

We use it as the input to query the relevant statement sequences. The example shows that several statement sequences are ranked by their scores, as shown in Figure 5. If the developer chooses the first match from the given recommendations, the associated code snippets that form the background of that match are retrieved from the XML repository.

Figure not included in this excerpt

Fig. 2. Enhancing Software Development Process (ESDP) Framework. Authors' own figure.

Figure not included in this excerpt

Fig. 3. A user query is marked when the developer is writing code. Authors' own figure.

Figure not included in this excerpt

Fig. 4. Match is stored in Mined API XML Repository. Authors' own figure.

Figure not included in this excerpt

Fig. 5. Suggestion after Sequential Pattern Mining in response to the user query. Authors' own figure.

TABLE III. SOURCE ABSTRACTION.

Table not included in this excerpt

The recommended statements are then updated and refactored, as shown in the preview window of the ESDP plug-in inside the extended IntelliJ IDEA IDE.

Figure not included in this excerpt

IV. EXPERIMENTAL EVALUATION

To evaluate the performance and effectiveness of the ESDP repository system, we applied it to several open source projects, as shown in Table III. The experiments were carried out on a computer with the Windows 7 operating system, an Intel Core i5 processor, 4 GB of RAM and a 3G Internet connection from a local operator.

The table shows the number of files, methods and usage patterns, and the prominent API used in each project. These projects are used to build our Mined API XML Repository. Searching is then tested with user queries on ESDP, MAC and MAPO to check the response time. Table IV and Figure 6 compare the time the different tools require to respond. ESDP takes considerably less time than the others.

[...]

Details

Pages: 6
Year: 2015
ISBN (eBook): 9783668098664
ISBN (Book): 9783668098671
File size: 689 KB
Language: English
Catalog Number: v311128
Institution / College: International Islamic University – Islamic University of Technology
Grade: Excellent
Tags: esdp user recommendation respository enhancing software development process
