Title: Towards an efficient mechanism of storage, search and retrieval software components
- Taciana Amorim Vanderlei
- tav_at_cin.ufpe.br
Content
- Motivation
- Component Search
- Data Mining
- Knowledge Discovery and Data Mining in Databases
- Data Mining Reuse Patterns
- Context
- Context in Information Retrieval
- Proposal
- Conclusion
- References
Motivation
What is the software engineering field looking for?
- Quality
- Productivity (time)
- Cost reduction
Software reuse!
Motivation
- McIlroy, 1968
- Use instead of build
- Keep a set of reusable components
- Repository
- Advantages
- Productivity
- Quality
- Maintainability
- Flexibility
Motivation
Browsing
Searching
Component Search
Component Search in the 1990s
- Retrieval methods
- Enumerated classification
- Inflexible
- Large hierarchies are hard to understand
- Faceted classification [Prieto-Díaz, 1991]
- Flexible
- Precise
- Better suited to large, continuously expanding collections
- Hard for users to find the right combination of terms
- Free-text indexing (automatic indexing)
- Simple to build and to retrieve from
- Needs large bodies of text to become statistically accurate
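A minimal sketch of faceted retrieval, assuming each component is described by one controlled-vocabulary term per facet. The facet names (function, object, medium), the terms and the component names are hypothetical and chosen only for illustration; they are not taken from [Prieto-Díaz, 1991].

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    facets: dict  # facet name -> controlled-vocabulary term (hypothetical)


def faceted_search(repo, **query):
    """Return the names of components that match every queried facet term."""
    return [
        c.name
        for c in repo
        if all(c.facets.get(facet) == term for facet, term in query.items())
    ]


# Hypothetical repository contents.
repo = [
    Component("QuickSort", {"function": "sort", "object": "array", "medium": "memory"}),
    Component("MergeFiles", {"function": "sort", "object": "file", "medium": "disk"}),
    Component("FindKey", {"function": "search", "object": "array", "medium": "memory"}),
]

print(faceted_search(repo, function="sort", object="array"))
```

The sketch also hints at the weakness listed above: the user must already know which facet terms to combine, and a query with an unknown term simply returns nothing.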
Component Search in the 1990s
- Controlled vocabulary
- Faceted classification
- Indexing
- Uncontrolled vocabulary
- Free-text indexing
Component Search
- William Frakes and Thomas Pole, 1994
- Empirical study
- Hierarchical, faceted, attribute and keyword methods
- Similar results
- Precision
- Recall
Maybe more important than classification!
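For reference, precision and recall can be computed as below. This is a generic sketch of the standard definitions (precision is the fraction of retrieved items that are relevant, recall is the fraction of relevant items that were retrieved); the component names are made up and the code is not taken from the Frakes and Pole study.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query result: 4 components retrieved, 3 actually relevant.
p, r = precision_recall(
    retrieved=["stack", "queue", "list", "tree"],
    relevant=["stack", "queue", "heap"],
)
print(f"precision={p:.2f} recall={r:.2f}")
```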
Component Search
- Graph
- Draw a circle
- (x, y, radius, inner radius)
Component Search
- Problems with software reuse repositories
- Costly classification
- Domain analysis effort
- S. Henninger, 1997
- Uses minimal repository structure
- Improves the repository while people use it (evolving repository)
- Best results
Component Search
- Questioning
- Seacord, 1999
- Repositories are failing
- Centralized repositories
- Limited accessibility and scalability of the repository
- Exclusive control over cataloged components
- Oppressive bureaucracy
- Poor economy of scale
- Few users
- Low per-user benefits
- High cost of repository mechanisms and operations
Component Search
- Ideally
- Distributed repositories
- Open network
- R. Seacord, S. Hissam, K. Wallnau, 1998
- Agora
- A search engine for software components
- Combines introspection with Web search engines to reduce costs
Large-scale reuse!
Component Search
- P. Hall, 1999
- Large repositories are not necessary
- Bass et al., 2000
- CMU/SEI report
- Inhibitors that have kept component-based software engineering from becoming widespread
- Lack of available components
- Difficulty in finding them
Component Search
- J. Grundy, 2000
- Concept of component aspects
- Formalism
- [Mili et al., 1997], [Zhuge, 2000]
- Keywords, text
- Ambiguity
- Low precision
- Formal definition
- Functionality
- Semantics
Component Search
- Y. Ye and G. Fischer, 2002
- Challenges faced by software developers
- How to motivate them to reuse?
- How to reduce the difficulty of locating components in a large reuse repository?
- CodeBroker
- Information delivery that autonomously locates task-relevant and personalized components and presents them to software developers
Active repository!
Component Search
- Y. Ye and G. Fischer, 2002
Component Search
- Clark, Clarke, De Panfilis, Granatella, Predonzani, Sillitti, Succi and Vernazza, 2004
- CLARiFi
- Find a set of components that covers the functional, non-functional, technological, environmental, and compatibility requirements of the desired system
- Best compromise (better queries)
- D. Lucrédio, E. Almeida, and A. Prado, 2004
- Mechanism to search components efficiently
- Support for future component markets
Data Mining
Knowledge Discovery and Data Mining in Databases
- V. Devedzic, 2001
- Knowledge Discovery in Databases (KDD) is the process of automatic discovery of previously unknown patterns, rules, and other regular contents implicitly present in large volumes of data.
- Data Mining (DM) is the process of pattern discovery in a data set from which noise has been previously eliminated and which has been transformed in such a way as to enable the pattern discovery process.
Knowledge Discovery and Data Mining in Databases
- KDD and DM
- Automate data analysis tasks.
- Help organizations improve their business in terms of savings, efficiency, quality, and simplicity.
- Are not general-purpose software systems: they are developed for specific users to help them automate data analysis in precisely defined, specific application domains.
- Data mining algorithms
- Procedures for pattern extraction from a set of cleaned, preprocessed, transformed data.
- There is no such thing as a universally good DM algorithm.
Data Mining Reuse Patterns
- A. Michail, 1999
- Shows how data mining can be used to discover library reuse patterns in user-selected applications.
- Association rules
- Competitive advantage in a business.
Application
Library
Data Mining Reuse Patterns
- A. Michail, 2000
- Taxonomies
- Is-a hierarchies in which a node's descendants represent specializations of that node
- Allow mining for rules at different levels of abstraction.
- Generalized association rules
- Incorporate taxonomies
- Reuse relationships considered
- Class inheritance (class_inheritspclass)
- Class instantiation (class_instantiatespclass)
- Function invocation (class_callspclassfunc())
- Function overriding (class_overridespclassfunc())
- Implicit invocation (class_receives_signalpclasssignal()).
Data Mining Reuse Patterns
- A. Michail, 2000
- Global pruning: applied to all rules discovered by data mining.
- Uninteresting rules
- class_callslibAf() => class_instantiateslibA
- Statistically insignificant rules
- X => Y is not very interesting if X and Y just happen to co-occur in transactions by chance.
- Misleading rules
- x, y => z with confidence 60%
- y => z with confidence 80%
- The first rule is misleading, since the presence of x actually decreases the likelihood of finding item z.
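A minimal sketch of how rule confidence is computed and how a misleading rule can be detected, assuming each transaction is the set of library items one application class reuses. The ten transactions are fabricated so that the two rules land on the 60% and 80% figures from the slide; the function names are illustrative and this is not CodeWeb's implementation.

```python
def count(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t)


def confidence(transactions, antecedent, consequent):
    """Confidence of antecedent => consequent: P(consequent | antecedent)."""
    covered = count(transactions, antecedent)
    if covered == 0:
        return 0.0
    return count(transactions, set(antecedent) | set(consequent)) / covered


def is_misleading(transactions, specific, general, consequent):
    """True when the more specific rule has lower confidence than the general one."""
    return confidence(transactions, specific, consequent) < confidence(
        transactions, general, consequent
    )


# Hypothetical data: 10 transactions contain y, 8 of them also contain z;
# 5 contain both x and y, and only 3 of those contain z.
transactions = [{"x", "y", "z"}] * 3 + [{"x", "y"}] * 2 + [{"y", "z"}] * 5

print(confidence(transactions, {"x", "y"}, {"z"}))
print(confidence(transactions, {"y"}, {"z"}))
print(is_misleading(transactions, {"x", "y"}, {"y"}, {"z"}))
```

Comparing a rule against its more general form is the design choice that matters here: a 60% rule looks useful in isolation and only turns out to be misleading once the 80% baseline is known.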
Data Mining Reuse Patterns
- A. Michail, 2000
- Local pruning
- Rules that demonstrate reuse of a particular library class.
- Rules that are violated in a particular application.
- A. Michail, 2001
- CodeWeb alleviates problems with learning characteristic library usage.
- By using many real-life applications instead of a few toy programs
- By using automated techniques
- By leveraging existing applications and using data mining technology.
Show the more specific rule, which tends to be more informative. However, one may also wish to show the more general rule if its confidence is much greater than expected.
Data Mining Reuse Patterns
Figure: KApplication reuse patterns [A. Michail, 2000]
Data Mining Reuse Patterns
- R. Amin, M. Ó Cinnéide, and T. Veale, 2004
- LASER project
- Applies lexically-driven analogy at the code level rather than at the UML level
- Demonstrates that both hierarchical reuse and parallel reuse can be enhanced through lexically-driven analogy
- So far only a pilot study using the Java language.
- F. McCarey, M. Ó Cinnéide, and N. Kushmerick, 2004
- Software recommendation system based on collaborative filtering.
- Identifies users similar to the active user.
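A minimal sketch of user-based collaborative filtering for component recommendation. The slide only states that similar users are identified; the choice of Jaccard similarity over sets of used components, the scoring scheme, and all user and component names are assumptions made for illustration, not the method of McCarey et al.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(usage, active, k=1):
    """Recommend components used by the k most similar users but not by `active`."""
    mine = usage[active]
    neighbours = sorted(
        (u for u in usage if u != active),
        key=lambda u: jaccard(mine, usage[u]),
        reverse=True,
    )[:k]
    scores = {}
    for u in neighbours:
        sim = jaccard(mine, usage[u])
        if sim == 0:
            continue  # ignore users with nothing in common
        for comp in usage[u] - mine:
            scores[comp] = scores.get(comp, 0.0) + sim
    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(scores, key=lambda c: (-scores[c], c))


# Hypothetical usage data: user -> components already used.
usage = {
    "alice": {"JButton", "JFrame", "JPanel"},
    "bob": {"JButton", "JFrame", "JLabel"},
    "carol": {"Socket", "ServerSocket"},
}

print(recommend(usage, "alice"))
```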
Data Mining Reuse Patterns
- Y. Yusof and O. Rana, 2004
- Digital library of Java source code
- Managed collection of information (in digital format) with associated services.
- Template mining for extracting information
- Matching mechanism
- Exact
- Generalization
- Reduction
- NameOnly.
Data Mining Reuse Patterns
- P. Garg, T. Gschwind, and K. Inoue, 2004
- Multi-project software knowledge
- Collect and organize large amounts of data from tens of thousands of software projects.
- Automation
- Cluster related components together
- Rank the components by their popularity among software systems
- Automatic substitution of components if and when a problem is discovered in one of the implementations.
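Ranking by popularity can be sketched as counting, for each component, how many software systems use it. This is a simplified reading of the bullet above, with made-up system and component names; it is not the ranking algorithm of Garg et al. or of Component Rank.

```python
from collections import Counter


def rank_by_popularity(systems):
    """Rank components by the number of distinct systems that use them."""
    counts = Counter(
        component
        for components in systems.values()
        for component in set(components)  # count each system at most once
    )
    # Most popular first; ties broken alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical data: system -> components it depends on.
systems = {
    "A": ["zlib", "libpng"],
    "B": ["zlib", "openssl"],
    "C": ["zlib", "libpng", "zlib"],
}

print(rank_by_popularity(systems))
```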
Context
- Definitions
- In HCI, a context feature is any information that can be used to characterize and interpret the situation in which a user interacts with an application at a certain time.
- In the context-aware applications area, Dey and Abowd (1998) define context as any information that characterizes a situation related to the interaction between humans, applications and the surrounding environment.
- In Artificial Intelligence, Brézillon (1999) defines context as what does not intervene explicitly in problem solving but constrains it.
Context
- Informational perspective
- Information space that can be adapted according to the user's context
- Complemented by tools/processes
- Construction and sharing of context
- Device perspective
- Concerns how devices can be made to gather, integrate and display information that assists human work
- Infrastructural view
- Computing device with information about its environment
Context
- Context may be
- Explicit
- Information retrieved directly
- Physical information
- Implicit
- Inferred from either the explicit context or from the interactions between the system and the user
- Context-based systems
- At the level of knowledge
- Focus on context in relationships with the user
- Context-aware systems
- At the level of data
- Focus on users through a model of their dynamic environment.
Context
- P. Brézillon, 2003
- Context related to human-machine interaction
- Need to focus on users
- Reducing communication barriers
- What can be known about a user?
- How to support that information with task, user, and system models?
- SART project [Brézillon et al., 2000]
- Design and development of a Context-based Intelligent Assistant System (CIAS)
- Formalism for the context-based representation of knowledge
- Contextual graphs
- Environment is not taken into account
Context in Information Retrieval
- Introduction
- A large number of recently proposed search enhancement tools have used the notion of context.
- Improved relevance of search results even for users not skilled in Web search.
- Natural language processing techniques are applied to the captured context.
- Using context to guide the search constitutes a considerable algorithmic challenge.
Context in Information Retrieval
- L. Finkelstein, E. Gabrilovich, Y. Matias, E. Rivlin, Z. Solan, G. Wolfman and E. Ruppin, 2001
- Conceptual paradigm for performing search in context
- Automates the search process
- Provides even non-professional users with highly relevant results.
- Performs context search from documents on users' computers.
- Semantic keyword extraction and clustering to automatically generate new, augmented queries
- Guides the user's search by the context surrounding the text
- Eliminates possible semantic ambiguity and vagueness
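The idea of augmenting a query with terms from the surrounding context can be sketched as follows. This is a deliberate simplification: plain word frequency with a small stopword list stands in for the semantic keyword extraction and clustering named above, and the function name, stopword list and example text are all hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "for", "on", "with", "by"}


def augment_query(marked_text, context, max_terms=2):
    """Append the most frequent context words to the user's marked text."""
    marked = set(re.findall(r"[a-z]+", marked_text.lower()))
    words = re.findall(r"[a-z]+", context.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and w not in marked)
    # Most frequent first; ties broken alphabetically for a stable result.
    extra = sorted(counts, key=lambda w: (-counts[w], w))[:max_terms]
    return " ".join([marked_text] + extra)


context = "The jaguar is a big cat. The cat hunts in the forest, and the cat climbs."
print(augment_query("jaguar", context))
```

With the surrounding text added, the ambiguous query term is steered towards the animal rather than, say, the car, which is the kind of disambiguation the bullets describe.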
Context in Information Retrieval
[Figure: IntelliZap system overview, information and processing flow. Elements shown: Capture Text and Context, Semantic Analysis, Intelligent Query Generation, Query Dispatcher, Final Results.]
Context in Information Retrieval
- Y. Ye and G. Fischer, 2001
- Locating software in a large component repository
- Context-aware browsing
- Active repository
- CodeBroker
- Runs in the background
- Infers software developers' needs for components
- Increases the opportunity for component reuse
- Discovers components not anticipated
- Uses free-text information retrieval techniques and signature matching to retrieve task-relevant components
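Signature matching can be sketched as comparing the types in a query signature with those of each repository entry. The version below assumes one simple relaxation (same return type, same parameter types in any order); the repository entries are illustrative, and this is neither CodeBroker's matcher nor the full family of matches described by Zaremski and Wing.

```python
from collections import Counter


def signature_matches(query, candidate):
    """Match when return types are equal and parameter types agree in any order."""
    q_params, q_ret = query
    c_params, c_ret = candidate
    return q_ret == c_ret and Counter(q_params) == Counter(c_params)


def find_by_signature(repo, query):
    """Return the sorted names of repository entries whose signature matches."""
    return sorted(name for name, sig in repo.items() if signature_matches(query, sig))


# Hypothetical repository: name -> (parameter types, return type).
repo = {
    "Math.max": (("int", "int"), "int"),
    "Math.min": (("int", "int"), "int"),
    "String.substring": (("String", "int", "int"), "String"),
}

print(find_by_signature(repo, (("int", "int"), "int")))
print(find_by_signature(repo, (("int", "String", "int"), "String")))
```

Signature matching alone cannot tell the first two results apart, which is one reason to combine it with free-text retrieval over names and comments.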
Context in Information Retrieval
Comparison of locating mechanisms [Y. Ye and G. Fischer, 2001]
Mechanism | Advantages | Disadvantages
Browsing | Low cognitive overhead. | Does not scale up.
Searching | Fast, direct. | Formulating the right query is difficult. No search for unanticipated components.
Context-aware browsing | Supports information delivery. | Difficulty in understanding the context.
Context in Information Retrieval
- B. L. Doan and P. Brézillon, 2004
- Suggest recommendations for helping search tools decrease noise and silence.
- Define the context in Information Retrieval on the Web as the sum of the following contexts
- The user and their environment
- Information provided to the information retrieval system (IRS): documents and authors
- The IRS itself
- Interactions between the user and the IRS.
- Improve the efficiency of IRSs => make explicit the context the query belongs to
Proposal
- Search engine for software components
- Passive repository vs. active repository
- Effective in promoting reuse
- Distributed repositories and open network
- Security
- Electronic commerce
- Quality assurance
- Explicit context
- Context-based systems and context-aware systems
- Automation of the reuse process
- Iterative process to find the best compromise (better queries)
- Cluster related components together
- Rank the components by their popularity among software systems
Conclusion
- For a reuse library tool to be successful, the cost of reusing has to be perceived by potential reusers as being significantly less than that of developing from scratch [Scott N. Woodfield, David W. Embley, and Del T. Scott, 1987], and the cost of performing searches is only one of several costs associated with an instance of reuse [H. Mili, F. Mili, and A. Mili, 1995].
- Components alone are not enough
To reuse a software component, you first have to find it.
Conclusion
- Although it is widely believed that software reuse improves both the quality and productivity of software development [V. Basili, L. Briand, and W. Melo, 1996], systematic reuse has not yet met its expected success [H. Zhuge, 2000].
- One of the reasons for the historical failure of component repositories is their conception as centralized systems, but this has started to change with the Agora system [R. Seacord, S. Hissam, K. Wallnau, 1998].
Conclusion
- Experimental results show that using context to guide search effectively offers even inexperienced users an advanced search tool on the Web.
- Improving user support requires taking context into account.
- The modeling, representation and use of context appear to be the challenge of the coming years, especially when facing very complex problems, large knowledge bases and multimedia.
- Using context for search is not a new idea. The problem is that everyone defines context a little differently.
References
- Retrieval References
- [Banker, 1993] R. D. Banker, R. J. Kauffman, and D. Zweig, Repository Evaluation of Software Reuse, IEEE Transactions on Software Engineering, Vol. 19, No. 4, April, 1993, pp. 379-389.
- [Caldiera, 1991] G. Caldiera and V. Basili, Identifying and Qualifying Reusable Software Components, IEEE Computer, Vol. 24, No. 2, February, 1991, pp. 61-71.
- [Clark, 2004] J. Clark, C. Clarke, S. De Panfilis, G. Granatella, P. Predonzani, A. Sillitti, G. Succi and T. Vernazza, Selecting components in large COTS repositories, Journal of Systems and Software, Vol. 73, 2004, pp. 323-331.
- [Frakes, 1994] W. B. Frakes and T. P. Pole, An Empirical Study of Representation Methods for Reusable Software Components, IEEE Transactions on Software Engineering, Vol. 20, No. 8, August, 1994, pp. 617-630.
- [Frakes, 2004] W. B. Frakes, A Case Study of a Reusable Component Collection in the Information Retrieval Domain, Journal of Systems and Software, Vol. 72, No. 2, July, 2004, pp. 265-270.
- [Grundy, 2000] J. Grundy, Storage and retrieval of Software Components using Aspects, in 2000 Australasian Computer Science Conference, Canberra, Australia, 2000, IEEE CS Press, pp. 95-103.
- [Guo, 2000] J. Guo and Luqi, A Survey of Software Reuse Repositories, in 7th IEEE International Conference and Workshop on the Engineering of Computer Based Systems, Edinburgh, Scotland, April, 2000, pp. 92-100.
- [Hall, 1999] P. Hall, Architecture-driven Component Reuse, Information and Software Technology, Vol. 41, No. 14, November, 1999, pp. 963-968.
- [Henninger, 1994] S. Henninger, Using Iterative Refinement to Find Reusable Software, IEEE Software, Vol. 11, No. 5, September, 1994, pp. 48-59.
- [Henninger, 1997] S. Henninger, An Evolutionary Approach to Constructing Effective Software Reuse Repositories, ACM Transactions on Software Engineering and Methodology (TOSEM), Vol. 6, No. 2, April, 1997, pp. 111-140.
- [Inoue, 2003] K. Inoue, R. Yokomori, H. Fujiwara, T. Yamamoto, M. Matsushita, S. Kusumoto, Component Rank: Relative Significance Rank for Software Component Search, 25th International Conference on Software Engineering (ICSE2003), Portland, Oregon, U.S.A., May, 2003, pp. 14-24.
- [Isakowitz, 1996] T. Isakowitz and R. J. Kauffman, Supporting Search for Reusable Software Objects, IEEE Transactions on Software Engineering, Vol. 22, No. 6, July, 1996, pp. 407-423.
- [Lucrédio, 2004] D. Lucrédio, E. S. Almeida and A. F. Prado, A Survey on Software Components Search and Retrieval, in the 30th IEEE EUROMICRO Conference, Component-Based Software Engineering Track, 2004, Rennes, France. IEEE Press, 2004.
- [Maarek, 1991] Y. S. Maarek, D. M. Berry, and G. E. Kaiser, An Information Retrieval Approach for Automatically Constructing Software Libraries, IEEE Transactions on Software Engineering, Vol. 17, No. 8, August, 1991, pp. 800-813.
- [McIlroy, 1968] M. D. McIlroy, Mass Produced Software Components, In NATO Software Engineering Conference, 1968, pp. 138-155.
- [Mili, 1997] H. Mili, E. Ah-Ki, R. Godin and H. Mcheick, Another nail to the coffin of faceted controlled-vocabulary component classification and retrieval, ACM SIGSOFT Software Engineering Notes, Vol. 22, No. 3, May, 1997, pp. 89-98.
- [Neighbors, 1996] J. M. Neighbors, Finding Reusable Software Components in Large Systems, 3rd Working Conference on Reverse Engineering (WCRE '96), Monterey, CA, November, 1996, pp. 2.
- [Podgurski, 1993] A. Podgurski and L. Pierce, Retrieving Reusable Software by Sampling Behavior, ACM Transactions on Software Engineering and Methodology, Vol. 2, No. 3, July, 1993, pp. 286-303.
- [Prieto-Diaz, 1987] R. Prieto-Diaz and P. Freeman, Classifying Software for Reusability, IEEE Software, Vol. 4, No. 1, January, 1987, pp. 6-16.
- [Prieto-Diaz, 1991] R. Prieto-Diaz, Implementing Faceted Classification for Software Reuse, Communications of the ACM, Vol. 34, No. 5, May, 1991, pp. 88-97.
- [Seacord, 1998] R. C. Seacord, S. A. Hissam and K. C. Wallnau, Agora: A Search Engine for Software Components, IEEE Internet Computing, Vol. 2, No. 6, November/December, 1998, pp. 62-70.
- [Seacord, 1999] R. C. Seacord, Software Engineering component repositories, in International Workshop on Component-Based Software Engineering, held in conjunction with the 21st International Conference on Software Engineering (ICSE), Los Angeles, CA, USA, 1999.
- [Ye, 2002] Y. Ye and G. Fischer, Supporting Reuse by Delivering Task-Relevant and Personalized Information, in ICSE 2002: 24th International Conference on Software Engineering, Orlando, Florida, USA, 2002, pp. 513-523.
- [Zaremski, 1995] A. M. Zaremski and J. M. Wing, Signature matching: a tool for using software libraries, ACM Transactions on Software Engineering and Methodology (TOSEM), Vol. 4, No. 2, April, 1995, pp. 146-170.
- [Zhuge, 2000] H. Zhuge, A problem-oriented and rule-based component repository, Journal of Systems and Software, Vol. 50, No. 3, March, 2000, pp. 201-208.
- Data Mining
- [Michail, 1999] A. Michail, Data mining library reuse patterns in user-selected applications, in 14th IEEE International Conference on Automated Software Engineering, 1999.
- [Michail, 2000] A. Michail, Data Mining Library Reuse Patterns using Generalized Association Rules, in 22nd International Conference on Software Engineering, 2000.
- [Michail, 2001] A. Michail, CodeWeb: Data Mining Library Reuse Patterns, in 23rd International Conference on Software Engineering, July 2001, pp. 827-828.
- [Devedzic, 2001] V. Devedzic, Knowledge Discovery and Data Mining in Databases, In Chang, S.K. (ed.), Handbook of Software Engineering and Knowledge Engineering, Vol. 1 - Fundamentals, World Scientific Publishing Co., Singapore, 2001, pp. 615-637.
- [Amin, 2004] R. Amin, M. Ó Cinnéide, and T. Veale, LASER: A Lexical Approach to Analogy in Software Reuse, In the International Workshop on Mining Software Repositories, Edinburgh, 2004.
- [McCarey, 2004] F. McCarey, M. Ó Cinnéide, and N. Kushmerick, A Case Study on Recommending Reusable Software Components using Collaborative Filtering, In International Workshop on Mining Software Repositories, Edinburgh, 2004.
- [Yusof, 2004] Y. Yusof and O. Rana, Template Mining in Source-Code Digital Libraries, In the International Workshop on Mining Software Repositories at IEEE International Conference on Software Engineering, Edinburgh, Scotland, May 2004.
- [Garg, 2004] P. Garg, T. Gschwind, and K. Inoue, Multi-Project Software Engineering: An Example, In the International Workshop on Mining Software Repositories, Edinburgh, May 2004.
- Context References
- [Brézillon, 1999] P. Brézillon, Context in problem solving: a survey, The Knowledge Engineering Review, Vol. 14, No. 1, May, 1999, pp. 47-80.
- [Finkelstein, 2001] L. Finkelstein, E. Gabrilovich, Y. Matias, E. Rivlin, Z. Solan, G. Wolfman, and E. Ruppin, Placing Search in Context: The Concept Revisited, In Proceedings of the Tenth International World Wide Web Conference, Hong Kong, May, 2001.
- [Ye, 2001] Y. Ye and G. Fischer, Context-Aware Browsing of Large Component Repositories, Proceedings of 16th International Conference on Automated Software Engineering (ASE'01), Coronado Island, CA, November, 2001, pp. 99-106.
- [Brézillon, 2003] P. Brézillon, Using context for Supporting Users Efficiently, Proceedings of the 36th Hawaii International Conference on Systems Sciences, HICSS-36, Track "Emerging Technologies", R.H. Sprague (Ed.), Los Alamitos: IEEE, CD-Rom, January, 2003, pp. 127-135.
- [Doan, 2004] B. L. Doan and P. Brézillon, How the notion of context can be useful to search tools, In World Conference "E-learn 2004", Washington, DC, USA, November, 2004.