Towards an efficient mechanism of storage, search and retrieval software components

1
Towards an efficient mechanism of storage, search
and retrieval software components
  • Taciana Amorim Vanderlei
  • tav_at_cin.ufpe.br

2
Content
  • Motivation
  • Component Search
  • Data Mining
  • Knowledge Discovery and Data Mining in Databases
  • Data Mining Reuse Patterns
  • Context
  • Context in Information Retrieval
  • Proposal
  • Conclusion
  • References

3
Motivation
What is the software engineering field looking for?
Quality
Productivity (time)
Cost Reduction
Software Reuse!!!
4
Motivation
  • McIlroy, 1968
  • Use instead of build
  • Keep a set of reusable components
  • Repository
  • Advantages
  • Productivity
  • Quality
  • Maintainability
  • Flexibility

5
Motivation
6
Motivation
Browsing
7
Motivation
8
Motivation
Searching
9
Component Search
10
Component Search: the 1990s
  • Retrieval methods
  • Enumerated classification
  • Inflexibility
  • Problems with understanding large hierarchies
  • Faceted classification (Prieto-Díaz, 1991)
  • Flexible
  • Precise
  • Better suited for large, continuously expanding
    collections
  • Hard for users to find the right combination of
    terms
  • Free-text indexing (automatic indexing)
  • Simple to build and to retrieve from
  • Need large bodies of text to become statistically
    accurate
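The contrast between faceted classification and free-text indexing can be illustrated with a minimal sketch. The catalog, facet names and component names below are hypothetical and are not taken from any of the cited systems; the free-text scorer is a plain word-count match, not a full information retrieval engine.

```python
from collections import Counter

# Hypothetical catalog: each component carries facet values (controlled
# vocabulary) and a free-text description (uncontrolled vocabulary).
CATALOG = {
    "QuickSorter": {
        "facets": {"function": "sort", "object": "array", "medium": "memory"},
        "text": "sorts an in-memory array of comparable items",
    },
    "FileMerger": {
        "facets": {"function": "sort", "object": "file", "medium": "disk"},
        "text": "merge sort for large files stored on disk",
    },
    "ArraySearcher": {
        "facets": {"function": "search", "object": "array", "medium": "memory"},
        "text": "binary search over a sorted array",
    },
}


def faceted_search(catalog, **wanted):
    """Return components whose facets match every requested term exactly."""
    return sorted(
        name
        for name, entry in catalog.items()
        if all(entry["facets"].get(f) == v for f, v in wanted.items())
    )


def free_text_search(catalog, query):
    """Rank components by how many query words occur in their description."""
    words = query.lower().split()
    scores = {}
    for name, entry in catalog.items():
        counts = Counter(entry["text"].lower().split())
        score = sum(counts[w] for w in words)
        if score:
            scores[name] = score
    # Highest score first, ties broken alphabetically.
    return sorted(scores, key=lambda n: (-scores[n], n))
```

The faceted query is precise but only works if the user picks the right combination of terms; the free-text query accepts any wording but depends on the descriptions containing the same words.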

11
Component Search: the 1990s
  • Controlled vocabulary
  • Faceted classification
  • Indexing
  • Uncontrolled vocabulary
  • Free-text indexing

12
Component Search
  • William Frakes and Thomas Pole, 1994
  • Empirical study
  • Hierarchical, faceted, attribute and keyword methods
  • Similar results
  • Precision
  • Recall

Maybe more important than classification!!!
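The two effectiveness measures named above have standard definitions in information retrieval: precision is the fraction of retrieved components that are relevant, and recall is the fraction of relevant components that were retrieved. A small sketch, with made-up component identifiers:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = |retrieved ∩ relevant| / |retrieved|
    recall    = |retrieved ∩ relevant| / |relevant|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query result: four components returned, three truly relevant.
p, r = precision_recall(["a", "b", "c", "d"], ["a", "b", "e"])
```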
13
Component Search
  • S. Henninger, 1994
  • Graph
  • Draw a circle
  • (x, y, radius, inner radius)
  • Ring
  • Tire
  • Car direction

14
Component Search
  • Problems with software reuse repositories
  • Costly classification
  • Domain analysis efforts.
  • S. Henninger, 1997
  • Utilizes minimal repository structure
  • Improves the repository while people use it
    (evolutionary repository)
  • Best results.

15
Component Search
  • Questioning
  • Seacord, 1999
  • Repository is failing
  • Centralized repositories
  • Limited accessibility and scalability of the
    repository
  • Exclusive control over cataloged components
  • Oppressive bureaucracy
  • Poor economy of scale.
  • Few users
  • Low per-user benefits
  • High cost of repository mechanisms and
    operations.

16
Component Search
  • Ideally
  • Distributed repositories
  • Open network
  • R. Seacord, S. Hissam, K. Wallnau, 1998
  • Agora
  • A search engine for software components
  • Combines introspection with Web search engines
    to reduce costs

Reuse on a large scale!!!
17
Component Search
  • P. Hall, 1999
  • Large repositories are not necessary.
  • Bass et al., 2000
  • Report CMU/SEI
  • Inhibitors that keep component-based software
    engineering from becoming widespread
  • Lack of available components
  • Difficulty in finding them

18
Component Search
  • J. Grundy, 2000
  • Concept of component aspects
  • Formalism
  • Mili et al., 1997; Zhuge, 2000
  • Keywords, text
  • Ambiguity
  • Low precision
  • Formal definition
  • Functionality
  • Semantics

19
Component Search
  • Y. Ye and G. Fischer, 2002
  • Challenges faced by software developers
  • How to motivate them to reuse?
  • How to reduce the difficulty of locating
    components from a large reuse repository?
  • CodeBroker
  • Information delivery that autonomously locates
    and presents software developers with
    task-relevant and personalized components.

Active repository!!!

20
Component Search
  • Y. Ye and G. Fischer, 2002

21
Component Search
  • Clark, Clarke, De Panfilis, Granatella,
    Predonzani, Sillitti, Succi and Vernazza, 2004
  • CLARiFi
  • Find a set of components that cover the
    functional, non-functional, technological,
    environmental, and compatibility requirements of
    the desired system
  • Best compromise (better queries).
  • D. Lucrédio, E. Almeida, and A. Prado, 2004
  • Mechanism to efficiently search components
  • Offer support for future component markets.

22
Component Search
23
Data Mining
24
Knowledge Discovery and Data Mining in Databases
  • V. Devedzic, 2001
  • Knowledge Discovery in Databases (KDD) is the
    process of automatic discovery of previously
    unknown patterns, rules, and other regular
    contents implicitly present in large volumes of
    data.
  • Data Mining (DM) is the process of pattern
    discovery in a data set from which noise has been
    previously eliminated and which has been
    transformed in such a way to enable the pattern
    discovery process.

25
Knowledge Discovery and Data Mining in Databases
  • KDD and DM
  • Provide an automation of data analysis tasks.
  • Improve their business in terms of savings,
    efficiency, quality, and simplicity.
  • Not general-purpose software systems - developed
    for specific users to help them automate data
    analysis in precisely defined, specific
    application domains.
  • Data Mining Algorithms
  • Procedures for pattern extraction from a set of
    cleaned, preprocessed, transformed data.
  • No such a thing as a universally good DM
    algorithm.

26
Data Mining Reuse Patterns
  • A. Michail, 1999
  • Show how data mining can be used to discover
    library reuse patterns in user-selected
    applications.
  • Association rules
  • Competitive advantage in a business.
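Association rules are usually scored by support (how often the items occur together) and confidence (how often the consequent appears when the antecedent does). The sketch below mines single-item rules from a handful of hypothetical transactions, one per application, each listing the library classes it reuses. The class names and thresholds are illustrative assumptions, and the exhaustive pair enumeration is only suitable for tiny inputs.

```python
from itertools import combinations

# Hypothetical transactions: one set of reused library classes per application.
TRANSACTIONS = [
    {"Window", "Button", "Menu"},
    {"Window", "Button"},
    {"Window", "Menu"},
    {"Button", "Timer"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent ∪ consequent) / support(antecedent)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / base


def mine_pair_rules(transactions, min_support, min_confidence):
    """Return (x, y, support, confidence) for every rule x => y that passes."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        for x, y in ((a, b), (b, a)):
            s = support({x, y}, transactions)
            c = confidence({x}, {y}, transactions)
            if s >= min_support and c >= min_confidence:
                rules.append((x, y, s, c))
    return rules
```

With these transactions, `mine_pair_rules(TRANSACTIONS, 0.5, 0.9)` keeps only the rule Menu => Window: every application that reuses Menu also reuses Window.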

Application
Library
27
Data Mining Reuse Patterns
  • A. Michail, 2000
  • Taxonomies
  • Mean is-a hierarchies in which a node's
    descendants represent specializations of that
    node
  • Mine for rules at different levels of
    abstraction.
  • Generalized association rules
  • Incorporating taxonomies
  • Reuse relationships considered
  • Class inheritance (class_inheritspclass)
  • Class instantiation (class_instantiatespclass)
  • Function invocation (class_callspclassfunc())
  • Function overriding
    (class_overridespclassfunc())
  • Implicit invocation
    (class_receives_signalpclasssignal()).
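One common way to mine rules at different levels of abstraction is to extend every transaction with the ancestors of its items before counting support, so that a general item can be frequent even when each of its specializations is rare. The sketch below assumes a hypothetical is-a taxonomy and hypothetical class names; it is not Michail's implementation.

```python
# Hypothetical is-a taxonomy, stored as child -> parent.
TAXONOMY = {
    "PushButton": "Button",
    "CheckBox": "Button",
    "Button": "Widget",
    "Label": "Widget",
}


def ancestors(item, taxonomy):
    """Return the chain of ancestors of item, nearest first."""
    found = []
    while item in taxonomy:
        item = taxonomy[item]
        found.append(item)
    return found


def extend_transaction(transaction, taxonomy):
    """Add every ancestor of every item to the transaction."""
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item, taxonomy))
    return extended


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


# Hypothetical applications, each reusing a single leaf class.
RAW = [{"PushButton"}, {"CheckBox"}, {"Label"}]
EXTENDED = [extend_transaction(t, TAXONOMY) for t in RAW]
```

In the raw data no item appears in more than one transaction, while in the extended data `Button` appears in two of three and `Widget` in all three, so rules about the general classes can pass a support threshold that the specific ones fail.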

28
Data Mining Reuse Patterns
  • A. Michail, 2000
  • Global Pruning - applied to all rules discovered
    by data mining.
  • Uninteresting Rules
  • class_callslibAf() => class_instantiateslibA
  • Statistically Insignificant Rules
  • X => Y is not very interesting if X and Y just
    happen to co-occur in transactions by chance.
  • Misleading Rules
  • x, y => z with a confidence level of 60%
  • y => z with a confidence level of 80%

Misleading since the presence of x actually
decreases the likelihood of finding the item z.
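The misleading-rule check can be sketched as a comparison between a rule's confidence and the confidence of the same rule with part of its antecedent removed. The transactions below are hypothetical and were chosen only to reproduce the 60% and 80% figures above; the function name `is_misleading` is an assumption for illustration.

```python
from itertools import combinations

# Hypothetical transactions chosen to reproduce the 60% / 80% example.
TRANSACTIONS = (
    [{"x", "y", "z"}] * 3   # x and y together, z present
    + [{"x", "y"}] * 2      # x and y together, z absent
    + [{"y", "z"}] * 5      # y without x, z present
)


def confidence(antecedent, consequent, transactions):
    """Share of transactions containing the antecedent that also contain the consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    covered = [t for t in transactions if antecedent <= t]
    if not covered:
        return 0.0
    return sum(consequent <= t for t in covered) / len(covered)


def is_misleading(antecedent, consequent, transactions):
    """Flag a rule when a proper, non-empty subset of its antecedent
    predicts the consequent with strictly higher confidence."""
    antecedent = sorted(antecedent)
    full = confidence(antecedent, consequent, transactions)
    for size in range(1, len(antecedent)):
        for subset in combinations(antecedent, size):
            if confidence(subset, consequent, transactions) > full:
                return True
    return False
```

Here `is_misleading({"x", "y"}, {"z"}, TRANSACTIONS)` is true, because dropping x raises the confidence from 0.6 to 0.8.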
29
Data Mining Reuse Patterns
  • A. Michail, 2000
  • Local Pruning
  • Rules that demonstrate reuse of a particular
    library class.
  • Rules that are violated in a particular
    application.
  • A. Michail, 2001
  • CodeWeb alleviates problems with learning
    characteristic library usage.
  • By using many real-life applications instead of a
    few toy programs
  • By using automated techniques
  • By leveraging existing applications and using
    data mining technology.

Show the more specific rule, which tends to be
more informative. However, one may also wish to
show the more general rule if its confidence is
much greater than expected.
30
Data Mining Reuse Patterns
KApplication reuse patterns (A. Michail, 2000)
31
Data Mining Reuse Patterns
  • R. Amin, M. Ó Cinnéide, and T. Veale, 2004
  • LASER project
  • Apply lexically-driven Analogy at the code level,
    rather than at the UML-level
  • Demonstrate that both hierarchical reuse and
    parallel reuse can be enhanced through the use of
    lexically-driven Analogy
  • So far only a pilot study using the Java language.
  • F. McCarey, M. Ó Cinnéide, and N. Kushmerick,
    2004
  • Software recommendation system based on
    collaborative filtering.
  • Identification of similar users to the active
    user.
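A user-based collaborative filter can be sketched in a few lines: measure how similar each other user is to the active user, keep the nearest neighbours, and recommend what they use that the active user does not. The usage data, the Jaccard similarity measure and the function names below are assumptions for illustration; the cited case study may use a different similarity measure and a different notion of "user".

```python
def jaccard(a, b):
    """Overlap between two sets: |a ∩ b| / |a ∪ b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical usage history: which components each user has reused.
USAGE = {
    "alice": {"Parser", "Logger", "HttpClient"},
    "bob": {"Parser", "Logger", "Cache"},
    "carol": {"Renderer", "Canvas"},
}


def recommend(active, usage, k=1):
    """Recommend components used by the k users most similar to `active`."""
    scored = [
        (jaccard(usage[active], components), user)
        for user, components in usage.items()
        if user != active
    ]
    # Keep only users with some overlap, most similar first.
    neighbours = sorted(
        (pair for pair in scored if pair[0] > 0),
        key=lambda pair: (-pair[0], pair[1]),
    )[:k]
    scores = {}
    for similarity, user in neighbours:
        for component in usage[user] - usage[active]:
            scores[component] = scores.get(component, 0.0) + similarity
    return sorted(scores, key=lambda c: (-scores[c], c))
```

For the data above, `recommend("alice", USAGE)` suggests `Cache`, because bob is the only user whose history overlaps with alice's.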

32
Data Mining Reuse Patterns
  • Y. Yusof and O. Rana, 2004
  • Digital library of Java source code
  • Managed collection of information (in digital
    format) with associated services.
  • Template mining for extracting information
  • Matching mechanism
  • Exact
  • Generalization
  • Reduction
  • NameOnly.

33
Data Mining Reuse Patterns
  • P. Garg, T. Gschwind, and K. Inoue, 2004
  • Multi-project software knowledge
  • Collect and organize large amounts of data, from
    tens of thousands of software projects.
  • Automation
  • Cluster related components together
  • Rank the components by their popularity among
    software systems
  • Automatic substitution of components if and when
    a problem is discovered in one of the
    implementations.
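Ranking by popularity among software systems can be approximated, in its simplest form, by counting how many systems use each component. The sketch below uses hypothetical system and component names and a plain use count; published ranking schemes for component search may weight uses differently.

```python
from collections import Counter

# Hypothetical data: the components each software system depends on.
SYSTEMS = {
    "editor": ["Parser", "Logger", "Parser"],
    "server": ["Logger", "HttpServer"],
    "crawler": ["Logger", "Parser", "HttpClient"],
}


def rank_by_popularity(systems):
    """Return (component, number of systems using it), most popular first."""
    counts = Counter()
    for components in systems.values():
        # A system counts once per component, however often it uses it.
        counts.update(set(components))
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
```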

34
Context
35
Context
  • Definitions
  • In HCI, a context feature is any information that
    can be used to characterize and interpret the
    situation in which a user interacts with an
    application at a certain time.
  • In context-aware applications area, Dey and Abowd
    (1998) define context as any information that
    characterizes a situation related to the
    interaction between humans, applications and the
    surrounding environment.
  • In Artificial Intelligence, Brézillon (1999)
    defines context as what does not intervene
    explicitly in a problem solving but constrains
    it.

36
Context
  • Informational perspective
  • Information space that can be adapted according
    to the user's context
  • Complemented by tools/processes
  • Construction and sharing of context
  • Device perspective
  • Concerns how devices can be made to gather,
    integrate and display information that assists
    human work
  • Infrastructural view
  • Computing device with information about its
    environment

37
Context
  • Context may be
  • Explicit
  • Information retrieved directly
  • Physical information
  • Implicit
  • Inferred from either the explicit context or from
    the interactions between the system and the user
  • Context-based systems
  • At the knowledge level
  • Focus on context in relationships with the user
  • Context-aware systems
  • At the data level
  • Focus on users through a model of their dynamic
    environment.

38
Context
  • P. Brézillon, 2003
  • Context related to human-machine interaction
  • Need to focus on users
  • Reducing communication barriers
  • What can be known about a user?
  • How to support that information with task, user,
    and system models?
  • SART project (Brézillon et al., 2000)
  • Design and development of a Context-based
    Intelligent Assistant System (CIAS)
  • Formalism for the context-based representation of
    knowledge
  • Contextual graphs
  • Environment is not taken into account

39
Context in Information Retrieval
  • Introduction
  • A large number of recently proposed search
    enhancement tools have utilized the notion of
    context.
  • Improved relevance of search results even for
    users not skilled in Web search.
  • Natural language processing techniques are
    applied to the captured context
  • Using context to guide the search constitutes a
    considerable algorithmic challenge.

40
Context in Information Retrieval
  • L. Finkelstein, E. Gabrilovich, Y. Matias, E.
    Rivlin, Z. Solan, G. Wolfman and E. Ruppin, 2001
  • Conceptual paradigm for performing search in
    context
  • Automates the search process
  • Providing even non-professional users with highly
    relevant results.
  • Performs context search from documents on users'
    computers.
  • Semantic keyword extraction and clustering to
    automatically generate new, augmented queries
  • Guides the user's search by the context
    surrounding the text
  • Eliminates possible semantic ambiguity and
    vagueness
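The idea of augmenting a query with terms drawn from the surrounding context can be sketched as follows. This is a deliberately simplified stand-in: it picks context keywords by plain term frequency, whereas the approach described above relies on semantic keyword extraction and clustering. The stop-word list, function names and example text are assumptions.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}


def keywords(text, limit):
    """Return the `limit` most frequent non-stop-words in text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return ranked[:limit]


def augment_query(selected, context, limit=2):
    """Append up to `limit` context keywords to the user's selected text."""
    selected_words = set(re.findall(r"[a-z]+", selected.lower()))
    candidates = keywords(context, limit + len(selected_words))
    extra = [w for w in candidates if w not in selected_words][:limit]
    return " ".join([selected] + extra)


# Hypothetical example: the word "jaguar" is ambiguous on its own.
CONTEXT = (
    "The jaguar is a big cat. The cat lives in the forest, "
    "and the forest is shrinking."
)
query = augment_query("jaguar", CONTEXT)
```

The augmented query carries the words that disambiguate the selection, which is the effect the context-based approach aims for.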

41
Context in Information Retrieval
[Figure: IntelliZap system overview, information and processing flow. Labels shown: Capture Text and Context, Semantic Analysis, Intelligent Query Generation, Query Dispatcher, Final Results.]
42
Context in Information Retrieval
  • Y. Ye and G. Fischer, 2001
  • Locating software from a large component
    repository
  • Context-aware browsing
  • Active repository
  • CodeBroker
  • Run in the background
  • Infers software developers' needs for components
  • Increases the opportunity of component reuse
  • Discover components not anticipated
  • Free-text information retrieval techniques and
    signature matching to retrieve task-relevant
    components
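Signature matching compares the type signature a developer is about to write against the signatures stored in the repository. The sketch below shows two simple matchers, an exact match and a relaxed match that ignores parameter order. The repository contents, type names and function names are hypothetical, and this is not CodeBroker's actual matcher.

```python
from collections import Counter
from typing import NamedTuple, Tuple


class Signature(NamedTuple):
    params: Tuple[str, ...]
    result: str


def exact_match(query, candidate):
    """Same parameter types in the same order, same result type."""
    return query == candidate


def reorder_match(query, candidate):
    """Same result type and same parameter types, in any order."""
    return (
        query.result == candidate.result
        and Counter(query.params) == Counter(candidate.params)
    )


# Hypothetical repository of component signatures.
REPOSITORY = {
    "indexOf": Signature(("String", "char"), "int"),
    "charAt": Signature(("String", "int"), "char"),
    "repeat": Signature(("int", "String"), "String"),
}


def find(query, repository, matcher):
    """Return the names of components whose signature satisfies matcher."""
    return sorted(
        name for name, sig in repository.items() if matcher(query, sig)
    )


wanted = Signature(("char", "String"), "int")
```

With the exact matcher the query finds nothing, while the relaxed matcher returns `indexOf`; relaxing the match trades precision for a better chance of surfacing a component the developer did not anticipate.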

43
Context in Information Retrieval
Comparison of locating mechanisms (Y. Ye and G. Fischer, 2001)
Mechanism | Advantages | Disadvantages
Browsing | Low cognitive overheads. | Does not scale up.
Searching | Fast, direct. | Formulating the right query is difficult. No search for unanticipated components.
Context-aware browsing | Supports information delivery. | Difficulty in understanding the context.
44
Context in Information Retrieval
  • B. L. Doan and P. Brézillon, 2004
  • Suggest recommendations for helping the search
    tools to decrease noise and silence.
  • Define the context in Information Retrieval on
    the Web as the sum of the following contexts
  • User and its environment
  • Information provided to the IRS (documents and
    authors)
  • IRS
  • Interactions between the user and the IRS.
  • To improve the efficiency of IRSs, make explicit
    the context the query belongs to

45
Proposal
  • Search Engine for software components
  • Passive repository vs. active repository
  • Effective in promoting reuse
  • Distributed repositories and Open network
  • Security
  • Electronic commerce
  • Quality assurance
  • Explicit context
  • Context-based systems and Context-aware systems
  • Automation of the reuse process
  • Iterative process to find the best compromise
    (better queries)
  • Cluster related components together
  • Rank the components by their popularity among
    software systems

46
Conclusion
  • For a reuse library tool to be successful, the
    cost of reusing has to be perceived by potential
    reusers as being significantly less than that of
    developing from scratch (Scott N. Woodfield,
    David W. Embley, and Del T. Scott, 1987), and the
    cost of performing searches is only one of
    several costs associated with an instance of
    reuse (H. Mili, F. Mili, and A. Mili, 1995).
  • Components alone are not enough

To reuse a software component, you first have to
find it.
47
Conclusion
  • Although it is widely believed that software
    reuse improves both the quality and productivity
    of software development (V. Basili, L. Briand,
    and W. Melo, 1996), systematic reuse has not yet
    met its expected success (H. Zhuge, 2000).
  • One of the reasons for the historical failure of
    component repositories is their conception as
    centralized systems, but this has started to
    change with the Agora system (R. Seacord, S.
    Hissam, K. Wallnau, 1998).

48
Conclusion
  • Experimental results testify that using context
    to guide search effectively offers even
    inexperienced users an advanced search tool on
    the Web.
  • Improving user support requires taking context
    into account.
  • The modeling, representation and use of context
    appear to be the challenge of the coming years,
    especially when facing very complex problems,
    large knowledge bases and multimedia.
  • Using context for search is not a new idea. The
    problem is that everyone defines context a little
    differently.

49
References
  • Retrieval References
  • Banker, 1993 R. D. Banker, R. J. Kauffman, and
    D. Zweig, Repository Evaluation of Software
    Reuse, IEEE Transactions on Software Engineering,
    Vol.19, No. 4, April, 1993, pp. 379-389.
  • Caldiera, 1991 G. Caldiera and V. Basili,
    Identifying and Qualifying Reusable Software
    Components, IEEE Computer, Vol. 24, No. 2,
    February, 1991, pp. 61-71.
  • Clark, 2004 J. Clark, C. Clarke, S. De
    Panfilis, G. Granatella, P. Predonzani, A.
    Sillitti, G. Succi and T. Vernazza, Selecting
    components in large COTS repositories, Journal of
    Systems and Software, Vol. 73, 2004, pp. 323-331.

50
References
  • Retrieval References
  • Frakes, 1994 W. B. Frakes and T. P. Pole, An
    Empirical Study of Representation Methods for
    Reusable Software Components, IEEE Transactions
    on Software Engineering, Vol. 20, No. 8, August,
    1994, pp.617-630.
  • Frakes, 2004 W. B. Frakes, A Case Study of a
    Reusable Component Collection in the Information
    Retrieval Domain, Journal of Systems and
    Software, Vol. 72, No. 2, July, 2004, pp.
    265-270.
  • Grundy, 2000 J. Grundy, Storage and retrieval
    of Software Components using Aspects, in 2000
    Australasian Computer Science Conference,
    Canberra, Australia, 2000, IEEE CS Press, pp.
    95-103.
  • Guo, 2000 J. Guo and Luqi, A Survey of Software
    Reuse Repositories, in 7th IEEE International
    Conference and Workshop on the Engineering of
    Computer Based Systems, Edinburgh, Scotland,
    April, 2000, pp. 92-100.
  • Hall, 1999 P. Hall, Architecture-driven
    Component Reuse, Information and Software
    Technology, Vol. 41, No. 14, November, 1999, pp.
    963-968.

51
References
  • Retrieval References
  • Henninger, 1994 S. Henninger, Using Iterative
    Refinement to Find Reusable Software, IEEE
    Software, Vol. 11, No. 5, September, 1994, pp.
    48-59.
  • Henninger, 1997 S. Henninger, An Evolutionary
    Approach to Constructing Effective Software Reuse
    Repositories, ACM Transactions on Software
    Engineering and Methodology (TOSEM), Vol. 6, No.
    2, April, 1997, pp.111-140.
  • Inoue, 2003 K. Inoue, R. Yokomori, H. Fujiwara,
    T. Yamamoto, M. Matsushita, S. Kusumoto,
    Component Rank: Relative Significance Rank for
    Software Component Search, 25th International
    Conference on Software Engineering (ICSE2003),
    Portland, Oregon, U.S.A., May, 2003, pp. 14-24.
  • Isakowitz, 1996 T. Isakowitz and R. J.
    Kauffman, Supporting Search for Reusable Software
    Objects, IEEE Transactions on Software
    Engineering, Vol. 22, No. 6, July, 1996, pp.
    407-423.
  • Lucrédio, 2004 D. Lucrédio, E. S. Almeida and
    A. F. Prado, A Survey on Software Components
    Search and Retrieval, in the 30th IEEE EUROMICRO
    Conference, Component-Based Software Engineering
    Track, 2004, Rennes - France. IEEE Press. 2004.

52
References
  • Retrieval References
  • Maarek, 1991 Y.S. Maarek, D. M. Berry, and G.E.
    Kaiser, An Information Retrieval Approach for
    Automatically Constructing Software Libraries,
    IEEE Transactions on Software Engineering, Vol.
    17, No. 8, August, 1991, pp. 800-813.
  • McIlroy, 1968 M. D. McIlroy, Mass Produced
    Software Components, in NATO Software Engineering
    Conference, 1968, pp. 138-155.
  • Mili, 1997 H. Mili, E. Ah-Ki, R. Godin and H.
    Mcheick, Another nail to the coffin of faceted
    controlled-vocabulary component classification
    and retrieval, ACM SIGSOFT Software Engineering
    Notes, Vol. 22, No. 3, May, 1997, pp. 89-98.
  • Neighbors, 1996 J. M. Neighbors, Finding
    Reusable Software Components in Large Systems,
    3rd Working Conference on Reverse Engineering
    (WCRE '96), Monterey, CA, November, 1996, pp. 2.
  • Podgurski, 1993 A. Podgurski, and L. Pierce,
    Retrieving Reusable Software by Sampling
    Behavior, ACM Transactions on Software Engineering
    and Methodology, Vol. 2, No. 3, July, 1993, pp.
    286-303.

53
References
  • Retrieval References
  • Prieto-Diaz, 1987 R. Prieto-Diaz and P.
    Freeman, Classifying Software for Reusability,
    IEEE Software, Vol. 4, No. 1, January, 1987, pp.
    6-16.
  • Prieto-Diaz, 1991 R. Prieto-Diaz, Implementing
    Faceted Classification for Software Reuse,
    Communications of the ACM, Vol. 34, No. 5, May,
    1991, pp. 88-97.
  • Seacord, 1998 R. C. Seacord, S. A. Hissam and
    K. C. Wallnau, Agora: A Search Engine for
    Software Components, IEEE Internet Computing,
    Vol. 2, No. 6, November/December, 1998, pp.
    62-70.
  • Seacord, 1999 R. C. Seacord, Software
    Engineering component repositories, in
    International Workshop on Component-Based
    Software Engineering, Held in conjunction with
    the 21st International Conference on Software
    Engineering (ICSE), Los Angeles, CA, USA, 1999.
  • Ye, 2002 Y. Ye and G. Fischer, Supporting Reuse
    by Delivering Task-Relevant and Personalized
    Information, in ICSE 2002 24th International
    Conference on Software Engineering, Orlando,
    Florida, USA, 2002, pp. 513-523.

54
References
  • Retrieval References
  • Zaremski, 1995 A. M. Zaremski and J. M. Wing,
    Signature matching: a tool for using software
    libraries, ACM Transactions on Software
    Engineering and Methodology (TOSEM), Vol. 4, No.
    2, April, 1995, pp. 146-170.
  • Zhuge, 2000 H. Zhuge, A problem-oriented and
    rule-based component repository, Journal of
    Systems and Software, Vol. 50, No. 3, March,
    2000, pp. 201-208.
  • Data Mining
  • Michail, 1999 A. Michail, Data mining library
    reuse patterns in user-selected applications, in
    14th IEEE International Conference on Automated
    Software Engineering, 1999.
  • Michail, 2000 A. Michail, Data Mining Library
    Reuse Patterns using Generalized Association
    Rules, in 22nd International Conference on
    Software Engineering, 2000.
  • Michail, 2001 A. Michail, CodeWeb: Data Mining
    Library Reuse Patterns, in 23rd International
    Conference on Software Engineering, July 2001,
    pp. 827-828.

55
References
  • Data Mining
  • Devedzic, 2001 V. Devedzic, Knowledge Discovery
    and Data Mining in Databases, In Chang, S.K.
    (ed.), Handbook of Software Engineering and
    Knowledge Engineering, Vol.1 - Fundamentals,
    World Scientific Publishing Co., Singapore, 2001,
    pp. 615-637.
  • Amin, 2004 R. Amin, M. Ó Cinnéide, and T.
    Veale, LASER: A Lexical Approach to Analogy in
    Software Reuse, In the International Workshop on
    Mining Software Repositories, Edinburgh, 2004.
  • McCarey, 2004 F. McCarey, M. Ó Cinnéide, and N.
    Kushmerick, A Case Study on Recommending Reusable
    Software Components using Collaborative
    Filtering, In International Workshop on Mining
    Software Repositories, Edinburgh, 2004.
  • Yusof, 2004 Y. Yusof and O. Rana, Template
    Mining in Source-Code Digital Libraries, In the
    International Workshop on Mining Software
    Repositories at IEEE International Conference on
    Software Engineering, Edinburgh, Scotland, May
    2004.
  • Garg, 2004 P. Garg, T. Gschwind, and K. Inoue,
    Multi-Project Software Engineering: An Example,
    In the International Workshop on Mining Software
    Repositories, Edinburgh, May 2004.

56
References
  • Context References
  • Brézillon, 1999 P. Brézillon, Context in problem
    solving: a survey, The Knowledge Engineering
    Review, Vol. 14, No. 1, May, 1999, pp.47-80.
  • Finkelstein, 2001 L. Finkelstein, E.
    Gabrilovich, Y. Matias, E. Rivlin, Z. Solan, G.
    Wolfman, and E. Ruppin, Placing Search in
    Context: The Concept Revisited, In Proceedings of
    the Tenth International World Wide Web
    Conference, Hong Kong, May, 2001.
  • Ye, 2001 Y. Ye and G. Fischer, Context-Aware
    Browsing of Large Component Repositories,
    Proceedings of 16th International Conference on
    Automated Software Engineering (ASE'01), Coronado
    Island, CA, November, 2001, pp.99-106.
  • Brézillon, 2003 P. Brézillon, Using context for
    Supporting Users Efficiently, Proceedings of the
    36th Hawaii International Conference on Systems
    Sciences, HICSS-36, Track "Emerging
    Technologies", R.H. Sprague (Ed.), Los Alamitos
    IEEE, CD-Rom, January, 2003, pp. 127-135.
  • Doan, 2004 B. L. Doan and P. Brézillon, How the
    notion of context can be useful to search tools,
    In World Conference "E-learn 2004", Washington,
    DC, USA, November, 2004.

57
Towards an efficient mechanism of storage, search
and retrieval software components
  • Questions?