Title: Component Search and Retrieval
1Component Search and Retrieval
Advanced Reuse Seminars Eduardo Cruz
2Information Retrieval - 1948
- Structured Documents
- Unstructured Documents
- No software documentation standard
- Semi-Structured Documents
Calvin Northrup Mooers
3Mooers' Law
An information retrieval system will tend not to be used whenever it is more painful and troublesome for a customer to have information than for him not to have it.
Calvin Northrup Mooers, 1959
4Mass Production Software components
McIlroy, 1968
5- The software industry is weakly founded, and one aspect of this weakness is the absence of a software components subindustry
- McIlroy, 1968
6- The storage and retrieval of software assets is nothing but a specialized form of information storage and retrieval
- Mili, 1998
7Software Library
- Browsing: inspecting without a predefined criterion
- Retrieval: satisfying a predefined matching criterion
8Classification Scheme
- Facet-based
- Better than hierarchical classification
- Manual classification different facets
- Automatic classification
- Controlled Vocabulary
- Semantic information
- Uncontrolled Vocabulary
- Big software libraries
- Little or no descriptors
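A minimal sketch of facet-based retrieval, assuming each component is described by a small set of facet/value pairs drawn from a controlled vocabulary. The facet names (function, object, language), the component names, and the helper facet_search are hypothetical and chosen only for illustration.

```python
# Hypothetical library: each component is classified under several facets.
library = {
    "QuickSorter": {"function": "sort", "object": "array", "language": "java"},
    "ListSorter": {"function": "sort", "object": "list", "language": "java"},
    "FileCopier": {"function": "copy", "object": "file", "language": "c"},
}


def facet_search(library, query):
    """Return the names of components whose facets match every pair in query."""
    return sorted(
        name
        for name, facets in library.items()
        if all(facets.get(facet) == value for facet, value in query.items())
    )


# A query may constrain any subset of the facets.
java_sorters = facet_search(library, {"function": "sort", "language": "java"})
file_tools = facet_search(library, {"object": "file"})
print(java_sorters)
print(file_tools)
```

Leaving a facet out of the query relaxes the search, which is one reason a facet scheme can be easier to query than a single fixed hierarchy.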
9Recall and Precision
- High Precision: most retrieved elements are relevant
- High Recall: few relevant elements are left behind
- Spreading Activation (Relaxed Search): related matches are retrieved
- Coverage: the average number of assets that are visited over the total size of the library
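As a small worked example of the first two measures, the sketch below computes precision and recall for one query. The function name precision_recall and the asset identifiers are assumptions made for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant retrieved assets / all retrieved assets
    recall    = relevant retrieved assets / all relevant assets
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query result: 4 assets retrieved, 3 assets actually relevant.
retrieved = ["a", "b", "c", "d"]
relevant = ["a", "b", "e"]
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

Here two of the four retrieved assets are relevant (precision 0.5) and two of the three relevant assets were found (recall 2/3). A relaxed search typically raises recall at the cost of precision.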
10Asset Representation
- Library representation is made in full knowledge of the artifact. User representation is made in ignorance of the artifact
- Asset representation is purposefully abstract to capture important features while overlooking minor or irrelevant details
- The representation is called the asset's surrogate in the retrieval literature
11Asset retrieval Goals
- Exact retrieval: black-box reuse
- Approximate retrieval: white-box reuse
- Generative modification: reusing the design
- Compositional modification: using building blocks of the retrieved asset
12Information usually not included
- Interface description
- Non-functional requirements
- Interoperability
13Situational Model x System Model
Component retrieval model (Lucrédio et al., 2004)
14- Repository representation is made in full knowledge of the artifact at hand
- User representation is made in ignorance of the artifact
- Mili, 1998
15Scott Henninger
16Tools
17Component Search Tools
- Web
- Delphi Search Engine
- Ispey
- CSourceSearch.net (2004)
- Gonzui
- SourceBank
- Koders (2004)
- Codase (2005)
- Applications
- Agora (1998)
- Codebroker (2002)
- Koders Enterprise (2004)
- Maracatu (2005)
19Delphi Search Engine
20Ispey.com
21SPARS-J (2003)
22SourceBank
Filter
23CSourceSearch.Net (2004)
24Koders.com (2004)
25CODASE Launched Sep 9, 2005
Multiple Search Options
Example Searches
Browsing
based on the number of people in your company,
starting from 5,000 USD
26CODASE - Browsing
27Other Tools
29AGORA - Location and Indexing (1998)
[Figure: Agora location and indexing architecture. Labeled elements: Internet, Index, AltaVista Search Index Server, Filter, AltaVista Query Server, Web Server.]
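30Component Rank (1998)
[Figure: example component graph G with nodes V1, V2, V3. Distribution ratios: D12 = 0.5, D13 = 0.5, D23 = 1, D31 = 1. Weights of 0.2 and 0.4 are shown on the nodes and edges.]
Legend: Nodes v, Edges e, Graph G, Weight w, Distribution Ratio d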
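A minimal sketch of how node weights for the example graph could be computed by repetition, assuming the distribution ratios listed on the slide (D12 = 0.5, D13 = 0.5, D23 = 1, D31 = 1) and a total weight of 1. The function name component_rank, the zero-based node indices, and the iteration count are illustrative assumptions, not details taken from the Component Rank system.

```python
def component_rank(ratios, node_count, iterations=200):
    """Repeatedly redistribute node weights along edges until they settle.

    ratios maps (source, target) to the share of the source node's weight
    that flows to the target node. The shares leaving a node sum to 1.
    """
    weights = [1.0 / node_count] * node_count  # start from equal weights
    for _ in range(iterations):
        updated = [0.0] * node_count
        for (source, target), ratio in ratios.items():
            updated[target] += weights[source] * ratio
        weights = updated
    return weights


# Nodes V1, V2, V3 are indexed 0, 1, 2.
ratios = {
    (0, 1): 0.5,  # D12
    (0, 2): 0.5,  # D13
    (1, 2): 1.0,  # D23
    (2, 0): 1.0,  # D31
}
weights = component_rank(ratios, 3)
print([round(w, 3) for w in weights])
```

With these ratios the weights settle at 0.4, 0.2 and 0.4 for V1, V2 and V3, which is consistent with the 0.2 and 0.4 values shown in the figure.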
31- Classes defining data structures and their
containers are highly ranked
32Clustered Component Graph
[Figure: clustered component graph with nodes V7, V3, V26, V14, V5 and the grouped nodes V1 V4 and V2 V6.]
33- NO MORE MULTIPLE DISCONNECTED COMPONENTS
34Component Rank System Architecture
INPUT: .java file component
(1) Similarity Measurement
(2) Clustering
(3) Use Relation Extraction
(4) Component Graph Construction
(5) Component Rank Computation by Repetition
(6) De-Clustering to Original Component Graph
OUTPUT: Order of Weights, Component Rank of .java files
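The graph construction and de-clustering steps, (4) and (6), can be sketched as follows. This is an illustration under stated assumptions: the clustering from step (2) is given as a plain mapping, use relations inside one cluster are dropped, every file in a cluster receives the weight of its cluster, and the cluster weights are placeholder numbers standing in for the output of step (5). All file names, cluster ids, and function names are hypothetical.

```python
def cluster_graph(use_relations, cluster_of):
    """Step (4): lift file-level use relations to edges between clusters."""
    edges = set()
    for user, used in use_relations:
        source, target = cluster_of[user], cluster_of[used]
        if source != target:  # assumption: uses inside one cluster are dropped
            edges.add((source, target))
    return edges


def decluster(cluster_weights, cluster_of):
    """Step (6): assumption: each file inherits the weight of its cluster."""
    return {name: cluster_weights[cid] for name, cid in cluster_of.items()}


# Hypothetical result of steps (1) and (2): ACopy.java is a copy of A.java.
cluster_of = {"A.java": "c1", "ACopy.java": "c1", "B.java": "c2", "C.java": "c3"}
# Hypothetical result of step (3): (user, used) pairs.
use_relations = [
    ("B.java", "A.java"),
    ("C.java", "ACopy.java"),
    ("A.java", "ACopy.java"),
]

edges = cluster_graph(use_relations, cluster_of)
# Placeholder weights standing in for the output of step (5).
file_weights = decluster({"c1": 0.5, "c2": 0.25, "c3": 0.25}, cluster_of)
print(sorted(edges))
print(file_weights)
```

Because the copy and its original share one node, uses of either file count toward the same cluster instead of being split between duplicates.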
35Simple Copied Components
[Figure: copied components and other components. Panels: Clustering Before Weight Computation; Non-clustered Component Graph; Clustering After Weight Computation. Weights shown: 1/4 (four times), then 1/6 and 1/3 (twice each).]
36- DO NOT COUNT SIMPLY DUPLICATED COMPONENTS
37Copied AND MODIFIED Components
[Figure: original components, copied and modified components, and other components, with nodes A, B, C. Panels: Clustering Before Weight Computation; Non-clustered Component Graph. Weights shown: 1/5, 2/5, 1/5, 1/5, 1/5 in the first panel and 1/5, 1/3, 1/6, 1/6, 1/6 in the second.]
38Beyond Searching and Browsing
- Searching and browsing
- Require users to initiate the information seeking process
- Information access and Information Delivery
39CodeBroker (2001)
- Component repositories are often so large that software developers cannot learn about all of the components
- Component repositories are not static
- New components added
- Old components updated
- Context-Aware browsing
40- May not have sufficient knowledge about the reuse repository
- May perceive that reuse costs more than developing from scratch
- May not be able to use the repository by formulating a proper query
- May not be able to understand the found components
41Information Islands
L4 Entire Information Space
Belief
Vaguely Known
Well Known
Unknown components
42CodeBroker
L4 Entire Information Space
L3 Belief
L2 Vaguely Known
L1 Well Known
Information Use
L1 Use by Memory
L2 Use by Recall
L3 Use by Anticipation
L4 Use by Delivery
Already Known Components
Task Relevant Information
Irrelevant Components
43Program Aspects
- Concept
- Formal
- Informal
- Indentation, comments, identifier names (semantic)
- Executability
- Code
- Constraint environment
- Signature
44Information delivery
- Feedback
- After execution of the action
- Feedforward
- Affects the execution of the action
45Information delivery
- Interruptive
- Noninterruptive
46Latent Semantic Analysis (LSA)
- Synonymy
- Polysemy
- Text documents and queries are represented as vectors in the semantic space, based on the words they contain
- The similarity between a query and a document is determined by the distance between their respective vectors
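The vector comparison step can be sketched as below. Note the simplification: this uses raw term-frequency vectors and cosine similarity only. It omits the singular value decomposition that LSA applies to build the reduced semantic space, so it does not handle synonymy or polysemy. The component names, descriptions, and function names are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Represent a text as a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical component descriptions, e.g. taken from doc comments.
docs = {
    "ListSorter": "sort the elements of a list in ascending order",
    "FileCopier": "copy a file from source path to target path",
}
query = term_vector("sort a list")
ranked = sorted(
    docs,
    key=lambda name: cosine_similarity(query, term_vector(docs[name])),
    reverse=True,
)
print(ranked)
```

A query such as "order a sequence" would score poorly against ListSorter here even though it means nearly the same thing, which is the synonymy problem that motivates the reduced semantic space.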
48Comments
signature
Discourse model
User model
49Koders Enterprise (2004)
50M.A.R.A.C.A.T.U. Modern Architecture for
Retrieving All Components At The Universe (2005)
51Using Structural Context to Recommend Source
Code Examples
- Reid Holmes and Gail C. Murphy
University of British Columbia, Software Practices Lab
52The Problem A Concrete Example
- Frameworks can improve developer productivity. But developers can become stuck trying to use the APIs
- Imagine trying to use the Eclipse APIs to place text in the status line of the Eclipse IDE
- Eclipse has 38,000 public methods
53Using Structural Context to Recommend Source Code Examples
- Reid Holmes and Gail C. Murphy
Project Repository
Development Environment
54Strathcona Extract Structural Context
ViewPart
55Strathcona Example Navigation
- Visual representation
- Highlights key relationships between example and query
- Multiple examples can be quickly viewed
56Strathcona Viewing Example Source
- Code view
- Example shows how to get a status line manager
- Example is not a perfect match, but good enough
to help
57Conclusion
- Information Delivery
- Similarity Analyser
- Ranking Metrics
- Context
- Automatic Facet Classification
- Uncontrolled vocabulary additional terms
58References
- [McIlroy, 1968] M. D. McIlroy. Mass Produced Software Components. NATO Software Engineering Conference Report, Garmisch, Germany, October 1968, pp. 79-85.
- [Mili, 1998] A. Mili, R. Mili, R. T. Mittermeir. A survey of software reuse libraries. Annals of Software Engineering, Vol. 5, 1998, pp. 349-414.
- [Seacord, 1998] Robert C. Seacord, Scott A. Hissam, Kurt C. Wallnau. Agora: A Search Engine for Software Components. IEEE Internet Computing, Vol. 2, No. 6, pp. 62-70, November/December 1998.
- [Szyperski, 1999] C. Szyperski. Component Software: Beyond Object-Oriented Programming. Addison Wesley, 1999.
- [Dey, 2001] A. Dey. Understanding and Using Context. Personal Ubiquitous Comput. 5, 1 (Jan. 2001).
- [Greengrass, 2001] Ed Greengrass. Information retrieval: A survey. DOD Technical Report TR-R52-008-001, 2001.
- [Ye, 2001] Y. Ye and G. Fischer. Context-Aware Browsing of Large Component Repositories. In Proceedings of the 16th IEEE International Conference on Automated Software Engineering (ASE), November 26-29, 2001. IEEE Computer Society, Washington, DC, p. 99.
- [Ye, 2002] Y. Ye and G. Fischer. Information delivery in support of learning reusable software components on demand. In Proceedings of the 7th International Conference on Intelligent User Interfaces, California, USA.
- [Ye, 2002] Y. Ye and G. Fischer. Supporting Reuse by Delivering Task Relevant and Personalized Information. In Proceedings of the 24th International Conference on Software Engineering, pp. 513-523, Orlando, Florida, May 2002.
59Bibliography
- [Inoue, 2003] K. Inoue et al. Component Rank: Relative Significance Rank for Software Component Search. Proceedings of ICSE 2003.
- [Maxville, 2003] Valerie Maxville, Chiou Peng Lam, Jocelyn Armarego. Selecting Components: a Process for Context-Driven Evaluation. 10th Asia-Pacific Software Engineering Conference (APSEC'03), p. 456, 2003.
- [Maxville, 2004] Valerie Maxville, Jocelyn Armarego, Chiou Peng Lam. Intelligent Component Selection. 28th Annual International Computer Software and Applications Conference (COMPSAC'04), pp. 244-249, 2004.
- [Lucrédio, 2004] D. Lucrédio, E. S. Almeida, A. F. Prado. A Survey on Software Components Search and Retrieval. In the 30th IEEE EUROMICRO Conference, Component-Based Software Engineering Track, Rennes, France. IEEE Press, 2004.
- [Holmes, 2005] R. Holmes and G. C. Murphy. Using structural context to recommend source code examples. In Proceedings of the 27th International Conference on Software Engineering (ICSE '05), St. Louis, MO, USA, May 15-21, 2005.
60Imperfect technology in a working market is sustainable; perfect technology without any market will vanish (Szyperski, 1999)