Title: Building a rich ontology from AGROVOC
1 Building a rich ontology from AGROVOC
- Dagobert Soergel
- College of Information Studies, University of Maryland
- dsoergel_at_umd.edu, www.dsoergel.com
FAO Agricultural Ontology Server Workshop, Beijing, April 27 - 29, 2004
2 The problem
- AI and Semantic Web applications need full-fledged ontologies that support reasoning
- Constructing such ontologies is expensive
- While existing KOS do not provide the full set of precise concept relationships needed for reasoning, existing KOS, both large and small, represent much intellectual capital (KOS = Knowledge Organization System)
- How can this intellectual capital be put to use in constructing full-fledged ontologies?
- Specifically: From AGROVOC to a full-fledged Food and Agriculture Ontology
3 Some applications of a Food and Agriculture Ontology
- Advice on crops and crop management (fertilization, irrigation)
- Advice on pest management
- Tracking contaminants through the food chain
- Advice on safe food processing
- Computing nutrition labels
- Advice on healthy eating
- Improved searching
4 AGROVOC relationships compared with more differentiated relationships of a Food and Agriculture Ontology
5 AGROVOC | Food and Agriculture Ontology
Undifferentiated hierarchical relationships | Differentiated relationships
milk NT cow milk | milk <includesSpecific> cow milk
milk NT milk fat | milk <containsSubstance> milk fat
cows NT cow milk | cows <hasComponent> cow milk
Cheddar cheese BT cow milk | Cheddar cheese <madeFrom> cow milk
Rule 1: Part X <mayContainSubstance> Substance Y IF Animal W <hasComponent> Part X AND Animal W <ingests> Substance Y
Rule 2: Food Z <containsSubstance> Substance Y IF Food Z <madeFrom> Part X AND Part X <containsSubstance> Substance Y
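Rule 1 and Rule 2 can be tried out on a handful of relationship instances. The following is a minimal sketch, not the method of any particular reasoner: relationship instances are plain (subject, relationship, object) tuples, `apply_rules` is a hypothetical helper name, and two of the input facts (marked in the comments) are assumptions added only so that both rules fire.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relationship type, object)


def apply_rules(facts: Set[Triple]) -> Set[Triple]:
    """Apply Rule 1 and Rule 2 until no new relationship instance appears."""
    known = set(facts)
    while True:
        new: Set[Triple] = set()
        for subj, rel, obj in known:
            if rel == "hasComponent":
                # Rule 1: Animal W <hasComponent> Part X AND W <ingests> Y
                #         => Part X <mayContainSubstance> Y
                for s2, r2, y in known:
                    if s2 == subj and r2 == "ingests":
                        new.add((obj, "mayContainSubstance", y))
            elif rel == "madeFrom":
                # Rule 2: Food Z <madeFrom> Part X AND X <containsSubstance> Y
                #         => Food Z <containsSubstance> Y
                for s2, r2, y in known:
                    if s2 == obj and r2 == "containsSubstance":
                        new.add((subj, "containsSubstance", y))
        if new <= known:  # nothing new was derived: fixed point reached
            return known
        known |= new


facts = {
    ("cows", "hasComponent", "cow milk"),
    ("Cheddar cheese", "madeFrom", "cow milk"),
    ("cow milk", "containsSubstance", "milk fat"),  # assumed for illustration
    ("cows", "ingests", "contaminant C"),           # hypothetical fact
}
derived = apply_rules(facts) - facts
for triple in sorted(derived):
    print(triple)
```

With undifferentiated NT/BT links neither rule could be stated, because the rules depend on knowing which link is <hasComponent> and which is <madeFrom>.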
6 From AGROVOC to FA Ontology
- Define the FA Ontology structure
- Fill in values from AGROVOC to the extent possible
- Edit manually with computer assistance using the rules-as-you-go approach and an ontology editor
- make existing information more precise
- add new information
7 Define ontology structure: Overall model
8 [Diagram: overall model with three levels]
- Concept, with relationships between concepts; Relationship, with relationships between relationships; annotation relationship
- A Concept is designated by a Lexicalization / Term, with relationships between terms and other information (language/culture, subvocabulary/scope, audience type, etc.)
- A Lexicalization / Term is manifested as a String, with relationships between strings
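The three levels of the overall model (concept, lexicalization/term, string) can be pictured as a small data structure. This is only a sketch under assumptions: the class and field names, the identifier scheme, and the choice of language as the single piece of "other information" are all illustrative and not taken from the slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Term:
    """A lexicalization of a concept, manifested as a string."""
    string: str
    language: str  # one example of the "other information" attached to a term


@dataclass
class Concept:
    """A unit of meaning, designated by one or more terms."""
    concept_id: str  # hypothetical identifier scheme
    terms: List[Term] = field(default_factory=list)

    def label(self, language: str) -> str:
        """Return the first term string in the given language, else the id."""
        for term in self.terms:
            if term.language == language:
                return term.string
        return self.concept_id


@dataclass
class Relationship:
    """A typed link between two concepts (not between terms or strings)."""
    subject: Concept
    rel_type: str
    obj: Concept


milk = Concept("c_milk", [Term("milk", "en"), Term("lait", "fr")])
cow_milk = Concept("c_cow_milk", [Term("cow milk", "en")])
link = Relationship(milk, "includesSpecific", cow_milk)
print(link.subject.label("en"), link.rel_type, link.obj.label("en"))
```

Keeping relationships at the concept level means that a link such as milk <includesSpecific> cow milk is stated once, however many languages the terms are available in.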
9 Define ontology structure: Relationship types
10 Isa
Relationship | Inverse relationship
X <includesSpecific> Y | Y <isa> X
X <inheritsTo> Y | Y <inheritsFrom> X
11 Holonymy / meronymy (the generic whole-part relationship)
Relationship | Inverse relationship
X <containsSubstance> Y | Y <substanceContainedIn> X
X <hasIngredient> Y | Y <ingredientOf> X
X <madeFrom> Y | Y <usedToMake> X
X <yieldsPortion> Y | Y <portionOf> X
X <spatiallyIncludes> Y | Y <spatiallyIncludedIn> X
X <hasComponent> Y | Y <componentOf> X
X <includesSubprocess> Y | Y <subprocessOf> X
X <hasMember> Y | Y <memberOf> X
12 Further relationship examples
Relationship | Inverse relationship
X <causes> Y | Y <causedBy> X
X <instrumentFor> Y | Y <performedByInstrument> X
X <processFor> Y | Y <usesProcess> X
X <beneficialFor> Y | Y <benefitsFrom> X
X <treatmentFor> Y | Y <treatedWith> X
X <harmfulFor> Y | Y <harmedBy> X
X <hasPest> Y | Y <afflicts> X
X <growsIn> Y | Y <growthEnvironmentFor> X
X <hasProperty> Y | Y <propertyOf> X
X <hasSymptom> Y | Y <indicates> X
X <similarTo> Y | Y <similarTo> X
X <oppositeTo> Y | Y <oppositeTo> X
X <hasPhase> Y | Y <phaseOf> X
X <ingests> Y | Y <ingestedBy> X
X <madeFrom> Y | Y <usedToMake> X
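Because every relationship type listed above comes with an inverse, only one direction needs to be entered and the other can be generated. The sketch below shows one possible way to do that with a lookup table; `PAIRS` holds a few of the pairs from the tables above, and `invert` is a hypothetical helper name.

```python
from typing import Dict, Tuple

Triple = Tuple[str, str, str]  # (subject, relationship type, object)

# A few (relationship, inverse relationship) pairs copied from the tables.
PAIRS = [
    ("includesSpecific", "isa"),
    ("containsSubstance", "substanceContainedIn"),
    ("madeFrom", "usedToMake"),
    ("hasComponent", "componentOf"),
    ("hasPest", "afflicts"),
    ("ingests", "ingestedBy"),
    ("similarTo", "similarTo"),  # symmetric: its own inverse
]

# Register both directions so that either name can be looked up.
INVERSE: Dict[str, str] = {}
for forward, backward in PAIRS:
    INVERSE[forward] = backward
    INVERSE[backward] = forward


def invert(triple: Triple) -> Triple:
    """Return the same relationship instance stated from the other side."""
    subject, rel, obj = triple
    return (obj, INVERSE[rel], subject)


print(invert(("Cheddar cheese", "madeFrom", "cow milk")))
```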
13 Fill in values from AGROVOC
- Fill in values from AGROVOC to the extent possible
- Arrange in structured sequence (to the extent possible based on the information in AGROVOC) to facilitate editing. (The editor can deal with similar problems at the same time.)
14 Undifferentiated relationships from AGROVOC | Edited relationships
milk NT cow milk |
milk NT goat milk |
milk NT buffalo milk |
milk NT milk fat |
milk RT milk protein |
milk RT lactose |
cows RT cow milk |
goats RT goat milk |
ewes RT ewe milk |
goat milk RT goat cheese |
ewe milk RT ewe cheese |
acid soils BT chemical soil types |
acrisols BT genetic soil types |
alkaline soils BT chemical soil types |
alluvial soils BT lithological soil types |
chemical soil types BT soil types |
Cichorium BT Asteraceae |
Cichorium endivia BT Cichorium |
Cichorium intybus BT Cichorium |
Cichorium intybus RT coffee substitutes |
Cichorium intybus RT root vegetables |
blood NT blood protein |
blood NT blood lipids |
15 Edit manually with computer assistance
- Use the rules-as-you-go approach and good ontology editing software that handles large ontologies efficiently
- make existing information more precise
- add new information
- Assumption
- Entity types of concepts are known from AGROVOC or other sources (Langual, UMLS, WordNet), for example
- milk fat is a Substance
- Asteraceae is a taxon
- The editor may need to determine the entity type
16 The rules-as-you-go approach: Exploit patterns to automate the conversion process. Example
- 1. An editor has determined that
- milk NT cow milk should become milk <includesSpecific> cow milk
- She recognizes that this is an example of the general pattern milk NT * milk → milk <includesSpecific> * milk (where * is the wildcard character)
- Given this pattern, the system can derive automatically
- milk NT goat milk should become milk <includesSpecific> goat milk
- Result
17 Undifferentiated relationships from AGROVOC | Edited relationships
milk NT cow milk | milk <includesSpecific> cow milk
milk NT goat milk | milk <includesSpecific> goat milk
milk NT buffalo milk | milk <includesSpecific> buffalo milk
milk NT milk fat |
milk RT milk protein |
milk RT lactose |
cows RT cow milk |
goats RT goat milk |
ewes RT ewe milk |
goat milk RT goat cheese |
ewe milk RT ewe cheese |
acid soils BT chemical soil types |
acrisols BT genetic soil types |
alkaline soils BT chemical soil types |
alluvial soils BT lithological soil types |
chemical soil types BT soil types |
Cichorium BT Asteraceae |
Cichorium endivia BT Cichorium |
Cichorium intybus BT Cichorium |
Cichorium intybus RT coffee substitutes |
Cichorium intybus RT root vegetables |
blood NT blood protein |
blood NT blood lipids |
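The wildcard pattern from the example (milk NT * milk becomes milk <includesSpecific> * milk) can be applied mechanically. The following is a minimal sketch that assumes thesaurus statements are available as plain text lines; `compile_pattern` is a hypothetical helper, and translating the wildcard into a regular expression is just one possible implementation.

```python
import re
from typing import Callable, Optional


def compile_pattern(source: str, target: str) -> Callable[[str], Optional[str]]:
    """Turn a pattern pair with '*' wildcards into a conversion function."""
    # Escape the literal text, then let each '*' capture a run of characters.
    regex = re.compile("^" + re.escape(source).replace(r"\*", "(.+)") + "$")

    def convert(statement: str) -> Optional[str]:
        match = regex.match(statement)
        if match is None:
            return None  # pattern does not apply; leave for the editor
        result = target
        for captured in match.groups():
            result = result.replace("*", captured, 1)
        return result

    return convert


convert = compile_pattern("milk NT * milk", "milk <includesSpecific> * milk")
for line in ["milk NT cow milk", "milk NT goat milk", "milk NT milk fat"]:
    print(line, "->", convert(line))
```

The statement milk NT milk fat does not fit the pattern, so it is returned unconverted and stays on the editor's list.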
18 The rules-as-you-go approach: Exploit patterns to automate the conversion process
- 1. Editor: milk NT milk fat → milk <containsSubstance> milk fat
- Pattern: Substance NT/RT Substance → Substance <containsSubstance> Substance
- Therefore: milk RT milk protein → milk <containsSubstance> milk protein
- Result
19 Undifferentiated relationships from AGROVOC | Edited relationships
milk NT cow milk | milk <includesSpecific> cow milk
milk NT goat milk | milk <includesSpecific> goat milk
milk NT buffalo milk | milk <includesSpecific> buffalo milk
milk NT milk fat | milk <containsSubstance> milk fat
milk RT milk protein | milk <containsSubstance> milk protein
milk RT lactose | milk <containsSubstance> lactose
cows RT cow milk |
goats RT goat milk |
ewes RT ewe milk |
goat milk RT goat cheese | goat milk <containsSubstance> goat cheese
ewe milk RT ewe cheese | ewe milk <containsSubstance> ewe cheese
acid soils BT chemical soil types |
acrisols BT genetic soil types |
alkaline soils BT chemical soil types |
alluvial soils BT lithological soil types |
chemical soil types BT soil types |
Cichorium BT Asteraceae |
Cichorium endivia BT Cichorium |
Cichorium intybus BT Cichorium |
Cichorium intybus RT coffee substitutes |
Cichorium intybus RT root vegetables |
blood NT blood protein | blood <containsSubstance> blood protein
blood NT blood lipids | blood <containsSubstance> blood lipids
20 The rules-as-you-go approach: Exploit patterns to automate the conversion process
- 1. Editor
- cows RT cow milk → cows <hasComponent> cow milk
- Pattern: Animal RT BodyPart → Animal <hasComponent> BodyPart
- Therefore
- goats RT goat milk → goats <hasComponent> goat milk
- Result
21 Undifferentiated relationships from AGROVOC | Edited relationships
milk NT cow milk | milk <includesSpecific> cow milk
milk NT goat milk | milk <includesSpecific> goat milk
milk NT buffalo milk | milk <includesSpecific> buffalo milk
milk NT milk fat | milk <containsSubstance> milk fat
milk RT milk protein | milk <containsSubstance> milk protein
milk RT lactose | milk <containsSubstance> lactose
cows RT cow milk | cows <hasComponent> cow milk
goats RT goat milk | goats <hasComponent> goat milk
ewes RT ewe milk | ewes <hasComponent> ewe milk
goat milk RT goat cheese | goat milk <containsSubstance> goat cheese
ewe milk RT ewe cheese | ewe milk <containsSubstance> ewe cheese
acid soils BT chemical soil types |
acrisols BT genetic soil types |
alkaline soils BT chemical soil types |
alluvial soils BT lithological soil types |
chemical soil types BT soil types |
Cichorium BT Asteraceae |
Cichorium endivia BT Cichorium |
Cichorium intybus BT Cichorium |
Cichorium intybus RT coffee substitutes |
Cichorium intybus RT root vegetables |
blood NT blood protein | blood <containsSubstance> blood protein
blood NT blood lipids | blood <containsSubstance> blood lipids
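Patterns such as Substance NT/RT Substance and Animal RT BodyPart work on entity types rather than on the wording of the terms. A minimal sketch of such a typed conversion follows; the `ENTITY_TYPE` table is an assumption made for this example (in practice the types would come from AGROVOC or other sources), and `convert` is a hypothetical helper name.

```python
from typing import Dict, Optional, Tuple

# Assumed entity types for a few concepts, for illustration only.
ENTITY_TYPE: Dict[str, str] = {
    "milk": "Substance", "milk fat": "Substance",
    "milk protein": "Substance", "lactose": "Substance",
    "cows": "Animal", "goats": "Animal", "ewes": "Animal",
    "cow milk": "BodyPart", "goat milk": "BodyPart", "ewe milk": "BodyPart",
}

# (subject type, thesaurus relationship, object type) -> ontology relationship
PATTERNS: Dict[Tuple[str, str, str], str] = {
    ("Substance", "NT", "Substance"): "containsSubstance",
    ("Substance", "RT", "Substance"): "containsSubstance",
    ("Animal", "RT", "BodyPart"): "hasComponent",
}


def convert(subject: str, thesaurus_rel: str, obj: str) -> Optional[str]:
    """Apply a typed pattern; return None when no pattern covers the case."""
    subject_type = ENTITY_TYPE.get(subject, "")
    object_type = ENTITY_TYPE.get(obj, "")
    ontology_rel = PATTERNS.get((subject_type, thesaurus_rel, object_type))
    if ontology_rel is None:
        return None
    return f"{subject} <{ontology_rel}> {obj}"


print(convert("milk", "RT", "lactose"))
print(convert("ewes", "RT", "ewe milk"))
print(convert("milk", "NT", "cow milk"))  # no typed pattern applies here
```

How a concept is typed decides which pattern fires, which is why the entity types have to be known or determined by the editor before this step.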
22 The rules-as-you-go approach: Exploit patterns to automate the conversion process
- 1. Editor
- acid soils BT chemical soil types → acid soils <isa> chemical soil types
- Pattern: X BT type → X <isa> type
- Therefore
- acrisols BT genetic soil types → acrisols <isa> genetic soil types
- Result
23 Undifferentiated relationships from AGROVOC | Edited relationships
milk NT cow milk | milk <includesSpecific> cow milk
milk NT goat milk | milk <includesSpecific> goat milk
milk NT buffalo milk | milk <includesSpecific> buffalo milk
milk NT milk fat | milk <containsSubstance> milk fat
milk RT milk protein | milk <containsSubstance> milk protein
milk RT lactose | milk <containsSubstance> lactose
cows RT cow milk | cows <hasComponent> cow milk
goats RT goat milk | goats <hasComponent> goat milk
ewes RT ewe milk | ewes <hasComponent> ewe milk
goat milk RT goat cheese | goat milk <containsSubstance> goat cheese
ewe milk RT ewe cheese | ewe milk <containsSubstance> ewe cheese
acid soils BT chemical soil types | acid soils <isa> chemical soil types
acrisols BT genetic soil types | acrisols <isa> genetic soil types
alkaline soils BT chemical soil types | alkaline soils <isa> chemical soil types
alluvial soils BT lithological soil types | alluvial soils <isa> lithological soil types
chemical soil types BT soil types | chemical soil types <isa> soil types
Cichorium BT Asteraceae |
Cichorium endivia BT Cichorium |
Cichorium intybus BT Cichorium |
Cichorium intybus RT coffee substitutes |
Cichorium intybus RT root vegetables |
blood NT blood protein | blood <containsSubstance> blood protein
blood NT blood lipids | blood <containsSubstance> blood lipids
24 The rules-as-you-go approach: Exploit patterns to automate the conversion process
- 1. Editor: Cichorium BT Asteraceae → Cichorium <isa> Asteraceae
- Pattern: Taxon BT Taxon → Taxon <isa> Taxon
- Therefore
- Cichorium endivia BT Cichorium → Cichorium endivia <isa> Cichorium
- Result
25 Undifferentiated relationships from AGROVOC | Edited relationships
milk NT cow milk | milk <includesSpecific> cow milk
milk NT goat milk | milk <includesSpecific> goat milk
milk NT buffalo milk | milk <includesSpecific> buffalo milk
milk NT milk fat | milk <containsSubstance> milk fat
milk RT milk protein | milk <containsSubstance> milk protein
milk RT lactose | milk <containsSubstance> lactose
cows RT cow milk | cows <hasComponent> cow milk
goats RT goat milk | goats <hasComponent> goat milk
ewes RT ewe milk | ewes <hasComponent> ewe milk
goat milk RT goat cheese | goat milk <containsSubstance> goat cheese
ewe milk RT ewe cheese | ewe milk <containsSubstance> ewe cheese
acid soils BT chemical soil types | acid soils <isa> chemical soil types
acrisols BT genetic soil types | acrisols <isa> genetic soil types
alkaline soils BT chemical soil types | alkaline soils <isa> chemical soil types
alluvial soils BT lithological soil types | alluvial soils <isa> lithological soil types
chemical soil types BT soil types | chemical soil types <isa> soil types
Cichorium BT Asteraceae | Cichorium <isa> Asteraceae
Cichorium endivia BT Cichorium | Cichorium endivia <isa> Cichorium
Cichorium intybus BT Cichorium | Cichorium intybus <isa> Cichorium
Cichorium intybus RT coffee substitutes |
Cichorium intybus RT root vegetables |
blood NT blood protein | blood <containsSubstance> blood protein
blood NT blood lipids | blood <containsSubstance> blood lipids
26 The rules-as-you-go approach: Exploit patterns to automate the conversion process
- 1. Editor: Cichorium intybus RT coffee substitutes → Cichorium intybus <usedToMake> coffee substitutes
- Pattern: Taxon RT FoodProduct → Taxon <usedToMake> FoodProduct
- Therefore: Cichorium intybus RT root vegetables → Cichorium intybus <usedToMake> root vegetables
- Result
27 Undifferentiated relationships from AGROVOC | Edited relationships
milk NT cow milk | milk <includesSpecific> cow milk
milk NT goat milk | milk <includesSpecific> goat milk
milk NT buffalo milk | milk <includesSpecific> buffalo milk
milk NT milk fat | milk <containsSubstance> milk fat
milk RT milk protein | milk <containsSubstance> milk protein
milk RT lactose | milk <containsSubstance> lactose
cows RT cow milk | cows <hasComponent> cow milk
goats RT goat milk | goats <hasComponent> goat milk
ewes RT ewe milk | ewes <hasComponent> ewe milk
goat milk RT goat cheese | goat milk <containsSubstance> goat cheese
ewe milk RT ewe cheese | ewe milk <containsSubstance> ewe cheese
acid soils BT chemical soil types | acid soils <isa> chemical soil types
acrisols BT genetic soil types | acrisols <isa> genetic soil types
alkaline soils BT chemical soil types | alkaline soils <isa> chemical soil types
alluvial soils BT lithological soil types | alluvial soils <isa> lithological soil types
chemical soil types BT soil types | chemical soil types <isa> soil types
Cichorium BT Asteraceae | Cichorium <isa> Asteraceae
Cichorium endivia BT Cichorium | Cichorium endivia <isa> Cichorium
Cichorium intybus BT Cichorium | Cichorium intybus <isa> Cichorium
Cichorium intybus RT coffee substitutes | Cichorium intybus <usedToMake> coffee substitutes
Cichorium intybus RT root vegetables | Cichorium intybus <usedToMake> root vegetables
blood NT blood protein | blood <containsSubstance> blood protein
blood NT blood lipids | blood <containsSubstance> blood lipids
28 The rules-as-you-go approach: Discussion
- Main idea: Formulate constraints to assist the editor
- Ontology may have many relationship types, perhaps > 100
- Constraints limit the relationship types that are possible in a specific case; show the editor only these
- If the constraints limit possible relationship types to 1, conversion is automatic
- Constraints may depend on the thesaurus to be converted
29 Constraints
Thesaurus relationships | Possible ontology relationships
NT / BT | <hasMember> / <memberOf>; <includesSpecific> / <isa>; <hasComponent> / <componentOf>; <spatiallyIncludes> / <spatiallyIncludedIn>; etc.
RT | <similarTo> / <similarTo>; <growsIn> / <EnvironmentForGrowing>; <treatmentFor> / <treatedWith>; <hasMember> / <memberOf>; <hasComponent> / <componentOf>; <madeFrom> / <usedToMake>; etc.
30 Constraints
Thesaurus relationships (entity types or values) | Possible ontology relationships
milk NT * milk | milk <includesSpecific> * milk
Substance NT Substance | Substance <containsSubstance> Substance
X BT type | X <isa> type
Taxon BT Taxon | Taxon <isa> Taxon
GeogrEntity BT GeogrEntity | GeogrEntity <spatiallyIncludedIn> GeogrEntity
BodyPart BT BodyPart | BodyPart <isComponentOf> BodyPart
ChemSubstance BT ChemSubstance | ChemSubstance <isa> ChemSubstance
31 Constraints
Thesaurus relationships (entity types or values) | Possible ontology relationships
Substance RT Substance | Substance <containsSubstance> Substance; Substance <containedInSubstance> Substance; Substance <usedToMake> Substance; Substance <madeFrom> Substance
LivingOrganism RT BodyPart | LivingOrganism <hasComponent> BodyPart
Taxon RT FoodProduct | Taxon <usedToMake> FoodProduct
GeogrEntity RT GeogrGrouping | GeogrEntity <isMemberOf> GeogrGrouping
Process RT Object | Process <performedByInstrument> Object; Process <affects> Object
ChemSubstance RT Function | ChemSubstance <usedFor> Function
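The constraint tables can be read as a lookup from a thesaurus relationship plus entity types to the ontology relationship types that remain possible. The sketch below is one way this might be coded, using a few rows from the tables; `decide` and its three outcome labels are hypothetical names chosen for this example.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str, str]  # (subject type, thesaurus relationship, object type)

# A few rows taken from the constraint tables.
CONSTRAINTS: Dict[Key, List[str]] = {
    ("Substance", "RT", "Substance"): [
        "containsSubstance", "containedInSubstance", "usedToMake", "madeFrom",
    ],
    ("LivingOrganism", "RT", "BodyPart"): ["hasComponent"],
    ("Taxon", "RT", "FoodProduct"): ["usedToMake"],
    ("Process", "RT", "Object"): ["performedByInstrument", "affects"],
    ("Taxon", "BT", "Taxon"): ["isa"],
    ("GeogrEntity", "BT", "GeogrEntity"): ["spatiallyIncludedIn"],
}


def decide(subject_type: str, thesaurus_rel: str, object_type: str) -> Tuple[str, List[str]]:
    """Return how a case is handled and which relationship types apply."""
    candidates = CONSTRAINTS.get((subject_type, thesaurus_rel, object_type), [])
    if len(candidates) == 1:
        return ("automatic", candidates)      # only one type left: convert
    if candidates:
        return ("menu", candidates)           # show the editor only these
    return ("unconstrained", [])              # no constraint known yet


print(decide("Taxon", "BT", "Taxon"))
print(decide("Process", "RT", "Object"))
print(decide("Plant", "RT", "Environment"))
```

Keeping the constraints in a table of this kind, rather than in code, fits the observation that they may depend on the thesaurus being converted.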
32 Checking by editor
- Relationship instances created by the editor by selecting from a constraint-generated menu are final
- Relationship instances created automatically must be presented to the editor
- If the editor determines that the relationship instances are almost always correct, she checks a box "accept without checking"
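The checking step can be sketched as a simple routing decision: automatically created instances go to a review list unless the pattern that produced them has been marked "accept without checking". The names `route` and `trusted`, and the pattern labels in the sample data, are assumptions made for this illustration.

```python
from typing import List, Set, Tuple

Derived = Tuple[str, str]  # (name of the pattern used, derived relationship instance)


def route(derived: List[Derived], trusted: Set[str]) -> Tuple[List[str], List[str]]:
    """Split derived instances into accepted ones and ones needing review."""
    accepted: List[str] = []
    to_review: List[str] = []
    for pattern, instance in derived:
        if pattern in trusted:
            accepted.append(instance)   # box "accept without checking" is ticked
        else:
            to_review.append(instance)  # must be presented to the editor
    return accepted, to_review


derived = [
    ("Taxon BT Taxon", "Cichorium endivia <isa> Cichorium"),
    ("Taxon BT Taxon", "Cichorium intybus <isa> Cichorium"),
    ("Taxon RT FoodProduct", "Cichorium intybus <usedToMake> root vegetables"),
]
accepted, to_review = route(derived, trusted={"Taxon BT Taxon"})
print("accepted:", accepted)
print("to review:", to_review)
```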
33 Overall conversion process
- One master editor must go through the file from start to finish, processing the relationship instances and creating patterns, creating new relationship types as needed
- Assistant editors can apply the patterns.
- In the first pass, the master editor should deal with the easy cases.
- Deal with the remaining cases later. Groups of similar relationship instances can be seen more easily in a smaller set
34 Adding new relationship types and new relationship instances
- AGROVOC does not contain all relationship types or relationship instances for AI applications
- Need to add data. For example
- Organism X <hasPest> Organism Y
- ChemSubstance X <actsAgainst> Organism Y
- Organism X <actsAgainst> Organism Y
- Plant X <growsIn> Environment Y
- FoodProduct X <suitableFor> Diet Y
35 Conclusion
- The rules-as-you-go approach is a realistic method for developing a rich ontology from an existing thesaurus