A Ranking Algorithm for Semantic search engine - PowerPoint PPT Presentation

A Ranking Algorithm for Semantic search engine: spam and fake detection case study. By: Soheila Dehghanzadeh. Web technology lab weekly seminars.



1
A Ranking Algorithm for Semantic search engine
spam and fake detection case study
  • By Soheila Dehghanzadeh.
  • Web technology lab weekly seminars.

2
Agenda
  • Web spam definition
  • A brief overview of Search engines
  • Search engine phases
  • Crawling
  • Indexing
  • Index lookup
  • Ranking lookup results
  • My proposed ranking algorithm ?

3
Web spam and fake
  • In the web of data, anyone can say anything
    about anything.
  • Low-quality data should not appear in the top
    search results.

4
A Search Engine
5
A Search Engine
6
Web of data vs. web of documents
  • Web of documents (WODoc): no link types and no
    notion of trustworthiness (just popularity).
  • Web of data (WOData): ranking should consider
    link type and link context (for provenance and
    proof of trust).

7
Crawling and Indexing phase
  • Using LDSpider to crawl linked data.
  • Using a hexastore to fully index the crawled
    data. Special thanks to Panagiotis Karras for
    providing a hexastore implementation in Python.

8
Index lookup results for extension
  • Some results may not include the keyword, but
    they have high quality and relevance.
  • Result expansion hides the locality effect.
  • Some sites are referred to many times, but in
    this particular context the lookup results of
    other, professional sites are more interesting.
[Figure: search engine diagram with the labels Web of data, Crawler, Raw RDF, indexing, index, lookup, ranking, result]

9
HexaStore
  • The index structure that we use in our search
    engine.
  • Each RDF element type deserves its own index
    structure built around it.
  • Every possible ordering of the importance or
    precedence of the three elements in the indexing
    scheme is materialized.
  • Each index structure in a hexastore centers
    around one RDF element and defines a
    prioritization between the other two elements.
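
The six-ordering idea above can be sketched in a few lines of Python. This is only a toy in-memory illustration, not the implementation used in the talk: the class name `Hexastore`, its methods, and the `ex:` identifiers are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import permutations


class Hexastore:
    """Toy hexastore: one nested index per ordering of (s, p, o)."""

    # The six orderings: spo, sop, pso, pos, osp, ops.
    ORDERINGS = ["".join(p) for p in permutations("spo")]

    def __init__(self):
        # index[ordering][first][second] -> set of third elements
        self.index = {
            name: defaultdict(lambda: defaultdict(set))
            for name in self.ORDERINGS
        }

    def add(self, s, p, o):
        """Insert one triple into all six indexes."""
        triple = {"s": s, "p": p, "o": o}
        for name in self.ORDERINGS:
            a, b, c = (triple[k] for k in name)
            self.index[name][a][b].add(c)

    def lookup(self, ordering, first, second=None):
        """With `second`, return the sorted third elements;
        without it, return the sorted second-level keys."""
        level = self.index[ordering].get(first, {})
        if second is None:
            return sorted(level)
        return sorted(level.get(second, ()))


# Made-up triples, for illustration only.
store = Hexastore()
store.add("ex:soheila", "ex:studiedIn", "ex:FUM")
store.add("ex:soheila", "ex:workedAt", "ex:GAS")
store.add("ex:sally", "ex:workedAt", "ex:GAS")

print(store.lookup("spo", "ex:soheila"))            # properties of a subject
print(store.lookup("pos", "ex:workedAt", "ex:GAS"))  # subjects for (p, o)
```

Because every ordering is materialized, any query with one or two bound elements is answered by a direct dictionary access; the price is that each triple is stored six times.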

10
Sample spo indexing in a hexastore
[Figure: spo index in a hexastore. A subject S_i points to its list of properties P(i,1), P(i,2), ..., P(i,N_i); each property P(i,j) points to its list of objects O(ij,1), O(ij,2), ..., O(ij,k_ij).]
Space complexity Sposppsosopospo
11
My idea!
  • Import the base result set into Jena and extend it.
  • Extend the base set with ontology reasoning
    rules, so that extra resources and relations are
    added through reasoning.
  • The added resources
  • The added relations have no context, so their
    trustworthiness is an aggregation function on
    (x, y, rule) relations.
  • Resources will be added only through the sameAs
    predicate.
  • Resources will be ranked according to their
    relevance to the query terms (using OntoBroker,
    PageRank, ObjectRank, TripleRank, HITS, ...).
  • Query types:
  • Keyword query
  • Structured query
  • Ontology-based query (using an interface to get
    the query) - OntoBroker
  • Relations (properties) will be ranked according
    to their contexts (provenance), using relation
    ranking methods such as SemRank, or by looking at
    the contexts' PageRank.
  • Note that we first rank resources and then rank
    relations. However, it depends on the user query
    whether it is looking for relations or
    resources.

12
  • Lookup on quads for the keyword "Soheila":
  • Q1: (http://um.s11, givenname, Soheila, UM)
  • Q2: (http://NIOC/p25, fullname, Soheila Dehghan, NIOC)
  • Q3: (http://nigc-khrz/e66, firstname, Soheila, NIGC)
  • Q4: (http://fake/f4, name, Soheila, fake)

[Figure: graph around the quads Q1 to Q4. Node labels: Q1, Q2, Q3, Q4, Cheese, B.Gates, Scott, http://facebook/u122, http://isport/us122, http://linkedIn/u12. Edge labels, written as predicate(context): Buy(Spam), Meet(CNN), Dancewith(FK), SA(FB), SA(UM), SA(LI), SA(NIGC), SA(FK).]

13
Result set expansion methods
  • Step 1: use the sameAs predicate on the found
    quads to extend the result set to Q1, ..., Qr.
  • Index lookups:
  • (Q1(S), sameAs, ?, ?), ..., (Qr(S), sameAs, ?, ?)
  • (?, sameAs, Q1(S), ?), ..., (?, sameAs, Qr(S), ?)
  • {Q1, ..., Q4} → {Q1, ..., Q4, FBURI, LinkedInURI,
    isportURI} in our case.
  • Apply PageRank (PR) to the extended graph, in
    which each sameAs link is replaced with the PR
    weight of its context (to capture the
    trustworthiness of each context).
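
The two sameAs lookup patterns of step 1 can be sketched as follows. This is a hedged illustration under simple assumptions: quads are plain `(s, p, o, c)` tuples in a list, the expansion follows only one hop, and the function name `expand_same_as` and all identifiers in the sample data are invented for the example.

```python
SAME_AS = "owl:sameAs"


def expand_same_as(seeds, quads):
    """One-hop expansion of seed subjects along sameAs links.

    Returns the expanded set of resources and the list of
    (seed, added_resource, context) links that were followed, so that
    each link can later be weighted by the rank of its context.
    """
    seeds = set(seeds)
    expanded = set(seeds)
    links = []
    for s, p, o, c in quads:
        if p != SAME_AS:
            continue
        if s in seeds:   # pattern (Qi(S), sameAs, ?, ?)
            expanded.add(o)
            links.append((s, o, c))
        if o in seeds:   # pattern (?, sameAs, Qi(S), ?)
            expanded.add(s)
            links.append((o, s, c))
    return expanded, links


# Made-up quads, for illustration only.
quads = [
    ("um:s11", SAME_AS, "fb:u122", "UM"),
    ("li:u12", SAME_AS, "um:s11", "LI"),
    ("fake:f4", "ex:name", "Soheila", "fake"),
    ("fb:u122", SAME_AS, "other:z", "FB"),  # two hops away, not followed
]
expanded, links = expand_same_as({"um:s11", "fake:f4"}, quads)
print(sorted(expanded))
print(links)
```

Keeping the context of every followed link is what makes the later weighting step possible: a sameAs statement asserted by a low-ranked context can then contribute little to the rank of the resource it points to.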

14
Result set expansion methods
  • Step 2: look up all properties of Q1(s), ..., Qr(s):
  • (Q1(s), ?, ?, ?), (?, ?, Q1(s), ?)
  • (Qr(s), ?, ?, ?), (?, ?, Qr(s), ?)
  • Step 3: add inferred relations using the domain
    ontology (the context is composed of the ontology
    and the inference process).
  • Step 4: rank Q1, ..., Qr according to their
    TpageRank (computed online from the graph of
    step 1), and rank relations according to their
    context PageRank (computed offline by Google).
  • Note: contexts whose PR is lower than a
    threshold won't be shown; they may be spam or
    fake sites.
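
The thresholding described in the note can be illustrated with a small sketch. It assumes that the resource scores and the context PageRank values have already been computed elsewhere; the function name `rank_and_filter`, the threshold, and every number below are made up for the example.

```python
def rank_and_filter(results, resource_score, context_pr, threshold):
    """Drop results whose context ranks below `threshold`,
    then sort the remaining ones by resource score, best first.

    results: iterable of (resource, context) pairs.
    """
    kept = [
        (resource, context)
        for resource, context in results
        if context_pr.get(context, 0.0) >= threshold
    ]
    return sorted(
        kept,
        key=lambda pair: resource_score.get(pair[0], 0.0),
        reverse=True,
    )


# Hypothetical scores, for illustration only.
results = [
    ("um:s11", "UM"),
    ("nioc:p25", "NIOC"),
    ("nigc:e66", "NIGC"),
    ("fake:f4", "fake"),
]
resource_score = {"um:s11": 0.4, "nioc:p25": 0.5, "nigc:e66": 0.1, "fake:f4": 0.9}
context_pr = {"UM": 0.30, "NIOC": 0.25, "NIGC": 0.20, "fake": 0.01}

ranked = rank_and_filter(results, resource_score, context_pr, threshold=0.05)
print(ranked)
```

In this toy data the resource from the `fake` context has the highest resource score, yet it is removed because its context falls below the threshold, which is the behavior the note asks for.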

15
Structured query on quads indexes
  • Single pivot:
  • (s,?,?,?), (?,p,?,?), (?,?,o,?), (?,?,?,c)
  • Double pivot:
  • (s,p,?,?), (s,?,o,?), (s,?,?,c), (?,p,o,?), (?,p,?,c),
    (?,?,o,c)
  • Triple pivot:
  • (s,p,o,?), (s,p,?,c), (s,?,o,c), (?,p,o,c)
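
The pivot patterns above can be sketched as a simple wildcard match over quads. This is a minimal illustration that scans a list linearly, whereas a real engine would pick a matching index instead; `None` stands in for `?`, and the function names and the shortened identifiers in the sample quads are assumptions.

```python
def match(pattern, quads):
    """Return the quads that agree with `pattern` on every bound position.

    `pattern` is an (s, p, o, c) tuple in which None plays the role of '?'.
    """
    return [
        quad
        for quad in quads
        if all(want is None or want == got for want, got in zip(pattern, quad))
    ]


def pivot_count(pattern):
    """Number of bound positions: 1 = single, 2 = double, 3 = triple pivot."""
    return sum(term is not None for term in pattern)


# Sample quads with shortened, illustrative identifiers.
quads = [
    ("um:s11", "givenname", "Soheila", "UM"),
    ("nioc:p25", "fullname", "Soheila Dehghan", "NIOC"),
    ("nigc:e66", "firstname", "Soheila", "NIGC"),
    ("fake:f4", "name", "Soheila", "fake"),
]

single = match((None, None, "Soheila", None), quads)         # (?,?,o,?)
double = match(("um:s11", "givenname", None, None), quads)   # (s,p,?,?)
triple = match((None, "name", "Soheila", "fake"), quads)     # (?,p,o,c)
print(len(single), len(double), len(triple))
```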

16
  • Step 1: if the specified parts are URIs, the
    search engine performs a direct lookup.
    Otherwise, if the user has specified keywords
    for the parts, a keyword search is done first,
    and then a lookup is performed for each
    resulting URI.
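
The two lookup paths of this step can be sketched as below. It is a simplified illustration limited to the subject position; the URI test, the substring-based keyword search, the function names, and the `example.org` data are all assumptions made for the example.

```python
def is_uri(term):
    """Crude URI test, good enough for this sketch."""
    return term.startswith(("http://", "https://"))


def keyword_search(keyword, quads):
    """Subjects that have an object containing the keyword (case-insensitive)."""
    needle = keyword.lower()
    return sorted({s for s, _, o, _ in quads if needle in o.lower()})


def lookup_subject(term, quads):
    """Direct lookup for a URI; keyword search followed by lookups otherwise."""
    uris = [term] if is_uri(term) else keyword_search(term, quads)
    return [quad for quad in quads if quad[0] in uris]


# Made-up quads, for illustration only.
quads = [
    ("http://example.org/um/s11", "givenname", "Soheila", "UM"),
    ("http://example.org/um/s11", "studiedIn", "http://example.org/FUM", "UM"),
    ("http://example.org/nioc/p25", "fullname", "Soheila Dehghan", "NIOC"),
    ("http://example.org/x/1", "name", "Scott", "X"),
]

by_uri = lookup_subject("http://example.org/um/s11", quads)
by_keyword = lookup_subject("soheila", quads)
print(len(by_uri), len(by_keyword))
```

Note that the keyword path returns every quad of each matching subject, including quads that do not contain the keyword themselves.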

17
Lookup on quads for ontological queries
18
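[Figure: example graph for ontological queries. Labels, each followed by its context in parentheses: DehghanZadeh (GAS), Sally (NIOC), Soheila (FUM), Kahani (FUM), GAS, NIOC Team, FUM, Worked at (GAS), played in (NIOC), Studied in (FUM), Supervisor (FUM), owl:sameAs (FUM), owl:sameAs (NIOC).]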
19
Related works for ranking web of data
  • ObjectRank
  • DING
  • Sindice TF-IDF
  • EntityRank
  • SemRank
  • ReConRank
  • OntoBroker

20
Proof of trust?
  • Jena's inference explanations will be used as a
    proof of trust.

21
Evaluation
  • Compare Spam ranks
  • Compare query time
  • Compare index size

22
Any questions?
23
The best things in life are free. Thanks for
your attention.