Title: SoQL


1
SoQL: A Language for Querying and Creating Data
in Social Networks
  • Royi Ronen and Oded Shmueli
  • Technion, Israel Institute of Technology

March 29th, 2009, M3SN, Shanghai, China
2
Introduction
  • As social networks become popular
  • A lot of data
  • Many participants
  • Many connections
  • Sizable participant record
  • Proliferation to business and organizational
    cultures
  • Many querying scenarios which can benefit from a
    domain-specific language
  • SoQL is proposed as a step in this direction

3
Example
[Figure: social network graph with seven nodes (Alice, Bob, Charlie, Dave, Eve, Frank, Gloria) and nine edges numbered 1 to 9]

TF(id, weight)
id  weight
1   0.7
2   0.6
3   0.4
4   0.4
5   0.3
6   0.9
7   0.85
8   0.85
9   0.5

TN(name, company, e-mail, position, experience)
name     company  e-mail            position    experience
Alice    HAL      alice_at_hal.com  Manager     4
Bob      ACME     bob_at_acme.net   Manager     3
Charlie  CIA      cha_at_cia.gov    Engineer    2
Dave     MTV      dave_at_mtv.com   Teacher     5
Eve      ACME     eve_at_acme.net   Scientist   6
Frank    HAL      fr_at_hal.com     Technician  7
Gloria   ABC      glor_at_abc.org   Producer    6
4
Bob's Information Needs
  • Bob works for ACME and is looking for a job at
    HAL
  • Bob is looking for a path that connects him to a
    manager at HAL and that, in addition,
  • is at most 4 nodes long, and
  • does not contain any participant, other than Bob,
    who works for ACME
  • Results are to be ordered by the product of the
    weights along the path, excluding the first
    edge
  • Higher-quality social paths

5
Bob's query
  • SELECT COUNT(PATH.nodes.*), PATH
  • FROM PATH (Bob TO X AS P1 TO Y AS P2)
  • WHERE Y.company = 'HAL' and
  • Y.position = 'manager' and
  • ATMOST 0 IN P2.nodes SATISFY (company = 'ACME')
    and
  • COUNT(P1.nodes.*) = 2 and
  • COUNT(PATH.nodes.*) <= 4
  • ORDER BY MULT(P2.edges.weight)

Callouts: The Path; Conditions on attributes; Path Predicates; Aggregation Path predicates
6
Result
  • 4 (Bob, Dave, Gloria, Alice)
  • 3 (Bob, Charlie, Alice)
  • 4 (Bob, Dave, Eve, Alice)
  • Multiplication values are 0.765, 0.3, and 0.16, respectively
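
As a rough illustration of how Bob's query could be evaluated, the following Python sketch enumerates simple paths from Bob and applies the query's conditions. It is not the authors' implementation. The node attributes come from the TN table and the weights from TF, but the edge endpoints are hypothetical, since the graph figure is not recoverable from this transcript. Descending order for ORDER BY and the case-insensitive match on position are also assumptions. Because the table lists Eve at ACME, the sketch rejects the path through Eve, so its output need not match the Result slide line for line.

```python
from math import prod

# (company, position) per participant, taken from the TN table.
NODES = {
    "Alice": ("HAL", "Manager"), "Bob": ("ACME", "Manager"),
    "Charlie": ("CIA", "Engineer"), "Dave": ("MTV", "Teacher"),
    "Eve": ("ACME", "Scientist"), "Frank": ("HAL", "Technician"),
    "Gloria": ("ABC", "Producer"),
}

# ASSUMPTION: the endpoints below are hypothetical; only the weights
# come from the TF table.
EDGES = {
    ("Bob", "Charlie"): 0.7, ("Bob", "Dave"): 0.6,
    ("Dave", "Eve"): 0.4, ("Eve", "Alice"): 0.4,
    ("Charlie", "Alice"): 0.3, ("Dave", "Gloria"): 0.9,
    ("Gloria", "Alice"): 0.85, ("Gloria", "Frank"): 0.85,
    ("Frank", "Eve"): 0.5,
}

def weight(u, v):
    """Weight of the undirected edge between u and v."""
    return EDGES.get((u, v), EDGES.get((v, u)))

def neighbors(u):
    """Yield every node that shares an edge with u."""
    for a, b in EDGES:
        if a == u:
            yield b
        elif b == u:
            yield a

def simple_paths(start, max_nodes):
    """Yield every path of distinct nodes from start, up to max_nodes long."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        yield path
        if len(path) < max_nodes:
            for n in neighbors(path[-1]):
                if n not in path:
                    stack.append(path + [n])

def bobs_query():
    results = []
    for path in simple_paths("Bob", 4):       # COUNT(PATH.nodes.*) <= 4
        if len(path) < 2:                      # P1 = Bob TO X needs 2 nodes
            continue
        p2 = path[1:]                          # P2 runs from X to Y
        company, position = NODES[path[-1]]    # attributes of Y
        if company != "HAL" or position.lower() != "manager":
            continue
        # ATMOST 0 IN P2.nodes SATISFY (company = 'ACME')
        if any(NODES[n][0] == "ACME" for n in p2):
            continue
        # MULT(P2.edges.weight): product of weights, first edge excluded
        score = prod(weight(a, b) for a, b in zip(p2, p2[1:]))
        results.append((len(path), tuple(path), score))
    # ASSUMPTION: larger products rank first.
    return sorted(results, key=lambda r: r[2], reverse=True)

for count, path, score in bobs_query():
    print(count, path, round(score, 3))
```

Exhaustive enumeration is only workable on a toy graph like this one; the Implementation Issues and Finding paths slides discuss why bounds and approximation are needed at scale.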

7
Model
  • Undirected graph
  • Reciprocal friends model
  • Nodes and edges have attributes
  • New Data Types
  • Path: an ordered set of distinct nodes in which every
    two successive nodes are connected
  • Group: a set of nodes

8
Model
  • Results are finite
  • Social networks are constantly growing
  • But finite at any point

9
Aggregation over Path/Group
  • Aggregation over path/group is possible
  • E.g., the number of nodes in path P1
  • SELECT COUNT(*) FROM P1.nodes
  • Or, as in the previous example
  • MULT(P2.edges.weight)
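
The two aggregates named above can be mimicked in a few lines of Python. This is only a sketch: the function names `count` and `mult` are illustrative, and the weights 0.9 and 0.85 are taken from the TF table on the assumption that they lie on the path (Dave, Gloria, Alice), since their product matches the 0.765 reported on the Result slide.

```python
from math import prod

def count(items):
    """COUNT over a path's nodes or edges: the number of elements."""
    return len(items)

def mult(weights):
    """MULT over edge weights: the product of all values."""
    return prod(weights)

# ASSUMPTION: P2 = (Dave, Gloria, Alice) with edge weights 0.9 and 0.85.
p2_nodes = ["Dave", "Gloria", "Alice"]
p2_edge_weights = [0.9, 0.85]

print(count(p2_nodes))                   # number of nodes in P2
print(round(mult(p2_edge_weights), 3))   # product of the edge weights
```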

10
Path Predicates
  • ALL SATISFY (condition)
  • ATMOST n
  • ATLEAST n
  • ALL EXCEPT UPTO n
  • MAJORITY
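
One plausible reading of these predicates is as counting quantifiers over the nodes of a path. The Python sketch below follows that reading. The function names are illustrative, and treating MAJORITY as "strictly more than half" is an assumption, as the slides do not define it.

```python
def count_satisfying(nodes, condition):
    """Number of nodes for which condition(node) is true."""
    return sum(1 for n in nodes if condition(n))

def all_satisfy(nodes, condition):
    """ALL ... SATISFY (condition)"""
    return count_satisfying(nodes, condition) == len(nodes)

def atmost(n, nodes, condition):
    """ATMOST n ... SATISFY (condition)"""
    return count_satisfying(nodes, condition) <= n

def atleast(n, nodes, condition):
    """ATLEAST n ... SATISFY (condition)"""
    return count_satisfying(nodes, condition) >= n

def all_except_upto(n, nodes, condition):
    """ALL EXCEPT UPTO n: at most n nodes fail the condition."""
    return len(nodes) - count_satisfying(nodes, condition) <= n

def majority(nodes, condition):
    """MAJORITY (ASSUMPTION: strictly more than half of the nodes)."""
    return count_satisfying(nodes, condition) > len(nodes) / 2

# Sample path nodes; companies are taken from the TN table.
P2_NODES = [
    {"name": "Dave", "company": "MTV"},
    {"name": "Gloria", "company": "ABC"},
    {"name": "Alice", "company": "HAL"},
]

def works_for_acme(node):
    return node["company"] == "ACME"

# ATMOST 0 IN P2.nodes SATISFY (company = 'ACME')
print(atmost(0, P2_NODES, works_for_acme))
```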

11
Another information need
  • Bob would like to find a group such that
  • The group contains Bob and three others
  • There exists a path of up to three edges from Bob
    to each of the three
  • There exists a path of up to two edges between
    every two of the three
  • All three have experience > 5

12
SELECT FROM GROUP
SELECT GROUP
FROM GROUP (Bob AS G1, DISTINCT(X,Y,Z) AS G2)
WITH PATH (Bob TO X AS P1), PATH (Bob TO Y AS P2),
     PATH (Bob TO Z AS P3)
WHERE COUNT(P1.edges.*) <= 3 and COUNT(P2.edges.*) <= 3
  and COUNT(P3.edges.*) <= 3
  and ALL IN G2.nodes SATISFY (experience > 5)
  and ALL SUBGROUPS(U,V) IN G2 SATISFY (PATH(U TO V AS P4)
      COUNT(P4.edges.*) <= 2)
  and COUNT(GROUP.nodes.*) < 5
Callouts: Group with Paths; IN; Aggregation on paths; Group Predicate; Subgroups
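
The group query can be approximated in Python with breadth-first search, because a path of at most k edges exists between two nodes exactly when their hop distance is at most k. This is a minimal sketch, not the authors' evaluation strategy. The experience values come from the TN table, while the adjacency lists are hypothetical, since the edge endpoints are not recoverable from this transcript.

```python
from collections import deque
from itertools import combinations
from math import inf

# Experience per participant, taken from the TN table.
EXPERIENCE = {"Alice": 4, "Bob": 3, "Charlie": 2, "Dave": 5,
              "Eve": 6, "Frank": 7, "Gloria": 6}

# ASSUMPTION: hypothetical undirected adjacency lists.
ADJ = {
    "Bob": ["Charlie", "Dave"],
    "Charlie": ["Bob", "Alice"],
    "Dave": ["Bob", "Eve", "Gloria"],
    "Eve": ["Dave", "Alice", "Frank"],
    "Alice": ["Eve", "Charlie", "Gloria"],
    "Gloria": ["Dave", "Alice", "Frank"],
    "Frank": ["Gloria", "Eve"],
}

def hops(src):
    """Breadth-first search: fewest edges from src to each reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in ADJ[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def find_groups(anchor="Bob"):
    dist = {n: hops(n) for n in ADJ}
    # Members need experience > 5 and a path of up to 3 edges from the anchor.
    candidates = [n for n in ADJ
                  if n != anchor
                  and EXPERIENCE[n] > 5
                  and dist[anchor].get(n, inf) <= 3]
    groups = []
    for trio in combinations(sorted(candidates), 3):
        # ALL SUBGROUPS(U,V): every pair is joined by a path of up to 2 edges.
        if all(dist[u].get(v, inf) <= 2 for u, v in combinations(trio, 2)):
            groups.append((anchor,) + trio)
    return groups

print(find_groups())
```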
13
Group Predicates
  • Group predicates refer to either
  • nodes in a group or
  • paths involving members of the group
  • When referring to nodes, operators are the same
    as for paths
  • ALL IN G2.nodes SATISFY (experience > 5)
  • When referring to paths, as in
  • ALL SUBGROUPS(U,V) IN G2
    SATISFY (PATH(U TO V AS P4)
    COUNT(P4.edges.*) <= 2)
  • operators are
  • ALL SUBGROUPS,
  • ATLEAST n SUBGROUPS,
  • ALL EXCEPT UPTO n SUBGROUPS,
  • MAJORITY SUBGROUPS

14
CONNECT
  • Let R be a one-column relation of paths
  • The paths are used for an automated process of
    referral intended to create a connection to the
    last node in the path

CONNECT USING PATH FROM R WHERE TIMEOUT=36,
ATTEMPTS=5, PARALLEL=2, HISTORY=true
15
CONNECT
  • Let R be a one-column relation of groups
  • An automated process will attempt to either
  • form a group, as in, e.g., Facebook, or
  • create an edge between each pair in the group

CONNECT GROUP FROM R WHERE TIMEOUT=48,
ATTEMPTS=1, PARALLEL=1
16
Implementation Issues
  • Path/Group sizes are not necessarily predefined
    or known a priori
  • Deployment parameters needed
  • Maximum tuples in a result (Google's 1k)
  • Maximal length of any path
  • Maximal size of any group
  • Time Limit

17
Finding paths
  • Top-k self joins can be used to avoid large
    intermediate results
  • With, e.g., distributed data, random walks can be
    used to extract candidate paths for the
    result
  • At any point, if the path cannot satisfy the
    query, the walk aborts
  • Many walking agents can provide a good
    approximation
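
The random-walk idea can be sketched in Python as follows. Each walking agent extends a path of distinct nodes at random, aborts as soon as the partial path can no longer satisfy the query, and reports the path if it is accepted. The function and parameter names, the hypothetical adjacency lists, and the choice to stop a walk once its path is accepted are all assumptions made for illustration. The slides do not specify the algorithm at this level of detail.

```python
import random

# ASSUMPTION: hypothetical undirected adjacency lists.
ADJ = {
    "Bob": ["Charlie", "Dave"],
    "Charlie": ["Bob", "Alice"],
    "Dave": ["Bob", "Eve", "Gloria"],
    "Eve": ["Dave", "Alice", "Frank"],
    "Alice": ["Eve", "Charlie", "Gloria"],
    "Gloria": ["Dave", "Alice", "Frank"],
    "Frank": ["Gloria", "Eve"],
}

# Company per participant, taken from the TN table.
COMPANY = {"Alice": "HAL", "Bob": "ACME", "Charlie": "CIA", "Dave": "MTV",
           "Eve": "ACME", "Frank": "HAL", "Gloria": "ABC"}

def random_walk_paths(adj, start, accept, can_still_satisfy,
                      max_nodes, walkers, seed=0):
    """Collect accepted paths found by `walkers` independent random walks."""
    rng = random.Random(seed)   # seeded so that runs are repeatable
    found = set()
    for _ in range(walkers):
        path = [start]
        while True:
            if accept(path):
                found.add(tuple(path))
                break
            if len(path) >= max_nodes:
                break           # length bound reached
            options = [n for n in adj[path[-1]] if n not in path]
            if not options:
                break           # dead end: every neighbour already visited
            path.append(rng.choice(options))
            if not can_still_satisfy(path):
                break           # the path can no longer satisfy the query
    return found

def ends_at_alice(path):
    return len(path) >= 2 and path[-1] == "Alice"

def no_acme_after_start(path):
    return all(COMPANY[n] != "ACME" for n in path[1:])

candidates = random_walk_paths(ADJ, "Bob", ends_at_alice,
                               no_acme_after_start, max_nodes=4, walkers=200)
print(sorted(candidates))
```

More walkers raise the chance that every qualifying path is sampled, at the cost of repeated work. The result is a set of candidates rather than a guaranteed complete answer.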

18
Conclusions
  • SoQL is a domain-specific, SQL-like query
    language for the social networks domain
  • Creation of data is possible using the Path and
    the Group data types
  • Future work
  • More expressive predicates, e.g., disjointness of
    two paths
  • Implementation
  • Advanced, optimized evaluation techniques for
    centralized and distributed environments

19
Thank You