Title: Query Processor
1Query Processor
- A query processor is a module in the DBMS that
performs the tasks to process, to optimize, and
to generate execution strategy for a high-level
query - For a DDBMS, the QP also does data localization
for the query based on the fragmentation scheme
and generates the execution strategy that
incorporates the communication operations
involved in processing the query
2Query Optimizer
- Queries expressed in SQL can have multiple
equivalent relational algebra query expressions - The distributed query optimizer must select the
ordering of relational algebra operations, sites
to process data, and possibly the way data should
be transferred. This makes distributed query
processing significantly more difficult
3Complexity of Relational Algebra Operations
- The relational algebra is used to express the
output of the query. The complexity of relational
algebra operations play a role in defining some
of the principles of query optimization. All
complexity measures are based on the cardinality
of the relation - Operations Complexity
- Select, Project (w/o duplicate elimination) O(n)
- Project (with duplicate elimination), Group O(n
logn) - Join, Semi-join, Division, Set Operators O(n
logn) - Cartesian Product O(n2 )
-
4Characteristics of Query Processors
- Languages
- Input language can be relational algebra or
calculus output language is relational algebra
(annotated with communication primitives). The
query processor must efficiently map input
language to output language - Types of Optimization
- The output language specification represents the
execution strategy. There can be many such
strategies, the best one can be selected through
exhaustive search, or by applying heuristic
(minimize size of intermediate relations). For
distributed databases semijoins can be applied
to reduce data transfer.
5When to Optimize
- Static done before executing the query (at
compilation time), cost of optimization amortized
over multiple executions, mostly based on
exhaustive search. Since sizes of intermediate
relations need to be estimated, it can result in
sub-optimal strategies. - Dynamic done at run time every time the query
is executed, can make use of exact sizes of
intermediate relations, expensive, based on
heuristics - Hybrid mixes static and dynamic approaches the
approach is mainly static, but dynamic query
optimization may take place when high difference
between predicted and actual sizes are detected
6Characteristics of Query Processors
- Statistics
- fragment cardinality and size
- size and number of distinct values for each
attribute. detailed histograms of attribute
values for better selectivity estimation. - Decision Sites
- one site or several sites participate in
selection of strategy - Exploitation of network topology
- wide area network communication cost
- local area network parallel execution
7Characteristics of Query Processors
- Exploitation of replicated fragments
- larger number of possible strategies
- Use of Semijoins
- reduce size of data transfer
- increase of messages and local processing
- good for fast or slow networks?
8Layers of Query Processing
Calculus Query on Distributed Relations
QUERY DECOMPOSITION
CONTROL SITE
Algebra Query on Distributed Relations
DATA LOCALIZATION
Fragment Query
GLOBAL OPTIMIZATION
Optimized Fragment Query With Communication
Operations
LOCAL SITE
LOCAL OPTIMIZATION
Optimized Local Queries
9Query Decomposition
- Normalization
- The calculus query is written in a normalized
form (CNF or DNF) for subsequent manipulation - Analysis
- The query is analyzed for semantic correctness
- Simplification
- Redundant predicates are eliminated to obtain
simplified queries - Restructuring
- The calculus query is translated to optimal
algebraic query representation
10Query Decomposition Normalization
- Lexical and syntactic analysis
- check validity
- check for attributes and relations
- type checking on the qualification
- There are two possible forms of representing the
predicates in query qualification Conjunctive
Normal Form (CNF) or Disjunctive Normal Form
(DNF) - CNF (p11 ? p12 ?... ? p1n) ? ... ? (pm1 ? pm2
?... ? pmn) - DNF (p11 ? p12 ?... ? p1n) ? ... ? (pm1 ? pm2
?... ? pmn) - OR's mapped into union
- AND's mapped into join or selection
11Query Decomposition Analysis
- Queries are rejected because
- the attributes or relations are not defined in
the global schema or - operations used in qualifiers are semantically
incorrect - For only those queries that do not use
disjunction or negation semantic correctness can
be determined by using query graph - One node of the query graph represents result
sites, others operand relations, edge between
nodes operand nodes represent joins, and edge
between operand node and result node represents
project
12Query Graph and Join Graph
SELECT Ename, Resp FROM E, G, J WHERE E. ENo
G. ENO AND G.JNO J.JNO AND JNAME CAD''
AND DUR gt 36 AND Title Prog''
E. ENo G. ENO
G.JNO J.JNO
Ename
Resp
JNAME CAD''
Title Prog''
E. ENo G. ENO
G.JNO J.JNO
DUR gt 36
13Disconnected Query Graph
- Semantically incorrect conjunctive multivariable
query without negation have query graphs which
are not connected
SELECT Ename, Resp FROM E, G, J WHERE E. ENo
G. ENO AND JNAME CAD'' AND DUR gt 36 AND
Title Prog''
Ename
Resp
Title Prog''
E. ENo G. ENO
DUR gt 36
JNAME CAD''
14Simplification Eliminating Redundancy
- Elimination of redundant predicates using well
known idempotency rules - p ? p p p ? p p p ? true true
- p ? false p p ? true p p ? false
false - p1 ? (p1 ? p 2 ) p1
- p1 ? (p1 ? p 2 ) p1
- Such redundant predicates arise when user query
is enriched with several predicates to
incorporate view relation correspondence, and
ensure semantic integrity and security
15Eliminating Redundancy-- An Example
SELECT TITLE FROM E WHERE (NOT (TITLE
Programmer'') AND (TITLE Programmer'' OR
TITLE Elec.Engr'') AND NOT (TITLE
Elec.Engr'')) OR ENAME J.Doe''
SELECT TITLE FROM E WHERE ENAME J.Doe''
16Eliminating Redundancy-- An Example
p1 ltTITLE Programmer''gt p2 ltTITLE
Elec. Engr''gt p3 ltENAME J.Doe''gt
Let the query qualification is ( p1 ? (p1 ?
p2) ? p2) ? p3
The disjunctive normal form of the query is (
p1 ? p1 ? p2) ? ( p1 ? p2 ? p2) ? p3
(false ? p2) ? ( p1 ? false) Ú p3 false ?
false ? p3 p3
17Query Decomposition Rewriting
- Rewriting calculus query in relational algebra
- straightforward transformation from relational
calculus to relational algebra, and - restructuring relational algebra expression to
improve performance
18Rewriting -- Transformation Rules (I)
- Commutativity of binary operations
- R ? S ? S ? R
- R ? S ? S ? R
- Associativity of binary operations
- (R ? S) ? T ? R ? ( S ? T )
- Idempotence of unary operations grouping of
projections and selections - ?A ( ? A (R )) ? ? A (R ) for A?A? A
- ?p1(A1) ( ? p2(A2) (R )) ? ? p1(A1) ?p2(A2) (R )
19Rewriting -- Transformation Rules (II)
- Commuting selection with projection
- ?A1, , An ( ? p (Ap) (R )) ? ?A1, , An ( ? p
(Ap) ( ?A1, , An, Ap(R ))) - Commuting selection with binary operations
- ? p (Ai)(R ? S) ? (? p (Ai)(R)) ? S
- ? p (Ai)(R S) ? (? p (Ai)(R)) S
- ? p (Ai)(R ? S) ? ? p (Ai)(R) ? ? p (Ai)(S)
- Commuting projection with binary operations
- ?C(R ? S) ? ?A(R) ? ?B (S) C A ? B
- ?C(R S) ? ?C(R) ?C (S)
- ?C (R ? S) ? ?C (R) ? ?C (S)
20An SQL Query and Its Query Tree
?ENAME
?(ENAMEltgtJ.DOE )?(JNAMECAD/CAM )? (Dur12 ?
Dur24)
SELECT Ename FROM J, G, E WHERE G.EnoE.ENo
AND G.JNo J.JNo AND ENAME ltgt J.Doe'
AND JName CAD' AND (Dur12 or
Dur24)
PROJ
ASG
EMP
21Query Decomposition Rewriting
?ENAME
? JNO, ENAME
? JNO
?JNAMECAD/CAM
? JNO, ENO
? ENO, ENAME
?Dur12 ? Dur24
?ENAMEltgtJ.DOE
PROJ
ASG
EMP
22Data Localization
- Input Algebraic query on distributed relations
- Determine which fragments are involved
- Localization program
- substitute for each global query its
materialization program - optimize
23Data Localization-- An Example
EMP is fragmented into EMP1 ?ENO? E3
(EMP) EMP2 ? E3 lt ENO? E6 (EMP) EMP3 ?ENO
gtE6 (EMP)
?ENAME
?Dur12 ? Dur24
?ENAMEltgtJ.DOE
ASG is fragmented into ASG1 ?ENO? E3
(ASG) ASG2 ?ENO gtE3 (ASG)
?JNAMECAD/CAM
PROJ
?
?
ASG1
ASG1
ASG2
EMP1
EMP1
EMP1
24Reduction with Selection
EMP is fragmented into EMP1 ?ENO? E3
(EMP) EMP2 ? E3 lt ENO? E6 (EMP) EMP3 ?ENO
gtE6 (EMP)
SELECT FROM EMP WHERE ENOE5
Given Relation R, FRR1, R2, , Rn where Rj
?pj(R) ?pj(Rj) ? if ?x ? R ?(pi(x)?pj(x))
25Reduction with join
SELECT FROM EMP, ASG WHERE EMP.ENOASG.ENO
EMP is fragmented into EMP1 ?ENO? E3
(EMP) EMP2 ? E3 lt ENO? E6 (EMP) EMP3 ?ENO
gtE6 (EMP)
ASG is fragmented into ASG1 ?ENO? E3
(ASG) ASG2 ?ENO gtE3 (ASG)
26Reduction with Join (I)
(R1 ? R2) S ? (R1 S) ? (R2
S)
27Reduction with Join (II)
?
Given Ri ?pi(R) and Rj ?pj(R) Ri Rj ? if
?x ? Ri , ?y? Rj ?(pi(x)?pj(y))
Reduction with join 1. Distribute join over
union 2. Eliminate unnecessary work
28Reduction for VF
- Find useless intermediate relations
- Relation R defined over attributes A A1, A2,
, An vertically fragmented as Ri ?A (R) where
A? A - ?K,D (Ri) is useless if the set of projection
attributes D is not in A
EMP1 ?ENO,ENAME (EMP) EMP2 ?ENO,TITLE (EMP)
?ENAME
SELECT ENAME FROM EMP
EMP1
29Reduction for DHF
Distribute joins over union Apply the join
reduction for horizontal fragmentation
SELECT FROM EMP, ASG WHERE ASG.ENO
EMP.ENO AND EMP.TITLE Mech. Eng.
30Reduction for DHF (II)
Joins over union
?
31Reduction for HF
- Remove empty relations generated by contradicting
selection on horizontal fragments - Remove useless relations generated by projections
on vertical fragments - Distribute joins over unions in order to isolate
and remove useless joins
32Reduction for HF --An Example
EMP1 ?ENO?E4 (?ENO,ENAME (EMP)) EMP2
?ENOgtE4 (?ENO,ENAME (EMP)) EMP3 ?ENO,TITLE
(EMP) QUERY SELECT ENAME FROM EMP WHERE ENO
E5
33Why Optimization An Example Query
Database
EMP(eno, ename, title) ASG(eno, jno, resp, dur)
RA tree
Query
Find the name of the employees who are managing a
project?
SQL
Select ename From EMP e, ASG g Where e.Eno g.
Eno And resp manager
34Example - Strategies
Plan B
Site 5
Fragment Schema
?respmanager
EMP1 ?ENO lt 100(EMP) at site 1 EMP2 ?ENO gt
100(EMP) at site 2 ASG1 ?ENO lt 100(ASG) at
site 3 ASG2 ?ENO gt 100(ASG) at site 4
?
?
ASG1
EMP1
ASG2
EMP2
Query site Site 5
Plan A
ASG1
ASG2
35Example DB Statistics Costs
- Database Statistics
- EMP has 400 tuples,
- ASG has 1000 tuples,
- there are 20 managers in G
- the data is uniformly distributed among sites.
- ASG and EMP are locally clustered on attributes
RESP and ENO, respectively - Costs
- tuple access tacc 1 unit,
- tuple transfer ttrans 10 units,
36Costs for Example Plan
- The cost of Plan A
- Produce ASG 20 ? tacc 20 (processing
locally) - Transfer ASG 20 ttrans 200 (transfer to
EMP site) - Produce EMP (1010) tacc 2 40 (join at
the EMP site) - Transfer EMP 20 ttrans 200 (send to Site
5) - Total cost 460
- The cost of Plan B
- Transfer EMP 400 ttrans 4,000 (send EMP
to Site 5) - Transfer ASG 1000 ttrans 10,000 (send ASG
to Site 5) - Produce ASG 1000 tacc 1,000 (selection
at Site 5) - Join EMP and ASG 400 20 tacc 8,000
(join at Site 5) - Total cost 23,000
37Query Optimization
- Problems in query optimization
- Determining the physical copies of the fragments
upon which to execute the fragment query
expressions (also known as materialization) - Selecting the order of execution of operations
- Selecting the method for executing each operation
- The above problems are not independent, for
instance, the choice of the best materialization
for a query depends on the order in which
operations are executed. But they are treated as
independent. Further, - We bypass (1) by taking materialization for
granted - We bypass (3) by clustering all operations at the
same site as a local database system dependent
problem
38Query Optimization - Objectives
- The selection of alternative query execution
strategies is made based on predetermined
objectives - Two main objectives
- minimize the total processing time (total cost)
- network and computers at nodes do not get loaded.
- Response time cannot be guaranteed
- minimize the response time
- allocation must facilitate parallel execution of
the query - but throughput may decrease and cost can be
higher than total cost - Total processing time (cost) is the sum of all
the time (cost) incurred in executing the query
(CPU, I/O, data transfer) - Response time is the elapsed time from the
initiation till the completion of the query
39Optimization Algorithms The Issues
- Cost model
- cost components
- weights for each components
- costs for primitive operations
- Search space
- The set of equivalent algebra expressions (query
trees) - Search strategies
- How do we move inside the search space
- Exhaustive search, heuristics,
40Cost Models
- The cost measures are I/O and CPU for
centralized DBMSs and I/O, CPU and data transfer
costs for DDBMS - Total cost CPU cost I/O cost communication
cost - CPU cost Ccpu insts
- I/O cost C i/o i/os
- Communication Cost Cmsgmsgs Ctrbytes
- Ccpu, C i/o, Ctr and Cmsg are all assumed to be
constants. - Response time sum (sequential operations)
- Ccpus_insts
- Ci/os_i/os
- Cmsgs_msg ctrs_bytes
- S_x stands for maximum number of sequential xs
that need to be executed to process the query
41Intermediate Result Size
- The size of the intermediate relations produced
during the execution facilitates the selection of
the execution strategy - This is useful in selecting an execution strategy
that reduces data transfer - The sizes of intermediate relations need to be
estimated based on cardinalities of relations and
lengths of attributes - RA1, A2,..., An fragmented as R1,R2,, Rn the
statistical data collected typically are - len(Ai), length of attribute Ai in bytes
- min(Ai) and max(Ai) for ordered domains
- card(dom(Ai)) unique values in domAi
- Number of tuples in each fragment card(Rj)
42Intermediate Size Estimation
- Join selectivity factor
- SFj(r,s) card(r s) / card(r) card(s)
- Selecton selectivity factor
- SFS(F) card(?f(r)) / card(r)
- size(r) card(r) len(r)
- Cardinality of intermediate relations
- SFS(A value) 1/card(dom(A))
- SFS(A gt value) max(A) - value/max(A)-min(A)
- SFS(A lt value) value - min(A)/max(A)-min(A)
- Sfs(p(Ai)?p(Aj)) sfs(p(Ai)) sfs(p(Aj))
- Sfs(p(Ai)? p(Aj)) sfs(p(Ai)) sfs(p(Aj)) -
sfs(p(Ai)) sfs(p(Aj)) - SFS(A ? values) SFS(A value) card(values)
43Intermediate Size Estimation (II)
- Projection
- card(?a(r)) card(r)
- Cartesian product
- card(r X S) card(r) card(s)
- Join
- card(R AB S) card(s)
- if A is key in R, B is foreign key in S
- card(R AB S) SFJ(R,S) card(r)
card(s) - Union
- Upper bound card(r) card(s)
- Lower bound maxcard(r), card(s)
44Cost of Processing Primitive Operations
- Selection
- Projection
- Union
- Join
- nested-loops
- sort-merge
- hash-based
- For distributed join, semi-join is proposed to
perform joins
45Semi-join
Amount of data transferred R S
- join is replaced with a project followed by
semi-join and then join - the project and join operations are done at one
site, and semi-join at another site - amount of data transferred R S
46Semi-join versus Join
- using sem-ijoin increases local processing costs
because a relation must be scanned twice (join,
project) - For joining intermediate relations produced
during sem-ijoin one cannot exploit indices on
the base relations - Sem-ijoin may not be good when communication
costs are low
47Search Space
SELECT ENAME, RESP FROM EMP, ASG, PROJ WHERE
EMP.ENOASG.ENO AND ASG.PNOPROJ.PNO
- Search space is characterized by alternative
execution plans - Most optimizers focus on join trees
- For N relations, there are O(N!) equivalent join
trees
?
ASG
EMP
PROJ
48Restricting Search Space
deep tree
- O(N!) is large
- Considering join methods, the search space is
even bigger - Restrict by means of heuristics
- Ignore cartisian product
-
- Restrict the shape of the join tree
- Only consider deep trees
- .
R4
R1
R3
R2
bushy tree
R4
R3
R1
R2
Left-deep tree
R1
R2
R3
R4
49Search Strategy
- How to move in the search space to find the
optimal plan - Deterministic
- Start from base relations and build plans by
adding relations at each step - Dynamic programming breadth-first
- Greedy depth-first
- Randomized
- Search for the optimal one around a particular
starting point - simulated annealing
- iterative improvement
50Search Strategies -- Example
Deterministic
Randomized
51Distributed Query Optimization Algorithms
- System R and R
- Hill Climbing and SDD-1
52System R (Centralized) Algorithm
- Simple (one relation) queries are executed
according to the best access path. - Execute joins
- Determine the possible ordering of joins
- Determine the cost of each ordering
- Choose the join ordering with the minimal cost
- For joins, two join methods are considered
- Nested loops
- Merge join
-
53System R Algorithm -- Example
- Names of employees working on the CAD/CAM
project - Assume
- EMP has an index on ENO,
- ASG has an index on PNO,
- PROJ has an index on PNO and an index on PNAME
54System R Algorithm -- Example
- Choose the best access paths to each relation
- EMP sequential scan (no selection on EMP)
- ASG sequential scan (no selection on ASG)
- PROJ index on PNAME (there is a selection on
PROJ based on PNAME) - Determine the best join ordering
- EMP ASG PROJ
- ASG PROJ EMP
- PROJ ASG EMP
- ASG EMP PROJ
- EMP ? PROJ ASG
- PROJ ? EMP ASG
- Select the best ordering based on the join costs
evaluated according to the two methods
55System R Example (cont'd)
PROJ
EMP
ASG
ASG EMP
PROJ ASG
EMP PROJ
- Best total join order is one of
56System R Algorithm
- (PROJ ASG) EMP has a useful index on
the select attribute and direct access to the
join attributes of ASG and EMP. -
- Final plan
- select PROJ using index on PNAME
- then join with ASG using index on PNO
- then join with EMP using index on ENO
57System R Distributed Query Optimization
- Total-cost minimization. Cost function includes
local processing as well as transmission. - Algorithm
- For each relation in query tree find the best
access path - For the join of n relations find the optimal join
order strategy - each local site optimizes the local query
processing
58Data Transfer Strategies
- Ship-whole. entire relation is shipped and stored
as temporary relation, merge join algorithm is
used, done in pipeline mode - Fetch-as-needed. this method is equivalent to
semijoin of the inner relation with the outer
relation tuple
59Join Strategy 1
- External relation R with internal relation S, let
LC be local processing cost, CC be data transfer
cost, let average number of tuples of S that
match one tuple of R be s - Strategy 1. Ship the entire outer relation to the
site of internal relation - TC LC(get R)
- CC(size(R))
- LC(get s tuples from S)card(R)
60Join Strategy 2
- Ship the entire inner relation to the site of the
outer relation - TC LC(get S)
- CC(size(S))
- LC(store S)
- LC(get R)
- LC(get s tuples from S)card(R)
61Join Strategy 3
- Fetch tuples of the inner relation for each tuple
of the outer relation -
- TC LC(get R)
- CC(len(A)) card(R)
- LC(get s tuples from S) card(R)
- CC(slen(S))card(R)
62Join Strategy 4
- Move both relations to 3rd site and join there
- TC LC(get R)
- LC(get S)
- CC(size(S))
- LC(store S)
- CC(size(R))
- LC(get s tuples from S)card(R)
- Conceptually, the algorithm does an exhaustive
search among all alternatives and selects one
that minimizes total cost
63Hill Climbing Algorithm - Algorithm
- Inputs
- query graph, locations of relations, and relation
statistics - Initial solution
- the least costly among all when the relations
are sent to a candidate result site denoted by
ES0, and the site as chosen site - Splits ES0 into
- ES1 ship one relation of join to the site of
other relation - ES2 these two relations are joined locally and
the result is transmitted to the chosen site - If cost(ES1) cost(ES2) LC gt cost (ES0) select
ES0, - else select ES1 and ES2.
- The process can be recursively applied to ES1 and
ES2 till no more benefit occurs
64Hill Climbing Algorithm - Example
?SAL
TITLE
PAY
EMP
?PNAMECAD/CAM
ASG
Ignore the local processing cost Length of tuples
is 1 for all relation
PROJ
65HCA - Example
Site1 EMP(8)
ES1
?
Site2 PAY(4)
ES2
Solution 1 Cost
TITLE
?
Site4 ASG(10)
Site3 PROJ(1)
?
ES3
Site1 EMP(8)
ES1
Site2 PAY(4)
ESo is the BEST
ES2
Solution 2 Cost
Site4 ASG(10)
ES3
Site3 PROJ(1)
66Hill Climbing Algorithm - Comments
- Greedy algorithm determines an initial feasible
solution and iteratively tries to improve it. - If there are local minimas, it may not find the
global minima - If the optimal solution has a high initial cost,
it wont be found since it wont be chosen as the
initial feasible solution.
Site1 EMP(8)
Site2 PAY(4)
Site4 ASG(10)
Site3 PROJ(1)
COST
67SDD-1 Algorithm
- SDD-1 algorithm generalized the hill-climbing
algorithm to determine ordering of beneficial
semijoins and uses statistics on the database,
called database profiles. - Cost of semijoin
- Cost (R SJA S) CMSG CTRsize(?A(S))
- Benefit is the cost of transferring irrelevant
tuple - Benefit(R SJA S) (1-SFSJ(S.A)) size(R) CTR
- A semijoin is beneficial if cost lt benefit.
68SDD-1 The Algorithm
- initialization phase generates all beneficial
semijoins, and an execution strategy that
includes only local processing - most beneficial semijoin is selected statistics
are modified and new beneficial semijoins are
selected - the above step is done until no more beneficial
joins are left - assembly site selection to perform local
operations - postoptimization removes unnecessary semijoins
69SDD1 - Example
SELECT FROM EMP, ASG, PROJ WHERE EMP.ENO
ASG.ENO AND ASG.PNO PROJ.PNO
Site 2 ASG
ENO
PNO
Site 1 EMP
Site 3 PROJ
70SDD1 - First Iteration
- SJ1 ASG SJ EMP
- benefit (1-0.3)3000 2100
- cost 120
- SJ2 ASG SJ PROJ
- benefit (1-0.4)3000 1800
- cost 200
- SJ3 EMP SJ ASG
- benefit (1-0.8)1500 300
- cost 400
- SJ4 PROJ SJ ASG
- benefit 0
- cost 400
- SJ1 is selected
- ASG size is reduced to 30000.3900
- ASG ASG SJ EMP
- Semijoin selectivity factor is reduced it is
approximated by SFSJ(G.ENO) 0.80.3 0.24
71SDD-1 - Second Third Iterations
- Second iteration
- SJ2 ASG SJ PROJ benefit(1-0.4)900540
- cost200
- SJ3 EMP SJ ASG benefit(1-0.24)15001140
- cost400
- SJ3 is selected
- EMP EMP SJ ASG size(EMP) 15000.24 360
- Third Iteration
- SJ2 ASG SJ PROJ
- benefit(1-0.4)900540
- cost200
- it is selected
- reduces size of G further to 9000.4360
72Local Optimization
- Each site optimizes the plan to be executed at
the site - A centralized query optimization problem
73SDD-1 - Assembly Site Selection
- After reduction
- EMP is at site 1 with size 360
- ASG is at site 2 with size 360
- PROJ is at site 3 with size 2000
- Site 3 is chosen as assembly site
- no semijoins reduced in post optimization.
Site1 EMP
Site2 ASG
Site3 PROJ
(ASG SJ EMP) SJ PROJ ? site 3 (EMP SJ ASG) ? site
3 join at site 3