Title: Locally Decodable Codes
1. Locally Decodable Codes
Uri Nadav

2. Contents
- What is a Locally Decodable Code (LDC)?
- Constructions
- Lower Bounds
- Reduction from Private Information Retrieval (PIR) to LDC
3. Minimum Distance
- For every x ≠ y, d(C(x), C(y)) ≥ d
- The error-correction problem is solvable for fewer than d/2 errors
- The error-detection problem is solvable for fewer than d errors
4. Error Correction
[Diagram: the input x is encoded into the codeword C(x); under a worst-case error assumption, errors corrupt it into the word y; given y and an index i, the decoder outputs the decoded bit xi.]
5. Query Complexity
- The number of indices the decoder is allowed to read from the (corrupted) codeword
- Decoding can always be done with query complexity O(|C(x)|)
- We are interested in constant query complexity
6. Adversarial Model
- We can view the error model as an adversary that chooses which positions to destroy, and has access to the decoding/encoding scheme (but not to the random coins)
- The adversary is allowed to insert at most δm errors
7. Why not decode in blocks?
- The adversary is worst case, so it can destroy more than a δ fraction of some blocks and less of others.
[Diagram: "nice" errors spread evenly across blocks vs. the worst case, with many errors concentrated in the same block]
8. Ideal Code C : {0,1}^n → Σ^m
- Constant information rate: n/m > c
- Resilient against a constant fraction of errors (linear minimum distance)
- Efficient decoding (constant query complexity)
No such code!
9. Definition of LDC
C : {0,1}^n → Σ^m is a (q, δ, ε)-locally decodable code if there exists a probabilistic algorithm A such that:
- For every x ∈ {0,1}^n, every y ∈ Σ^m with distance d(y, C(x)) < δm, and every i ∈ {1,…,n}: Pr[A(y, i) = xi] > ½ + ε
  The probability is over the coin tosses of A
- A reads at most q indices of y (of its choice); queries are not allowed to be adaptive
- A has oracle access to y
- A must be probabilistic if q < δm
10. Example: Hadamard Code
- The Hadamard code is a (2, δ, ½ − 2δ) LDC
- Construction: the codeword lists the inner products ⟨x, a⟩ (mod 2) for all a ∈ {0,1}^n
- Relative minimum distance ½
[Diagram: the source word x1, x2, …, xn is encoded into the codeword ⟨x,1⟩, ⟨x,2⟩, …, ⟨x,2^n−1⟩]
11. Example: Hadamard Code
Decoding: to reconstruct xi, pick a ∈R {0,1}^n and make 2 queries, ⟨x,a⟩ and ⟨x, a⊕ei⟩; the reconstruction formula is xi = ⟨x,a⟩ ⊕ ⟨x, a⊕ei⟩.
If less than a δ fraction of the codeword is in error, then the reconstruction probability is at least 1 − 2δ.
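The construction and its 2-query decoder can be sketched in a few lines of Python (a minimal illustration, not from the slides; the function names are mine):

```python
import random

def bits(a, n):
    """Little-endian bit vector of the integer a, of length n."""
    return [(a >> t) & 1 for t in range(n)]

def hadamard_encode(x):
    """Codeword indexed by a in {0,1}^n; entry a is the inner product <x,a> mod 2."""
    n = len(x)
    return [sum(xt & bt for xt, bt in zip(x, bits(a, n))) % 2
            for a in range(2 ** n)]

def decode_bit(y, i, n):
    """2-query decoder: pick a random a and return y[a] XOR y[a xor e_i],
    which equals <x,a> + <x, a xor e_i> = x_i when both queried entries are intact."""
    a = random.randrange(2 ** n)
    return y[a] ^ y[a ^ (1 << i)]
```

With an uncorrupted codeword the decoder is always correct; when a δ fraction of y is corrupted, each of the two uniformly distributed queries hits an error with probability at most δ, so the decoder succeeds with probability at least 1 − 2δ.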
12. Another Construction
- Correct decoding with probability 1 − 4δ
13. Generalization
- 2^k queries, m = 2^(k·n^(1/k))
14. Smoothly Decodable Code
C : {0,1}^n → Σ^m is a (q, c, ε)-smoothly decodable code if there exists a probabilistic algorithm A such that:
1. For every x ∈ {0,1}^n and every i ∈ {1,…,n}, Pr[A(C(x), i) = xi] > ½ + ε
   The probability is over the coin tosses of A; A has access to a non-corrupted codeword
2. A reads at most q indices of C(x) (of its choice); queries are not allowed to be adaptive
3. For every i ∈ {1,…,n} and every j ∈ {1,…,m}, Pr[A(·, i) reads j] ≤ c/m
   The event is "A reads index j of C(x) to reconstruct index i"
15. An LDC is also a Smooth Code
- Claim: Every (q, δ, ε) LDC is a (q, q/δ, ε)-smooth code.
- Intuition: If the code is resilient against a linear number of errors, then no index of the codeword can be queried too often (or else the adversary would choose to corrupt it)
16. Proof: LDC is Smooth
- Let A be a reconstruction algorithm for a (q, δ, ε) LDC
- Let Si = { j : Pr[A queries j] > q/(δm) } be the set of indices read too often
- There are at most q queries, so the probabilities summed over j total at most q; thus |Si| < δm
17. Proof: LDC is Smooth
- A' uses A as a black box, and returns whatever A returns as xi
- A' gives A oracle access to a "corrupted" codeword: it answers from C(x) at indices not in Si, and returns a fixed arbitrary value at indices in Si
- A reconstructs xi with probability at least ½ + ε, because there are at most |Si| < δm errors
- A' is a (q, q/δ, ε)-smooth decoding algorithm
18. Proof: LDC is Smooth
[Diagram: A wants the codeword C(x); what A gets is C(x) with the indices that A reads too often fixed arbitrarily to 0.]
19. A Smooth Code is an LDC
- A bit can be reconstructed with an ε advantage using q uniformly distributed queries, when there are no errors
- With probability at least 1 − qδ, all the queries are to non-corrupted indices.
- Remember: the adversary does not know the decoding procedure's random coins
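The probability claim above can be checked empirically (my simulation, not from the slides; the parameters are illustrative):

```python
import random

def clean_query_rate(m, q, delta, trials=20000):
    """Fraction of trials in which q uniform queries all miss the delta*m
    corrupted positions; by the union bound this is at least 1 - q*delta."""
    corrupted = set(random.sample(range(m), int(delta * m)))
    clean = sum(all(random.randrange(m) not in corrupted for _ in range(q))
                for _ in range(trials))
    return clean / trials
```

For example, with m = 1000, q = 3, δ = 0.05 the empirical rate comes out near (1 − 0.05)^3 ≈ 0.857, comfortably above the union bound 1 − qδ = 0.85.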
20. Lower Bounds
- Non-existence for q = 1 [KT]
- Non-linear rate for q = 2 [KT]
- Exponential length for linear codes, q = 2 [Goldreich et al.]
- Exponential length for every code, q = 2 [Kerenidis, de Wolf] (using quantum arguments)
21. Information Theory Basics
H(x) = −Σi Pr[x = i] · log(Pr[x = i])
I(x; y) = H(x) − H(x | y)
22. Information Theory (cont.)
- The entropy of multiple variables is at most the sum of their entropies (with equality when all variables are mutually independent):
  H(x1, x2, …, xn) ≤ Σ H(xi)
- The highest entropy is achieved by a uniformly distributed random variable.
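A quick numeric check of subadditivity (my example, not from the slides): for perfectly correlated bits the joint entropy falls strictly below the sum of the marginal entropies.

```python
from collections import Counter
from math import log2

def entropy(samples):
    """Empirical Shannon entropy of a list of outcomes, in bits."""
    counts = Counter(samples)
    total = len(samples)
    return -sum(c / total * log2(c / total) for c in counts.values())

pairs = [(0, 0), (1, 1)] * 50   # X2 = X1: fully dependent bits
joint = entropy(pairs)          # H(X1, X2) = 1 bit
marginal_sum = entropy([a for a, _ in pairs]) + entropy([b for _, b in pairs])  # 1 + 1 bits
```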
23. IT Result from [KT]

24. Proof
25. Single Query (q = 1)
Claim: If C : {0,1}^n → Σ^m is (1, δ, ε)-locally decodable, then …
No such family of codes!
26. Good Index
- Index j is said to be good for i if
  Pr[A(C(x), i) = xi | A reads j] > ½ + ε
27. Single Query (q = 1)
- By the definition of LDC, and by conditional probability (summing over the disjoint events "A reads j"), there exists at least a single index j1 which is good for i.
28. Perturbation Vector
- Def: The perturbation vector Δ(j1, j2, …) takes values uniformly distributed over Σ in positions j1, j2, …, and 0 otherwise.
- It destroys the specified indices in the most unpredictable way.
29. Adding Perturbation
The decoder is resilient against at least 1 error, so there exists at least one index j2 good for i.
j2 ≠ j1, because j1 can no longer be good: its value is now uniformly random noise.
30. Single Query (q = 1)
The decoder is resilient against δm errors, so there are at least δm indices of the codeword that are good for every i. By the pigeonhole principle, there exists an index j ∈ {1,…,m} that is good for δn input indices.
31. Single Query (q = 1)
- Think of C(x) projected on index j as a function of the δn input indices for which j is good. Its range is Σ, and each of these input bits can be reconstructed from it w.p. ½ + ε. The IT result then yields the contradiction.
32. Case q ≥ 2
- m = Ω(n^(q/(q−1)))
- Constant-query reconstruction procedures are impossible for codes of constant rate!
33. Case q ≥ 2: Proof Sketch
- An LDC C is also smooth
- A q-smooth codeword has a small enough subset of indices that still encodes a linear amount of information
- So, by the IT result, m^((q−1)/q) = Ω(n)
34. Applications?
- Better locally decodable codes have applications to PIR
- Applications to the practice of fault-tolerant data storage/transmission?
35. What About Locally Encodable?
- A respectable code is resilient against a constant fraction of errors (Ω(m) errors).
- We expect each bit of the input to influence many bits of the encoding.
- Otherwise, there exists an input bit which influences less than a 1/n fraction of the encoding.
36. Open Issues
- Adaptive vs. non-adaptive queries: one can guess the first q−1 answers, with success probability |Σ|^−(q−1)
37. Logarithmic Number of Queries
- View the message as a polynomial p : F^k → F of degree d (F is a field, |F| ≫ d)
- Encode the message by evaluating p at all |F|^k points
- To encode an n-bit message, we can have |F| polynomial in n, and d, k around polylog(n)
38. To Reconstruct p(x)
- Pick a random line in F^k passing through x
- Evaluate p at d+1 points of the line
- By interpolation, find the degree-d univariate polynomial that agrees with p on the line
- Use the interpolated polynomial to estimate p(x)
- The algorithm reads p at d+1 points, each uniformly distributed
39. [Diagram: the query points on the line through x with direction y: x+y, x+2y, …, x+(d+1)y]
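The line-decoding procedure above can be sketched as follows (a hedged illustration over the small prime field F_101 with k = 2 variables; the polynomial p and the degree d = 3 are example choices of mine):

```python
import random

P = 101                      # field size (prime), assumed much larger than d

def p(v):
    """Example degree-3 polynomial over F_P^2."""
    a, b = v
    return (a * a * a + 2 * a * b + 5) % P

def decode_at(table, x, d, k):
    """Estimate p(x) by interpolating p on a random line through x."""
    y = [random.randrange(1, P) for _ in range(k)]   # random direction of the line
    ts = list(range(1, d + 2))                       # d+1 points on the line
    vals = [table[tuple((xi + t * yi) % P for xi, yi in zip(x, y))] for t in ts]
    # Lagrange-interpolate the degree-d univariate restriction, evaluated at t = 0
    acc = 0
    for j, tj in enumerate(ts):
        num, den = 1, 1
        for l, tl in enumerate(ts):
            if l != j:
                num = num * (0 - tl) % P
                den = den * (tj - tl) % P
        acc = (acc + vals[j] * num * pow(den, P - 2, P)) % P   # Fermat inverse
    return acc

# The "codeword": the evaluation table of p at all |F|^k points
table = {(a, b): p((a, b)) for a in range(P) for b in range(P)}
```

Since the restriction of p to a line is a univariate polynomial of degree at most d, its d+1 values determine it, and its value at t = 0 is p(x); with a corrupted table, each of the d+1 uniformly distributed queries independently risks hitting an error.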
40. Private Information Retrieval (PIR)
- Query a public database without revealing which record was queried.
- Example: A broker needs to query the NASDAQ database about a stock, but doesn't want anyone to know which stock he is interested in.
41. PIR
- A one-round, k-server PIR scheme for a database of length n consists of:

42. PIR Definition
- These functions should satisfy:
43. Simple Construction of PIR
- 2 servers, one round
- Each server holds the bits x1, …, xn.
- To request bit i, choose a subset A of {1,…,n} uniformly at random
- Send the first server A.
- Send the second server A⊕{i} (add i to A if it is not there, remove it if it is there)
- Each server returns the XOR of the bits at the indices of its request S ⊆ {1,…,n}.
- XOR the two answers.
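The scheme above can be sketched as follows (my illustration; the function names are assumptions):

```python
import random

def query(n, i):
    """Client: a uniformly random subset A and A xor {i}, one per server."""
    A = {j for j in range(n) if random.random() < 0.5}
    return A, A ^ {i}          # symmetric difference toggles membership of i

def answer(x, S):
    """Server: XOR of the requested database bits."""
    out = 0
    for j in S:
        out ^= x[j]
    return out

def reconstruct(a1, a2):
    """Client: the two XORs differ in exactly the bits of A xor (A xor {i}) = {i}."""
    return a1 ^ a2
```

Privacy holds because each server individually sees a uniformly random subset of {1,…,n}, independent of i; correctness holds because the two answers differ by exactly xi.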
44. Lower Bounds on Communication Complexity
- To achieve privacy in the case of a single server, we need n-bit messages
- (not too far from the one-round, 2-server scheme we suggested).
45. Reduction from PIR to LDC
- A codeword is the concatenation of all possible answers from both servers
- The query procedure makes 2 queries to this codeword, one simulating each server