Title: Sphinx in Examples and Problems
1Sphinx in Examples and Problems
- Andrew Aksyonoff
- Sphinx Technologies
3??? ?? Sphinx?
- Sphinx is a full-text search engine
- ????? ??????
- ????????, ????????????? ????????
- ????? ?? ??????
- ?????????? ? ??????????? ??????????
- ????? ?????? ????????
- ???????? ????????? ?????
4??? ?? Sphinx?
- ????? ??, ???? ?? ?????? ?? ?????!
- ????????? ?????????? ???????
- ?????????? ?????
- ?????????, ??????? SQL ???????
- ????????
- ???????? ?????????
- ?????????????
- ? ??? 10-20-30 ?????? ?????????? ?????
5??? ????? ? ????? ?????
- ? ??? ?? ?? ??????? ?? ????????
- ????? ??? ???
- ??? ???????? ? ????????????
- ??? ???????? ? ??????????
- ? ??? ?????????
- ??? Sphinx ??????? ??????
- ??? ?????????????? ??????
- ??? ???? highload
7????? ??????????
- There are two programs
- Indexer builds the indexes
- Searchd answers the queries
- There is also an API
- A client that knows how to talk to searchd over the network
- PHP, PECL, Python, Perl, Java, Ruby, C99, C, Haskell, C, MySQL SE
8??? ???????? indexer
- There are data sources
- Where to fetch from (MySQL, PgSQL, xmlpipe)
- What to fetch (sql_query, sql_attr_xxx)
- How to fetch (sql_query_pre, sql_query_post)
- ???? ?????????? ???????
- ??? ????????????? (???????????, ????????,
??????????, HTML stripper) - ???? ?????? ?????
9??? ???????? ? ???????
- ?????? ??? ?????????????? ????????
- ???????
- ?????? ?????????? ?? ???????? ??????
- ?????? ??????? ?? ??????????
- ??????????? ???????? ??????????
- Integer (1 to 32 bits, or 64 bits)
- Float
- MVA (sets of 32-bit integers)
- ?? ???????? ???????? ????????? ??????
10??? ???????? searchd
- ???????? ??????, ????????? ?????
- ????? ???????????? ??????
- ?? ?????????? ????????? ????????
- ? ???????? ?? ????????? searchd
- ??? ????? ??????? ????????
- ??? ????? ????????? ???????? (??????)
11??? ??? ????????? ????????
12???????? ??????? ? ?????
- ???? ???????
- Bandwidth ??????? ????? ????????
- Latency ??????? ?????? ??????
- Availability ??????? ???? (??????????) ????????
- ??? ?????????
- ???????? ??? ?????????????? ???????
- ????????? ??? ???????????? ????????
13???????????? ???????
14??? ???????? ?????
- For each local index
- Builds the list of candidates (documents that satisfy the query)
- Filters (analogous to WHERE)
- Ranks (computes document weights)
- Sorts (analogous to ORDER BY)
- Groups (analogous to GROUP BY)
- Merges the results across all indexes
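The stages on this slide (candidates, a WHERE-like filter, ranking, an ORDER BY-like sort, a GROUP BY-like grouping, then a merge across indexes) can be sketched as a toy in-memory model. This is an illustration only, not Sphinx code: the `Doc` layout, `search_index`, `merge_results`, and the "count the query words" weight are all assumptions made up for the example.

```python
from collections import namedtuple

# Hypothetical document layout: id, full text, and a dict of attributes.
Doc = namedtuple("Doc", ["id", "text", "attrs"])

def default_order(match):
    # Default order assumed here: weight descending, then id ascending.
    return (-match["weight"], match["id"])

def search_index(docs, query, where=None, order_by=default_order, group_by=None):
    """Run one query against one toy 'index' (a list of Doc)."""
    words = query.lower().split()
    matches = []
    for d in docs:
        tokens = d.text.lower().split()
        # 1. Candidates: documents that contain every query word.
        if not all(w in tokens for w in words):
            continue
        # 2. Filtering, the WHERE analogue, on attributes only.
        if where is not None and not where(d.attrs):
            continue
        # 3. Ranking: a toy weight, the number of query-word occurrences.
        weight = sum(tokens.count(w) for w in words)
        matches.append({"id": d.id, "weight": weight, "attrs": d.attrs})
    # 4. Sorting, the ORDER BY analogue.
    matches.sort(key=order_by)
    # 5. Grouping, the GROUP BY analogue: keep the best match per group.
    if group_by is not None:
        seen, best = set(), []
        for m in matches:
            key = m["attrs"][group_by]
            if key not in seen:
                seen.add(key)
                best.append(m)
        matches = best
    return matches

def merge_results(per_index, order_by=default_order, limit=20):
    """Glue per-index result lists together and re-sort them."""
    merged = [m for result in per_index for m in result]
    merged.sort(key=order_by)
    return merged[:limit]
```

Usage: call `search_index` once per index, then pass the list of results to `merge_results`. The sketch does not re-group after merging, so grouped queries over several indexes would need one more pass.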
15???? ?????? ??????
- ?????????? ?????? ??????????
- 1 ???????? ????? 1 IO (?????? ??????????)
- ?????? ???????? ??? ???????? ??????????
- ????????? ??????????????? () ????? ???????
- ?? ????, ????? ?????? ???? ???????? ????
- ??? ?????? ???? ???, ??? ? ???????? ??? ????????
??????? ???? ???????? 2x IO/CPU - ??????
- The Who ????? ?????? ??????
- ?????? ????? ?????? ??????????
16???? ??????????
- docinfo=inline
- Attributes are stored in the document list
- All values are duplicated many times!
- Available right after the disk read
- docinfo=extern
- Attributes are stored in a separate list (file)
- Fully cached in RAM
- Hash by docid + binary search
- ??????? ????????
- C???????? ????? ?????????? ? ????????
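For the docinfo=extern case, the lookup of a document's attributes by docid in a RAM-resident table can be sketched with a plain binary search. This is a simplified illustration, not searchd's data layout: the `ExternAttrs` class is hypothetical, and the hash step mentioned on the slide is left out.

```python
from bisect import bisect_left

class ExternAttrs:
    """Toy stand-in for a separate attribute table held in RAM."""

    def __init__(self, rows):
        # rows: iterable of (docid, attrs) pairs, stored sorted by docid
        # so that each value is kept once and found by binary search.
        rows = sorted(rows, key=lambda r: r[0])
        self.docids = [r[0] for r in rows]
        self.attrs = [r[1] for r in rows]

    def lookup(self, docid):
        """Return the attributes for docid, or None if it is absent."""
        i = bisect_left(self.docids, docid)
        if i < len(self.docids) and self.docids[i] == docid:
            return self.attrs[i]
        return None
```

Usage: build the table once, then call `lookup(docid)` for every candidate document that needs filtering or sorting by attribute.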
17???? ????????????
- ?????? - ??????? ?? ranker-?
- ????????? ??????? ???? -
- ??????? ??? ?????????????
- ?? ????? ???????? - ??????? ????!
- ????????? ????? ???????????
- Most expensive: phrase proximity + BM25
- Cheapest: none (weight=1)
- ????????? - ????? ?????????? ? ??????????
18???? ??????????
- ????????? ????? ???????????
- Also depends on the sort criterion (documents usually arrive in @id asc order)
- Also depends on max_matches
- The larger max, the harder on the server
- 1-10K is acceptable, 100K is too much
- 10-20 ???????
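One way to see why sorting cost is tied to max_matches: if only the best max_matches results are kept, a bounded heap does the job, and its size (and the work per incoming match) grows with that limit. The sketch below assumes a "weight desc, id asc" order and a stream of (docid, weight) pairs; `top_matches` is a hypothetical helper, not how searchd is implemented.

```python
import heapq

def top_matches(matches, max_matches):
    """Keep the best max_matches pairs (weight desc, id asc) from a stream."""
    if max_matches <= 0:
        return []
    heap = []  # min-heap: the root is the worst match kept so far
    for doc_id, weight in matches:
        key = (weight, -doc_id)  # a larger key means a better match
        if len(heap) < max_matches:
            heapq.heappush(heap, (key, doc_id, weight))
        elif key > heap[0][0]:
            # Better than the worst kept match: replace it.
            heapq.heapreplace(heap, (key, doc_id, weight))
    best = sorted(heap, reverse=True)
    return [(doc_id, weight) for _, doc_id, weight in best]
```

Each match costs at most one heap operation, roughly logarithmic in max_matches, and memory is proportional to max_matches rather than to the number of matches found.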
19???? ???????????
- ??????????? ?????? ?????? ?????? ??????????
- ???? ????? ???????????
- ???? max_matches
- Caution: the accuracy of @count and @distinct depends on max_matches
21(?????????) ???????????
- ?????? ???????????? ? ??????????
- ??????? ?????? ???????? ????
- ????????????? (multi queries)
- ????????? ?????? (partitioning)
- ????????? ????? ??????
- ??? ??????? ??????
22????????????
- Several modes exist, see SetRankingMode()
- The default is phrase + BM25
- ??????????? ??????? ????
- ??? ?? ?????????!
- ?????? ?????????? ????? ????????
- ?????? ?????????? ????????????
- ???? ipod, ????????? ?? ????
23? ??????????
- ????? ????????? ????????????
- ????? ????????? ?? ????, ??? ?? ?????????
- ????? ??????????????
- See src/sphinxcustomsort.inl, @custom
- ????? ??????????????
- Documents arrive in @id asc order
- @id asc > date asc ??????????
- @id asc > date desc ????? ???????? id
24??????? ?????? ????????
- ????????? ????
- ??? ??????????, ????????? ??????????? ????????
????? ? ???????? (_authorid123) - ??? ??????, ????????? ??? ? ??????
- ???????? ??????
- ??? ???????, ??? ??????
- ???????? ?????
- ???????? ??????, ?? ?????? ?? ?????
25??????? ?????? ????????
- ???? ?????? ?????? ???????? ????
- ???? ?????????? ????? ??????????
- ????? CPUIO, ?????? ?????? CPU
- ??????? ????????? ????????????? ????????
??????? - ?????? ????-? ???? ?????????? ? ?????!
- ?????? ????-? ????? ?????????? ? ??????!
26?????????????
- Any queries can be sent in a batch
- Always saves a network roundtrip
- Sometimes the optimizer can kick in
- An especially important and common case: different sorting and grouping modes
- 2x ??????????? ??????????? ??????
27?????????????
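require ( "sphinxapi.php" ); // not needed with the PECL sphinx extension

$client = new SphinxClient ();
$q = "laptop"; // coming from website user

// query 1: best matches by relevance
$client->SetSortMode ( SPH_SORT_EXTENDED, "@weight desc" );
$client->AddQuery ( $q, "products" );

// query 2: same text, grouped by vendor (attribute name comes first)
$client->SetGroupBy ( "vendor_id", SPH_GROUPBY_ATTR );
$client->AddQuery ( $q, "products" );

// query 3: same text, ten cheapest products
$client->ResetGroupBy ();
$client->SetSortMode ( SPH_SORT_EXTENDED, "price asc" );
$client->SetLimits ( 0, 10 );
// settings only affect queries added after them, so add the third query
$client->AddQuery ( $q, "products" );

// one network roundtrip for the whole batch
$result = $client->RunQueries ();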
30Partitioning
- ???????? ??????? ?????? ????? ??. ???. ??????
- Problems with reindexing
- Partition, reindex only the changes
- Problems with filtering
- Partition, search only the indexes you need
- Problems with CPU/HDD?
- Partition, spread across different cores/HDDs/boxes
31????????? ??? ??????????
- ?????????? ??????? ??????
- ?? ??????? ????? ????????? ??????????
- ????????? ????? ????????? ?????
- 1-10 ???????? ???????? ???????
- ????????? ?????????? ? 50 (3024...)
- ????????? ?????????? ? 2000 (!!!)
32????????? ??? ??????????
- ?????????, ?? 100 ??????? ?? ?????????? ??????
???????? - ???????????? ???? ?????? ?????? ????
- ?????????? ??????????? (Query(), 3rd arg)
- ????????? ?????? ??? ???????????? ??????????
?????????????? ?????? - ??? ?????????? ?? ????????? ?????? ??
- ??? ???????????? ???????? ??? (!)
33????????? ??? CPU/HDD
- Distributed index; chunks are explicitly mapped to physical devices
- Point searchd at itself
index dist1
{
    type = distributed
    local = chunk01
    agent = localhost:3312:chunk02
    agent = localhost:3312:chunk03
    agent = localhost:3312:chunk04
}
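The effect of the dist1 setup (one local chunk plus three agents that point back at the same searchd, so the chunks are searched concurrently and the results merged) can be sketched as a fan-out and merge. Everything below is a toy model under stated assumptions: chunks are plain dicts of docid to text, and `search_chunk` and `search_distributed` are hypothetical names. Python threads only illustrate the structure here; they do not give real CPU parallelism for this kind of work.

```python
from concurrent.futures import ThreadPoolExecutor

def search_chunk(chunk, word):
    """Toy per-chunk search: (docid, weight) for documents containing word."""
    return [(docid, text.split().count(word))
            for docid, text in chunk.items()
            if word in text.split()]

def search_distributed(chunks, word, limit=10):
    """Query every chunk concurrently, then merge by weight desc, id asc."""
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        parts = list(pool.map(lambda c: search_chunk(c, word), chunks))
    merged = [m for part in parts for m in part]
    merged.sort(key=lambda m: (-m[1], m[0]))
    return merged[:limit]
```

Usage: `search_distributed([chunk01, chunk02, chunk03, chunk04], "laptop")` plays the role of one query against dist1, with each chunk handled by its own worker.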
35???? highload
- ??? ??? ??? ?????????? ???????
- ?????? ????? ???? bandwidth
- ?????? ????? no-SPoF (??. ?????-?? HA)
- ???????? ?????? ??????? ?? ??????????
- ?????????? ???????? ???????????
- ?????????? ???????? ????????
- ????? ?????????????? ????? hard crash?
- ????????? ?? ?????????? ????????????
36?????? ??????
- ????? ???????? ??????????
- ??????-??????
- ??????? ??????-????????
- Merge
- Partitioning ?? ????????? ?????
- ????? ???????? ???????
- ???????????? ????????? ???????
- Partitioning ?? ????????? ?????
37?????? partitioning
- ???????? ??????????
- ????????? ???? ?????? ????? ???
- ??????????? ??????? ??????????
- Pro ????? ????????????/??????????
- Pro ????????? HA (???????? hotspare!)
- Pro ?????????? ??????????
- Contra ?????? ?????? bandwidth
38?????? partitioning
- ???????? ??? ?????????? partitioning
- ????????? ?????? ?? ??????????? ?????
- ???????? star topology
- ?????????? star-of-stars topology
- Pro ????? ????????????
- Pro ????? ??????? ?????? latency
- Contra ????????? ??? ???????????
- Contra ???? HA, ?????????? ??? ?????
39?????? partitioning
- ??????????????? ?????
- Partitioning ?????????? ??????
- ???????????? ????? ????????? ??????? ??????? LB
(?????), ?? ?????? TCP port - Pro ?????? ??? ??????
- Contra ??????? ????? ????????????? ?
????????????
40??????? ????????
- ????????? ????? searchd ?? ??????
- ???????? ????? ???????
- ?????, ????? ????? ?????????
- Raw HDD, not RAID
- ????? ?????????? ????? ????????
- ????? IO stepping ??? ?????? (!) ???????
- ?????? indexer
- ??????????? ???? ?????? crontab
41???????? ??????
- ??????? ????????? ????????? ? IO
- ???????? HDD, ?????? ??????!
- ???????? RAM
- ????????? ????????? ? CPU
- ??? ??????????? ????????
- vmstat: what is the CPU busy with, and how busy?
- oprofile: who exactly is keeping the CPU busy?
- iostat: how busy is the HDD?
- Plus logs, plus the searchd --iostats option
42???????? ??????
- ??????????? ??????????
- ?????? ??? ???????? (us/sy/bi/bo), ??!
- ??????? HDD ????? ????????? ? iops
- ??????? CPU ????? ????????? ? sy
- ??????? ?????????? ???????? ? us
44??? ??????? ??????
- ???? ????? ?????? ?? ????????
- Cutoff (see SetLimits())
- Stop searching after the first N matches
- Per index, not in total
- MaxQueryTime (see SetMaxQueryTime())
- Stop searching after M milliseconds
- Per index, not in total
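Both limits named here (the cutoff set through SetLimits() and the time budget set through SetMaxQueryTime()) amount to early termination of the scan over one index. A minimal sketch, assuming a simple sequential scan: `scan`, `matches_doc`, and the injectable `clock` are hypothetical names for illustration, and 0 is taken to mean "no limit".

```python
import time

def scan(docs, matches_doc, cutoff=0, max_query_time_ms=0, clock=time.monotonic):
    """Scan one index, stopping early on either limit (0 = unlimited)."""
    start = clock()
    found = []
    for doc in docs:
        # Time budget: give up once M milliseconds have elapsed.
        if max_query_time_ms and (clock() - start) * 1000.0 >= max_query_time_ms:
            break
        if matches_doc(doc):
            found.append(doc)
            # Cutoff: stop after the first N matches.
            if cutoff and len(found) >= cutoff:
                break
    return found
```

Because the limits apply to each scan separately, a query over several indexes can return up to N matches, or spend up to M milliseconds, per index. The results are the first matches found, not necessarily the best ones.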
45??? ??????? ??????
- ???? ????? ?????? ?? ????????
- Consulting ?
- ????? ???????? ????????????
- ????? ???????? ????????????
46????????