Title: Merq
1. Merq
2. Jeremy's Notes
- merq.c: Merlin query filter. Similarity, superstructure, or SMARTS search on input SMILES.
- Normal output is SMILES, number of hits, then each hit SMILES, space separated.
- HITSONLY omits the first two fields.
- ONEHITPERLINE omits the first two fields, and adds a newline after each hit SMILES, to facilitate postprocessing (more SMARTS filtering, perhaps).
- Author: Jeremy Yang
- Rev: 10 Nov 2000
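
The normal output format described above can be read back with a few lines of Python. This is a minimal sketch, not part of merq itself. It assumes the first field is the query SMILES, that fields are separated by whitespace, and that no field contains a space. The names MerqRecord and parse_merq_line are hypothetical.

```python
from typing import List, NamedTuple


class MerqRecord(NamedTuple):
    """One line of normal merq output (hypothetical container)."""
    query: str        # first field: the SMILES (assumed to be the query)
    n_hits: int       # second field: number of hits
    hits: List[str]   # remaining fields: one SMILES per hit


def parse_merq_line(line: str) -> MerqRecord:
    """Split a space-separated line into SMILES, hit count, and hit SMILES."""
    fields = line.split()
    if len(fields) < 2:
        raise ValueError("expected at least a SMILES and a hit count")
    query, n_hits, hits = fields[0], int(fields[1]), fields[2:]
    if n_hits != len(hits):
        # Assumption: the reported count equals the number of hits listed.
        raise ValueError("hit count does not match number of hit SMILES")
    return MerqRecord(query, n_hits, hits)


# Made-up example line, not real merq output.
record = parse_merq_line("c1ccccc1 2 Cc1ccccc1 Oc1ccccc1")
print(record.query, record.n_hits, record.hits)
```

With HITSONLY or ONEHITPERLINE the first two fields are absent, so a reader for those modes would only need line.split().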
3. What Does merq Do?
- Reads a list of SMILES or SMARTS
- Performs a similarity, superstructure, or SMARTS search of a database on each
- Reports the number of hits and the SMILES of the hits for each input SMILES/SMARTS
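
The read, search, report loop above can be illustrated with a toy similarity search. This is a sketch under stated assumptions, not merq's implementation. merq queries a Merlin database, which is not reproduced here. Fingerprints are stood in for by small sets of integers, and the Tanimoto coefficient is assumed as the similarity measure. All function names and data are hypothetical.

```python
from typing import Dict, FrozenSet, List


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient of two bit sets: |a & b| / |a | b|."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_search(query_fp: FrozenSet[int],
                      database: Dict[str, FrozenSet[int]],
                      threshold: float) -> List[str]:
    """Return the SMILES of database entries at or above the threshold."""
    return [smi for smi, fp in database.items()
            if tanimoto(query_fp, fp) >= threshold]


def report(queries: Dict[str, FrozenSet[int]],
           database: Dict[str, FrozenSet[int]],
           threshold: float) -> List[str]:
    """One line per query: SMILES, number of hits, then each hit SMILES."""
    lines = []
    for smi, fp in queries.items():
        hits = similarity_search(fp, database, threshold)
        lines.append(" ".join([smi, str(len(hits))] + hits))
    return lines


# Toy fingerprints (made up); real fingerprints come from the database system.
database = {
    "CCO": frozenset({1, 2, 3, 4}),
    "CCN": frozenset({1, 2, 3, 5}),
    "c1ccccc1": frozenset({7, 8, 9}),
}
queries = {"CCO": frozenset({1, 2, 3, 4})}
for line in report(queries, database, threshold=0.5):
    print(line)
```

The linear scan over the database is only for illustration; a real search system would index the fingerprints.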
4. Why Did I Need merq?
- Wondered about the similarity of one vendor's database to that of another: is there some magic about certain vendors' compounds?
- Distrust of clustering
- More about that later
5. What Have I Used merq For?
- To check uniqueness of vendor databases
- Are all the vendors selling the same compounds? This could happen because both the commercial reagents and the chemistries known to generalize are available to everyone.
- If so, we don't need to worry about some unquantified quality ("attractive hit") as part of the decision of which vendor to use.
6. 0.85 Similarity to Maybridge

File                  Number of Structures in File  Percent Similar  Number to Buy
chemstar              59568                         28.35            16890
timtec                28387                         25.47            7230
asinex                134957                        22.12            29848
chembridge            51945                         20.77            10790
scientific exchange   18501                         19.38            3585
specs specs4          112595                        18.25            20543
zelinsky              111418                        16.93            18867
sherk                 6869                          16.92            1162
enamine               89010                         14.99            13347
ibs                   100634                        12.77            12849
aventis               51140                         11.48            5869
Total                                                                140980
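
A short check of the arithmetic in the table above. This sketch assumes, as the numbers suggest, that Percent Similar is Number to Buy divided by Number of Structures in File, and that the Total is the sum of the Number to Buy column. The slide itself does not state this relationship.

```python
# (file, structures in file, percent similar, number to buy), from the table.
rows = [
    ("chemstar", 59568, 28.35, 16890),
    ("timtec", 28387, 25.47, 7230),
    ("asinex", 134957, 22.12, 29848),
    ("chembridge", 51945, 20.77, 10790),
    ("scientific exchange", 18501, 19.38, 3585),
    ("specs specs4", 112595, 18.25, 20543),
    ("zelinsky", 111418, 16.93, 18867),
    ("sherk", 6869, 16.92, 1162),
    ("enamine", 89010, 14.99, 13347),
    ("ibs", 100634, 12.77, 12849),
    ("aventis", 51140, 11.48, 5869),
]


def percent_similar(n_structures: int, n_similar: int) -> float:
    """Assumed relationship: percent = 100 * similar / structures."""
    return 100.0 * n_similar / n_structures


# Sum of the last column, to compare with the Total row.
total_to_buy = sum(n_buy for _, _, _, n_buy in rows)

# Largest gap between the recomputed and the tabulated percentage.
max_deviation = max(abs(percent_similar(n, n_buy) - pct)
                    for _, n, pct, n_buy in rows)

print(total_to_buy, round(max_deviation, 4))
```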
7. Cross-similarities of Vendor Compounds
8. Distribution of Number of 0.85 Similars within Different Vendor Databases
[Figure: cumulative fraction of database (0 to 1) versus number of similar structures (0 to 100). Plot annotations: 92, 75, 65, 60.]
9. Distribution of Number of 0.85 Similars within Different Vendor Databases
[Figure: distribution versus number of similar structures. Plot annotations: 92, 75, 65, 60.]
10. Distribution of Number of 0.85 Similars within Different Vendor Databases
[Figure: distribution versus number of similar structures.]
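
A cumulative curve of the kind plotted on these slides (cumulative fraction of database versus number of similar structures) could be computed from per-compound counts of similars roughly as follows. This is a sketch only. The counts in the example are made up and are not the vendor data, and the function name is hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def cumulative_fraction(similar_counts: Sequence[int],
                        max_count: int) -> List[float]:
    """Fraction of compounds having at most k similars, for k = 0..max_count."""
    total = len(similar_counts)
    if total == 0:
        raise ValueError("need at least one compound")
    tally = Counter(similar_counts)  # number of compounds at each count
    running = 0
    curve = []
    for k in range(max_count + 1):
        running += tally.get(k, 0)
        curve.append(running / total)
    return curve


# Hypothetical counts of 0.85 similars for six compounds.
example_counts = [0, 0, 1, 3, 3, 10]
print(cumulative_fraction(example_counts, max_count=3))
```

Compounds with more similars than max_count are still counted in the denominator, so the curve need not reach 1 within the plotted range.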
11. Ward's Clusters of the MAO Dataset of 1645 Compounds
[Figure, two panels: "Compounds with four similar compounds" and "Compounds with no similar compounds". Each shows fraction at this cluster size (0 to 1) versus cluster size.]
12. Ward's Clusters of a Dataset of 19533 Compounds
[Figure, three panels of cluster-size histograms: "Sizes of Clusters: Compounds with no Similars, n = 3633"; "Sizes of Clusters: Compounds with One other Similar, n = 1148"; "Sizes of Clusters: Compounds with 9 Similars, n = 39".]