Title: The technology for searching documents with similar contents
SoftInform Search Technology
www.searchinform.com
The Role of Search Technologies in Major Information Systems and Document Processing Systems
One of the major challenges companies face today is finding documents quickly in large volumes of data.
How well data access is organized depends directly on technologies and software that process information quickly and efficiently.
Our experience with corporate customers shows that a large company runs into a number of problems directly related to information search.
Common Problems of Corporate Users
- Reduced search session duration
- Fuzziness of informational content
- Consolidating information from various sources
- Initial information flows audit
Reduced Search Session Duration
Phrasal search: a standard search session
- Selecting key words
- Viewing results
- Repeated key phrase selection and viewing results
- etc.
For example, suppose we need to find information on company purchases. The key phrase "purchase of companies" yields a list of documents. After you view the results, the search session continues, since the first results are not always satisfactory. You select key words again: "company mergers", "company acquisitions", etc. As a result, you end up picking key phrases one after another and processing a large number of documents. The important thing to remember is that you can never be sure you have thought of all the required phrases.
Major problem: the time wasted on selecting the correct key words and viewing unnecessary documents.
Objective: reducing search session duration.
The Problem of Informational Content Fuzziness
The company's database or information system may store documents from various sources that contain similar or identical information. The same text may appear under different headings, with slight changes or addenda, which can cause confusion when the document is used.
For example, one expert may comment on document 1, while another expert comments on document 2. As a result, a search returns one of the two documents more or less at random, with only one expert's comments attached.
Consolidating Information from Various Sources
This is a major problem, since the amount of information keeps growing and large companies have to spend enormous sums on combining the information from various systems into a single one.
Another problem is deploying such a system at the company's site, which can be quite painful for management for some time after deployment.
Information search technologies can serve as a consolidating element for various information systems. By searching, categorizing automatically, and working alongside the existing systems rather than replacing them, they structure the information assets of any large company within one application.
Initial Information Flows Audit
Duplicate documents that come from various sources or were added by different users are quite common in an information database.
As a rule, information is accumulated over many years. To take full advantage of new search technologies, you have to get rid of unnecessary duplicates. After we complete this stage of work, unit managers are usually horrified at the mess in the way their work is organized.
Description of SoftInform Search Technology
All of these problems, as well as many others, are addressed by the search technology developed by SoftInform.
The major advantage of SoftInform Search Technology, and what sets it apart from existing technologies and search systems, is the function of searching for documents whose contents are similar to the query text, patented by SoftInform. It is this unique ability that provides an efficient solution to most of the problems related to processing and searching information at a company.
SoftInform Search Technology is a technology for searching documents with similar contents in text files of virtually any format, in databases and in information systems.
The Potential of SoftInform Search Technology
- Unique similar-document search
- Indexing speed of 6 Gb/hour
- Small index size (20-25% of the total volume of information being indexed)
- Support for virtually all known text file formats
- Concept of various data sources
- Scalability
- Consolidation of company information
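As a rough illustration of what the indexing figures above imply, the sketch below estimates indexing time and index size for a corpus. It assumes that "Gb" means gigabytes and that "20-25" is a percentage of the source volume; the function name estimate_indexing and the 120 GB corpus are hypothetical.

```python
def estimate_indexing(corpus_gb, speed_gb_per_hour=6.0, index_ratio=(0.20, 0.25)):
    """Return (hours, min_index_gb, max_index_gb) for a corpus of the given size.

    Assumptions (not confirmed by the source): speed is in gigabytes per hour
    and the index takes 20-25 percent of the indexed volume.
    """
    hours = corpus_gb / speed_gb_per_hour
    low_ratio, high_ratio = index_ratio
    return hours, corpus_gb * low_ratio, corpus_gb * high_ratio


# Hypothetical 120 GB document archive.
hours, index_low, index_high = estimate_indexing(120)
print(f"indexing time: about {hours:.1f} hours")
print(f"index size: about {index_low:.0f} to {index_high:.0f} GB")
```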
Similar-document search operates with all words used in the document. In addition, it takes into account all word forms of each word (via the morphological analysis system) and consults a dictionary of synonyms.
Once the query has been processed, the result list displays the documents most similar to the query text.
When you search for similar documents, the order of the strings or phrases is of no importance (you can shuffle parts of the text, or cut and add words, sentences or paragraphs).
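The behaviour described above can be approximated with a bag-of-words comparison. The sketch below is not SoftInform's patented algorithm; it swaps in plain cosine similarity over word counts, with a toy suffix stripper standing in for the morphology module and a small hypothetical SYNONYMS map standing in for the synonym dictionary.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for a synonym dictionary: maps a word to a canonical term.
SYNONYMS = {"acquisition": "purchase", "buyout": "purchase"}


def normalize(word):
    """Reduce a word to a crude base form, then map synonyms to one term."""
    # Toy suffix stripping; a real morphological analyzer is far more careful.
    if word.endswith("ies") and len(word) > 4:
        word = word[:-3] + "y"
    elif word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        word = word[:-1]
    return SYNONYMS.get(word, word)


def vectorize(text):
    """Count normalized words; word order is deliberately discarded."""
    return Counter(normalize(w) for w in re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity(text_a, text_b):
    return cosine(vectorize(text_a), vectorize(text_b))


print(similarity("company acquisitions", "purchase of companies"))
print(similarity("company acquisitions", "weather forecast"))
```

Because only word counts are compared, shuffling sentences leaves the score unchanged, and "company acquisitions" still matches "purchase of companies" through the normalization step.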
SoftInform Search Technology, the new technology from SoftInform, is a revolutionary solution in the field of information search.
SoftInform Search Technology is language-independent. All language-specific features are handled by additional modules.
The morphology and synonym lists of any language are easily connected to the system core.
Solution to problems by means of SoftInform Search Technology
Reduced Search Session Duration
- A key phrase query
- Viewing results
- As soon as you find a required document, click
"find documents with similar contents" and view a
list of relevant documents on the topic you have
specified.
Thus, instead of spending several hours searching for the required information (viewing results and selecting key words), you can use the similar-document search to get the job done within a few minutes.
Getting Rid of Informational Content Fuzziness
Duplicate documents waste a lot of time on processing (viewing, editing, etc.), which reduces staff efficiency.
SoftInform's similar-document search solves this problem by comparing documents entering the company's database with the data already there and detecting duplicates.
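A minimal sketch of such a check on ingestion, under stated assumptions: the source does not say how documents are compared, so Jaccard overlap of word sets is swapped in here, and the function names, the sample documents and the 0.8 threshold are all hypothetical.

```python
import re


def word_set(text):
    """Lowercased set of words in the text."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Share of words common to both sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_duplicates(new_text, existing, threshold=0.8):
    """Return (doc_id, score) pairs for stored documents that look like duplicates.

    existing maps a document id to its text; threshold is an assumed cut-off.
    """
    new_words = word_set(new_text)
    scored = ((doc_id, jaccard(new_words, word_set(text))) for doc_id, text in existing.items())
    hits = [(doc_id, score) for doc_id, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical database contents and an incoming document.
stored = {
    "contract_v1": "Supply contract between Alpha and Beta for 2005",
    "memo": "Lunch menu for Friday",
}
incoming = "Supply contract between Alpha and Beta for 2005, revised"
print(find_duplicates(incoming, stored))
```

A document flagged this way could be shown to the user before it is stored, so that comments and edits end up on a single copy.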
Consolidating Information from Various Sources
Systems based on SoftInform Search Technology have a client-server architecture and integrate easily into a company's information system, connecting its various data sources.
Another advantage of SoftInform search technologies is that the system is easily embedded into (built on top of) any information system. Nothing needs to be changed, so no additional expenses arise. Our search engine is compatible with any software installed at a company, from CRM systems and DBMS to task management systems.
Initial Information Flows Audit
Duplicates and unnecessary "similar" files can be detected by the similarity analysis report function. The operation runs dozens of times faster than a conventional comparison.
Duplicate documents are often located in various information sources. Generating a report on duplicate documents is one component of the full information flow audit that our experts can perform at a company. After such an audit we can offer the company a tailored solution to various problems related to searching and structuring information within the company.
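One way to picture a duplicate report across sources is sketched below. It is an illustration under assumptions, not the similarity analysis report function itself: it fingerprints each document by hashing its sorted, lowercased words, so it only catches exact copies and reordered copies, not near duplicates. The names fingerprint and duplicate_report and the sample sources are hypothetical.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text):
    """Order-insensitive fingerprint: SHA-1 of the sorted, lowercased words."""
    words = sorted(re.findall(r"\w+", text.lower()))
    return hashlib.sha1(" ".join(words).encode("utf-8")).hexdigest()


def duplicate_report(documents):
    """Group documents with identical fingerprints.

    documents is an iterable of (source, name, text) tuples.
    Returns a list of groups, each a list of (source, name) pairs.
    """
    groups = defaultdict(list)
    for source, name, text in documents:
        groups[fingerprint(text)].append((source, name))
    return [members for members in groups.values() if len(members) > 1]


# Hypothetical documents spread over three data sources.
docs = [
    ("crm", "a.txt", "Annual report 2005. Sales grew."),
    ("fileserver", "b.txt", "Sales grew. Annual report 2005."),
    ("dbms", "c.txt", "Meeting notes"),
]
for group in duplicate_report(docs):
    print("duplicates:", group)
```

Grouping by fingerprint needs one pass over the documents rather than a comparison of every pair, which is one common reason such reports can run much faster than pairwise comparison.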
Existing Applications
SoftInform Search Technology is not merely a theoretical result. Software built on this technology is already used successfully in various projects.
Application of SoftInform Search Technology
- SearchInform. A framework for performing search, including corporate search. This application is a good illustration of the potential of SoftInform Search Technology.
- SST Hummingbird, SST Locia Soft. Examples of the successful introduction of SoftInform technologies into document management systems.
- The YurCall Center project. SoftInform Search Technology is used in a project that provides legal services over the phone, where the speed of information search is vitally important.
- University systems. Tracking "similar" reports, theses, etc., automated libraries, and much more.
SoftInform Search Technology Popularity
At present SoftInform Search Technology is well known in the Russian market. Our experience shows that search technologies will be in great demand for many years to come.
Best Soft 2005 According to PC Magazine
One sign of the need for full-text search, both among end users and, first and foremost, among organizations and companies, is the keen interest of the Russian IT press in our technology and the software based on it.
An Ideal Business Solution
SearchInform is not just a boxed software product. It is a full-fledged tool for creating corporate systems of any level and for various purposes.
SearchInform Search Technology is the key to the information processing problems facing modern companies.
Practical Application
Now I would like to provide specific examples
that illustrate how our system operates.
Let's proceed to the demonstration of
SearchInform Search Technology.
Thank you for your attention
QUESTIONS?