DEiXTo - PowerPoint PPT Presentation

About This Presentation
Title: DEiXTo

Slides: 9
Provided by: knto

Transcript and Presenter's Notes



1
DEiXTo
2
DEiXTo
  • Powerful web data extraction tool
  • Freeware GUI tool (built with Turbo Delphi,
    Windows-only)
  • Free, cross-platform Command Line Executor (in
    Perl)
  • DEiXToBot agent (implemented in Perl)
  • W3C Document Object Model (DOM)
  • DOM-based extraction rules (wrappers).
  • Extracted data can be exported to a wide variety
    of formats (tab-delimited, XML, RSS, etc.).
  • Command Line Executor:
      • has database support via DBI, the Database
        Independent Interface for Perl
      • supports additional formats: Excel, CSV,
        OpenDocument Spreadsheet (.ods) and HTML
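To illustrate what a DOM-based extraction rule does, here is a minimal Python sketch of tree pattern matching. It is not DEiXTo's algorithm or its wrapper (.wpf) format: the dict-based rule, the `match`/`extract` helpers and the sample page are hypothetical. The page is parsed with `xml.etree.ElementTree` only because the snippet is well-formed XHTML; real-world HTML would need a tolerant HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule format (NOT DEiXTo's .wpf schema): each node names a tag,
# may carry a "capture" field name, and lists child patterns that must match
# children of the element in the same left-to-right order (gaps allowed).
PATTERN = {
    "tag": "div",
    "children": [
        {"tag": "h2", "capture": "title"},
        {"tag": "span", "capture": "price"},
    ],
}

PAGE = """<html><body>
<div class="item"><h2>Alpha</h2><span>10 EUR</span></div>
<div class="item"><h2>Beta</h2><p>no price</p></div>
<div class="item"><h2>Gamma</h2><span>7 EUR</span></div>
</body></html>"""


def match(pattern, elem, record):
    """Return True if `pattern` matches the subtree at `elem`, filling `record`."""
    if elem.tag != pattern["tag"]:
        return False
    if "capture" in pattern:
        record[pattern["capture"]] = "".join(elem.itertext()).strip()
    children = list(elem)
    pos = 0
    for child_pattern in pattern.get("children", []):
        # Scan forward for the next child that matches this sub-pattern
        # (greedy, no backtracking: a simplification for this sketch).
        while pos < len(children):
            trial = {}
            matched = match(child_pattern, children[pos], trial)
            pos += 1
            if matched:
                record.update(trial)
                break
        else:
            return False  # ran out of children: the rule does not match here
    return True


def extract(pattern, root):
    """Apply the rule at every element carrying the pattern's root tag."""
    records = []
    for elem in root.iter(pattern["tag"]):
        record = {}
        if match(pattern, elem, record):
            records.append(record)
    return records


if __name__ == "__main__":
    for rec in extract(PATTERN, ET.fromstring(PAGE)):
        print(rec)
```

The "Beta" block is skipped because it has no `<span>`, which is the point of a structural rule: only subtrees with the expected shape yield records.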

3
GUI DEiXTo
  • user-friendly graphical interface
  • enhanced, tree-based extraction rules
  • HTML tag filtering
  • fast, flexible, high-performance tree pattern
    matching algorithm
  • regular expression support
  • can follow "Next Page" links and submit simple
    forms
  • can export results to XML and tab-delimited
    formats and create RSS feeds
  • XML-encoded wrapper project files (.wpf) that can
    be executed at will
  • last but not least, it's freeware!
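HTML tag filtering and regular expression support can be sketched in the same spirit. The interpretation here is our assumption, not taken from DEiXTo's documentation: filtering a tag means treating it as transparent, so a rule that expects `<a>` directly under `<li>` still matches when the link is wrapped in `<b>`; a regex then keeps only links whose `href` matches. `IGNORED`, `effective_children`, `extract_links` and the sample page are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical set of formatting tags treated as transparent during matching.
IGNORED = {"b", "i", "em", "strong", "font"}

PAGE = """<ul>
<li><a href="/news/1">First</a></li>
<li><b><a href="/news/2">Second</a></b></li>
<li><a href="/ads/3">Sponsored</a></li>
</ul>"""


def effective_children(elem, ignored=IGNORED):
    """Yield elem's children, looking through any filtered (ignored) tags."""
    for child in elem:
        if child.tag in ignored:
            yield from effective_children(child, ignored)
        else:
            yield child


def extract_links(root, href_regex, ignored=IGNORED):
    """Return (text, href) pairs for <a> elements that are effective children
    of an <li> and whose href matches href_regex."""
    rx = re.compile(href_regex)
    results = []
    for li in root.iter("li"):
        for child in effective_children(li, ignored):
            href = child.get("href", "")
            if child.tag == "a" and rx.search(href):
                results.append(("".join(child.itertext()).strip(), href))
    return results


if __name__ == "__main__":
    root = ET.fromstring(PAGE)
    print(extract_links(root, r"^/news/\d+$"))
    # With filtering disabled, the <b>-wrapped link is no longer found.
    print(extract_links(root, r"^/news/\d+$", ignored=set()))
```

The regex drops the "/ads/" link, while tag filtering is what lets the second item match despite its extra `<b>` wrapper.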

4
DEiXTo Command Line Executor (CLE)
  • portable, efficient and fast command line
    executor of GUI DEiXTo generated wrappers
  • provides options and flexibility that you cannot
    get with GUI DEiXTo
  • supports additional output formats such as CSV,
    Excel and OpenDocument Spreadsheet
  • provides database support via DBI (the Database
    independent interface for Perl)
  • supports HTML output using an HTML template
    processor and an editable template file
  • overwrite, append and prepend output modes for
    all supported formats
  • can be scheduled to execute wrappers
    automatically (e.g., with cron on GNU/Linux)
  • it is free and open source, distributed under the
    GNU General Public License (GPL) Version 3!
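The overwrite, append and prepend output modes can be illustrated with a small Python sketch for CSV output. The CLE itself is written in Perl; `write_records` and its `mode` argument are hypothetical and only mimic the three behaviors listed above.

```python
import csv
import io
import os
import tempfile


def write_records(path, rows, mode="overwrite"):
    """Write rows as CSV to path (hypothetical helper).

    mode: 'overwrite' replaces the file, 'append' adds rows at the end,
    'prepend' puts the new rows before any existing content.
    """
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    new_text = buf.getvalue()
    if mode == "overwrite":
        with open(path, "w", newline="", encoding="utf-8") as f:
            f.write(new_text)
    elif mode == "append":
        with open(path, "a", newline="", encoding="utf-8") as f:
            f.write(new_text)
    elif mode == "prepend":
        # Files cannot grow at the front, so read the old content and
        # rewrite the whole file with the new rows first.
        old_text = ""
        if os.path.exists(path):
            with open(path, newline="", encoding="utf-8") as f:
                old_text = f.read()
        with open(path, "w", newline="", encoding="utf-8") as f:
            f.write(new_text + old_text)
    else:
        raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "results.csv")
    write_records(path, [["Alpha", 10]])
    write_records(path, [["Beta", 12]], mode="append")
    write_records(path, [["Gamma", 7]], mode="prepend")
    with open(path, newline="", encoding="utf-8") as f:
        print(list(csv.reader(f)))  # [['Gamma', '7'], ['Alpha', '10'], ['Beta', '12']]
```

Prepend is the costly mode: it rewrites the entire file on every run, which is worth keeping in mind for large, frequently scheduled outputs.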

5
DEiXToBot
  • A Mechanize agent (essentially a browser
    emulator) capable of extracting data of interest.
  • Flexible and efficient.
  • Allows extensive customization.
  • Supports multiple patterns on a single page and
    combination of their results.
  • Allows post-processing of the extracted data and
    enables you to transform it into any format you
    wish.
  • Programming skills are required to use it, though.
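A rough Python analogue of what this slide describes: several patterns applied to one page, their results combined record by record, then post-processed into another format. This is not DEiXToBot's Perl API; the "patterns" here are ElementTree path expressions, and `apply_pattern`, `parse_price`, `scrape` and the sample page are hypothetical. Combining by position assumes every pattern returns its results in the same order and number.

```python
import json
import re
import xml.etree.ElementTree as ET

PAGE = """<html><body>
<table id="products">
<tr><td class="name">Widget</td><td class="price">EUR 12,50</td></tr>
<tr><td class="name">Gadget</td><td class="price">EUR 3,00</td></tr>
</table>
<ul id="stock"><li>in stock</li><li>sold out</li></ul>
</body></html>"""


def apply_pattern(root, path):
    """One 'pattern': an ElementTree path expression; returns the matched texts."""
    return ["".join(e.itertext()).strip() for e in root.findall(path)]


def parse_price(text):
    """Post-processing: 'EUR 12,50' -> 12.5 (assumes a comma decimal separator)."""
    m = re.search(r"(\d+),(\d{2})", text)
    return float(f"{m.group(1)}.{m.group(2)}") if m else None


def scrape(html):
    root = ET.fromstring(html)
    names = apply_pattern(root, ".//table[@id='products']/tr/td[@class='name']")
    prices = apply_pattern(root, ".//table[@id='products']/tr/td[@class='price']")
    stock = apply_pattern(root, ".//ul[@id='stock']/li")
    # Combine the three result lists position by position into records.
    return [
        {"name": n, "price": parse_price(p), "available": s == "in stock"}
        for n, p, s in zip(names, prices, stock)
    ]


if __name__ == "__main__":
    # Transform the combined records into another format (JSON here).
    print(json.dumps(scrape(PAGE), indent=2))
```

Note that `zip` silently stops at the shortest list, so a real agent would want to check that the patterns returned matching counts before combining them.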

6
Corgialenios Library use case
  • From unstructured HTML data
  • To ESE format!

7
DEiXTo Services
  • We can help you:
      • transform the contents of your digital library
        into OAI-PMH or another suitable format
      • quickly populate product catalogues with full
        specifications
      • search various web resources in real time and
        extract the results they return
      • prepare large, focused datasets for scientific
        tasks (e.g., data mining)
      • monitor competitors' prices
      • <your extraction task goes here!>

8
Happy DEiXTo users!
  • For further information, please visit
    http://deixto.com