Database and Data-Intensive Systems - PowerPoint PPT Presentation

About This Presentation
Title:

Database and Data-Intensive Systems

Description:

Title: Center for Data-intensive Systems. Author: csj. Last modified by: ira. Created Date: 12/18/2006 2:28:14 PM. Document presentation format: On-screen Show (4:3).

Slides: 14
Provided by: CSJ64

Transcript and Presenter's Notes



1
Database and Data-Intensive Systems
2
Data-Intensive Systems
  • From monolithic architectures to diverse systems
  • Dedicated/specialized systems, column stores
  • Data centers, web architectures, distributed
    architectures
  • From business data to all data
  • Streaming and sensor data, semi-structured and
    unstructured data
  • Multidimensional data, temporal data,
    spatio-temporal data
  • Examples
  • Clustering of high-dimensional data
  • Tracking and continuous queries for moving
    objects
  • Mobile service infrastructure
  • Location privacy
  • Spatio-textual search/hyper-local web search
  • Multimedia similarity search
  • This is where much of our research lives.

3
Staff
  • Ira Assent, associate professor
  • Christian S. Jensen, professor
  • Vaida Ceikute, Ph.D. student
  • Xiaohui Li, visiting Ph.D. student
  • NN, Ph.D. student
  • GEOCROWD indoor positioning and services
    infrastructure
  • NN, Ph.D. student
  • GEOCROWD spatial web objects
  • NN, Ph.D. student
  • eData Anomaly Detection in e-Science
  • NN, Ph.D. student
  • Streamspin
  • NN, Ph.D. student
  • WallViz
  • NN, Ph.D. student
  • REDUCTION
  • NN, Ph.D. student
  • REDUCTION

4
Graduate Course Portfolio: dDO
  • Data management for moving objects (Q3)
  • The course covers selected research advances in
    the general area of indexing and update and query
    processing for moving objects.
  • Moving object tracking
  • Specific indexing techniques
  • R-tree based indexing
  • B-tree based indexing
  • Techniques for the efficient handling of frequent
    updates
  • Techniques for range and k nearest neighbor query
    processing, including one-time as well as
    continuous queries
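
As a rough illustration of a one-time k nearest neighbor query over moving objects, the sketch below assumes each object reports a position and a constant velocity at some time t0, and that the server predicts positions linearly. The names MovingObject, position_at, and knn_at are hypothetical, and the linear scan only stands in for the R-tree or B-tree based indexes that the course covers.

```python
import heapq
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MovingObject:
    oid: str
    x: float    # position reported at time t0
    y: float
    vx: float   # velocity reported at time t0 (assumed constant afterwards)
    vy: float
    t0: float

    def position_at(self, t: float):
        """Predict the position at time t with a linear motion model."""
        dt = t - self.t0
        return (self.x + self.vx * dt, self.y + self.vy * dt)


def knn_at(objects: List[MovingObject], qx: float, qy: float, t: float, k: int) -> List[str]:
    """Return the ids of the k objects predicted to be closest to (qx, qy) at time t."""
    def dist(o: MovingObject) -> float:
        px, py = o.position_at(t)
        return math.hypot(px - qx, py - qy)

    # Linear scan; a real system would use a spatio-temporal index instead.
    return [o.oid for o in heapq.nsmallest(k, objects, key=dist)]


objects = [
    MovingObject("a", 0.0, 0.0, 1.0, 0.0, 0.0),    # moves away from the origin
    MovingObject("b", 10.0, 0.0, -1.0, 0.0, 0.0),  # moves toward the origin
    MovingObject("c", 5.0, 5.0, 0.0, 0.0, 0.0),    # stationary
]
```

A continuous query would have to keep such a result up to date as time passes and as objects send updates, which is where efficient update handling becomes important.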

5
Graduate Course Portfolio: MDDB
  • Multidimensional databases (Q4)
  • Selected techniques for the management of
    multidimensionally represented data
  • Multidimensional data and applications
  • Data warehouses and data mining
  • Similarity search and query processing
  • Efficient indexing and associated query
    processing
  • Multistep similarity search
  • Indexing multidimensional data
  • Skyline query processing
  • Data mining techniques
  • Subspace clustering
  • Classification
  • Outlier detection
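
To make the skyline topic concrete: a skyline query returns the points that no other point dominates, where a point dominates another if it is at least as good in every dimension and strictly better in at least one. The sketch below is a simple block-nested-loop style computation that assumes smaller values are better; the function names and the hotel data (price, distance to the beach) are made up for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Point]) -> List[Point]:
    """Keep a window of points that are not dominated by anything seen so far."""
    window: List[Point] = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue  # p is dominated, so it cannot be in the skyline
        # p survives; drop window points that p dominates, then add p.
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


# Hypothetical hotels as (price, distance to the beach in km).
hotels = [(50.0, 8.0), (80.0, 2.0), (60.0, 9.0), (90.0, 1.0), (85.0, 3.0)]
print(skyline(hotels))
```

Here (60, 9) is dominated by (50, 8) and (85, 3) by (80, 2), so three hotels remain. This quadratic scan ignores indexing entirely; index-based skyline algorithms aim to avoid comparing most pairs.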

6
Graduate Course Portfolio: Index
  • Indexing of disk-based data (Q1)
  • Indexing techniques for disk-based data for
    different types of data, as well as their support
    for queries and updates
  • General overview of indexes and query
    processing
  • Spatial indexing structures
  • Space partitioning indexing structures
  • Indexes for high dimensional data
  • Metric approaches
  • Special techniques for complex data types
  • Coming up for the first time this fall

7
Graduate Course Portfolio: dDB2
  • Database management systems (Q2)
  • The course aims to give the participants a solid
    conceptual foundation for making competent use of
    a database management system.
  • Logical and physical query optimization and query
    processing
  • Concurrency control techniques
  • Database tuning
  • Central concepts and techniques in relation to
    supporting temporal and multi-dimensional data
  • Coming up for the first time this fall

8
Projects
  • Streamspin
  • Enable sites that are for mobile services what
    YouTube is for video
  • Easy mobile service creation and sharing
  • Advanced spatial and social context functionality
  • Be an open, extensible, and scalable service
    delivery infrastructure
  • MOVE
  • Knowledge extraction from massive data about
    moving objects
  • Cross-cutting activities, showcases, and
    evaluation
  • Representation of movement data and
    spatio-temporal databases
  • Analysis of movement and spatio-temporal data
    mining
  • WallViz
  • Collaborative analysis, joint decision making on
    wall-sized displays
  • scale to massive data collections
  • support ad-hoc queries
  • automatically provide entry points for analysis

http://www.move-cost.info
9
Projects (2)
  • GEOCROWD
  • Creating a Geospatial Knowledge World
  • advance the state-of-the-art in collecting,
    storing, analyzing, processing, reconciling, and
    publishing user-generated geospatial information
    on the Web
  • REDUCTION
  • Reducing the environmental footprint of fleets of
    vehicles
  • Optimizing the behavior of drivers
  • Supporting eco-routing of vehicles
  • Enabling transparency in multi-modal
    transportation
  • eData
  • Robust analysis in the context of imperfect data
    in e-Science
  • Detect and correct anomalies effectively
  • on-line, interactive, lineage-preserving, and
    semi-automatic
  • Scalable algorithms

10
How We Typically Work
  • We target some real problem that we find
    interesting.
  • We define the problem precisely.
  • We develop a solution that is typically a data
    structure or an algorithm, i.e., a concrete
    technique.
  • To evaluate, we build prototypes.
  • These are built for the purpose of studying the
    properties of our solutions.
  • We are often interested in performance, e.g.,
    runtime, space usage, communication cost.
  • For some solutions we state formal properties
    that we then prove, e.g., the correctness of a
    particular technique
  • In brief: isolate and define the problem,
    construct a solution, then evaluate it

11
Example 1: Spatial Web Querying
  • Setting
  • Google: 90 billion queries/month, 20 billion
    with local intent.
  • We want to integrate exact locations of websites
    (for shops, bars, etc.) and users into web
    querying.
  • Queries
  • Results must match the query text and must be
    near the user.
  • Results of continuous queries must be updated as
    the user moves.
  • Challenges?
  • Support such queries with low computation cost on
    the server and with little communication between
    server and client.
  • Solution
  • Invent an index that supports both text and
    location
  • Use a safe zone to reduce the communication
    between user and server for continuous queries
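
The safe-zone idea can be sketched for the simplest case, a nearest-neighbor query with Boolean keyword matching. This is an assumption-laden illustration, not the index or safe-zone construction from the slides: the server scans all places instead of using a combined text and location index, and the safe zone is a circle of radius (d2 - d1) / 2 around the query point, where d1 and d2 are the distances to the nearest and second-nearest matching places. While the user stays inside that circle, the nearest match cannot change, so the client does not need to contact the server. All names (Place, Client, nearest_with_safe_zone) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Place:
    name: str
    x: float
    y: float
    keywords: FrozenSet[str]


def nearest_with_safe_zone(places: List[Place], qx: float, qy: float,
                           terms: FrozenSet[str]) -> Optional[Tuple[Place, float]]:
    """Server side: nearest place containing all query terms, plus a safe-zone radius."""
    def dist(p: Place) -> float:
        return math.hypot(p.x - qx, p.y - qy)

    ranked = sorted((p for p in places if terms <= p.keywords), key=dist)
    if not ranked:
        return None
    if len(ranked) == 1:
        return ranked[0], math.inf  # only one match: the result can never change
    return ranked[0], (dist(ranked[1]) - dist(ranked[0])) / 2.0


class Client:
    """Client side: re-query the server only after leaving the safe zone."""

    def __init__(self, places: List[Place], terms: FrozenSet[str]):
        self.places, self.terms = places, terms
        self.anchor: Optional[Tuple[float, float]] = None
        self.radius = 0.0
        self.result: Optional[Place] = None
        self.server_calls = 0

    def move_to(self, x: float, y: float) -> Optional[Place]:
        inside = (self.anchor is not None and
                  math.hypot(x - self.anchor[0], y - self.anchor[1]) < self.radius)
        if not inside:
            self.server_calls += 1  # stands in for one client-server round trip
            answer = nearest_with_safe_zone(self.places, x, y, self.terms)
            self.anchor = (x, y)
            self.result, self.radius = answer if answer else (None, 0.0)
        return self.result


places = [
    Place("A", 0.0, 0.0, frozenset({"coffee"})),
    Place("B", 10.0, 0.0, frozenset({"coffee"})),
    Place("C", 1.0, 0.0, frozenset({"beer"})),
]
```

With these places, a client asking for "coffee" at (2, 0) gets A and a safe-zone radius of 3, so a move to (4, 0) needs no server contact, while a move to (6, 0) triggers a new query that returns B.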

12
Example 2: Fraud detection
  • There are billions of financial transactions per
    minute
  • How do we uncover fraud?
  • Scalability
  • In time for reaction
  • Manageable results
  • Possible solution sketch
  • Identify attributes of suspicious transactions
  • Sort incoming transactions into a tree-structure
    of historic data
  • When processing time is up, output degree of
    suspicion based on similarity to valid or
    fraudulent historic data
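
One way to read this solution sketch is as an anytime classifier: each tree node summarizes the historic transactions below it, an incoming transaction descends the tree for as long as its processing budget allows, and the fraud share at the node it has reached is reported as the degree of suspicion. The code below is a minimal sketch under several assumptions that are not in the slides: a k-d-tree-style median split, a step budget instead of wall-clock time, and made-up attributes (amount, hour of day). The names Node, build, and suspicion are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Record = Tuple[Tuple[float, ...], bool]  # (attributes, is_fraud)


@dataclass
class Node:
    fraud: int                      # fraudulent historic records in this subtree
    total: int                      # all historic records in this subtree
    dim: int = 0                    # attribute used for the split
    split: float = 0.0              # records with attribute < split go left
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(records: List[Record], depth: int = 0, leaf_size: int = 2) -> Node:
    """Build a k-d-tree-like summary of labelled historic transactions."""
    node = Node(fraud=sum(1 for _, is_fraud in records if is_fraud), total=len(records))
    if len(records) <= max(1, leaf_size):
        return node
    dim = depth % len(records[0][0])
    ordered = sorted(records, key=lambda r: r[0][dim])
    mid = len(ordered) // 2
    node.dim, node.split = dim, ordered[mid][0][dim]
    node.left = build(ordered[:mid], depth + 1, leaf_size)
    node.right = build(ordered[mid:], depth + 1, leaf_size)
    return node


def suspicion(root: Node, attrs: Tuple[float, ...], budget: int) -> float:
    """Descend at most `budget` levels, then report the fraud share at the current node."""
    node = root
    for _ in range(budget):
        if node.left is None or node.right is None:
            break  # reached a leaf before the budget ran out
        node = node.left if attrs[node.dim] < node.split else node.right
    return node.fraud / node.total if node.total else 0.0


# Hypothetical history: (amount, hour of day) with a fraud label.
history: List[Record] = [
    ((10.0, 12.0), False), ((12.0, 13.0), False), ((15.0, 11.0), False),
    ((20.0, 14.0), False), ((900.0, 3.0), True), ((950.0, 2.0), True),
    ((30.0, 15.0), False), ((1000.0, 4.0), True),
]
root = build(history)
```

For a large night-time transaction such as (980.0, 3.0), the estimate sharpens as the budget grows: 0.375 at the root, 0.75 after one step, and 1.0 after two. The answer is available at any time, which matches the requirement of reacting in time.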

13
Interested?
  • Come talk to us!
  • We currently have M.Sc. and Ph.D. thesis openings.