Title: Leiden Institute of Advanced Computer Science LIACS
Leiden Institute of Advanced Computer Science (LIACS)
- Joost N. Kok, Scientific Director
LIACS
- The computer science institute of Leiden University
- Part of the Faculty of Mathematics and Natural Sciences
According to Cornell University
- "Computer scientists are more in demand today than ever before. In fact, more and more fields, from the arts and humanities to music, medicine, linguistics and communication, architecture, and the natural sciences, rely on CS to advance their inventions and powers of discovery. And where we are today is just the beginning!" (http://tinyurl.com/3zoz3ct)
According to CNN (2010), the best job in America is Software Architect
- "Money and PayScale.com rate the top 100 careers with great pay and growth prospects."
- Top 100 rank: 1
- Sector: Information Technology
- What they do: "Like architects who design buildings, they create the blueprints for software engineers to follow, and pitch in with programming too. Plus, architects are often called on to work with customers and product managers, and they serve as a link between a company's tech and business staffs."
- What's to like: "The job is creatively challenging, and engineers with good people skills are liberated from their screens. Salaries are generally higher than for programmers, and a typical day has more variety."
- Requirements: "Bachelor's degree, and either a master's or considerable work experience to demonstrate your ability to design software and work collaboratively."
- http://tinyurl.com/2v5kz6n
According to the New York Times (June 2011), computer science enrollment in the US is growing fast
- "Computer science is a hot major again. It had been in the doldrums after the dot-com bust a decade ago, but with the social media gold rush and the success of 'The Social Network,' computer science departments are transforming themselves to meet the demand. At Harvard, the size of the introductory computer science class has nearly quadrupled in five years. The spike has raised hopes of a ripple effect throughout the American education system, so much so that Mehran Sahami, the associate chairman for computer science at Stanford, can envision 'a national call, a Sputnik moment.'" (http://tinyurl.com/3u7pnuf)
According to the Centraal Planbureau (CPB, 25 July 2011), ICT is of great importance to the economy
- "The largest productivity gains in an economy are achieved not by having ICT but by using it: ICT as an innovation axis. To make use of ICT, a country needs a substantial software base of its own. The Dutch software sector provides this base, with an annual contribution to the Dutch economy of more than 17 billion euros. [...] The study showed that in 2010 there were more than 24,000 software companies in the Netherlands, which together contributed more than 17 billion euros to the Dutch economy, or 2.8 percent. In terms of economic contribution, the software sector is therefore at least as large as some of the top sectors in the Netherlands." (http://tinyurl.com/3k9ag6l)
Leiden University
- Leiden University has six faculties, which are made up of institutes
- Together they offer about 50 bachelor's programmes and almost 100 master's programmes
Leiden University
- The University Executive Board is entrusted with the management and administration of the university as a whole (Rector, President, Vice Rector)
- Board of Governors
Leiden University
- Six Faculties
- Archaeology
- Humanities
- Law
- Leiden University Medical Center (LUMC)
- Science
- Social and Behavioural Sciences
Leiden University
- Each faculty has a Faculty Board chaired by a Dean
- The Executive Board has regular meetings with the Board of Deans on matters of university policy
Science Faculty
- The mission of the Science Faculty is to carry out excellent research and to provide outstanding undergraduate and postgraduate education
- The link between the interrelated core activities of research and education is strongly emphasized within the Faculty
- Institutes: Mathematics, Physics, Astronomy, Chemistry, Biology, Bio-Pharmaceutical Sciences and Computer Science
Science Faculty
- Faculty
- Faculty Board
- Faculty Council
- Institutes
- Management Team (Scientific Director, Director of Education)
- Institute Council
LIACS
- Management Team
- Scientific Director
- Director of Education
- Managing Director
- Programme Committee (Opleidingscommissie)
- Institute Council (Instituutsraad)
- Board of Examiners (Examencommissie)
- Supervisory Board (Raad van Toezicht)
LIACS Research Clusters
- Algorithms
- Foundations of Software Technology
- Computer Systems
- Imagery and Media
- Technology Innovation Management
LIACS Research Clusters
- Algorithms: prof.dr. Thomas Bäck, prof.dr. Joost Kok
- Computer Systems: prof.dr. Ed Deprettere, prof.dr. Harry Wijshoff
- Foundations of Software Technology: prof.dr. Farhad Arbab, prof.dr. Joost Kok
- Imaging: dr. Michael Lew, dr.ir. Fons Verbeek
- Technology Innovation Management: prof.dr. Bernhard Katzy
Professors at LIACS
LIACS
- Full Professors
- Associate Professors
- Assistant Professors
- Postdocs
- PhD students
- Support staff
Tasks
- 40% research, 40% teaching, 20% management
- 80% teaching, 20% research
- First, second and third funding streams
Education
- Bachelor Computer Science (Informatica)
- Master Computer Science
- Master Media Technology
- Master ICT in Business
Master's Degrees
- Three master's programmes
- Computer Science (including a Bioinformatics track)
- Media Technology
- ICT in Business
- Two years
PhD Education
- 60 PhD students at LIACS
- PhD candidates (promovendi)
- External PhD candidates (buitenpromovendi)
- Graduate School
- Research schools (onderzoeksscholen):
- IPA
- ASCI
- SIKS
Algorithms Cluster at LIACS
Natural Computing
- Natural computing focuses on computational
methods gleaned from natural models, such as
evolutionary computation, molecular computing,
neural computing, cellular automata, and swarm
intelligence.
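Of the methods listed, cellular automata are the easiest to show in a few lines. The sketch below is illustrative and not taken from the slides: it runs a one-dimensional elementary cellular automaton, where each cell's next state is looked up from its three-cell neighbourhood in the binary expansion of a rule number. The fixed zero boundary and the function names are assumptions.

```python
def step(cells: list[int], rule: int) -> list[int]:
    """Apply one update of an elementary (Wolfram-numbered) cellular automaton.

    Cells outside the list are treated as 0 (fixed boundary).
    """
    n = len(cells)
    new = []
    for i in range(n):
        left = cells[i - 1] if i > 0 else 0
        centre = cells[i]
        right = cells[i + 1] if i < n - 1 else 0
        # The neighbourhood, read as a 3-bit number 0..7,
        # selects one bit of the rule number.
        index = (left << 2) | (centre << 1) | right
        new.append((rule >> index) & 1)
    return new


def run(width: int, steps: int, rule: int) -> list[list[int]]:
    """Start from a single live cell in the middle and record every generation."""
    cells = [0] * width
    cells[width // 2] = 1
    history = [cells]
    for _ in range(steps):
        cells = step(cells, rule)
        history.append(cells)
    return history


if __name__ == "__main__":
    # Print rule 90 as text: '#' for live cells, '.' for dead ones.
    for row in run(width=31, steps=15, rule=90):
        print("".join("#" if c else "." for c in row))
```

With rule 90 each new cell is the XOR of its two neighbours, so a single live cell unfolds into a Sierpinski-triangle pattern.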
Natural Computing
- Computers are to computer science as comic books are to literature
Evolutionary Algorithms for Multi-Parameter Physics
- Evolutionary algorithms are applied to problems in multi-parameter physics, such as controlling femtosecond lasers so that they affect molecules in a desired way.
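The slides do not say which evolutionary algorithm is used. As a hedged illustration of the idea, here is a (1+1) evolution strategy with a step-size rule in the spirit of Rechenberg's 1/5 success rule: one parent, one Gaussian-mutated child per generation, and a mutation strength that is widened or narrowed depending on how often the child wins. In a laser-control experiment the objective would be a quantity measured in the lab; the `sphere` function here is a hypothetical stand-in, and the adaptation constants (a window of 20 generations, factor 0.85) are assumptions.

```python
import random
from typing import Callable, Sequence


def one_plus_one_es(
    objective: Callable[[Sequence[float]], float],
    x0: Sequence[float],
    sigma: float = 1.0,
    iterations: int = 2000,
    seed: int = 0,
) -> tuple[list[float], float]:
    """Minimise `objective` with a (1+1)-ES and a 1/5-success step-size rule."""
    rng = random.Random(seed)
    parent = list(x0)
    parent_f = objective(parent)
    successes = 0
    for t in range(1, iterations + 1):
        # Mutation: add isotropic Gaussian noise to every parameter.
        child = [xi + sigma * rng.gauss(0.0, 1.0) for xi in parent]
        child_f = objective(child)
        # Plus-selection: the child replaces the parent only if it is no worse.
        if child_f <= parent_f:
            parent, parent_f = child, child_f
            successes += 1
        # Every 20 generations: widen the step size if more than 1/5 of the
        # mutations succeeded, narrow it otherwise.
        if t % 20 == 0:
            if successes / 20 > 0.2:
                sigma /= 0.85
            else:
                sigma *= 0.85
            successes = 0
    return parent, parent_f


def sphere(x: Sequence[float]) -> float:
    """Hypothetical stand-in objective: squared distance from the origin."""
    return sum(xi * xi for xi in x)
```

For example, `one_plus_one_es(sphere, [5.0] * 10, iterations=3000)` drives ten parameters towards the optimum at the origin without any gradient information, which is what makes this family of methods usable when the objective can only be measured.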
The Fourth Paradigm: Data-Intensive Scientific Discovery
"One of the greatest challenges for 21st-century science is how we respond to this new era of data-intensive science. This is recognized as a new paradigm beyond experimental and theoretical research and computer simulations of natural phenomena, one that requires new tools, techniques, and ways of working." (Douglas Kell, University of Manchester)
Data Mining Definitions
- Secondary analysis of data
- Induction of understandable, useful models and patterns from data
- Algorithms for large quantities of data
- Data mining is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data
Typical Data Mining Results
- Forecasting what may happen in the future
- Classifying people or things into groups by recognizing patterns
- Clustering people or things into groups based on their attributes
- Associating what events are likely to occur together
- Sequencing what events are likely to lead to later events
Different Types of Problems
- Data mining problems (tasks) often fall into one of the following categories:
- Classification
- Regression
- Clustering
- Discovering associations
- Probabilistic modelling
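To make one of these categories concrete (discovering associations), the sketch below computes the support of item pairs and the confidence of a rule A → C over a list of transactions. It is a brute-force illustration, not an efficient algorithm such as Apriori, and the example baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Item pairs whose support (fraction of transactions containing both) >= min_support."""
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        # Count each unordered pair once per transaction; sorting fixes the pair's order.
        for pair in combinations(sorted(set(t)), 2):
            counts[pair] += 1
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    a = set(antecedent)
    both = a | set(consequent)
    n_a = sum(1 for t in transactions if a <= set(t))
    n_both = sum(1 for t in transactions if both <= set(t))
    return n_both / n_a if n_a else 0.0


if __name__ == "__main__":
    baskets = [  # hypothetical example data
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
    ]
    print(frequent_pairs(baskets, min_support=0.5))
    print(confidence(baskets, {"bread"}, {"milk"}))
```

On these four baskets, {bread, milk} and {bread, butter} each appear in half of the transactions, and the rule bread → milk holds in two of the three baskets that contain bread.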
From Querying to Mining
Are there any occurrences of GAAT in this string?
Standard database technology solves such questions.
How many occurrences of AAT are there in this string?
Which substrings of length 4 occur at least 2 times?
Data mining technology can sometimes solve such questions (the computations may be (too) heavy).
Which substrings (of any length) occur significantly more often in the white string than in the black string?
Why is the virus on the left resistant to my drug, and the one on the right not?
Science fiction
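The string in the slide's figure is not reproduced in this text, so the sketch below uses a hypothetical one. Counting a fixed pattern is a single scan (a custom loop is used because Python's `str.count` skips overlapping matches); listing all length-k substrings that occur at least a given number of times needs a count for every window, and the "any length" variant repeats that for every k, which is where the computations start to become heavy.

```python
from collections import Counter


def count_occurrences(text: str, pattern: str) -> int:
    """Count (possibly overlapping) occurrences of `pattern` in `text`."""
    return sum(1 for i in range(len(text) - len(pattern) + 1)
               if text.startswith(pattern, i))


def frequent_substrings(text: str, k: int, min_count: int) -> dict[str, int]:
    """All substrings of length k that occur at least `min_count` times."""
    # One count per window of length k: len(text) - k + 1 windows in total.
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    return {s: c for s, c in counts.items() if c >= min_count}


if __name__ == "__main__":
    dna = "GAATGAATTC"  # hypothetical example string
    print("GAAT" in dna)                     # a plain query
    print(count_occurrences(dna, "AAT"))     # counting
    print(frequent_substrings(dna, 4, 2))    # a (small) mining question
```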
Subgroup Discovery
- How can comprehensible subgroups be found in large amounts of data?
- Example: subtypes in complex diseases
- Different types of input
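The slides do not name a quality measure for subgroups. A commonly used one is weighted relative accuracy, WRAcc = (n_sub / N) · (p_sub − p_all), which rewards subgroups that are both large and unusual with respect to a target property. The sketch below scores every single `attribute == value` condition with it; real subgroup discovery searches conjunctions of conditions, and the patient records are hypothetical.

```python
def wracc(records, target, condition):
    """Weighted relative accuracy of the subgroup selected by `condition`.

    WRAcc = (n_sub / N) * (p_sub - p_all), where p is the fraction of
    records for which `target` holds.
    """
    n = len(records)
    sub = [r for r in records if condition(r)]
    if not sub or not n:
        return 0.0
    p_all = sum(1 for r in records if target(r)) / n
    p_sub = sum(1 for r in sub if target(r)) / len(sub)
    return len(sub) / n * (p_sub - p_all)


def best_single_condition(records, target, attributes):
    """Score every `attribute == value` condition and return (attr, value, score)."""
    best = (None, None, float("-inf"))
    for attr in attributes:
        # Sorting the distinct values makes tie-breaking deterministic.
        for value in sorted({r[attr] for r in records}, key=repr):
            score = wracc(records, target, lambda r, a=attr, v=value: r[a] == v)
            if score > best[2]:
                best = (attr, value, score)
    return best


if __name__ == "__main__":
    patients = [  # hypothetical records
        {"smoker": "yes", "age": "old", "disease": True},
        {"smoker": "yes", "age": "young", "disease": True},
        {"smoker": "yes", "age": "old", "disease": True},
        {"smoker": "no", "age": "old", "disease": False},
        {"smoker": "no", "age": "young", "disease": False},
        {"smoker": "no", "age": "young", "disease": True},
    ]
    print(best_single_condition(patients, lambda r: r["disease"], ["smoker", "age"]))
```

Here the disease rate is 2/3 overall but 1 among smokers, so `smoker == "yes"` wins with WRAcc = 0.5 · (1 − 2/3) = 1/6, while the age conditions score 0 because they do not change the rate.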
Grand Challenges
Robosail
The Hollandse Brug
Intelligent Bridge
- The ultimate goal is early warning
- Open environment for experimentation
- Large effort:
- Obtaining project funding
- Placing sensors
- Establishing connectivity
- Data management
- Dealing with problems such as radar
- Platform for education
Bridge Sensors
Main Challenges in InfraWatch
- Data management
- More than 5 Gb of data per day (without video)
- Data warehousing: on-site vs. off-site storage and analysis
- Multi-modal data
- Different resolutions
- Sensors 100 Hz, video 30 Hz, weather 0.1 Hz
- Stream mining
- Continuous vs. discrete streams (events)
- Physical models
- Weigh-in-motion (WIM)
- Practical problems with sensors
- Noise, sensor failure, sensor drift (wear of bridge and sensor)
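The slides do not describe the stream-mining methods used in InfraWatch. As a hedged illustration of the constraints listed above (high-rate sensor streams, noise, the goal of early warning), here is a per-sensor detector that keeps only a running mean and variance (Welford's algorithm) and flags readings that lie far from them. The class name, threshold and warm-up length are hypothetical. Because every reading is folded into the statistics, slow sensor drift would be absorbed rather than flagged; a sliding window or exponential forgetting would be needed for that.

```python
import math


class StreamingZScore:
    """Online mean/variance (Welford) with a simple z-score alarm.

    Each new sample is compared with the statistics of the samples seen so far
    and then folded into them, so memory use is constant however long the stream.
    """

    def __init__(self, threshold: float = 4.0, warmup: int = 30):
        self.threshold = threshold
        self.warmup = warmup
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> bool:
        """Return True if x is anomalous relative to the history so far."""
        alarm = False
        # Only test once enough samples exist for a stable variance estimate.
        if self.n >= max(self.warmup, 2):
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                alarm = True
        # Welford's update of the running mean and M2.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return alarm
```

At 100 Hz, one such object per strain sensor needs three numbers of state, which is the kind of budget the "> 5 Gb per day" figure calls for.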
Data Management Architecture
- We developed a data management architecture to
interface with BigGrid clusters
Sensor Viewer
- We built a media player to view the data over time.
Controlled Experiment: 10-Axle Truck
[Figure: strain and vibration measurements]
Controlled Experiment: Traffic Jam
[Figure: strain measurements]
Mining Data for Dynamical Invariants
- By seeking dynamical invariants, we go from finding merely predictive models to finding deeper conservation laws.
- Without any prior knowledge of physics, kinematics, or geometry, the algorithm discovered Hamiltonians, Lagrangians, and other laws of geometric and momentum conservation.
Hod Lipson, Cornell (ECML/PKDD 2010)
Equation Discovery
- Discovery of laws, expressed in the form of equations, in collections of measured data
- System identification
- Methods work under the assumption that the structure of the model, i.e., the form of the equations, is known
- Equation discovery
- Aims at identifying both an adequate structure of the equations and appropriate values of the parameters
Equation Discovery
- Sensors produce multivariate time series
- Idea: model dependencies using equation discovery
- Provides insight into the sensor network
- Lagramge system (Todorovski and Dzeroski)
- Algebraic equations
- Differential equations
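Lagramge itself searches a space of equation structures defined by a grammar and also handles differential equations; the sketch below is a much simpler, hypothetical stand-in for the algebraic case only. It enumerates small sets of candidate terms (structure identification) and fits each set's coefficients by least squares with NumPy's `np.linalg.lstsq` (parameter identification). The term names, the per-term penalty and the idea of feeding in other sensors' readings as candidate terms are assumptions.

```python
from itertools import combinations

import numpy as np


def discover_equation(columns, y, max_terms=2, penalty=1e-3):
    """Search over small linear combinations of candidate terms.

    `columns` maps a term name (e.g. "x", "x^2") to its values. For every
    subset of at most `max_terms` terms, the coefficients are fitted by least
    squares; the subset with the lowest mean squared error plus a per-term
    penalty wins. Returns {term name: coefficient}.
    """
    names = list(columns)
    best = None
    for k in range(1, max_terms + 1):
        for subset in combinations(names, k):
            A = np.column_stack([columns[name] for name in subset])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            mse = float(np.mean((A @ coef - y) ** 2))
            # The penalty favours simpler structures when fits are comparable.
            score = mse + penalty * k
            if best is None or score < best[0]:
                best = (score, subset, coef)
    _, subset, coef = best
    return dict(zip(subset, coef.tolist()))


if __name__ == "__main__":
    x = np.linspace(-2.0, 2.0, 50)
    y = 3 * x**2 - 0.5 * x  # hypothetical "measured" data
    terms = {"1": np.ones_like(x), "x": x, "x^2": x**2, "x^3": x**3}
    print(discover_equation(terms, y))
```

With these data the search recovers the structure y = a·x + b·x² with a ≈ −0.5 and b ≈ 3, up to rounding; every other one- or two-term structure leaves a clearly larger residual.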
Related Project
Graph Mining
[Figures: Internet map (lumeta.com); Hyves; protein interactions (genomebiology.com); friendship network (Moody)]
Graph Mining Tasks
- Object-Related
- Link-Based Object Ranking
- Link-Based Object Classification
- Object Clustering (Subgroup Detection)
- Object Identification (Entity Resolution)
- Link-Related
- Link Prediction
- Graph-Related
- Subgraph Discovery
- Graph Classification
- Generative Models for Graphs
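Of the tasks above, link prediction is the simplest to sketch. A classic heuristic (an illustration here, not necessarily the approach taken at LIACS) scores every unlinked pair of nodes by the Jaccard similarity of their neighbour sets: pairs that share many friends are predicted to become linked. The node names are hypothetical.

```python
from itertools import combinations


def jaccard_link_scores(edges):
    """Score every non-adjacent node pair by the Jaccard similarity of their neighbourhoods.

    Returns a dict {(u, v): score}, ordered from most to least likely link.
    """
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    scores = {}
    for u, v in combinations(sorted(neighbours), 2):
        if v in neighbours[u]:
            continue  # already linked, nothing to predict
        union = neighbours[u] | neighbours[v]
        if union:
            scores[(u, v)] = len(neighbours[u] & neighbours[v]) / len(union)
    return dict(sorted(scores.items(), key=lambda kv: -kv[1]))


if __name__ == "__main__":
    friendships = [("ann", "bob"), ("ann", "cas"), ("bob", "dirk"),
                   ("cas", "dirk"), ("dirk", "eva")]
    for pair, score in jaccard_link_scores(friendships).items():
        print(pair, round(score, 2))
```

In this small network bob and cas have exactly the same friends (ann and dirk), so they are the top candidate for a new link.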
Visualisation
- Intelligent/intelligible data analysis
- Intelligent methods
- Intelligent human interaction
- Intelligible: understandable
- First step
- Visualisation of the data
DNA Visualisation
- Long patterns over small alphabets are hard to find
- ababababababababababababababababababababababa... = (ab)^ω
- abbbababaaababbabbbababaaababbabbbababaaababb... = (abbbababaaababb)^ω
- abaaaababbbbabaaaababbbbabaaaababbbbabaaaabab... = (abaaaababbbb)^ω
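These repeats are easy to miss by eye, which is the motivation for visualisation. When a repeat is exact, however, its length can be computed directly: the shortest period of a string equals its length minus the length of its longest proper prefix that is also a suffix. The sketch below (not from the slides) computes that with the Knuth-Morris-Pratt prefix function; it only detects exact periodicity, whereas the approximate repeats in real DNA are what the visualisation is meant to reveal.

```python
def smallest_period(s: str) -> int:
    """Length of the shortest p with s[i] == s[i + p] for all valid i.

    pi[i] is the length of the longest proper prefix of s[:i + 1] that is
    also a suffix of it; the shortest period of s is len(s) - pi[-1].
    """
    if not s:
        return 0
    pi = [0] * len(s)
    k = 0
    for i in range(1, len(s)):
        # Fall back to shorter borders until the next character matches.
        while k > 0 and s[i] != s[k]:
            k = pi[k - 1]
        if s[i] == s[k]:
            k += 1
        pi[i] = k
    return len(s) - pi[-1]


if __name__ == "__main__":
    for text in ["ababababababababababababababababababababababa",
                 "abbbababaaababbabbbababaaababbabbbababaaababb",
                 "abaaaababbbbabaaaababbbbabaaaababbbbabaaaabab"]:
        p = smallest_period(text)
        print(text[:p], p)
```

Applied to the three strings above, this yields the periods ab, abbbababaaababb and abaaaababbbb.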
DNA Visualisation
- Associate each nucleotide (A, C, T, G) with a dimension
- Four nucleotides → four dimensions
- Build a structure in four dimensions
- Project to three dimensions
DNA Visualisation
- Expectation:
- An unpredictable walk for information-rich parts of the DNA
- A true random walk for random parts
- Lines (or approximate lines) for repeating parts of the DNA
- Large identical substrings in the DNA can easily be detected
DNA Visualisation
- Select four three-dimensional vectors:
- The vectors should be of comparable length
- The four vectors should add up to 0
- Every subset of three vectors should be linearly independent
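Choosing four vectors directly in three dimensions amounts to projecting the four-dimensional walk. One choice that satisfies all three conditions (a hypothetical one, not necessarily the vectors used for the figures) is the four corners of a regular tetrahedron: all have length √3, they sum to 0, and any three are linearly independent. The sketch below turns a sequence into a 3-D path; plotting is left out.

```python
# Hypothetical choice: the corners of a regular tetrahedron centred on the origin.
STEP = {
    "A": (1, 1, 1),
    "C": (1, -1, -1),
    "G": (-1, 1, -1),
    "T": (-1, -1, 1),
}


def dna_walk(sequence: str) -> list[tuple[int, int, int]]:
    """Return the 3-D path obtained by adding one step vector per nucleotide."""
    x = y = z = 0
    path = [(x, y, z)]
    for base in sequence.upper():
        if base not in STEP:
            continue  # skip N and other ambiguity codes
        dx, dy, dz = STEP[base]
        x, y, z = x + dx, y + dy, z + dz
        path.append((x, y, z))
    return path
```

Because the vectors sum to 0, a sequence with balanced base frequencies wanders around without systematic drift, while a repeat such as ACGACGACG... moves by the same net vector every period, which is why repeats show up as (approximate) lines.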
The first 160,000 nucleotides of the human Y chromosome
Nucleotides 40,000 to 100,000 of human chromosome 1