Title: DIMACS Special Focus on Computational and Mathematical Epidemiology
The Role of the Mathematical Sciences in Epidemiology
- Emergence of new infectious diseases
  - Lyme disease
  - HIV/AIDS
  - Hepatitis C
  - West Nile Virus
- Evolution of antibiotic-resistant strains
  - tuberculosis
  - pneumonia
  - gonorrhea
- Great concern about the deliberate introduction of diseases by bioterrorists
  - anthrax
  - smallpox
  - plague
- Understanding infectious systems requires being able to reason about highly complex biological systems, with hundreds of demographic and epidemiological variables.
- Intuition alone is insufficient to fully understand the dynamics of such systems.
- Experimentation or field trials are often prohibitively expensive or unethical and do not always lead to fundamental understanding.
- Therefore, mathematical modeling becomes an important experimental and analytical tool.
- Mathematical models have become important tools in analyzing the spread and control of infectious diseases, especially when combined with powerful, modern computer methods for analyzing and/or simulating the models.
What Can Math Models Do For Us?
- Sharpen our understanding of fundamental processes.
- Compare alternative policies and interventions.
- Help make decisions.
- Prepare responses to bioterrorist attacks.
- Provide a guide for training exercises and scenario development.
- Guide risk assessment.
- Predict future trends.
- In order for math. and CS to become more effectively utilized, we need to
  - make better use of existing tools
  - develop new tools
  - establish working partnerships between mathematical scientists and biological scientists
  - introduce the two communities to each other's problems, language, and tools
  - introduce outstanding junior researchers from both sides to the issues, problems, and challenges of mathematical and computational epidemiology
  - involve biological and mathematical scientists together to define the agenda and develop the tools of this field.
- These are all fundamental goals of this special focus.
Methods of Math. and Comp. Epi.
- Math. models of infectious diseases go back to Daniel Bernoulli's mathematical analysis of smallpox in 1760.
- Hundreds of math. models since have
  - highlighted concepts like the core population in STDs
  - made explicit concepts such as herd immunity for vaccination policies
  - led to insights about drug resistance, rate of spread of infection, epidemic trends, and effects of different kinds of treatments.
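Herd immunity can be made concrete with the standard threshold from the simplest homogeneous-mixing model. That model is an assumption here, since the slides do not commit to one. Let R0 be the basic reproduction number and p the fraction of the population that is immune. Each case then produces on average R0(1 - p) new cases, so the infection cannot spread once this quantity drops below 1:

```latex
R_0\,(1 - p) < 1 \iff p > p_c = 1 - \frac{1}{R_0}
```

For example, R0 = 4 gives p_c = 0.75, so roughly three quarters of the population must be immune.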
- The size and overwhelming complexity of modern epidemiological problems call for new approaches.
- New methods are needed for dealing with
  - dynamics of multiple interacting strains of viruses, through construction and simulation of dynamic models
  - spatial spread of disease, through pattern analysis and simulation
  - early detection of emerging diseases or bioterrorist acts, through rapidly responding surveillance systems.
Statistical Methods
- Long used in epidemiology.
- Used to evaluate the role of chance and confounding in observed associations.
- Used to ferret out sources of systematic error in observations.
- The role of statistical methods is changing because of the increasingly huge data sets involved, which call for new approaches.
Dynamical Systems
- Used for modeling host-pathogen systems, phase transitions when a disease becomes epidemic, etc.
- Use difference and differential equations.
- There has been little systematic effort to apply today's powerful computational tools to these dynamical systems, and few computer scientists are involved.
- We hope to change this situation.
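The slides mention difference equations without naming a model. As an illustration only, here is a discrete-time version of the classic SIR (susceptible-infected-recovered) compartmental model on population fractions. The parameter names `beta` (per-step transmission rate) and `gamma` (per-step recovery rate) are assumptions, and the sketch requires both to lie in [0, 1].

```python
from typing import List, Tuple

def sir_difference(beta: float, gamma: float, s_init: float, i_init: float,
                   r_init: float, steps: int) -> List[Tuple[float, float, float]]:
    """Discrete-time SIR model on population fractions (s + i + r = 1).

    beta: assumed per-step transmission rate between susceptibles and infecteds.
    gamma: assumed fraction of infected individuals who recover per step.
    """
    s, i, r = s_init, i_init, r_init
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i   # flow S -> I during this step
        new_recoveries = gamma * i      # flow I -> R during this step
        s, i, r = (s - new_infections,
                   i + new_infections - new_recoveries,
                   r + new_recoveries)
        history.append((s, i, r))
    return history

# Usage: beta / gamma = 3, so the infected fraction initially grows.
trajectory = sir_difference(beta=0.3, gamma=0.1, s_init=0.99, i_init=0.01,
                            r_init=0.0, steps=200)
peak_step = max(range(len(trajectory)), key=lambda k: trajectory[k][1])
```

Each step moves mass between compartments without creating or destroying any. The total therefore stays at 1, which is a useful sanity check when experimenting with other parameters.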
Probabilistic Methods
- Important role of stochastic processes, random walk models, percolation theory, and Markov chain Monte Carlo methods.
- Computational methods for simulating stochastic processes in complex spatial environments or on large networks have started to enable us to simulate more and more complex biological interactions.
- However, few mathematicians and computer scientists have been involved in efforts to bring the power of modern computational methods to bear.
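A small example of simulating a stochastic epidemic process uses the Reed-Frost chain-binomial model. The slides name no specific model, so this one is chosen purely for illustration, and it is plain Monte Carlo rather than MCMC. The assumed parameter `p` is the probability that one infective infects one given susceptible in a generation. Infectives are removed after one generation.

```python
import random

def reed_frost(susceptible: int, infected: int, p: float, rng: random.Random) -> int:
    """Simulate one Reed-Frost epidemic and return its final size (ever infected)."""
    s, i = susceptible, infected
    total = infected
    while i > 0 and s > 0:
        escape = (1.0 - p) ** i  # chance a susceptible avoids all i current infectives
        new_cases = sum(1 for _ in range(s) if rng.random() >= escape)
        s -= new_cases
        total += new_cases
        i = new_cases            # only this generation's cases are infectious next
    return total

def mean_final_size(susceptible: int, infected: int, p: float,
                    runs: int = 1000, seed: int = 1) -> float:
    """Monte Carlo estimate of the expected final epidemic size."""
    rng = random.Random(seed)
    return sum(reed_frost(susceptible, infected, p, rng) for _ in range(runs)) / runs

estimate = mean_final_size(50, 1, 0.05, runs=200)
```

Repeating the simulation and averaging gives estimates, such as the expected final size, that are tedious to obtain analytically for larger or more structured populations.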
Discrete Math. and Theoretical Computer Science
- Many fields of science, in particular molecular biology, have made extensive use of discrete mathematics (DM), broadly defined.
- Especially useful have been those tools that make use of the algorithms, models, and concepts of theoretical computer science (TCS).
- These tools remain largely unused and unknown in epidemiology and even in mathematical epidemiology.
- These tools are made especially relevant to epidemiology because of
  - Geographic Information Systems
  - the availability of large and disparate computerized databases on subjects relating to disease, and the relevance of modern methods of data mining
  - the increasing importance of an evolutionary point of view in epidemiology, and the relevance of DM/TCS methods of phylogenetic tree reconstruction
How Does a Special Focus Work?
- Get researchers with different backgrounds and approaches together.
- Stimulate new collaborations.
- Set the agenda for future research.
- Act as a catalyst for new developments at the interface among disciplines.
- DIMACS has been doing this for a long time.
Components of a Special Focus
- Working Groups
- Tutorials
- Workshops
- Visitor Programs
- Graduate Student Programs
- Postdoc Programs
- Dissemination
Working Groups
- Interdisciplinary, international groups of researchers.
- Come together at DIMACS.
- Informal presentations, with lots of time for discussion.
- Emphasis on collaboration.
- Return as a full group or in subgroups to pursue problems/approaches identified in the first meeting.
- By invitation, but interested researchers should contact the organizer.
- Junior researchers are welcome. Nominate them.
Tutorials
- Integrate research and education.
- Introduce mathematical scientists to relevant topics in epidemiology and biology.
- Introduce epidemiologists and biologists to relevant methods of math., CS, statistics, and operations research.
- Financial support available by application.
Workshops
- More formal programs.
- Widely publicized.
- One-time programs.
- Some educational component: participation by graduate students is encouraged; tutorials.
- Interdisciplinary flavor.
- Can spawn new working groups.
- Financial support available in limited amounts; contact the organizer.
Visitor Programs
- Interdisciplinary groups of researchers will return after working group meetings.
- Workshop participants can come early or stay late.
- Visits can be arranged independent of workshops or working group meetings. Contact the DIMACS Visitor Coordinator.
- Visits by junior researchers and students will be encouraged.
- We want to make DIMACS a center for collaboration in mathematical and computational epidemiology for the next 5 years (and beyond).
Grad. Student/Postdoc Programs
- Each working group, workshop, and tutorial will support students/postdocs. Contact the organizer.
- Students/postdocs visiting for longer periods will have a host/mentor. Contact the DIMACS Visitor Coordinator.
- Local graduate students will get involved through participation in working groups and small research projects.
- We hope to raise funds for postdoctoral fellows to participate by spending a year or more at DIMACS.
Dissemination
- DIMACS technical report series.
- Working group and workshop websites.
- DIMACS book series.
Working Groups
- WGs on Large Data Sets
  - Adverse Event/Disease Reporting, Surveillance Analysis
  - Data Mining and Epidemiology
- WGs on Analogies between Computers and Humans
  - Analogies between Computer Viruses/Immune Systems and Human Viruses/Immune Systems
  - Distributed Computing, Social Networks, and Disease Spread Processes
- WGs on Methods/Tools of TCS
  - Phylogenetic Trees and Rapidly Evolving Diseases
  - Order-Theoretic Aspects of Epidemiology
- WGs on Computational Methods for Analyzing Large Models for Spread/Control of Disease
  - Spatio-temporal and Network Modeling of Diseases
  - Methodologies for Comparing Vaccination Strategies
- WGs on Mathematical Sciences Methodologies
  - Mathematical Models and Defense Against Bioterrorism
  - Predictive Methodologies for Infectious Diseases
  - Statistical, Mathematical, and Modeling Issues in the Analysis of Marine Diseases
- WG on Noninfectious Diseases
  - Computational Biology of Tumor Progression
Workshops
- Workshops on Modeling of Infectious Diseases
  - The Pathogenesis of Infectious Diseases
  - Models/Methodological Problems of Botanical Epidemiology
- Workshops on Modeling of Non-Infectious Diseases
  - Disease Clusters
- Workshops on Evolution and Epidemiology
  - Genetics and Evolution of Pathogens
  - The Epidemiology and Evolution of Influenza
  - The Evolution and Control of Drug Resistance
  - Models of Co-Evolution of Hosts and Pathogens
- Workshops on Methodological Issues
  - Capture-recapture Models in Epidemiology
  - Spatial Epidemiology and Geographic Information Systems
  - Ecologic Inference
  - Combinatorial Group Testing
- Other Topics
  - Suggestions are encouraged.
Tutorials
- Dynamic Models of Epidemiological Problems
- The Foundations of Molecular Genetics for Non-Biologists
- Introduction to Epidemiological Studies
- DM and TCS for Epidemiologists and Biologists
- Promising Statistical Methods for Epidemiology for Epidemiologists and Biologists
Challenges for Discrete Math and Theoretical Computer Science
What Are DM and TCS?
- DM deals with
  - arrangements
  - designs
  - codes
  - patterns
  - schedules
  - assignments
- TCS deals with the theory of computer algorithms.
- During the first 30-40 years of the computer age, TCS, aided by powerful mathematical methods, especially DM, probability, and logic, had a direct impact on technology by developing models, data structures, algorithms, and lower bounds that are now at the core of computing.
- DM and TCS have found extensive use in many areas of science and public policy, for example in molecular biology. These tools, which seem especially relevant to problems of epidemiology, are not well known to those working on public health problems.
So How Are DM/TCS Relevant to the Fight Against Disease?
Detection/Surveillance
- Streaming data analysis
  - When you only have one shot at the data
  - Widely used to detect trends and sound alarms in telecommunications and finance applications
  - AT&T uses this to detect fraudulent use of credit cards or impending billing defaults
  - Columbia has developed methods for detecting fraudulent behavior in financial systems
  - Uses algorithms based in TCS
  - Needs modification to apply to disease detection
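Streaming analysis reads each observation once and keeps only constant memory. A classical one-sided CUSUM detector on daily case counts illustrates the idea. This is a swapped-in textbook technique, not the AT&T or Columbia method. The parameters `baseline`, `slack`, and `threshold` are hypothetical tuning knobs, and the counts are made up.

```python
from typing import Iterable, Iterator

def cusum_alarms(counts: Iterable[float], baseline: float, slack: float,
                 threshold: float) -> Iterator[int]:
    """One-sided (upper) CUSUM over a stream; yields the index of each alarm.

    Only the running statistic is stored, so each observation is read once.
    """
    statistic = 0.0
    for t, x in enumerate(counts):
        # Accumulate excess over baseline + slack; the statistic never drops below 0.
        statistic = max(0.0, statistic + (x - baseline - slack))
        if statistic > threshold:
            yield t
            statistic = 0.0  # restart monitoring after raising an alarm

# Hypothetical daily syndrome counts with a jump starting on day 5.
daily_counts = [10, 11, 9, 10, 10, 18, 19, 20]
alarms = list(cusum_alarms(daily_counts, baseline=10, slack=1, threshold=10))
```

Here the alarm fires on day 6, one day after the jump. The statistic first has to accumulate more than `threshold` excess cases, so there is a trade-off between detection delay and false alarms that the tuning parameters control.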
- Research issues
  - Modify methods of data collection, transmission, processing, and visualization.
  - Explore the use of decision trees, vector-space methods, and Bayesian and neural nets.
  - How are the results of monitoring systems best reported and visualized?
  - To what extent can they trigger fast and safe automated responses?
  - How are relevant queries best expressed, giving the user sufficient power while implicitly restraining him/her from incurring unwanted computational overhead?
Cluster Analysis
- Used to extract patterns from complex data.
- Application of traditional clustering algorithms is hindered by the extreme heterogeneity of the data.
- Newer TCS-based methods for clustering heterogeneous data need to be modified for infectious disease and bioterrorist applications.
Visualization
- Large data sets are sometimes best understood by visualizing them.
- Sheer data sizes require new visualization regimes, which in turn require suitable external-memory data structures to reorganize tabular data to facilitate access, usage, and analysis.
- Visualization algorithms become harder when data arise from various sources and each source contains only partial information.
Data Cleaning
- Disease detection problem: very dirty data, due to
  - manual entry
  - lack of uniform standards for content and formats
  - data duplication
  - measurement errors
- TCS-based methods of data cleaning
  - duplicate removal
  - merge/purge
  - automated detection
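Duplicate removal and merge/purge can be sketched with the sorted-neighborhood method. That method is one well-known merge/purge approach, chosen here for illustration. Records are sorted on a normalized key, each record is compared only with its near neighbors in that order, and matches are merged with union-find. The field names, the normalization, and the matching rule are all hypothetical.

```python
import re
from typing import Callable, Dict, List

Record = Dict[str, str]

def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", text.lower()).split())

def merge_purge(records: List[Record], key: Callable[[Record], tuple],
                is_match: Callable[[Record, Record], bool],
                window: int = 3) -> List[List[int]]:
    """Sorted-neighborhood duplicate detection; returns clusters of record indices."""
    order = sorted(range(len(records)), key=lambda idx: key(records[idx]))
    parent = list(range(len(records)))  # union-find forest over record indices

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for pos, i in enumerate(order):
        # Compare each record only with its window - 1 predecessors in sorted order.
        for j in order[max(0, pos - window + 1):pos]:
            if is_match(records[i], records[j]):
                parent[find(i)] = find(j)

    clusters: Dict[int, List[int]] = {}
    for idx in range(len(records)):
        clusters.setdefault(find(idx), []).append(idx)
    return sorted(clusters.values())

# Hypothetical case reports; field names are illustrative only.
reports = [
    {"name": "Smith, John", "zip": "08854"},
    {"name": "Doe, Jane", "zip": "10027"},
    {"name": "SMITH  JOHN", "zip": "08854"},
    {"name": "Roe, Richard", "zip": "08901"},
]

def report_key(r: Record) -> tuple:
    return (r["zip"], normalize(r["name"]))

def same_person(a: Record, b: Record) -> bool:
    return report_key(a) == report_key(b)

clusters = merge_purge(reports, report_key, same_person)
```

Limiting comparisons to a sliding window avoids comparing all pairs. The cost is that duplicates whose keys sort far apart are missed, so practical systems often repeat the pass with several different keys.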
Dealing with Natural Language Reports
- Devise effective methods for translating natural language input into formats suitable for analysis.
- Develop computationally efficient methods to provide automated responses consisting of follow-up questions.
- Develop semi-automatic systems to generate queries based on dynamically changing data.
Social Networks
- Diseases are often spread through social contact.
- Contact information is often key in controlling an epidemic, man-made or otherwise.
- There is a long history of the use of DM tools in the study of social networks: social networks as graphs.
Spread of Disease through a Network
- Dynamically changing networks: discrete times.
- Nodes (individuals) are infected or non-infected (simplest model).
- An individual becomes infected at time t+1 if sufficiently many of its neighbors are infected at time t (threshold model).
- Analogy: saturation models in economics.
- Analogy: spread of opinions through social networks.
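The threshold model can be sketched in a few lines. This is an illustration under stated assumptions: the network is fixed (the slides allow it to change over time), infected nodes stay infected, and `threshold` is an absolute number of infected neighbors. The graph and all names are hypothetical.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # adjacency lists of an undirected graph

def threshold_spread(graph: Graph, infected: Set[str], threshold: int) -> List[Set[str]]:
    """Infected set at each discrete time step until nothing changes.

    A node becomes infected at time t+1 if at least `threshold` of its
    neighbors are infected at time t; infected nodes never recover.
    """
    current = set(infected)
    history = [set(current)]
    while True:
        newly = {v for v in graph
                 if v not in current
                 and sum(u in current for u in graph[v]) >= threshold}
        if not newly:
            return history
        current = current | newly
        history.append(current)

# A 4-cycle a-b-c-d with a pendant vertex e attached to c.
g = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d", "e"],
     "d": ["c", "a"], "e": ["c"]}
final_t2 = threshold_spread(g, {"a", "c"}, threshold=2)[-1]
steps_t1 = threshold_spread(g, {"a"}, threshold=1)
```

With threshold 2, infecting a and c captures the whole cycle but never e, because e has only one neighbor. With threshold 1, a single seed eventually infects everything. This shows how strongly the outcome depends on the threshold.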
Complications and Variants
- Infection only with a certain probability.
- Individuals have degrees of immunity, and infection takes place only if sufficiently many neighbors are infected and the degree of immunity is sufficiently low.
- Add a recovered category.
- Add levels of infection.
- Markov models.
- Dynamic models on graphs related to neural nets.
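Two of the variants listed above, probabilistic infection and a recovered category, can be combined in a minimal stochastic SIR sketch on a graph. The following assumptions are not from the slides: each infected node transmits to each susceptible neighbor independently with probability `p`, and it stays infectious for exactly one time step.

```python
import random
from typing import Dict, Iterable, List, Set

Graph = Dict[str, List[str]]

def sir_on_graph(graph: Graph, seeds: Iterable[str], p: float,
                 rng: random.Random) -> Set[str]:
    """Discrete-time stochastic SIR; returns every node that was ever infected."""
    infected = set(seeds)
    recovered: Set[str] = set()
    while infected:
        new_cases: Set[str] = set()
        for v in infected:
            for u in graph[v]:
                susceptible = u not in infected and u not in recovered and u not in new_cases
                if susceptible and rng.random() < p:  # each edge transmits with prob. p
                    new_cases.add(u)
        recovered |= infected   # assumed infectious period: exactly one step
        infected = new_cases
    return recovered

def mean_attack_rate(graph: Graph, seeds: Set[str], p: float,
                     runs: int = 500, seed: int = 7) -> float:
    """Monte Carlo estimate of the expected fraction of nodes ever infected."""
    rng = random.Random(seed)
    total = sum(len(sir_on_graph(graph, seeds, p, rng)) for _ in range(runs))
    return total / (runs * len(graph))

path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
rate = mean_attack_rate(path, {"a"}, p=0.5)
```

Because the process is random, single runs vary widely, and quantities of interest such as the attack rate are estimated by averaging many runs.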
Research Issues
- What sets of vertices have the property that their infection guarantees the spread of the disease to x% of the vertices?
- What vertices need to be vaccinated to make sure a disease does not spread to more than x% of the vertices?
- How do the answers depend upon network structure?
- How do they depend upon the choice of threshold?
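For tiny networks, the vaccination question can be answered by brute force, which makes the problem statement precise. The sketch makes several assumptions. It uses the deterministic threshold model with a fixed, known set of initially infected seeds. Vaccinated vertices can never become infected. The hypothetical parameter `max_fraction` plays the role of x%. The search is exponential in the number of vertices, which is one reason the question is a research issue for realistic networks.

```python
from itertools import combinations
from typing import Dict, List, Optional, Set

Graph = Dict[str, List[str]]

def final_infected(graph: Graph, seeds: Set[str], threshold: int,
                   vaccinated: Set[str]) -> Set[str]:
    """Threshold model run to completion; vaccinated vertices never become infected."""
    current = set(seeds) - vaccinated
    while True:
        newly = {v for v in graph
                 if v not in current and v not in vaccinated
                 and sum(u in current for u in graph[v]) >= threshold}
        if not newly:
            return current
        current |= newly

def min_vaccination_set(graph: Graph, seeds: Set[str], threshold: int,
                        max_fraction: float) -> Optional[Set[str]]:
    """Smallest set of non-seed vertices whose vaccination keeps the final
    infected fraction <= max_fraction (exhaustive search, tiny graphs only)."""
    candidates = [v for v in graph if v not in seeds]
    for k in range(len(candidates) + 1):          # try smaller sets first
        for combo in combinations(candidates, k):
            vaccinated = set(combo)
            infected = final_infected(graph, seeds, threshold, vaccinated)
            if len(infected) <= max_fraction * len(graph):
                return vaccinated
    return None  # infeasible even if every non-seed vertex is vaccinated

# A path a-b-c-d-e, infection seeded at "a", threshold 1, at most 40% infected.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
plan = min_vaccination_set(path, {"a"}, threshold=1, max_fraction=0.4)
```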
These Types of Questions Have Been Studied in Other Contexts Using DM/TCS
- Distributed computing
  - Eliminating damage by failed processors: when a fault occurs, let a processor change state if a majority of its neighbors are in a different state, or if their number is above a threshold.
  - Distributed database management.
  - Quorum systems.
  - Fault-local mending.
Spread of Opinion
- Of relevance to bioterrorism.
- Dynamic models of how opinions spread through social networks.
- Your opinion changes at time t+1 if the number of neighboring vertices with the opposite opinion at time t exceeds a threshold.
- Widely studied.
- Relevant variants: confidence in your opinion (analogous to immunity); probabilistic change of opinion.
Evolution
- Models of evolution might shed light on new strains of infectious agents used by bioterrorists.
- New methods of phylogenetic tree reconstruction owe a significant amount to modern methods of DM/TCS.
- Phylogenetic analysis might help in identifying the source of an infectious agent.
Some Relevant Tools of DM/TCS
- Information-theoretic bounds on tree reconstruction methods.
- Optimal tree refinement methods.
- Disk-covering methods.
- Maximum parsimony heuristics.
- Nearest-neighbor-joining methods.
- Hybrid methods.
- Methods for finding consensus phylogenies.
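Maximum parsimony heuristics search over tree topologies for one that minimizes the number of character-state changes. The inner step, scoring a fixed tree, is classically done with Fitch's algorithm. A minimal sketch follows, assuming rooted binary trees written as nested tuples; that representation is chosen for illustration, and the sequences are hypothetical.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree: a leaf is a taxon name, an internal node is a (left, right) pair.
Tree = Union[str, Tuple["Tree", "Tree"]]

def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch's small-parsimony pass for one alignment column.

    Returns the candidate state set at the root of `tree` and the minimum
    number of state changes needed on that tree.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one extra change needed

def parsimony_score(tree: Tree, sequences: Dict[str, str]) -> int:
    """Sum of Fitch costs over all columns of equal-length aligned sequences."""
    length = len(next(iter(sequences.values())))
    return sum(fitch_site(tree, {taxon: seq[col] for taxon, seq in sequences.items()})[1]
               for col in range(length))

# Hypothetical aligned sequences for four taxa, and two candidate topologies.
seqs = {"A": "AAG", "B": "AAT", "C": "GCT", "D": "GCG"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
scores = (parsimony_score(tree_1, seqs), parsimony_score(tree_2, seqs))
```

Here tree_1 needs 4 changes and tree_2 needs 6, so parsimony prefers grouping A with B. A heuristic search would call a scorer like this many times while rearranging the tree.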
New Challenges for DM/TCS
- Tailoring phylogenetic methods to describe the idiosyncrasies of viral evolution, going beyond a binary tree with a small number of contemporaneous species appearing as leaves.
- Dealing with trees of thousands of vertices, many of high degree.
- Making use of data about species at internal vertices (e.g., when data come from serial sampling of patients).
- Network representations of evolutionary history, if recombination has taken place.
- Modeling viral evolution by a collection of trees, to recognize the quasispecies nature of viruses.
- Devising fast methods to average the quantities of interest over all likely trees.
Decision Making/Policy Analysis
- DM/TCS have a close historical connection with mathematical modeling for decision making and policy making.
- Mathematical models can help us
  - understand fundamental processes
  - compare alternative policies and interventions
  - provide a guide for scenario development
  - guide risk assessment
  - aid forensic analysis
  - predict future trends
Consensus
- DM/TCS are fundamental to the theory of group decision making/consensus.
- Based on fundamental ideas in the theory of voting and social choice.
- Key problem: combine expert judgments (e.g., rankings of alternatives) to make policy.
- Prior application to biology (Bioconsensus)
  - find a common pattern in a library of molecular sequences
  - find a consensus phylogeny given alternative phylogenies
- Developing an algorithmic view in consensus theory: fast algorithms for finding the consensus policy.
- Special challenge regarding bioterrorism/epidemiology: instead of many decision makers and few candidates, there could be few decision makers and many candidates (lots of different parameters to modify).
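Combining expert rankings can be illustrated with the Borda count, one classical voting rule. It is chosen here because it is fast: linear in the total size of the rankings. The slides do not prescribe a rule. The Kemeny median ranking is another standard choice but is computationally hard in general. The experts and interventions below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

def borda_consensus(rankings: List[List[str]]) -> List[str]:
    """Aggregate expert rankings (best first) with the Borda count.

    In a ranking of m alternatives, the item in position k (0-based) earns
    m - 1 - k points. Alternatives are returned by total score, with ties
    broken alphabetically. Assumes every expert ranks the same alternatives.
    """
    scores: Dict[str, int] = defaultdict(int)
    for ranking in rankings:
        m = len(ranking)
        for k, alternative in enumerate(ranking):
            scores[alternative] += m - 1 - k
    return sorted(scores, key=lambda a: (-scores[a], a))

# Three hypothetical experts ranking three hypothetical interventions.
expert_rankings = [
    ["vaccinate", "quarantine", "stockpile"],
    ["quarantine", "vaccinate", "stockpile"],
    ["vaccinate", "stockpile", "quarantine"],
]
consensus = borda_consensus(expert_rankings)
```

The totals are vaccinate 5, quarantine 3, and stockpile 1. In the "few decision makers, many candidates" setting described above, the cost of such a rule grows only linearly with the number of candidates.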
Decision Science
- Formalizing utilities and costs/benefits.
- Formalizing uncertainty and risk.
- DM/TCS aid in formalizing optimization problems and solving them: maximizing utility, minimizing pain, etc.
- Bringing in the DM-based theory of meaningful statements and meaningful statistics.
- Some of these ideas are virtually unknown in public health applications.
- Challenges are primarily to apply existing tools to new applications.
Game Theory
- History of use in military decision making.
- Relevant to conflicts: bioterrorism.
- DM/TCS especially relevant to multi-person games.
- Of use in allocating scarce resources to different players or different components of a comprehensive policy.
- New algorithmic point of view in game theory: finding efficient procedures for computing the winner or the appropriate resource allocation.
Some Additional Relevant DM/TCS Topics
- Order-theoretic concepts
  - Relevance of partial orders and lattices.
  - The exposure set (set of all subjects whose exposure levels exceed some threshold) is a common construction in the dimension theory of partial orders.
  - Point lattices may be useful for visualizing the relationships of contingency tables to effect measures and cut-off choices.
Combinatorial Group Testing
- Natural or human-induced epidemics might require us to test samples from large populations at once.
- Combinatorial group testing arose from the need for mathematical methods to test millions of WWII draftees for syphilis.
- Identify all positive cases in a large population by
  - dividing items into subsets
  - testing whether a subset has at least one positive item
  - iterating by dividing into smaller groups.
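The divide-and-test procedure can be sketched as adaptive binary splitting. This is one simple scheme among many group testing designs, not necessarily the one used historically. The pooled test is simulated by a hypothetical function, and the population and positives are made up.

```python
from typing import Callable, List, Sequence

def find_positives(items: Sequence[int],
                   pooled_test: Callable[[Sequence[int]], bool]) -> List[int]:
    """Adaptive group testing by recursive halving (binary splitting).

    pooled_test(group) must return True iff the group contains at least one positive.
    """
    if len(items) == 0 or not pooled_test(items):
        return []            # the whole group is cleared by a single test
    if len(items) == 1:
        return list(items)   # a positive individual has been isolated
    mid = len(items) // 2
    return find_positives(items[:mid], pooled_test) + find_positives(items[mid:], pooled_test)

# Hypothetical population of 64 samples with two positives; count the tests used.
positives = {5, 40}
tests_used = [0]

def lab_test(group: Sequence[int]) -> bool:
    """Simulated pooled assay (hypothetical): positive if any member is positive."""
    tests_used[0] += 1
    return any(i in positives for i in group)

found = find_positives(list(range(64)), lab_test)
```

With 2 positives among 64 samples, this finds both using 23 pooled tests instead of 64 individual tests. The savings shrink as the fraction of positives grows, which is why the choice of scheme depends on expected prevalence.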
Challenges Outside of DM/TCS
We're expecting your input!
See You at DIMACS