Relational and Network Data - PowerPoint PPT Presentation

Slides: 19
Provided by: mik50

1
Relational and Network Data
  • Mike Ryckman

2
Overview
  • Relational Data Concepts
  • Network Analysis Data
  • Events Data

3
Relational vs. Flat Data
  • Academics are used to flat data
  • Relational data is more common elsewhere
  • Flat data: Excel/Stata
  • Relational data: a collection of related flat
    tables
  • Relational data structures are trickier; we'll
    look at an example

4
Running Example: Legislative Action
  • Say we are interested in studying how legislators
    work together to pass a bill
  • We interview the legislators, asking about
    different pieces of legislation and who worked
    to pass each bill
  • We also record some secondary data: how did they
    help, did the bill pass, etc.
  • This type of analysis could tell us a lot about
    the legislature: who are the major players? How
    and when do they work together?

5
Data setup
  • Say you want to record bill participation
  • Excel
  • What if someone participates in two bills?

6
Option 1
  • Repeated data: what if you mistype 'Erin'?
  • Wasted space: imagine doing this 10,000 times

Option 2 (Very Common)
  • Bad for searching: how do you see who was on a
    specific bill?
  • Bad for combining data: how do you count bill
    participation?
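A minimal Python sketch of both problems. It assumes Option 1 is one row per legislator-bill pair with the name retyped on every row, and Option 2 is one row per legislator with bills spread across numbered columns; the original screenshots are not in this transcript, so both layouts are assumptions. All names, bill IDs and column names are hypothetical toy data.

```python
from collections import Counter

# Option 1 (assumed layout): one row per (legislator, bill) pair,
# with the legislator's name retyped on every row. Toy data.
option1_rows = [
    ("Erin", "HB1"),
    ("Erin", "HB2"),
    ("Erinn", "HB3"),  # typo: meant to be "Erin"
    ("Marcus", "HB2"),
]

# Counting by name treats the typo as a different legislator,
# so Erin is credited with 2 bills instead of 3.
participation = Counter(name for name, _bill in option1_rows)

# Option 2 (assumed layout): one row per legislator, bills in
# numbered columns; "" means the slot is empty. Toy data.
option2_rows = [
    {"name": "Erin", "bill_1": "HB1", "bill_2": "HB2", "bill_3": "HB3"},
    {"name": "Marcus", "bill_1": "HB2", "bill_2": "", "bill_3": ""},
]
bill_columns = ["bill_1", "bill_2", "bill_3"]

# Searching: finding who worked on HB2 means scanning every bill column.
on_hb2 = [row["name"] for row in option2_rows
          if "HB2" in (row[col] for col in bill_columns)]

# Counting: each legislator's total means counting non-empty columns.
bill_counts = {row["name"]: sum(1 for col in bill_columns if row[col])
               for row in option2_rows}

print(participation["Erin"], participation["Erinn"])  # 2 1
print(on_hb2)       # ['Erin', 'Marcus']
print(bill_counts)  # {'Erin': 3, 'Marcus': 1}
```

Both layouts cope with a handful of rows; the trouble grows with the number of legislators and bills, and with how many bill columns Option 2 needs.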

7
Relational Data Structures
  • Instead of one table, what about using three?
  • Table 1 holds your people, table 2 holds your
    bills, table 3 holds bill participation
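The three-table layout can be sketched with Python's built-in sqlite3 module. This is a minimal sketch under assumptions: the table and column names, the `role` and `passed` columns standing in for the secondary data, and the toy rows are all hypothetical. With participation in its own table, the searching and counting problems from Option 2 become one query each.

```python
import sqlite3

# In-memory database; all table and column names are illustrative.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE legislators (
    legislator_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL
);
CREATE TABLE bills (
    bill_id INTEGER PRIMARY KEY,
    title   TEXT NOT NULL,
    passed  INTEGER              -- secondary data: 1 = passed, 0 = failed
);
-- One row per legislator-bill pair; role records how they helped.
CREATE TABLE participation (
    legislator_id INTEGER NOT NULL REFERENCES legislators(legislator_id),
    bill_id       INTEGER NOT NULL REFERENCES bills(bill_id),
    role          TEXT,
    PRIMARY KEY (legislator_id, bill_id)
);
""")

# Hypothetical toy data: each legislator's name is entered exactly once.
con.executemany("INSERT INTO legislators VALUES (?, ?)",
                [(1, "Erin"), (2, "Marcus"), (3, "Andrews")])
con.executemany("INSERT INTO bills VALUES (?, ?, ?)",
                [(10, "Roads Bill", 1), (11, "Schools Bill", 0)])
con.executemany("INSERT INTO participation VALUES (?, ?, ?)",
                [(1, 10, "sponsor"), (2, 10, "whip"),
                 (1, 11, "sponsor"), (3, 11, "cosponsor")])

# Searching: who worked on bill 10?
who = [name for (name,) in con.execute("""
    SELECT l.name
    FROM participation p
    JOIN legislators l ON l.legislator_id = p.legislator_id
    WHERE p.bill_id = ?
    ORDER BY l.name""", (10,))]

# Combining: how many bills did each legislator work on?
counts = dict(con.execute("""
    SELECT l.name, COUNT(p.bill_id)
    FROM legislators l
    LEFT JOIN participation p ON p.legislator_id = l.legislator_id
    GROUP BY l.legislator_id
    ORDER BY l.legislator_id"""))

print(who)     # ['Erin', 'Marcus']
print(counts)  # {'Erin': 2, 'Marcus': 1, 'Andrews': 1}
```

The composite primary key on `participation` also stops the same legislator-bill pair from being entered twice.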

8
Advantages
  • By storing data relationally, we can query and
    recombine it in almost any way we want: it's
    flexible
  • You minimize data storage and entry: one table
    can hold all your information on your legislators,
    so you only have to enter it once
  • It is VERY efficient: computers (and computer
    nerds) love it
  • You can generate flat files from relational files
    in one step; it is difficult to do the opposite
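The "one step" is a single JOIN query. A minimal sqlite3 sketch, assuming the same kind of hypothetical three tables (names and rows are illustrative), that flattens them into one row per legislator-bill pair and writes CSV that Excel or Stata could read:

```python
import csv
import io
import sqlite3

# Hypothetical three-table setup; names and rows are illustrative.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE legislators (legislator_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE bills (bill_id INTEGER PRIMARY KEY, title TEXT, passed INTEGER);
CREATE TABLE participation (legislator_id INTEGER, bill_id INTEGER, role TEXT);
INSERT INTO legislators VALUES (1, 'Erin'), (2, 'Marcus');
INSERT INTO bills VALUES (10, 'Roads Bill', 1), (11, 'Schools Bill', 0);
INSERT INTO participation VALUES
    (1, 10, 'sponsor'), (2, 10, 'whip'), (1, 11, 'sponsor');
""")

# The single step: one JOIN flattens the three tables into one row per
# legislator-bill pair, repeating legislator and bill details as needed.
cursor = con.execute("""
    SELECT l.name AS legislator, b.title AS bill,
           b.passed AS passed, p.role AS role
    FROM participation p
    JOIN legislators l ON l.legislator_id = p.legislator_id
    JOIN bills b ON b.bill_id = p.bill_id
    ORDER BY l.name, b.title""")

# Write the flat result as CSV (to a string here; a file works the same way).
out = io.StringIO()
writer = csv.writer(out)
writer.writerow([col[0] for col in cursor.description])  # header row
writer.writerows(cursor)
flat_csv = out.getvalue()
print(flat_csv)
```

Going the other way means pulling the distinct legislators and bills back out of the repeated rows, which is exactly where a typo such as 'Erinn' causes trouble.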

9
Notes
  • You generally need flat data for statistics (this
    is why academics like it)
  • Flat data is GENERALLY less efficient
  • It is better to know what this stuff is than to
    know how to use it
  • Remember context: choose wisely

10
Network Analysis Data
  • Network data lends itself well to relational data
    structures
  • Generally, we are interested in producing a
    matrix of values

11
Back to Our Example
  • Think in terms of the goal
  • There are a number of matrices we could create
  • N x N matrix of legislators
      • diagonal: the number of bills each legislator
        worked on
      • off-diagonal: the number of bills two
        legislators worked on together
  • Legislators x Bills
      • Binary (0/1) indicators for whether the
        legislator did or did not participate in the bill
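Both matrices can be built from plain participation records. A minimal pure-Python sketch, assuming the records are (legislator, bill) pairs; the names echo the slides but the participation pattern is invented for illustration. Here the N x N matrix is filled by counting co-participants bill by bill:

```python
from itertools import combinations

# Hypothetical participation records: (legislator, bill) pairs.
records = [("Erin", "HB1"), ("Erin", "HB2"), ("Marcus", "HB2"),
           ("Andrews", "HB1"), ("Andrews", "HB2"), ("Andrews", "HB3")]

legislators = sorted({leg for leg, _ in records})
bills = sorted({bill for _, bill in records})
pairs = set(records)

# Legislators x Bills: 1 if the legislator worked on the bill, else 0.
X = [[1 if (leg, bill) in pairs else 0 for bill in bills]
     for leg in legislators]

# N x N: diagonal = bills each legislator worked on,
# off-diagonal = bills two legislators worked on together.
idx = {leg: i for i, leg in enumerate(legislators)}
N = [[0] * len(legislators) for _ in legislators]
for bill in bills:
    team = [leg for leg in legislators if (leg, bill) in pairs]
    for leg in team:
        N[idx[leg]][idx[leg]] += 1
    for a, b in combinations(team, 2):
        N[idx[a]][idx[b]] += 1
        N[idx[b]][idx[a]] += 1

print(legislators)  # ['Andrews', 'Erin', 'Marcus']
print(X)            # [[1, 1, 1], [1, 1, 0], [0, 1, 0]]
print(N)            # [[3, 2, 1], [2, 2, 1], [1, 1, 1]]
```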

12
Two Matrices
NxN Matrix
Using matrix algebra: $XX^{\top}$, where X is the legislators x bills matrix

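The transcript's "XX" appears to have lost a transpose mark. Assuming X is the legislators x bills 0/1 matrix with entries $x_{ik} = 1$ if legislator $i$ worked on bill $k$ (and $0$ otherwise), and $B$ bills in total, multiplying X by its transpose gives the N x N matrix described on the previous slide:

```latex
(XX^{\top})_{ij} = \sum_{k=1}^{B} x_{ik}\,x_{jk}
  = \#\{\text{bills that legislators } i \text{ and } j \text{ both worked on}\}

(XX^{\top})_{ii} = \sum_{k=1}^{B} x_{ik}^{2} = \sum_{k=1}^{B} x_{ik}
  = \#\{\text{bills legislator } i \text{ worked on}\}
```

The diagonal simplifies because $x_{ik}^2 = x_{ik}$ for 0/1 entries. If X is instead stored with bills as rows, the same matrix is $X^{\top}X$.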
13
Generating the Table
  • Getting that table is not always easy
  • Most of the time your data will not be structured
    this way
  • Relational databases have a special type of query
    that will create this (a crosstab or TRANSFORM
    query), but most have limited field counts (about
    250 columns)
  • Other options are available
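SQLite, for example, has no crosstab or TRANSFORM query, but the same table can be emulated with one conditional SUM per bill. A minimal sketch using Python's sqlite3 module, assuming a hypothetical participation table of (legislator, bill) text pairs. The query still grows by one column per bill, so the database's column limits apply as the number of bills grows.

```python
import sqlite3

# Hypothetical long-format participation table: one row per pair.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE participation (legislator TEXT, bill TEXT);
INSERT INTO participation VALUES
    ('Erin', 'HB1'), ('Erin', 'HB2'), ('Marcus', 'HB2'), ('Andrews', 'HB3');
""")

# One output column per distinct bill.
bills = [bill for (bill,) in con.execute(
    "SELECT DISTINCT bill FROM participation ORDER BY bill")]

# Emulated crosstab: SUM(CASE ...) gives 1 if the legislator worked on the
# bill and 0 otherwise (assuming each pair is recorded once). Bill IDs are
# bound as parameters; the column alias is built from the ID, so this
# assumes IDs contain no double quotes.
columns = ", ".join(
    f'SUM(CASE WHEN bill = ? THEN 1 ELSE 0 END) AS "{b}"' for b in bills)
query = (f"SELECT legislator, {columns} FROM participation "
         "GROUP BY legislator ORDER BY legislator")
rows = con.execute(query, bills).fetchall()
header = ["legislator"] + bills

print(header)  # ['legislator', 'HB1', 'HB2', 'HB3']
for row in rows:
    print(row)
```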

14
Setting up your data
  • Ultimately, it comes down to how you set up your
    data
  • In a relational db, you can have 3 tables:
    legislators, bills and bill participation
  • You could also build the table manually, or use
    some other tool

15
More Notes
  • There are many ways of structuring and using your
    data
  • Some are easier than others depending on the
    situation and your knowledge base
  • Don't get bogged down in the details; there are
    always people around who can help you with the
    data

16
Events Data
  • Events data sets (nowadays) are machine-coded
  • A computer program reads a news feed and
    determines who, what, when, where and how an
    event occurred
  • The VRA Events data has about 10 million records
    from 1990 to 2004
  • It includes both domestic and international events

17
Events Have
  • A source and a target (for domestic events,
    source = target)
  • Level information (government, businesses,
    organizations, groups, etc.)
  • A date, specific to the day
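One way to represent a single event record in code, as a minimal sketch. The field names, the country-code strings, and the split of level information into source and target levels are assumptions for illustration, not the VRA schema. The domestic check encodes the source = target rule above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """One machine-coded event (hypothetical fields, not the VRA schema)."""
    source: str        # who acted, e.g. a country code
    target: str        # who was acted upon
    source_level: str  # e.g. "government", "business", "group"
    target_level: str
    event_type: str    # what happened, e.g. "meeting"
    day: date          # dated to the day

    @property
    def is_domestic(self) -> bool:
        # Domestic events have the same source and target.
        return self.source == self.target


e1 = Event("USA", "USA", "government", "group", "meeting", date(1999, 3, 1))
e2 = Event("USA", "MEX", "government", "government", "meeting", date(1999, 3, 2))
print(e1.is_domestic, e2.is_domestic)  # True False
```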

18
Notes on VRA Events Data
  • It is biased toward the US: about 3 million
    records or so involve the US
  • It covers a HUGE range of activities: wars,
    meetings, animal attacks, etc.
  • The data are very flexible, and we can create
    matrices of all types out of them
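Because each event has a source, a target and a day, a source x target matrix for any time window is one aggregation away. A minimal sketch, assuming events are already parsed into (source, target, date) tuples; the actor codes and events are hypothetical, not VRA records.

```python
from collections import Counter
from datetime import date

# Hypothetical, already-parsed events: (source, target, day).
events = [
    ("USA", "MEX", date(1995, 1, 10)),
    ("USA", "MEX", date(1995, 6, 2)),
    ("MEX", "USA", date(1995, 7, 4)),
    ("USA", "USA", date(1995, 8, 1)),   # domestic: source == target
    ("CAN", "USA", date(2001, 2, 3)),
]


def event_matrix(events, start, end):
    """Count events per (source, target) pair with start <= day <= end,
    returned as a nested list indexed by the sorted list of actors."""
    counts = Counter((s, t) for s, t, day in events if start <= day <= end)
    actors = sorted({a for pair in counts for a in pair})
    matrix = [[counts[(s, t)] for t in actors] for s in actors]
    return actors, matrix


actors, m = event_matrix(events, date(1995, 1, 1), date(1995, 12, 31))
print(actors)  # ['MEX', 'USA']
print(m)       # [[0, 1], [2, 1]]  rows = source, columns = target
```

Keying the counter on other fields (for example, source level x target level) yields other matrices from the same records.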