Title: Send Me a Disk, Ok?
1 Send Me a Disk, Ok?
- Sharing Genealogical Information With Your Relatives
Beau Sharbrough
beau_at_sharbrough.net
PO Box 3170
Grapevine TX 76099-3170
2 Thank you.
- To the CIG. I'm grateful for the invitation to be here.
- To Russ and Birdie Holsclaw. They took care of me the past three days, sharing their home, their cars, their community, and their son Will.
- To Roger Ebert. I was starting to worry about my weight.
3 General Topics
- Five steps to understanding what they're saying
- Discussion of software developers' methods of merging files
- Significance of GENTECH Genealogical Data Model
4 Five Steps to Combining Your Research
5 Step 1. Determine What Form the Data Is in.
- Which program do they use?
- What type of disk drives do they have?
- What general field usage have they adopted?
6 Step 2. Exchange Pedigree and Group Sheet Examples.
- Look for detail, accuracy, thoroughness.
- Are there full or partial dates?
- Do the citations for US places include counties? Streets? Cemetery names?
- Are nicknames used in place of real names?
- Are sources cited?
7 Step 3. Agree on Usage of Fields.
- RESIdes or ADDRess?
- Will you both use CHRIsten?
- How will you document sources?
- How will you document the research of others?
8 Step 4. Convert Your Information. Nobody Can Avoid This Step.
- Agree with your relative what information you will convert and how.
- Normally, this means saying things like, "I'll put in the counties after I get it from you."
9 Step 5. Exchange Only the Individuals You Want.
- NEVER just import the whole family on top of the information you already have.
- Computer routines for merging data are improving, but not complete or effective yet.
10 There Are No Effective Routines for Merging Data Sets at Present.
- The problems of
- identity,
- merging methods, and
- data formats
- are too new for generalized solutions to be available in the marketplace.
- Good theoretical solutions don't even exist.
11 Merging Data Sets
12 Customers who just assume that someone will know what they want and have it ready when they recognize that need had parents that spoilt them rotten.
13 WHY?
- Family history record-keeping is increasingly becoming a digital process.
- Linking one's information to the information already gathered by other family members and researchers is becoming more and more common.
14 We Have to Put Our Information Together Somehow
15 A Few Basics
- Computer programs store the data that we enter in FILES
- Each genealogical program stores the information in its own way, called a PROPRIETARY FORMAT
- Most programs can also read and write in GEDCOM format
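For anyone who has never opened a GEDCOM file: it is plain text, one item per line. Each line starts with a level number, then an optional cross-reference ID such as @I1@, then a tag, then an optional value. The sketch below is a minimal Python reading of that line layout, not a full GEDCOM reader. The helper name parse_gedcom_line and the tiny sample are assumptions made up for illustration.

```python
from typing import NamedTuple, Optional


class GedcomLine(NamedTuple):
    level: int
    xref: Optional[str]
    tag: str
    value: str


def parse_gedcom_line(line: str) -> GedcomLine:
    """Split one GEDCOM line into level, optional @XREF@, tag, and value.

    Hypothetical helper for illustration only.
    """
    parts = line.strip().split(" ", 2)
    level = int(parts[0])
    if len(parts) > 1 and parts[1].startswith("@"):
        # Record line such as "0 @I1@ INDI": the xref comes before the tag.
        xref = parts[1]
        rest = parts[2].split(" ", 1) if len(parts) > 2 else [""]
        tag = rest[0]
        value = rest[1] if len(rest) > 1 else ""
    else:
        xref = None
        tag = parts[1]
        value = parts[2] if len(parts) > 2 else ""
    return GedcomLine(level, xref, tag, value)


# Made-up sample: one individual with a name and an approximate birth.
SAMPLE = """0 @I1@ INDI
1 NAME Jonathan /Sharbrough/
1 BIRT
2 DATE ABT 1734
2 PLAC North Carolina"""

records = [parse_gedcom_line(text) for text in SAMPLE.splitlines()]
for rec in records:
    print(rec.level, rec.xref, rec.tag, rec.value)
```

The level numbers carry the structure: the DATE and PLAC lines at level 2 belong to the BIRT line at level 1 above them.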
16 A word about exchange
[Diagram: data moves from program A through an Export Routine, a Possible Intermediate Format, and an Import Routine into program B]
17 A Few Basics
- Merging is copying
- From a SOURCE
- To a TARGET
- Sometimes called the SURVIVING INFORMATION
18 MERGING DATABASES
- merging the files into a single one
- merging the duplicated individuals
- merging the rest
- sources
- repositories
19 The database merging process is evolving
- More input sources
- More freedom to choose the features you like.
- GenBridge
20 Freedom has a price
- Enter a name
- Program won't break it up
- Enter a place
- Program won't break it up
21 Legacy Trick
- You can open two family files at the same time, and copy and paste a person and their descendants from one set into another, like grafting a branch from one tree to another.
22 Making automatic citations
- Legacy individual level
- TMG and FTM field level
23 The Current Merging Art
- Merging Databases
- Merging Individuals
- Merging the Rest
- Spotting Duplicates
24 Merging Individuals
- If you want to merge duplicates, most programs
will make you choose which tags to keep and
throw the rest away.
25 MERGING INDIVIDUALS: The old way
- Copy the info
- Delete one of the people
- Type the info into the new one
26 MERGING INDIVIDUALS: The middle way
- View both persons
- Select what you want
- The program does the rest
27 MERGING INDIVIDUALS: The future way
- Computer spots likely dups
- Recommends them to you
- You control the process
28 Merge Sources for most popular software
- Their own files
- GEDCOM
- In some cases, files from other programs
- In some cases, CD and internet databases
Still, it ends up being like pouring two cans of
paint together.
29 Merging the Rest
- Most programs don't even import and merge place tables, source tables, etc.
- I don't know of any program that recognizes the same source in two separate datasets.
30 Merging The Rest
- source citations, master sources, repositories, and places
- Most programs just combine the tables, creating duplicates
- LG will combine a source, with exact spelling
- UFT and FTM merge master sources
- PAF and TMG merge master sources and repositories
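Combining sources "with exact spelling" can be sketched in a few lines of Python. This is an illustration of the general idea, not the code of any product named above. The function merge_source_tables and the sample source titles are made up.

```python
def merge_source_tables(target, incoming):
    """Append incoming source titles to target, skipping exact-spelling matches.

    Returns the merged table plus a map from each incoming index to its
    index in the merged table, so citations can be re-pointed.
    Hypothetical helper for illustration only.
    """
    merged = list(target)
    index_of = {title: i for i, title in enumerate(merged)}
    remap = {}
    for old_index, title in enumerate(incoming):
        if title not in index_of:
            # Not spelled exactly like anything we have: treated as new.
            index_of[title] = len(merged)
            merged.append(title)
        remap[old_index] = index_of[title]
    return merged, remap


# Made-up sample tables. The second incoming title is the same census
# as the first, but abbreviated differently.
mine = ["1850 U.S. Census, Wake County, North Carolina", "Family Bible"]
theirs = [
    "1850 U.S. Census, Wake County, North Carolina",
    "1850 US Census, Wake Co., NC",
    "Family Bible",
]

merged, remap = merge_source_tables(mine, theirs)
for title in merged:
    print(title)
```

The exact matches collapse, but the abbreviated census title survives as a separate entry. That is the duplicate problem in miniature: the routine has no way to know the two titles describe one source.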
31 Limits to Storage
- Some programs have really limited storage, and only store conclusions
- If you have two birth dates, they put your favorite one in and throw the other away, or store it in a note.
- Some programs have a lot of storage, and let you make your own tags such as executrix.
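The difference between the two kinds of storage can be sketched in Python. Both classes are hypothetical and do not describe how any particular program stores its data. The dates and sources in the sample are made up.

```python
from collections import defaultdict


class ConclusionOnlyPerson:
    """Keeps one value per tag; a second value is pushed into a note."""

    def __init__(self):
        self.facts = {}
        self.notes = []

    def add(self, tag, value, source):
        if tag in self.facts:
            # The slot is taken, so the conflicting value becomes free text.
            self.notes.append(f"Alternate {tag}: {value} ({source})")
        else:
            # Only the value is kept; the source is not recorded here.
            self.facts[tag] = value


class EvidencePerson:
    """Keeps every value for every tag with its source; tags are free-form."""

    def __init__(self):
        self.facts = defaultdict(list)

    def add(self, tag, value, source):
        self.facts[tag].append((value, source))


limited = ConclusionOnlyPerson()
roomy = EvidencePerson()
for person in (limited, roomy):
    person.add("BIRTH", "circa 1734", "family Bible")
    person.add("BIRTH", "1736", "tombstone")
    person.add("EXECUTRIX", "estate of husband", "will")

print(limited.facts["BIRTH"], limited.notes)
print(roomy.facts["BIRTH"])
```

In the first class the second birth date ends up as text in a note, where no merge routine can compare it. In the second, both dates stay side by side with their sources.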
32 SPOTTING DUPLICATES
- Some programs have merging routines based on
- Soundex
- Spelling of name
- Birth date
- TMG and Legacy use a large variety of match
choices
33 Spotting duplicates
- Soundex for names (AQ)
- Exact spelling or soundex (PAF 3.0)
- Exact spelling and exact birth date (FTM)
- Many name compares (TMG and UFT)
- Soundex surname and user choice of number of letters in first name (LG)
- Warn if duplicate name entered (most)
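Since Soundex comes up in most of these routines, here is a short Python sketch of the standard American Soundex coding: keep the first letter, turn the following consonants into digits, collapse repeats, and pad to four characters. It is a generic version written for this illustration, not the routine inside any program listed above.

```python
# Letter groups for American Soundex; vowels, H, W, and Y get no code.
CODES = {}
for digit, letters in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(name: str) -> str:
    """Return the 4-character Soundex code for a name ('' if no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = first
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeat after a vowel counts
    return (result + "000")[:4]


print(soundex("Sharbrough"), soundex("Sharborough"))  # both S616
print(soundex("Robert"), soundex("Rupert"))           # both R163
```

Two spellings of one surname land on the same code, which is the point. So do Robert and Rupert, which is where the false positives come from.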
34 Merging tips
- Match on parent soundex reduces false positives (Gaylon Findlay)
- If your program won't let you choose initials, but has a number-of-letters, try that with 1.
- Beware of people about whom you know very little.
- Beware of blank dates.
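These tips can be put together in a small Python sketch. Everything named here (Person, compare, the letters, year_slack, and name_key parameters) is hypothetical, and the sample people are made up. The name_key parameter is where a Soundex function could be plugged in; by default it is a plain case-insensitive comparison.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Person:
    given: str
    surname: str
    birth_year: Optional[int] = None
    father: Optional[str] = None  # father's given name, if known


def compare(a: Person, b: Person, letters: int = 1, year_slack: int = 2,
            name_key: Callable[[str], str] = str.casefold):
    """Return (is_candidate, cautions) for a possible duplicate pair."""
    cautions = []
    if name_key(a.surname) != name_key(b.surname):
        return False, cautions
    # Number-of-letters set to 1 behaves like matching on the initial.
    if a.given[:letters].casefold() != b.given[:letters].casefold():
        return False, cautions
    # Parent check: can only reject when both sides name a father.
    if a.father and b.father:
        if name_key(a.father) != name_key(b.father):
            return False, cautions
    else:
        cautions.append("parent unknown on at least one side")
    # A blank date is not agreement; flag it instead of counting it.
    if a.birth_year is None or b.birth_year is None:
        cautions.append("blank birth date")
    elif abs(a.birth_year - b.birth_year) > year_slack:
        return False, cautions
    return True, cautions


john = Person("John", "Smith", 1850, "Thomas")
initial_only = Person("J.", "Smith", None, "Thomas")
other_father = Person("John", "Smith", 1851, "Robert")
jane = Person("Jane", "Smith", 1850)

print(compare(john, initial_only))
print(compare(john, other_father))
print(compare(john, jane))
```

The last pair shows why sparse records deserve suspicion: John and Jane pass as a candidate pair on initial, surname, and year, and only the caution about the missing parent hints that something is off.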
35 Signs that you can merge better today than you could before
- More formats allowed
- Easier individual merging
- Identifying routines are becoming more sophisticated
- More storage of conflicting data allowed
- More variety in the software marketplace
36 Signs that we aren't getting there yet
- No formal studies on known datasets to quantify false positives and false negatives
- No implementation of information sciences in commercial products
- No implementation of AI in commercial products
- No formal discussion of algorithms
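What a formal study on a known dataset would measure can be sketched in Python. Given the pairs a routine reports and the pairs known to be true duplicates, count the false positives and false negatives. The function score_matcher and the ID pairs are made up for illustration.

```python
def score_matcher(reported, truth):
    """Compare reported duplicate pairs against the known true pairs.

    Pairs are unordered, so ("I1", "I7") and ("I7", "I1") count as the same.
    Hypothetical helper for illustration only.
    """
    reported = {frozenset(pair) for pair in reported}
    truth = {frozenset(pair) for pair in truth}
    true_pos = len(reported & truth)
    false_pos = len(reported - truth)   # reported, but not really duplicates
    false_neg = len(truth - reported)   # real duplicates the routine missed
    precision = true_pos / len(reported) if reported else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    return {
        "true_pos": true_pos,
        "false_pos": false_pos,
        "false_neg": false_neg,
        "precision": precision,
        "recall": recall,
    }


# Made-up example: three known duplicate pairs, three reported pairs.
truth = [("I1", "I7"), ("I2", "I9"), ("I3", "I4")]
reported = [("I7", "I1"), ("I2", "I9"), ("I5", "I6")]

result = score_matcher(reported, truth)
print(result)
```

Running the same known dataset through several programs and comparing these numbers is the kind of study the first bullet says is missing.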
37 MERGING SUMMARY
- Users can merge from a wider variety of data formats than in the past.
- Users can merge individuals more easily.
38 MERGING SUMMARY
- Routines to help identify candidates for merging are becoming quite sophisticated.
- More programs store conflicting data today.
39 It's also encouraging that they are not all doing the same thing.
- The resultant diversity and innovation offer us more chances to connect Where-We've-Been to Where-We're-Going than we've ever had before.
40 The GENTECH Genealogical Data Model
- Purpose: To define and communicate the meanings of family history data.
41 Genealogical Data Model
- Request for Comment
- Project by genealogists and developers to describe genealogy processes.
- Describes the relationships between the various kinds of family history information.
- Overview of what genealogists do
- Not a genealogy program.
- Not a database design
- Not a document saying what genealogists SHOULD do.
42 Every genealogist says that they do research differently.
- The GDM describes the process that they do
differently.
43 Stop Starting with Conclusions
- Don't start with conclusions, start with evidence.
44 Some features of Evidence in the GDM
- REPOSITORY
- SOURCE
- REPRESENTATION TYPE
- REPRESENTATION
- CITATION
45 CONCLUSIONS
- ASSERTIONS about
- PERSONA
- EVENTS
- CHARACTERISTICS
- GROUPS
- ASSERTIONS
46 XML is eXtensible Markup Language
- <TITLE>The Title of My Book</TITLE>
- <NAME>Jonathan Sharbrough</NAME>
- <BIRTHDATE>circa 1734</BIRTHDATE>
- <BIRTH PLACE="North Carolina" DATE="circa 1734">
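The appeal of tagged data like the NAME and BIRTH examples above is that any program can pull the pieces back out without guessing. A minimal Python sketch using the standard library's xml.etree.ElementTree follows. The PERSON wrapper element is an assumption added so the sample forms one complete document; it is not part of any genealogy standard.

```python
import xml.etree.ElementTree as ET

# Made-up sample document; PERSON is a hypothetical wrapper element.
DOC = """<PERSON>
  <NAME>Jonathan Sharbrough</NAME>
  <BIRTH PLACE="North Carolina" DATE="circa 1734"/>
</PERSON>"""

root = ET.fromstring(DOC)
name = root.findtext("NAME")   # text between <NAME> and </NAME>
birth = root.find("BIRTH")     # the BIRTH element itself
place = birth.get("PLACE")     # attribute values
date = birth.get("DATE")

print(f"{name}, born {date} in {place}")
# prints: Jonathan Sharbrough, born circa 1734 in North Carolina
```

The same birth can be written as element text (BIRTHDATE) or as attributes (BIRTH with PLACE and DATE). Either way the reading program asks for the item by name instead of by position.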
47 Future digital research
- programs publish pedigrees and registers in some XML format
- repositories publish records in the same format
- local links, remote sources
- external authorities
48 A new culture
- most quoted sites - authorities
- many link sites - hubs
- links define culture, tribe, families
49 The digital future of family history is a virtual library where it is ...
- Easy to find the conclusions
- Easy to identify the evidence
- Easy to identify the thought process that links
them.
50 Missing ingredients
- agreement on LexML standard
- wide acceptance of LexML standard
- wide implementation of LexML
51 Send Me A Disk, Ok?
- Dos and Don'ts
- Merging Technique
- GENTECH GDM
Beau Sharbrough
PO Box 3019
Grapevine TX 76099-3019
beau_at_sharbrough.net
www.sharbrough.net