Text Use in Online Dating Profiles

1
Text Use in Online Dating Profiles
  • James Fung, Christo Sims
  • ANLP Final Presentation
  • Instructor: Marti Hearst
  • 12.04.06

2
Overview
3
Goal
  • An exploratory exercise: can we use the text
    someone provides in their dating profile to
    assign them to various classes?

4
The Text
  • Can't wait to get to know you
  • Nice, warm and sweet, as most of my friends
    would describe me. I love to laugh all the time.
    I have a strong passion towards life even through
    little things. I tend to be quiet in a large
    group but generally great with one on one basis.
    I am ambitious about love and romance. And I am
    very respectful of the needs and wants of other
    people. Life is a beautiful journey. I am seeking
    someone who would appreciate the value of life,
    family, have a warm heart and nice peronality to
    share this journey with. If you are that person,
    I can't wait to get to know you.

(female, asian, college grad)
5
Possible Classes
  • Education
  • Gender
  • Attend Services
  • Income
  • Ethnicity
  • Marital Status
  • Want kids
  • Others
  • Astrology?

6
Approach
7
Scraping Profiles
  • Yahoo! Personals
  • 200 Male seeking Female
  • 200 Female seeking Male
  • Within 50 Miles of San Francisco
  • Ages 25-35

8
Feature Extraction (Python)
  • Token frequency
      • Words: TF, TF-IDF
      • Bigrams
      • Weighted headlines
  • Readability measures
      • Characters, syllables, words, complex words,
        sentences
      • Ratios of the above
      • Gunning-Fog and six others
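
The token-frequency features on this slide can be illustrated with a minimal sketch. It assumes a simple regex tokenizer and the common idf = log(N / df) weighting; the function names and sample profiles are hypothetical, and headline weighting is not covered.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def bigrams(tokens):
    """Adjacent token pairs joined with a space, e.g. 'me laugh'."""
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]

def term_frequencies(terms):
    """Relative frequency of each term within one profile."""
    counts = Counter(terms)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}

def tf_idf(profiles):
    """Return one {term: weight} dict per profile (unigrams plus bigrams)."""
    docs = []
    for text in profiles:
        toks = tokenize(text)
        docs.append(term_frequencies(toks + bigrams(toks)))
    n_docs = len(docs)
    # Document frequency: number of profiles containing each term.
    doc_freq = Counter(term for doc in docs for term in doc)
    return [
        {term: tf * math.log(n_docs / doc_freq[term]) for term, tf in doc.items()}
        for doc in docs
    ]

# Made-up profiles, for illustration only.
weights = tf_idf(["I love to laugh", "I love golf"])
```

Terms that occur in every profile ("love" here) get weight zero, while terms unique to one profile are weighted up.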

9
Feature Selection and Classification (Weka)
  • Use Weka's built-in feature selection tools
      • Chi-Squared, Information Gain
      • Subset Eval (not working well with most of the
        classes)
  • Explore a variety of classification algorithms,
    for a variety of possible classes
      • Multinomial Naïve Bayes
      • K-Nearest Neighbors
      • Decision Tree
      • Support Vector Machines
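
To make the information-gain criterion concrete, here is a small pure-Python sketch. It is not Weka's implementation; it assumes each feature is reduced to a present/absent flag per profile, and the function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(has_term, labels):
    """Entropy reduction from splitting profiles on presence of one term."""
    with_term = [y for x, y in zip(has_term, labels) if x]
    without_term = [y for x, y in zip(has_term, labels) if not x]
    remainder = sum(
        len(part) / len(labels) * entropy(part)
        for part in (with_term, without_term)
        if part
    )
    return entropy(labels) - remainder
```

A term that perfectly separates two equally sized classes scores 1 bit; a term spread evenly across both classes scores 0, which is the situation in which a selector finds few relevant features.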

10
Preliminary Results
11
Preliminary Results
  • Able to beat a naïve baseline in a few cases,
    usually where there are only two or three
    possible classification categories
  • Gender: 69% accuracy
        Category               Instances
        Woman seeking a man    196
        Man seeking a woman    200 (51%)
  • Want (more) kids: 65% accuracy
        Category               Instances
        Yes                    202 (62.3%)
        Not sure               105
        No                     17
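
The parenthesized figures on this slide appear to be majority-class baselines, i.e. the accuracy of always guessing the largest category. A short sketch, using the instance counts from the slide (the function name is hypothetical):

```python
def majority_baseline(class_counts):
    """Accuracy obtained by always predicting the most frequent class."""
    return max(class_counts.values()) / sum(class_counts.values())

gender = {"Woman seeking a man": 196, "Man seeking a woman": 200}
want_kids = {"Yes": 202, "Not sure": 105, "No": 17}

print(f"Gender baseline: {majority_baseline(gender):.1%}")               # 50.5%
print(f"Want (more) kids baseline: {majority_baseline(want_kids):.1%}")  # 62.3%
```

Against these baselines, 69% on gender is a clear gain, while 65% on "want (more) kids" is only a few points better than guessing "Yes" every time.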

12
Preliminary Results
  • More difficult with more classification
    categories
  • Education: 47.4% accuracy
        Category            Instances
        Post-Graduate       94
        College Grad        175 (44%)
        Some College        86
        High School Grad    13
        Some High School    3
  • Income (61% null reply)
  • Employment Status (75% full-time)
  • Political Views
  • Attend Services
  • Ethnicity

13
Preliminary Results (cont.)
  • Some interesting statistics about feature
    probability for a given class (from Multinomial
    Naïve Bayes output)
  • Gender
      • "man": over 2x as likely in a female profile
      • "sense": over 2x as likely in a female profile
      • "honest": over 2x as likely in a female profile
      • "independent": over 3x as likely in a female
        profile
      • "loving": over 3x as likely in a female profile
      • "crazy": over 4x as likely in a male profile
      • "company": over 3x as likely in a male profile
      • "friendship": over 2x as likely in a female
        profile
      • "me laugh": almost 4x as likely in a female
        profile
      • "great sense": over 6x as likely in a female
        profile
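
Ratios like "over 2x as likely" can be read off the per-class term probabilities that a multinomial Naïve Bayes model estimates. A minimal sketch, assuming add-one (Laplace) smoothing and pre-tokenized profiles; the function names and toy data are made up for illustration and are not the presenters' output:

```python
from collections import Counter

def term_likelihoods(token_lists, vocabulary, alpha=1.0):
    """Smoothed P(term | class), pooled over all profiles in one class."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    total = sum(counts[t] for t in vocabulary) + alpha * len(vocabulary)
    return {t: (counts[t] + alpha) / total for t in vocabulary}

def likelihood_ratios(class_a, class_b):
    """How many times more likely each term is in class A than in class B."""
    vocabulary = {tok for tokens in class_a + class_b for tok in tokens}
    p_a = term_likelihoods(class_a, vocabulary)
    p_b = term_likelihoods(class_b, vocabulary)
    return {t: p_a[t] / p_b[t] for t in vocabulary}

# Toy token lists, not real profile data.
female = [["honest", "loving", "laugh"], ["honest", "friendship"]]
male = [["crazy", "company", "laugh"], ["crazy", "golf"]]
ratios = likelihood_ratios(female, male)
```

A ratio above 1 means the term is more likely in the first class; its reciprocal gives the "x as likely" figure for the second class.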

14
Preliminary Results (cont.)
  • Some interesting statistics about features (from
    Multinomial Naïve Bayes)
  • Want (more) kids
      • "caring": over 3x as likely in the "yes" than
        the "not sure" class
      • "heart": over 2x as likely in the "yes" than the
        "not sure" class
      • "sometimes": over 2x as likely in the "not sure"
        than the "yes" class
      • "beautiful": 2x as likely in "yes" than "not
        sure"
      • "real": 2x as likely in "not sure" than "yes"
      • "dancing": 2x as likely in the "not sure" than
        the "yes" class
      • "games": almost 2x as likely in the "not sure"
        than the "yes" class

15
Challenges
16
Not Enough Instances
  • For most classes, we don't have enough instances
    for meaningful training
  • Education
        Some College                87
        College Grad                176
        Post-Graduate               97
        High School Grad            14
        Some High School            3
  • Ethnicity
        Hispanic/Latino             33
        Caucasian (white)           202
        Asian                       74
        Inter-racial                14
        African American (black)    38
        Other                       13
        Pacific Islander            7
        Native American             1
        East Indian                 8
  • East Indian 8

17
Features Aren't Working
  • Weka identifies few relevant features
  • Subset Eval selects subsets of size 1-3
  • Difficult to overcome strong a priori
    probability

                    PG   CG   SC  HSG  SHS
Post-Graduate        0   94    0    0    0
College Grad         0  175    0    0    0
Some College         0   86    0    0    0
High School Grad     0   12    0    0    1
Some High School     0    2    0    0    1
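
The matrix above shows how the prior dominates. The sketch below assumes rows are actual classes and columns are predicted classes (the row sums match the Education instance counts reported earlier); the helper names are hypothetical.

```python
LABELS = ["PG", "CG", "SC", "HSG", "SHS"]
CONFUSION = [
    [0, 94, 0, 0, 0],   # Post-Graduate
    [0, 175, 0, 0, 0],  # College Grad
    [0, 86, 0, 0, 0],   # Some College
    [0, 12, 0, 0, 1],   # High School Grad
    [0, 2, 0, 0, 1],    # Some High School
]

def accuracy(matrix):
    """Share of instances on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

def recall_per_class(matrix, labels):
    """Fraction of each actual class that was predicted correctly."""
    return {
        label: row[i] / sum(row)
        for i, (label, row) in enumerate(zip(labels, matrix))
    }

print(f"Accuracy: {accuracy(CONFUSION):.1%}")  # 47.4%
recalls = recall_per_class(CONFUSION, LABELS)
```

Under that reading, the 47.4% accuracy comes almost entirely from labelling nearly everything "College Grad": recall is 1.0 for CG and 0 for PG, SC, and HSG.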
18
Additional Challenges
  • I am 33old woman from Ireland living here for a
    few years Love this country and love the out
    doors, favourite thing is mountain biking and
    hiking tooSan Francisco has so much to offer,
    nice restaurants which i love Thai food and so
    many live music shows which i love to out and
    listen every month

No punctuation
19
Additional Challenges (cont.)
  • I am into swimming, sunrises, Vinyasa Yoga at
    the Loft, cafes, people watching, warm drinks in
    the morning, laughing, crying, feeling all of it,
    freshly squeezed juice, tennis, painting,
    spirals, Abraham Hicks, Life as Art, singing,
    swinging, sushi, backgammon, remembering my
    dreams, warm weather, soft textures, calligraphy,
    episopalian upbringing gone buddhist tendencies,
    handmade paper, dancing, fire, telling stories

Lists, not sentences
20
Additional Challenges (cont.)
  • I work alot but in my free time i love to play a
    round of golf and spend time out with my dog. I
    love going to the beach with him or going to the
    park and just chillin out. At night i love goin
    out with friend and having a few drinks.

Lack of complex words
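
For reference, the Gunning-Fog index is 0.4 × (words per sentence + 100 × complex words / words), where a complex word has three or more syllables. The sketch below assumes a crude vowel-run syllable counter and ignores the exclusions in the full definition (such as proper nouns); the function names are hypothetical.

```python
import re

def count_syllables(word):
    """Rough syllable estimate: count runs of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def gunning_fog(text):
    """Gunning-Fog index using punctuation-based sentence splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences) + 100 * len(complex_words) / len(words))

score = gunning_fog("I love golf. I love my dog.")
```

Profiles with no complex words score on sentence length alone, and a profile with no punctuation is treated as one long sentence, so both earlier challenges distort this measure.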
21
Additional Challenges
  • Scraping profiles requires a user login
      • Easy in PHP, not in Python
      • Have to save profiles by hand, which limits
        corpus size
  • The profile text in Yahoo! Personals doesn't seem
    as thoughtful as profiles on Match.com
      • Shorter profile text
      • Spam?
      • How dedicated are the participants?

22
Where we're headed
23
Future Work (cont.)
  • Need more profiles!
  • PHP
  • manually saved
  • Different features
  • Use of capitalization, emphasis, grammar
  • Tailored features
  • More token features
  • I go to church I am very sincere in my faith
    and my striving to become more Like Jesus. By
    know means am I perfect, however, NEW MERCIES
    EVERYDAY! I am a very real and straight forward
    person, however,HUMBLE to God's word and voice in
    my life. Always looking to HIM for my direction
    and HE is my SOURCE
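
One way the capitalization idea could become a feature is sketched below. This is purely an assumption about what such a feature might look like, since the slides only list it as future work; the function and key names are hypothetical.

```python
import re

def caps_emphasis_features(text):
    """Count all-caps words (2+ letters) and their share of all words."""
    words = re.findall(r"[A-Za-z']+", text)
    # Skip one-letter words so the pronoun "I" is not counted as emphasis.
    shouted = [w for w in words if len(w) >= 2 and w.isupper()]
    ratio = len(shouted) / len(words) if words else 0.0
    return {"all_caps_count": len(shouted), "all_caps_ratio": ratio}

sample = "NEW MERCIES EVERYDAY! I am a very real person, however, HUMBLE"
features = caps_emphasis_features(sample)
```

On this excerpt the counter picks up NEW, MERCIES, EVERYDAY, and HUMBLE, giving 4 all-caps words out of 11.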