1
Wordsmith Tools
  • Stella E. O. Tagnin - USP
  • Corpus Linguistics, Translation and Terminology
  • New Technologies in Translation - CAPES
  • Universitat Rovira i Virgili-Universidade de São
    Paulo
  • Tarragona
  • July 8-11, 2008

2
  • How to use Wordsmith Tools to investigate a
    corpus

3
First
  • Download the demo version of WordSmith Tools 5.0
    from Mike Scott's site
  • http://www.lexically.net/wordsmith/version5/index.html

4
  • Name: Tarragona and USP
  • > Other Details:
  • > Registration: SA00.3461.2978.3904.6880.9VVB
  • > When "Updating from Demo", please paste these
    details in EXACTLY as you see them here.
  • > Please see "readme.txt" for any further
    details.
  • > --
  • > Mike Scott

5
WordSmith Tools
  • WordList
  • S: General Statistics
  • F: Frequency
  • A: Alphabetical
  • KeyWords
  • Study Corpus vs. Reference Corpus
  • Concord
  • KWIC: Key Word In Context
  • Collocates
  • Clusters

6
WordList
  • S (General Statistics): overview of the corpus
    and its texts
  • F (Frequency): the most frequent words may point
    to the topic
  • A (Alphabetical): makes lemmatizing easier

8
WordList - Statistics
  • Identifying peculiarities of
  • the corpus as a whole (Overall)
  • each text
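
To make the kind of figures reported on the Statistics tab concrete, here is a minimal Python sketch that computes tokens, types and a type/token ratio for the whole corpus and for each text. It is not WordSmith's own procedure: the tokenizer (lowercased runs of ASCII letters), the file names and the sample sentences are all assumptions made for illustration.

```python
import re

def tokenize(text):
    # Assumption: a token is a lowercased run of ASCII letters,
    # optionally with internal apostrophes (e.g. "don't").
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

def basic_stats(text):
    """Return tokens, types and type/token ratio (as a percentage)."""
    tokens = tokenize(text)
    types = set(tokens)
    ttr = 100 * len(types) / len(tokens) if tokens else 0.0
    return {"tokens": len(tokens), "types": len(types), "ttr": ttr}

# Hypothetical two-text corpus.
texts = {
    "a.txt": "The cup is the prize. The team wants the cup.",
    "b.txt": "Fans cheer.",
}

per_text = {name: basic_stats(body) for name, body in texts.items()}
overall = basic_stats(" ".join(texts.values()))

for name, stats in per_text.items():
    print(name, stats)
print("Overall", stats if False else overall)
```

Comparing each text's row with the Overall row is what lets an unusually short or unusually repetitive text stand out.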

10
Frequency WordList
  • Hint as to topic
  • Survey of most recurrent words in text/corpus
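
A frequency wordlist can be approximated in a few lines of Python. This is only a sketch under assumed tokenization rules (lowercased runs of ASCII letters) and with an invented sample sentence; WordSmith's own counts depend on its settings.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumption: lowercased runs of ASCII letters count as words.
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

def frequency_list(text, top=10):
    """Return (word, frequency, percentage) rows, most frequent first."""
    counts = Counter(tokenize(text))
    total = sum(counts.values())
    return [(w, n, 100 * n / total) for w, n in counts.most_common(top)]

# Hypothetical sample text.
sample = "The World Cup is here. The cup final decides the world champion."
rows = frequency_list(sample, top=3)
for word, freq, pct in rows:
    print(f"{word:<10}{freq:>5}{pct:>8.2f}")
```

In a real corpus the top of such a list is dominated by function words; the first content words below them are usually the hint as to topic.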

12
Alphabetical WordList
  • Spotting words
  • Lemmatizing word forms
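
The reason an alphabetical list helps with lemmatizing is that related word forms end up next to each other. The sketch below sorts a wordlist alphabetically and then merges forms under a lemma. The counts and the `lemma_of` mapping are hypothetical and hand-made; this is an illustration of the idea, not WordSmith's lemmatization feature.

```python
from collections import Counter, defaultdict

def alphabetical_list(counts):
    """Return (word, frequency) pairs in alphabetical order."""
    return sorted(counts.items())

def lemmatize(counts, lemma_of):
    """Merge word forms under their lemma, summing the frequencies.

    lemma_of maps a word form to its lemma; forms that are not
    listed are kept as their own lemma.
    """
    merged = defaultdict(lambda: {"freq": 0, "forms": []})
    for form, n in alphabetical_list(counts):
        lemma = lemma_of.get(form, form)
        merged[lemma]["freq"] += n
        merged[lemma]["forms"].append(form)
    return dict(merged)

# Hypothetical counts and a hand-made form-to-lemma mapping.
counts = Counter({"play": 5, "played": 3, "player": 2, "plays": 4, "goal": 6})
lemma_of = {"played": "play", "plays": "play"}

lemmas = lemmatize(counts, lemma_of)
for lemma, info in lemmas.items():
    print(lemma, info["freq"], info["forms"])
```

Note that "player" stays separate here: deciding which neighbouring forms belong to the same lemma remains a human judgment.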

15
KeyWords
  • Identifying prevailing vocabulary
  • Study Corpus vs Reference Corpus

16
Keywords
N   WORD        FREQ. WCUPING.LST  %     FREQ. REFENG2.LST  %     KEYNESS  P
1   CUP         1.024              0,77  1                        3.291,6  0,000000
2   WORLD       1.197              0,90  301                0,06  2.496,9  0,000000
3   TEAM        575                0,43  48                       1.538,4  0,000000
4   GAME        486                0,36  22                       1.396,6  0,000000
5   HIS         714                0,53  257                0,05  1.296,2  0,000000
6   GERMANY     435                0,33  14                       1.284,9  0,000000
7   SOCCER      374                0,28  0                        1.206,4  0,000000
8   HE          778                0,58  429                0,08  1.130,6  0,000000
9   ITALY       332                0,25  5                        1.021,0  0,000000
10  SAID        670                0,50  343                0,06  1.017,6  0,000000
11  WAS         892                0,67  716                0,13  987,2    0,000000
12  PLAYERS     337                0,25  15                       969,6    0,000000
13  GOAL        352                0,26  51                       851,9    0,000000
14  BALL        260                0,19  2                        815,9    0,000000
15  IN          3.214              2,40  7.019              1,31  761,0    0,000000
16  COACH       229                0,17  0                        738,5    0,000000
17  TOURNAMENT  205                0,15  0                        661,0    0,000000
18  SPORTS      234                0,18  13                       658,5    0,000000
19  PLAY        264                0,20  37                       643,5    0,000000
20  FRANCE      208                0,16  6                        618,7    0,000000
21  FANS        193                0,14  1                        610,2    0,000000
22  MATCH       265                0,20  49                       604,4    0,000000
23  MINUTE      206                0,15  9                        593,5    0,000000
24  BRAZIL      209                0,16  19                       551,6    0,000000
25  WIN         193                0,14  15                       521,2    0,000000

17
Comparing 2 WordLists
  • Positive keywords (occurring vocabulary)
  • Negative keywords (NON-occurring vocabulary)
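
The comparison of two wordlists can be sketched in Python as follows. The sketch assumes a log-likelihood keyness score; whether WordSmith was run with this statistic or another one for the slides is not stated, so treat it as one possible choice. The two toy wordlists and the 3.84 cut-off (the chi-square critical value for p < 0.05 with one degree of freedom) are likewise assumptions for illustration.

```python
import math

def log_likelihood(a, b, c, d):
    """Keyness of a word by log-likelihood.

    a, b: frequency of the word in the study / reference corpus
    c, d: total tokens in the study / reference corpus
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a:
        ll += a * math.log(a / e1)
    if b:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(study, reference, min_ll=3.84):
    """Return (word, study freq, ref freq, keyness, sign) rows.

    sign is "+" for positive keywords (relatively more frequent in the
    study corpus) and "-" for negative keywords.
    """
    c, d = sum(study.values()), sum(reference.values())
    rows = []
    for word in set(study) | set(reference):
        a, b = study.get(word, 0), reference.get(word, 0)
        ll = log_likelihood(a, b, c, d)
        if ll >= min_ll:
            sign = "+" if a / c > b / d else "-"
            rows.append((word, a, b, round(ll, 1), sign))
    return sorted(rows, key=lambda r: -r[3])

# Hypothetical toy wordlists (word -> frequency).
study = {"cup": 50, "soccer": 30, "the": 120}
reference = {"cup": 2, "the": 458, "parliament": 40}

rows = keywords(study, reference)
for row in rows:
    print(row)
```

Words flagged "+" correspond to the positive keywords above; words flagged "-" are the negative keywords, that is, vocabulary that is unusually rare or absent in the study corpus.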

18
... and vocabulary that does NOT occur
  • Negative keywords

20
Compiling a Glossary
  • Selecting terms
  • Keywords: term candidates (terminology)
  • Concord: context
  • Collocates
  • Clusters: multiword combinations, not
    necessarily terms or phrases

21
Concord
  • KWIC: Key Word In Context
  • Collocates
  • Clusters

22
Identifying patterns
  • Context
  • Concordance lines
  • Lexical patterns: collocations
  • Grammatical patterns: colligations
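
A concordance line is simply the search word with a fixed amount of context on each side, aligned so that patterns become visible down the page. The following Python sketch produces such KWIC lines. The function name, the 30-character context width and the sample text are assumptions, and the sketch is not meant to reproduce Concord's output.

```python
import re

def kwic(text, node, width=30):
    """Return one fixed-width concordance line per occurrence of node."""
    text = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the node words line up.
        lines.append(f"{left:>{width}} {m.group(0)} {right:<{width}}")
    return lines

# Hypothetical sample text.
sample = ("The World Cup is here. Germany won the cup in 2014 "
          "and the fans kept their cups.")
lines = kwic(sample, "cup")
for line in lines:
    print(line)
```

Sorting such lines by the first word to the right or left of the node is what brings collocations and colligations together.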

29
Collocates
  • Position of most frequent co-occurring words
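
The idea of recording where each co-occurring word appears relative to the search word can be sketched in Python like this. The span of two words on each side, the L1/R1 position labels and the sample tokens are assumptions for illustration; the Collocates display in WordSmith has its own settings.

```python
from collections import Counter, defaultdict

def collocates(tokens, node, span=2):
    """Count collocates of node by position.

    Returns {collocate: Counter({"L2": n, "L1": n, "R1": n, ...})},
    where L1 is one word to the left of the node and R1 one to the right.
    """
    table = defaultdict(Counter)
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        for offset in range(-span, span + 1):
            j = i + offset
            if offset != 0 and 0 <= j < len(tokens):
                label = f"L{-offset}" if offset < 0 else f"R{offset}"
                table[tokens[j]][label] += 1
    return table

# Hypothetical, already tokenized sample.
tokens = "the world cup final and the world cup winner".split()
table = collocates(tokens, "cup")
for word, positions in sorted(table.items()):
    print(word, dict(positions))
```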

41
Compiling a Glossary
  • Selection of Terms
  • Keywords
  • Clusters

42
3-word clusters
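
As a rough illustration of how clusters are obtained, the Python sketch below counts every run of n consecutive words and keeps those that recur. The minimum frequency of 2 and the sample tokens are assumptions, and the sketch ignores sentence boundaries, which a real tool may or may not do depending on its settings.

```python
from collections import Counter

def clusters(tokens, n=3, min_freq=2):
    """Return (cluster, frequency) pairs for recurring n-word sequences."""
    grams = Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )
    return [(g, f) for g, f in grams.most_common() if f >= min_freq]

# Hypothetical, already tokenized sample.
tokens = "the world cup final and the world cup winner".split()
three_word = clusters(tokens, n=3)
two_word = clusters(tokens, n=2)
print(three_word)
print(two_word)
```

Recurring sequences found this way are only candidates: as noted for the glossary, they are not necessarily terms or phrases.
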
43
WordList
  • With more than one term
  • > Settings
  • > Tab List
  • > WordList Tab
  • > Clusters
  • > Activated

45
WordList 2 words
46
WordList 3 words
47
How to ignore undesired text
  • By tagging
  • Title
  • Subtitle
  • Figure
  • Date
  • URL
  • etc.
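
Outside WordSmith, the same effect can be approximated by deleting the tagged sections before counting. The Python sketch below assumes paired, angle-bracket tags such as <title>...</title>; the tag names and the sample text are hypothetical, and how WordSmith itself is told to skip tagged text is a matter of its own settings.

```python
import re

# Hypothetical tag names, matching the section types listed above.
SKIP_TAGS = ("title", "subtitle", "figure", "date", "url")

def strip_tagged(text, tags=SKIP_TAGS):
    """Remove every <tag>...</tag> section for the given tag names."""
    for tag in tags:
        text = re.sub(
            rf"<{tag}>.*?</{tag}>", " ", text,
            flags=re.DOTALL | re.IGNORECASE,
        )
    return " ".join(text.split())  # tidy up leftover whitespace

# Hypothetical tagged sample text.
sample = ("<title>World Cup 2006</title> Germany hosted the tournament. "
          "<date>9 June 2006</date> Fans arrived early.")
cleaned = strip_tagged(sample)
print(cleaned)
```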

50
Adjusting Settings
  • Controller
  • > Settings
  • > Adjust Settings
  • > Only part of file

57
Viewer & Aligner
  • Views all texts
  • Aligns parallel texts

62
Another aligner
  • VisualTCA - Sentence alignment visualization
  • http://www.nilc.icmc.usp.br/nilc/tools/pagina-visualtca/visualtca/tca.htm

63
AntConc 3.2.2
  • http://www.antlab.sci.waseda.ac.jp/software.html