Title: WordSmith Tools
1 WordSmith Tools
- Stella E. O. Tagnin - USP
- Corpus Linguistics, Translation and Terminology
- New Technologies in Translation - CAPES
- Universitat Rovira i Virgili / Universidade de São Paulo - Tarragona
- July 8-11, 2008
2- How to use WordSmith Tools to investigate a corpus
3 First
- Download the demo version of WordSmith Tools 5.0 from Mike Scott's site
- http://www.lexically.net/wordsmith/version5/index.html
4- Name: Tarragona and USP
- Other Details:
- Registration: SA00.3461.2978.3904.6880.9VVB
- When "Updating from Demo", please paste these details in EXACTLY as you see them here.
- Please see "readme.txt" for any further details.
- Mike Scott
5 WordSmith Tools
- WordList
- S: General Statistics
- F: Frequency
- A: Alphabetical
- KeyWords
- Study Corpus vs Reference Corpus
- Concord
- KWIC: Key Word In Context
- Collocates
- Clusters
6 WordList
- S (General Statistics): overview of corpus and texts
- F (Frequency): most frequent words may point to topic
- A (Alphabetical): makes lemmatizing easier
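The three WordList views can be approximated outside WordSmith. The following is a minimal Python sketch, not WordSmith's own procedure: the tokenizer (lowercasing, splitting on anything that is not a letter, digit or apostrophe) and the function names `tokenize` and `word_list` are assumptions made for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and extract runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

def word_list(text):
    """Return (statistics, frequency-ordered list, alphabetical list)."""
    tokens = tokenize(text)
    freq = Counter(tokens)
    stats = {
        "tokens": len(tokens),   # running words
        "types": len(freq),      # distinct word forms
        "type_token_ratio": len(freq) / len(tokens) if tokens else 0.0,
    }
    # Frequency view: most frequent first, ties broken alphabetically.
    by_frequency = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
    # Alphabetical view: related word forms end up next to each other.
    alphabetical = sorted(freq.items())
    return stats, by_frequency, alphabetical

sample = "The team won the game. The fans loved the game."
stats, by_freq, alpha = word_list(sample)
print(stats)
print(by_freq[:3])
print(alpha[:3])
```

The alphabetical view is the one that helps with lemmatizing, since forms such as "game" and "games" would be listed side by side.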
8 WordList - Statistics
- Identifying peculiarities in:
- the corpus (overall)
- each text
10 Frequency WordList
- Hint as to topic
- Survey of most recurrent words in text/corpus
12 Alphabetical WordList
- Spotting words
- Lemmatizing word forms
15 KeyWords
- Identifying prevailing vocabulary
- Study Corpus vs Reference Corpus
16 Keywords
(Figures use a period as the thousands separator and a comma as the decimal separator.)
N   WORD        FREQ.  WCUPING.LST %  FREQ.  REFENG2.LST %  KEYNESS  P
1   CUP         1.024  0,77           1                     3.291,6  0,000000
2   WORLD       1.197  0,90           301    0,06           2.496,9  0,000000
3   TEAM        575    0,43           48                    1.538,4  0,000000
4   GAME        486    0,36           22                    1.396,6  0,000000
5   HIS         714    0,53           257    0,05           1.296,2  0,000000
6   GERMANY     435    0,33           14                    1.284,9  0,000000
7   SOCCER      374    0,28           0                     1.206,4  0,000000
8   HE          778    0,58           429    0,08           1.130,6  0,000000
9   ITALY       332    0,25           5                     1.021,0  0,000000
10  SAID        670    0,50           343    0,06           1.017,6  0,000000
11  WAS         892    0,67           716    0,13           987,2    0,000000
12  PLAYERS     337    0,25           15                    969,6    0,000000
13  GOAL        352    0,26           51                    851,9    0,000000
14  BALL        260    0,19           2                     815,9    0,000000
15  IN          3.214  2,40           7.019  1,31           761,0    0,000000
16  COACH       229    0,17           0                     738,5    0,000000
17  TOURNAMENT  205    0,15           0                     661,0    0,000000
18  SPORTS      234    0,18           13                    658,5    0,000000
19  PLAY        264    0,20           37                    643,5    0,000000
20  FRANCE      208    0,16           6                     618,7    0,000000
21  FANS        193    0,14           1                     610,2    0,000000
22  MATCH       265    0,20           49                    604,4    0,000000
23  MINUTE      206    0,15           9                     593,5    0,000000
24  BRAZIL      209    0,16           19                    551,6    0,000000
25  WIN         193    0,14           15                    521,2    0,000000
17 Comparing 2 WordLists
- Positive keywords (occurring vocabulary)
- Negative keywords (NON-occurring vocabulary)
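To make the comparison of two word lists concrete, here is a minimal Python sketch of a keyness score. It uses a simplified two-term log-likelihood and gives negative keywords a negative sign; this is an illustrative assumption, and WordSmith's own calculation and settings may differ. The function names and the toy counts are hypothetical.

```python
import math

def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Signed log-likelihood keyness of one word.

    Positive: relatively more frequent in the study corpus.
    Negative: relatively more frequent in the reference corpus.
    """
    total = freq_study + freq_ref
    # Expected frequencies if the word were equally likely in both corpora.
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    ll = 0.0
    if freq_study > 0:
        ll += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    ll *= 2
    if freq_study / size_study < freq_ref / size_ref:
        ll = -ll
    return ll

def keywords(study_counts, ref_counts):
    """Rank every word from either list by signed keyness, highest first."""
    size_study = sum(study_counts.values())
    size_ref = sum(ref_counts.values())
    words = set(study_counts) | set(ref_counts)
    scored = [
        (w, log_likelihood(study_counts.get(w, 0), size_study,
                           ref_counts.get(w, 0), size_ref))
        for w in words
    ]
    return sorted(scored, key=lambda pair: -pair[1])

# Toy counts, invented for illustration only.
study = {"cup": 30, "the": 60, "bank": 1}
reference = {"cup": 1, "the": 60, "bank": 30}
ranking = keywords(study, reference)
print(ranking)
```

In this toy ranking "cup" comes out as a positive keyword, "bank" as a negative one, and "the" scores zero because it is equally frequent in both lists.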
18... and vocabulary that does NOT occur
20 Compiling a Glossary
- Selecting Terms
- Keywords: term candidates (terminology)
- Concord: context
- Collocates
- Clusters: multiword combinations, not necessarily terms or phrases
21 Concord
- KWIC: Key Word In Context
- Collocates
- Clusters
22 Identifying patterns
- Context
- Concordance lines
- Lexical patterns: collocations
- Grammatical patterns: colligations
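A concordance in KWIC format places every occurrence of the search word in the same column, so that patterns to its left and right can be read vertically. The following is a minimal Python sketch of that layout, not Concord's implementation; the function name `kwic` and the `width` parameter are assumptions made for illustration.

```python
import re

def kwic(text, node, width=30):
    """Return concordance lines with the node word aligned in one column."""
    text = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = re.compile(r"\b%s\b" % re.escape(node), flags=re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-justify the left context so every node starts at the same column.
        lines.append(f"{left:>{width}} {match.group(0)} {right:<{width}}")
    return lines

sample = "The goal came late. A second goal sealed the match."
concordance = kwic(sample, "goal", width=15)
for line in concordance:
    print(line)
```

Sorting such lines by the first word to the right or left of the node is what usually brings collocations and colligations to the surface.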
29 Collocates
- Position of most frequent co-occurring words
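Counting collocates by position can be sketched as follows. This is a minimal Python illustration under stated assumptions: the text is already tokenized, positions are labelled L2, L1, R1, R2 relative to the node, and the function name `collocates_by_position` and the `span` parameter are hypothetical rather than WordSmith settings.

```python
from collections import Counter, defaultdict

def collocates_by_position(tokens, node, span=2):
    """Count which words occur at each position left (L) and right (R) of the node."""
    table = defaultdict(Counter)
    for i, token in enumerate(tokens):
        if token != node:
            continue
        for offset in range(-span, span + 1):
            j = i + offset
            if offset == 0 or j < 0 or j >= len(tokens):
                continue  # skip the node itself and positions outside the text
            label = f"L{-offset}" if offset < 0 else f"R{offset}"
            table[label][tokens[j]] += 1
    return table

tokens = "the world cup final and the world cup winner".split()
table = collocates_by_position(tokens, "cup", span=2)
for position in ("L2", "L1", "R1", "R2"):
    print(position, table[position].most_common(3))
```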
41 Compiling a Glossary
- Selection of Terms
- Keywords
- Clusters
42 3-word clusters
43 WordList
- With more than one term
- > Settings
- > Tab List
- > WordList Tab
- > Clusters
- > Activated
45 WordList - 2 words
46 WordList - 3 words
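A word list of 2-word or 3-word clusters amounts to counting recurring sequences of adjacent words. The following is a minimal Python sketch under simplifying assumptions: the text is already tokenized and sentence boundaries are ignored. The function name `clusters` and the `min_freq` threshold are hypothetical, not WordSmith settings.

```python
from collections import Counter

def clusters(tokens, n=3, min_freq=2):
    """Return the n-word sequences that occur at least min_freq times."""
    grams = Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )
    return [(gram, count) for gram, count in grams.most_common()
            if count >= min_freq]

tokens = "the world cup final and the world cup winner".split()
print(clusters(tokens, n=3))
print(clusters(tokens, n=2))
```

As with the clusters on the glossary slides, the sequences found this way are only candidates: a recurring combination is not necessarily a term or a phrase.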
47 How to ignore undesired text
- By tagging
- Title
- Subtitle
- Figure
- Date
- URL
- etc.
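The idea of tagging text so that it can be left out of the counts can be sketched in Python. The tag names and the angle-bracket syntax below (`<title>...</title>` and so on) are assumptions chosen for illustration; the actual tags depend on how the corpus was marked up, and this is not how WordSmith itself handles them.

```python
import re

def strip_tagged(text, tags=("title", "subtitle", "figure", "date", "url")):
    """Remove every <tag>...</tag> span for the given (hypothetical) tag names."""
    for tag in tags:
        name = re.escape(tag)
        pattern = re.compile(r"<%s>.*?</%s>" % (name, name),
                             flags=re.DOTALL | re.IGNORECASE)
        text = pattern.sub(" ", text)
    return " ".join(text.split())  # tidy up leftover whitespace

sample = ("<title>World Cup 2006</title> Germany hosted the tournament. "
          "<url>http://example.org</url>")
cleaned = strip_tagged(sample)
print(cleaned)
```

Running the word list on the cleaned text keeps titles, dates and URLs from inflating the frequencies.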
50 Adjusting Settings
- Controller
- > Settings
- > Adjust Settings
- > Only part of file
57 Viewer & Aligner
- Views all texts
- Aligns parallel texts
62 Another aligner
- VisualTCA - Sentence alignment visualization
- http://www.nilc.icmc.usp.br/nilc/tools/pagina-visualtca/visualtca/tca.htm
63 AntConc 3.2.2
- http://www.antlab.sci.waseda.ac.jp/software.html