Title: WordsEye: Create 3D scenes using simple language
1WordsEye Create 3D scenes using simple language
The very humongous silver sphere is fifty feet
above the ground. The silver castle is in the
sphere. The castle is 80 feet wide. The ground is
black. The sky is partly cloudy.
2Why is it hard to create 3D graphics?
3The tools are complex
4Too much detail
5Involves training, artistic skill, and expense
6Underlying causes
- GUI paradigm - bottleneck
- Navigating menus, dialogs, modes, selection
- Direct manipulation
- Rigid modes/paths of expression
- Detail oriented
- Complex tools -- manual control of all details
- Build content from scratch
7WordsEye Goals
- Use Language
- No GUI - Just describe it!
- Language naturally describes constraints
- Incorporate pre-made 3D and 2D libraries
- Using language is fun and stimulates imagination
- Democratize 3D Graphics
- No entry barrier - no special skill or training
required
- Low overhead - fast, immediate
- Enable novel applications
8A tiny grey manatee is in the aquarium. It is
facing right. The manatee is six inches below the
top of the aquarium. The ground is tile. There is
a large brick wall behind the aquarium.
9A silver head of time is on the grassy ground.
The blossom is next to the head. the blossom is
in the ground. the green light is three feet
above the blossom. the yellow light is 3 feet
above the head. The large wasp is behind the
blossom. the wasp is facing the head.
10The humongous white shiny bear is on the American
mountain range. The mountain range is 100 feet
tall. The ground is water. The sky is partly
cloudy. The airplane is 90 feet in front of the
nose of the bear. The airplane is facing right.
11A microphone is in front of a clown. The
microphone is three feet above the ground. The
microphone is facing the clown. A brick wall is
behind the clown. The light is on the ground and
in front of the clown.
13Mary uses the crossbow. She rides the horse by
the store. The store is under the large willow.
The small allosaurus is in front of the horse.
The dinosaur faces Mary. The gigantic teacup is
in front of the store. The gigantic mushroom is
in the teacup. The castle is to the right of the
store.
14Web Interface preview mode
15Web Interface with render
16Initial Version - research prototype
- Developed at AT&T Labs
- Graphics: Mirai 3D animation system on Windows NT
- Church Tagger, Collins Parser on Linux
- WordNet (http://wordnet.princeton.edu/)
- Viewpoint 3D model library
- WordsEye code in Allegro Common Lisp
- Siggraph paper August 2001
17New Version
- Formed Semantic Light LLC - 2003
- Rewrote software from scratch
- Linux and CMUCL
- Custom Parser/Tagger
- OpenGL for 3D preview display
- Radiance Renderer
- ImageMagick, GIMP for 2D
- Web service/application (www.wordseye.com)
- Gallery/Forum/E-Cards/PictureBooks/2D effects
- Scalable web architecture
18WordsEye Overview
- Linguistic Analysis
- Parsing, create semantic representation
- Interpretation
- Add implicit objects, relations
- Resolve semantics and references
- Depiction
- Database of 3D objects, poses, textures
- Depiction rules generate graphical constraints
- Apply constraints to create scene
19Linguistic Analysis
- Tag part-of-speech
- Parse
- Generate semantic representation
- Semantic functions for verbs and prepositions
- WordNet-like dictionary for nouns
- Anaphora resolution
20Example John said that the cat is on the table.
21Parse tree for John said that the cat was on the
table.
22Nouns Hierarchical Dictionary
23Semantic Representation for John said that the
blue cat was on the table.
- 1. Object mr-happy (John)
- 2. Object cat-vp39798 (cat)
- 3. Object table-vp6204 (table)
- 4. Action say
- subject <element 1>
- direct-object <elements 2,3,5,6>
- tense PAST
- 5. Attribute blue
- object <element 2>
- 6. Spatial-Relation on
- figure <element 2>
- ground <element 3>
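The numbered elements above map naturally onto a small record type. The following is a minimal Python sketch of that idea, not WordsEye's actual data structure: the names `Element`, `roles`, `BY_INDEX`, and `fillers` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Element:
    """One numbered entry of the semantic representation (hypothetical type)."""
    index: int
    kind: str    # "Object", "Action", "Attribute" or "Spatial-Relation"
    name: str
    roles: Dict[str, List[int]] = field(default_factory=dict)  # role -> element indices
    tense: Optional[str] = None


# The six elements listed on the slide, transcribed as data.
elements = [
    Element(1, "Object", "mr-happy"),
    Element(2, "Object", "cat-vp39798"),
    Element(3, "Object", "table-vp6204"),
    Element(4, "Action", "say",
            {"subject": [1], "direct-object": [2, 3, 5, 6]}, tense="PAST"),
    Element(5, "Attribute", "blue", {"object": [2]}),
    Element(6, "Spatial-Relation", "on", {"figure": [2], "ground": [3]}),
]
BY_INDEX = {e.index: e for e in elements}


def fillers(element: Element, role: str) -> List[str]:
    """Return the names of the elements that fill `role` of `element`."""
    return [BY_INDEX[i].name for i in element.roles.get(role, [])]


on = BY_INDEX[6]
print(fillers(on, "figure"), "is on", fillers(on, "ground"))
```

Storing cross-references as element indices, as the slide does, keeps the representation flat and lets several relations point at the same object.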
24WordNet problems
- Inheritance is more functional than lexical
- Terrace is a plateau
- Crossing Guard is a traffic cop
- Bellybutton is a point
- Lack of multiple inheritance at non-leaf nodes
- "ceramic-ware" is grouped under "utensil" and has
"earthenware", etc. under it. But there are no
dishes, plates, under it because those are
categorized elsewhere under "tableware"
- Lacks relations other than ISA. Thesaurus vs
dictionary.
- Snowball made-of snow
- Italian resident-of Italy
- Cluttered with obscure words and word senses
- Spoon as a type of golf club
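Two of the gaps listed above, multiple inheritance at non-leaf nodes and relations other than ISA, can be illustrated with a small lexicon that stores arbitrary labeled relations. This is a sketch under assumptions: the `Lexicon` class and its methods are hypothetical, and the "plate" entries are illustrative rather than taken from WordNet or WordsEye.

```python
from collections import defaultdict


class Lexicon:
    """Toy dictionary storing labeled relations between concepts (hypothetical)."""

    def __init__(self):
        self.relations = defaultdict(set)   # (concept, relation) -> set of targets

    def add(self, concept, relation, target):
        self.relations[(concept, relation)].add(target)

    def ancestors(self, concept):
        """All concepts reachable through ISA links; any node may have several parents."""
        seen, stack = set(), [concept]
        while stack:
            current = stack.pop()
            for parent in self.relations.get((current, "isa"), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


lex = Lexicon()
lex.add("ceramic-ware", "isa", "utensil")
lex.add("plate", "isa", "ceramic-ware")   # illustrative: a second parent for "plate"
lex.add("plate", "isa", "tableware")
lex.add("snowball", "made-of", "snow")    # relations other than ISA
lex.add("Italian", "resident-of", "Italy")

print(sorted(lex.ancestors("plate")))
```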
25Indexical Reference Three dogs are on the table.
The first dog is blue. The first dog is 5 feet
tall. The second dog is red. The third dog is
purple.
26Anaphora resolution The duck is in the sea. It
is upside down. The sea is shiny and transparent.
The ground is invisible. The apple is 3 inches
below the duck. It is in front of the duck. The
yellow illuminator is 3 feet above the apple. The
cyan illuminator is 6 inches to the left of it.
The magenta illuminator is 6 inches to the right
of it. It is partly cloudy.
27Interpretation
- Interpret semantic representation
- Object selection
- Resolve semantic relations/properties based on
object types
- Answer Who? What? When? Where? How?
- Disambiguate/normalize relations and actions
- Identify and resolve references to implicit
objects
28When object is missing or doesn't exist . . .
29Semantic Resolution of Of
30Object attributes examples (modify versus
selection)
31Implicit objects references
- Mary rode by the store. Her motorcycle was red.
- Verb resolution Identify implicit vehicle
- Functional properties of objects
- Reference
- Motorcycle matches the vehicle
- Her matches with Mary
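The two steps above (the verb implies a vehicle, and a later noun with the matching functional property fills that slot) can be sketched as follows. The tables `FUNCTIONAL_PROPERTIES` and `IMPLIED_BY_VERB` and both function names are hypothetical stand-ins; the source does not show how WordsEye stores this information.

```python
# Hypothetical functional properties of a few objects.
FUNCTIONAL_PROPERTIES = {
    "motorcycle": {"vehicle"},
    "horse": {"vehicle", "animal"},
    "crossbow": {"weapon"},
    "store": {"building"},
}

# Hypothetical table: verb -> functional property of the object it implies.
IMPLIED_BY_VERB = {"ride": "vehicle"}


def implied_property(verb):
    """Functional property of the implicit object for `verb`, or None."""
    return IMPLIED_BY_VERB.get(verb)


def resolve_implicit(verb, later_nouns):
    """Return the first later-mentioned noun that can fill the verb's implicit slot."""
    wanted = implied_property(verb)
    if wanted is None:
        return None
    for noun in later_nouns:
        if wanted in FUNCTIONAL_PROPERTIES.get(noun, set()):
            return noun
    return None


# "Mary rode by the store. Her motorcycle was red."
print(resolve_implicit("ride", ["store", "motorcycle"]))
```

If no later noun matches, the function returns `None`, leaving it to some other step to pick a default vehicle.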
32Implicit Reference Mary rode by the store. Her
motorcycle was red.
33Depiction
- 3D object and image database
- Graphical constraints
- Spatial relations
- Attributes
- Posing
- Shape/Topology changes
- Depiction process
343D Object Database
- 2,000 3D polygonal objects
- Augmented with
- Spatial tags (top surface, base, cup, push
handle, wall, stem, enclosure)
- Skeletons
- Default size, orientation
- Functional properties (vehicle, weapon . . .)
- Placement/attribute conventions
35Over 2000 3D Objects
36Over 10,000 images and textures
37Spatial Tags
38Spatial Tags
39Spatial Tags
40Spatial Tags
41Stem in Cup The daisy is in the test tube.
42Enclosure and top surface The bird is in the
bird cage. The bird cage is on the chair.
43On Wall(s) The couch is against the wood wall.
The window is on the wall. The window is next to
the couch. the door is 2 feet to the right of the
window. the man is next to the couch. The animal
wall is to the right of the wood wall. The animal
wall is in front of the wood wall. The animal
wall is facing left. The walls are on the huge
floor. The zebra skin coffee table is two feet in
front of the couch. The lamp is on the table. The
floor is shiny.
44Spatial Relations
- Relative positions
- On, under, in, below, off, onto, over, above . . .
- Distance
- Sub-region positioning
- Left, middle, corner, right, center, top, front,
back
- Orientation
- facing (object, left, right, front, back, east,
west . . .)
- Time-of-day relations
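As a rough illustration of how a relative position plus a distance might turn into coordinates, the sketch below places one bounding box on or above another. It assumes axis-aligned boxes, a y-up coordinate system, and feet as units; the `Box` type and both functions are hypothetical, and a real system would use the spatial tags (such as top surface) rather than the bounding-box top.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box; (x, y, z) is the center of its base, y is up."""
    x: float
    y: float
    z: float
    width: float
    height: float
    depth: float

    @property
    def top(self) -> float:
        return self.y + self.height


def place_above(figure: Box, ground: Box, distance: float = 0.0) -> Box:
    """Center `figure` over `ground` with its base `distance` above the ground's top."""
    return replace(figure, x=ground.x, y=ground.top + distance, z=ground.z)


def place_on(figure: Box, ground: Box) -> Box:
    """'On' treated as 'above' with zero distance (a simplifying assumption)."""
    return place_above(figure, ground, 0.0)


table = Box(0.0, 0.0, 0.0, width=4.0, height=2.5, depth=3.0)
cat = Box(10.0, 0.0, 10.0, width=1.0, height=1.0, depth=1.5)

print(place_on(cat, table))          # "The cat is on the table."
print(place_above(cat, table, 3.0))  # "The cat is 3 feet above the table."
```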
45Attributes
- Size
- height, width, depth
- Aspect ratio (flat, wide, thin . . .)
- Surface attributes
- Texture database
- Color, Texture, Opacity, reflectivity
- Applied to objects or textures themselves
- Brightness (for lights)
46Attributes The orange battleship is on the brick
cow. The battleship is 3 feet long.
47Time of day cloudiness
48Time of day lighting
49Poses
- Represent actions
- Database of 500 human poses
- Grips
- Usage (specialized/generic)
- Standalone
- Merge poses (upper/lower body, hands)
- Gives wide variety by mix-and-match
- Dynamic posing/IK
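Merging poses by body region can be pictured as taking the lower-body joints from one pose and the upper-body joints from another. The sketch below assumes a pose is a simple mapping from joint name to a setting; the joint names, region sets, and pose values are all hypothetical and much coarser than a real skeleton.

```python
# Hypothetical body regions; a real skeleton has many more joints.
UPPER_BODY = {"spine", "head", "left_arm", "right_arm"}
LOWER_BODY = {"hips", "left_leg", "right_leg"}


def merge_poses(lower_source, upper_source):
    """Combine the lower-body joints of one pose with the upper-body joints of another."""
    merged = {j: v for j, v in lower_source.items() if j in LOWER_BODY}
    merged.update({j: v for j, v in upper_source.items() if j in UPPER_BODY})
    return merged


ride_bicycle = {
    "hips": "seated", "left_leg": "pedal-down", "right_leg": "pedal-up",
    "spine": "leaning", "head": "forward",
    "left_arm": "on-handlebar", "right_arm": "on-handlebar",
}
play_trumpet = {
    "hips": "standing", "left_leg": "straight", "right_leg": "straight",
    "spine": "upright", "head": "at-mouthpiece",
    "left_arm": "holding-trumpet", "right_arm": "holding-trumpet",
}

# "Mary rides the bicycle. She plays the trumpet."
combined = merge_poses(ride_bicycle, play_trumpet)
print(combined)
```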
50Poses
51Poses
52Combined poses Mary rides the bicycle. She plays
the trumpet.
53Inverse Kinematics (IK) Mary pushes the lawn
mower. The lawnmower is 5 feet tall. The cat is 5
feet behind Mary. The cat is 10 feet tall.
54The Broadway Boogie Woogie vase is on the Richard
Sproat coffee table. The table is in front of the
brick wall. The van Gogh picture is on the wall.
The Matisse sofa is next to the table. Mary is
sitting on the sofa. She is playing the violin.
She is wearing a straw hat.
55Shape Changes
- Deformations
- Facial expressions
- Happy, angry, sad, confused . . . mixtures
- Combined with poses
- Topological changes
- Slicing
56Facial Expressions
57The rose is in the vase. The vase is on the half
dog.
58Depiction Process
- Given a semantic representation
- Generate graphical constraints
- Handle implicit and conflicting constraints.
- Generate 3d scene from constraints
- Add environment, lights, camera
- Render scene
59Example Generate constraints for kick
- Case 1: No path or recipient; direct object is
large
- Pose: Actor in kick pose
- Position: Actor directly behind direct object
- Orientation: Actor facing direct object
- Case 2: No path or recipient; direct object is
small
- Pose: Actor in kick pose
- Position: Direct object above foot
- Case 3: Path and recipient
- Pose/relations . . . (some tentative)
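The case analysis for kick can be written as a rule that returns a list of graphical constraints. The sketch below is only an illustration: the tuple encoding of constraints, the function name, and the boolean `object_is_large` flag are assumptions, and the third case is left unimplemented because the slide does not spell out its constraints.

```python
from typing import List, Optional, Tuple

Constraint = Tuple[str, ...]   # hypothetical encoding: (kind, subject, ...)


def kick_constraints(actor: str, direct_object: str, object_is_large: bool,
                     path: Optional[str] = None,
                     recipient: Optional[str] = None) -> List[Constraint]:
    """Return the constraints for 'actor kicks direct_object' (cases 1 and 2 only)."""
    if path is not None or recipient is not None:
        # Case 3: the source marks these constraints as tentative and elides them.
        raise NotImplementedError("constraints for a path/recipient are not specified")
    if object_is_large:
        # Case 1: the actor is placed relative to the object.
        return [
            ("pose", actor, "kick"),
            ("position", actor, "directly-behind", direct_object),
            ("orientation", actor, "facing", direct_object),
        ]
    # Case 2: the small object is placed relative to the actor's foot.
    return [
        ("pose", actor, "kick"),
        ("position", direct_object, "above", f"{actor}.foot"),
    ]


print(kick_constraints("John", "pickup truck", object_is_large=True))
print(kick_constraints("John", "football", object_is_large=False))
```

Note how the positioned object flips between the cases: a large object stays put and the actor moves, while a small object is moved to the actor's foot.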
60Some varieties of kick
Case 1: John kicked the pickup truck
Case 3: John kicked the ball to the cat on the
skateboard
Case 2: John kicked the football
61Implicit Constraint. The vase is on the
nightstand. The lamp is next to the vase.
62Figurative and Metaphorical Depiction
- Textualization
- Conventional Icons and emblems
- Literalization
- Characterization
- Personification
- Functionalization
63Textualization The cat is facing the wall.
64Conventional Icons The blue daisy is not in the
army boot.
65Literalization Life is a bowl of cherries.
66Characterization The policeman ran by the
parking meter
67Functionalization The hippo flies over the church
68Future Work
- Support verbs and poses (FrameNet?)
- Exploit context
- Use more world knowledge
- Compound objects, environments, situations
- Handle more complex, natural text
- Ambiguity issues
- Handle object parts
- Add more 2D/3D content (including user-uploadable
3D objects)
- More complex spatial constraints
- Physics, animation, sound, and speech
69FrameNet Frame Relations
(http://framenet.icsi.berkeley.edu/)
70FrameNet Annotations
71FrameNet Frame Elements
- Core vs Peripheral
- Inheritance
- Renaming (e.g., agent -> helper)
72Pragmatic Ambiguity The lamp is next to the vase
on the nightstand . . .
73Syntactic Ambiguity Prepositional phrase
attachment
John looks at the cat on the skateboard.
74Applications
- Online communications: Electronic postcards,
visual chat/IM, social networks
- Gaming, virtual environments
- Art: Storytelling/comic books/art
- Education (ESL, reading, disabled learning,
graphics arts)
- Graphics tool (e.g., for PowerPoint)
- Embedded in toys
75Storytelling The stagecoach is in front of the
old west hotel. Mary is next to the stagecoach.
She plays the guitar. Edward exercises in front
of the stagecoach. The large sunflower is to the
left of the stagecoach.
76First grade homework The duck sat on a hen the
hen sat on a pig...
77Conclusion
- New approach to scene generation
- Low overhead (skill, training . . .)
- Immediacy
- Usable with minimal hardware: text or speech
input device and display screen.
- Work is ongoing
- Available as experimental web service
78Related Work
- Adorni, Di Manzo, Giunchiglia, 1984
- Put: Clay and Wilhelms, 1996
- PAR: Badler et al., 2000
- CarSim: Dupuy et al., 2000
- SHRDLU: Winograd, 1972
- More at www.semanticlight.com
79Bloopers John said the cat is on the table
80Bloopers Mary says the cat is blue.
81Bloopers John wears the axe. He plays the violin.
82Bloopers Happy John holds the red shark
83Bloopers Jack carried the television
84Web Interface - Entry Page (www.wordseye.com)
- Registration
- Login
- Learn more
- Example pictures
85Web Interface - Public Gallery
86Web Interface - Add Comments to Picture
87Web Interface - Link Pictures into Stories Games
88The tall granite mountain range is 300 feet
wide. The enormous umbrella is on the mountain
range. The gray elephant is under the
umbrella. The chicken cube is 6 feet to the right
of the gray elephant. The cube is 5 feet tall.
The cube is on the mountain range. A clown is on
the elephant. The large sewing machine is on the
cube. A die is on the clown. It is 3 feet tall.