Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation Virach Sornlertlamvanich and Thatsanee Charoenporn


1 Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation Virach Sornlertlamvanich and Thatsanee Charoenporn National Electronics and Computer Technology Center, Thailand PNC 2012 Annual Conference and Joint Meetings, UC Berkeley, US., December 7-9, 2012

2 Motivation
• Cultural Knowledge Creation
• Image as a focal point
  – Generalization
  – Concept representation
  – Language-independent symbol
  – Common understanding
• Image and object labeling
  – Keyword extraction

3 Motivation
• Cultural Knowledge Creation
• Image and object labeling
  – Keyword and semantic relation extraction
  – Image as a focal point
• Cultural Knowledge Services
  – Service platform

4 Three Steps in Digital Cultural Communication
Step 1: Cultural knowledge curation
  – Reuse
  – Standardization
Step 2: Cultural image annotation
  – Keyword extraction
  – Semantic relation acquisition
  – Image annotation games
Step 3: Cultural knowledge service
  – Cultural knowledge platform for application service development

5 CULTURAL KNOWLEDGE CURATION

6 Community Co-Creation Cultural Knowledge Base
[Diagram: community and institution contributions are curated into a standardized, annotated cultural knowledge base]
• Community co-creation: input, GPS data, tag; invitation, registration, approval
• Citation: museum, museum archive, other departments
• Cultural knowledge curation: standardized annotated cultural knowledge base
• Curation and presentation: search, category, statistics
• Actors: community, institution

7 Cultural Knowledge Portal Creation
Scope of Collection
• Cultural Personnel/Organization: Artist, Scholar, Religious Monument, Writer/Author, Society/Association, Cultural Network, Cultural Unit
• Cultural Artifact: Archaeological Objects, Artwork, Visual Art, Book/Press, Audiovisual Media, Utensil, Costume
• Way of Life: Ethnic, Religion and Belief, Tradition and Rite, Language and Literature, Local Wisdom, Performing Art and Music
• Cultural Site: Archaeological Site, Historical Park, Historical Site, Architecture, Religious Place, Museum, Library, Archive, Monument, Theatre, Tourism spot

8 Cultural Knowledge Portal Creation
Scope of Collection
• Cultural Personnel/Organization: Artist (芸術家), Scholar (学者), Religious dignitaries (宗教要人), Writer/Author (作家), Society/Association (社会/協会), Cultural Network (文化ネットワーク), Cultural Unit (文化機関)
• Cultural Artifact: Archaeological Objects (骨董品), Artifact (アーティファクト), Visual Art (視覚芸術), Book/Press (出版物), Audiovisual Media (視覚メディア), Utensil (装備), Costume (衣装)
• Way of Life: Ethnic (人種), Religion and Belief (宗教と信念), Tradition and Rite (伝統や儀式), Language and Literature (言語と文学), Local Wisdom (地域の知恵), Performing Art and Music (芸術と音楽)
• Cultural Site: Archaeological Site (考古学的資源), Historical Park (歴史公園), Historical Site (遺跡), Architecture (建築), Religious Place (宗教的な場所), Museum (博物館), Library (図書館), Archive (書庫/アーカイブ), Monument (記念碑/記念館), Theatre (文化的な会場), Tourism spot (観光の名所)

9 Cultural Databank

10 Cultural Databank

11 CULTURAL IMAGE ANNOTATION

12 Keyword Extraction
• Some keywords are readily available in the set of tags, but many are still missing.
• Our task is to extract those missing keywords from the description and title.

13 Keyword Extraction
• Some keywords can be linked to external pages, e.g. Wikipedia.
• Our task is to find appropriate articles corresponding to those keywords.

14 Method for KW Extraction
• Chunking model (Uchimoto et al., 2004) for keyword extraction

15 Training Data Preparation
• Generate a keyword list from tags and titles that are between 5 and 30 characters long
• Segment descriptions using a state-of-the-art Thai word segmentation algorithm (Kruengkrai et al., 2009)
• Note that the word segmentation algorithm was trained on the ORCHID corpus and TCL's lexicon (the contents of the ORCHID corpus and our current data are quite different)
• Label the segmented descriptions with the keyword list
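The keyword-list step above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `build_keyword_list` is hypothetical, and it assumes that "characters" means Unicode code points (Thai vowel and tone marks each count as one).

```python
def build_keyword_list(tags, titles, min_len=5, max_len=30):
    """Collect unique tags and titles whose length is within [min_len, max_len]."""
    candidates = set()
    for text in list(tags) + list(titles):
        text = text.strip()
        # Assumption: length is measured in Unicode code points.
        if min_len <= len(text) <= max_len:
            candidates.add(text)
    # Longest first, so a later matching step can prefer the longest keyword.
    return sorted(candidates, key=lambda k: (-len(k), k))


if __name__ == "__main__":
    print(build_keyword_list(["silk", "pha sin"], ["mudmee pha sin of Ban Pathum Kaeo"]))
```

Sorting longest-first is a design choice made here only to make the later longest-match labeling convenient.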

16 Training Data
• Description
  ผ้าซิ่นลายมัดหมี่บ้านปทุมแก้ว เป็นงานฝีมือพื้นบ้าน …..
  (English: The mudmee-patterned pha sin of Ban Pathum Kaeo is a folk handicraft …..)

17 Labeling
• Apply BIO tagging
  – B: beginning position of a keyword
  – I: intermediate (or end) position of a keyword
  – O: other words
• If several matches are possible, select the longest one (as in the previous example)

18 Training Data
• Description
  ผ้าซิ่นลายมัดหมี่บ้านปทุมแก้ว เป็นงานฝีมือพื้นบ้าน …..
• Keyword List (extracted from tag and title)
  …..
  ผ้า
  …..
  ผ้าซิ่น
  ผ้าซิ่นลายมัดหมี่บ้านปทุมแก้ว
  …..
• Segmented/Tagged/Labeled Description
  Word | POS tag | Label
  ผ้าซิ่น | N | B-K
  ลายมัดหมี่บ้านปทุมแก้ว | N | I-K
  (space) | P | O
  เป็น | V | O
  งานฝีมือพื้นบ้าน | N | O
  ….. | ….. | …..
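The longest-match BIO labeling shown in this example can be sketched as below. It is a minimal illustration under two assumptions that the slides do not state: keywords always start and end on word-segmentation boundaries, and segmented words are joined without separators (as in Thai). The function name `bio_label` is hypothetical; the labels `B-K`, `I-K`, and `O` follow the slide.

```python
def bio_label(words, keywords):
    """Label segmented words with B-K / I-K / O, preferring the longest keyword."""
    keyword_set = set(keywords)
    labels = ["O"] * len(words)
    i = 0
    while i < len(words):
        best_end = None
        # Try every span that starts at word i; remember the longest keyword match.
        for j in range(i + 1, len(words) + 1):
            if "".join(words[i:j]) in keyword_set:
                best_end = j
        if best_end is None:
            i += 1
            continue
        labels[i] = "B-K"
        for k in range(i + 1, best_end):
            labels[k] = "I-K"
        i = best_end  # continue after the matched keyword
    return labels


if __name__ == "__main__":
    words = ["ผ้าซิ่น", "ลายมัดหมี่บ้านปทุมแก้ว", " ", "เป็น", "งานฝีมือพื้นบ้าน"]
    keywords = ["ผ้า", "ผ้าซิ่น", "ผ้าซิ่นลายมัดหมี่บ้านปทุมแก้ว"]
    for word, label in zip(words, bio_label(words, keywords)):
        print(repr(word), label)
```

With the slide's keyword list, the two-word span is preferred over the shorter keyword ผ้าซิ่น, which is the longest-match rule from the labeling slide.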

19 Chunking Model

20 Preliminary Experiment Result
• 3000 examples for training, 500 examples for testing
• Based on the Margin Infused Relaxed Algorithm (MIRA; Crammer et al., 2005)
  – Baseline features (unigram and bigram) +
  – 3-character prefix/suffix of the current word +
  – 3 consecutive POS tags
• Recall = 0.8256, Precision = 0.9061, F1 ≈ 0.864 (computed from the stated recall and precision)
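A minimal sketch of the feature set listed above, plus the F1 computation from the reported precision and recall. The exact feature templates are not given on the slide, so the window choices here (previous word for the bigram; previous, current, and next tag for the POS trigram) and the names `token_features` and `f1_score` are assumptions for illustration only.

```python
def token_features(words, pos_tags, i):
    """Features for position i: unigram, bigram, 3-char affixes, POS trigram."""
    word = words[i]
    prev_word = words[i - 1] if i > 0 else "<BOS>"
    prev_pos = pos_tags[i - 1] if i > 0 else "<BOS>"
    next_pos = pos_tags[i + 1] if i + 1 < len(pos_tags) else "<EOS>"
    return {
        "unigram": word,
        "bigram": prev_word + "|" + word,       # assumed: previous + current word
        "prefix3": word[:3],                    # first 3 characters of the word
        "suffix3": word[-3:],                   # last 3 characters of the word
        "pos_trigram": "|".join([prev_pos, pos_tags[i], next_pos]),
    }


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(token_features(["ผ้าซิ่น", "เป็น"], ["N", "V"], 0))
    print(round(f1_score(0.9061, 0.8256), 4))
```

The feature dictionaries would be fed to whatever online learner is used; the MIRA update itself is not sketched here.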

21 Semantic Relation Acquisition
• Extract common syntactic patterns between two nouns
• Our task is to acquire triples (e_i, r_ij, e_j), where
  – e_i and e_j are entities (keywords)
  – r_ij is a relationship between them

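22 Extract Common Syntactic Pattern of a Predicate between Two Keywords
• Example
  – Title: วัดทุ่ง
    Description: วัดทุ่ง มีอายุราว 500 ปี สันนิฐานว่าสร้างขึ้นในสมัยกรุงสุโขทัย
    (English: Wat Thung is about 500 years old and is presumed to have been built in the Sukhothai period.)
  – Title: วัดตราชู
    Description: วัดตราชู สร้างขึ้นในสมัยกรุงศรีอยุธยาตอนต้น ราว พ.ศ. 2076
    (English: Wat Trachu was built in the early Ayutthaya period, around B.E. 2076.)
  – Title: หลวงพ่อขาว
    Description: เป็นพระพุทธรูปเก่าแก่เนื้อหินทรายปางสมาธิ ขนาดหน้าตักกว้าง ๒ ศอก ประดิษฐานอยู่ในวิหารวัดหลวงวัดสันนิฐานว่าสร้างขึ้นในสมัยอยุธยา
    (English: An old sandstone Buddha image in the meditation posture, two sok wide at the lap, enshrined in the vihara of Wat Luang; presumed to have been built in the Ayutthaya period.)
  – Title: พระพุทธรูปปางมารวิชัย
    Description: สร้างขึ้นในสมัยรัตนโกสินทร์ตอนต้น
    (English: Built in the early Rattanakosin period.)
  – Title: วิหารวัดโยธานิมิต
    Description: สร้างขึ้นในสมัยพระบาทสมเด็จพระเจ้าตากสินมหาราช
    (English: Built in the reign of King Taksin the Great.)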

23 Extract Common Syntactic Pattern of a Predicate between Two Keywords
• Example, predicate สร้างขึ้นในสมัย ("built in the period of")
  – (วัดทุ่ง, สร้างขึ้นในสมัย, กรุงสุโขทัย)
  – (วัดตราชู, สร้างขึ้นในสมัย, กรุงศรีอยุธยาตอนต้น)
  – (หลวงพ่อขาว, สร้างขึ้นในสมัย, อยุธยา)
  – (พระพุทธรูปปางมารวิชัย, สร้างขึ้นในสมัย, รัตนโกสินทร์ตอนต้น)
  – (วิหารวัดโยธานิมิต, สร้างขึ้นในสมัย, พระบาทสมเด็จพระเจ้าตากสินมหาราช)
• Resulting relation: (e_i, BUILT_IN, e_j)

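24 Extract Common Syntactic Pattern of a Predicate between Two Keywords
• Example
  – Title: กระโจมไฟบ้านโรงถ่าน
    Description: สร้างโดย อพท. เมื่อปี พ.ศ. 2550 เป็นท่าเทียบเรือสำหรับเรือท่องเที่ยว
    (English: Built by อพท. in B.E. 2550; it serves as a pier for tourist boats.)
  – Title: ศาลเจ้าตากสิน วัดบ้านค่าย
    Description: ศาลปูนขนาดกลาง สร้างโดยพระครูพิพัฒน์ชยาภรณ์
    (English: A medium-sized cement shrine built by Phra Khru Phiphat Chayaphon.)
  – Title: วัดทุ่งโฮ้งใต้
    Description: สร้างขึ้นเมื่อ พ.ศ. 2370 จากตำนานเล่าว่าสร้างโดยกลุ่มชาวลาวพวน
    (English: Built in B.E. 2370; according to legend, it was built by a group of Lao Phuan people.)
  – Title: ศาลพระพรหม
    Description: ตั้งอยู่บริเวณสวนตุงโคม ตำบลเวียง อำเภอเมืองเชียงราย จัดสร้างโดยเทศบาลนครเชียงราย
    (English: Located at Suan Tung Khom, Wiang subdistrict, Mueang Chiang Rai district; built by Chiang Rai City Municipality.)
  – Title: วงเวียนนิมิตร
    Description: วงเวียนนิมิตรหรือวงเวียนม้าน้ำก่อสร้างโดยเทศบาลนครภูเก็ตในปี พ.ศ. 2548
    (English: Nimit Circle, also called Seahorse Circle, was constructed by Phuket City Municipality in B.E. 2548.)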

25 Extract Common Syntactic Pattern of a Predicate between Two Keywords
• Example, predicate สร้างโดย ("built by")
  – (กระโจมไฟบ้านโรงถ่าน, สร้างโดย, อพท.)
  – (ศาลเจ้าตากสิน วัดบ้านค่าย, สร้างโดย, พระครูพิพัฒน์ชยาภรณ์)
  – (วัดทุ่งโฮ้งใต้, สร้างโดย, กลุ่มชาวลาวพวน)
  – (ศาลพระพรหม, สร้างโดย, เทศบาลนครเชียงราย)
  – (วงเวียนนิมิตร, สร้างโดย, เทศบาลนครภูเก็ต)
• Resulting relation: (e_i, BUILT_BY, e_j)
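Once a common predicate pattern has been found, producing triples like those in the two examples can be sketched as follows. This is a simplified illustration and not the authors' method: it assumes the title is the anchored keyword e_i, that the predicate strings are already known, and that e_j is the longest known keyword starting immediately after the predicate. The names `PREDICATES` and `extract_triples` are hypothetical.

```python
# Predicate surface strings from the slides, mapped to relation names.
PREDICATES = {
    "สร้างขึ้นในสมัย": "BUILT_IN",
    "สร้างโดย": "BUILT_BY",
}


def extract_triples(title, description, keywords):
    """Return (title, relation, keyword) for each predicate directly followed by a keyword."""
    triples = []
    for predicate, relation in PREDICATES.items():
        start = description.find(predicate)
        while start != -1:
            rest = description[start + len(predicate):].lstrip()
            # Prefer the longest keyword that begins right after the predicate.
            matches = [k for k in keywords if rest.startswith(k)]
            if matches:
                triples.append((title, relation, max(matches, key=len)))
            start = description.find(predicate, start + 1)
    return triples


if __name__ == "__main__":
    print(extract_triples(
        "วัดทุ่งโฮ้งใต้",
        "สร้างขึ้นเมื่อ พ.ศ. 2370 จากตำนานเล่าว่าสร้างโดยกลุ่มชาวลาวพวน",
        ["กลุ่มชาวลาวพวน"],
    ))
```

Substring search also catches variants such as จัดสร้างโดย and ก่อสร้างโดย, which the slide maps to the same predicate สร้างโดย.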

26 ESP Game and Peekaboom, proposed by Luis von Ahn (description by Pete Cashmore, May 25, 2006)
• ESP Game
  – In the ESP Game, the two players are shown an image and asked to enter a word that describes it. The players can't see each other's guesses. The aim is to enter the same word as your partner in the shortest possible time. But there's an ulterior motive here: much of the data is recorded, and could be used to power image search engines in the future. What's cheaper: paying thousands of Mechanical Turkers to label all the images on the web, or tricking people into doing it for free?
• Peekaboom
  – Peekaboom takes the ESP Game to the next level. Unlike the ESP Game, it's asymmetrical. To start, one user is shown an image and the other sees an empty black space. The first user is given a word relating to the image, and the aim is to communicate that word to the other player by revealing portions of the image. So if the word is "eye" and the image is a face, you reveal the eye to your partner. But the real aim here is to build a better image search engine: one that could identify individual items within an image.

27 ESP Game
• Two players are shown an image and asked to enter a word that describes it.
• The aim is to enter the same word as your partner in the shortest possible time.
• Purpose: to name the image
• Example guesses: Twitter, Bird, Angry bird, Mohawk

28 Extended ESP Game
• Two players are shown an image and asked to enter a word that describes it.
• The aim is to enter the same word as your partner in the shortest possible time.
• Purpose: to name the image, with guesses matched through AWN (Asian WordNet)
• Example guesses: Twitter, Angry bird, Bird, Mohawk
• A synset can be selected if more than one word matches it.
• Once a synset is selected, cross-language matching can be determined.
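The synset-based agreement described on this slide can be sketched as below. The synset table is a hypothetical toy stand-in for AWN: its IDs and members are invented for illustration and do not reflect AWN's real data or API.

```python
# Hypothetical toy synsets standing in for AWN; IDs and members are illustrative only.
SYNSETS = {
    "syn:bird": {"bird", "fowl", "นก", "鳥"},
    "syn:twitter": {"twitter"},
}


def synsets_of(word):
    """Return the IDs of all toy synsets that contain the word."""
    w = word.strip().lower()
    return {sid for sid, members in SYNSETS.items() if w in members}


def esp_match(guesses_a, guesses_b):
    """Return synset IDs hit by both players, even via different words or languages."""
    hit_a = set().union(*(synsets_of(g) for g in guesses_a))
    hit_b = set().union(*(synsets_of(g) for g in guesses_b))
    return hit_a & hit_b


if __name__ == "__main__":
    # One player answers in English, the other in Thai.
    print(esp_match(["Twitter", "Bird"], ["นก", "Mohawk"]))
```

Matching on synset IDs rather than on surface strings is what lets a Thai guess and an English guess count as the same label.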

29 Peekaboom
• One user is shown a named image and reveals the part of the image that corresponds to the name.
• Another user gives a word relating to the image.
• The aim is to enter the word the image is named with in the shortest possible time.
• Purpose: to label the object in the image
• Example labels: Bird, Squirrel, Flying fish

30 Extended Peekaboom
• One user is shown a named image and reveals the part of the image that corresponds to the name.
• Another user gives a word relating to the image.
• The aim is to enter the word the image is named with in the shortest possible time.
• A word from the synset can be matched (through AWN).
• Once a synset is selected, cross-language matching can be determined.
• Example labels: Bird, Squirrel, Flying fish

31 Demo
• ESP-like game
  – Play mode
    • Single player mode: play against history
    • Two-player mode: guess to match each other
• Extended Peekaboom game
  – Play mode
    • Single player mode: play against history
    • Two-player mode: guess to match each other
  – For Thai language, use AWN to support synonym, hypernym, hyponym, meronym, and holonym
  – For other languages, use AWN to support synonym only
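The language-dependent use of AWN relations in the demo can be sketched as follows. The lexicon below is a hypothetical toy table, not AWN data, and the language code `"th"` and the function names are assumptions made for this illustration.

```python
ALL_RELATIONS = ("synonym", "hypernym", "hyponym", "meronym", "holonym")

# Hypothetical toy lexicon: word -> relation -> related words (illustrative only).
TOY_AWN = {
    "นก": {"synonym": {"ปักษา"}, "hypernym": {"สัตว์"}, "meronym": {"ปีก"}},
    "bird": {"synonym": {"fowl"}, "hypernym": {"animal"}},
}


def allowed_relations(lang):
    """Thai uses all five relations; other languages use synonyms only."""
    return ALL_RELATIONS if lang == "th" else ("synonym",)


def acceptable_answers(target, lang):
    """Return the target word plus every related word that counts as a match."""
    answers = {target}
    for relation in allowed_relations(lang):
        answers |= TOY_AWN.get(target, {}).get(relation, set())
    return answers


if __name__ == "__main__":
    print(acceptable_answers("bird", "en"))
    print(acceptable_answers("นก", "th"))
```

In this sketch an English guess of "animal" for "bird" would not match, while the corresponding Thai hypernym would, mirroring the two rules above.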

32 Preliminary Experiment
• 18 images were played by 19 persons. For each image, players had 60 seconds to guess a proper word.
• AWN expanded the matching in 67 cases, a 22% increase in the matching ratio.
[Chart: matches by type: Exact, Syn, Hyper, Hypo, Mero, Holo]

33 CULTURAL KNOWLEDGE SERVICE

34 [Diagram: four databases (Cultural Database, Product Database, Shop Database, Maker Database), with labels A, B, C, D; each record has a title, a snippet description, and tags (A, B, C)]

35 To find a related Product from Culture information
[Diagram: culture records and product records, each with a title, a snippet description, and tags (A, B, C)]

36 To find the background Culture information from a Product
[Diagram: product records and a culture record, each with a title, a snippet description, and tags (A, B, C)]

37 Product and Culture information relation
[Diagram: product records and culture records, each with a title, a snippet description, and tags (A, B, C)]
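One way to relate product and culture records is by the tags they share. The slides show only that every record carries a title, a snippet description, and tags, so linking through tag overlap is an assumption of this sketch, as are the record layout and the name `related_records`.

```python
def related_records(record, other_db, min_shared=1):
    """Rank records in other_db by the number of tags they share with record."""
    tags = set(record["tags"])
    scored = []
    for other in other_db:
        shared = tags & set(other["tags"])
        if len(shared) >= min_shared:
            scored.append((len(shared), other["title"], sorted(shared)))
    # Most shared tags first; ties broken by title for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(title, shared) for _, title, shared in scored]


if __name__ == "__main__":
    culture = {"title": "culture item", "snippet": "description", "tags": ["A", "B", "C"]}
    products = [
        {"title": "product 1", "snippet": "description", "tags": ["A"]},
        {"title": "product 2", "snippet": "description", "tags": ["B", "C", "D"]},
        {"title": "product 3", "snippet": "description", "tags": ["E"]},
    ]
    print(related_records(culture, products))
```

The same function works in the other direction, from a product record to culture records, because the lookup is symmetric in the two databases.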

38 Summary
• With the ESP-like game, we successfully named the images, or at least obtained a list of candidate labels for the objects in each image, to be used in the subsequent extended Peekaboom game.
• Synonyms, hypernyms, hyponyms, meronyms, and holonyms from AWN help increase the matching ratio.
• Cross-language image labeling is realized through AWN synonyms.

39 Future Work
• Enhance keyword extraction to find more term candidates for image matching
• Call for participation in the extended ESP and Peekaboom games for image labeling

40 Framework

41 Creating Community Co-Creation Cultural Knowledge Base
• Cultural infrastructure for information service and business innovation
• Language and media technology for knowledge base creation
• Linked data for knowledge base completion
• Digital cultural network for knowledge assimilation and business innovation

42 Digitized Thailand: The Ultimate Goal
• DT is a framework for collaboration in technology and content development
• DT is a platform for digital content sharing
• Toward a creative economy, DT PaaS will be established

43 Thank you

