
    A Study on Multiple Document Summarization Systems

    Abstract: To help online readers quickly retrieve the news they need from multiple sources on the Internet, this dissertation studies the main issues in multi-document summarization: event clustering, sentence selection, redundancy detection and removal, sentence ordering, and summary evaluation. It proposes a new summarization system built from two major modules, event clustering and summary generation. In addition to conventional features such as part of speech and term frequency, the system uses semantic features: informative words, event words, and co-reference chains. Informative words, which represent a document, are derived from the term frequency, document frequency, and paragraph dispersion of each word.

    In the event clustering module, co-reference chains are first used to summarize each document, and clustering is then performed on those summaries to improve efficiency. A controlled vocabulary mined from co-reference chains addresses cross-document named entity unification, and a dynamic threshold model improves clustering performance.

    In the summary generation module, a temporal tagger based on a reference time resolves temporal expressions and provides sentence dates. Latent semantic analysis (LSA) is applied to candidate sentence selection to avoid the errors introduced by conventional sentence clustering. To fit more highly informative sentences into a summary, a sentence reduction algorithm uses both event constituent words and informative words; the event constituent words are obtained by extracting basic events with an NP-chunker built on web corpora. Candidate sentences are ordered by sentence date. Experimental results show a statistically significant improvement over a conventional multi-document summarization system in both content and readability.

    To verify the usefulness of the semantic features, they are also applied to multi-document headline generation and to multi-lingual multi-document summarization, for which headline reassembly and cross-language document (sentence) matching algorithms are proposed; the results are satisfactory. Finally, to overcome the bottleneck of manual summary evaluation, the dissertation proposes automatic evaluation based on question answering (QA). Experiments on both small and large corpora indicate that automatic evaluation is feasible in terms of time and objectivity.

    Contents:
    Front matter: Abstract; Chinese Abstract; Acknowledgements; Contents; Illustrations; Tables
    1. Introduction
    1.1. Document Summarization
    1.2. Headline Generation
    1.3. Multi-lingual Multi-document Summarization
    1.4. Summary Evaluation
    1.5. The Event Words and Co-reference Chains
    1.6. The Goal of the Study
    2. Multi-document Summarization Using Informative Words
    2.1. System Architecture
    2.2. Issues of Basic Multi-document Summarization System
    2.3. Generating Summaries with Informative Words
    2.4. Experiment
    2.4.1. Experimental Results
    2.4.2. Observation
    2.5. Discussion
    3. Evaluation Model Using Question Answering
    3.1. Modeling Using Question Answering
    3.2. Evaluation
    3.2.1. Data Set and Evaluation Method
    3.2.2. Experimental Results and Observation
    3.3. Experiments Using Large Documents and Results
    3.3.1. Data Set
    3.3.2. Experimental Results
    3.4. Discussion
    4. Headline Generation
    4.1. Introduction
    4.2. Selection of Informative Words
    4.2.1. Paragraph Dispersion
    4.2.2. Informative Words
    4.3. Headline Generation Using Informative Words
    4.3.1. Bag Generation Method Using Informative Words
    4.3.2. Sentence Selection Using Statistical Information and Density
    4.4. Evaluation
    4.4.1. Evaluation and Method
    4.4.2. Results
    4.5. Discussion
    5. Clustering and Visualization in a Multi-lingual Multi-document Summarization System
    5.1. Introduction
    5.2. Basic Architecture
    5.3. Similarity Measurement
    5.3.1. Methods
    5.3.2. Experiments
    5.4. Event Clustering
    5.4.1. Clustering Models
    5.4.2. Experiments
    5.5. Sentence Clustering
    5.5.1. Clustering Models
    5.5.2. Experiments
    5.6. Visualization
    5.6.1. Focusing Model
    5.6.2. Browsing Model
    5.7. Discussion
    6. Multi-document Summarization Using both Informative Words and Knowledge Mining from Co-reference Chains
    6.1. Introduction
    6.2. System Architecture
    6.3. Document Summarization Using Co-reference Chains
    6.4. Creating Controlled Vocabulary from Individual Co-reference Chains
    6.4.1. Normalized Chain Edit Distance
    6.4.2. Creating Controlled Vocabulary
    6.4.3. Evaluation
    6.4.3.1. Data Set
    6.4.3.2. Experimental Results
    6.5. Event Clustering
    6.6. Experimental Results
    6.6.1. Data Sets
    6.6.2. Evaluation Metrics
    6.6.3. Experimental Results
    6.7. Experiments Using Co-reference Chains from Co-reference Resolution System
    6.7.1. Flow of a Chinese Co-reference Resolution System
    6.7.2. Experimental Results of Using Noisy Co-reference Chains
    6.7.3. Co-reference Chains Filter
    6.7.4. Performance of Event Clustering Using Clearer Co-reference Chains
    6.8. Discussion
    7. Event-based Summary Generation
    7.1. Introduction
    7.1.1. Similarity Model
    7.1.2. Sentence Extraction and Ordering
    7.1.3. Experimental Results
    7.1.4. Discussion
    7.2. Processing Chinese Temporal Expression
    7.2.1. Representation of Time and Date
    7.2.2. Temporal Resolution Using Focus Time and Co-reference Chains
    7.2.3. Experiments
    7.2.4. Discussion
    7.3. System Architecture of News Summarizer
    7.4. Event Extraction and NP-Chunker
    7.4.1. NP-Chunker Using Significance Estimation Function and Web Corpora
    7.4.1.1. Observation
    7.4.1.2. NP-Chunker Using Web Corpora and Association Rules
    7.4.1.3. Experiment
    7.4.2. Event Extraction
    7.5. Sentence Selection
    7.5.1. Latent Semantic Analysis
    7.5.2. Sentence Extraction Using Latent Semantic Analysis
    7.6. Summary Generation Using Sentence Date
    7.6.1. Sentence Reduction Using Both Informative Words and Event Constituent Words
    7.6.2. Summary Generation Using Sentence Date
    7.7. Experiment
    7.7.1. Data Set and Evaluation Metrics
    7.7.2. Experimental Results
    7.8. Discussion
    8. Conclusions and Future Works
    8.1. Achievements
    8.1.1. Event Clustering
    8.1.2. Summary Generation
    8.2. Future Work
    References
    Appendices
    Appendix A. The Evaluation File for Headline Generation
    Appendix B. An Example of Chi-square Test for a Term Pair
    Appendix C. Controlled Vocabulary Before/After Employing Chain Filter
    Appendix D. Example of 8 Type Generated Summaries

    Children’s Information Seeking Behavior: A Case Study of Book Locating Applying Augmented Reality

    Abstract: This study aims to identify the problems children currently face when locating books and to understand how augmented reality (AR) can help them. It then designs a prototype AR book-locating system for children and uses that system to study children's information seeking behavior.

    The study used an experimental design that divided children into experimental and control groups. The experimental group was split into two subgroups, one locating books with AR and one with a smart watch. The control group searched on its own, with a guide booklet as a supplementary aid. After the book-locating activity, all participants completed a task completion form and a questionnaire. In total there were 76 participants and 96 questionnaire respondents, all children in grades one to four of elementary school, and 44 interviewees in grades three and four. Semi-structured interviews were used to collect data on the children's information seeking behavior.

    The study found that AR does help children locate books, with a slight advantage in both accuracy and time spent. Analysis of the questionnaire respondents and interviewees showed that:
    - children's motivation for seeking information is mostly active;
    - the library is their channel for finding information, but family members are the first people they ask for help, and the library plays only a supporting role in solving problems;
    - they find it somewhat difficult to find the materials they want in the library;
    - they do not fully understand the meaning of the Chinese book classification scheme;
    - they visit libraries outside school mainly to borrow extracurricular reading;
    - they most often look for a book by checking the shelves one book at a time;
    - they do not understand how the children's section of the National Library of Public Information arranges its shelves.

    Because participants knew little about call numbers, the Chinese book classification scheme, and how library books are shelved, the study recommends producing instructional videos or songs so that children can learn these topics by singing and dancing. The experiment also indicates that public libraries are well suited to applying AR to children's book finding, and the study suggests future applications in book finding, digital reading, and library navigation.

    Contents:
    Chapter 1. Introduction
    Section 1. Research Background and Motivation
    Section 2. Research Purpose
    Section 3. Research Questions
    Section 4. Research Methods
    Section 5. Research Scope and Limitations
    Section 6. Definition of Terms
    Chapter 2. Literature Review
    Section 1. Information Needs
    Section 2. Information Seeking Behavior
    Section 3. Library Navigation and Indoor Positioning
    Section 4. Children's Information Needs, Information Seeking Behavior, and Library Navigation
    Section 5. Augmented Reality
    Section 6. Summary
    Chapter 3. Research Design and Implementation
    Section 1. Research Framework
    Section 2. Research Methods
    Section 3. Research Site and Subjects
    Section 4. Experimental Design
    Section 5. Research Instruments
    Section 6. Experimental Procedure
    Section 7. Research Procedure
    Chapter 4. Results and Analysis
    Section 1. Background Analysis of Participants, Interviewees, and Questionnaire Respondents
    Section 2. Accuracy of Children's Book Finding
    Section 3. Time Spent on Children's Book Finding
    Section 4. Interview Results on Children's Information Seeking Behavior
    Section 5. Children's Questionnaire Survey Results
    Section 6. Comparison and General Discussion
    Chapter 5. Conclusions and Recommendations
    Section 1. Conclusions
    Section 2. Recommendations
    Section 3. Suggestions for Future Research
    References
    Appendix 1. Questionnaire on Children's Information Needs and Information Seeking Behavior
    Appendix 2. Parent/Guardian Consent Form
    Appendix 3. Interview Outline for the Study of Applying Augmented Reality to Children's Information Seeking Behavior