A Study on Quotation of Web Documents Using Reading Annotation and its Applications

Ryosuke HAYASHI

Graduate School of Information Science, Nagoya University

Abstract

In this thesis, we propose a new mechanism for quoting Web documents, together with its applications. In our mechanism, we define quotation information as an annotation to documents described in an XML format. The quotation information includes a pointer to the internal element of the quoted document, a pointer to the internal element of the quoting document, and an attribute describing the purpose of the quotation. We represent the quotation information as a bidirectional hyperlink that connects the internal elements of the quoted document with those of the quoting document. Our mechanism benefits authors who quote online documents, readers, and researchers who perform citation analysis.
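The quotation information described above can be pictured as a small record that is indexed from both ends. The following is only a minimal sketch under assumptions: the class and field names, the use of path-like strings as pointers, and the example purpose value are all illustrative and are not taken from the thesis.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Quotation:
    """One quotation annotation (all field names are hypothetical)."""
    quoted_doc: str        # URL of the document being quoted
    quoted_pointer: str    # pointer to the quoted internal element
    quoting_doc: str       # URL of the document that contains the quotation
    quoting_pointer: str   # pointer to the quoting internal element
    purpose: str           # attribute describing the purpose of the quotation


class QuotationIndex:
    """Stores quotations so that the link can be followed in both directions."""

    def __init__(self):
        self._by_quoted = defaultdict(list)
        self._by_quoting = defaultdict(list)

    def add(self, quotation):
        # Register the same record under both endpoints of the link.
        self._by_quoted[quotation.quoted_doc].append(quotation)
        self._by_quoting[quotation.quoting_doc].append(quotation)

    def quoted_by(self, doc_url):
        """Quotations in which doc_url is the quoted document."""
        return list(self._by_quoted.get(doc_url, []))

    def quotes_from(self, doc_url):
        """Quotations that appear inside doc_url (doc_url is the quoting document)."""
        return list(self._by_quoting.get(doc_url, []))


index = QuotationIndex()
index.add(Quotation(
    quoted_doc="http://example.org/source.xml",
    quoted_pointer="/doc/section[2]/p[1]",
    quoting_doc="http://example.org/essay.xml",
    quoting_pointer="/doc/section[1]/p[4]",
    purpose="support",
))
```

Keeping two lookup tables over the same records is one simple way to make the link bidirectional: a reader of the quoted document can discover who quotes it, and a reader of the quoting document can jump back to the quoted element.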

We also propose a method that makes it easier for users to quote documents by using reading annotations: metadata that associate attributes with arbitrary parts of a document while it is being read. The proposed system records users' reading annotations and lets a user retrieve them and easily quote parts of a document when writing a new one. We compared our method with a general retrieval method in several experiments. The results showed that our method was more effective than the general method in terms of retrieval time.
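As a rough illustration of retrieving recorded reading annotations, the sketch below filters a list of annotations by keyword and, optionally, by attribute. It is not the system evaluated in the thesis: the record layout, the attribute values, and the substring matching are assumptions made only for this example.

```python
from dataclasses import dataclass


@dataclass
class ReadingAnnotation:
    """A part of a document marked while reading (field names are hypothetical)."""
    doc_url: str     # document that was being read
    pointer: str     # pointer to the annotated part of the document
    text: str        # text of the annotated part
    attribute: str   # attribute attached by the reader, e.g. "important"


def search_annotations(annotations, keyword, attribute=None):
    """Return annotations whose text contains keyword (case-insensitive).

    If attribute is given, only annotations carrying that attribute are kept.
    """
    needle = keyword.lower()
    return [
        a for a in annotations
        if needle in a.text.lower()
        and (attribute is None or a.attribute == attribute)
    ]


recorded = [
    ReadingAnnotation("http://example.org/a.xml", "/doc/p[2]",
                      "Co-citation measures document similarity.", "important"),
    ReadingAnnotation("http://example.org/b.xml", "/doc/p[7]",
                      "Annotation tools support active reading.", "interesting"),
    ReadingAnnotation("http://example.org/c.xml", "/doc/p[1]",
                      "Similarity can be weighted by distance.", "interesting"),
]

hits = search_annotations(recorded, "similarity")
important_hits = search_annotations(recorded, "similarity", attribute="important")
```

Because each hit keeps its document URL and pointer, a writing tool could turn a selected hit directly into a quotation of that part of the document.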

In addition, we propose a method to find similar documents based on co-citations extracted from quotation annotations, that is, the set of quotation information accumulated in our system. Most previous methods in citation analysis treat all citations as contributing equally to similarity, so semantic information about a quotation is not reflected in the similarity. We propose a method that takes this semantic information into account by using our quotation annotations. The semantic information includes the distance between quotation parts and the purpose of each quotation. To evaluate the method, we classified co-citations according to their semantic information and compared the similarities between documents related by each class of co-citation. The results of our experiment showed that our method was more effective than a method that employed conventional measures.
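To make the contrast with conventional co-citation concrete, the sketch below computes co-citation scores two ways: a plain count, where every co-citation counts as 1, and a distance-weighted variant. The weighting function 1 / (1 + distance), the use of a paragraph index as the position of a quotation in the quoting document, and all names are assumptions for illustration; the thesis may define distance and weights differently, and purposes of quotation are not modeled here.

```python
from collections import defaultdict
from itertools import combinations


def cocitation_scores(quotations, weighted=True):
    """Score pairs of quoted documents by co-citation.

    quotations: iterable of (quoting_doc, quoted_doc, position) tuples, where
    position is a hypothetical paragraph index of the quotation inside the
    quoting document.
    """
    by_quoting = defaultdict(list)
    for quoting_doc, quoted_doc, position in quotations:
        by_quoting[quoting_doc].append((quoted_doc, position))

    scores = defaultdict(float)
    for items in by_quoting.values():
        # Within one quoting document, keep the closest occurrence of each pair.
        best = {}
        for (doc_a, pos_a), (doc_b, pos_b) in combinations(items, 2):
            if doc_a == doc_b:
                continue
            pair = tuple(sorted((doc_a, doc_b)))
            weight = 1.0 / (1 + abs(pos_a - pos_b)) if weighted else 1.0
            best[pair] = max(best.get(pair, 0.0), weight)
        for pair, weight in best.items():
            scores[pair] += weight
    return dict(scores)


sample = [
    ("D1", "A", 0), ("D1", "B", 0), ("D1", "C", 4),
    ("D2", "A", 2), ("D2", "B", 3),
]
plain = cocitation_scores(sample, weighted=False)
by_distance = cocitation_scores(sample, weighted=True)
```

In this toy data, A and B are quoted close together in both quoting documents, while C is quoted far from them, so the weighted variant separates the pairs more sharply than the plain count does.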
