An Annotation Platform for Any Kinds of Digital Contents

Katsuhiko KAJI
Graduate School of Information Science, Nagoya University
Katashi NAGAO
Center for Information Media Studies, Nagoya University

1 Introduction

Recently, several studies have addressed annotation for digital content applications. These studies suggest that content semantics are necessary for realising advanced applications. In addition, content structure and its human interpretation are often regarded as content semantics.

Although several annotation formats exist, many of them place strong restrictions on content type. Moreover, several formats are not well suited to describing content structure and human interpretation. This is why few applications are built on multiple kinds of content.

We have been developing an annotation platform named "Annphony" that deals with content structure and human interpretation for any kind of content. In this paper, we focus on the data format of annotations and their schema.

2 Annphony

Annphony is a platform for creating, retrieving, and using annotations regardless of the kind of content. Using the platform, users can easily develop annotation editors and collect annotations. Furthermore, the collected annotations can be used by applications.

To deal with content structure and human interpretation for any kind of content over the Internet, we have to address the following three points: 1. content segmentation, 2. flexible description of relationships between segments, 3. easy definition of annotations. Our platform addresses these points with the methods described below.

2.1 Content Segmentation

XPointer is a URI format for indicating an arbitrary segment of an XML document. It consists of the URI of the XML document and a path to particular nodes in the document. However, no comparable data format exists for indicating a segment of arbitrary content. We therefore propose ElementPointer as a data format for indicating a segment of any content. It is expressed as follows:

[Content URI]#epointer([Schema URI]([arg1,arg2...]))

ElementPointer is a URI format similar to XPointer. The Schema URI and the arguments that follow it identify a particular segment of the content. The schema is described in RDFS (RDF Schema), the schema language of RDF. The following is a concrete example of an ElementPointer.

http://domain1/picture.jpg#epointer(http://domain2/rect.rdfs#rect(10px,30px,10px,20px))

This example represents a rectangular area of an image. The RDFS schema defines a rectangle as consisting of an X-coordinate, a Y-coordinate, a width, and a height. By reading an ElementPointer, a machine can recognize which content and which segment is indicated.
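How a machine might read an ElementPointer can be sketched in Python as follows. This is only an illustrative sketch and not part of the Annphony implementation: the names `ElementPointer`, `parse_epointer`, and `format_epointer` are hypothetical, and the regular expression assumes that the content URI contains no `#` and that the arguments contain no parentheses.

```python
import re
from typing import NamedTuple, Tuple


class ElementPointer(NamedTuple):
    """Hypothetical in-memory form of an ElementPointer."""
    content_uri: str        # the content that is pointed into
    schema_uri: str         # segment type defined in an RDFS schema
    args: Tuple[str, ...]   # raw argument strings, e.g. ("10px", "30px")


# Layout: [Content URI]#epointer([Schema URI]([arg1,arg2...]))
# Assumption: no '#' in the content URI, no parentheses inside the arguments.
_PATTERN = re.compile(
    r"^(?P<content>[^#]+)#epointer\((?P<schema>.+)\((?P<args>[^()]*)\)\)$"
)


def parse_epointer(text: str) -> ElementPointer:
    """Split an ElementPointer string into content URI, schema URI, and arguments."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not an ElementPointer: {text!r}")
    raw_args = match.group("args")
    args = tuple(a.strip() for a in raw_args.split(",")) if raw_args.strip() else ()
    return ElementPointer(match.group("content"), match.group("schema"), args)


def format_epointer(pointer: ElementPointer) -> str:
    """Serialise an ElementPointer back into its URI form."""
    joined = ",".join(pointer.args)
    return f"{pointer.content_uri}#epointer({pointer.schema_uri}({joined}))"


example = (
    "http://domain1/picture.jpg"
    "#epointer(http://domain2/rect.rdfs#rect(10px,30px,10px,20px))"
)
pointer = parse_epointer(example)
```

Applied to the rectangle example above, the sketch separates the image URI, the `rect` schema URI, and the four arguments. Interpreting the arguments as X-coordinate, Y-coordinate, width, and height would still require reading the RDFS schema itself.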

2.2 Making Relations

In the Annphony platform, an annotation should express relationships between content segments and multiple human interpretations. The relationships between contents and annotations therefore form a complex graph structure. We adopt RDF as the base format of Annphony annotations because RDF can express relationships between resources as a directed graph. However, RDF has two problems in expressing content structure and multiple human interpretations. First, RDF cannot express multiple subjects in a single statement, which complicates the description of content structure such as grouping. Second, RDF deals with strict information, so ordinarily a single fact is described for a resource. In contrast, multiple human interpretations can exist for the same resource.

Figure 1: An Example of Annotation for Multiple Targets

In the Annphony platform, an annotation is described as shown in Figure 1. In the example, the subjects are represented by tags, using a tag that is defined as a collection tag in RDF. Annphony annotations have their own IDs, which enables us to associate multiple annotations with a particular content. Additionally, any annotation can itself be the target of another annotation. This makes it possible to attach additional information to a human interpretation.
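The relationships described in this section can be modelled with a small in-memory sketch. It is not the RDF serialisation used by Annphony. The `Annotation` and `AnnotationStore` classes, the `anno:` identifiers, and the property names are all hypothetical, chosen only to show one annotation with several targets, several interpretations of the same segment, and an annotation attached to another annotation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Annotation:
    annotation_id: str   # every annotation carries its own ID
    targets: List[str]   # ElementPointers or IDs of other annotations
    properties: Dict[str, str] = field(default_factory=dict)


class AnnotationStore:
    """Hypothetical store that keeps annotations indexed by ID."""

    def __init__(self) -> None:
        self._by_id: Dict[str, Annotation] = {}

    def add(self, annotation: Annotation) -> None:
        self._by_id[annotation.annotation_id] = annotation

    def annotations_on(self, target: str) -> List[Annotation]:
        """Return every annotation that lists `target` among its targets."""
        return [a for a in self._by_id.values() if target in a.targets]


# Two rectangular segments of the same image (illustrative values).
seg_a = "http://domain1/picture.jpg#epointer(http://domain2/rect.rdfs#rect(10px,30px,10px,20px))"
seg_b = "http://domain1/picture.jpg#epointer(http://domain2/rect.rdfs#rect(50px,30px,10px,20px))"

store = AnnotationStore()
# One annotation with multiple targets: a grouping structure.
store.add(Annotation("anno:1", [seg_a, seg_b], {"label": "group"}))
# Two different human interpretations of the same segment.
store.add(Annotation("anno:2", [seg_a], {"impression": "calm"}))
store.add(Annotation("anno:3", [seg_a], {"impression": "lonely"}))
# An annotation on an annotation: additional information about an interpretation.
store.add(Annotation("anno:4", ["anno:2"], {"comment": "refers to the colours"}))

ids_on_seg_a = [a.annotation_id for a in store.annotations_on(seg_a)]
```

Because each annotation has its own ID, the same lookup serves both content segments and annotations: `store.annotations_on("anno:2")` returns the comment attached to the first interpretation.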

2.3 Annotation Schema

To describe structural and interpretational annotations for many kinds of content, various types of annotation must be defined, and their definitions must be shared. The Annphony platform supports annotation schemas written in RDFS as annotation definitions. The root annotation defines basic properties such as creation date and annotator URI. A new definition can be described by extending an existing definition.

Figure 2: An Example of Annotation Schema

An annotation definition consists of an Annotation Class and a number of Annotation Properties. The Annotation Class expresses which annotation definition is extended. An Annotation Property consists of its domain Annotation Class and its data type. Figure 2 represents an impression annotation class. In this class, an impression property is available as a string type. Because the class extends the root annotation, the annotation properties defined in the root annotation are also available.
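The way an extended definition inherits the properties of the root annotation can be illustrated with the following sketch. It models the idea in plain Python rather than in RDFS. The `AnnotationClass` type, the example URIs, the property names `created` and `annotator`, and their data types are assumptions made for illustration; only the string-typed impression property comes from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class AnnotationClass:
    """Hypothetical model of an annotation definition."""
    uri: str
    parent: Optional["AnnotationClass"] = None   # the definition being extended
    properties: Dict[str, str] = field(default_factory=dict)  # name -> data type

    def all_properties(self) -> Dict[str, str]:
        """Collect the class's own properties plus those of every ancestor."""
        inherited = self.parent.all_properties() if self.parent else {}
        return {**inherited, **self.properties}


# Root annotation with basic properties (names and types are assumed).
root = AnnotationClass(
    uri="http://example.org/schema#Annotation",
    properties={"created": "date", "annotator": "uri"},
)

# Impression annotation extending the root annotation.
impression = AnnotationClass(
    uri="http://example.org/schema#Impression",
    parent=root,
    properties={"impression": "string"},
)

available = impression.all_properties()
```

Here `available` contains the impression property together with the two properties inherited from the root annotation, mirroring how a class that extends the root annotation can use the properties defined there.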

A new definition becomes available after it is registered with the Annphony platform. At registration time, information such as the definition date, creator, and application examples is annotated to the schema for retrieval and utilization. Any user can retrieve annotation definitions and define new annotations, which helps effective annotation sharing.

3 The Expected Effect

A unified annotation format enables us to deal with the structure and human interpretation of any kind of content. For example, we can associate linguistic annotations with telop information in video content.

Furthermore, we are considering mixed-media playlists. Ordinarily, a playlist consists of music only. If the elements of a playlist consisted of multiple kinds of content, we would be able to realise broad content recommendation. Consequently, the Annphony platform enables us to achieve advanced multi-content applications.

4 Future Work

In this paper, we described data formats for annotations and schemas that deal with content structure and human interpretation. Our future work is as follows. Annotation platforms are naturally distributed, so we have to consider a distributed environment for Annphony. Additionally, we plan to develop actual applications that deal with multiple kinds of content.