Exercise Sheet 5

Lecture: "Intelligent" Systems on the WWW

Summer semester 2002

To be completed online on 10 July.

Exercise 1 (10 pts; so please think for more than 3 seconds)

Find examples of structural, descriptive, and administrative metadata for the following picture:

 

Der kurzsichtige Liebhaber

 

Carl Spitzweg

 

Der kurzsichtige Liebhaber

 

 WV 1043

 

Early work

Oil on canvas, 46.0 x 34.0 cm

Museum Georg Schäfer, Schweinfurt

Roennefahrt lists this painting as "Die Angebetete".

 

 

 

Exercise 2 (10 pts):

The International Press Telecommunications Council (IPTC) has created a metadata standard of the same name, which is used in the professional processing of image information (e.g., in publishing houses). It is also supported by professional image-editing applications (for example, the text information in Adobe Photoshop is a subset of the IPTC information).

Assign the following metadata attributes to the categories descriptive, structural, and administrative.

 

Object name

Headline

Caption

Special instructions

Object description

Image rights

Source

Copyright notice

Keywords (set)

Caption title

Author

IPTC category

IPTC creation date

IPTC creation time

Release date

Release time

Date sent

Time sent

City

State/province

Country (code)

Country (text)

Originator code

Originating program

Program version

Job identifier

Service ID

Edit status

Urgency

Object cycle

Reference service

Reference date

Reference number

 

Exercise 3 (30 pts):

Several standards exist for describing news articles; they attempt to capture metadata together with the actual news content. Examples are NITF (http://www.nitf.org/) and NewsML (http://www.newsml.org/).

Using the enclosed NITF tutorial, record metadata descriptions for the following article.

 

 

 

 

Weather and Tide Updates for Norfolk

By Alan Karben
NITF Network News Online

The weather was great today in Norfolk, Virginia. Made me want to take out my boat, manufactured by the Acme Boat Company.

Tides in Norfolk are running normal today. This week's article highlights many of this week's fishing issues, and also presents a reference table of tide times.

The Tides are High

As can be seen from the table below, the shores of Oceanview again present the brightest spots for fishermen and sandcastle-builders alike.

 

           today        tide           tomorrow     next day     third day
beach      high  low    in     out     high  low    high  low    high  low
Sunset     30    14     09:23  18:51   28    11     31    12     33    9
Oceanview  31    15     09:25  18:56   26    11     31    11     31    9
Shellfish  29    15     09:25  18:53   26    9      29    11     30    11

Based on these tide tables, I believe you can see that this weekend stands to be an excellent one for small- or large-scale fishing exhibitions.

Photo: Volz

The tides, captured on film late yesterday.

There are many local nooks that fishing fans may want to keep a special eye on.

Happy fishing everybody!

Stewart Klometers contributed to this article.

 

 

 

 

NITF elements:

a

Anchor for hypertext links.

abstract

Story abstract

addressee

Person or organization to whom the postal item is being sent.

alt-code

An alternate symbol for the phrase.

bibliography

Free-form bibliographic data.

block

A group of related containers.

body

The content portion of the NITF document.

body.content

Actual body content.

body.end

Information at the end of an article body

body.head

Metadata intended to be displayed to the reader.

bq

Block quote.

br

Forced line break.

byline

Container for byline information.

byttl

Byline title. Often contains an organization.

caption

Text for the caption of a table.

care.of

Poste restante.

chron

Date and time.

city

City, town, village, etc.

classifier

Generic holder for metadata. Could be used by researchers and archivists to qualify documents.

col

Column.

colgroup

Column group.

copyrite

Container for copyright information.

copyrite.holder

Copyright holder.

copyrite.year

Copyright year

correction

Correction information.

country

Geographic area with a government.

credit

Names the source of the block quote.

custom-table

A holder for a namespaced XML fragment for custom-tagged data, or for an alternative set of non-parser-breaking content.

datasource

Source of the information grouped in a block element.

date.expire

Date/time at which the document has no validity.

date.issue

Date/time document was issued.

date.release

Date/time document is available to be released.

dateline

Container for dateline information.

dd

Definition data.

del-list

Delivery trail of delivery services.

delivery.office

Postal city or town.

delivery.point

Street, PO Box No.

denom

Fraction denominator.

distributor

Information distributor.

dl

Definition list.

doc-id

Registered identification for document.

doc-scope

Indicates an area where the document may be of interest.

doc.copyright

Copyright information for document header.

doc.rights

Rights information for use of the document.

docdata

Document metadata.

ds

IIM Record 2 dataset information.

dt

Definition term.

du-key

Dynamic Use Key, created daily. Has tree structure indicated by defined form.

ed-msg

Non-publishable editorial message from provider or editor of item.

em

Emphasis.

event

An event.

evloc

Event location.

fixture

Specification for named document, such as Heard on the Street or On Language.

fn

Footnote.

frac

Fraction.

frac-sep

Fraction separator.

from-src

Delivery service identifier.

function

Role played by a person.

head

Holds metadata about the document as a whole.

hedline

Container for main headline and subheadlines.

hl1

Headline 1 (main-headline).

hl2

Headline 2 (sub-headline)

hr

Horizontal rule.

identified-content

Holds content identifiers that can apply to document as a whole.

iim

IIM Record 2 Data Container.

key-list

List of keywords.

keyword

Keyword. Can also be a phrase.

lang

Language identifier.

li

List item.

location

Significant place mentioned in an article.

media

Generalized media object.

media-caption

Text describing media.

media-metadata

Generic metadata placeholder.

media-object

Inline media data.

media-producer

Byline of media producer.

media-reference

Reference to an external media object, OR to its following media-object.

meta

A construct for sending generic metadata.

money

Monetary item.

name.family

Family name.

name.given

Given name.

nitf

The root element for NITF.

nitf-col

A holder for a namespaced XML fragment for custom-tagged data.

nitf-colgroup

A collection of nitf-col elements.

nitf-table

A holder for a table, and content-filled metadata.

nitf-table-metadata

A holder for a namespaced XML fragment for custom-tagged data.

nitf-table-summary

Textual description of the table.

note

Document cautionary note.

num

Numeric data.

numer

Fraction numerator.

object.title

Title of inline object such as book, song, artwork, etc.

ol

Ordered list.

org

Organization.

p

Paragraph.

person

Human individual.

postaddr

Mailing address.

postcode

Postal code.

pre

Preformatted information.

pronounce

Pronunciation Information.

pubdata

Information about specific instance of an item's publication.

q

Quotation.

region

Geographic area.

revision-history

Information about the creative history of the document; also used as an audit trail.

rights

Information on rights holder.

rights.agent

Rights agent.

rights.enddate

Rights end date.

rights.geography

Area to which rights apply.

rights.limitations

Limitations (exclusive / nonexclusive) of rights.

rights.owner

Rights owner

rights.startdate

Rights start date.

rights.type

Type of rights claimed.

series

Series information.

state

State or province or region.

story.date

Date of story.

sub

Subscript.

sublocation

Named region within city or state.

sup

Superscript.

table

Table of data.

table-reference

A pointer to a table that is elsewhere in the document.

tagline

A byline at the end of a story.

tbody

Table body.

td

Table data cell.

tfoot

Table footer.

th

Table header cell.

thead

Table heading.

title

Document Title.

tobject

Subject code.

tobject.property

Subject code property.

tobject.subject

Assigns subject information to news material based on a Subject Code system.

tr

Table row.

ul

Unordered list.

urgency

News importance.

virtloc

Virtual Location.

 

NITF is XML

NITF is an XML-conforming vocabulary. This means that NITF uses the constructs standardized by XML to describe elements of content within a document, and the descriptive attributes of that content.

For example, if a publisher wants to use NITF to distinguish a company name from the surrounding text, the <org> element would be used:

Today, <org>Microsoft</org> announced the release of....

If the publisher wants to embed Microsoft's NASD stock symbol, the value and org-id attributes of the <org> element would be used:

Today, <org value="MSFT" org-id="NASD">Microsoft</org>
announced the release of....
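
A consumer of such markup can read the inline codes back out with any XML parser. The following is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The wrapping <p> element and the helper name extract_orgs are assumptions made for illustration; they are not part of the tutorial.

```python
import xml.etree.ElementTree as ET

# Fragment from the tutorial, wrapped in a <p> so that it is well-formed XML
# (the wrapper is an assumption for this sketch).
FRAGMENT = (
    '<p>Today, <org value="MSFT" org-id="NASD">Microsoft</org> '
    "announced the release of....</p>"
)


def extract_orgs(xml_text):
    """Return (name, value, org-id) for every <org> element in xml_text."""
    root = ET.fromstring(xml_text)
    return [
        (org.text, org.get("value"), org.get("org-id"))
        for org in root.iter("org")
    ]


print(extract_orgs(FRAGMENT))
```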

For more information on how XML works, visit our page listing XML resources.

This tutorial covers the most widely used sections of NITF. For details about each element, consult the NITF documentation page. Within this tutorial, elements and attributes are displayed in purple. Comments are included within the XML <!-- comment markers -->.


Basic Structure of NITF

NITF is divided into two sections, the <head> and the <body>.

<nitf>
  <head>
         <!--
         Metadata about the document as a whole
         goes here.
         -->
  </head>
 
  <body>
         <!--
         Contents for direct display to the user
         go here.
         -->
  </body>
</nitf>

In this respect, NITF is just like HTML, but that's where the resemblance ends. Web authors use HTML to describe the display of their pages. NITF, on the other hand, is designed to describe the substance of news articles.


NITF <body>

The NITF <body> element is itself divided into three sections:

<body>
  <body.head>
         <!--
         This section holds core news components,
         such as headline and byline, that are commonly
         displayed before the text of an article.
         -->
  </body.head>
 
  <body.content>
         <!--
         This section holds the article,
         generally consisting of paragraphs of text,
         but perhaps with embedded tables, lists,
         photos, and other items. These can also be
         referenced by specifying a location on the
         Internet or another computer.
         -->
  </body.content>
 
  <body.end>
         <!--
         This section holds core news components
         that are commonly displayed at the end of
         an article.
         -->
  </body.end>
</body>

Here is an example of a simple NITF article:

<nitf>
  <head>
  </head>
  <body>
         <body.head>
                 <hedline>
                         <hl1>This is the main headline</hl1>
                 </hedline>
 
                 <byline>
                         By Joseph Q. Reporter
                 </byline>
         </body.head>
 
         <body.content>
                 <p>This is the content of the first
                 paragraph of the article.</p>
 
                 <p>This is the content of the second
                 paragraph of the article.</p>
         </body.content>
  </body>
</nitf>
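
Because the headline, byline, and paragraphs each have their own element, a program can pick them out without guessing from layout. The sketch below shows one possible way to do this in Python with xml.etree.ElementTree; the helper names clean and summarize are hypothetical, and the embedded document repeats the simple article shown above.

```python
import xml.etree.ElementTree as ET

ARTICLE = """
<nitf>
  <head>
  </head>
  <body>
    <body.head>
      <hedline>
        <hl1>This is the main headline</hl1>
      </hedline>
      <byline>
        By Joseph Q. Reporter
      </byline>
    </body.head>
    <body.content>
      <p>This is the content of the first
      paragraph of the article.</p>
      <p>This is the content of the second
      paragraph of the article.</p>
    </body.content>
  </body>
</nitf>
"""


def clean(text):
    """Collapse line breaks and runs of whitespace into single spaces."""
    return " ".join((text or "").split())


def summarize(nitf_text):
    """Pull headline, byline, and paragraphs out of an NITF document."""
    root = ET.fromstring(nitf_text)
    return {
        "headline": clean(root.findtext("body/body.head/hedline/hl1")),
        "byline": clean(root.findtext("body/body.head/byline")),
        "paragraphs": [clean("".join(p.itertext())) for p in root.iter("p")],
    }


summary = summarize(ARTICLE)
print(summary["headline"])
print(summary["byline"])
```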


Containers within <body.content>

Within body.content, NITF allows for several other types of text container at the same level as the paragraph. These include:


Enriched Text

NITF contains many elements that can distinguish content appearing within headlines, bylines, tables, lists, and paragraphs. These "enriched-text" elements allow the publisher to index and highlight documents better, and add hyperlinks to richer sources of related and archival information.

Attributes of these enriched-text elements allow publishers to store -- inline with the text -- descriptive codes. These attributes can provide consistency and dependability among authors, writing styles, and languages.

Enriched-text elements provide facilities for marking up:

Here is a more enriched and expanded example of an NITF article:

<nitf>
  <head>
  </head>
  <body>
         <body.head>
                 <hedline>
                         <hl1>This is the main headline</hl1>
                         <hl2>This is a sub-headline</hl2>
                 </hedline>
                 <byline>
                         By <person value="JQR-412"
                         idsrc="newspub-corp">
                         Joseph Q. Reporter</person>
                 </byline>
         </body.head>
 
         <body.content>
                 <p>Today, <org value="MSFT"
                 org-id="NASD">Microsoft</org>
                 announced the release of....</p>
                 <p>That company is <em>so</em> big
                 that they....</p>
                 <media media-type="image">
                         <media-reference
                                mime-type="image/jpeg"
                                source="gates.jpg"
                                alternate-text="Bill
                                Gates at the podium."
                                >
                         </media-reference>
                         <media-caption>
                                Gates makes speech.
                         </media-caption>
                 </media>
         </body.content>
  </body>
</nitf>

More details on what is allowed in the NITF <body> element can be learned from our documentation and examples.
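
As an illustration of how enriched-text attributes support indexing, the following Python sketch collects the organization codes and media references from a body like the one in the example. It is only a sketch under stated assumptions: the function name index_body and the shape of the returned data are invented here, and the fragment is a shortened copy of the example.

```python
import xml.etree.ElementTree as ET

BODY = """
<body.content>
  <p>Today, <org value="MSFT" org-id="NASD">Microsoft</org>
  announced the release of....</p>
  <p>That company is <em>so</em> big that they....</p>
  <media media-type="image">
    <media-reference mime-type="image/jpeg" source="gates.jpg"
        alternate-text="Bill Gates at the podium."/>
    <media-caption>
      Gates makes speech.
    </media-caption>
  </media>
</body.content>
"""


def index_body(xml_text):
    """Collect organization codes and media references for indexing."""
    root = ET.fromstring(xml_text)
    # Map each inline code (here a stock symbol) to the name used in the text.
    orgs = {org.get("value"): org.text for org in root.iter("org")}
    media = [
        {
            "type": m.get("media-type"),
            "source": m.find("media-reference").get("source"),
            "caption": " ".join(m.findtext("media-caption", "").split()),
        }
        for m in root.iter("media")
    ]
    return orgs, media


orgs, media = index_body(BODY)
print(orgs)
print(media)
```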


NITF <head>

The <head> element contains metadata that describes the article as a whole. It is divided into five main sections:

<head>
  <title>
         <!--
         A short, plain-text title of the document.
         Often used for display in a listing of
         search-results.
         -->
  </title>
 
  <tobject>
         <!--
         Codes for the type of article, and the
         subjects covered by the article.
         -->
  </tobject>
 
  <docdata>
         <!--
         Contains metadata about this document in
         particular. Includes publication date, an
         urgency rating, the column name (if it's
         a regular feature), series name
         (if it's part of a series of articles),
         and information on copyright ownership and
         distribution rights.
         -->
  </docdata>
 
  <pubdata/>
         <!--
         Information on where and how this article
         was published.
         -->
 
  <revision-history>
         <!--
         Specifies who revised the document, and why.
         -->
  </revision-history>
</head>


NITF <title>

The <title> element holds a plain-text string of characters that sums up, usually in one line, what the article covers. Most often, the <title> element is just the headline of the article, with any line breaks or enriched-text elements stripped.

Many NITF processing systems display a list of <title> elements when users search an NITF archive. Hence, it is important that this element be included.

Here is an example:

<title>
  President's State of the Union Address Makes Big Splash
</title>
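
The derivation of a title from a headline described above can be sketched in a few lines of Python. The headline below is a hypothetical variant of the example with enriched-text elements and a line break added, and title_from_headline is an illustrative name, not an NITF facility.

```python
import xml.etree.ElementTree as ET

# Hypothetical headline containing enriched-text markup and a line break.
HEADLINE = """<hl1><person>President</person>'s State of the Union
Address Makes <em>Big</em> Splash</hl1>"""


def title_from_headline(hl1_xml):
    """Flatten an <hl1> element into a one-line plain-text title."""
    hl1 = ET.fromstring(hl1_xml)
    text = "".join(hl1.itertext())  # drops the tags, keeps their text
    return " ".join(text.split())   # collapses line breaks and extra spaces


print(title_from_headline(HEADLINE))
```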


NITF <tobject>

The <tobject> section (which stands for "topic object") contains elements that distinguish feature news stories from, say, news analyses or obituaries. It also holds elements that describe what subject the article is about, via a three-level Subject Codes taxonomy.

The IPTC has issued controlled Subject Code vocabularies that it strongly recommends the publisher use when including <tobject> elements. These vocabularies are available in several languages, and the IPTC is open to requests for additions to the list.

Here is an example:

<tobject>
  <tobject.property
         tobject.property.type="Wrapup"
         />
 
  <!--
  tobject.subject.refnum is required. The other attributes
  are optional: code is the three-letter code for Level 1;
  type, matter, and detail are the display names for
  Levels 1, 2, and 3.
  -->
  <tobject.subject
         tobject.subject.refnum="4008006"
         tobject.subject.code="FIN"
         tobject.subject.type="Economy, Business &amp; Finance"
         tobject.subject.matter="Macro Economics"
         tobject.subject.detail="Foreign Exchange Markets"
         />
</tobject>
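
To show how the three levels relate to the reference number, here is a hedged Python sketch. It assumes that a subject reference number has eight digits laid out as two digits for Level 1, three for Level 2, and three for Level 3, and that the seven-digit value in the example has lost a leading zero; both points are assumptions that should be checked against the IPTC Subject Code lists. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Attribute values copied from the tutorial; "&" is written as "&amp;"
# so that the fragment is well-formed XML.
SUBJECT = """
<tobject.subject
    tobject.subject.refnum="4008006"
    tobject.subject.type="Economy, Business &amp; Finance"
    tobject.subject.matter="Macro Economics"
    tobject.subject.detail="Foreign Exchange Markets"/>
"""


def split_refnum(refnum):
    """Split a subject reference number into its three level codes.

    Assumes an 8-digit layout (2 + 3 + 3 digits) and restores a
    dropped leading zero.
    """
    digits = refnum.zfill(8)
    if len(digits) != 8 or not digits.isdigit():
        raise ValueError(f"unexpected reference number: {refnum!r}")
    return digits[:2], digits[2:5], digits[5:]


def describe_subject(xml_text):
    """Pair each level code with its optional display name."""
    elem = ET.fromstring(xml_text)
    codes = split_refnum(elem.get("tobject.subject.refnum"))
    names = [
        elem.get(f"tobject.subject.{level}")
        for level in ("type", "matter", "detail")
    ]
    return list(zip(codes, names))


levels = describe_subject(SUBJECT)
for code, name in levels:
    print(code, name)
```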


NITF <docdata>

The <docdata> element (which stands for "document data") holds elements that identify, date, and register the article. It breaks out as follows:

<docdata>
  <correction
         info="fixes typo in paragraph 3"
         id-string="4000.21.a"
         />
         <!--
         Indicates this article was issued as a
         correction, and which document it corrected.
         -->
 
  <doc-id
         id-string="4000.21.b"
         />
         <!--
         ID of this document.
         -->
 
  <del-list>
         <!--
         Indicates who has delivered / distributed
         this article.
         -->
         <from-src src-name="ScreamingMedia"/>
  </del-list>
 
  <urgency
         ed-urg="1"
         />
         <!--
         Highest.
         -->
 
  <fixture
         fix-id="Investigative Business Feature"
         />
         <!--
         The name given to a regular feature.
         -->
 
  <date.issue
         norm="2000-05-21T09:05:00-05:00"
         />
         <!--
         Date the item was published.
         -->
 
  <du-key
         generation="3"
         part="2"
         version="12"
         key="microsoft-trial"
         />
         <!--
         Provides a mechanism for grouping related
         stories.
         -->
 
  <doc.copyright
         year="2000"
         holder="The Toronto Globe &amp; Mail"
         />
         <!--
         Who owns the copyright to this document.
         -->
 
  <key-list>
         <!--
         Holds a set of keywords relevant to the article.
         -->
         <keyword key="software"/>
         <keyword key="antitrust"/>
  </key-list>
</docdata>
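
A receiving system would typically turn these attributes into typed values. The following Python sketch reads a shortened copy of the example; it assumes that the norm value is a complete ISO 8601 timestamp with a two-digit hour offset, and the function name read_docdata and the keys of the returned dictionary are invented for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

DOCDATA = """
<docdata>
  <doc-id id-string="4000.21.b"/>
  <urgency ed-urg="1"/>
  <date.issue norm="2000-05-21T09:05:00-05:00"/>
  <key-list>
    <keyword key="software"/>
    <keyword key="antitrust"/>
  </key-list>
</docdata>
"""


def read_docdata(xml_text):
    """Convert selected <docdata> children into Python values."""
    root = ET.fromstring(xml_text)
    return {
        "doc_id": root.find("doc-id").get("id-string"),
        "urgency": int(root.find("urgency").get("ed-urg")),
        # Assumes an ISO 8601 string that datetime.fromisoformat accepts.
        "issued": datetime.fromisoformat(root.find("date.issue").get("norm")),
        "keywords": [k.get("key") for k in root.iter("keyword")],
    }


record = read_docdata(DOCDATA)
print(record["doc_id"], record["urgency"], record["keywords"])
print(record["issued"].isoformat())
```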


NITF <pubdata>

The <pubdata> element (which stands for "publication data") has attributes that record where, how, and when the article was published. It breaks out as follows:

<pubdata
  type="print"
  name="The Toronto Globe and Mail"
  date.publication="20000521"
  />
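
As a small illustration, the compact date in date.publication can be converted into a date object. This Python sketch assumes the value always has the form YYYYMMDD, as in the example; read_pubdata is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

PUBDATA = (
    '<pubdata type="print" name="The Toronto Globe and Mail" '
    'date.publication="20000521"/>'
)


def read_pubdata(xml_text):
    """Read the publication type, name, and date from a <pubdata> element."""
    elem = ET.fromstring(xml_text)
    return {
        "type": elem.get("type"),
        "name": elem.get("name"),
        # Assumes the YYYYMMDD form used in the tutorial example.
        "date": datetime.strptime(elem.get("date.publication"), "%Y%m%d").date(),
    }


pub = read_pubdata(PUBDATA)
print(pub["name"], pub["date"].isoformat())
```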


NITF <revision-history>

The <revision-history> element provides a creative history of the document. It breaks out as follows:

<revision-history
  name="Pat Q. Editor"
  function="editor"
  norm="20000521T084300Z"
  comment="fixed rampant typos"
  />
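
Going the other way, an editing system could append such an entry each time a document is changed. The Python sketch below builds the element with xml.etree.ElementTree; the helper add_revision is hypothetical, and writing the timestamp in compact UTC form is an assumption about the expected norm format.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def add_revision(head, name, function, when, comment):
    """Append a <revision-history> element to an NITF <head> element."""
    return ET.SubElement(
        head,
        "revision-history",
        {
            "name": name,
            "function": function,
            # Compact ISO 8601 form in UTC; the exact format is an assumption.
            "norm": when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ"),
            "comment": comment,
        },
    )


head = ET.Element("head")
add_revision(
    head,
    "Pat Q. Editor",
    "editor",
    datetime(2000, 5, 21, 8, 43, tzinfo=timezone.utc),
    "fixed rampant typos",
)
print(ET.tostring(head, encoding="unicode"))
```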

 

Exercise 4 (10 pts)

 

(a)

Discuss the advantages and disadvantages of XML-based metadata formats (such as NITF) compared with RDF-based metadata vocabularies (such as RSS).

 

(b)

Discuss to what extent the challenges for metadata formats (slide 45 of the metadata slide set) are addressed by the two types of language. As a reminder:

-          Consistency

-          Proper reference (English term kept in the original sheet because it is hard to translate)

-          Avoidance of redundancy

-          Relating metadata items to one another

-          Maintainability

-          Ease of use

-          Efficiency