Exercise Sheet 5
Lecture: „Intelligente“ Systeme im WWW ("Intelligent" Systems on the WWW)
Summer Semester 2002
To be completed online on 10 July.
Exercise 1 (10 pts; so please think for more than 3 seconds)
Find examples of structural, descriptive, and administrative metadata for the following image:

Carl Spitzweg
Der kurzsichtige Liebhaber
WV 1043
Early work
Oil on canvas, 46.0 x 34.0 cm
Museum Georg Schäfer, Schweinfurt
Roennefahrt lists this painting as "Die Angebetete".
Exercise 2 (10 pts):
The International Press Telecommunications Council (IPTC) has created a metadata standard of the same name, which is used in the professional processing of image information (e.g. in publishing houses). It is also supported by applications for professional image editing (for example, the text information in Adobe Photoshop is a subset of the IPTC information).
Assign the following metadata attributes to the categories descriptive, structural, and administrative.
Object name
Headline
Caption
Special instructions
Object description
Image rights
Source
Copyright notice
Keywords (set)
Title of the caption
Author
IPTC category
IPTC creation date
IPTC creation time
Release date
Release time
Date sent
Time sent
Location
State/province
Country (code)
Country (text)
Originator code
Originating program
Program version
Job identifier
Service ID
Edit status
Urgency
Object cycle
Reference service
Reference date
Reference number
Exercise 3 (30 pts):
Several standards exist for describing news articles; they attempt to capture metadata together with the actual news content. Examples are NITF (http://www.nitf.org/) and NewsML (http://www.newsml.org/).
Using the attached NITF tutorial, create metadata descriptions for the following article.
By Alan Karben
NITF Network News Online
The weather was great today in Norfolk, Virginia. Made me want to take out my boat, manufactured by the Acme Boat Company.
Tides in Norfolk are running normal today. This week's article highlights many of this week's fishing issues, and also presents a reference table of tide times.
As can be seen from the table below, the shores of Oceanview again present the brightest spots for fishermen and sandcastle-builders alike.
          | today      | tide          | tomorrow   | next day   | third day
beach     | high | low | in    | out   | high | low | high | low | high | low
Sunset    | 30   | 14  | 09:23 | 18:51 | 28   | 11  | 31   | 12  | 33   | 9
Oceanview | 31   | 15  | 09:25 | 18:56 | 26   | 11  | 31   | 11  | 31   | 9
Shellfish | 29   | 15  | 09:25 | 18:53 | 26   | 9   | 29   | 11  | 30   | 11
Based on these tide tables, I believe you can see that this weekend stands to be an excellent one for small- or large-scale fishing exhibitions.
Photo: Volz
The tides, captured on film late yesterday.
There are many local nooks that fishing fans may want to keep a special eye on.
Happy fishing everybody!
Stewart Klometers contributed to this article.
NITF elements:
- Anchor for hypertext links.
- Story abstract.
- Person or organization to whom the postal item is being sent.
- An alternate symbol for the phrase.
- Free-form bibliographic data.
- A group of related containers.
- The content portion of the NITF document.
- Actual body content.
- Information at the end of an article body.
- Metadata intended to be displayed to the reader.
- Block quote.
- Forced line break.
- Container for byline information.
- Byline title. Often contains an organization.
- Text for the caption of a table.
- Poste restante.
- Date and time.
- City, town, village, etc.
- Generic holder for metadata. Could be used by researchers and archivists to qualify documents.
- Column.
- Column group.
- Container for copyright information.
- Copyright holder.
- Copyright year.
- Correction information.
- Geographic area with a government.
- Names the source of the block quote.
- A holder for a namespaced XML fragment for custom-tagged data, or for an alternative set of non-parser-breaking content.
- Source of the information grouped in a block element.
- Date/time at which the document has no validity.
- Date/time document was issued.
- Date/time document is available to be released.
- Container for dateline information.
- Definition data.
- Delivery trail of delivery services.
- Postal city or town.
- Street, PO Box No.
- Fraction denominator.
- Information distributor.
- Definition list.
- Registered identification for document.
- Indicates an area where the document may be of interest.
- Copyright information for document header.
- Rights information for use of the document.
- Document metadata.
- IIM Record 2 dataset information.
- Definition term.
- Dynamic Use Key, created daily. Has tree structure indicated by defined form.
- Non-publishable editorial message from provider or editor of item.
- Emphasis.
- An event.
- Event location.
- Specification for named document, such as Heard on the Street or On Language.
- Footnote.
- Fraction.
- Fraction separator.
- Delivery service identifier.
- Role played by a person.
- Holds metadata about the document as a whole.
- Container for main headline and subheadlines.
- Headline 1 (main-headline).
- Headline 2 (sub-headline).
- Horizontal rule.
- Holds content identifiers that can apply to document as a whole.
- IIM Record 2 Data Container.
- List of keywords.
- Keyword. Can also be a phrase.
- Language identifier.
- List item.
- Significant place mentioned in an article.
- Generalized media object.
- Text describing media.
- Generic metadata placeholder.
- Inline media data.
- Byline of media producer.
- Reference to an external media object, OR to its following media-object.
- A construct for sending generic metadata.
- Monetary item.
- Family name.
- Given name.
- The root element for NITF.
- A holder for a namespaced XML fragment for custom-tagged data.
- A collection of nitf-col elements.
- A holder for a table, and content-filled metadata.
- A holder for a namespaced XML fragment for custom-tagged data.
- Textual description of the table.
- Document cautionary note.
- Numeric data.
- Fraction numerator.
- Title of inline object such as book, song, artwork, etc.
- Ordered list.
- Organization.
- Paragraph.
- Human individual.
- Mailing address.
- Postal code.
- Preformatted information.
- Pronunciation information.
- Information about specific instance of an item's publication.
- Quotation.
- Geographic area.
- Information about the creative history of the document; also used as an audit trail.
- Information on rights holder.
- Rights agent.
- Rights end date.
- Area to which rights apply.
- Limitations (exclusive / nonexclusive) of rights.
- Rights owner.
- Rights start date.
- Type of rights claimed.
- Series information.
- State or province or region.
- Date of story.
- Subscript.
- Named region within city or state.
- Superscript.
- Table of data.
- A pointer to a table that is elsewhere in the document.
- A byline at the end of a story.
- Table body.
- Table data cell.
- Table footer.
- Table header cell.
- Table heading.
- Document title.
- Subject code.
- Subject code property.
- Assigns subject information to news material based on a Subject Code system.
- Table row.
- Unordered list.
- News importance.
- Virtual location.
NITF is an XML-conforming vocabulary. This means that NITF uses the
constructs standardized by XML to describe elements of content within a document,
and the descriptive attributes of that content.
For example, if a publisher wants to use NITF to distinguish a company name from the surrounding text, the <org> element would be used:
Today, <org>Microsoft</org> announced the release of....
If the publisher wants to embed Microsoft's NASD stock symbol, the value and org-id attributes of the <org> element would be used:
Today, <org value="MSFT" org-id="NASD">Microsoft</org> announced the release of....
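To make the role of these attributes concrete, here is a minimal sketch of how a consumer might read them back out. It uses Python's standard xml.etree.ElementTree module, which is an assumption of this example and not something the tutorial prescribes; the fragment is wrapped in a <p> element so that it parses as a well-formed document.

```python
import xml.etree.ElementTree as ET

# The tutorial's fragment, wrapped in <p> so that it has a single root element.
fragment = (
    '<p>Today, <org value="MSFT" org-id="NASD">Microsoft</org> '
    "announced the release of....</p>"
)

paragraph = ET.fromstring(fragment)
org = paragraph.find("org")

company = org.text            # display text of the enriched element
symbol = org.get("value")     # descriptive code stored inline with the text
exchange = org.get("org-id")  # identifies the scheme the code comes from

# The readable sentence is still recoverable by ignoring the markup.
plain_text = "".join(paragraph.itertext())

print(company, symbol, exchange)
print(plain_text)
```

The markup adds machine-readable codes without changing what a reader sees.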
For more information on how XML works, visit our page listing XML resources.
This tutorial covers the most widely used sections of NITF. For details
about each element, consult the NITF documentation page.
Within this tutorial, comments are included within the XML <!-- comment markers -->.
NITF is divided into two sections, the <head> and the <body>.
<nitf>
  <head>
    <!-- Metadata about the document as a whole goes here. -->
  </head>
  <body>
    <!-- Contents for direct display to the user go here. -->
  </body>
</nitf>
In this respect, NITF is just like HTML, but that's where the resemblance ends. Web authors use HTML to describe the display of their pages. NITF, on the other hand, is designed to describe the substance of news articles.
The NITF <body> element is itself divided into three sections:
<body>
  <body.head>
    <!-- This section holds core news components, such as headline and byline, that are commonly displayed before the text of an article. -->
  </body.head>
  <body.content>
    <!-- This section holds the article, generally consisting of paragraphs of text, but perhaps with embedded tables, lists, photos, and other items. These can also be referenced by specifying a location on the Internet or another computer. -->
  </body.content>
  <body.end>
    <!-- This section holds core news components that are commonly displayed at the end of an article. -->
  </body.end>
</body>
Here is an example of a simple NITF article:
<nitf>
  <head>
  </head>
  <body>
    <body.head>
      <hedline>
        <hl1>This is the main headline</hl1>
      </hedline>
      <byline>
        By Joseph Q. Reporter
      </byline>
    </body.head>
    <body.content>
      <p>This is the content of the first paragraph of the article.</p>
      <p>This is the content of the second paragraph of the article.</p>
    </body.content>
  </body>
</nitf>
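As a rough illustration of how this structure can be consumed, the sketch below parses the simple article and pulls out the headline, byline, and paragraphs. The use of Python's xml.etree.ElementTree is an assumption of this example; any XML parser would do.

```python
import xml.etree.ElementTree as ET

# The simple article from the tutorial, embedded as a string.
nitf_source = """
<nitf>
  <head></head>
  <body>
    <body.head>
      <hedline><hl1>This is the main headline</hl1></hedline>
      <byline>By Joseph Q. Reporter</byline>
    </body.head>
    <body.content>
      <p>This is the content of the first paragraph of the article.</p>
      <p>This is the content of the second paragraph of the article.</p>
    </body.content>
  </body>
</nitf>
""".strip()

root = ET.fromstring(nitf_source)

# Core news components live in body.head.
headline = root.findtext(".//hl1")
byline = root.findtext(".//byline").strip()

# The article text lives in body.content, one <p> per paragraph.
content = next(root.iter("body.content"))
paragraphs = [p.text for p in content.findall("p")]

print(headline)
print(byline)
for number, text in enumerate(paragraphs, start=1):
    print(number, text)
```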
Within body.content, NITF allows for several
other types of text container at the same level as the paragraph. These
include:
NITF contains many elements that can distinguish content appearing
within headlines, bylines, tables, lists, and paragraphs. These
"enriched-text" elements allow the publisher to index and highlight
documents better, and add hyperlinks to richer sources of related and archival
information.
Attributes of these enriched-text elements allow publishers to store --
inline with the text -- descriptive codes. These attributes can provide
consistency and dependability among authors, writing styles, and languages.
Enriched-text elements provide facilities for marking up:
Here is a more enriched and expanded example of an NITF article:
<nitf>
  <head>
  </head>
  <body>
    <body.head>
      <hedline>
        <hl1>This is the main headline</hl1>
        <hl2>This is a sub-headline</hl2>
      </hedline>
      <byline>
        By <person value="JQR-412" idsrc="newspub-corp">Joseph Q. Reporter</person>
      </byline>
    </body.head>
    <body.content>
      <p>Today, <org value="MSFT" org-id="NASD">Microsoft</org> announced the release of....</p>
      <p>That company is <em>so</em> big that they....</p>
      <media media-type="image">
        <media-reference mime-type="image/jpeg" source="gates.jpg" alternate-text="Bill Gates at the podium."></media-reference>
        <media-caption>Gates makes speech.</media-caption>
      </media>
    </body.content>
  </body>
</nitf>
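The media markup in the enriched example can be read programmatically. The following sketch, which assumes Python's xml.etree.ElementTree and uses only the media portion of the example above, collects the reference attributes and the caption of each <media> element. The dictionary keys are names chosen for this illustration, not NITF terms.

```python
import xml.etree.ElementTree as ET

# Only the media part of the enriched example, with whitespace between attributes.
content_source = """
<body.content>
  <media media-type="image">
    <media-reference mime-type="image/jpeg" source="gates.jpg"
                     alternate-text="Bill Gates at the podium."></media-reference>
    <media-caption>Gates makes speech.</media-caption>
  </media>
</body.content>
""".strip()

content = ET.fromstring(content_source)

media_items = []
for media in content.iter("media"):
    reference = media.find("media-reference")
    media_items.append({
        "type": media.get("media-type"),
        "mime": reference.get("mime-type"),
        "source": reference.get("source"),
        "alt": reference.get("alternate-text"),
        "caption": media.findtext("media-caption", default="").strip(),
    })

for item in media_items:
    print(item)
```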
More details on what is allowed in the NITF <body>
element can be learned from our documentation
and examples.
The <head> element contains metadata that describes the article as a whole. It is divided into five main sections:
<head>
  <title>
    <!-- A short, plain-text title of the document. Often used for display in a listing of search-results. -->
  </title>
  <tobject>
    <!-- Codes for the type of article, and the subjects covered by the article. -->
  </tobject>
  <docdata>
    <!-- Contains metadata about this document in particular. Includes publication date, an urgency rating, the column name (if it's a regular feature), series name (if it's part of a series of articles), and information on copyright ownership and distribution rights. -->
  </docdata>
  <pubdata/>
  <!-- Information on where and how this article was published. -->
  <revision-history>
    <!-- Specifies who revised the document, and why. -->
  </revision-history>
</head>
The <title> element holds a plain-text
string of characters that sums up, usually in one line, what the article
covers. Most often, the <title> element is
just the headline of the article, with any line breaks or enriched-text
elements stripped.
Many NITF processing systems display a list of <title>
elements when users search an NITF archive. Hence, it is important that this
element be included.
Here is an example:
<title>President's State of the Union Address Makes Big Splash</title>
The <tobject> section (which stands for
"topic object") contains elements that distinguish feature news
stories from, say, news analyses or obituaries. It also holds elements that
describe what subject the article is about, via a three-level Subject Codes
taxonomy.
The IPTC has issued controlled Subject Code vocabularies
that it strongly recommends the publisher use when including <tobject> elements. These vocabularies are available
in several languages, and the IPTC is open to requests for additions to the
list.
Here is an example:
<tobject>
  <tobject.property tobject.property.type="Wrapup"/>
  <!-- tobject.subject.refnum is required.
       tobject.subject.code is an optional three-letter code for Level 1.
       tobject.subject.type is an optional display-name for Level 1.
       tobject.subject.matter is an optional display-name for Level 2.
       tobject.subject.detail is an optional display-name for Level 3. -->
  <tobject.subject
    tobject.subject.refnum="4008006"
    tobject.subject.code="FIN"
    tobject.subject.type="Economy, Business &amp; Finance"
    tobject.subject.matter="Macro Economics"
    tobject.subject.detail="Foreign Exchange Markets"
  />
</tobject>
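The three-level taxonomy can be illustrated by splitting a reference number into its levels. The sketch below is based on an assumption that the tutorial does not state: that IPTC subject reference numbers are eight digits long (two digits for Level 1, three for Level 2, three for Level 3), and that the tutorial's "4008006" has simply lost its leading zero. The function name split_subject_refnum is hypothetical.

```python
def split_subject_refnum(refnum):
    """Return the Level 1, Level 2, and Level 3 codes contained in a refnum.

    Assumption: refnums are 8 digits laid out as 2 + 3 + 3; shorter inputs
    are treated as having lost leading zeros.
    """
    padded = refnum.zfill(8)
    if len(padded) != 8 or not padded.isdigit():
        raise ValueError(f"not a subject reference number: {refnum!r}")
    subject = padded[:2]    # Level 1, e.g. the subject
    matter = padded[2:5]    # Level 2, e.g. the subject matter
    return (
        subject + "000000",        # code of the Level 1 entry
        subject + matter + "000",  # code of the Level 2 entry
        padded,                    # code of the Level 3 entry
    )


level1, level2, level3 = split_subject_refnum("4008006")
print(level1, level2, level3)
```

Under that assumption, the three codes would correspond to the display names given in the tobject.subject.type, tobject.subject.matter, and tobject.subject.detail attributes.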
The <docdata> element (which stands for
"document data") holds elements that identify, date, and register the
article. It breaks out as follows:
<docdata>
  <correction info="fixes typo in paragraph 3" id-string="4000.21.a"/>
  <!-- Indicates this article was issued as a correction, and which document it corrected. -->
  <doc-id id-string="4000.21.b"/>
  <!-- ID of this document. -->
  <del-list>
    <!-- Indicates who has delivered / distributed this article. -->
    <from-src src-name="ScreamingMedia"/>
  </del-list>
  <urgency ed-urg="1"/>
  <!-- Highest. -->
  <fixture fix-id="Investigative Business Feature"/>
  <!-- The name given to a regular feature. -->
  <date.issue norm="2000-05-21T09:05:00-05:00"/>
  <!-- Date the item was published. -->
  <du-key generation="3" part="2" version="12" key="microsoft-trial"/>
  <!-- Provides a mechanism for grouping related stories. -->
  <doc.copyright year="2000" holder="The Toronto Globe &amp; Mail"/>
  <!-- Who owns the copyright to this document. -->
  <key-list>
    <!-- Holds a set of keywords relevant to the article. -->
    <keyword key="software"/>
    <keyword key="antitrust"/>
  </key-list>
</docdata>
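A reader of this metadata might extract the fields as in the following sketch. It assumes Python's standard library as the tooling and uses a shortened copy of the <docdata> example in which the ampersand is escaped as &amp; and the UTC offset is written as -05:00, so that the XML parser and datetime.fromisoformat both accept the input.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

# Shortened copy of the tutorial's docdata example.
docdata_source = """
<docdata>
  <doc-id id-string="4000.21.b"/>
  <urgency ed-urg="1"/>
  <date.issue norm="2000-05-21T09:05:00-05:00"/>
  <doc.copyright year="2000" holder="The Toronto Globe &amp; Mail"/>
  <key-list>
    <keyword key="software"/>
    <keyword key="antitrust"/>
  </key-list>
</docdata>
""".strip()

docdata = ET.fromstring(docdata_source)

# Index the direct children by tag name; each appears once in this example.
by_tag = {child.tag: child for child in docdata}

doc_id = by_tag["doc-id"].get("id-string")
urgency = int(by_tag["urgency"].get("ed-urg"))
issued = datetime.fromisoformat(by_tag["date.issue"].get("norm"))
holder = by_tag["doc.copyright"].get("holder")
keywords = [keyword.get("key") for keyword in by_tag["key-list"]]

print(doc_id, urgency, issued.isoformat(), holder, keywords)
```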
The <pubdata> element (which stands for "publication data") has attributes that identify, date, and register the article. It breaks out as follows:
<pubdata type="print" name="The Toronto Globe and Mail" date.publication="20000521" />
The <revision-history> element provides a creative history of the document. It breaks out as follows:
<revision-history name="Pat Q. Editor" function="editor" norm="20000521T08:43:00Z" comment="fixed rampant typos" />
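The date attributes of <pubdata> and <revision-history> use compact formats that can be turned into date objects. The sketch below assumes Python's standard library and assumes that the values always follow the two patterns seen in the examples (YYYYMMDD and YYYYMMDDTHH:MM:SSZ). The elements are written with whitespace between attributes, as XML requires.

```python
from datetime import date, datetime, timezone
import xml.etree.ElementTree as ET

pubdata = ET.fromstring(
    '<pubdata type="print" name="The Toronto Globe and Mail" '
    'date.publication="20000521" />'
)
revision = ET.fromstring(
    '<revision-history name="Pat Q. Editor" function="editor" '
    'norm="20000521T08:43:00Z" comment="fixed rampant typos" />'
)

# date.publication carries a plain calendar date.
published_on = datetime.strptime(pubdata.get("date.publication"), "%Y%m%d").date()

# norm carries a timestamp; the trailing Z marks it as UTC.
revised_at = datetime.strptime(revision.get("norm"), "%Y%m%dT%H:%M:%SZ")
revised_at = revised_at.replace(tzinfo=timezone.utc)

print(pubdata.get("name"), published_on.isoformat())
print(revision.get("name"), revision.get("function"), revised_at.isoformat())
```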
Exercise 4 (10 pts)
(a)
Discuss the advantages and disadvantages of XML-based metadata formats (such as NITF) compared with RDF-based metadata vocabularies (such as RSS).
(b)
Discuss to what extent the challenges for metadata formats (slide 45 of the metadata slide set) are addressed by these two types of language. As a reminder:
- Consistency
- Proper reference
- Avoidance of redundancy
- Relating metadata items to one another
- Maintainability
- Ease of use
- Efficiency