Originated by the PrestoSpace project, supported by the Cultural Heritage Programme of the European Commission Information Society Technologies Programme.


MAD

MAD: the Architecture

The MAD Platform adopts a modular and extensible architecture. It consists of two different components:

  1. The Documentation Platform
  2. The Publication Platform

The MAD Platform receives digitised media (audio and video files) as input. The Documentation Platform processes these data and returns materials such as key frames, camera motions and metadata. The Publication Platform then indexes these materials and publishes them on a web server.

To simplify communication between the Documentation Platform and the Publication Platform, an alternative MAD Platform, called the Turnkey System, has been developed. The Turnkey System is a lightweight system tailored specifically to small archives. It comprises both the Documentation and the Publication Platform, with customized features.

The Documentation Platform

The Documentation Platform is made up of a core component, called the Core Platform, and a set of pluggable software processors named GAMPs, where GAMP stands for Generic Activity MAD Processor. A GAMP is a software component that extracts metadata from the digitised material.

The Core Platform offers the following main services:

  • Workflow management service, responsible for starting processes in the right order and for resolving dependencies between GAMPs;
  • Essence and Metadata Storage (EMS) system, which stores the audiovisual material sources and the associated metadata;
  • Concurrent Versioning System, which tracks every change that the GAMPs make to the metadata and is built on a standard CVS engine;
  • Delivery of enriched metadata and related material created by the GAMPs within the Documentation Platform.
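
The dependency resolution performed by the workflow management service can be illustrated with a topological sort. The following is a minimal sketch, not the Core Platform's actual implementation: the GAMP names and their dependencies are hypothetical, and Python's standard `graphlib` module stands in for whatever scheduler the platform really uses.

```python
from graphlib import TopologicalSorter

# Hypothetical GAMPs, each mapped to the GAMPs whose output it needs.
# These names are illustrative; they are not taken from the MAD documentation.
GAMP_DEPENDENCIES = {
    "shot_detection": set(),
    "keyframe_extraction": {"shot_detection"},
    "speech_to_text": set(),
    "semantic_analysis": {"speech_to_text"},
}


def execution_order(dependencies):
    """Return one valid order in which the GAMPs can be started."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(GAMP_DEPENDENCIES)
```

Any order returned this way starts a GAMP only after every GAMP it depends on; a cycle in the dependencies raises `graphlib.CycleError`.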

The main features of the Documentation Platform are shown in Figure 1.

Fig. 1: the Documentation Platform

All the services offered by the Core Platform are available through SOAP-based web service interfaces. Through these interfaces, every GAMP polls the Core Platform for a job, submits the metadata it produces, and notifies the Workflow Manager when its work is complete. Because the interfaces are web services, a GAMP can be implemented in any programming language that supports SOAP and web service protocols.
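
The poll, submit and notify cycle described above can be sketched as follows. This is only an illustration under stated assumptions: `CorePlatformStub` is an in-memory stand-in for the Core Platform, and every method name on it is hypothetical. A real GAMP would issue SOAP calls against the platform's actual web service interface instead.

```python
from collections import deque


class CorePlatformStub:
    """In-memory stand-in for the Core Platform's web service interface.

    All method names are hypothetical; a real GAMP would issue SOAP calls.
    """

    def __init__(self, jobs):
        self._pending = deque(jobs)
        self.metadata = {}    # job id -> metadata submitted by a GAMP
        self.completed = []   # job ids reported as finished

    def request_job(self, gamp_name):
        """Return the next queued job, or None if nothing is queued."""
        return self._pending.popleft() if self._pending else None

    def submit_metadata(self, job_id, metadata):
        self.metadata[job_id] = metadata

    def notify_completion(self, job_id):
        self.completed.append(job_id)


def run_gamp(platform, gamp_name, process):
    """Poll for jobs until none are left; return how many jobs were handled.

    A long-running GAMP would sleep and poll again instead of returning.
    """
    handled = 0
    while (job := platform.request_job(gamp_name)) is not None:
        platform.submit_metadata(job["id"], process(job))
        platform.notify_completion(job["id"])
        handled += 1
    return handled


platform = CorePlatformStub([{"id": 1, "media": "a.mpg"}, {"id": 2, "media": "b.mpg"}])
handled = run_gamp(platform, "example_gamp", lambda job: {"source": job["media"]})
```

Because the GAMP only ever calls the platform, never the reverse, it can run on any machine that has a network link to the Core Platform.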

The architecture of the Documentation Platform has the following characteristics:

  • it is modular, since GAMPs can interact with the Core Platform even when they differ entirely in implementation details and functionality;
  • it is extensible, in the sense that it is easy and natural to insert a new GAMP;
  • it is platform independent, since the Core Platform itself is implemented in Java and is therefore portable to several operating systems;
  • it supports multi-tier distribution, in the sense that every GAMP can be installed on a different physical system, provided that a network link to the Core Platform is available.

The Publication Platform

The Publication Platform is the component of the MAD Platform that provides retrieval and browsing functionality. In detail, it handles documents in the MAD metadata format, makes them available in a web representation, and gives access to the source material exported from the Core Platform.

The Publication Platform comprises three different main subcomponents:

  • a web application, namely the user interface;
  • a relational DBMS that stores information related to the available programmes;
  • a text search and indexing engine (Lucene – KIM), comprising a semantic engine for processing natural language queries.

The search interface of the Publication Platform offers several search approaches. The user can choose to search for a programme or a news item, and the results can be filtered by programme title, broadcast date, authors, topics, and so on.
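
The filtering behaviour can be illustrated with a small in-memory example. This is a sketch only: the `Programme` fields and the `search` function are hypothetical, the sample records are invented, and the real platform delegates text search to its indexing engine rather than scanning a list.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Programme:
    title: str
    broadcast_date: date
    authors: list = field(default_factory=list)
    topics: list = field(default_factory=list)


def search(programmes, title=None, broadcast_date=None, author=None, topic=None):
    """Return the programmes that match every filter that is not None."""
    results = []
    for p in programmes:
        if title is not None and title.lower() not in p.title.lower():
            continue
        if broadcast_date is not None and p.broadcast_date != broadcast_date:
            continue
        if author is not None and author not in p.authors:
            continue
        if topic is not None and topic not in p.topics:
            continue
        results.append(p)
    return results


# Invented sample data, for illustration only.
catalogue = [
    Programme("Evening News", date(2006, 5, 1), ["A. Rossi"], ["politics"]),
    Programme("Morning News", date(2006, 5, 2), ["B. Bianchi"], ["sport"]),
    Programme("Nature Hour", date(2006, 5, 2), ["A. Rossi"], ["wildlife"]),
]
news = search(catalogue, title="news")
by_date_and_author = search(catalogue, broadcast_date=date(2006, 5, 2), author="A. Rossi")
```

Filters combine with a logical AND, so adding a filter can only narrow the result list.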

The user interface presents a video preview, currently making use of Windows Media Player. This is the only feature written specifically for Internet Explorer.

A schema of the Publication Platform is shown in Figure 2.

Fig. 2: the Publication Platform

The Web Interface

The Publication Platform provides a web interface for searching and retrieving information produced by the Documentation Platform.

The entry point for queries is the form shown in Fig. 3.

Fig. 3: the Search Interface

The user submits a keyword and searches among programmes or news items, restricting the search by contribution, title, publication date, publication service or topic.

The results of the query are then shown in a list (Fig. 4). In order to browse a retrieved document, the user can select it from this list (Fig. 5).

Fig. 4: the list containing the results of a query

Fig. 5: the web page for browsing the selected programmes/news

The left part of the page contains the video section (at the top) and the tree structure showing the segmentation of the programme into news items. This segmentation is also shown as a timeline (Fig. 6).

Each segment describes a single highlight and is related to the shots presented at the bottom right of the web page.

Fig. 6: the timeline section

The remaining (main) part of the page provides several tabs showing:

  • Info (titles, publications, contributions and identifiers)
  • Transcription: the full text transcribed from speech (the user can run a textual search on it)
  • Semantic analysis
  • Content analysis (stripes and camera motion, if extracted during the documentation)
  • Related sources (correlated news from external web sites)

There is also an RSS button in the upper right corner for exporting the programme in the RSS (Really Simple Syndication) format, so that it can be read with the aid of a feed reader (Fig. 7).

Fig. 7: RSS export feature
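
As an illustration of what such an export might contain, the sketch below builds a minimal RSS 2.0 document with Python's standard library. The mapping of a programme to the RSS channel and of its news items to RSS items is an assumption, as are the function name and the placeholder URL; the source does not describe the structure of the exported feed.

```python
import xml.etree.ElementTree as ET


def programme_to_rss(title, link, description, news_items):
    """Build an RSS 2.0 document: the programme becomes the channel,
    and each (title, description) pair becomes one <item>."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for item_title, item_description in news_items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "description").text = item_description
    return ET.tostring(rss, encoding="unicode")


feed = programme_to_rss(
    "Evening News",
    "http://example.org/programmes/1",  # placeholder URL
    "Example programme",
    [("Headline one", "First news item"), ("Headline two", "Second news item")],
)
```

The resulting string can be saved to a file or served over HTTP and opened in any feed reader that accepts RSS 2.0.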

The Turnkey System

Large archives usually have their own Publication Platform, developed to satisfy their specific needs. With respect to the MAD architecture, this kind of archive is typically interested in the Documentation Platform only.

Small archives, however, are usually interested in the functionality of both the Documentation and the Publication Platform. As mentioned, a Turnkey System has been implemented to provide an "ad hoc" solution for this kind of archive.

The Turnkey System is a lightweight system tailored specifically to small archives. It comprises both the Documentation and the Publication Platform, with customized GAMPs and features. It is a fully automatic system for content enrichment and web publishing and searching. Large archives would typically use only subparts of the Turnkey System, because they already have their own content management systems and web search and publication features.

The Turnkey System is represented in Figure 8.

Fig. 8: the Turnkey System

The metadata

Research within the PrestoSpace project has determined that the information required for typical audiovisual archive exploitation processes can be partitioned into the following fundamental classes:

  • Identification information, such as titles, credits, and programme publication information;
  • Editorial parts of information, i.e. information about the relevant editorial sub-items of a programme, such as news items;
  • Content-related information, such as text of speech, descriptions, and visual low level descriptive features;
  • Enrichment information, coming from external sources related to the programme content.

The adopted data model represents the above classes. The data format that carries all of its entities and relations is a single XML-based document format resulting from the combination of MPEG-7 and P_META. In more detail, MPEG-7 is used for its powerful temporal segmentation tools and its comprehensive set of standard audiovisual descriptors, whereas P_META is used to capture the information structures for the identification, classification and publication-related features of a programme.
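
The four information classes can be pictured as one record per programme. The sketch below is purely illustrative: the class and field names are hypothetical and do not reproduce the MPEG-7 or P_META schemas. It shows only how identification data, temporally segmented editorial parts, content-related data and enrichment links might sit together in a single document.

```python
from dataclasses import dataclass, field


@dataclass
class Identification:
    title: str
    credits: list = field(default_factory=list)
    publication: str = ""


@dataclass
class EditorialPart:
    label: str
    start_seconds: float
    end_seconds: float


@dataclass
class ContentInfo:
    transcript: str = ""
    description: str = ""


@dataclass
class ProgrammeRecord:
    identification: Identification
    editorial_parts: list = field(default_factory=list)
    content: ContentInfo = field(default_factory=ContentInfo)
    enrichment: list = field(default_factory=list)  # links to external sources

    def part_at(self, seconds):
        """Return the editorial part covering the given time, or None."""
        for part in self.editorial_parts:
            if part.start_seconds <= seconds < part.end_seconds:
                return part
        return None


record = ProgrammeRecord(
    Identification("Evening News"),
    [EditorialPart("Item 1", 0.0, 90.0), EditorialPart("Item 2", 90.0, 200.0)],
)
```

Looking up `record.part_at(120.0)` returns the second news item, which is the kind of time-based lookup that temporal segmentation makes possible.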

PrestoSpace Orchestrator (PSO)

The process of digital preservation of audio-visual collections requires the execution of the following tasks:

  • Preservation (migration)
  • Restoration
  • Documentation and Access

performed by:

  • The Preservation Unit (PRE)
  • The Restoration Unit (RES)
  • The Documentation Unit (MAD)

respectively. Of course, there are overlaps between these actions. The user requirements phase has demonstrated extreme variation in the requirements, scales, and urgencies of the three functions above. In most cases the Preservation effort is clearly driven by urgency, as the media are often deteriorating quickly and playback machines and expertise are becoming harder and harder to maintain. Requirements for urgent Restoration and Documentation vary from one archive to another.

In order to automate the communication and the interaction among the mentioned components, namely PRE, RES, and MAD, a PrestoSpace Orchestrator (PSO) has been implemented. PSO is a software layer interfacing:

  • archives, requiring preservation and restoration of audio-visual data;
  • the PrestoSpace units PRE, RES, and MAD.

A schema of PSO is presented in Figure 9 below:

Fig. 9: the PrestoSpace Orchestrator (PSO)

Intuitively, an archive requiring preservation and restoration of its own audio-visual material asks the PSO to process an order, which comprises a set of batches of material. The PSO then provides a workflow management service, responsible for starting processes in the right order and for resolving dependencies between the PrestoSpace units PRE, RES, and MAD.
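
A minimal sketch of this orchestration is given below. It rests on one loud assumption: that the units run in the sequence PRE, then RES, then MAD. The source names the three units but does not fix their order, and the `Batch` class and `process_order` function are hypothetical.

```python
from dataclasses import dataclass, field

# Assumed processing sequence; the source lists the units but not a fixed order.
UNIT_SEQUENCE = ["PRE", "RES", "MAD"]


@dataclass
class Batch:
    name: str
    requested_units: set
    history: list = field(default_factory=list)


def process_order(batches):
    """Send every batch through the units it requests, following UNIT_SEQUENCE."""
    for batch in batches:
        for unit in UNIT_SEQUENCE:
            if unit in batch.requested_units:
                batch.history.append(unit)  # a real PSO would call the unit here
    return {batch.name: batch.history for batch in batches}


order = [
    Batch("tapes-1", {"PRE", "MAD"}),
    Batch("tapes-2", {"MAD", "RES", "PRE"}),
]
result = process_order(order)
```

Each batch visits only the units it requested, and always in the assumed sequence regardless of how the request was written.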

In order to adopt an extensible and portable solution, the PSO makes use of web services interfaces for the communication with both archives and the three mentioned PrestoSpace Units.

The Web Interface

The web interface provided by the PSO is shown in Fig. 10:

Fig. 10: the PSO web interface

To manage the operations involved, the user can add or modify profiles and services (the locations where materials can be found). Acting as orchestrator, the user can then create new orders to submit to the MAD Platform, requesting the processing provided by the different units (PRE, RES, MAD).

The activity diagram of the process is shown in Fig. 11:

Fig. 11: interaction between the archive(s) and the PSO

First, the archive(s) must be registered. A registered user can then insert a new order by performing the required operations.

At this point, the user creates the EDOB(s) by submitting the legacy information and specifying the related materials supplied by the archive(s). With the EDOB(s), the user creates the batch(es) for each order.

Once the batches have been submitted and the related materials are available, the order can be processed.

In the Management area (see Fig. 10), the user can see the status of the submitted orders/batches; moreover, one can see the processing status of a single EDOB.
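
The preconditions described in this section can be summarised in a short sketch. The class names, fields and status value below are hypothetical and are not taken from the PSO; the sketch encodes only the rule that an order can be processed once its archive is registered, its batches exist, and the materials for every EDOB are available.

```python
from dataclasses import dataclass, field


@dataclass
class Edob:
    legacy_info: str
    material_available: bool = False
    status: str = "submitted"  # hypothetical status value


@dataclass
class Batch:
    edobs: list = field(default_factory=list)

    def ready(self):
        """A batch is ready when the materials of all its EDOBs have arrived."""
        return all(e.material_available for e in self.edobs)


@dataclass
class Order:
    archive: str
    batches: list = field(default_factory=list)

    def can_be_processed(self, registered_archives):
        """True once the archive is registered, the order has at least one
        batch, and every batch is ready."""
        return (
            self.archive in registered_archives
            and bool(self.batches)
            and all(b.ready() for b in self.batches)
        )


registered = {"archive-a"}
edob = Edob("example legacy record")
order = Order("archive-a", [Batch([edob])])
before = order.can_be_processed(registered)  # materials not yet delivered
edob.material_available = True
after = order.can_be_processed(registered)   # materials now available
```

Keeping a status on each EDOB as well as on the order mirrors the Management area, where both levels can be inspected.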


Original documentation for MAD

D15.1 Analysis of current audiovisual documentation models (Report) http://www.prestospace.org/project/public.en.html

D15.3 State of the art of content analysis tools for video, audio, speech (Report) Survey of the existing technologies and analysis of their applicability to audiovisual materials. http://www.prestospace.org/project/public.en.html

D15.2 PrestoSpace documentation framework (Report) Documentation structures, processes, user interface, tools integration framework

D15.4 Content analysis tools (Report + Software) Results of experimentation, operational guidelines, integration of tools in the framework

D15.5 Analysis of cross-linguistic IE tools for Metadata Discovery (Report). Survey of the viable techniques for IE in a cross-linguistic framework. Experimentation on real user data and analysis of the results (Performance Assessment). A Proposal for an effective IE architecture as a metadata discovery tool.

D15.6 Semantic interpretation tools (Report + Software) Results of experimentation, operational guidelines, integration of tools in the framework

D16.3 Cross language retrieval and access tools Survey of the existing technologies and analysis of their applicability to archive retrieval. Definition of a test bed and evaluation of the selected technologies.

D16.4 Delivery models (Report + Software) Analysis of B2B transaction models. Definition of a model for the management of the transactions between the Factory and its customers (open to CRM-systems) including the supported file formats and transcoding functionalities.

D16.1 Content retrieval and browsing for the general public Specification of retrieval and browsing interfaces for public access. This will include an evaluation of the efficiency of browsing tools (key frame based storyboard, low quality video, types of indexing provided, etc.) as well as of the usability of non-conventional query methods (e.g., image based search, free text search on transcripts, category tree traversal).

D16.2 Conceptual search Survey of the existing technologies in the field of automatic data models mapping and ontologies and analysis of their applicability to audiovisual archives, with special reference to the results of D15.5. Definition of a test bed and evaluation of the selected technologies.

D18.1 Documentation platform for the MAD Factory (Software and Manuals)

This implementation deliverable will provide the software infrastructure for the management of the documentation process of the digitized content, including automatic extraction of metadata and manual annotation and validation. The result of the overall preservation process, that is digitized content and associated metadata, will then be packaged and delivered back to the archive. The same package will also be used as input format for the Publication platform.

D18.2 Publication platform for the Results of Digitization and Documentation (Software and Manuals) This implementation deliverable will provide the software infrastructure for the management of the digitized contents and the documentation metadata, including storage and retrieval facilities. The system will be oriented to offer turnkey services to small archives.

D18.3 Turnkey System for Delivering to small Archives (Software and Manuals) This implementation deliverable will provide the software infrastructure for the management of the whole MAD System, made up of the Documentation and the Publication Platform, customized to the needs of small archives.

Page last modified on March 30, 2007, at 06:53 PM