
Semistructured and Structured Data in the Web: Going Back and Forth

Author: Paolo Atzeni
Giansalvatore Mecca
Paolo Merialdo
Publication date: 01/01/1997

Archivio della Ricerca - Università di Roma 3

Design and development of data-intensive web sites: The araneus approach

Author: Giansalvatore Mecca
Paolo Atzeni
Paolo Merialdo
Publication date: 01/01/2003

Archivio della Ricerca - Università di Roma 3

To Weave the Web

Author: Giansalvatore Mecca
Paolo Atzeni
Paolo Merialdo
Publication date: 01/01/1997

Archivio della Ricerca - Università di Roma 3

Araneus in the Era of XML

Author: Paolo Atzeni
Giansalvatore Mecca
Paolo Merialdo
Publication date: 01/01/1999

Archivio della Ricerca - Università di Roma 3

Persistence on different databases via reflective IoT Middleware

Author: Tommaso Di Noia
Eugenio Di Sciascio
Marina Mongiello
Francesco Nocera
Publication date: 01/01/2016

Politecnico di Bari - Catalogo di prodotti della Ricerca

Efficient Queries over Web Views

Author: Alberto Mendelzon
Giansalvatore Mecca
Paolo Merialdo
Publication date: 01/01/2002

Large Web sites are becoming repositories of structured information that can benefit from being viewed and queried as relational databases. However, querying these views efficiently requires new techniques. Data usually resides at a remote site and is organized as a set of related HTML documents, with network access being a primary cost factor in query evaluation. This cost can be reduced by exploiting the redundancy often found in site design. We use a simple data model, a subset of the Araneus data model, to describe the structure of a Web site. We augment the model with link and inclusion constraints that capture the redundancies in the site. We map relational views of a site to a navigational algebra and show how to use the constraints to rewrite algebraic expressions, reducing the number of network accesses. We show that similar techniques can be used to maintain materialized views over sets of HTML pages.

Archivio della Ricerca - Università della Basilicata

Archivio della Ricerca - Università di Roma 3

Data-Intensive Web Sites: Design and Maintenance

Author: Giansalvatore Mecca
Paolo Atzeni
Paolo Merialdo
Publication date: 01/01/2001

A methodology for designing and maintaining data-intensive Web sites is introduced. Leveraging ideas well established in the database field, the approach relies heavily on the use of models for the description of Web sites. The design process is composed of two intertwined activities: database design and hypertext design. Each of these is further divided into a conceptual phase and a logical phase, based on specific data models. The methodology strongly supports site maintenance: the various models provide a concise description of the site structure, and they make it possible to reason about the overall organization of pages in the site and possibly to restructure it.

Archivio della Ricerca - Università della Basilicata

Archivio della Ricerca - Università di Roma 3

Managing Web-based data - Database models and transformations

Author: Giansalvatore Mecca
Paolo Atzeni
Paolo Merialdo
Publication date: 01/01/2002

Database research, traditionally aimed at data management methods and tools in various frameworks, now requires a broader focus. Building on recent successes in business applications, researchers in database technology need to widen their spectrum of interest to confront new data management opportunities, particularly in the context of the Internet. Indeed, in the Asilomar Report on Database Research, experts from industry and academia called for researchers to "make it easy for everyone to store, organize, access, and analyze the majority of human information online" within the next 10 years.

Archivio della Ricerca - Università della Basilicata

Archivio della Ricerca - Università di Roma 3

AND GIANSALVATORE MECCA

Author: Università di Roma Tre
Università della Basilicata
Valter Crescenzi
Publication date: 01/04/2008

Information extraction from websites is nowadays a relevant problem, usually addressed by software modules called wrappers. A key requirement is that the wrapper generation process should be automated to the largest extent, in order to allow for large-scale extraction tasks even in the presence of changes in the underlying sites. So far, however, only semi-automatic proposals have appeared in the literature. We present a novel approach to information extraction from websites, which reconciles recent proposals for supervised wrapper induction with the more traditional field of grammar inference. Grammar inference provides a promising theoretical framework for the study of unsupervised (that is, fully automatic) wrapper generation algorithms. However, due to some unrealistic assumptions on the input, these algorithms are not practically applicable to Web information extraction tasks. The main contributions of the article are the definition of a class of regular languages, called the prefix mark-up languages, that abstract the structures usually found in HTML pages, and the definition of a polynomial-time unsupervised learning algorithm for this class. The article shows that, unlike other known classes, prefix mark-up languages and the associated algorithm can be practically used for information extraction purposes. A system based on the techniques described in the article has been implemented in a working prototype. We present some experimental results on known Websites, and discuss opportunities and limitations of the proposed approach.

Categories and Subject Descriptors: F.4.3 [Mathematical Logic and Formal Languages]: Formal Languages - Classes defined by grammars or automata; H.2.4 [Database Management]: Systems - Relational database

CiteSeerX

An Automatic Data Grabber for Large Web Sites

Author: Giansalvatore Mecca
Paolo Missier
Valter Crescenzi
Paolo Merialdo
Publication date: 01/01/2004

This chapter investigates a system to automatically grab data from data-intensive Websites. The system first infers a model that describes the Website as a collection of classes. Each class represents a set of structurally homogeneous pages, and it is associated with a small set of representative pages. Based on the model, a library of wrappers, one per class, is then inferred with the help of an external wrapper generator. The model, together with the library of wrappers, can thus be used to navigate the site and extract the data. The inference process is performed incrementally. The system starts from a given entry point that becomes the first member of the first class in the model. It then refines the model by exploring its boundaries to gather new pages. At each iteration, the system selects a link collection from the model outbound, and iteratively fetches a page by following one of the links in the collection. In order to reduce the number of pages actually visited, after each download the system makes a guess on the class of the remaining pages. If, looking at the pages already downloaded, there is sufficient evidence that the guess is right, the remaining pages of the collection are assigned to classes without actually fetching them. The process iterates until all the link collections are typed with a known class.

Crossref

University of Birmingham Research Portal

Archivio della Ricerca - Università di Roma 3
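
The incremental inference described in the abstract of "An Automatic Data Grabber for Large Web Sites" can be illustrated with a small sketch. This is not the authors' system: the function `infer_site_model`, the `evidence` threshold, the toy `SITE` dictionary, and the use of a ready-made "structural signature" as the class label are all assumptions made for illustration. The sketch only shows the idea of fetching a few pages from a link collection and, once they agree, typing the rest of the collection without downloading it.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical page representation: (structural signature, outgoing link collections).
Page = Tuple[str, List[List[str]]]


def infer_site_model(
    entry: str, fetch: Callable[[str], Page], evidence: int = 2
) -> Tuple[Dict[str, str], int]:
    """Type every reachable link collection; return (url -> class, pages fetched)."""
    classes: Dict[str, str] = {}            # url -> class label
    fetched = 0
    pending: List[List[str]] = [[entry]]    # link collections still to be typed

    while pending:
        collection = pending.pop()
        seen: List[str] = []                # classes observed so far in this collection
        for i, url in enumerate(collection):
            if url in classes:              # already typed via another collection
                seen.append(classes[url])
            else:
                signature, link_collections = fetch(url)
                fetched += 1
                classes[url] = signature
                seen.append(signature)
                pending.extend(link_collections)
            # Guess: the remaining pages share the class observed so far.
            # Accept the guess once `evidence` pages agree (assumed criterion).
            if len(seen) >= evidence and len(set(seen)) == 1:
                for rest in collection[i + 1:]:
                    classes.setdefault(rest, seen[0])
                break
    return classes, fetched


# Toy site used only for illustration (assumption, not data from the chapter).
SITE: Dict[str, Page] = {
    "/index": ("index", [["/team/1", "/team/2", "/team/3", "/team/4"]]),
    "/team/1": ("team", [["/player/1", "/player/2", "/player/3"]]),
    "/team/2": ("team", [["/player/4", "/player/5", "/player/6"]]),
    "/team/3": ("team", []),
    "/team/4": ("team", []),
}
SITE.update({f"/player/{n}": ("player", [["/index"]]) for n in range(1, 7)})

if __name__ == "__main__":
    model, downloads = infer_site_model("/index", SITE.__getitem__)
    print(f"typed {len(model)} pages with {downloads} downloads")
```

In this sketch, pages typed by guess are never downloaded, so their own links are not explored; a real system would rely on the inferred wrappers and model to navigate them later.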