
arxiv: 0705.1457 · v1 · submitted 2007-05-10 · 💻 cs.DB

Web data modeling for integration in data warehouses

keywords: data, model, context, documents, heterogeneous, phase, source, warehousing

In a data warehousing process, the data preparation phase is crucial. Mastering this phase allows substantial gains in time and performance when performing a multidimensional analysis or running data mining algorithms. Furthermore, a data warehouse can require external data. The web is a prevalent data source in this context, but the data published on this medium are very heterogeneous. In this paper, we propose a UML conceptual model for a complex object representing a superclass of any useful data source (databases, plain texts, HTML and XML documents, images, sounds, video clips...). The translation into a logical model is achieved with XML, which helps integrate all these diverse, heterogeneous data into a unified format, and whose schema definition provides first-rate metadata in our data warehousing context. Moreover, we benefit from XML's flexibility and extensibility and from the richness of the semi-structured data model, while remaining able to map XML documents into a database later if more structuring is needed.
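
To make the idea of a unified XML format more concrete, here is a minimal sketch in Python. It is not the paper's model or schema: the element and attribute names (`complex_objects`, `complex_object`, `content`, `source`, `kind`, `encoding`) and the helper functions `wrap_source` and `build_document` are hypothetical, chosen only to illustrate how textual and binary sources could be wrapped in one XML document.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_source(name, kind, payload, source_url=None):
    """Wrap one data source in a hypothetical <complex_object> element.

    `kind` is a free-form label such as "text", "html" or "image".
    """
    obj = ET.Element("complex_object", {"name": name, "kind": kind})
    if source_url is not None:
        # Record where the data came from (assumed metadata field).
        ET.SubElement(obj, "source").text = source_url
    content = ET.SubElement(obj, "content")
    if isinstance(payload, bytes):
        # Binary data (image, sound, video) is base64-encoded so that it
        # can be stored as XML text.
        content.set("encoding", "base64")
        content.text = base64.b64encode(payload).decode("ascii")
    else:
        content.set("encoding", "text")
        content.text = payload
    return obj


def build_document(objects):
    """Gather wrapped sources under one root and serialize to a string."""
    root = ET.Element("complex_objects")
    root.extend(objects)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    doc = build_document([
        wrap_source("note", "text", "Quarterly sales summary"),
        wrap_source("logo", "image", b"\x89PNG", source_url="http://example.org/logo.png"),
    ])
    print(doc)
```

Because every source ends up under the same element structure, a single schema could describe all of them, and the resulting documents could later be shredded into relational tables if more structure is needed.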
