Enriching Data Lakes with Knowledge Graphs

Chessa Alessandro; Fenu Gianni; Reforgiato Recupero Diego; Secchi Luca
2022-01-01

Abstract

Data lakes are repositories of data stored in their natural/raw format. A data lake may include structured data from relational databases, semi-structured data (e.g., JSON, CSV), unstructured data (e.g., text), or binary data (e.g., images, audio, video). It is usually built on top of cost-efficient infrastructures such as Hadoop, Amazon S3, MongoDB, and Elasticsearch. Several organisations rely on big data lakes for crucial tasks such as reporting, visualisation, advanced analytics, machine learning, and business intelligence. A major limitation of this solution is that, without descriptive metadata and a mechanism to maintain it, such data tend to be noisy, which makes their management and analysis complex and time-consuming. Therefore, a semantic layer based on a formal ontology is needed to describe the data, together with an efficient mechanism to represent them as a knowledge graph. In this paper, we present a methodology to add a semantic layer to a data lake and thus obtain a knowledge graph that can support structured queries and advanced data exploration. We describe a practical implementation of this methodology applied to a data lake consisting of text data describing an online marketplace for lodging and tourism activities. We report statistics about the data lake and the resulting knowledge graph.
2022
English
TEXT2KG 2022 & MK 2022. First International Workshop on Knowledge Graph Generation From Text and First International Workshop on Modular Knowledge. Proceedings of the 1st International Workshop on Knowledge Graph Generation From Text and the 1st International Workshop on Modular Knowledge co-located with the 19th Extended Semantic Web Conference (ESWC 2022)
3184
123
131
9
1st International Workshop on Knowledge Graph Generation From Text and the 1st International Workshop on Modular Knowledge
Anonymous experts
30 May 2022
Hersonissos, Greece
scientific
4 Contribution in Conference Proceedings (Proceeding)::4.1 Contribution in Conference Proceedings
Chessa, Alessandro; Fenu, Gianni; Motta, Enrico; Osborne, Francesco; Reforgiato Recupero, Diego Angelo Gaetano; Salatino, Angelo; Secchi, Luca
273
7
open
info:eu-repo/semantics/conferencePaper
Files in This Item:
Enriching Data Lakes with Knowledge Graphs - TEXT2KG_Short_1.pdf
Open access
Type: publisher's version
Size: 2.77 MB
Format: Adobe PDF
