ABOUT THE PROJECT
enRichMyData delivers an open software toolbox – the enRichMyData toolbox – comprising practical, robust and scalable components to support organizations in enriching their data with reference data they may have limited knowledge of, as well as supporting data providers in making their data reusable and available in data enrichment processes.
The toolbox lowers the technological entry barriers by supporting the definition of highly scalable and replicable data enrichment pipelines through a set of tools and infrastructure services covering the capabilities needed during the lifecycle of enrichment pipelines. It makes the data enrichment process accessible to a broader set of stakeholders by reducing the required expertise and improving tool support.
DISCOVERY OF POTENTIALLY VALUABLE DATA FOR DATA ENRICHMENT
Improve data discovery and profiling, featuring search over data, ontologies, and semantic data profiles, to identify potentially valuable data for data enrichment.
WRAPPING DATA SOURCES IN DIFFERENT FORMATS
Improve wrapping of data sources in different formats so they can be securely accessed as virtual semantic graphs and used more easily for data enrichment.
SIMPLIFIED CLEANING, LINKING AND EXTENSION OF DATA
Simplify cleaning, linking (to reference resources), and extension of structured and semi-structured data, featuring approaches that enable users to specify such operations visually.
SIMPLIFIED ANNOTATION AND CLASSIFICATION OF DATA
Simplify annotation and classification of textual data, featuring entity and concept extraction, feature extraction (via embeddings), and classification with predefined and custom classifiers.
SUPPORT THE MANAGEMENT OF DATA ENRICHMENT PIPELINES
Support the management of data enrichment pipelines, including the creation and operation of data linking and extension services, a framework for deployment and execution of pipelines at a large scale, and reuse and extension of existing pipelines to deliver a hub of data and services for data enrichment.
SUPPORT DATA STREAMING IN DATA ENRICHMENT PIPELINES
Support data streaming in data enrichment pipelines, featuring support for setting up appropriate endpoints and ensuring high throughput during pipeline execution.
ENERGY CONSUMPTION REDUCTION FOR DATA ENRICHMENT PIPELINES
Monitor and reduce energy consumption for executing data enrichment pipelines using models to estimate and track their carbon footprint.
THE CONSORTIUM
The consortium consists of 13 partners from 11 countries. Led by a research institute, it has three strong university partners specialised in Big Data, distributed computing, and high-productivity languages. Additionally, one research institute and one international organisation are involved. enRichMyData also gathers three SMEs and five large companies that keep the project focused on achieving high business impact.