Blog

ResourcR: How to support data linking and entity reconciliation algorithms

Data linking and entity reconciliation are key tasks for enriching an input dataset with data from another source while resolving semantic mismatches. In previous posts, we introduced the tools that implement the LinkR components of the enRichMyData toolkit, which support these tasks. In the post, we provided more details about how we approach these tasks in …

ScalR: horizontal scalability of data enrichment pipelines using software containers

Executing cleaning, transformation, and linking at large scale requires infrastructural components that allow for scalability. Since scalability is the ability of a system to sustain increasing workloads by making use of additional resources, implementing a system with this characteristic is an essential step in a big data pipeline to avoid common performance …

StreamR: Empowering Real-time Insights from Complex Streaming Data

In today’s fast-paced, data-driven world, the ability to extract valuable insights from streaming data in real time is more crucial than ever. That’s where StreamR, a powerful component within the enRichMyData toolbox, comes into play. Designed to tackle the challenges of streaming data analysis, StreamR revolutionizes the way organizations uncover real-time insights and drive informed decision-making. …

LinkR: Enriching data by linking values from different sources

When users need to enrich their dataset with an external data source, the task of linking values from both sources becomes a critical hurdle to overcome. It’s the key to unlocking the external data and seamlessly blending it with the original dataset. In addition, Knowledge Graphs (KGs) provide a powerful abstraction to support AI applications …

Simplify labeling and categorization of entire documents with ClassifiR

ClassifiR simplifies the task of labeling and categorizing entire documents based on predefined taxonomies, industry classifications, or customized label sets. It works seamlessly with StructR, which identifies text segment properties, providing a comprehensive data analysis solution. With a user-friendly graphical interface, ClassifiR facilitates the creation and exploration of custom ontologies through clustering, labeling, and querying. …

StructR: Structuring Data from Text for Enhanced Insights

Extracting and structuring data from unstructured text sources has always been a challenging task in the field of data analytics. To tackle this challenge, we introduce StructR, a powerful component within the enRichMyData toolbox that specializes in extracting structured data from textual content. StructR offers a range of advanced techniques, including entity recognition and linking, …

Towards Honest AI

In the good old days of machine learning and data mining, in the era of nearest neighbours, decision trees, linear regression and naive Bayes, the limitations of these models were clear. They worked surprisingly well in many cases, especially if the underlying data was rich enough. But are they comparable to the human brain? Read …

DiscoverR tools help users find and understand data that they can use in their data enrichment processes

DiscoverR tools are the components of the enRichMyData toolbox that help users find and understand data that they can use in their data enrichment processes. Since knowledge graphs (KGs) play a crucial role in data enrichment, either as target data sources of interest or as bridges to reach additional sources, the first DiscoverR tool, ABSTAT, …

Data Augmentation Does Not Necessarily Beat a Smart Algorithm

In his recent work, Krisztian Buza challenged the aforementioned “widely acknowledged truth” in the context of data augmentation. His observations show that rich training data may be much more valuable than augmented (i.e., artificially generated) data, and, most importantly, the advantage of a sophisticated algorithm relative to a simple algorithm may not be easily …
