Data assessment

What is it?

Odoma systematically assesses data quality in terms of content and structure, formats, management, and documentation.

We are often asked to perform comprehensive OCR/HTR quality assessments or to advise on best practices and standards in digitization workflows.
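For illustration, OCR/HTR quality is commonly quantified by the character error rate (CER): the edit distance between the automatic transcription and a human-verified ground truth, normalized by the length of the ground truth. The minimal sketch below uses our own function names, not any specific library's API:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution (free on match)
        prev = curr
    return prev[-1]

def cer(transcription: str, ground_truth: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    return levenshtein(transcription, ground_truth) / max(len(ground_truth), 1)

print(cer("0doma", "Odoma"))  # one substitution over five characters: 0.2
```

A CER near zero indicates transcriptions that are close to publication-ready; higher values signal where re-processing or manual correction is needed.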

Results:

  • Get a clear overview of the quality and readiness of your data in view of future use
  • Get advice on how to best organize data workflows, assessment, management, and release
  • Prepare the ground for interventions to improve data quality and workflows

From data to dataset

What is it?

A dataset is a collection of data with a clear purpose, structured according to a robust data model. Its contents and boundaries are known, allowing for its responsible use within and outside an organization.

Odoma specializes in converting raw data into usable datasets. We support our clients in systematically assessing the current status of their data and data workflows; we then propose interventions according to the client's goals and execute data collection, consolidation, and documentation following established best practices and standards.

Odoma can also support clients in the future use and management of their datasets and in the creation of data-generating workflows, for example for developing AI applications, gaining insights through data science, or releasing datasets publicly (see also Open and FAIR data).

Results:

  • Map and consolidate existing raw data sources
  • Improve data workflows in view of future needs
  • Understand gaps, issues, and missed opportunities
  • Design and develop new data-generating workflows
  • Ready your organization for data-driven applications and services

 

High-quality annotations

What is it?

AI applications and data quality assessments often require high-quality annotations (ground truth), fully or partially validated by human experts.

Odoma specializes in the design, piloting, execution, and assessment of annotation campaigns for a variety of tasks, including image analysis, text extraction, information extraction, natural language and image processing, and more.

Odoma is experienced in applying international standards (best practices, guidelines, annotation typologies, and data formats), as well as a wide variety of open-source and commercial annotation tools.

Furthermore, thanks to our strong research record, we offer access to advanced AI techniques such as active learning, which minimizes the amount of annotated data needed to achieve the client's goal.
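To illustrate the idea behind active learning, the toy sketch below (entirely hypothetical: a one-dimensional task with a scripted "oracle" standing in for the human expert) repeatedly queries the unlabeled item the current model is least certain about, so that a handful of labels pins down the decision boundary:

```python
import random

# Toy task: label x as 1 if x >= 0.5. The "model" estimates the threshold as the
# midpoint between the largest known 0-example and the smallest known 1-example.
random.seed(0)
pool = [random.random() for _ in range(1000)]             # unlabeled pool
oracle = lambda x: int(x >= 0.5)                          # stands in for a human expert

labeled = {x: oracle(x) for x in (min(pool), max(pool))}  # two seed labels

def fit(labeled):
    zeros = [x for x, y in labeled.items() if y == 0]
    ones = [x for x, y in labeled.items() if y == 1]
    return (max(zeros) + min(ones)) / 2                   # estimated decision threshold

for _ in range(10):                                       # annotation budget: 10 queries
    threshold = fit(labeled)
    # Uncertainty sampling: query the pool item closest to the decision boundary.
    query = min((x for x in pool if x not in labeled), key=lambda x: abs(x - threshold))
    labeled[query] = oracle(query)

print(abs(fit(labeled) - 0.5))  # after only 12 labels, the estimate is close to 0.5
```

In a real campaign the oracle is an expert annotator and the model a trained classifier, but the loop (fit, select the most informative items, annotate, repeat) is the same, and it is what lets a limited annotation budget go much further than random sampling.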

Results:

  • Get advice on best practices, international standards, and tools for annotation campaigns
  • Develop systematic guidelines and annotation workflows to guarantee the best quality outcomes
  • Run pilots to better design your annotation campaigns
  • Successfully conduct end-to-end annotation campaigns with complex dependencies
  • Minimize costs and maximize results via active learning and other AI techniques in support of expert annotation

 

Data cards and documentation

What is it?

Data cards are structured summaries of essential facts about key aspects of a dataset, needed by stakeholders for its informed and responsible use.

Data cards include the essential documentation about a dataset, but also selection criteria, limitations, descriptive data analyses, disclosures, and legal and practical constraints. Data cards are rapidly becoming a standard in data-driven AI applications.

Odoma specializes in documenting datasets, either independently or in view of their use in AI applications and data analyses.

Data cards are but one way we help our clients create high-quality documentation for their datasets, facilitating their re-use across the organization. We follow established best practices and standards in our documentation work.
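For illustration, a minimal data card might capture fields like the following. The field set and the example dataset are a simplified sketch of our own; established templates such as Google's Data Cards Playbook or "Datasheets for Datasets" are considerably richer:

```python
from dataclasses import dataclass, field, asdict

@dataclass
class DataCard:
    """Illustrative data card fields covering the aspects named above:
    selection criteria, limitations, analyses, disclosures, legal constraints."""
    name: str
    summary: str
    selection_criteria: str
    limitations: list = field(default_factory=list)
    descriptive_statistics: dict = field(default_factory=dict)
    disclosures: list = field(default_factory=list)
    legal_constraints: str = "unspecified"

card = DataCard(
    name="example-archive-letters",  # hypothetical dataset
    summary="Transcribed 19th-century letters with named-entity annotations.",
    selection_criteria="All letters digitized before 2023 with legible handwriting.",
    limitations=["OCR quality varies by scribe", "covers a single archive"],
    descriptive_statistics={"documents": 1200, "annotated_entities": 15000},
    disclosures=["contains personal names of historical individuals"],
    legal_constraints="CC BY 4.0, attribution required",
)
print(asdict(card)["name"])  # prints "example-archive-letters"
```

Keeping the card as structured data (rather than free text) makes it easy to render to a PDF or web page and to validate that no section has been left empty before a dataset is released.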

Related works:

Data card: https://github.com/budh333/UnSilence_VOC/blob/v1.3/Datacard.pdf

Entity recognition in historical colonial archives: https://arxiv.org/abs/2210.02194

Results:

  • Get consistent, comprehensive, intelligible, and concise data documentation that provides clarity while surfacing uncertainties
  • Facilitate the re-use of datasets across the organization
  • Prepare your datasets for open release or their use in AI applications

 

Open and FAIR data

What is it?

The publication of open data is becoming increasingly popular among cultural and creative organizations, as it supports user engagement, creative re-use, research, and education.

Open data should abide by the FAIR principles (Findable, Accessible, Interoperable, Reusable): https://www.go-fair.org/fair-principles.

Thanks to our extensive open science experience, we support our clients in the release of open, FAIR-compliant data and in the organization of public competitions, shared tasks, and hackathons.
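As a rough illustration, each FAIR principle can be loosely mapped onto metadata fields a dataset record should carry. The mapping and field names below are our own simplified reading of the principles, not a formal FAIR assessment, and the record is hypothetical:

```python
# Each FAIR principle mapped to metadata fields whose presence supports it.
FAIR_FIELDS = {
    "Findable": ["identifier", "title", "keywords"],  # e.g. a DOI and rich metadata
    "Accessible": ["access_url"],                     # retrievable via a standard protocol
    "Interoperable": ["format"],                      # open, documented data formats
    "Reusable": ["license", "provenance"],            # clear usage terms and origin
}

def fair_gaps(metadata: dict) -> dict:
    """Return, per principle, which supporting fields are missing from a record."""
    return {principle: [f for f in fields if not metadata.get(f)]
            for principle, fields in FAIR_FIELDS.items()}

record = {  # hypothetical dataset record with a placeholder identifier
    "identifier": "doi:10.5281/zenodo.0000000",
    "title": "Example open dataset",
    "keywords": ["cultural heritage"],
    "access_url": "https://example.org/data.zip",
    "format": "CSV",
    "license": None,  # missing: flagged under Reusable
}
print(fair_gaps(record)["Reusable"])  # → ['license', 'provenance']
```

A gap report of this kind is a useful starting point for a release checklist, even though genuine FAIR compliance also depends on factors no metadata field can capture, such as the openness of the chosen formats and the persistence of the identifiers.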

Results:

  • Gain visibility by releasing open data
  • Engage with a broad public in education, research, and industry
  • Outsource tasks via open competitions and hackathons