Lead: Vladimir Lobaskin (University College Dublin) / Deputy: Georgios Gkoutos (University of Birmingham)
Work Package 5 (WP5) aims to integrate current state-of-the-art tools for data mining and data analysis, using a linked data approach to extract and integrate knowledge from all available information (raw experimental and modelling data, and metadata) captured in the NanoCommons data warehouse. Once linked and interoperable via the NanoCommons platform, these tools will be made available to the NanoCommons user community.
Physics- and chemistry-based materials modelling procedures will be adapted to calculate relevant nanomaterial descriptors and to complement the data sets where information gaps are identified. Existing data handling and analysis tools will be further developed, extended and integrated throughout the project, taking into account existing knowledge from chemicals and the additional needs of the nanosafety community arising from the larger and more diverse data sources and nanomaterial (NM) structures. Extracted knowledge will then be organised in formats suitable for predictive modelling. The tools developed within WP5 will be implemented as interoperable, standards-compliant modular web services, maximising cross-talk and interaction between diverse sources of data.
Specific objectives are to incorporate tools for:
- extracting knowledge from raw experimental data (such as microscopic images or spectral data)
- preprocessing the data before they are sent to modelling services (normalisation, missing data handling, selection of important variables, dimensionality reduction)
- generating theoretical descriptors (such as structural descriptors or quantum mechanical descriptors)
- analysing big omics or “corona” data to identify the biological mechanisms and pathways associated with toxicity and other adverse effects, and to produce aggregated, biologically enriched descriptors
- harmonising and integrating diverse data and metadata originating from heterogeneous resources, so that homogeneous datasets suitable for modelling are produced
- semantically retrieving ontology-annotated data from the project data warehouse.
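
As an illustration of the preprocessing objective listed above (missing data handling, variable selection, normalisation), the following is a minimal sketch only, not a NanoCommons tool. The function names, the column names, the choice of mean imputation and the use of a zero-variance filter as a stand-in for variable selection are all assumptions made for this example; it uses only the Python standard library.

```python
from statistics import mean, pstdev

def impute_missing(rows):
    """Replace None in each column with the mean of that column's observed values."""
    filled_columns = []
    for col in zip(*rows):
        observed = [v for v in col if v is not None]
        col_mean = mean(observed) if observed else 0.0
        filled_columns.append([col_mean if v is None else v for v in col])
    return [list(r) for r in zip(*filled_columns)]

def drop_constant_columns(rows, names, tol=1e-12):
    """Crude variable selection (assumption): discard columns with no variance."""
    columns = list(zip(*rows))
    keep = [i for i, col in enumerate(columns) if pstdev(col) > tol]
    kept_rows = [[row[i] for i in keep] for row in rows]
    return kept_rows, [names[i] for i in keep]

def standardise(rows):
    """Z-score normalisation: zero mean and unit variance per column."""
    scaled = []
    for col in zip(*rows):
        mu, sigma = mean(col), pstdev(col)
        scaled.append([(v - mu) / sigma if sigma > 0 else 0.0 for v in col])
    return [list(r) for r in zip(*scaled)]

def preprocess(rows, names):
    """Impute, select and normalise, in that order."""
    filled = impute_missing(rows)
    selected, kept = drop_constant_columns(filled, names)
    return standardise(selected), kept

# Hypothetical descriptor table: one row per nanomaterial sample.
names = ["size_nm", "zeta_mV", "batch_flag"]
rows = [
    [10.0, 1.0, 5.0],
    [20.0, None, 5.0],   # missing zeta potential value
    [30.0, 3.0, 5.0],
]
X, kept = preprocess(rows, names)
```

Here the missing value is filled with the column mean, the constant `batch_flag` column is dropped, and the remaining columns are standardised before being passed to a modelling service. Dimensionality reduction would follow as a further step and is omitted from this sketch.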