Authors: Thomas Exner, Dieter Maier, Tassos Papadiamantis and Iseult Lynch
In order to support as many projects as possible with their data management needs and to become a one-stop shop for searching and retrieving all available nanosafety-related data, NanoCommons offers two ways to integrate project data into the NanoCommons Knowledge Base and supports both via Transnational Access (TA) applications. Figure 1 shows schematically how the data from the different providers flows through the data ecosystem and is annotated and mapped to make it accessible via the NanoCommons Knowledge Base. Here we use specific examples to demonstrate how both ways work in practice and how the end user sees the data in the NanoCommons Knowledge Base.
Figure 1: Data workflow from the data providers to the specific data warehouse in which the data is stored, and the steps required to harmonise the data via the semantic interoperability layer provided by the NanoCommons Knowledge Base. This layer federates the data warehouses and enables cross-warehouse access via a single access point with a user-friendly interface. Dark arrows represent existing links, while light arrows represent ongoing or planned extensions to increase the data coverage.
The first way is tailored to data that is not yet shared through another data management solution and is to be made publicly available for the first time. The NanoFASE project data is an excellent example. The project collected data using specific data and metadata templates designed with the help of NanoCommons. The completed templates were then uploaded into the NanoCommons Data Warehouse and structured for easy use according to the data model of the semantic interoperability layer. Because the NanoCommons Data Warehouse is a central part of the NanoCommons Knowledge Base, there is no real distinction between the two: the data enters the Knowledge Base immediately upon upload, can be accessed directly by the project consortium, and can be made publicly available either immediately or after an embargo period (see Figure 2).
Figure 2: Entry page of the NanoCommons Knowledge Base showing the growing list of available data sources.
By selecting the NanoFASE tab, users can access the NanoFASE data, which is organised into structured areas such as datasets, nanomaterials, protocols, publications and nanomaterial instances, i.e. points where the properties of the nanomaterials may change, such as upon dispersion, exposure to organisms or release to the environment (see Figure 3). This last area reflects how nanomaterials should ideally be captured: it allows their transformations to be correlated with their effects and thus supports understanding of how a nanomaterial evolves throughout its life cycle and what its fate is in different environments. Figure 3 therefore also shows how the NanoCommons Knowledge Base represents these nanomaterial instances.
Figure 3: NanoFASE area in the NanoCommons Knowledge Base also showing the instances subarea to illustrate how the properties of the nanomaterial can be tracked along its life cycle and at points of likely transformation.
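The idea of nanomaterial instances can be illustrated with a small data structure. The following is a minimal sketch, assuming a material is simply an ordered list of life-cycle instances, each with a stage label and a dictionary of measured properties; the class and method names (`Nanomaterial`, `NanomaterialInstance`, `add_instance`, `track`) are hypothetical and do not reflect the actual NanoCommons Knowledge Base schema.

```python
from dataclasses import dataclass, field


@dataclass
class NanomaterialInstance:
    """One point in a nanomaterial's life cycle where its properties may change.

    Hypothetical structure for illustration only; not the NanoCommons schema.
    """
    stage: str  # free-text stage label, e.g. "pristine", "dispersed", "released"
    properties: dict[str, float] = field(default_factory=dict)


@dataclass
class Nanomaterial:
    name: str
    instances: list[NanomaterialInstance] = field(default_factory=list)

    def add_instance(self, stage: str, **properties: float) -> NanomaterialInstance:
        """Record a new life-cycle instance; list order is the order of recording."""
        instance = NanomaterialInstance(stage, dict(properties))
        self.instances.append(instance)
        return instance

    def track(self, prop: str) -> list[tuple[str, float]]:
        """Return (stage, value) pairs for one property, skipping instances that lack it."""
        return [(i.stage, i.properties[prop]) for i in self.instances if prop in i.properties]
```

Keeping each instance as a separate record, rather than overwriting a single property sheet, is what makes it possible to follow one property across transformations and to link each instance to the effects observed at that stage.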
The second way to integrate datasets into NanoCommons is designed for data that is already available in an existing public data warehouse (e.g. AMBIT, GEO, PRIDE). Here, the goal is to make the data searchable and accessible within the NanoCommons infrastructure without replicating it, because replicated data always raises the problem of keeping copies up to date and of different versions existing on different data platforms. As an example, we use the eNanoMapper data warehouse and other project-specific databases operated by Ideaconsult Ltd. A TA project was used to prepare the semantic integration of the eNanoMapper metadata into the NanoCommons Knowledge Base data model. In this way, the external data warehouse is recognised in exactly the same way as the internal NanoCommons Data Warehouse (see Figure 2 above), and the user sees no difference between metadata stored in NanoCommons directly and metadata imported from other data warehouses such as eNanoMapper. As shown in Figure 4, metadata on endpoints from all data warehouses running on the eNanoMapper/AMBIT system can be viewed in the NanoCommons Knowledge Base and are included in searches for relevant data. The actual data are then accessed directly from the original system, which guarantees that access rights are handled correctly.
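The federation approach described in the paragraph above (index harmonised metadata centrally, leave the data where it is) can be sketched in a few lines. This is a minimal illustration, assuming each source supplies metadata records that a source-specific mapper converts into one common model; all names and field layouts (`MetadataRecord`, `FederatedIndex`, `map_internal`, `map_external`, `access_url`) are hypothetical and are not the NanoCommons, eNanoMapper or AMBIT APIs.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class MetadataRecord:
    """Harmonised metadata in a common (hypothetical) data model."""
    source: str      # warehouse that holds the actual data
    endpoint: str    # e.g. an assay or measured property
    material: str
    access_url: str  # pointer to the data in the original system; data is never copied


Mapper = Callable[[dict], MetadataRecord]


class FederatedIndex:
    """Searchable index over metadata from several warehouses (illustrative sketch)."""

    def __init__(self) -> None:
        self._records: list[MetadataRecord] = []

    def register(self, raw_records: Iterable[dict], mapper: Mapper) -> None:
        """Map source-specific metadata into the common model and add it to the index."""
        self._records.extend(mapper(raw) for raw in raw_records)

    def search(self, term: str) -> list[MetadataRecord]:
        """Case-insensitive match on endpoint or material; returns pointers, not data."""
        t = term.lower()
        return [r for r in self._records
                if t in r.endpoint.lower() or t in r.material.lower()]


def map_internal(raw: dict) -> MetadataRecord:
    # Hypothetical field names for records held in the internal warehouse
    return MetadataRecord("internal", raw["endpoint"], raw["material"], raw["url"])


def map_external(raw: dict) -> MetadataRecord:
    # Hypothetical field names for an external warehouse with a different layout
    return MetadataRecord("external", raw["assay"]["name"], raw["substance"], raw["link"])
```

Because a search returns only `access_url` pointers, the user sees one uniform result list regardless of origin, while retrieving the actual data still goes through the original system and its own access control.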
Figure 4: Listing of the endpoints available in the eNanoMapper/AMBIT databases as provided by the NanoCommons Knowledge Base.
Any project or group with nanosafety datasets or databases that would like to integrate with NanoCommons is invited to contact us for details (email or contact form) and, where needed, to apply for Transnational Access to support their data integration activities.