Changes between Version 2 and Version 3 of SopUseGenericImporter

Timestamp:: 2012-01-25T13:57:21+01:00 (13 years ago)
Author:: cp229
Comment:: --

SopUseGenericImporter

-                      v2
+                      v3
-Documentation on Generic Importer
-Motivation: Excel inputs can vary drastically. We do not know in advance how many columns there are or which Pheno Model entities the columns should map to, so each Excel input effectively needs its own importer in which programmers specify the mappings. Writing a new case-specific importer every time is not sustainable. We would therefore like a generic importer with a user interface in which people without any programming background can define all the mappings.
-Where to find and how to use?
-. Simply add the following line to your UI XML file and run your generators again.
-<plugin name="GenericImporter" type="plugins.GenericImporter.GenericImporterPlugin" label="Generic Importer"/>
-. For details, see /molgenis_apps/apps/lifelines/plugins/GenericImporter.
-Step.1 select the file that you want to import
-Figure.1 select a file
-The first time users come to the plugin, they choose the file that they want to import. Two import options are provided: importing by column header and importing by row header. Usually we import by column header, but in a few cases we need to import by row header; an example of that is given later in this documentation.
-Step.2 specify the mappings between columns and Pheno Model entities
-Figure.2 Map the columns against the Pheno Model entities
-This is the page shown after we have chosen how to import, for instance importing by column header. The investigation has to be filled in; the importer will not continue while it is left blank. The most important component is the mapping panel, highlighted by the blue lines, where we specify which columns map to which entities. There are three drop-down boxes for each column. The first specifies which entity the column maps to. The second specifies which field of that entity it is, such as the name or the data type of a Measurement. The third specifies the relation to another column; the default, “0”, means the column does not have any relation at all.
-However, an input file may contain a lot of columns, such as 30 or even up to 100, and it is impractical to select everything by clicking. We have therefore provided a shortcut function, highlighted by the red lines, which updates the mapping panel from a specification of the columns, entity, field and target. For contiguous columns, e.g. columns 10 to 20 are observedValues that all have the third column as target, we put “10 > 20” in the column numbers, select observedValues in the shortcut drop-down boxes, type 3 in the target field and click on update; all these columns in the mapping panel are then updated simultaneously. For non-contiguous columns, e.g. columns 3, 5, 7, 10 and 12 are observedValues that have the third column as target, we put “3;5;7;10;12” in the column numbers; the program splits the values by semicolon, we specify the entity, field and target, and the mapping panel is updated.
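
The shortcut notation described above can be sketched in a few lines of Python. This is only an illustration of the “10 > 20”, “3;5;7;10;12” and “2 > n” forms, not the plugin's actual code: the function names, the tuple layout of a mapping entry and the "value" field label are hypothetical, and allowing ranges and semicolon lists in the same specification is an assumption.

```python
def _resolve(token: str, last_column: int) -> int:
    """Turn one token into a 1-based column number; 'n' means the last column."""
    token = token.strip()
    if token.lower() == "n":
        return last_column
    return int(token)


def parse_column_spec(spec: str, last_column: int) -> list:
    """Expand a shortcut specification such as '10 > 20' or '3;5;7' into column numbers."""
    columns = []
    for part in spec.split(";"):
        if not part.strip():
            continue  # tolerate a trailing semicolon
        if ">" in part:
            start_text, end_text = part.split(">", 1)
            start = _resolve(start_text, last_column)
            end = _resolve(end_text, last_column)
            if start > end:
                raise ValueError(f"range start exceeds end in {part.strip()!r}")
            columns.extend(range(start, end + 1))  # ranges are inclusive
        else:
            columns.append(_resolve(part, last_column))
    return columns


def apply_shortcut(mapping: dict, spec: str, entity: str, field: str,
                   target: int, last_column: int) -> None:
    """Set the same (entity, field, target) triple on every column named in spec."""
    for column in parse_column_spec(spec, last_column):
        mapping[column] = (entity, field, target)


# Columns 10 to 20 become observedValues with the third column as target.
panel = {}
apply_shortcut(panel, "10 > 20", "observedValues", "value", 3, last_column=30)
```

Passing the last column number explicitly keeps the “n” form unambiguous without the parser having to read the input file itself.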
-Example of Importing by column header
-. DataShaper schema
-Figure.3 Example of small part of DataShaper Schema input file
-This is an example of a DataShaper Schema Excel input. Before we actually import, we analyse which columns should map to which entities. The first row, highlighted in red, gives the Pheno Model entities that the columns map to. In figure.3, the first column is mapped to the Protocol entity. The second column is mapped to the Measurement entity; in this case the measurement is part of the protocol, so we need to specify this relation between them in the generic importer. The third column is the description of the Measurement, so it is mapped to measurement_description and has a relation with column 2 (measurement). The fourth column is mapped to measurement_unit and therefore also has a relation with column 2 (measurement).
-Figure.3 Example of small part of DataShaper Schema input file
-Figure.3 shows the mapping result in the mapping panel. We can then push the next step button, and the data will be stored in the database.
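
As a rough sketch, the DataShaper mapping from this example can be written down as data, with one (entity, field, target) triple per column, mirroring the three drop-down boxes. The `ColumnMapping` type, the field labels and the helper function are hypothetical, and pointing column 2 at column 1 to express "measurement is part of protocol" is an assumption about how the relation is entered.

```python
from typing import NamedTuple


class ColumnMapping(NamedTuple):
    entity: str   # first drop-down: Pheno Model entity
    field: str    # second drop-down: field within that entity
    target: int   # third drop-down: related column, 0 = no relation


# Hypothetical rendering of the four columns discussed in the example.
datashaper_mapping = {
    1: ColumnMapping("Protocol", "name", 0),
    2: ColumnMapping("Measurement", "name", 1),         # assumed: part of the protocol in column 1
    3: ColumnMapping("Measurement", "description", 2),  # describes the measurement in column 2
    4: ColumnMapping("Measurement", "unit", 2),         # unit of the measurement in column 2
}


def dangling_targets(mapping: dict) -> list:
    """Return the columns whose target refers to a column that is not mapped."""
    return [
        column
        for column, entry in sorted(mapping.items())
        if entry.target != 0 and entry.target not in mapping
    ]
```

A check like `dangling_targets` is one way to catch a mistyped target number before the data is stored.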
-Example of Importing by row header
-Figure.4 Example of importing by row header
-This is an Excel file of prediction models with their associated variables. In this case, the information within a column is inconsistent, as it contains both Protocol and Measurement information; therefore we have to import by row header. The first row is mapped to Protocol. The remaining rows are mapped to Measurement, and these measurements are part of the protocols. Figure.5 shows how the mapping is done. We used the shortcut to update all the mappings with “2 > n” (from the second column to the last column).
-Figure.5 Mapping result for importing by row header