Changes between Initial Version and Version 1 of MolgenisFile


Timestamp: 2011-01-20T16:12:42+01:00
Author: jvelde
= MolgenisFile =

Managing files is becoming increasingly important in those of our projects that deal with large, preformatted datasets. Also, many results are files of a non-relational nature, such as images or documents.

I would like to present some thoughts and principles here on how to better deal with such files in a database (Molgenis) context. I do not pretend to have the best solutions, nor do I think it is wise to reintroduce features already present in Molgenis.

Instead, I would like to open a constructive discussion on how to use this work and/or its principles to improve Molgenis and the way we design software dealing with the challenges it addresses. I hope you find it at least informative or inspirational :)

- Joeri

== Overview ==

Here we explain the differences between two ways to treat files, and how they can be harmonized.

=== Field is file ===

Molgenis has a field type 'file' which allows you to store and retrieve files. This is a solid mechanism that works just fine for most applications. See the Molgenis [http://www.molgenis.org/wiki/FieldElement guide].

For advanced users, however, this has some limitations. For instance:

The storage directory is hardcoded in a properties file. This is not ideal, because:
* The application often cannot be redeployed elsewhere without editing this file (i.e. it is not portable)
* There is no way to check whether the path is correct, or whether Tomcat/Java has read/write permission on it

The file is a field and not an entity of its own. This means:
* There is no straightforward way to attach a plugin, e.g. for viewing the file
* Decorators, extensions, etc. cannot be added to it in a suitable way

(see note B)

=== Entity is file ===

To make the mechanism more open and flexible, the MolgenisFile entity was added to the [http://www.molgenis.org/svn/gcc/trunk/handwritten/datamodel/shared/core.xml core] datamodel.

The model, without descriptions:

{{{
<entity name="MolgenisFile" abstract="false" implements="Nameable" decorator="decorators.MolgenisFileDecorator">
    <field name="Extension" nillable="false" length="8" />
</entity>
}}}

This basic entity represents a file. It has two attributes: the file name and the extension. The extension is important because it is used to determine the MIME type at runtime. For example, 'png' is served as 'image/png'.
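
To illustrate the extension-to-MIME mapping, here is a minimal Java sketch. The class MimeTypes and its small lookup table are hypothetical; the real download service asks the servlet context for the MIME type, as described in the downloading section.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: maps a MolgenisFile extension to a MIME type.
// The table is illustrative only; a deployed application would normally
// rely on the mime mappings configured in the servlet container.
final class MimeTypes {
    // Fallback for unknown extensions: browsers offer these as a plain download.
    static final String DEFAULT = "application/octet-stream";

    private static final Map<String, String> BY_EXTENSION = Map.of(
            "png", "image/png",
            "jpg", "image/jpeg",
            "gif", "image/gif",
            "pdf", "application/pdf",
            "txt", "text/plain",
            "mpg", "video/mpeg");

    private MimeTypes() {}

    static String forExtension(String extension) {
        if (extension == null) {
            return DEFAULT;
        }
        // Extensions are compared case-insensitively ("PNG" and "png" map to the same type).
        return BY_EXTENSION.getOrDefault(extension.toLowerCase(Locale.ROOT), DEFAULT);
    }
}
```

For example, MimeTypes.forExtension("png") returns "image/png", and an unknown extension falls back to "application/octet-stream".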

More about the attributes and subclassing follows later on.

== Merging ==

If the entity way of handling files is a good idea, it would be very feasible to combine the two approaches and get the best of both worlds. The model and classes that deal with file handling could be put into the Molgenis source, so they are always available and centrally updated.

When using a file as a field, it would under the hood simply be an XREF to the MolgenisFile table, so users would not notice any difference at all. Developers, however, would gain freedom, because the files could also be treated as entities.

Developers can build on the MolgenisFile definition and handlers to tailor projects to their specific needs, while keeping the field + XREF construction for end users. There does not have to be a conflict with the current implementation :)

== Technical ==

Here we delve into the cool stuff: how to exploit the new possibilities.

=== Decorating ===

When a file is an entity, we can use a decorator to influence its behaviour. The decorator is automatically applied to all subclasses of the entity as well.

Basically, the decorator takes care of mapping the entity (any MolgenisFile) to the file on the filesystem. It does things like:

* Names are 'escaped' to file-safe versions (e.g. strange characters are removed)
* Names must be unique when escaped (handy for finding/downloading)
* Files need to be renamed when the name is changed
* Files need to be deleted properly when the record (entity) is removed
* Extensions must be correct
* And so on. Informative errors are thrown when something goes wrong.

The code can be found [http://www.molgenis.org/svn/gcc/trunk/handwritten/java/decorators/MolgenisFileDecorator.java here].
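
To make the first two rules more concrete, here is a minimal Java sketch of escaping a name and checking uniqueness after escaping. It is not the actual MolgenisFileDecorator code: the class FileNameRules, the allowed character set and the error message are assumptions chosen for illustration.

```java
import java.util.Set;

// Hypothetical sketch of two decorator rules; not the real MolgenisFileDecorator.
final class FileNameRules {
    private FileNameRules() {}

    // Assumed escaping rule: keep letters, digits, '.', '_' and '-',
    // and replace every other character (spaces, slashes, ...) with '_'.
    static String escape(String name) {
        return name.replaceAll("[^A-Za-z0-9._-]", "_");
    }

    // Rejects a new name whose escaped form is already used by another file,
    // so that each escaped name identifies exactly one file on disk.
    static void checkUnique(String newName, Set<String> existingEscapedNames) {
        String escaped = escape(newName);
        if (existingEscapedNames.contains(escaped)) {
            throw new IllegalArgumentException("Name '" + newName
                    + "' clashes with an existing file once escaped to '" + escaped + "'");
        }
    }
}
```

With this rule, "my picture" and "my/picture" both escape to "my_picture", so whichever is stored second would be rejected.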

=== Setting storage location ===

Before you can start storing files, you need a validated storage location. A plugin that helps you set this up can be found [http://www.molgenis.org/svn/gcc/trunk/handwritten/java/plugins/system/settings/ here].

The idea is as follows:
* In a running application (deployed anywhere), you browse to the plugin. This should preferably be done by an administrator; we should hide this plugin from other users.
* You type in the preferred storage path and click 'Set path' to save it.
* Next, you run two tests, both of which must succeed before the path is marked as validated.
* When the tests are successful, your path is marked VALIDATED and you can store MolgenisFiles.

[[Image(storagedirplugin.png)]]

If the tests fail, read the error message and fix what is wrong. Maybe the database is not accessible, Tomcat/Java lacks permissions on the directory, the path is not a valid directory, etc. Some information about the location is also displayed: Does it already exist? Are there files in it?
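
The plugin's two tests are not spelled out here, so the following Java sketch only shows one plausible way to check a storage directory: it must exist (or be creatable) and be a directory, and the JVM running Tomcat must be able to write, read back and delete a file in it. The class StoragePathCheck and its messages are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical sketch of storage path validation; not the settings plugin's code.
// Each check returns an error message, or Optional.empty() when it passes.
final class StoragePathCheck {
    private StoragePathCheck() {}

    // Test 1: the path exists (or can be created) and is a directory.
    static Optional<String> checkDirectory(Path dir) {
        try {
            Files.createDirectories(dir); // does nothing if the directory already exists
        } catch (IOException e) {
            return Optional.of("Cannot create or access " + dir + ": " + e);
        }
        return Files.isDirectory(dir) ? Optional.empty()
                : Optional.of("Not a directory: " + dir);
    }

    // Test 2: this JVM can write, read back and delete a file in the directory.
    static Optional<String> checkReadWrite(Path dir) {
        try {
            Path probe = Files.createTempFile(dir, "probe", ".tmp");
            Files.writeString(probe, "ok");
            String readBack = Files.readString(probe);
            Files.delete(probe);
            return "ok".equals(readBack) ? Optional.empty()
                    : Optional.of("Read-back mismatch in " + dir);
        } catch (IOException e) {
            return Optional.of("No read/write access to " + dir + ": " + e);
        }
    }

    // Extra information for the user: how many entries the directory already holds.
    static long countEntries(Path dir) {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Returning messages instead of throwing lets a settings screen show every failed check at once, in the spirit of the informative errors described above.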

The path is stored in a special table that is located inside your selected database, but outside the range of tables accessible to your application.

For testing purposes, the path can also be set and marked as validated manually (see note A).

=== Java API ===

The API has two layers: BasicFileHandler and MolgenisFileHandler, which extends BasicFileHandler.

[http://www.molgenis.org/svn/gcc/trunk/handwritten/java/filehandling/generic/BasicFileHandler.java BasicFileHandler] provides the most basic information, such as the common file storage directory of your application as a Java 'File' object:

{{{
import java.io.File;
// db is your application's Database object
BasicFileHandler bfh = new BasicFileHandler(db);
File fileStorage = bfh.getFileStorage();
}}}

[http://www.molgenis.org/svn/gcc/trunk/handwritten/java/filehandling/generic/MolgenisFileHandler.java MolgenisFileHandler] is a direct extension of BasicFileHandler and is constructed in the same way.

It focuses on 'MolgenisFile' objects: you can get information about files or manipulate them using functions such as getFile(), deleteFile(), findFile() and getStorageDirFor().

{{{
import java.io.File;
MolgenisFileHandler mfh = new MolgenisFileHandler(db);
File myRealFile = mfh.getFile(myMolgenisFile);
File storageForFileType = mfh.getStorageDirFor(myMolgenisFile);
}}}

Note that each 'type' of MolgenisFile has its own subdirectory, and your application name is used as part of the storage location. For example, suppose you have set your path to "/data/xgap", deployed the application as "ngspipeline" and created an entity 'Video extends MolgenisFile'. A video file "result.mpg" would then be saved as "/data/xgap/ngspipeline/video/result.mpg". This makes manual tasks such as browsing or backing up files on your filesystem easier.
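
The directory layout in this example can be expressed in a few lines of Java. This is a hypothetical helper (StorageLayout is not part of the MolgenisFileHandler API), and lower-casing the type name is inferred from the 'video' example rather than taken from the source.

```java
import java.nio.file.Path;
import java.util.Locale;

// Hypothetical sketch of the storage layout:
// <storage path>/<application name>/<lower-cased MolgenisFile type>/<name>.<extension>
final class StorageLayout {
    private StorageLayout() {}

    // One subdirectory per application and per MolgenisFile type.
    static Path storageDirFor(Path storageRoot, String appName, String fileType) {
        return storageRoot.resolve(appName).resolve(fileType.toLowerCase(Locale.ROOT));
    }

    // The stored file itself: escaped name plus extension inside the type directory.
    static Path fileFor(Path storageRoot, String appName, String fileType,
                        String escapedName, String extension) {
        return storageDirFor(storageRoot, appName, fileType)
                .resolve(escapedName + "." + extension);
    }
}
```

On a Unix-like system, StorageLayout.fileFor(Paths.get("/data/xgap"), "ngspipeline", "Video", "result", "mpg") yields /data/xgap/ngspipeline/video/result.mpg.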

=== Services: uploading and downloading ===

'''Uploading''' means creating a new MolgenisFile record, plus putting the file in the correct place. There is a simple upload [http://www.molgenis.org/svn/gcc/trunk/handwritten/java/filehandling/servlet/Upload.java servlet] to do this.

For a basic MolgenisFile, the servlet expects to receive:
* name = The name under which the file should be stored
* type = The type (subclass) of MolgenisFile, in this case: 'MolgenisFile'
* file = A file stream with the content of the file you wish to store

The servlet can be called in many ways, for example with [http://gbic.target.rug.nl/forum/showthread.php?tid=107 RCurl] or regular command-line [http://gbic.target.rug.nl/forum/showthread.php?tid=86 cURL].
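
For clients written in Java rather than R or the shell, the request body can be built by hand. The sketch below assumes the servlet reads a standard multipart/form-data upload (the format cURL's -F option sends) and only assembles a body with the three fields listed above. The class UploadRequest and the boundary are hypothetical, and because the servlet's URL depends on the deployment, actually sending the body (e.g. with java.net.http.HttpClient and a "Content-Type: multipart/form-data; boundary=..." header) is left out.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: builds a multipart/form-data body with the fields
// name, type and file. The boundary must not occur inside the content.
final class UploadRequest {
    private UploadRequest() {}

    static byte[] multipartBody(String boundary, String name, String type,
                                String fileName, byte[] content) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Plain text field: the name under which the file is stored.
        writeText(out, "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"name\"\r\n\r\n"
                + name + "\r\n");
        // Plain text field: the MolgenisFile type (subclass).
        writeText(out, "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"type\"\r\n\r\n"
                + type + "\r\n");
        // File field: raw bytes of the content.
        writeText(out, "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"file\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: application/octet-stream\r\n\r\n");
        out.writeBytes(content);
        // Closing boundary ends the body.
        writeText(out, "\r\n--" + boundary + "--\r\n");
        return out.toByteArray();
    }

    private static void writeText(ByteArrayOutputStream out, String text) {
        out.writeBytes(text.getBytes(StandardCharsets.UTF_8));
    }
}
```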

Cool thing no. 1:

The servlet is decoupled from the procedure that actually creates the database records and stores the file. That procedure is another Java API; see the code [http://www.molgenis.org/svn/gcc/trunk/handwritten/java/filehandling/generic/PerformUpload.java here].

This means you can store files from anywhere in Java source code by using the static doUpload() function. There are two flavours:

{{{
doUpload(Database db, MolgenisFile mf, File content)
}}}

The first needs a database object, a MolgenisFile definition and a File pointing to the content. Example usage:

{{{
File content = request.getFile("upload");
PerformUpload.doUpload((JDBCDatabase) db, this.model.getMolgenisFile(), content);
}}}

And the second:

{{{
doUpload(Database db, boolean useTx, String name, String type, File content, HashMap<String, String> extraFields)
}}}

This one requires some low-level specifications instead of a 'MolgenisFile' object. Example usage:

{{{
import java.util.HashMap;
// upload as a MolgenisFile of type 'BinaryDataMatrix'
HashMap<String, String> extraFields = new HashMap<String, String>();
extraFields.put("data_name", data.getName());
PerformUpload.doUpload(db, true, data.getName() + ".bin", "BinaryDataMatrix", binFile, extraFields);
}}}

Cool thing no. 2:

The upload services will ask for the additional fields of a subclass if you forget them! For example, if you have an 'Image extends MolgenisFile' entity and add a field to this subclass:

{{{
<field name="Investigation" type="xref" xref_entity="Investigation" />
}}}

Then the upload API will require you to provide an 'investigation_name', or report back an error if you don't (e.g. "Missing needed field 'investigation_name' for MolgenisFile type 'Image'").
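
The check behind this behaviour can be pictured as follows. In this minimal Java sketch the list of required fields is simply passed in, whereas the real upload API presumably derives it from the generated datamodel; ExtraFieldCheck is a hypothetical name, and only the error message mirrors the example above.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: before a MolgenisFile subclass is stored, every extra
// field of that subclass must have a non-empty value.
final class ExtraFieldCheck {
    private ExtraFieldCheck() {}

    static void requireFields(String type, List<String> requiredFields,
                              Map<String, String> suppliedFields) {
        for (String field : requiredFields) {
            String value = suppliedFields.get(field);
            if (value == null || value.isEmpty()) {
                // Same wording as the error reported by the upload API.
                throw new IllegalArgumentException("Missing needed field '" + field
                        + "' for MolgenisFile type '" + type + "'");
            }
        }
    }
}
```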

'''Downloading''' is as simple as can be. All you need to do is provide the name of the MolgenisFile to the download [http://www.molgenis.org/svn/gcc/trunk/handwritten/java/filehandling/servlet/Download.java service], and it will return a download (an output stream) with the file content.

The cool thing here is that the MIME type is automatically derived from the file extension, so your browser will know what to do with this type of file.

{{{
response.setContentType(sc.getMimeType(mf.getExtension()));
}}}

Use the service by calling:

{{{
http://255.255.255.255:8080/xgap_1_4_distro/download.do?name=SomeFile
}}}

Just like the Upload servlet, it wraps a Java API (MolgenisFileHandler) that you can use elsewhere (see the source code).
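
Outside a servlet, the core of a download is just "find the stored file for this name and copy its bytes to an output stream". The Java sketch below shows that step; the Map standing in for the database lookup and the class DownloadSketch are hypothetical, whereas the real service goes through MolgenisFileHandler.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

// Hypothetical sketch of the download step; the name-to-path map stands in
// for the database lookup done by the real service.
final class DownloadSketch {
    private DownloadSketch() {}

    // Copies the content of the named file to 'out' and returns the number of bytes written.
    static long streamTo(String name, Map<String, Path> storedFiles, OutputStream out)
            throws IOException {
        Path file = storedFiles.get(name);
        if (file == null || !Files.isRegularFile(file)) {
            throw new IllegalArgumentException("No stored file for MolgenisFile '" + name + "'");
        }
        return Files.copy(file, out);
    }
}
```

In a servlet, out would be response.getOutputStream(), after setting the content type as shown above.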

=== Practical example ===

Let's walk step by step through a practical example of how to use all this.

Say we want to store images in a Molgenis database. These images are coupled to an 'Investigation'. Start by adding the entity to the datamodel, extending MolgenisFile:

{{{
<entity name="Image" extends="MolgenisFile">
    <field name="Investigation" type="xref" xref_entity="Investigation" />
    <unique fields="name,Investigation" description="Name is unique within an investigation" />
</entity>
}}}

Now add a GUI component. We nest a small [http://www.molgenis.org/svn/gcc/trunk/handwritten/java/plugins/molgenisfile/ plugin] in the form, which allows us to upload and view the images that belong to the records.

{{{
<form name="Images" entity="Image">
    <plugin name="Viewer" type="plugins.molgenisfile.MolgenisFileManager" />
</form>
}}}

After generating, browse to the image section of the GUI. Create a new record as usual.

[[Image(newimagerecord.png)]]

The plugin appears, telling you there is no source file. Select a picture and press upload.

[[Image(imguploaded.png)]]

Done! If you take a look at your filesystem, you'll find the file under your storage path + app name + MolgenisFile type, i.e.:

[[Image(imglocation.png)]]

The plugin that we use here is very simple, and only wraps the upload and download services.

Here's the upload code:

{{{
File content = request.getFile("upload");
PerformUpload.doUpload((JDBCDatabase) db, this.model.getMolgenisFile(), content);
this.setMessages(new ScreenMessage("File uploaded", true));
}}}

And the viewer simply wraps a download in an IFRAME:

{{{
<iframe width="750px" height="600px" src="download.do?name=mypicture"></iframe>
}}}

The plugin can be extended to use different viewers for different MIME types. For example, we have *.fig files (which are essentially text files) that represent a figure. Instead of looking at the text, we want to use an applet to display the graph. Inside the viewer part of the plugin, we add:

{{{
<#if model.molgenisFile.extension == 'fig'>
    <applet code="jfig.gui.JFigViewerApplet"></applet>
</#if>
}}}

And now the applet appears when we view *.fig files.
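
The same idea, choosing a viewer per file extension with an IFRAME around the download as the default, can be sketched in Java as a small registry. This is a hypothetical illustration of the extension point, not the MolgenisFileManager plugin code; the class ViewerRegistry and the HTML it produces are assumptions, and how the applet is told which file to load is left out.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical viewer registry: maps a file extension to a function that turns
// a MolgenisFile name into an HTML snippet. Unknown extensions get an IFRAME
// around the download service, so the browser decides how to show the file.
final class ViewerRegistry {
    private final Map<String, Function<String, String>> byExtension = new HashMap<>();

    // URL-encodes the name so spaces and special characters survive in the query string.
    static String downloadUrl(String name) {
        return "download.do?name=" + URLEncoder.encode(name, StandardCharsets.UTF_8);
    }

    static String iframe(String name) {
        return "<iframe width=\"750px\" height=\"600px\" src=\"" + downloadUrl(name) + "\"></iframe>";
    }

    void register(String extension, Function<String, String> viewer) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), viewer);
    }

    String render(String name, String extension) {
        Function<String, String> viewer =
                byExtension.getOrDefault(extension.toLowerCase(Locale.ROOT), ViewerRegistry::iframe);
        return viewer.apply(name);
    }
}
```

Supporting *.fig files would then be a single call, e.g. registry.register("fig", name -> "<applet code=\"jfig.gui.JFigViewerApplet\"></applet>"), while every other extension keeps the IFRAME viewer.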

=== Java API extension example ===

To store and manage data matrices (a special datatype) in file-based backends while reusing the MolgenisFile handlers, we extend those handlers.

We define a matrix backend file as an entity. It extends MolgenisFile and adds an XREF to 'Data' to link the matrix metadata to the files.

{{{
<entity name="BinaryDataMatrix" extends="MolgenisFile">
    <field name="Data" type="xref" xref_entity="Data" description="Reference to the datamatrix this binary file belongs to." />
</entity>
}}}

By doing so, we get all the services, handlers and decorators for free. We no longer have to worry about placing the file in the correct location, renaming, deleting, serving it out, interaction with other records in the database, etc.

But since this is a special datatype, we need more. For example:
* We would like to use the 'Data' definition to find, verify or delete backend files
* We would like to create instances of 'Matrix' using the 'Data' definition, regardless of the location of the backend file

For this purpose, we created [http://www.molgenis.org/svn/gcc/trunk/handwritten/java/matrix/general/DataMatrixHandler.java DataMatrixHandler], which extends MolgenisFileHandler.

(see note C)

A few usage examples follow.

Check whether the data elements for this data matrix are stored in the database:

{{{
DataMatrixHandler dmh = new DataMatrixHandler(db);
if (data.getStorage().equals("Database"))
{
    if (dmh.isDataMatrixStoredInDatabase(data))
    {
        throw new DatabaseException("Database source already exists for source type '" + data.getStorage() + "'");
    }
}
}}}

Iterate through all 'Data' definitions in the database and create a list of BinaryDataMatrixInstance objects (this only succeeds if all of them are stored as binary files!):

{{{
import java.util.ArrayList;
import java.util.List;

DataMatrixHandler dmh = new DataMatrixHandler(db);
List<BinaryDataMatrixInstance> bmList = new ArrayList<BinaryDataMatrixInstance>();
for (Data data : db.find(Data.class)) {
    // the cast fails with a ClassCastException if a matrix is not binary
    bmList.add((BinaryDataMatrixInstance) dmh.createInstance(data));
}
}}}

Find the 'Data' definition that belongs to a MolgenisFile in a wrapper constructor:

{{{
public CSVDataMatrixInstance(Database db, MolgenisFile mf) throws Exception
{
    this(new DataMatrixHandler(db), mf); // this(...) must come first, so delegate via a helper constructor
}

private CSVDataMatrixInstance(DataMatrixHandler dmh, MolgenisFile mf) throws Exception
{
    this(dmh.findData(mf), dmh.getFile(mf));
}
}}}

Check whether this 'Data' is stored as a binary file:

{{{
DataMatrixHandler dmh = new DataMatrixHandler(db);
Data dm = db.find(Data.class).get(0);
boolean isBinary = dmh.isDataStoredIn(dm, "Binary");
}}}

Create an instance of ANY matrix, regardless of storage mechanism:

{{{
Database db = new JDBCDatabase("xgap.properties");
Data data = db.find(Data.class).get(0);
DataMatrixHandler dmh = new DataMatrixHandler(db);
AbstractDataMatrixInstance<Object> myMatrix = dmh.createInstance(data);
}}}
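
Conceptually, createInstance() has to pick a matrix implementation based on how a 'Data' record is stored. The Java sketch below shows such a dispatch, using a registry of builders keyed by storage type. It is only an illustration: MatrixInstance, DataDef and MatrixFactory are hypothetical stand-ins for the XGAP classes, and the real DataMatrixHandler may be organized differently.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a matrix instance (binary, CSV, database, ...).
interface MatrixInstance {
    String source();
}

// Hypothetical stand-in for a 'Data' record: a name plus its storage type.
final class DataDef {
    private final String name;
    private final String storage;

    DataDef(String name, String storage) {
        this.name = name;
        this.storage = storage;
    }

    String getName() { return name; }
    String getStorage() { return storage; }
}

// Picks the builder registered for data.getStorage() and uses it to create the instance.
final class MatrixFactory {
    private final Map<String, Function<DataDef, MatrixInstance>> builders;

    MatrixFactory(Map<String, Function<DataDef, MatrixInstance>> builders) {
        this.builders = Map.copyOf(builders);
    }

    MatrixInstance createInstance(DataDef data) {
        Function<DataDef, MatrixInstance> builder = builders.get(data.getStorage());
        if (builder == null) {
            throw new IllegalArgumentException("Unsupported storage '" + data.getStorage()
                    + "' for data '" + data.getName() + "'");
        }
        return builder.apply(data);
    }
}
```

Callers only ever see the common MatrixInstance type, which is what lets code like the last example above work regardless of where the backend file lives.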

== Notes ==

A. Setting the path manually (not recommended). If the path is 'C:\data', run these SQL statements:

* create table systemsettings_090527PBDB00QCGEXP4G (filedirpath VARCHAR(255), verified BOOL DEFAULT 0);
* insert into systemsettings_090527PBDB00QCGEXP4G (filedirpath, verified) values ('C:\data', 1);

If your database is MySQL, write the path as 'C:\\data' in the insert statement, because MySQL treats a backslash in a string literal as an escape character.

B. Are my statements here even correct? :)

C. I think this part is terribly nerdy and incomprehensible...