Changes between Initial Version and Version 1 of DespoinaLog/2010/04/29


Timestamp: 2010-10-01T23:19:13+02:00
Author: trac

= Next steps, notes & links on OpenData, nano-publications & building a Lucene index on a CSV file extracted by the Molgenis CSV export =
== Next steps ==
 1. Build an index (using Lucene) in Molgenis on a single database table (the Gene table from SyndromeBook).
   * We have two options:
   1. Use command-line Lucene. It needs files as input, so we can export CSV (Molgenis CSV export - Joeri). The problem with that is that we will have to export every single column or row to its own file (see how that can be done) so that the search is efficient. We need a separate file for every single piece of information (in valid form) from the table. So one file per column? Or per row? Explore that.
     1. There is an experimental database (Joeri); we could also try that. The main idea is to generalize this, later build an index on more database tables, and potentially build a generator. The longer-term goal is to place a Google-style search over patient data on top of Molgenis.
     1. The fact is that there are several sets (Molgenis): database - model - system, and we can use this Lucene-based indexing engine to create an index to be used by a search box that can look inside these databases. '''If we generalize that to ALL DBs in Molgenis, we have search inside patient data.'''
   1. Use a Lucene Java call inside a Molgenis plugin for the table in (1).
 1. Search not through ontocat but Lucene in specific DBs. A starting point is SyndromeBook's DB table: gene.
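
The per-file export idea in option (a) can be sketched with plain JDK file I/O: split the Molgenis CSV export into one small file per row, so that the command-line Lucene demo indexes each row as its own document. This is an illustrative sketch, not Molgenis code; the class and file names are made up.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Sketch: split a CSV export into one file per row so the Lucene demo
// (which indexes one document per file) treats each row as a document.
public class CsvRowSplitter {

    // Writes row i (plus the header line, to keep field context) to
    // outDir/row_i.txt; returns the number of row files written.
    public static int split(Path csv, Path outDir) throws IOException {
        List<String> lines = Files.readAllLines(csv);
        Files.createDirectories(outDir);
        if (lines.isEmpty()) return 0;
        String header = lines.get(0);
        for (int i = 1; i < lines.size(); i++) {
            Files.write(outDir.resolve("row_" + i + ".txt"),
                        List.of(header, lines.get(i)));
        }
        return lines.size() - 1;
    }

    public static void main(String[] args) throws IOException {
        // Tiny hypothetical gene-table export, written to a temp location.
        Path csv = Files.createTempFile("gene", ".csv");
        Files.write(csv, List.of("id,symbol,description",
                                 "1,BRCA1,breast cancer 1",
                                 "2,FBN1,fibrillin 1"));
        Path out = Files.createTempDirectory("syndrome_book_data");
        System.out.println(split(csv, out) + " row files written");
    }
}
```

A per-column split would be the same loop transposed; per-row files have the advantage that one search hit maps to one database record.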

http://www.ebi.ac.uk/ebisearch/search.ebi?db=allebi&requestFrom=searchBox&query=brca&FormsButton3=Go

== Other notes ==
 * Peregrine is running in Concept Wiki, so data production in TRIPLEs is feasible --> nano-publications (some steps are missing, but we get the idea) --> RDF
 * Hypothesis data / real data / evidence --> experiments --> STATEMENTS --> triples --> semantic web
   * Molgenis producing triples?? (experimental DB - Joeri) future plans.
 * About the servlet version of search on top of ontocat: if you use a servlet, you are not REST (architecture).

Ontocat is returning keywords; how about links, or more specific studies about the specific term?
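
The servlet-vs-REST note above can be made concrete: in a resource-oriented (REST-style) design, a search is just a GET on a URI, with the query carried in the request rather than in server-side session state. A minimal sketch using the JDK's built-in `com.sun.net.httpserver` (the `/search` path and `q` parameter are invented for illustration and are not Molgenis or ontocat API):

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Sketch of a REST-style search endpoint: GET /search?q=term.
// The handler is stateless; everything it needs is in the request URI.
public class SearchEndpoint {

    public static HttpServer start(int port) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/search", exchange -> {
            String query = exchange.getRequestURI().getQuery(); // e.g. "q=brca"
            byte[] body = ("results for: " + query).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) { os.write(body); }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws Exception {
        HttpServer s = start(0); // port 0 = pick a free port
        System.out.println("listening on " + s.getAddress());
        s.stop(0);
    }
}
```

A real implementation would plug the Lucene searcher in behind the handler; the point here is only the addressable, stateless interface.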

 * http://www.nbic.nl/about-nbic/affiliated-organisations/cwa/introduction/
 * http://esw.w3.org/images/c/c0/HCLSIG$$SWANSIOC$$Actions$$RhetoricalStructure$$meetings$$20100215$cwa-anatomy-nanopub-v3.pdf
 * [http://www.w3.org/TR/xhtml-rdfa-primer/ RDFa]: "''In this paper we explore the extra components that would need to be available to reinforce the value of a statement to the point where it could in itself be considered a publication.''" - nano-publication.
 * "''[http://www.nbic.nl/about-nbic/affiliated-organisations/cwa/introduction/ The Concept Web Alliance (CWA) is a non-profit organization whose mission is “to enable an open collaborative environment to jointly address the challenges associated with high volume scholarly and professional data production, storage, interoperability and analyses for knowledge discovery”].''"
 * [http://www.nbic.nl/about-nbic/affiliated-organisations/cwa/declaration/ A declaration]

==== ...and a more analytic perspective on each aspect of the so-called "core model": ====
 * "''Our core model addresses some key requirements that stem from existing publication practices and the need to aggregate information from distributed sources. Similar to standard scientific publications, nano-publications need to be citable, attributable, and reviewable. Furthermore, they need to be easily curated. Nano-publications must be easily aggregated and identified across the Web. Finally, they need to be extensible to cater for new forms of both metadata and description.''"
   * '''"Aggregate information from distributed sources" - this is really important. The key is not to create more and more resources out of the existing ones, but to provide an efficient and accurate serving/presentation of the existing valid ones: use standards that can actually point/refer to the core of your actual search, so that the information is distributed in an organized and consistent manner.'''
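
The requirements quoted above (citable, attributable, aggregatable) can be illustrated with a toy data structure: one assertion expressed as a triple, bundled with minimal provenance. This is a sketch of the idea only; the class, fields, and URIs are made up and do not follow any CWA or nano-publication schema.

```java
import java.time.LocalDate;

// Toy model of a nano-publication: a single statement (triple) plus the
// provenance that makes it attributable (author) and citable (date).
public class NanoPublication {
    final String subject, predicate, object; // the assertion as a triple
    final String author;                     // attribution
    final LocalDate date;                    // citability

    NanoPublication(String s, String p, String o, String author, LocalDate date) {
        this.subject = s; this.predicate = p; this.object = o;
        this.author = author; this.date = date;
    }

    // Serialize the assertion as an N-Triples style line, so that many
    // nano-publications can be aggregated into one RDF graph.
    String asTriple() {
        return "<" + subject + "> <" + predicate + "> <" + object + "> .";
    }

    public static void main(String[] args) {
        NanoPublication np = new NanoPublication(
            "http://example.org/gene/BRCA1",
            "http://example.org/associatedWith",
            "http://example.org/disease/BreastCancer",
            "despoina", LocalDate.of(2010, 4, 29));
        System.out.println(np.asTriple());
    }
}
```

Aggregation across distributed sources then reduces to concatenating such triples and loading them into any RDF store.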

==== links ====
http://www2005.org/cdrom/docs/p613.pdf

http://4store.org/

http://tagora.ecs.soton.ac.uk/eswc2009/

http://wiki.dbpedia.org/Downloads351

 * ''"Numerous scientists have pointed out the irony that right at the historical moment when we have the technologies to permit worldwide availability and distributed process of scientific data, broadening collaboration and accelerating the pace and depth of discovery... we are busy locking up that data and preventing the use of correspondingly advanced technologies on knowledge"''
 * http://sciencecommons.org/

=== STEP 1: Create a Lucene index (command line) using the Molgenis CSV extract for database hvp_pilot ===
 1. New test CSV class: call CSV export (/Users/despoina/Documents/workspace/hvp_pilot/handwritten/java/plugins/test_csv.java)
 1. Done. File is in the Molgenis CSV export directory: /private/var/folders/to/toww8wCyG3a88-qsfLyIV++++TI/-Tmp-/
   1. The CSV export in Molgenis does not export every single valid quantity of information, such as individual columns - just the whole database.
 1. In a command prompt, cd to the Lucene directory and try the following (after you have copied your CSV file into a directory here, _syndrome_book_data_):
   1. $ java org.apache.lucene.demo.!IndexFiles _syndrome_book_data_/

 * Now you can search your index by typing:

 1. $ java org.apache.lucene.demo.!SearchFiles
   1. Example search: glycoprotein
   1. OK, there it is, in a single file...
   1. CUSTOMIZING Lucene: more output... [http://lucene.apache.org/java/3_0_1/queryparsersyntax.html Fuzzy Searches]
     * http://lucene.apache.org/java/3_0_1/gettingstarted.html
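
The fuzzy-search feature noted above (`term~` in Lucene's query parser) matches terms within a small edit distance of the query. As a stdlib-only illustration of the matching idea behind it (this is not Lucene code), a minimal Levenshtein distance:

```java
// Levenshtein edit distance: the number of single-character insertions,
// deletions, and substitutions needed to turn one string into another.
// Fuzzy query matching accepts indexed terms whose distance is small.
public class FuzzyMatch {

    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1,  // insertion
                                            prev[j] + 1),     // deletion
                                   prev[j - 1] + cost);       // substitution
            }
            int[] t = prev; prev = curr; curr = t;            // reuse rows
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // A misspelled query still lands close to the indexed term.
        System.out.println(levenshtein("glycoprotien", "glycoprotein")); // prints 2
    }
}
```

So a query like `glycoprotien~` can still find documents containing "glycoprotein": the two spellings are only two edits apart.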