Difference between revisions of "Archive"

Revision as of 15:57, 25 September 2013

Matthias Troyer came to Poly to discuss his use of VisTrails and the problems he was facing with the persistence package.

Fernando, Juliana, Matthias and Remi met on 2013-09-24.

Summary

Persistence only used as a cache

Can’t delete stuff; he deletes and recreates the whole store
He wants to use it to archive correct result, without the other intermediate files that resulted from bogus workflow OR module code
He wants to be able to find his files afterwards. Git revision hash + file reference = impracticable
He doesn’t mind filenames being unreadable if he has some way of finding these from metadata (workflow name, vistrail query, or custom metadata from module code)

Conclusions:

Drop git. If we are only going to use it for storage, and keep a separate database to map from ref uuid/upstream hash to object hash, commits and branches are useless (and are a nuisance because we can’t rewrite history)
Use a flat object store with hashes (like git’s)
Use a database to associate hash filename (upstream modules hash?) to metadata: vistrails parameter, (execution info), custom metadata
Request: make this separate from VisTrails (and used by the new archive package) so that it can be used directly by other code, and used to find files in the store

Which database?

MongoDB

MongoDB was suggested by Fernando. Its role is precisely to store key-value pairs associated to an id.

Cons:

MongoDB required
Reliability?
An index would need to be created for each attribute

find({'key1': 'value1', 'key2': 'value2'})

SQL

This would required a filename/key/value table (or even, one per value type). JOINing could be painful.

Cons:

Join on key/value table
Values with different types will need different tables?

SELECT a.filehash, a.value AS value1, b.value AS value2
FROM files a
INNER JOIN files b ON a.filehash = b.filehash AND b.key = 'key2' AND b.value = 'value2'
WHERE a.key = 'key1' AND a.value = 'value1'
GROUP BY filehash

(see jsfiddle)

PostgreSQL's hstore type

http://www.postgresql.org/docs/9.3/static/hstore.html

hstore is a key-value store as a single value (i.e. in a column).

SELECT * FROM files WHERE metadata @> '"key1"=>"value1","key2"=>"value2"'::hstore;

@@ Line 41: / Line 41: @@
   WHERE a.key = 'key1' AND a.value = 'value1'
   GROUP BY filehash
+(see [http://sqlfiddle.com/#!2/73024/1/0 jsfiddle])
 === PostgreSQL's hstore type ===

Difference between revisions of "Archive"

Revision as of 15:57, 25 September 2013

Contents

Summary

Which database?

MongoDB

SQL

PostgreSQL's hstore type

Navigation menu

Search