addedValues Plugin

...Powerful Free! Database Expansion for Manila

Metadata rewrite - Index Routines

Design principles

  1. Indexes are one-dimensional tables of native values, sorted by value.
  2. Indexes are optimised for the questions to be asked. A query never goes to the message/gem/shortcut itself: if a property isn't indexed, it cannot be the object of a query. The tailored scripts for a metadata variable are rewritten whenever indexing is turned on or off.
  3. All table entry names use the address of the indexed entity (message, gem, etc.) with the root file name stripped off. Multiple values are handled by qualifying that name.

Keeping the table sorted by value has important consequences for the types of search we can do efficiently. Since Frontier does the sorting automatically, and much faster than we can in UserTalk with any conceivable algorithm, we bypass the great weakness of ASE, with the limitation that it only works for data sets of limited size. As I mentioned, this covers how most people use metadata anyway. The trick is to retain indexing by ASE as an option if the problem warrants it.
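To make the "sorted by value" idea concrete, here is a minimal Python model (Python rather than UserTalk, purely so it can be run anywhere). The class `ValueSortedIndex`, its methods and the entry names are all made up for illustration. In Frontier the table keeps the order itself; here `bisect` stands in for that.

```python
import bisect


class ValueSortedIndex:
    """Toy model of one index: entry names paired with native values, ordered by value."""

    def __init__(self):
        self.values = []  # native values, always ascending
        self.names = []   # entry name (short address) at the same position

    def put(self, name, value):
        """Add or replace the entry for `name`, keeping value order."""
        self.remove(name)
        pos = bisect.bisect_right(self.values, value)
        self.values.insert(pos, value)
        self.names.insert(pos, name)

    def remove(self, name):
        """Drop the entry for `name` if it is present."""
        if name in self.names:
            pos = self.names.index(name)
            del self.values[pos]
            del self.names[pos]


index = ValueSortedIndex()
index.put("discussion.msg0003", 42)
index.put("discussion.msg0001", 7)
index.put("discussion.msg0002", 19)
index.put("discussion.msg0003", 5)  # re-indexing replaces the old entry
```

Whatever order the entries arrive in, reading `index.values` gives them back in ascending order, and the name at the same position says which entity holds each value.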

Using the address as the table entry name permits retrieving indexed values without accessing the message table (or gem, or whatever), which makes better use of Frontier's awful memory routines. It also means that when a search is performed, the hits give us the address of the entity containing the value. Coercing a string to an address is another fast operation.

Scalars are simply indexed as (short address, native value) entries.

Multiple values (which, by the way, are stored as tables in the message variable) are indexed as a series of (sequence number <tab> short address, value) entries; the name resembles XML table names. Extending this allows multiple values to be indexed in different ways, so now we have ordered multiple values. I haven't thought this all the way through yet.
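The naming scheme for scalar and multiple-value entries can be sketched in Python as below. This is only an illustration: the function names are hypothetical, and numbering the sequence from 1 without padding is my assumption, since the notes do not fix the exact format.

```python
def index_entries(short_address, value):
    """Return (entry name, native value) pairs for one property of one entity."""
    if isinstance(value, (list, tuple)):
        # Multiple values: qualify the name with a sequence number and a tab.
        # Starting at 1 is an assumption, not something the notes specify.
        return [(f"{seq}\t{short_address}", item)
                for seq, item in enumerate(value, start=1)]
    # Scalars and short strings: the short address alone is the entry name.
    return [(short_address, value)]


def address_of(entry_name):
    """Recover the short address from either kind of entry name."""
    return entry_name.split("\t")[-1]


scalar = index_entries("discussion.msg0001", 42)
multi = index_entries("discussion.msg0001", ["red", "green"])
```

Because the short address is always the last tab-separated part of the name, a search hit on either kind of entry leads back to the same entity.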

Short strings (aka identifiers) are stored in the same way as scalars, with a limit of 63. Is there a magic number for Frontier?

All access to indices is through tailored scripts, which are generated at variable define time and kept in a methods table. These scripts are very simple and contain as few conditionals as possible. For example, every variable will have its own unique get script at infoTbl.xxx.methods.get:

 on get(adrMsg)   // for property xxx
    try
        return adrMsg^.plugin.metadata.xxx
    else
        return <literal default value>

If the value is indexed, access to the message is avoided (for memory optimisation reasons) and we use instead:

 on get(adrMsg)   // for property xxx
    try
        return myManilawebsite.["#plugins"].data.metadata.indices.[string.popFileFromAddress(adrMsg)]
    else
        return <literal default value>
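The point of the two get variants above is that the choice between them is made once, when the variable is defined, and not on every call. A rough Python model of that generation step follows. The function `make_get`, the dictionaries standing in for the message store and the index, and the property names are all hypothetical.

```python
def make_get(prop, default, messages, index=None):
    """Return the get routine for `prop`; the branch is chosen once, at define time."""
    if index is not None:
        def get(address):
            # Indexed: answer from the index without touching the message store.
            return index.get(address, default)
    else:
        def get(address):
            # Not indexed: read the value from the message's own metadata.
            return messages.get(address, {}).get(prop, default)
    return get


messages = {"discussion.msg0001": {"colour": "red"}}
colour_index = {"discussion.msg0001": "red"}

# One tailored routine per variable, kept in a methods table.
methods = {
    "colour": {"get": make_get("colour", "none", messages, colour_index)},
    "size": {"get": make_get("size", 0, messages)},
}
```

Each generated routine contains no "is this indexed?" test of its own, which is what keeps the scripts short. Turning indexing on or off just means generating the routine again.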

Note to self: I may have to write my own string.popFileFromAddress for speed, as it isn't kernelised, for example

 string.delete(string(adrMsg), 1, myManilawebsite.["#plugins"].data.metadata.constants.filePrefixLen)

because I can compute the value of filePrefixLen once, at root open time, and store it.
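The idea of measuring the file prefix once and then stripping by length can be modelled in Python like this. It is only a sketch: the function names and the address format in the example are invented, and real Frontier address strings will look different.

```python
def make_short_address(root_prefix):
    """Measure the root file prefix once and return a stripper that reuses the length."""
    prefix_len = len(root_prefix)  # computed once, e.g. when the root is opened

    def short_address(full_address):
        # Drop the leading root-file part; the address itself is never parsed.
        return full_address[prefix_len:]

    return short_address


# Hypothetical address layout, for illustration only.
strip = make_short_address("[mysite.root].")
short = strip("[mysite.root].discussion.msg0001")
```

The stripper does no searching inside the string, so its cost does not depend on how deeply the entity is nested. It does assume every address passed in really starts with the same root prefix.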

The put script (set, below) also builds the index if the variable is indexed. Because the table is sorted by value, the ordering is done by Frontier.

 on set(adrMsg, value)   // simple scalar case
    try {delete(@myManilawebsite.["#plugins"].data.metadata.indices.[string.popFileFromAddress(adrMsg)])}   // de-index code, absent if not indexed
    adrMsg^.plugin.metadata.xxx = coerceTo(value, <type>)
    myManilawebsite.["#plugins"].data.metadata.indices.[string.popFileFromAddress(adrMsg)] = value   // add to index, absent if not indexed
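A Python model of the same set routine, generated so that the index upkeep lines are simply absent for a non-indexed variable. The name `make_set` and the dictionary stand-ins are hypothetical. One deliberate difference, which is my assumption: the model stores the coerced value in the index, in line with the "native values" principle, whereas the script above stores `value` as passed.

```python
def make_set(prop, coerce, messages, index=None):
    """Return the set routine for `prop`; index upkeep is included only if indexed."""
    if index is None:
        def set_value(address, value):
            messages.setdefault(address, {})[prop] = coerce(value)
    else:
        def set_value(address, value):
            index.pop(address, None)  # de-index the old value, if any
            native = coerce(value)
            messages.setdefault(address, {})[prop] = native
            index[address] = native   # add the new value to the index
    return set_value


messages = {}
age_index = {}
set_age = make_set("age", int, messages, age_index)
set_age("discussion.msg0001", "41")
set_age("discussion.msg0001", "42")  # replaces both the stored and the indexed value
```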

Search Operations

All operations are based on binary chops of the flat table. There are lots of cases, but again they are all handled by tailored scripts in the methods table.

Booleans

The basic idea is a binary chop to the next highest entry after the sought value (ix).

Operators supported

Note that all of this works for multiple-value indexes too.

Integer/Decimal/Date

The basic idea is a binary chop to the next highest entry after the sought value (ix).

Operators supported

Note that all of this works for multiple-value indexes too.
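The chop to ix can be sketched in Python as follows, using two parallel lists as a stand-in for the value-sorted table. The three query helpers are my own examples of what could be answered once ix is known; the notes do not list the supported operators, so treat them as assumptions and not as the plugin's actual operator set.

```python
def chop(values, sought):
    """Binary chop: return ix, the position of the first entry greater than `sought`."""
    lo, hi = 0, len(values)
    while lo < hi:
        mid = (lo + hi) // 2
        if values[mid] <= sought:
            lo = mid + 1
        else:
            hi = mid
    return lo


def equal_to(values, names, sought):
    """Entries equal to `sought` sit immediately before ix."""
    ix = chop(values, sought)
    first = ix
    while first > 0 and values[first - 1] == sought:
        first -= 1
    return names[first:ix]


def greater_than(values, names, sought):
    """Everything from ix onwards is greater than `sought`."""
    return names[chop(values, sought):]


def at_most(values, names, sought):
    """Everything before ix is less than or equal to `sought`."""
    return names[:chop(values, sought)]


values = [5, 7, 7, 19, 42]          # the index, already sorted by value
names = ["a", "b", "c", "d", "e"]   # hypothetical short addresses
```

With this data, `chop(values, 7)` lands on position 3, so the two entries equal to 7 are the ones just before it and everything from there on is greater.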

Strings

Operators supported

Needs work.

Text

Word index: stored as a nested table to limit table size, keyed by first character, then second character, then the word, which has the value of the key.

Operators supported

Needs work.
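As a starting point for the nested word index described under Text, here is a small Python sketch. It rests on assumptions of mine: that the leaf for each word holds the keys (short addresses) of the entities containing it, and that a one-letter word is filed under an empty second character. The function names are hypothetical.

```python
def add_word(word_index, word, key):
    """File `key` under first character, then second character, then the word."""
    if not word:
        return
    first = word[0]
    second = word[1] if len(word) > 1 else ""  # one-letter words: assumption
    bucket = word_index.setdefault(first, {}).setdefault(second, {})
    bucket.setdefault(word, []).append(key)


def lookup(word_index, word):
    """Return the keys filed under `word`, or an empty list if it is not indexed."""
    if not word:
        return []
    first = word[0]
    second = word[1] if len(word) > 1 else ""
    return word_index.get(first, {}).get(second, {}).get(word, [])


word_index = {}
add_word(word_index, "zebra", "discussion.msg0001")
add_word(word_index, "zoo", "discussion.msg0002")
add_word(word_index, "zoo", "discussion.msg0003")
add_word(word_index, "a", "discussion.msg0003")
```

Splitting on the first two characters keeps each individual table small, which is the size-limiting effect the nesting is meant to give.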