MongoDB Strategy - 记往开来

1. Two-phase commits
2. Concurrency control
3. Model

1 Two-phase commits

Two-phase commits are used to update multiple documents as a pseudo-atomic operation. The principle of two-phase commits is to create temporary, intermediate records that support rollback operations.
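The idea can be sketched in plain JavaScript. This is only an in-memory illustration, not MongoDB code: the two Maps stand in for collections, and the helper names (createTransaction, applyTo, clearPending, commit, rollback) and the state names are assumptions made for this example.

```javascript
// In-memory stand-ins for two collections (assumption: a real system would use MongoDB updates).
const accounts = new Map([
  ["A", { _id: "A", balance: 100, pendingTransactions: [] }],
  ["B", { _id: "B", balance: 20, pendingTransactions: [] }],
]);
const transactions = new Map();

// Record the intent in a temporary transaction document before touching any account.
function createTransaction(id, source, destination, value) {
  transactions.set(id, { _id: id, source, destination, value, state: "initial" });
}

// Apply the transaction to one account only if it has not been applied yet,
// so that re-running this step after a crash does not apply it twice.
function applyTo(accountId, txId, delta) {
  const account = accounts.get(accountId);
  if (account.pendingTransactions.includes(txId)) return;
  account.balance += delta;
  account.pendingTransactions.push(txId);
}

function clearPending(accountId, txId) {
  const account = accounts.get(accountId);
  account.pendingTransactions = account.pendingTransactions.filter((t) => t !== txId);
}

function commit(txId) {
  const tx = transactions.get(txId);
  tx.state = "pending";
  applyTo(tx.source, txId, -tx.value);
  applyTo(tx.destination, txId, tx.value);
  tx.state = "applied";
  clearPending(tx.source, txId);
  clearPending(tx.destination, txId);
  tx.state = "done";
}

// Roll back a transaction that is still "pending": undo whichever side was already applied.
function rollback(txId) {
  const tx = transactions.get(txId);
  tx.state = "canceling";
  for (const [accountId, delta] of [[tx.source, tx.value], [tx.destination, -tx.value]]) {
    if (accounts.get(accountId).pendingTransactions.includes(txId)) {
      accounts.get(accountId).balance += delta;
      clearPending(accountId, txId);
    }
  }
  tx.state = "canceled";
}
```

The pendingTransactions array is what makes a rollback possible: it records which accounts a half-finished transaction has already touched.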

2 Concurrency control

The first method is to add a label field to the document that indicates which application is currently accessing it.

The second method is to use the old value of the fields being updated as part of the query, which ensures that the target fields of the document have not been changed since they were read.

// Read the document, then update it only if `value` is still what was read.
var myDoc = db.COLLECTION.findOne(condition);
if (myDoc) {
    var oldValue = myDoc.value;
    var results = db.COLLECTION.update(
        {
            _id: myDoc._id,
            value: oldValue  // matches only if nobody changed it since the read
        },
        {
            $inc: { value: -fee }
        }
    );
}

Another option: if you want to deduct an amount of money from an account and the remaining balance must not become negative, you can use the following command:

db.COLLECTION.update({_id: id, value: {$gte: fee}}, {$inc:{value: -fee}})

This means the account is updated only if its value is at least fee.

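The third method is to add a version field to the document. Each update includes the version that was read in its query and increments it, so an update based on a stale read matches nothing.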
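A minimal in-memory sketch of the version check, assuming a hypothetical helper updateIfVersionMatches and a Map standing in for the collection. In MongoDB the same check would be expressed as a query on _id and version combined with $inc on version.

```javascript
// In-memory stand-in for a collection (assumption for illustration only).
const store = new Map([["acct1", { _id: "acct1", value: 100, version: 1 }]]);

// Hypothetical helper: apply `changes` only if the stored version equals the version that was read.
function updateIfVersionMatches(id, readVersion, changes) {
  const doc = store.get(id);
  if (!doc || doc.version !== readVersion) return false; // someone else wrote first
  Object.assign(doc, changes, { version: doc.version + 1 });
  return true;
}

// Two clients read the same version, then both try to write.
const readByClientA = { ...store.get("acct1") };
const readByClientB = { ...store.get("acct1") };
const firstWrite = updateIfVersionMatches("acct1", readByClientA.version, { value: 90 });
const secondWrite = updateIfVersionMatches("acct1", readByClientB.version, { value: 80 });
```

The second write is rejected because the first one already bumped the version; the losing client has to re-read and retry.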

3 Model

3.1 Relationships between documents

3.1.1 One-to-One Embedded

Example: patron and address. User information should be kept together, such as the account info (password, email) and application-related info (account balance).

3.1.2 One-to-Many Embedded/References

Embedding is used for one-to-few relations, such as the authors of a book. One can use an array in a document to model such relations. However, if the embedded elements take only a few distinct values that are repeated across many documents, it is better to save them in separate documents and use references; for example, the publishers of books. Another principle is: DO NOT create large arrays in a document. Large arrays in a document are inefficient for three reasons:

When the document size increases, MongoDB may move the document, which leads to rewriting the entire document.
Indexing the elements of a large array is inefficient.
Querying a large document for a small part of an array is inconvenient.
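The two modelling choices can be shown side by side with plain objects. The titles, names, and the publisher_id field below are made-up sample data, and withPublisher is a hypothetical helper that plays the role of a second query.

```javascript
// Publishers live in their own collection because many books share the same publisher.
const publishers = new Map([
  ["press1", { _id: "press1", name: "Example Press", location: "CA" }],
]);

const books = [
  {
    _id: 1,
    title: "Book One",
    authors: ["Author A", "Author B"], // one-to-few: embedded as a small array
    publisher_id: "press1",            // shared value: stored as a reference
  },
  { _id: 2, title: "Book Two", authors: ["Author C"], publisher_id: "press1" },
];

// Resolving a reference costs an extra lookup.
function withPublisher(book) {
  return { ...book, publisher: publishers.get(book.publisher_id) };
}

const resolved = books.map(withPublisher);
```

The publisher details are stored once, so changing them means updating one document instead of every book.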

3.2 Tree Structures

3.2.1 References

Parent Refs
Save each node as a document that contains a reference to its parent.
Child Refs
Save each node as a document that contains an array of references to its child nodes.
Extension
It is also useful to store additional relations, such as the ancestors of each node, in the document.
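A small in-memory sketch of parent references and the ancestors extension. The category names are sample data, and ancestorsOf is a hypothetical helper, not a MongoDB function.

```javascript
// Each node stores a reference to its parent (null for the root).
const nodes = [
  { _id: "Books", parent: null },
  { _id: "Programming", parent: "Books" },
  { _id: "Databases", parent: "Programming" },
  { _id: "MongoDB", parent: "Databases" },
];
const byId = new Map(nodes.map((n) => [n._id, n]));

// With parent references only, finding ancestors needs one lookup per level.
function ancestorsOf(id) {
  const result = [];
  let current = byId.get(id);
  while (current && current.parent !== null) {
    result.push(current.parent);
    current = byId.get(current.parent);
  }
  return result; // nearest ancestor first
}

// Storing the ancestors in each node (root first) trades extra writes for single-lookup reads.
const withAncestors = nodes.map((n) => ({ ...n, ancestors: ancestorsOf(n._id).reverse() }));

// Direct children of a node: every node whose parent is that node.
const childrenOfProgramming = nodes
  .filter((n) => n.parent === "Programming")
  .map((n) => n._id);
```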

Another way to accelerate tree searches is to materialize paths. This method adds a path attribute that describes the path to the node as a string. Searching such strings requires regular expressions, but it is faster than the previous solution.
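A sketch of the materialized-path lookup in plain JavaScript. The comma-delimited path format and the descendantsOf helper are assumptions for this example, and node ids are assumed not to contain regular-expression metacharacters.

```javascript
// Each node stores the ids of its ancestors as a delimited string (null for the root).
const tree = [
  { _id: "Books", path: null },
  { _id: "Programming", path: ",Books," },
  { _id: "Databases", path: ",Books,Programming," },
  { _id: "Languages", path: ",Books,Programming," },
  { _id: "MongoDB", path: ",Books,Programming,Databases," },
];

// A node is a descendant of `id` when ",id," appears anywhere in its path.
function descendantsOf(id) {
  const pattern = new RegExp("," + id + ",");
  return tree
    .filter((n) => n.path !== null && pattern.test(n.path))
    .map((n) => n._id);
}
```

The delimiters on both sides of each id keep "Program" from matching "Programming".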

3.2.2 Nested Sets

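This method is tricky. Each node stores a left and a right value, assigned while walking the tree down and back up, so that the values of a node's descendants fall between the node's own left and right values.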

db.categories.insert( { _id: "Books", parent: 0, left: 1, right: 12 } )
db.categories.insert( { _id: "Programming", parent: "Books", left: 2, right: 11 } )
db.categories.insert( { _id: "Languages", parent: "Programming", left: 3, right: 4 } )
db.categories.insert( { _id: "Databases", parent: "Programming", left: 5, right: 10 } )
db.categories.insert( { _id: "MongoDB", parent: "Databases", left: 6, right: 7 } )
db.categories.insert( { _id: "dbm", parent: "Databases", left: 8, right: 9 } )

var databaseCategory = db.categories.findOne( { _id: "Databases" } );
db.categories.find( { left: { $gt: databaseCategory.left }, right: { $lt: databaseCategory.right } } )

One can retrieve all descendants of a node by using its left and right values.
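To see where the left and right numbers come from, the sketch below assigns them with a depth-first walk over the same category tree and then repeats the descendant query in memory. The children table and the helper names are assumptions for this example.

```javascript
// Child lists for the category tree used in the inserts above.
const children = {
  Books: ["Programming"],
  Programming: ["Languages", "Databases"],
  Languages: [],
  Databases: ["MongoDB", "dbm"],
  MongoDB: [],
  dbm: [],
};

// Number each node on the way down (left) and again on the way back up (right).
function assignNestedSets(rootId) {
  const result = {};
  let counter = 0;
  function visit(id) {
    const left = ++counter;
    for (const child of children[id]) visit(child);
    result[id] = { left, right: ++counter };
  }
  visit(rootId);
  return result;
}

const sets = assignNestedSets("Books");

// Same condition as the find() query: left greater than, right less than.
function descendantsOf(id) {
  const { left, right } = sets[id];
  return Object.keys(sets).filter((k) => sets[k].left > left && sets[k].right < right);
}
```

Reads are cheap, but inserting a node means renumbering every node that comes after it in the walk, which is why this layout suits trees that rarely change.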

3.3 Index

One of the main benefits of indexing is sorting: the application can quickly find matching documents by traversing the ordered index entries.

Single field index
Compound key index
Note that if one creates a compound index with
```
db.COLLECTION.ensureIndex({key1: 1, key2: -1})
```
it can support sorting by {key1: 1, key2: -1} or {key1: -1, key2: 1}, but it cannot support sorting by {key1: 1, key2: 1}.
Multi-key index
It refers to indexing an array field, with one index entry per array element. A compound index cannot include two array fields.
Hashed index
It computes the hash of the entire field value and collapses embedded sub-documents. It is not compatible with multi-key indexes.
Geospatial index
This index is used to index geospatial data, such as points, lines, and shapes.
Text index
Supports text search with language-specific stop words, but uses a very large amount of space.
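The sort-direction rule for compound indexes noted above can be written as a small check. This is a simplified sketch: indexSupportsSort is a hypothetical helper, and it only compares a sort against the full key list of the index, ignoring prefixes and any interaction with query conditions.

```javascript
// Returns true when `sort` uses the same keys in the same order as `index`,
// with every direction identical or every direction reversed.
function indexSupportsSort(index, sort) {
  const indexKeys = Object.keys(index);
  const sortKeys = Object.keys(sort);
  if (indexKeys.length !== sortKeys.length) return false;
  if (!indexKeys.every((k, i) => k === sortKeys[i])) return false;
  const same = indexKeys.every((k) => index[k] === sort[k]);
  const reversed = indexKeys.every((k) => index[k] === -sort[k]);
  return same || reversed;
}

const index = { key1: 1, key2: -1 };
const forward = indexSupportsSort(index, { key1: 1, key2: -1 });
const backward = indexSupportsSort(index, { key1: -1, key2: 1 });
const mixed = indexSupportsSort(index, { key1: 1, key2: 1 });
```

An index can be walked in either direction, which is why the fully reversed sort works while a sort that flips only one key does not.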

Indexes also support properties such as:

TTL index, which is used to automatically remove outdated documents.
Unique index, which rejects duplicate values of the indexed field across all documents. Open question: can it be used to ensure that the elements of an array are unique within one document (while still allowing repeats across different documents)?
Sparse index, in which documents without the field are not indexed. By default (without sparse), these documents are indexed with a null value.

3.4 Others

Atomic operation
update, findAndModify, and remove are atomic on a single document. Putting related fields in one document ensures that the write operations on them are atomic.
Support keyword search
Putting strings in an array and then creating a multi-key index on it enables keyword search.
Keyword search cannot provide NLP/IE functions, such as stemming, synonyms, and ranking, which are usually required by a search engine.
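The keyword-array approach can be imitated in memory. In this sketch a Map plays the role of the multi-key index, with one entry per array element; the documents and the findByKeyword helper are made up for illustration.

```javascript
const docs = [
  { _id: 1, title: "Intro to MongoDB", keywords: ["mongodb", "database", "nosql"] },
  { _id: 2, title: "SQL Basics", keywords: ["sql", "database"] },
];

// One index entry per array element, pointing back to the documents that contain it.
const keywordIndex = new Map();
for (const doc of docs) {
  for (const keyword of doc.keywords) {
    if (!keywordIndex.has(keyword)) keywordIndex.set(keyword, []);
    keywordIndex.get(keyword).push(doc._id);
  }
}

// Exact-match lookup only: there is no stemming, synonym handling, or ranking.
function findByKeyword(keyword) {
  return keywordIndex.get(keyword) || [];
}

const exact = findByKeyword("database");
const plural = findByKeyword("databases");
```

The plural form finds nothing, which is the missing stemming mentioned above.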
Document limit
- Field names cannot start with $, cannot contain ., and cannot contain the null character.
- A field value must not exceed the maximum index key length if it is to be indexed.
- Maximum document size is 16 MB. GridFS supports larger files by splitting each file into chunks that are stored as separate documents.
- The _id field may be of any BSON data type except an array. It is an ObjectId by default.
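The field-name rules listed above are easy to check before writing a document. isValidFieldName is a hypothetical helper written for this note, not part of any MongoDB driver, and it covers only the three rules in the list.

```javascript
// Checks the three field-name rules: no leading "$", no ".", no null character.
function isValidFieldName(name) {
  return !name.startsWith("$") && !name.includes(".") && !name.includes("\0");
}

const candidates = ["balance", "$balance", "account.balance", "bal\0ance"];
const accepted = candidates.filter(isValidFieldName);
```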
DBRef
DBRefs are a convention for representing a reference to a document rather than a specific reference type. A DBRef contains the name of the collection, the _id of the referenced document, and optionally the database name.
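As a plain object, a DBRef uses the fields $ref (collection name), $id (the referenced _id), and optionally $db. The sketch below resolves one by hand in memory; the collections object, the sample data, and resolveDBRef are assumptions for illustration.

```javascript
// In-memory stand-in for a database with one collection.
const collections = {
  publishers: new Map([["p1", { _id: "p1", name: "Example Press" }]]),
};

const book = {
  _id: "b1",
  title: "Book One",
  publisher: { $ref: "publishers", $id: "p1" }, // DBRef-shaped reference
};

// Hypothetical resolver: look up the collection named by $ref, then the document with _id $id.
function resolveDBRef(ref) {
  const collection = collections[ref.$ref];
  if (!collection) return null;
  return collection.get(ref.$id) || null;
}

const publisher = resolveDBRef(book.publisher);
```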
