MongoDB Strategy
1 Two-phase commits
A two-phase commit is used to update multiple documents as a pseudo-atomic operation. The principle is to create temporary, intermediate records that describe the pending change, so that a partially applied update can be rolled back.
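As a rough illustration of the idea, the sketch below runs a transfer between two accounts in plain JavaScript, with in-memory Maps standing in for collections. All names here (accounts, transactions, transfer, rollback, the state and pendingTransactions fields, and the failAfterDebit switch) are assumptions made for this example and do not come from the notes. Against a real database, each step would be a separate single-document update.

```javascript
// In-memory stand-ins for two collections (assumption for this sketch).
const accounts = new Map([
  ["A", { _id: "A", balance: 100, pendingTransactions: [] }],
  ["B", { _id: "B", balance: 20, pendingTransactions: [] }],
]);
const transactions = new Map();

// Apply one side of the transfer at most once, guarded by the pending marker.
function applyTo(accountId, txnId, delta) {
  const acc = accounts.get(accountId);
  if (acc.pendingTransactions.includes(txnId)) return; // already applied
  acc.balance += delta;
  acc.pendingTransactions.push(txnId);
}

// Undo one side of the transfer, but only if it was actually applied.
function undoOn(accountId, txnId, delta) {
  const acc = accounts.get(accountId);
  const i = acc.pendingTransactions.indexOf(txnId);
  if (i === -1) return; // never applied, nothing to undo
  acc.balance -= delta;
  acc.pendingTransactions.splice(i, 1);
}

function transfer(txnId, source, destination, value, failAfterDebit = false) {
  // Record the intent in an intermediate transaction record.
  const txn = { _id: txnId, source, destination, value, state: "initial" };
  transactions.set(txnId, txn);

  // Phase 1: mark the transaction as pending and apply it to both accounts.
  txn.state = "pending";
  applyTo(source, txnId, -value);
  if (failAfterDebit) return txn; // simulate a crash between the two updates
  applyTo(destination, txnId, value);
  txn.state = "applied";

  // Phase 2: remove the pending markers and mark the transaction as done.
  for (const id of [source, destination]) {
    const acc = accounts.get(id);
    acc.pendingTransactions = acc.pendingTransactions.filter((t) => t !== txnId);
  }
  txn.state = "done";
  return txn;
}

// Roll back a transaction that is stuck in the "pending" state.
function rollback(txnId) {
  const txn = transactions.get(txnId);
  if (txn.state !== "pending") return txn;
  txn.state = "canceling";
  undoOn(txn.source, txnId, -txn.value);
  undoOn(txn.destination, txnId, txn.value);
  txn.state = "canceled";
  return txn;
}

transfer("t1", "A", "B", 30);       // runs to completion
transfer("t2", "A", "B", 10, true); // stops after debiting A
rollback("t2");                     // returns the debit to A
```

The transaction record is what makes recovery possible: any process that finds a record stuck in "pending" knows which accounts to inspect and which amounts to undo.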
2 Concurrency control
The first method is to add a label to the document that indicates which application is currently accessing it.
The second method is to include the old values of the fields being updated in the query, which ensures that the update only applies if the target fields of the document have not been changed in the meantime.
var myDoc = db.COLLECTION.findOne(condition);
if (myDoc) {
    var oldValue = myDoc.value;
    // The update matches only if value is still the one that was read.
    var results = db.COLLECTION.update(
        { _id: myDoc._id, value: oldValue },
        { $inc: { value: -fee } }
    );
}
Another option: if you want to deduct an amount of money from an account and the remaining balance must not become negative, you can use the following command:
db.COLLECTION.update({_id: id, value: {$gte: fee}}, {$inc:{value: -fee}})
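This means the account is updated only if its value is at least the fee.
The third method is to add a version field to the document. Each update includes the version that was read in its query and increments it, so a concurrent change makes the update match nothing.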
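The version-field approach can be sketched in plain JavaScript as follows. The Map is an in-memory stand-in for a collection, and the names (collection, updateIfVersionMatches, version) are assumptions made for this example. In the shell, the corresponding call would look roughly like db.COLLECTION.update({_id: id, version: v}, {$set: {value: newValue}, $inc: {version: 1}}).

```javascript
// In-memory stand-in for a collection (assumption for this sketch).
const collection = new Map([["acc1", { _id: "acc1", value: 100, version: 1 }]]);

// Mimics an update whose query is { _id, version } and whose modifier
// sets the new value and increments the version.
// Returns the number of modified documents (0 or 1).
function updateIfVersionMatches(id, expectedVersion, newValue) {
  const doc = collection.get(id);
  if (!doc || doc.version !== expectedVersion) return 0; // someone else wrote first
  doc.value = newValue;
  doc.version += 1;
  return 1;
}

// Two clients read the same document, then both try to write.
const readByClient1 = { ...collection.get("acc1") };
const readByClient2 = { ...collection.get("acc1") };

const first = updateIfVersionMatches("acc1", readByClient1.version, readByClient1.value - 10);
const second = updateIfVersionMatches("acc1", readByClient2.version, readByClient2.value - 25);
// The second write matches nothing because the version has moved on,
// so client 2 has to re-read the document and retry.
```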
3 Model
3.1 Relationships between documents
3.1.1 One-to-One Embedded
Example: a patron and their address. User information should be kept together in one document, such as account information (password, email) and application-related information (account balance).
3.1.2 One-to-Many Embedded/References
Embedding suits one-to-few relations, such as the authors of a book. One can use an array in a document to model such relations. However, if the embedded elements take only a few distinct values that are shared by many documents, it is better to save them in separate documents and use references. An example is the publishers of books. Another principle is DO NOT create large arrays in a document. Large arrays in a document are inefficient for three reasons:
- When the document size increases, MongoDB may move the document, which leads to rewriting the entire document.
- Indexing the elements of a large array is inefficient.
- Querying a large document for a small part of its array is inconvenient.
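The two modeling choices above can be sketched with plain JavaScript objects. The arrays stand in for collections, and every name and value here (books, publishers, publisher_id, the titles) is made up for this example.

```javascript
// Publishers live in their own collection because many books share few publishers.
const publishers = [{ _id: "example-press", name: "Example Press" }];

// Authors are one-to-few, so they are embedded as a small array.
// The publisher is stored as a reference (its _id) instead of a copy.
const books = [
  { _id: 1, title: "Book One", authors: ["Author A", "Author B"], publisher_id: "example-press" },
  { _id: 2, title: "Book Two", authors: ["Author C"], publisher_id: "example-press" },
];

// Resolve the reference with a second lookup, as an application would.
function publisherOf(book) {
  return publishers.find((p) => p._id === book.publisher_id) || null;
}

// Find the titles of all books of one publisher through the reference.
function titlesOf(publisherId) {
  return books.filter((b) => b.publisher_id === publisherId).map((b) => b.title);
}
```

With the reference, the publisher's data is stored once, at the cost of a second lookup when a book and its publisher are needed together.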
3.2 Tree Structures
3.2.1 References
- Parent references
Save each node as a document that contains a reference to its parent.
- Child references
Save each node as a document that contains an array of references to its child nodes.
- Extension
It is also useful to add further relations, such as the ancestors of a node, to the document.
Another way to speed up searching a tree is to materialize paths. This method adds a path attribute that describes the path to each node as a string. Searching such strings requires regular expressions, but is faster than the previous solutions.
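A minimal sketch of materialized paths in plain JavaScript follows. The array stands in for a collection, and the comma-delimited path format and the helper names (pathNodes, escapeRegExp, descendantsByPath, childrenByPath) are assumptions chosen for this example.

```javascript
// Each node stores the path from the root down to its parent as a string.
const pathNodes = [
  { _id: "Books", path: null },
  { _id: "Programming", path: ",Books," },
  { _id: "Languages", path: ",Books,Programming," },
  { _id: "Databases", path: ",Books,Programming," },
  { _id: "MongoDB", path: ",Books,Programming,Databases," },
  { _id: "dbm", path: ",Books,Programming,Databases," },
];

// Escape regex metacharacters so a node name is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return the _id of every node whose path matches the pattern.
function matchPath(pattern) {
  return pathNodes
    .filter((n) => n.path !== null && pattern.test(n.path))
    .map((n) => n._id);
}

// All descendants: the node's name appears somewhere in their path.
function descendantsByPath(id) {
  return matchPath(new RegExp("," + escapeRegExp(id) + ","));
}

// Direct children only: the path ends with the node's name.
function childrenByPath(id) {
  return matchPath(new RegExp("," + escapeRegExp(id) + ",$"));
}
```

The delimiters around each name keep a node such as "Data" from matching inside "Databases".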
3.2.2 Nested Sets
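This method is tricky. Each node stores a left and a right value, numbered in the order a depth-first walk enters and leaves the node, so the values indicate how nodes relate: the descendants of a node are the nodes whose left and right values fall between its own.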
db.categories.insert( { _id: "Books", parent: 0, left: 1, right: 12 } )
db.categories.insert( { _id: "Programming", parent: "Books", left: 2, right: 11 } )
db.categories.insert( { _id: "Languages", parent: "Programming", left: 3, right: 4 } )
db.categories.insert( { _id: "Databases", parent: "Programming", left: 5, right: 10 } )
db.categories.insert( { _id: "MongoDB", parent: "Databases", left: 6, right: 7 } )
db.categories.insert( { _id: "dbm", parent: "Databases", left: 8, right: 9 } )

var databaseCategory = db.categories.findOne( { _id: "Databases" } );
db.categories.find( { left: { $gt: databaseCategory.left }, right: { $lt: databaseCategory.right } } )
One can retrieve all descendants of a node by using its left and right indicators.
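To show where the left and right values come from, the sketch below computes them with a depth-first walk and then repeats the descendant query in plain JavaScript. The tree matches the categories above; the helper names (childrenOfNode, numberTree, descendantsByInterval) are assumptions made for this example.

```javascript
// The category tree as parent -> children.
const childrenOfNode = {
  Books: ["Programming"],
  Programming: ["Languages", "Databases"],
  Languages: [],
  Databases: ["MongoDB", "dbm"],
  MongoDB: [],
  dbm: [],
};

// Depth-first numbering: left is assigned on entering a node,
// right on leaving it, from one shared counter.
function numberTree(root) {
  const nodes = [];
  let counter = 0;
  function visit(id, parent) {
    const node = { _id: id, parent, left: ++counter, right: 0 };
    nodes.push(node);
    for (const child of childrenOfNode[id]) visit(child, id);
    node.right = ++counter;
  }
  visit(root, 0);
  return nodes;
}

const numbered = numberTree("Books");

// Same condition as the find() above: left > node.left and right < node.right.
function descendantsByInterval(id) {
  const node = numbered.find((n) => n._id === id);
  return numbered
    .filter((n) => n.left > node.left && n.right < node.right)
    .map((n) => n._id);
}
```

Reads are cheap, but inserting or moving a node means renumbering part of the tree, which is why this layout suits trees that rarely change.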
3.3 Index
One of the main benefits of indexing is sorting: the application can find the matching documents quickly by traversing the ordered index entries.
- Single field index
- Compound key index
Note that if one sets a compound index with db.COLLECTION.ensureIndex({key1: 1, key2: -1}), it can help the sort function using {key1: 1, key2: -1} or {key1: -1, key2: 1}, but cannot help sorting with the key {key1: 1, key2: 1}.
- Multi-key index
It refers to indexing an array field. You cannot use a compound key for two arrays.
- Hashed index
It computes the hash of the entire value of the indexed field, collapsing any sub-document. It is not compatible with multi-key index.
- Geospatial index
This index is used to index geometric data, such as lines, points, and shapes.
- Text index
It supports text search with language-specific stop words, but uses a very large amount of space.
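The sort rule for compound indexes in the list above can be expressed as a small check in plain JavaScript. This is a simplified sketch: the function name indexSupportsSort and the array-of-pairs representation are assumptions for this example, and it only considers sorts on a leading prefix of the index keys.

```javascript
// Returns true if a sort on sortSpec can walk the index indexSpec in order.
// Both are arrays of [field, direction] pairs, with direction 1 or -1.
function indexSupportsSort(indexSpec, sortSpec) {
  if (sortSpec.length === 0 || sortSpec.length > indexSpec.length) return false;
  // The sort keys must be a prefix of the index keys, in the same order.
  const sameFields = sortSpec.every(([field], i) => field === indexSpec[i][0]);
  if (!sameFields) return false;
  // Forward scan: every direction matches the index.
  const forward = sortSpec.every(([, dir], i) => dir === indexSpec[i][1]);
  // Backward scan: every direction is the inverse of the index.
  const backward = sortSpec.every(([, dir], i) => dir === -indexSpec[i][1]);
  return forward || backward;
}

const compoundIndex = [["key1", 1], ["key2", -1]];

const sameOrder = indexSupportsSort(compoundIndex, [["key1", 1], ["key2", -1]]);
const inverted = indexSupportsSort(compoundIndex, [["key1", -1], ["key2", 1]]);
const mixed = indexSupportsSort(compoundIndex, [["key1", 1], ["key2", 1]]);
// sameOrder and inverted are true; mixed is false.
```

An index can be read from either end, so inverting every direction still works, while flipping only one key would require jumping around inside the index.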
Indexes also support properties such as:
- TTL index, which is used to remove outdated documents automatically.
- Unique index, which rejects duplicate values of the indexed field across all documents. ??Can we use it to ensure that the elements of an array are unique within a document (while allowing repeats in different documents)??
- Sparse index, in which documents without the indexed field are not indexed. By default (without the sparse option), these documents are indexed with a null value.
3.4 Others
- Atomic operation
update, findAndModify, and remove are atomic on a single document. Putting related fields in one document therefore ensures that write operations on them are atomic.
- Support keyword search
Putting strings in an array and then creating a multi-key index on it enables keyword search.
Keyword search cannot provide NLP/IE functions, such as stemming, synonyms, and ranking, which are usually required by a search engine.
- Document limit
  - Field names cannot start with $, cannot contain ., and cannot contain the null character.
  - A field value must stay within the maximum index key length to be used in an index.
  - Maximum document size is 16 MB. GridFS supports larger files; each file is represented by a group of documents (chunks) that contain different pieces of its content.
  - The _id field may be any BSON data type except an array. It is an ObjectId by default.
- DBRef
DBRefs are a convention for representing a reference to a document rather than a specific reference type. A DBRef contains the name of the collection and the _id of the referenced document (optional: the database name).
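The DBRef convention can be sketched in plain JavaScript as a sub-document with the fields $ref, $id, and optionally $db. The nested object below is an in-memory stand-in for databases and collections, and the names and values (databases, library, resolveDBRef, "Example Press") are made up for this example.

```javascript
// In-memory stand-in: database name -> collection name -> documents.
const databases = {
  library: {
    publishers: [{ _id: 7, name: "Example Press" }],
    books: [
      // The publisher field holds a DBRef: collection name, _id, database name.
      { _id: 1, title: "Example Book", publisher: { $ref: "publishers", $id: 7, $db: "library" } },
    ],
  },
};

// Resolve a DBRef by hand; currentDb is used when the reference omits $db.
function resolveDBRef(ref, currentDb) {
  const database = databases[ref.$db || currentDb];
  if (!database || !database[ref.$ref]) return null;
  return database[ref.$ref].find((doc) => doc._id === ref.$id) || null;
}

const book = databases.library.books[0];
const publisher = resolveDBRef(book.publisher, "library");
```

Because the reference names its collection, documents in one collection can point to documents in several others, which a bare _id reference cannot express on its own.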