Notes on MongoDB
1 Basic MongoDB Instructions
1.1 Insert
db.COLLECTION.insert(doc/docs)
You can insert one document or a list of documents. When you insert a list of documents, the operation is not atomic. The returned result shows the statistics of the write operations including the write errors and the number of successfully inserted documents.
1.2 Find
db.COLLECTION.find({name: value, 'name1.name2.name3':value/condition})
When a name in the path refers to an array, the condition is checked against every element of that array. Special condition operators:
$in, $ne, $nin, $lt, $gt, $lte, $gte
$and, $or, $not, $nor
$exists, $type
$regex, $text, $where, $mod
$all, $elemMatch, $size
$, $slice
The difference between querying with and without $elemMatch: with $elemMatch, a single array element must match all the conditions, while without $elemMatch, different elements of the array can each match a different part of the conditions.
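The two matching rules can be modelled in plain JavaScript. This is only a sketch of the semantics, not MongoDB code; the scores array and both conditions are hypothetical. In query form, the first rule corresponds to {scores: {$elemMatch: {score: {$gte: 8}, attempt: 1}}} and the second to {'scores.score': {$gte: 8}, 'scores.attempt': 1}.

```javascript
// Plain-JavaScript model of the two matching rules; this is NOT MongoDB code.
// Each condition is a predicate on one array element (both are hypothetical).
const conditions = [
  (e) => e.score >= 8,
  (e) => e.attempt === 1,
];

// With $elemMatch: one single element must satisfy every condition.
function matchWithElemMatch(array, conds) {
  return array.some((e) => conds.every((c) => c(e)));
}

// Without $elemMatch: every condition must be satisfied by some element,
// but not necessarily by the same one.
function matchWithoutElemMatch(array, conds) {
  return conds.every((c) => array.some((e) => c(e)));
}

// Hypothetical array field: no single element has score >= 8 AND attempt === 1.
const scores = [
  { attempt: 1, score: 5 },
  { attempt: 2, score: 9 },
];

console.log(matchWithElemMatch(scores, conditions));    // false
console.log(matchWithoutElemMatch(scores, conditions)); // true
```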
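A query {name: value} can match either an array element or a whole array: when value is a single value or object, it matches documents whose array contains that element; when value is an array, it matches documents whose array equals value exactly, including the element order.
An element of an array can also be addressed by its index:
db.COLLECTION.find({'name.0': value})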
1.3 Select
- get attribute
db.COLLECTION.find(***, {name:1})
- get elements in array
db.COLLECTION.find(***, {'name1.name2':1})
This projection returns all the elements along the path 'name1.name2', including the case where any name in the path refers to an array.
db.COLLECTION.find(***, {name1: {$elemMatch: {name2:value}}})
$elemMatch can be used in the second argument (the projection) to limit which array elements are returned. However, it CANNOT be used on nested arrays, and it returns only the FIRST matching element.
- get elements in nested array
Use the aggregation framework with $unwind. Aggregations are operations that process data records and return computed results; they are written as a pipeline of stages.
You can expand nested arrays with several $unwind stages to get a large number of documents, each containing a single combination of elements from the different levels of the nested arrays. Then match the elements you need and, if required, regroup them. However, rebuilding the original nested array structure from the matched elements after $unwind is very complex. It is simpler to return just the matched elements (for example, the tasks whose nested attributes match) with a $group stage.
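A sketch of such a pipeline, assuming documents shaped like the Projects collection in the forEach example of section 1.4 (an array groups, each with an array tasks, each task with a name). Pipeline stages are plain objects, so the pipeline can be built and inspected without a database.

```javascript
// Field names (groups, tasks, name) are assumed from the Projects example.
const pipeline = [
  { $unwind: '$groups' },                        // one document per group
  { $unwind: '$groups.tasks' },                  // one document per task
  { $match: { 'groups.tasks.name': /update/ } }, // keep only the matching tasks
  // Collect the matching tasks of each project into one flat array.
  { $group: { _id: '$_id', tasks: { $push: '$groups.tasks' } } },
];

// In the mongo shell this would be run as:
//   db.Projects.aggregate(pipeline)
console.log(pipeline.length); // 4
```

Note that the result is a flat list of tasks per project; the grouping of tasks into groups is lost, which is the limitation described above.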
1.4 Update
- update attribute:
db.COLLECTION.update(find condition, {$set: {name: value}})
$inc increments a number, etc. The upsert option inserts the document when no document in the collection matches the find condition.
- update array:
db.COLLECTION.update(find condition, {$push/$pull: {name: value}})
db.students.update(
  { _id: 1 },
  { $push: { scores: {
      $each: [ { attempt: 3, score: 7 }, { attempt: 4, score: 4 } ],
      $sort: { score: 1 },
      $slice: -3
  } } }
)
The command above appends 2 elements, then sorts the whole array by ascending score, then keeps the last 3 elements of the sorted array.
- update element in array:
db.COLLECTION.update({"name1.name2": value}, {$set: {'name1.$.name2': value}})
The positional $ operator only updates the first matching element.
If you want to update multiple elements, use a forEach function.
db.Projects.find().forEach(function (pj) {
  pj.groups.forEach(function (gp) {
    gp.tasks.forEach(function (task) {
      if (task.name.match(/update/)) {
        task.weight = 5;
      }
    });
  });
  db.Projects.save(pj);
});
- update element in nested array
Impossible with a single query. The issue was reported in 2010 and was still open in 2015…
Functional key words:
$currentDate, $inc, $max, $min, $mul, $rename, $setOnInsert, $set, $unset
$, $addToSet, $pop, $pullAll, $pull, $pushAll, $push
$each, $position, $slice, $sort
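The effect of the $push example above with $each, $sort and $slice can be traced with a small plain-JavaScript model. This is a sketch of the semantics only, not MongoDB code; the helper name pushEachSortSlice and the initial content of the scores array are hypothetical.

```javascript
// Plain-JavaScript model of $push with $each, $sort and $slice; NOT MongoDB code.
function pushEachSortSlice(array, each, sortKey, slice) {
  const result = array.concat(each);              // $each: append all new elements
  result.sort((a, b) => a[sortKey] - b[sortKey]); // $sort: { key: 1 } means ascending
  // $slice: a negative n keeps the last |n| elements, otherwise the first n.
  return slice < 0 ? result.slice(slice) : result.slice(0, slice);
}

// Hypothetical initial state of the scores array.
const initialScores = [
  { attempt: 1, score: 10 },
  { attempt: 2, score: 8 },
];

const updatedScores = pushEachSortSlice(
  initialScores,
  [{ attempt: 3, score: 7 }, { attempt: 4, score: 4 }],
  'score',
  -3
);

console.log(updatedScores.map((s) => s.score)); // [ 7, 8, 10 ]
```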
1.5 Remove
- remove document
db.COLLECTION.remove(condition)
Removes all documents when the condition is empty ({}).
- remove collection
db.COLLECTION.drop()
- remove database
use DATABASE
db.dropDatabase()
- remove attribute
db.COLLECTION.update(find, {$unset: {name: value}})
value can be 1, "", true/false.
- remove attribute of an element in array (use $pull to remove the element itself)
db.COLLECTION.update({"name1.name2":value}, {$unset:{'name1.$.name2':1}})
1.6 Commands
show dbs
use DB
show collections
help
db.help()
db.COLLECTION.help()
1.7 Query Cursor Methods
count() // count of documents returned in a cursor
explain() // report on the query plan
hint() // forces MongoDB to use a specific index for a query
limit() // constrains the size of the returned results
skip() // skips the first N documents
sort()
toArray()
1.8 MapReduce
db.COLLECTION.mapReduce(mapFunction, reduceFunction, { query: "query performed at the beginning", out: "out collection name", finalize: finalizeFunction })
Query -> map -> reduce -> finalize -> out.
- The reduce function must return an object of the same type as the values emitted by the map function.
- The order of the emitted elements should not affect the output of the reduce function.
- The reduce function must be idempotent, which means f(f(x)) = f(x).
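The three requirements on the reduce function can be checked outside the database. A minimal sketch in plain JavaScript, assuming the map function emits numbers; the key 'k' and the emitted values are hypothetical.

```javascript
// A reduce function in the shape mapReduce expects: (key, values) -> value.
// It sums numbers, so its output has the same type as the emitted values.
function reduceFunction(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

const emitted = [5, 7, 3]; // hypothetical values emitted for one key

// Idempotence: reducing an already reduced value changes nothing.
const once = reduceFunction('k', emitted);
const twice = reduceFunction('k', [once]);

// Order independence: any ordering of the values gives the same result.
const reversed = reduceFunction('k', emitted.slice().reverse());

// Same type in and out: a partial result can be mixed with raw values.
const partial = reduceFunction('k', [reduceFunction('k', [5, 7]), 3]);

console.log(once, twice, reversed, partial); // 15 15 15 15
```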
2 MongoDB Admin
2.1 Backup
One method to back up MongoDB is copying the database files directly. By extension, any file backup method can be applied here, such as file system snapshots. With ordinary solutions using copy, rsync, etc., you have to stop mongod before copying the files to keep the data consistent, since the copy operation is not atomic. Drawbacks: not easy for large clusters, and no support for point-in-time recovery for replica sets.
mongodump creates smaller backup files, which are BSON files. You can then use mongorestore to import this data into a database.
3 MongoDB Strategy
3.1 Two-phase commits
A two-phase commit is used to update multiple documents as a pseudo-atomic operation. The principle is to create temporary, intermediate records that support rollback operations.
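The idea can be illustrated with an in-memory model of a transfer between two accounts. This is a rough sketch, not MongoDB code: the state names (initial, pending, applied, done, canceling, canceled), the pendingTransactions field, and all function names are assumptions made for illustration.

```javascript
// In-memory model of a two-phase commit between two accounts; NOT MongoDB code.
const accounts = {
  A: { balance: 100, pendingTransactions: [] },
  B: { balance: 50, pendingTransactions: [] },
};

// The transaction record is the intermediate record that makes rollback possible.
const txn = { id: 't1', source: 'A', destination: 'B', value: 30, state: 'initial' };

function removeMarker(acc, id) {
  acc.pendingTransactions = acc.pendingTransactions.filter((p) => p !== id);
}

function applyTransaction(t) {
  t.state = 'pending';
  // Apply to each account at most once, leaving a marker on the account.
  for (const [name, delta] of [[t.source, -t.value], [t.destination, t.value]]) {
    const acc = accounts[name];
    if (!acc.pendingTransactions.includes(t.id)) {
      acc.balance += delta;
      acc.pendingTransactions.push(t.id);
    }
  }
  t.state = 'applied';
}

function commitTransaction(t) {
  // Remove the markers, then mark the transaction as done.
  removeMarker(accounts[t.source], t.id);
  removeMarker(accounts[t.destination], t.id);
  t.state = 'done';
}

function rollbackTransaction(t) {
  t.state = 'canceling';
  // Undo only the account changes that were actually applied.
  for (const [name, delta] of [[t.source, t.value], [t.destination, -t.value]]) {
    const acc = accounts[name];
    if (acc.pendingTransactions.includes(t.id)) {
      acc.balance += delta;
      removeMarker(acc, t.id);
    }
  }
  t.state = 'canceled';
}

applyTransaction(txn);
commitTransaction(txn);
console.log(accounts.A.balance, accounts.B.balance, txn.state); // 70 80 done
```

In a real database each step would be a separate single-document update, so a crash between steps leaves a transaction record whose state tells a recovery job whether to resume or roll back.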
3.2 Concurrency control
The first method is to add a label to the document that indicates which application is currently accessing it.
The second method is to include the old values of the fields to be updated in the query, which ensures that the target fields of the document have not been changed in the meantime.
var myDoc = db.COLLECTION.findOne(condition);
if (myDoc) {
  var oldValue = myDoc.value;
  var results = db.COLLECTION.update(
    { _id: myDoc._id, value: oldValue },
    { $inc: { value: -fee } }
  );
}
Another option: if you want to deduct an amount of money from an account and the remaining balance must not become negative, you can use the following command:
db.COLLECTION.update({_id: id, value: {$gte: fee}}, {$inc:{value: -fee}})
The account is updated only if its value is at least the fee.
The third method is to add a unique field (version) to the document, so that an update can check that the version is still the one that was read.
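A version check can be modelled in memory as follows. This is a sketch of the idea, not MongoDB code; the function name updateIfVersion and the field name version are assumptions. The corresponding shell update would put the version in the query and increment it, for example update({_id: id, version: v}, {$inc: {value: delta, version: 1}}).

```javascript
// In-memory model of an update that only succeeds if the version still matches.
const account = { _id: 1, value: 100, version: 1 };

// Returns the number of modified documents (0 or 1), like an update result.
function updateIfVersion(doc, expectedVersion, delta) {
  if (doc.version !== expectedVersion) return 0; // someone else updated first
  doc.value += delta;
  doc.version += 1;
  return 1;
}

const readVersion = account.version; // two clients both read version 1
const first = updateIfVersion(account, readVersion, -30);  // succeeds
const second = updateIfVersion(account, readVersion, -50); // stale version, rejected

console.log(first, second, account.value, account.version); // 1 0 70 2
```

The client whose update modified 0 documents has to read the document again and retry.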
3.3 Model
3.3.1 Relationships between documents
3.3.1.1 One-to-One Embedded
Example: patron and address. User information should be kept together, such as the account info (password, email) and application-related info (account balance).
3.3.1.2 One-to-Many Embedded/References
Embedding is used for one-to-few relations such as the authors of a book. One can use an array in a document to model such relations. However, if the embedded elements take only a few distinct values that repeat across many documents, it is better to store them in separate documents and use references; for example, the publishers of books. Another principle is DO NOT create large arrays in a document. Large arrays in a document are inefficient for three reasons:
- When a document grows, MongoDB may have to move it, which leads to rewriting the entire document.
- Indexing the elements of the array is inefficient.
- Querying a large document for a small part of the array is inconvenient.
3.3.2 Tree Structures
3.3.2.1 References
- Parent Refs
Save each node as a document that contains a reference to its parent.
- Child Refs
Save each node as a document that contains an array of references to its child nodes.
- Extension
It is also useful to add further relations, such as the ancestors of a node, to the document.
Another way to speed up searching a tree is materialized paths. This method adds a path attribute that describes the path to the node as a string. Searching such strings requires regular expressions, but is faster than the previous solution.
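A small sketch of a materialized-path lookup in plain JavaScript. The nodes and the comma-delimited path format are assumptions for illustration; the same regular expression could be used in a query filter such as {path: /,Programming,/}.

```javascript
// Hypothetical nodes; each path lists the ancestors of the node, comma-delimited.
const nodes = [
  { _id: 'Books', path: null },
  { _id: 'Programming', path: ',Books,' },
  { _id: 'Databases', path: ',Books,Programming,' },
  { _id: 'MongoDB', path: ',Books,Programming,Databases,' },
];

// All descendants of "Programming": its name appears somewhere in the path.
const descendants = nodes
  .filter((n) => n.path !== null && /,Programming,/.test(n.path))
  .map((n) => n._id);

console.log(descendants); // [ 'Databases', 'MongoDB' ]
```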
3.3.2.2 Nested Sets
This method is tricky. It resembles a binary tree in that it uses values (left and right) to indicate how the nodes relate to each other.
db.categories.insert( { _id: "Books", parent: 0, left: 1, right: 12 } )
db.categories.insert( { _id: "Programming", parent: "Books", left: 2, right: 11 } )
db.categories.insert( { _id: "Languages", parent: "Programming", left: 3, right: 4 } )
db.categories.insert( { _id: "Databases", parent: "Programming", left: 5, right: 10 } )
db.categories.insert( { _id: "MongoDB", parent: "Databases", left: 6, right: 7 } )
db.categories.insert( { _id: "dbm", parent: "Databases", left: 8, right: 9 } )
var databaseCategory = db.categories.findOne( { _id: "Databases" } );
db.categories.find( { left: { $gt: databaseCategory.left }, right: { $lt: databaseCategory.right } } )
One can retrieve the descendants of a node by using the left and right indicators: a descendant has a larger left value and a smaller right value than the node.
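The same selection can be tried without a database. A sketch in plain JavaScript that reuses the categories data from the example above; the helper name descendantsOf is hypothetical.

```javascript
// The categories from the example above, as plain objects.
const categories = [
  { _id: 'Books', parent: 0, left: 1, right: 12 },
  { _id: 'Programming', parent: 'Books', left: 2, right: 11 },
  { _id: 'Languages', parent: 'Programming', left: 3, right: 4 },
  { _id: 'Databases', parent: 'Programming', left: 5, right: 10 },
  { _id: 'MongoDB', parent: 'Databases', left: 6, right: 7 },
  { _id: 'dbm', parent: 'Databases', left: 8, right: 9 },
];

// A descendant lies strictly inside the [left, right] interval of the node.
function descendantsOf(id) {
  const node = categories.find((c) => c._id === id);
  return categories
    .filter((c) => c.left > node.left && c.right < node.right)
    .map((c) => c._id);
}

console.log(descendantsOf('Databases'));   // [ 'MongoDB', 'dbm' ]
console.log(descendantsOf('Programming')); // [ 'Languages', 'Databases', 'MongoDB', 'dbm' ]
```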
3.3.3 Index
One of the main benefits of indexing is sorting: the application can quickly find the matching documents by traversing the ordered index.
- Single field index
- Compound key index
Note that if one sets a compound index by
db.COLLECTION.ensureIndex({key1: 1, key2: -1})
it can help the sort function using {key1: 1, key2: -1} or {key1: -1, key2: 1}, but cannot help sorting with the key {key1: 1, key2: 1}.
- Multi-key index
It refers to using an array as the index key. You cannot use a compound key over two arrays.
- Hashed index
It is not compatible with multi-key index. It computes the hash of the entire value while ??collapse?? the sub-document.
- Geospatial index
This index is used to index geometric data, such as lines, points, shapes.
- Text index
Supports text search with language stop words; uses a very large amount of space.
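The sort rule for the compound key index above can be written as a small check. This is a plain-JavaScript model, not MongoDB code; the function name indexSupportsSort is hypothetical, and the sketch assumes the sort uses exactly the same keys, in the same order, as the index.

```javascript
// An index can be scanned forwards (same directions) or backwards (all inverted).
function indexSupportsSort(index, sort) {
  const indexKeys = Object.keys(index);
  const sortKeys = Object.keys(sort);
  if (indexKeys.length !== sortKeys.length) return false;
  if (!indexKeys.every((k, i) => k === sortKeys[i])) return false;
  const same = indexKeys.every((k) => sort[k] === index[k]);     // forward scan
  const inverse = indexKeys.every((k) => sort[k] === -index[k]); // backward scan
  return same || inverse;
}

const index = { key1: 1, key2: -1 };
console.log(indexSupportsSort(index, { key1: 1, key2: -1 })); // true
console.log(indexSupportsSort(index, { key1: -1, key2: 1 })); // true
console.log(indexSupportsSort(index, { key1: 1, key2: 1 }));  // false
```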
Indexes also support properties such as:
- TTL index, which is used to remove outdated documents.
- Unique index, which rejects duplicate values of the indexed field across all documents. ??Can we use it to ensure that the elements of an array have unique values within a document (values can repeat in different documents)??
- Sparse index: documents without the field are not indexed. By default (non-sparse), these documents are indexed with a null value.
3.3.4 Others
- Atomic operation
update, findAndModify, and remove are atomic on a single document. Putting related fields in one document ensures that the write operations on them are atomic.
- Support keyword search
Putting strings in an array and then creating a multi-key index on it enables keyword search.
Keyword search cannot provide NLP/IE functions, such as stemming, synonyms, ranking, which are usually required by a search engine.
- Document limit
  - Field names cannot start with $, cannot contain ., and cannot contain the null character.
  - A field value must stay within the maximum index key length to be used in an index.
  - Maximum document size is 16MB. GridFS supports larger files, each represented by a group of files that contain different pieces of the content.
  - The _id field may be any BSON data type except an array. It is ObjectId by default.
- DBRef
DBRefs are a convention for representing a document rather than a specific reference type. A DBRef contains the name of the collection (optionally the database name).
4 Mongoose
4.1 Schema
- methods
- instance
- static
- virtual
- options collection? read?