Notes on MongoDB

1. Basic MongoDB Instructions blog
2. MongoDB Admin
- 2.1. Backup
3. MongoDB Strategy blog
4. Mongoose
- 4.1. Schema

1 Basic MongoDB Instructions blog

1.1 Insert

db.COLLECTION.insert(doc/docs)

You can insert one document or a list of documents. When you insert a list of documents, the operation is not atomic. The returned result shows the statistics of the write operations including the write errors and the number of successfully inserted documents.

1.2 Find

db.COLLECTION.find({name: value, 'name1.name2.name3':value/condition})

When a name in the path refers to an array, every element of the array is checked. Special condition operators:

$in, $ne, $nin, $lt, $gt, $lte, $gte
$and, $or, $not, $nor
$exists, $type
$regex, $text, $where, $mod
$all, $elemMatch, $size
$, $slice

The difference between querying with and without $elemMatch: with $elemMatch, a single array element must match all the conditions; without it, different elements of the array can each match a different part of the conditions.
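The rule above can be illustrated with a minimal in-memory sketch. It does not call MongoDB; the `results` array, its `subject`/`score` fields and both helper functions are hypothetical names chosen only for illustration.

```javascript
// In-memory sketch of the matching rules; no MongoDB calls are made.
// Each condition is a predicate over one array element.
const conditions = [
  (el) => el.score >= 80,        // stands in for a condition such as {$gte: 80}
  (el) => el.subject === "math", // stands in for an equality condition
];

// With $elemMatch: a single element must satisfy every condition.
function matchesWithElemMatch(array, conds) {
  return array.some((el) => conds.every((cond) => cond(el)));
}

// Without $elemMatch: each condition may be satisfied by a different element.
function matchesWithoutElemMatch(array, conds) {
  return conds.every((cond) => array.some((el) => cond(el)));
}

const results = [
  { subject: "math", score: 60 },
  { subject: "art", score: 90 },
];

console.log(matchesWithElemMatch(results, conditions));    // false
console.log(matchesWithoutElemMatch(results, conditions)); // true
```

No single element is both "math" and at least 80, so only the query without $elemMatch would match this document.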

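The query {name: value} can match either a single element or a whole array: when value is a scalar or an object, it matches documents whose array contains that element; when value is an array, it matches the array as a whole.

An array element at a given position can be matched by index: db.COLLECTION.find({'name.0': value}).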

1.3 Select

get attribute

db.COLLECTION.find(***, {name:1})

get elements in array
```
db.COLLECTION.find(***, {'name1.name2':1})
```
This projection returns all the elements on the path 'name1.name2', even when a name in the path refers to an array.
```
db.COLLECTION.find(***, {name1: {$elemMatch: {name2:value}}})
```
$elemMatch can be used in the second argument (the projection) to limit the returned array elements. However, it CANNOT be used on nested arrays, and it returns only the FIRST matching element.

get elements in nested array

Use the aggregation framework with $unwind. Aggregations are operations that process data records and return computed results, organized as a pipeline of stages.

You can expand nested arrays with several $unwind stages, which produce a large number of documents, each containing a single combination of elements from the different levels of nesting. Then match the elements you need and, if required, regroup them. However, rebuilding the matched elements into nested arrays after $unwind is very complex.

A simpler option is to use a $group stage that returns only the matched elements, for example the tasks whose attributes match.
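To make the effect of repeated $unwind stages concrete, here is a minimal in-memory sketch. It does not call MongoDB; the `unwind` helper is hypothetical, and the `groups`/`tasks` shape is borrowed from the Projects example used later in these notes.

```javascript
// In-memory sketch of $unwind followed by $match; no MongoDB calls are made.
// unwind replaces each document with one copy per element of the array field.
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    (doc[field] || []).map((el) => ({ ...doc, [field]: el }))
  );
}

const projects = [
  {
    _id: 1,
    groups: [
      { name: "g1", tasks: [{ name: "update db" }, { name: "write docs" }] },
      { name: "g2", tasks: [{ name: "update ui" }] },
    ],
  },
];

// First level: one document per group.
const perGroup = unwind(projects, "groups");

// Second level: one row per (group, task) combination.
const perTask = perGroup.flatMap((doc) =>
  doc.groups.tasks.map((task) => ({
    _id: doc._id,
    group: doc.groups.name,
    task: task.name,
  }))
);

// Equivalent of a $match stage on a nested element attribute.
const matched = perTask.filter((row) => /update/.test(row.task));
console.log(matched);
```

Each unwound level multiplies the number of rows, which is why regrouping them into the original nested shape takes extra work.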

1.4 Update

update attribute:
```
db.COLLECTION.update(find condition, {$set: {name: value}})
```
$inc increments a numeric field, and so on. With the upsert option, the document is inserted when no document in the collection matches the find condition.

update array:

db.COLLECTION.update(find condition, {$push/$pull: {name: value}})

db.students.update(
  { _id: 1 },
  {
    $push: {
      scores: {
        $each: [ { attempt: 3, score: 7 }, { attempt: 4, score: 4 } ],
        $sort: { score: 1 },
        $slice: -3
      }
    }
  }
)

The command above appends two elements, sorts the whole array by ascending score, then keeps the last three elements of the sorted array.
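The order of those three steps can be checked with a small in-memory sketch. It does not call MongoDB; the `pushEachSortSlice` helper and the two initial scores are assumptions made for illustration.

```javascript
// In-memory sketch of $push with $each, $sort and $slice; no MongoDB calls.
function pushEachSortSlice(array, each, sortKey, slice) {
  const combined = array.concat(each);              // $each: append all new elements
  combined.sort((a, b) => a[sortKey] - b[sortKey]); // $sort ascending on sortKey
  // $slice: a negative value keeps the last |slice| elements
  return slice < 0 ? combined.slice(slice) : combined.slice(0, slice);
}

// Assumed state of the scores array before the update.
const scores = [
  { attempt: 1, score: 10 },
  { attempt: 2, score: 6 },
];

const updated = pushEachSortSlice(
  scores,
  [{ attempt: 3, score: 7 }, { attempt: 4, score: 4 }],
  "score",
  -3
);

console.log(updated.map((s) => s.score)); // [ 6, 7, 10 ]
```

The lowest score (4) is dropped even though it was just pushed, because the slice is applied after the sort.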

update element in array:

db.COLLECTION.update({"name1.name2": value}, {$set: {'name1.$.name2': value}})

The positional $ operator only updates the first matching element.

If you want to update multiple elements, use the forEach function.

db.Projects.find().forEach(
    function(pj){
        pj.groups.forEach(
            function(gp){
                gp.tasks.forEach(
                    function(task){
                        if (task.name.match(/update/)){
                            task.weight=5;
                        }
                    });
            });
        db.Projects.save(pj);
    }
)

update element in nested array:
Impossible with a single query. The issue was reported in 2010 and was still open in 2015…

Update operators:

$currentDate, $inc, $max, $min, $mul, $rename, $setOnInsert, $set, $unset
$, $addToSet, $pop, $pullAll, $pull, $pushAll, $push
$each, $position, $slice, $sort

1.5 Remove

remove document
```
db.COLLECTION.remove(condition)
```
To remove all documents, pass an empty condition {} (older shells also removed everything when no condition was provided).

remove collection
```
db.COLLECTION.drop()
```

remove database

use DATABASE
db.dropDatabase()

remove attribute

db.COLLECTION.update(find, {$unset: {name: value}})

value can be 1, "", true/false.

remove attribute of an element in array (use $pull to remove the element itself)

db.COLLECTION.update({"name1.name2":value}, {$unset:{'name1.$.name2':1}})

1.6 Commands

show dbs
use DB
show collections
help
db.help()
db.COLLECTION.help()

1.7 Query Cursor Methods

count() // number of documents returned by a cursor
explain() // report the query execution plan
hint() // force MongoDB to use a specific index for a query
limit() // limit the number of returned documents
skip() // skip the first N documents
sort()
toArray()

1.8 MapReduce

db.COLLECTION.mapReduce(mapFunction,
                        reduceFunction,
                        {
                            query: "query performed at the beginning",
                            out: "out collection name",
                            finalize: finalizeFunction
                        })

Query -> map -> reduce -> finalize -> out.

The reduce function must return an object of the same type as the values emitted by the map function.
The order of the emitted elements should not affect the output of the reduce function.
The reduce function must be idempotent, which means reduce(key, [reduce(key, values)]) equals reduce(key, values), i.e. f(f(x)) = f(x).
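The three requirements on the reduce function can be seen in a minimal in-memory sketch of the map -> reduce flow. It does not call MongoDB; the `orders` data and the local `mapReduce` helper are hypothetical, and here map returns its emitted pairs instead of calling emit().

```javascript
// In-memory sketch of the map -> reduce flow; no MongoDB calls are made.
const orders = [
  { customer: "a", amount: 5 },
  { customer: "b", amount: 7 },
  { customer: "a", amount: 3 },
];

// map emits (key, value) pairs; here the value is a plain number.
const mapFunction = (doc) => [[doc.customer, doc.amount]];

// reduce returns the same type that map emits (a number), so its output
// can be fed back in as one of the input values (re-reduce).
const reduceFunction = (key, values) => values.reduce((sum, v) => sum + v, 0);

function mapReduce(docs, map, reduce) {
  const groups = new Map();
  for (const doc of docs) {
    for (const [key, value] of map(doc)) {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    }
  }
  const out = {};
  for (const [key, values] of groups) out[key] = reduce(key, values);
  return out;
}

const totals = mapReduce(orders, mapFunction, reduceFunction);
console.log(totals); // { a: 8, b: 7 }

// Idempotence: reducing an already reduced value changes nothing.
const once = reduceFunction("a", [5, 3]);
const twice = reduceFunction("a", [once]);
console.log(once === twice); // true

// Order independence: the emitted values can arrive in any order.
console.log(reduceFunction("a", [3, 5]) === once); // true
```

A sum satisfies all three rules; an average computed directly in reduce would not, because re-reducing partial averages gives a different result.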

2 MongoDB Admin

2.1 Backup

One way to back up MongoDB is to copy the database files directly. By extension, any file backup method applies here, such as file system snapshots. With ordinary tools such as copy or rsync, you have to stop mongod before copying the files to keep the data consistent, since the copy is not atomic. Drawbacks: it is not easy for large clusters, and it does not support point-in-time recovery for replica sets.

mongodump creates smaller backup files in BSON format. You can then use mongorestore to import the data into a database.

3 MongoDB Strategy blog

3.1 Two-phase commits

Two-phase commit is used to update multiple documents as a pseudo-atomic operation. The principle is to create temporary, intermediate records that support rollback.
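As a rough illustration of the idea, the sketch below keeps a transaction record whose state moves through several steps, and uses it to undo a half-applied transfer. It runs entirely in memory and does not call MongoDB; the account layout, the `pendingTransactions` marker, the state names and the `failAfterDebit` switch are all assumptions of this sketch.

```javascript
// In-memory sketch of a two-phase commit; no MongoDB calls are made.
// Account layout, state names and failAfterDebit are hypothetical.
const accounts = {
  A: { balance: 100, pendingTransactions: [] },
  B: { balance: 20, pendingTransactions: [] },
};

function transfer(txn, failAfterDebit = false) {
  txn.state = "pending"; // mark the transaction as in progress
  const src = accounts[txn.source];
  const dst = accounts[txn.destination];

  // Apply to the source and leave a marker pointing at the transaction.
  src.balance -= txn.value;
  src.pendingTransactions.push(txn.id);

  if (failAfterDebit) {
    // Roll back using the intermediate record: undo only what was applied.
    src.balance += txn.value;
    src.pendingTransactions = src.pendingTransactions.filter((id) => id !== txn.id);
    txn.state = "canceled";
    return txn;
  }

  dst.balance += txn.value;
  dst.pendingTransactions.push(txn.id);
  txn.state = "applied"; // both documents are updated

  // Clean up the markers, then close the transaction.
  for (const acc of [src, dst]) {
    acc.pendingTransactions = acc.pendingTransactions.filter((id) => id !== txn.id);
  }
  txn.state = "done";
  return txn;
}

const ok = transfer({ id: 1, source: "A", destination: "B", value: 30, state: "initial" });
console.log(ok.state, accounts.A.balance, accounts.B.balance); // done 70 50

const failed = transfer({ id: 2, source: "A", destination: "B", value: 10, state: "initial" }, true);
console.log(failed.state, accounts.A.balance, accounts.B.balance); // canceled 70 50
```

In a real deployment each step would be a separate write, so a recovery job would need to look for transactions stuck in an intermediate state and either finish or cancel them.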

3.2 Concurrency control

The first method is to add a label to the document that indicates which application is currently accessing it.

The second method is to include the old value of the field being updated in the query, which ensures that the update only applies if the target field has not been changed in the meantime.

var myDoc = db.COLLECTION.findOne(condition);
if(myDoc){
   var oldValue = myDoc.value;
   var results = db.COLLECTION.update(
       {
           _id: myDoc._id,
           value: oldValue
       },
       {
           $inc:{value: -fee}
       }
       )
}

Another possible option: if you want to deduct an amount of money from an account and the remaining balance must not become negative, you can use the following command:

db.COLLECTION.update({_id: id, value: {$gte: fee}}, {$inc:{value: -fee}})

This means the account is updated only if its value covers the fee.

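The third method is to add a unique version field to the document. Each update includes the version it read in its query and changes the version, so an update based on stale data matches nothing.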
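A minimal in-memory sketch of the version-field approach follows. It does not call MongoDB; `store` stands in for a collection, and `updateIfVersionMatches` and its `nModified` result are hypothetical names.

```javascript
// In-memory sketch of optimistic concurrency with a version field.
// `store` stands in for a collection; no MongoDB calls are made.
const store = new Map([["acc1", { _id: "acc1", value: 100, version: 1 }]]);

// Mimics an update whose query contains {_id, version: expectedVersion}:
// the write only applies when the stored version still matches.
function updateIfVersionMatches(id, expectedVersion, changes) {
  const doc = store.get(id);
  if (!doc || doc.version !== expectedVersion) return { nModified: 0 };
  store.set(id, { ...doc, ...changes, version: doc.version + 1 });
  return { nModified: 1 };
}

// Two clients read the same version of the document.
const readByClientA = store.get("acc1");
const readByClientB = store.get("acc1");

const first = updateIfVersionMatches("acc1", readByClientA.version, { value: 90 });
const second = updateIfVersionMatches("acc1", readByClientB.version, { value: 80 });

console.log(first.nModified, second.nModified); // 1 0
console.log(store.get("acc1")); // { _id: 'acc1', value: 90, version: 2 }
```

The second client sees that nothing was modified and can re-read the document and retry.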

3.3 Model

3.3.1 Relationships between documents

3.3.1.1 One-to-One Embedded

Example: patron and address. User information should be kept together, such as the account info (password, email) and application-related info (balance in account).

3.3.1.2 One-to-Many Embedded/References

This suits one-to-few relations, such as the authors of a book, which can be modeled as an array in the document. However, if the embedded elements take only a few distinct values that repeat across many documents, such as the publishers of books, it is better to store them in separate documents and use references. Another principle: DO NOT create large arrays in a document. Large arrays are inefficient for three reasons:

- When the document grows, MongoDB moves it, which means rewriting the entire document.
- Indexing the elements of a large array is inefficient.
- Querying a large document for a small part of an array is inconvenient.

3.3.2 Tree Structures

3.3.2.1 References

Parent Refs
Save each node as a document that contains a reference to its parent.
Child Refs
Save each node as a document that contains an array of references to its child nodes.
Extension
It is also useful to store additional relations, such as the ancestors of a node, in the document.

Another way to speed up tree searches is materialized paths. This method adds a path attribute that describes the path to the node as a string. Searching such strings requires regular expressions, but is faster than the previous solution.
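A minimal in-memory sketch of a materialized-path lookup is shown below. It does not call MongoDB; the comma-delimited path format and the `descendantsByPath` helper are assumptions of this sketch, and the category names are borrowed from the nested-set example in these notes.

```javascript
// In-memory sketch of materialized paths; no MongoDB calls are made.
// The comma-delimited path format is an assumption of this sketch.
const pathNodes = [
  { _id: "Books", path: null },
  { _id: "Programming", path: ",Books," },
  { _id: "Databases", path: ",Books,Programming," },
  { _id: "Languages", path: ",Books,Programming," },
  { _id: "MongoDB", path: ",Books,Programming,Databases," },
  { _id: "dbm", path: ",Books,Programming,Databases," },
];

// All descendants of a node have that node's id somewhere in their path.
// Note: ids containing regex metacharacters would need escaping first.
function descendantsByPath(nodes, id) {
  const pattern = new RegExp("," + id + ",");
  return nodes
    .filter((n) => n.path !== null && pattern.test(n.path))
    .map((n) => n._id);
}

console.log(descendantsByPath(pathNodes, "Programming"));
// [ 'Databases', 'Languages', 'MongoDB', 'dbm' ]
```

The delimiters on both sides of each id keep a name such as "Data" from matching inside "Databases".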

3.3.2.2 Nested Sets

This method is tricky. It works like a tree traversal: each node stores left and right values that enclose the values of all its descendants.

db.categories.insert( { _id: "Books", parent: 0, left: 1, right: 12 } )
db.categories.insert( { _id: "Programming", parent: "Books", left: 2, right: 11 } )
db.categories.insert( { _id: "Languages", parent: "Programming", left: 3, right: 4 } )
db.categories.insert( { _id: "Databases", parent: "Programming", left: 5, right: 10 } )
db.categories.insert( { _id: "MongoDB", parent: "Databases", left: 6, right: 7 } )
db.categories.insert( { _id: "dbm", parent: "Databases", left: 8, right: 9 } )

var databaseCategory = db.categories.findOne( { _id: "Databases" } );
db.categories.find( { left: { $gt: databaseCategory.left }, right: { $lt: databaseCategory.right } } )

One can retrieve all the descendants of a node by using the left and right indicators.
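The same left/right comparison can be run in memory to see which nodes the query selects. This sketch does not call MongoDB; it reuses the category data above, and `descendantsByRange` is a hypothetical helper.

```javascript
// In-memory sketch of the nested-set query above; no MongoDB calls are made.
const nestedNodes = [
  { _id: "Books", parent: 0, left: 1, right: 12 },
  { _id: "Programming", parent: "Books", left: 2, right: 11 },
  { _id: "Languages", parent: "Programming", left: 3, right: 4 },
  { _id: "Databases", parent: "Programming", left: 5, right: 10 },
  { _id: "MongoDB", parent: "Databases", left: 6, right: 7 },
  { _id: "dbm", parent: "Databases", left: 8, right: 9 },
];

// A node is a descendant when its interval lies strictly inside the
// interval of the ancestor: left > ancestor.left and right < ancestor.right.
function descendantsByRange(nodes, id) {
  const node = nodes.find((n) => n._id === id);
  if (!node) return [];
  return nodes
    .filter((n) => n.left > node.left && n.right < node.right)
    .map((n) => n._id);
}

console.log(descendantsByRange(nestedNodes, "Databases")); // [ 'MongoDB', 'dbm' ]
console.log(descendantsByRange(nestedNodes, "Programming"));
// [ 'Languages', 'Databases', 'MongoDB', 'dbm' ]
```

Reads are a single range comparison, but inserting a node means renumbering the left and right values of every node that comes after it.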

3.3.3 Index

One of the main benefits of indexing is ordering: the application can find matching documents quickly by traversing the ordered index.

Single field index
Compound key index
Note that if one creates a compound index with
```
db.COLLECTION.ensureIndex({key1: 1, key2: -1})
```
It can support sorting with {key1: 1, key2: -1} or {key1: -1, key2: 1}, but not with {key1: 1, key2: 1}.
Multi-key index
It indexes a field that holds an array. A compound index cannot cover two array fields.
Hashed index
It computes the hash of the entire field value, collapsing embedded documents. It is not compatible with multi-key indexes.
Geospatial index
This index is used for geometric data, such as points, lines and shapes.
Text index
It supports text search with language-specific stop words, but can take a lot of space.
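The sort-direction rule in the compound key index note above can be written down as a small check. This is a simplified sketch in plain JavaScript, not MongoDB's query planner: `indexSupportsSort` is a hypothetical helper, and it assumes the sort uses exactly the same keys in the same order as the index.

```javascript
// Sketch of the rule: a compound index can serve a sort when the sort
// directions match the index exactly or are its exact reverse.
// Simplified: assumes the sort uses the same keys in the same order.
function indexSupportsSort(index, sort) {
  const keys = Object.keys(index);
  const sortKeys = Object.keys(sort);
  if (keys.length !== sortKeys.length) return false;
  if (!keys.every((k, i) => k === sortKeys[i])) return false;
  const same = keys.every((k) => index[k] === sort[k]);      // forward scan
  const reversed = keys.every((k) => index[k] === -sort[k]); // backward scan
  return same || reversed;
}

const compoundIndex = { key1: 1, key2: -1 };

console.log(indexSupportsSort(compoundIndex, { key1: 1, key2: -1 })); // true
console.log(indexSupportsSort(compoundIndex, { key1: -1, key2: 1 })); // true
console.log(indexSupportsSort(compoundIndex, { key1: 1, key2: 1 }));  // false
```

The reversed case works because the index can be walked backwards, which flips every direction at once; flipping only one key has no matching traversal.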

Indexes also support properties such as:

TTL index, which removes documents after a given lifetime.
Unique index, which rejects duplicate values of the indexed field across all documents. Open question: can it ensure that the elements of an array are unique within a document (while allowing repeats across documents)?
Sparse index, which skips documents that lack the indexed field. By default, such documents are indexed with a null value.

3.3.4 Others

Atomic operation
update, findAndModify and remove are atomic on a single document. Putting fields in one document ensures that writes to them are atomic.
Support keyword search
Putting strings in an array and creating a multi-key index on it enables keyword search. Keyword search cannot provide NLP/IE functions such as stemming, synonyms or ranking, which search engines usually require.
Document limit
- Field names cannot start with $, cannot contain ., and cannot contain the null character.
- A field value must not exceed the maximum index key length to be indexed.
- Maximum document size is 16 MB. GridFS supports larger files by splitting each file into chunks stored as separate documents.
- The _id field may be any BSON data type except an array. It is an ObjectId by default.
DBRef
DBRefs are a convention for representing a reference to a document rather than a specific reference type. A DBRef contains the name of the collection, the _id of the referenced document, and optionally the database name.

4 Mongoose

4.1 Schema

methods
- instance
- static
virtual
options collection? read?