Developer / DB Admin
- Efficient Queries/Operations
- Indexes
- Fitting Data Schema
DB Admin / System Admin
- Hardware & Network
- Sharding
- Replica Sets
Capped Collections
- The oldest data is automatically deleted when new data comes in.
- This is efficient for application logs, where only the most recent entries matter, or for caching some data.
A capped collection returns documents in the same order they were inserted; for a normal collection this is not always guaranteed.
Let's create a capped collection. The size option is required and caps the collection's size in bytes; max is optional and caps the number of documents that can be stored.
> use performance
switched to db performance
> db.createCollection('capped', {capped: true, size: 10000, max: 3})
**Output**
{ ok: 1 }
Access capped collections
> db.capped.insertOne({name: 'Max'})
**Output**
{
acknowledged: true,
insertedId: ObjectId("6353cc9a88121c3743e45cef")
}
> db.capped.insertOne({name: 'Manu'})
**Output**
{
acknowledged: true,
insertedId: ObjectId("6353ccb588121c3743e45cf0")
}
> db.capped.insertOne({name: 'Anna'})
**Output**
{
acknowledged: true,
insertedId: ObjectId("6353cccc88121c3743e45cf1")
}
> db.capped.find().pretty()
**Output**
[
{ _id: ObjectId("6353cc9a88121c3743e45cef"), name: 'Max' },
{ _id: ObjectId("6353ccb588121c3743e45cf0"), name: 'Manu' },
{ _id: ObjectId("6353cccc88121c3743e45cf1"), name: 'Anna' }
]
If we ever need the documents in descending (reverse insertion) order, we can sort on $natural:
> db.capped.find().sort({$natural: -1}).pretty()
**Output**
[
{ _id: ObjectId("6353cccc88121c3743e45cf1"), name: 'Anna' },
{ _id: ObjectId("6353ccb588121c3743e45cf0"), name: 'Manu' },
{ _id: ObjectId("6353cc9a88121c3743e45cef"), name: 'Max' }
]
Now if we insert a new document, the first document (name: 'Max') is automatically deleted, and no error message is shown.
> db.capped.insertOne({name: 'Maria'})
**Output**
{
acknowledged: true,
insertedId: ObjectId("6353fac444b95bf60083ef55")
}
> db.capped.find().pretty()
**Output**
[
{ _id: ObjectId("6353ccb588121c3743e45cf0"), name: 'Manu' },
{ _id: ObjectId("6353cccc88121c3743e45cf1"), name: 'Anna' },
{ _id: ObjectId("6353fac444b95bf60083ef55"), name: 'Maria' }
]
The evicted data is not necessarily lost for good: when a capped collection is used as a cache, the full data can still be retrieved from the original (non-capped) source.
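As a quick sanity check (a small mongosh sketch; isCapped() is a standard shell helper), a collection reports whether it is capped:
> db.capped.isCapped()
**Output**
true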
Replica Sets
Client (Shell / Driver)  <--Read/Write-->  MongoDB Server (Primary Node)
                                                    |
                                         asynchronous replication
                                            /                \
                                    Secondary Node      Secondary Node
If we add replica nodes, a write request goes from the client to the MongoDB server and on to the primary node, which holds the database. If we have secondary nodes, MongoDB asynchronously (not instantly) replicates (copies) the data to those secondary nodes. If the primary node goes offline, data can still be read from a replica (secondary node).
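A minimal sketch of how such a replica set could be initiated from the shell - the set name rs0 and the hosts mongo1-mongo3 are placeholders, and each mongod must already be running with --replSet rs0:
> rs.initiate({
    _id: 'rs0',
    members: [
      { _id: 0, host: 'mongo1:27017' },  // three members; an election
      { _id: 1, host: 'mongo2:27017' },  // picks which one becomes
      { _id: 2, host: 'mongo3:27017' }   // the primary
    ]
  })
> rs.status()  // shows which member is PRIMARY and which are SECONDARY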
Why Replica Sets?
- Backup / Fault Tolerance
- Improve Read Performance
To read data as fast as possible, we can distribute read requests between the primary and the secondary nodes. With a huge amount of read requests, this helps to reduce the pressure on the primary node.
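One way to opt into such distributed reads (a sketch, assuming a connection to a running replica set; logs is just an example collection): set the connection's read preference so secondaries may answer queries.
> db.getMongo().setReadPref('secondaryPreferred')
> db.logs.find()  // this query may now be answered by a secondary node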
Sharding (Horizontal Scaling)
Vertical scaling means putting more memory (and CPU) into a single machine, which eventually hits a limit. Horizontal scaling means adding more servers instead.
MongoDB Server
|
------------------------------------------------------
server1 server2 server3 server4 server5 server6
That might sound logical, but note that these servers do not duplicate the data; they are not backups. They split the data: server 1 stores data for the same application as the other servers, but a different chunk of it. So with sharding, we have multiple computers that all run MongoDB servers, but these servers don't work alone; they work together and split the available data, so the data is distributed across shards, not replicated. Queries (find, but also insert, update and delete operations) therefore have to be run against all the servers, or against the right server, because each shard manages its own chunk, i.e. its range of the data.
For example, the data can be partitioned into ranges like A-J, J-K, K-S, ... with each range assigned to one chunk (server).
                         ----------
                         | Client |
                         ----------
                              |
                     -------------------
                     | mongos (Router) |
                     -------------------
                              |
        --------------------------------------------------
        |                     |                          |
MongoDB (Server/Shard)  MongoDB (Server/Shard)  MongoDB (Server/Shard)
     [Shard Key]            [Shard Key]              [Shard Key]
// Shard key: the shard key is essentially just a field that's added to every document, and choosing it well is kind of important, because it determines on which shard a document is stored.
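Enabling sharding from the shell could look roughly like this (a sketch that must be run against a mongos router; the database performance, the collection customers and name as shard key are just example choices):
> sh.enableSharding('performance')
> sh.shardCollection('performance.customers', { name: 1 })
> sh.status()  // lists the shards and which key ranges (chunks) live where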
------------------------Queries and Sharding-------------------------
----------
| find() |
----------
|
------------------
| mongos(Router) |
------------------
|
            ---------------------------------------------
            |                                           |
---------------------------             ---------------------------
| Option 1: Operation does |            | Option 2: Operation does |
| not contain Shard Key    |            | contain Shard Key        |
---------------------------             ---------------------------
            |                                           |
  Broadcast to all shards                Directly sent to the right shard
    |        |        |                                 |
  Shard    Shard    Shard                             Shard
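Continuing the sketch from above (customers sharded on name), the two options look like this in practice:
> db.customers.find({ name: 'Anna' })       // contains the shard key => routed directly to the right shard
> db.customers.find({ age: {$gt: 30} })     // no shard key => broadcast to all shards, mongos merges the results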
------------------------Deployment-------------------------
localhost (mongod) -------> Atlas (mongod)
When deploying yourself, you have to manage all of the following (for the managed alternative, see the Atlas connection sketch after this list):
- Manage Shards
- Secure User / Auth Setup
- Protect Web Server / Network
- Manage Replica Set
- Encryption (Transport & at Rest)
- Regular Backups
- Update Software
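For comparison, connecting to an Atlas cluster mostly just changes the connection string (a sketch; the cluster host cluster0.abcde.mongodb.net and the user myUser are placeholders taken from the Atlas UI):
mongosh "mongodb+srv://cluster0.abcde.mongodb.net/performance" --username myUser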
------------------------Summary-------------------------
- Consider capped collections for cases where you want to clear old data automatically.
- Performance is all about having efficient queries/operations, a fitting data schema and a best-practice MongoDB server config.
- Replica sets provide fault tolerance (with automatic recovery) and improved read performance.
- Sharding allows you to scale a MongoDB server horizontally.
- Deployment is a complex matter since it involves many tasks - some of them not even directly related to MongoDB.
- Unless you are an experienced admin (or have one), you should consider a managed solution like MongoDB Atlas.
- Atlas is a managed service where you can configure a MongoDB environment and pay on a by-usage basis.