Sudden MongoDB high connections/queues, db completely freezes


The issue

We have a strange issue on our MongoDB setup. Sometimes we see peaks of high connections and high queues, and the mongod process stops responding if we let the queues and connections increase. We then need to restart the instance using SIGKILL from htop.

It seems there is a system limit or a MongoDB configuration setting blocking MongoDB from operating, because hardware resources are OK. Versions of this issue are happening on both the standalone and the replica set on production servers. Details ahead.

About the software environment

This is a standalone MongoDB instance (not sharded, no replica set). It is operating on a dedicated machine, and it is queried from other machines. I'm using mongodb-linux-x86_64-2.6.11 under Debian 7.7.

The machines querying Mongo are using Django==1.7.4, mongoengine==0.10.1 and pymongo==2.8.

In the Django settings.py file I'm connecting to the database using the following lines:

    from mongoengine import connect

    connect(
        mongo_db,
        username=mongo_user,
        password=mongo_pwd,
        host=mongo_host,
        port=mongo_port
    )

MMS stats

As you can see in the following image from the MMS service, we have peaks on connections and queues:

mms

When that happens, our mongod process freezes. We must use SIGKILL to restart MongoDB, which is bad.

In the image there are 3 freeze events.

queues

As the image shows, when that happens we have a peak on non-mapped virtual memory too.

virtualmemory

We also spotted an increase on the btree chart around the 2nd and 3rd freeze.

btree

We have checked the logs and there is no suspicious query. The opcounters don't skyrocket either, so it seems there are no more queries than usual.

Here is a screenshot of the same bug on a different day/time: moreonthebug

In all cases, the lock on the db is not increasing. It has a peak, but it does not reach 4%:

enter image description here

The opcounters drop to zero. It seems every op goes to the MongoDB queue, so the database creates new connections to try to execute the new requests, and all of them go to the queue as well.

Machine resources

Regarding hardware, the machine is a Google Cloud Compute instance with 4 Intel Xeon cores, 16 GB of RAM, and a 100 GB SSD disk.

No noticeable network/IO/CPU/RAM issues were detected, and there are no peaks on resources, even when the mongod process is frozen.

enter image description here

MySQL on another machine gets affected

At the same time as the mongod peak on queues and connections, we also detect a spike in connections on MySQL, which is running on another machine. When we kill the mongod process, all the MySQL connections are released (without doing a MySQL restart).

enter image description here

ulimit

I increased the system limits to see if they were the cause of the issue, but it seems that did not fix the problem.

I set them as recommended in this MongoDB article. The spikes on connections continue. I'm trying to find a way to debug where the connections are coming from.

    $ ulimit -a
    core file size          (blocks, -c) unlimited
    data seg size           (kbytes, -d) unlimited
    scheduling priority             (-e) 0
    file size               (blocks, -f) unlimited
    pending signals                 (-i) 60240
    max locked memory       (kbytes, -l) 64
    max memory size         (kbytes, -m) unlimited
    open files                      (-n) 409600
    pipe size            (512 bytes, -p) 8
    POSIX message queues     (bytes, -q) 819200
    real-time priority              (-r) 0
    stack size              (kbytes, -s) 8192
    cpu time               (seconds, -t) unlimited
    max user processes              (-u) 60240
    virtual memory          (kbytes, -v) unlimited
    file locks                      (-x) unlimited
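One thing that may be worth ruling out: the values printed by ulimit -a belong to the shell that ran the command, and a mongod started from an init script can be running under different limits. On Linux, the limits of a running process are listed in /proc/<pid>/limits. The following is a minimal Python sketch for reading that file; the function name parse_proc_limits is made up for this example, and the PID has to be supplied by hand.

```python
import re


def parse_proc_limits(text):
    """Parse the contents of /proc/<pid>/limits into {name: (soft, hard)}.

    Values are kept as strings because they can be "unlimited".
    """
    limits = {}
    for line in text.splitlines()[1:]:  # the first line is the column header
        # Columns are separated by runs of two or more spaces;
        # limit names such as "Max open files" contain single spaces only.
        parts = re.split(r"\s{2,}", line.strip())
        if len(parts) < 3:
            continue
        name, soft, hard = parts[0], parts[1], parts[2]
        limits[name] = (soft, hard)
    return limits


def read_limits_for_pid(pid):
    """Read and parse the limits of a running process (Linux only)."""
    with open("/proc/%d/limits" % pid) as handle:
        return parse_proc_limits(handle.read())
```

For example, read_limits_for_pid(<pid of mongod>)["Max open files"] should show 409600 for both the soft and hard limit if the raised limit actually reached the running process.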

db.currentOp

I added a shell script that runs the following every 1 second:

    var ops = db.currentOp().inprog;
    if (ops !== undefined && ops.length > 0) {
        ops.forEach(function (op) {
            // Print only operations that have been running for at least 1 second.
            if (op.secs_running > 0) printjson(op);
        });
    }

The log does not report any operation taking more than 1 second to execute. I was thinking of some process taking a long time, but it seems that is not the case.
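Since the symptom is queued operations rather than slow ones, it may also help to print operations that are waiting for a lock, whatever their running time. Below is a minimal Python sketch of such a filter. It assumes inprog is the list found under the inprog key of the currentOp output, with the secs_running and waitingForLock fields that currentOp reports; the function name interesting_ops is made up for this example.

```python
def interesting_ops(inprog, min_secs=1):
    """Return ops that ran for at least min_secs seconds or are waiting for a lock."""
    selected = []
    for op in inprog:
        long_running = op.get("secs_running", 0) >= min_secs
        waiting = bool(op.get("waitingForLock", False))
        if long_running or waiting:
            selected.append(op)
    return selected
```

The list passed in could come from the shell script's db.currentOp().inprog dumped as JSON, or from a driver call that returns the same document.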

MongoDB logs

Regarding mongodb.log, here is the full MongoDB log around the problem. It happens on log line 361. There the connections start to go up and no more queries are executed. I also can't open a mongo shell; it says:

    [Wed Feb 10 15:46:01 UTC 2016]
    2016-02-10T15:48:31.940+0000 DBClientCursor::init call() failed
    2016-02-10T15:48:31.941+0000 Error: DBClientBase::findN: transport error: 127.0.0.1:27000 ns: admin.$cmd query: { whatsmyuri: 1 } at src/mongo/shell/mongo.js:148

Log extract

    2016-02-10t15:41:39.930+0000 [initandlisten] connection accepted 10.240.0.3:56611 #3665 (79 connections open)
    2016-02-10t15:41:39.930+0000 [conn3665] command admin.$cmd command: getnonce { getnonce: 1 } keyupdates:0 numyields:0  reslen:65 0ms
    2016-02-10t15:41:39.930+0000 [conn3665] command admin.$cmd command: ping { ping: 1 } keyupdates:0 numyields:0  reslen:37 0ms
    2016-02-10t15:41:39.992+0000 [conn3529] command db.$cmd command: count { count: "notification", fields: null, query: { read: false, recipient: 310 } } plansummary: ixscan { recipient: 1 } keyupdates:0 numyields:0 locks(micros) r:215 reslen:48 0ms
    2016-02-10t15:41:40.038+0000 [conn2303] query db.column query: { _id: objectid('56b395dfbe66324cbee550b8'), client_id: 20 } plansummary: ixscan { _id: 1 } ntoreturn:2 ntoskip:0 nscanned:1 nscannedobjects:1 keyupdates:0 numyields:0 locks(micros) r:116 nreturned:1 reslen:470 0ms
    2016-02-10t15:41:40.044+0000 [conn1871] update db.column query: { _id: objectid('56b395dfbe66324cbee550b8') } update: { $set: { last_request: new date(1455118900040) } } nscanned:1 nscannedobjects:1 nmatched:1 nmodified:1 fastmod:1 keyupdates:0 numyields:0 locks(micros) w:126 0ms
    2016-02-10t15:41:40.044+0000 [conn1871] command db.$cmd command: update { update: "column", writeconcern: { w: 1 }, updates: [ { q: { _id: objectid('56b395dfbe66324cbee550b8') }, u: { $set: { last_request: new date(1455118900040) } }, multi: false, upsert: true } ] } keyupdates:0 numyields:0  reslen:55 0ms
    2016-02-10t15:41:40.048+0000 [conn1875] query db.user query: { sn: "mobile", client_id: 20, uid: "56990023700" } plansummary: ixscan { client_id: 1, uid: 1, sn: 1 } ntoreturn:2 ntoskip:0 nscanned:1 nscannedobjects:1 keyupdates:0 numyields:0 locks(micros) r:197 nreturned:1 reslen:303 0ms
    2016-02-10t15:41:40.056+0000 [conn2303] winning plan had 0 results. not caching. ns: db.case query: { sn: "mobile", client_id: 20, created: { $gt: new date(1454295600000), $lt: new date(1456800900000) }, deleted: false, establishment_users: { $all: [ objectid('5637640afefa2654b5d863e3') ] }, is_closed: true, updated_time: { $gt: new date(1455045840000) } } sort: { updated_time: 1 } projection: {} skip: 0 limit: 15 winner score: 1.0003 winner summary: ixscan { client_id: 1, is_closed: 1, deleted: 1, updated_time: 1 }
    2016-02-10t15:41:40.057+0000 [conn2303] query db.case query: { $query: { sn: "mobile", client_id: 20, created: { $gt: new date(1454295600000), $lt: new date(1456800900000) }, deleted: false, establishment_users: { $all: [ objectid('5637640afefa2654b5d863e3') ] }, is_closed: true, updated_time: { $gt: new date(1455045840000) } }, $orderby: { updated_time: 1 } } plansummary: ixscan { client_id: 1, is_closed: 1, deleted: 1, updated_time: 1 } ntoreturn:15 ntoskip:0 nscanned:26 nscannedobjects:26 keyupdates:0 numyields:0 locks(micros) r:5092 nreturned:0 reslen:20 5ms
    2016-02-10t15:41:40.060+0000 [conn300] command db.$cmd command: count { count: "notification", fields: null, query: { read: false, recipient: 309 } } plansummary: ixscan { recipient: 1 } keyupdates:0 numyields:0 locks(micros) r:63 reslen:48 0ms
    2016-02-10t15:41:40.133+0000 [initandlisten] connection accepted 127.0.0.1:43266 #3666 (80 connections open)
    2016-02-10t15:41:40.133+0000 [conn3666] command admin.$cmd command: whatsmyuri { whatsmyuri: 1 } ntoreturn:1 keyupdates:0 numyields:0  reslen:62 0ms
    2016-02-10t15:41:40.134+0000 [conn3666] command db.$cmd command: getnonce { getnonce: 1 } ntoreturn:1 keyupdates:0 numyields:0  reslen:65 0ms
    2016-02-10t15:41:40.134+0000 [conn3666]  authenticate db: db { authenticate: 1, nonce: "xxx", user: "xxx", key: "xxx" }
    2016-02-10t15:41:40.134+0000 [conn3666] command db.$cmd command: authenticate { authenticate: 1, nonce: "xxx", user: "xxx", key: "xxx" } ntoreturn:1 keyupdates:0 numyields:0  reslen:82 0ms
    2016-02-10t15:41:40.136+0000 [conn3666] end connection 127.0.0.1:43266 (79 connections open)
    2016-02-10t15:41:40.146+0000 [conn3051] command db.$cmd command: count { count: "notification", fields: null, query: { read: false, recipient: 301 } } plansummary: ixscan { recipient: 1 } keyupdates:0 numyields:0 locks(micros) r:284 reslen:48 0ms
    2016-02-10t15:41:40.526+0000 [conn3529] query db.column query: { _id: objectid('56a8d864be6632718f9fb087'), client_id: 1 } plansummary: ixscan { _id: 1 } ntoreturn:2 ntoskip:0 nscanned:1 nscannedobjects:1 keyupdates:0 numyields:0 locks(micros) r:176 nreturned:1 reslen:440 0ms
    2016-02-10t15:41:40.529+0000 [conn3529] update db.column query: { _id: objectid('56a8d864be6632718f9fb087') } update: { $set: { last_request: new date(1455118900527) } } nscanned:1 nscannedobjects:1 nmatched:1 nmodified:1 fastmod:1 keyupdates:0 numyields:0 locks(micros) w:61 0ms
    2016-02-10t15:41:40.529+0000 [conn3529] command db.$cmd command: update { update: "column", writeconcern: { w: 1 }, updates: [ { q: { _id: objectid('56a8d864be6632718f9fb087') }, u: { $set: { last_request: new date(1455118900527) } }, multi: false, upsert: true } ] } keyupdates:0 numyields:0  reslen:55 0ms
    2016-02-10t15:41:40.531+0000 [conn3529] query db.user query: { sn: "email", client_id: 1, uid: "asdasdasdasdas" } plansummary: ixscan { client_id: 1, uid: 1, sn: 1 } ntoreturn:2 ntoskip:0 nscanned:1 nscannedobjects:1 keyupdates:0 numyields:0 locks(micros) r:278 nreturned:1 reslen:285 0ms
    2016-02-10t15:41:40.546+0000 [conn3529] winning plan had 0 results. not caching. ns: db.case query: { answered: true, sn: "email", client_id: 1, establishment_users: { $all: [ objectid('5669b930fefa2626db389c0e') ] }, deleted: false, is_closed: { $ne: true } } sort: { updated_time: -1 } projection: {} skip: 0 limit: 1 winner score: 1.0003 winner summary: ixscan { client_id: 1, establishment_users: 1, updated_time: 1 }
    2016-02-10t15:41:40.547+0000 [conn3529] query db.case query: { $query: { answered: true, sn: "email", client_id: 1, establishment_users: { $all: [ objectid('5669b930fefa2626db389c0e') ] }, deleted: false, is_closed: { $ne: true } }, $orderby: { updated_time: -1 } } plansummary: ixscan { client_id: 1, establishment_users: 1, updated_time: 1 } ntoskip:0 nscanned:103 nscannedobjects:103 keyupdates:0 numyields:0 locks(micros) r:9410 nreturned:0 reslen:20 9ms
    2016-02-10t15:41:40.557+0000 [conn3529] winning plan had 0 results. not caching. ns: db.case query: { answered: true, sn: "email", client_id: 1, establishment_users: { $all: [ objectid('5669b930fefa2626db389c0e') ] }, deleted: false, is_closed: { $ne: true } } sort: { updated_time: -1 } projection: {} skip: 0 limit: 15 winner score: 1.0003 winner summary: ixscan { client_id: 1, establishment_users: 1, updated_time: 1 }
    2016-02-10t15:41:40.558+0000 [conn3529] query db.case query: { $query: { answered: true, sn: "email", client_id: 1, establishment_users: { $all: [ objectid('5669b930fefa2626db389c0e') ] }, deleted: false, is_closed: { $ne: true } }, $orderby: { updated_time: -1 } } plansummary: ixscan { client_id: 1, establishment_users: 1, updated_time: 1 } ntoreturn:15 ntoskip:0 nscanned:103 nscannedobjects:103 keyupdates:0 numyields:0 locks(micros) r:7572 nreturned:0 reslen:20 7ms
    2016-02-10t15:41:40.569+0000 [conn3028] command db.$cmd command: count { count: "notification", fields: null, query: { read: false, recipient: 145 } } plansummary: ixscan { recipient: 1 } keyupdates:0 numyields:0 locks(micros) r:237 reslen:48 0ms
    2016-02-10t15:41:40.774+0000 [conn3053] command db.$cmd command: count { count: "notification", fields: null, query: { read: false, recipient: 143 } } plansummary: ixscan { recipient: 1 } keyupdates:0 numyields:0 locks(micros) r:372 reslen:48 0ms
    2016-02-10t15:41:41.056+0000 [conn22] command admin.$cmd command: ping { ping: 1 } keyupdates:0 numyields:0  reslen:37 0ms

    ######################### here problem starts #########################

    2016-02-10t15:41:41.175+0000 [initandlisten] connection accepted 127.0.0.1:43268 #3667 (80 connections open)
    2016-02-10t15:41:41.212+0000 [initandlisten] connection accepted 10.240.0.6:46021 #3668 (81 connections open)
    2016-02-10t15:41:41.213+0000 [conn3668] command db.$cmd command: getnonce { getnonce: 1 } keyupdates:0 numyields:0  reslen:65 0ms
    2016-02-10t15:41:41.213+0000 [conn3668]  authenticate db: db { authenticate: 1, user: "xxx", nonce: "xxx", key: "xxx" }
    2016-02-10t15:41:41.213+0000 [conn3668] command db.$cmd command: authenticate { authenticate: 1, user: "xxx", nonce: "xxx", key: "xxx" } keyupdates:0 numyields:0  reslen:82 0ms
    2016-02-10t15:41:41.348+0000 [initandlisten] connection accepted 10.240.0.6:46024 #3669 (82 connections open)
    2016-02-10t15:41:41.349+0000 [conn3669] command db.$cmd command: getnonce { getnonce: 1 } keyupdates:0 numyields:0  reslen:65 0ms
    2016-02-10t15:41:41.349+0000 [conn3669]  authenticate db: db { authenticate: 1, user: "xxx", nonce: "xxx", key: "xxx" }
    2016-02-10t15:41:41.349+0000 [conn3669] command db.$cmd command: authenticate { authenticate: 1, user: "xxx", nonce: "xxx", key: "xxx" } keyupdates:0 numyields:0  reslen:82 0ms
    2016-02-10t15:41:43.620+0000 [initandlisten] connection accepted 10.240.0.6:46055 #3670 (83 connections open)
    2016-02-10t15:41:43.621+0000 [conn3670] command db.$cmd command: getnonce { getnonce: 1 } keyupdates:0 numyields:0  reslen:65 0ms
    2016-02-10t15:41:43.621+0000 [conn3670]  authenticate db: db { authenticate: 1, user: "xxx", nonce: "xxx", key: "xxx" }
    2016-02-10t15:41:43.621+0000 [conn3670] command db.$cmd command: authenticate { authenticate: 1, user: "xxx", nonce: "xxx", key: "xxx" } keyupdates:0 numyields:0  reslen:82 0ms
    2016-02-10t15:41:43.655+0000 [initandlisten] connection accepted 10.240.0.6:46058 #3671 (84 connections open)
    2016-02-10t15:41:43.656+0000 [conn3671] command db.$cmd command: getnonce { getnonce: 1 } keyupdates:0 numyields:0  reslen:65 0ms
    2016-02-10t15:41:43.656+0000 [conn3671]  authenticate db: db { authenticate: 1, user: "xxx", nonce: "xxx", key: "xxx" }
    2016-02-10t15:41:43.656+0000 [conn3671] command db.$cmd command: authenticate { authenticate: 1, user: "xxx", nonce: "xxx", key: "xxx" } keyupdates:0 numyields:0  reslen:82 0ms
    2016-02-10t15:41:44.045+0000 [initandlisten] connection accepted 10.240.0.6:46071 #3672 (85 connections open)
    2016-02-10t15:41:44.045+0000 [conn3672] command db.$cmd command: getnonce { getnonce: 1 } keyupdates:0 numyields:0  reslen:65 0ms
    2016-02-10t15:41:44.046+0000 [conn3672]  authenticate db: db { authenticate: 1, user: "xxx", nonce: "xxx", key: "xxx" }
    2016-02-10t15:41:44.046+0000 [conn3672] command db.$cmd command: authenticate { authenticate: 1, user: "xxx", nonce: "xxx", key: "xxx" } keyupdates:0 numyields:0  reslen:82 0ms
    2016-02-10t15:41:44.083+0000 [initandlisten] connection accepted 10.240.0.6:46073 #3673 (86 connections open)
    2016-02-10t15:41:44.084+0000 [conn3673] command db.$cmd command: getnonce { getnonce: 1 } keyupdates:0 numyields:0  reslen:65 0ms
    2016-02-10t15:41:44.084+0000 [conn3673]  authenticate db: db { authenticate: 1, user: "xxx", nonce: "xxx", key: "xxx" }
    2016-02-10t15:41:44.084+0000 [conn3673] command db.$cmd command: authenticate { authenticate: 1, user: "xxx", nonce: "xxx", key: "xxx" } keyupdates:0 numyields:0  reslen:82 0ms
    2016-02-10t15:41:44.182+0000 [initandlisten] connection accepted 10.240.0.6:46076 #3674 (87 connections open)
    2016-02-10t15:41:44.182+0000 [conn3674] command db.$cmd command: getnonce { getnonce: 1 } keyupdates:0 numyields:0  reslen:65 0ms
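To see where the new connections come from, the "connection accepted" entries in a log like the one above can be tallied per client address. The following is a minimal Python sketch under a few assumptions: the regular expression accepts the entry with or without the word "from" before the address, only IPv4 addresses are matched, and the function name connections_by_client is made up for this example.

```python
import re
from collections import Counter

# Matches e.g. "connection accepted 10.240.0.6:46021 #3668"
# and also "connection accepted from 10.240.0.6:46021 #3668".
ACCEPTED = re.compile(
    r"connection accepted (?:from )?(\d{1,3}(?:\.\d{1,3}){3}):\d+ #\d+"
)


def connections_by_client(log_lines):
    """Count accepted connections per client IP address."""
    counts = Counter()
    for line in log_lines:
        # findall handles several log entries that ended up on one physical line.
        counts.update(ACCEPTED.findall(line))
    return counts
```

Running it over only the part of the log after the freeze starts, for instance with connections_by_client(open("mongodb.log")).most_common(), would list the clients opening the most connections first. In the extract above, every new connection after the marker other than one local one comes from 10.240.0.6.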

Collection information

Currently our database contains 163 collections. The important ones are messages, column and cases; these are the ones with heavy inserts, updates and queries. The rest are for analytics, and they are many collections of about 100 records each:

    {
        "ns" : "db.message",
        "count" : 2.96615e+06,
        "size" : 3906258304.0000000000000000,
        "avgobjsize" : 1316,
        "storagesize" : 9305935856.0000000000000000,
        "numextents" : 25,
        "nindexes" : 21,
        "lastextentsize" : 2.14643e+09,
        "paddingfactor" : 1.0530000000000086,
        "systemflags" : 0,
        "userflags" : 1,
        "totalindexsize" : 7952525392.0000000000000000,
        "indexsizes" : {
            "_id_" : 1.63953e+08,
            "client_id_1_sn_1_mid_1" : 3.16975e+08,
            "client_id_1_created_1" : 1.89086e+08,
            "client_id_1_recipients_1_created_1" : 4.3861e+08,
            "client_id_1_author_1_created_1" : 2.29713e+08,
            "client_id_1_kind_1_created_1" : 2.37088e+08,
            "client_id_1_answered_1_created_1" : 1.90934e+08,
            "client_id_1_is_mention_1_created_1" : 1.8674e+08,
            "client_id_1_has_custom_data_1_created_1" : 1.9566e+08,
            "client_id_1_assigned_1_created_1" : 1.86838e+08,
            "client_id_1_published_1_created_1" : 1.94352e+08,
            "client_id_1_sn_1_created_1" : 2.3681e+08,
            "client_id_1_thread_root_1" : 1.88089e+08,
            "client_id_1_case_id_1" : 1.89266e+08,
            "client_id_1_sender_id_1" : 1.5182e+08,
            "client_id_1_recipient_id_1" : 1.49711e+08,
            "client_id_1_mid_1_sn_1" : 3.17662e+08,
            "text_text_created_1" : 3320641520.0000000000000000,
            "client_id_1_sn_1_kind_1_recipient_id_1_created_1" : 3.15226e+08,
            "client_id_1_sn_1_thread_root_1_created_1" : 3.06526e+08,
            "client_id_1_case_id_1_created_1" : 2.46825e+08
        },
        "ok" : 1.0000000000000000
    }

    {
        "ns" : "db.case",
        "count" : 497661,
        "size" : 5.33111e+08,
        "avgobjsize" : 1071,
        "storagesize" : 6.29637e+08,
        "numextents" : 16,
        "nindexes" : 34,
        "lastextentsize" : 1.68743e+08,
        "paddingfactor" : 1.0000000000000000,
        "systemflags" : 0,
        "userflags" : 1,
        "totalindexsize" : 8.46012e+08,
        "indexsizes" : {
            "_id_" : 2.30073e+07,
            "client_id_1" : 1.99985e+07,
            "is_closed, deleted_1" : 1.31061e+07,
            "is_closed_1" : 1.36948e+07,
            "sn_1" : 2.1274e+07,
            "deleted_1" : 1.39728e+07,
            "created_1" : 1.97777e+07,
            "current_assignment_1" : 4.20819e+07,
            "assigned_1" : 1.33678e+07,
            "commented_1" : 1.36049e+07,
            "has_custom_data_1" : 1.42426e+07,
            "sentiment_start_1" : 1.36049e+07,
            "sentiment_finish_1" : 1.37275e+07,
            "updated_time_1" : 2.02192e+07,
            "identifier_1" : 1.73822e+07,
            "important_1" : 1.38256e+07,
            "answered_1" : 1.41772e+07,
            "client_id_1_is_closed_1_deleted_1_updated_time_1" : 2.90248e+07,
            "client_id_1_is_closed_1_updated_time_1" : 2.86569e+07,
            "client_id_1_sn_1_updated_time_1" : 3.58436e+07,
            "client_id_1_deleted_1_updated_time_1" : 2.8477e+07,
            "client_id_1_updated_time_1" : 2.79619e+07,
            "client_id_1_current_assignment_1_updated_time_1" : 5.6071e+07,
            "client_id_1_assigned_1_updated_time_1" : 2.87713e+07,
            "client_id_1_commented_1_updated_time_1" : 2.86896e+07,
            "client_id_1_has_custom_data_1_updated_time_1" : 2.88286e+07,
            "client_id_1_sentiment_start_1_updated_time_1" : 2.87223e+07,
            "client_id_1_sentiment_finish_1_updated_time_1" : 2.88776e+07,
            "client_id_1_identifier_1_updated_time_1" : 3.48216e+07,
            "client_id_1_important_1_updated_time_1" : 2.88776e+07,
            "client_id_1_answered_1_updated_time_1" : 2.85669e+07,
            "client_id_1_establishment_users_1_updated_time_1" : 3.93838e+07,
            "client_id_1_identifier_1" : 1.86413e+07,
            "client_id_1_sn_1_users_1_updated_time_1" : 4.47309e+07
        },
        "ok" : 1.0000000000000000
    }

    {
        "ns" : "db.column",
        "count" : 438,
        "size" : 218672,
        "avgobjsize" : 499,
        "storagesize" : 696320,
        "numextents" : 4,
        "nindexes" : 2,
        "lastextentsize" : 524288,
        "paddingfactor" : 1.0000000000000000,
        "systemflags" : 0,
        "userflags" : 1,
        "totalindexsize" : 65408,
        "indexsizes" : {
            "_id_" : 32704,
            "client_id_1_owner_1" : 32704
        },
        "ok" : 1.0000000000000000
    }

mongostat

Here are some of the lines we get from running mongostat during normal operation:

    insert  query update delete getmore command flushes mapped  vsize    res faults        locked db idx miss %     qr|qw   ar|aw  netin netout  conn       time
        *0     34      2     *0       0    10|0       0  32.6g  65.5g  1.18g      0 db:0.1%          0       0|0     0|0     4k    39k    87   20:44:44
         2     31     13     *0       0     7|0       0  32.6g  65.5g  1.17g      3 db:0.8%          0       0|0     0|0     9k    36k    87   20:44:45
         1     18      2     *0       0     5|0       0  32.6g  65.5g  1.12g      0 db:0.4%          0       0|0     0|0     3k    18k    87   20:44:46
         5    200     57     *0       0    43|0       0  32.6g  65.5g  1.13g     12 db:2.3%          0       0|0     0|0    46k   225k    86   20:44:47
         1     78     23     *0       0     5|0       0  32.6g  65.5g  1.01g      1 db:1.6%          0       0|0     0|0    18k   313k    86   20:44:48
        *0     10      1     *0       0     5|0       0  32.6g  65.5g  1004m      0 db:0.2%          0       0|0     1|0     1k     8k    86   20:44:49
         3     48     23     *0       0    11|0       0  32.6g  65.5g  1.05g      4 db:1.1%          0       0|0     0|0    16k    48k    86   20:44:50
         2     38     13     *0       0     8|0       0  32.6g  65.5g  1.01g      8 db:0.9%          0       0|0     0|0    10k    76k    86   20:44:51
         3     28     16     *0       0     9|0       0  32.6g  65.5g  1.01g      7 db:1.1%          0       0|0     1|0    11k    62k    86   20:44:52
        *0      9      4     *0       0     8|0       0  32.6g  65.5g  1022m      1 db:0.4%          0       0|0     0|0     3k     6k    87   20:44:53
    insert  query update delete getmore command flushes mapped  vsize    res faults        locked db idx miss %     qr|qw   ar|aw  netin netout  conn       time
         3    107     34     *0       0     6|0       0  32.6g  65.5g  1.02g      1 db:1.1%          0       0|0     0|0    23k   107k    87   20:44:54
         4     65     37     *0       0     8|0       0  32.6g  65.5g  2.69g     57 db:6.2%          0       0|0     0|0    24k   126k    87   20:44:55
         9     84     45     *0       0     8|0       0  32.6g  65.5g  2.63g     17 db:5.3%          0       0|0     1|0    32k   109k    87   20:44:56
         4     84     47     *0       0    44|0       0  32.6g  65.5g  1.89g     10 db:5.9%          0       0|0     1|0    30k   146k    86   20:44:57
         3     73     32     *0       0     9|0       0  32.6g  65.5g  2.58g     12 db:4.7%          0       0|0     0|0    20k   112k    86   20:44:58
         2    165     48     *0       0     7|0       0  32.6g  65.5g  2.62g      7 db:1.3%          0       0|0     0|0    34k   147k    86   20:44:59
         3     61     26     *0       0    12|0       0  32.6g  65.5g   2.2g      6 db:4.7%          0       0|0     1|0    19k    73k    86   20:45:00
         3    252     64     *0       0    12|0       0  32.6g  65.5g  1.87g     85 db:3.2%          0       0|0     0|0    52k   328k    86   20:45:01
        *0    189     40     *0       0     6|0       0  32.6g  65.5g  1.65g      0 db:1.6%          0       0|0     0|0    33k   145k    87   20:45:02
         1     18     10     *0       0     5|0       0  32.6g  65.5g  1.55g      3 db:0.9%          0       0|0     0|0     6k    15k    87   20:45:03
    insert  query update delete getmore command flushes mapped  vsize    res faults        locked db idx miss %     qr|qw   ar|aw  netin netout  conn       time
         1     50     11     *0       0     6|0       0  32.6g  65.5g  1.57g      6 db:0.8%          0       0|0     0|0     9k    63k    87   20:45:04
         2     49     16     *0       0     6|0       0  32.6g  65.5g  1.56g      1 db:1.1%          0       0|0     0|0    12k    50k    87   20:45:05
         1     35     11     *0       0     7|0       0  32.6g  65.5g  1.58g      1 db:0.9%          0       0|0     0|0     8k    41k    87   20:45:06
        *0     18      2     *0       0    42|0       0  32.6g  65.5g  1.55g      0 db:0.4%          0       0|0     0|0     5k    19k    86   20:45:07
         6     75     40     *0       0    11|0       0  32.6g  65.5g  1.56g     10 db:1.9%          0       0|0     0|0    27k    89k    86   20:45:08
         6     60     35     *0       0     7|0       0  32.6g  65.5g  1.89g      5 db:1.5%          0       0|0     1|0    23k   101k    86   20:45:09
         2     17     14     *0       0     7|0       0  32.6g  65.5g   1.9g      0 db:1.3%          0       0|0     1|0     8k    29k    86   20:45:10
         2     35      7     *0       0     4|0       0  32.6g  65.5g  1.77g      1 db:1.3%          0       0|0     0|0     7k    60k    86   20:45:12
         4     50     28     *0       0    10|0       0  32.6g  65.5g  1.75g     10 db:2.0%          0       0|0     0|0    19k    79k    87   20:45:13
        *0      3      1     *0       0     5|0       0  32.6g  65.5g  1.63g      0  .:0.7%          0       0|0     0|0     1k     4k    87   20:45:14
    insert  query update delete getmore command flushes mapped  vsize    res faults        locked db idx miss %     qr|qw   ar|aw  netin netout  conn       time
         5     77     35     *0       0     8|0       0  32.6g  65.5g   1.7g     13 db:3.0%          0       0|0     0|0    23k   124k    88   20:45:15
         3     35     18     *0       0     7|0       0  32.6g  65.5g   1.7g      5 db:0.8%          0       0|0     0|0    12k    43k    87   20:45:16
         1     18      5     *0       0    11|0       0  32.6g  65.5g  1.63g      2 db:0.9%          0       0|0     0|0     5k    35k    87   20:45:17
         3     33     21     *0       0     5|0       0  32.6g  65.5g  1.64g      3 db:0.8%          0       0|0     0|0    13k    32k    87   20:45:18
        *0     25      4     *0       0    42|0       0  32.6g  65.5g  1.64g      0 db:0.3%          0       0|0     0|0     5k    34k    86   20:45:19
         1     25      5     *0       0     5|0       0  32.6g  65.5g  1.65g      3 db:0.2%          0       0|0     0|0     5k    24k    86   20:45:20
        12     88     65     *0       0     7|0       0  32.6g  65.5g   1.7g     25 db:4.2%          0       0|0     0|0    42k   121k    86   20:45:21
         2     53     17     *0       0     4|0       0  32.6g  65.5g  1.65g      2 db:1.5%          0       0|0     0|0    12k    82k    86   20:45:22
         1      9      6     *0       0     7|0       0  32.6g  65.5g  1.64g      1 db:1.0%          0       0|0     0|0     4k    13k    86   20:45:23
        *0      6      2     *0       0     7|0       0  32.6g  65.5g  1.63g      0 db:0.1%          0       0|0     0|0     1k     5k    87   20:45:24
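Because the freeze shows up as growing queues and connections, mongostat output like the above could be watched automatically and the first abnormal row recorded. The following is a minimal Python sketch, assuming the column layout shown here, where the last six fields of a row are qr|qw, ar|aw, netin, netout, conn and time. The function names and the thresholds (max_queue, max_conn) are made up for this example and would need tuning.

```python
def parse_mongostat_row(line):
    """Extract queue, active-client and connection counts from one mongostat row.

    Returns None for header lines, blank lines, or rows that do not parse.
    """
    fields = line.split()
    if len(fields) < 6 or "|" not in fields[-6]:
        return None
    try:
        qr, qw = (int(n) for n in fields[-6].split("|"))
        ar, aw = (int(n) for n in fields[-5].split("|"))
        conn = int(fields[-2])
    except ValueError:  # header row ("qr|qw") or malformed numbers
        return None
    return {"time": fields[-1], "qr": qr, "qw": qw, "ar": ar, "aw": aw, "conn": conn}


def is_suspicious(row, max_queue=0, max_conn=200):
    """Flag a row whose total queue length or connection count exceeds a threshold."""
    return row["qr"] + row["qw"] > max_queue or row["conn"] > max_conn
```

Piping mongostat into a small loop that calls these two functions and logs the first flagged row with its timestamp would give a precise start time to line up against mongodb.log.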

Replica set: updated on May 15th 2016

We migrated our standalone instance to a replica set: 2 secondaries serving reads and 1 primary doing the writes. All machines in the replica set are snapshots of the original machine. With this new configuration the issue changed, and it is harder to detect.

It happens less often, but instead of skyrocketing connections and queues, the whole replica set stops reading/writing, with no high connections, no queues, and no expensive operations at all. All requests to the db just time out. To fix the issue, a SIGKILL must be sent to the mongod process on all 3 machines.

