From 0866e4a7e080da7f61888e7118cf63c70c218436 Mon Sep 17 00:00:00 2001
From: Flatnotes
Date: Sun, 5 May 2024 20:06:10 +0000
Subject: [PATCH] Autocommit action=MODIFY on file=Plan.md detected

---
 Plan.md | 32 +++++++++++++++++++++-----------
 1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/Plan.md b/Plan.md
index b545242..f2c452a 100644
--- a/Plan.md
+++ b/Plan.md
@@ -2,9 +2,10 @@
  - flap when rolling out agent / deploying new agent version
  - even crash on big regions
  - network flap / rabbit partition
- - pause-minority
+ - pause-minority helped crash the cluster
  - reset cluster was ... the solution
+ - What's going on with rabbit ?
  - reproduce workload with rabbit perftest
  - oslo.metrics
@@ -14,23 +15,32 @@
  - What we learned ?
  - rabbitmq does not like at all large queue/connection churn
  - identified issues were mostly related to neutron
-
-
-- under the hood ? RPC implementation in Openstack: aka oslo.messaging
+ - rabbit ddos
+ - too many queue declare
+ - too much TCP connection churn
+ - Nova rpc usage is clearly != neutron
+
+
+ - Under the hood ? RPC implementation in OpenStack: aka oslo.messaging
  - pub/sub
  - RPC server: setup endpoints / queues / listeners
- - topic, fanout mechanism
  - publish: rpc provided methods
- - call
- - cast
- - cast / fanout=true
+ - call - reply (topic / transient for reply)
+ - cast (topic queue)
+ - cast / fanout=true (fanout queue)
  - notifications: kafka
  - Journey to get stable
  - Infra
  - split rabbit-neutron / rabbit-*
- - scale some clusters to 5 node
+ - scale problematic clusters to 5 nodes
  - Upgrade to 3.10+
- - openstack
- -
+ - quorum queues recommended
+ - oslo.messaging improvements
+ - queue fixed naming to avoid
+ - move from HA queues > quorum queues
+ - replace 'fanout' queues with stream queues => reduce number of queues
+ - reduce queues declared by RPC servers
+ - use the same connection for multiple topics
+
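
The "Under the hood" notes list the queues an oslo.messaging RPC server relies on: a topic queue for call/cast, a fanout queue for cast with fanout=true, and a transient reply queue for call. The oslo.messaging items aim to cut how many of those queues exist. The toy model below counts declared queues for the classic layout and for a stream-based fanout. It is a sketch under stated assumptions, not the driver's code. The queue names (`topic`, `topic.server`, `topic_fanout_<uuid>`, `reply_<uuid>`) follow our understanding of the rabbit driver's naming. A stream fanout is modeled as one shared queue per topic. `ToyBroker`, `start_rpc_server`, `start_rpc_client` and `count_queues` are hypothetical helpers. No real broker or oslo.messaging API is used.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ToyBroker:
    """Hypothetical stand-in for a RabbitMQ vhost: it only records queue names."""

    queues: set[str] = field(default_factory=set)

    def queue_declare(self, name: str) -> None:
        # Like AMQP queue.declare, re-declaring an existing name changes nothing here.
        self.queues.add(name)


def start_rpc_server(broker: ToyBroker, topic: str, server: str,
                     stream_fanout: bool = False) -> None:
    """Queues one RPC server is ASSUMED to declare for a single topic."""
    broker.queue_declare(topic)                # shared topic queue: call / cast
    broker.queue_declare(f"{topic}.{server}")  # queue targeting this server only
    if stream_fanout:
        # Assumed variant: one stream per topic, read by every server.
        broker.queue_declare(f"{topic}_fanout")
    else:
        # Classic variant: one fanout queue per server, with a random suffix.
        broker.queue_declare(f"{topic}_fanout_{uuid.uuid4().hex}")


def start_rpc_client(broker: ToyBroker) -> str:
    """A calling process is ASSUMED to declare one transient reply queue."""
    name = f"reply_{uuid.uuid4().hex}"
    broker.queue_declare(name)
    return name


def count_queues(n_agents: int, topics: list[str], stream_fanout: bool) -> int:
    """Total queues after n_agents processes start a server on every topic."""
    broker = ToyBroker()
    for i in range(n_agents):
        for topic in topics:
            start_rpc_server(broker, topic, f"host-{i}", stream_fanout)
        start_rpc_client(broker)  # each agent also issues call()s
    return len(broker.queues)


if __name__ == "__main__":
    topics = ["topic-a", "topic-b", "topic-c"]  # hypothetical topic names
    print(count_queues(1000, topics, stream_fanout=False))  # 7003
    print(count_queues(1000, topics, stream_fanout=True))   # 4006
```

With 1,000 agents and three topics, per-server fanout queues account for 3,000 of the 7,003 queues in this model. Sharing one stream per topic replaces them with three. Real deployments also declare exchanges and may create reply queues differently, so treat the numbers as illustrative only.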