- Issues with RabbitMQ?
  - flapping when rolling out / deploying a new agent version
  - even crashes on big regions
  - network flaps / RabbitMQ partitions
  - the `pause_minority` partition-handling strategy helped crash the cluster
  - resetting the cluster was ... the solution

- What's going on with RabbitMQ?
  - reproduce the workload with RabbitMQ PerfTest
  - oslo.metrics
  - rabbitmq exporter / Grafana dashboards (plus the quick polling sketch after this list)
  - smokeping between nodes

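On top of the exporter and Grafana dashboards, a quick way to watch churn while debugging is to poll the RabbitMQ management API for queue and connection counts. A minimal sketch, assuming the management plugin on its default port 15672; the host, credentials and interval are placeholders, not our production setup:

```python
import time

import requests

API = "http://rabbit-host:15672/api"  # placeholder host, default management port
AUTH = ("guest", "guest")             # placeholder credentials


def counts():
    """Current number of queues and client connections on the broker."""
    queues = requests.get(f"{API}/queues", auth=AUTH, timeout=10).json()
    conns = requests.get(f"{API}/connections", auth=AUTH, timeout=10).json()
    return len(queues), len(conns)


prev_q, prev_c = counts()
while True:
    time.sleep(30)
    q, c = counts()
    # Big swings between samples are the queue/connection churn signature.
    print(f"queues: {q} ({q - prev_q:+d})  connections: {c} ({c - prev_c:+d})")
    prev_q, prev_c = q, c
```
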
- What we learned?
  - RabbitMQ really does not like large queue/connection churn (a rough reproduction is sketched after this list)
  - the issues we identified were mostly related to Neutron
    - RabbitMQ DDoS
    - too many queue declares
    - too much TCP connection churn
  - Nova's RPC usage is clearly != Neutron's

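The queue-declare and connection churn is easy to reproduce outside of OpenStack. A minimal sketch with pika (not the actual PerfTest harness; the host and iteration count are made up) that mimics a fleet of restarting agents, each opening a connection, declaring a transient queue and tearing it all down again:

```python
import uuid

import pika

params = pika.ConnectionParameters(host="rabbit-host")  # placeholder host

# Each iteration behaves like one agent restart: connect, declare a
# transient reply-style queue, publish once, then drop everything.
for _ in range(10_000):
    connection = pika.BlockingConnection(params)
    channel = connection.channel()
    queue = f"reply_{uuid.uuid4().hex}"
    channel.queue_declare(queue=queue, auto_delete=True)
    channel.basic_publish(exchange="", routing_key=queue, body=b"ping")
    connection.close()
```

Run a few of these in parallel against a test cluster and watch the poller from the previous section to see the queue and connection counts swing.
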
- Under the hood: the RPC implementation in OpenStack, aka oslo.messaging
  - pub/sub
  - RPC server: sets up endpoints / queues / listeners
  - publish: methods provided by the RPC client (all three shown in the sketch after this list)
    - call: request + reply (topic queue / transient reply queue)
    - cast (topic queue)
    - cast with fanout=True (fanout queue)
  - notifications: Kafka

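A minimal, single-process sketch of the primitives listed above: one RPC server exposing an endpoint, plus a client doing call, cast and fanout cast. The transport URL, topic and method names are placeholders; real services load them from their configuration files:

```python
import oslo_messaging
from oslo_config import cfg

# Placeholder transport URL; services normally read it from [DEFAULT]/transport_url.
transport = oslo_messaging.get_rpc_transport(
    cfg.CONF, url="rabbit://guest:guest@rabbit-host:5672/")


class DemoEndpoint(object):
    def ping(self, ctxt, payload):
        return payload


# RPC server: declares and consumes the topic (and fanout) queues for this target.
server_target = oslo_messaging.Target(topic="demo_topic", server="host1")
server = oslo_messaging.get_rpc_server(
    transport, server_target, [DemoEndpoint()], executor="threading")
server.start()

client = oslo_messaging.RPCClient(
    transport, oslo_messaging.Target(topic="demo_topic"))

# call: publish on the topic queue, then block on a transient reply queue.
print(client.call({}, "ping", payload="hello"))

# cast: fire-and-forget on the topic queue, no reply involved.
client.cast({}, "ping", payload="hello")

# fanout cast: delivered to every server listening on the topic.
client.prepare(fanout=True).cast({}, "ping", payload="hello")

server.stop()
server.wait()
```

Each client process ends up with its own transient `reply_*` queue for call results, which is exactly the kind of per-process queue that churns when a large agent fleet restarts.
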
- Journey to get stable
  - Infra
    - split rabbit-neutron off from the shared rabbit-* clusters
    - scale problematic clusters to 5 nodes
    - upgrade to RabbitMQ 3.10+
    - quorum queues recommended
  - oslo.messaging improvements
    - fixed queue naming to avoid queue churn
    - move from HA (mirrored) queues to quorum queues (queue types illustrated after this list)
    - replace fanout queues with stream queues => reduces the number of queues
    - reduce the number of queues declared by the RPC server
    - use the same connection for multiple topics

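At the broker level, the HA-to-quorum and fanout-to-stream moves come down to the `x-queue-type` argument set at queue declaration. In OpenStack they are toggled through `[oslo_messaging_rabbit]` options in recent oslo.messaging releases (for example `rabbit_quorum_queue`; exact option names depend on the release) rather than by declaring queues by hand; the pika sketch below only illustrates what the two queue types look like on the wire, with placeholder names:

```python
import pika

connection = pika.BlockingConnection(
    pika.ConnectionParameters(host="rabbit-host"))  # placeholder host
channel = connection.channel()

# Quorum queue: Raft-replicated and durable, the replacement for classic
# mirrored/HA queues.
channel.queue_declare(
    queue="demo_rpc_topic",
    durable=True,
    arguments={"x-queue-type": "quorum"},
)

# Stream queue: a single append-only log that many consumers read at their
# own offset, instead of one fanout queue per consumer.
channel.queue_declare(
    queue="demo_fanout_stream",
    durable=True,
    arguments={"x-queue-type": "stream"},
)

connection.close()
```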