- Issues with rabbit?
  - flap when rolling out agent / deploying new agent version
  - even crash on big regions
  - network flap / rabbit partition
    - pause-minority
  - reset cluster was ... the solution
- What's going on with rabbit?
  - reproduce workload with rabbit perftest
  - oslo.metrics
  - rabbitmq exporter / grafana dashboards
  - smokeping between nodes
- What we learned?
  - rabbitmq does not cope well with heavy queue/connection churn
  - identified issues were mostly related to neutron
- Under the hood? RPC implementation in OpenStack: aka oslo.messaging
  - pub/sub
  - RPC server: setup endpoints / queues / listeners
    - topic, fanout mechanism
  - publish: rpc provided methods
    - call
    - cast
    - cast / fanout=true
  - notifications: kafka
- Journey to get stable
  - Infra
    - split rabbit-neutron / rabbit-*
    - scale some clusters to 5 nodes
    - Upgrade to 3.10+
  - openstack
    -
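
To make the three publish methods from the "under the hood" part concrete, here is a minimal in-memory sketch of their delivery semantics. It is a toy model, not oslo.messaging and not RabbitMQ: `ToyBroker`, `AgentEndpoint`, the `l2-agent` topic and the `port_update` method are all hypothetical names chosen for illustration, and round-robin delivery on the topic is an assumption that mirrors how a queue shared by several consumers usually behaves.

```python
import itertools
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for the message bus (hypothetical, illustration only)."""

    def __init__(self):
        self.servers = defaultdict(list)  # topic -> registered endpoints
        self._next = {}                   # topic -> round-robin iterator

    def register(self, topic, endpoint):
        # An RPC server "listens" on a topic by registering its endpoint.
        self.servers[topic].append(endpoint)
        self._next[topic] = itertools.cycle(self.servers[topic])

    def _pick(self, topic):
        # Topic delivery: exactly one listening server gets the message.
        return next(self._next[topic])

    def call(self, topic, method, **kwargs):
        # call: one server handles the request, the caller waits for the reply.
        return getattr(self._pick(topic), method)(**kwargs)

    def cast(self, topic, method, fanout=False, **kwargs):
        # cast: fire-and-forget; with fanout=True every server on the topic
        # receives its own copy of the message.
        targets = self.servers[topic] if fanout else [self._pick(topic)]
        for endpoint in targets:
            getattr(endpoint, method)(**kwargs)


class AgentEndpoint:
    """Hypothetical endpoint exposing one RPC method."""

    def __init__(self, name):
        self.name = name
        self.seen = []

    def port_update(self, port_id):
        self.seen.append(port_id)
        return f"{self.name} updated {port_id}"


broker = ToyBroker()
agents = [AgentEndpoint(f"agent-{i}") for i in range(3)]
for agent in agents:
    broker.register("l2-agent", agent)

reply = broker.call("l2-agent", "port_update", port_id="p1")        # one agent, reply returned
broker.cast("l2-agent", "port_update", port_id="p2")                # one agent, no reply
broker.cast("l2-agent", "port_update", fanout=True, port_id="p3")   # every agent, no reply

for agent in agents:
    print(agent.name, agent.seen)
```

The point of the sketch is the difference in blast radius: `call` and plain `cast` touch one server, while `cast` with `fanout=True` touches all of them. On a real broker, as far as we understand the rabbit driver, each server also declares its own fanout queue, so a region with many agents restarting at once translates directly into the queue/connection churn noted above.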