Autocommit action=MODIFY on file=Plan.md detected
parent 0866e4a7e0
commit eeff712455
95 Plan.md
@@ -1,46 +1,65 @@
- Issues with rabbit?
  - flap when rolling out agent / deploying new agent version
  - even crash on big regions
  - network flap / rabbit partition
  - pause-minority helped crash the cluster
  - reset cluster was ... the solution

> What kind of issues we faced with rabbit
> Is it a RabbitMQ setup issue or an OpenStack issue?

* Issues with rabbit?
  * flap when rolling out agent / deploying new agent version
  * even crash on big regions
  * network flap / rabbit partition
  * pause-minority helped crash the cluster
  * reset cluster was ... the solution


- What's going on with rabbit?
  - reproduce workload with RabbitMQ PerfTest
  - oslo.metrics
  - rabbitmq exporter / grafana dashboards
  - smokeping between nodes
> Which methods did we use to troubleshoot those issues
> Observability, tools

- What we learned?
  - RabbitMQ really does not like heavy queue/connection churn
  - identified issues were mostly related to neutron
    - rabbit ddos
    - too many queue declarations
    - too much TCP connection churn
  - Nova RPC usage is clearly != neutron
* What's going on with rabbit?
  * reproduce workload with RabbitMQ PerfTest
  * oslo.metrics
  * rabbitmq exporter / grafana dashboards
  * smokeping between nodes
  * rabbitspy
* What we learned?
  * RabbitMQ really does not like heavy queue/connection churn
  * identified issues were mostly related to neutron
    * rabbit ddos
    * too many queue declarations
    * too much TCP connection churn
    * fanout mechanism: 1 message published, duplicated to N queues
  * Nova RPC usage is clearly != neutron


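
To make the fanout point concrete, here is a minimal sketch of how one published message turns into N queue entries. It is a toy in-memory model, not a real broker: `FanoutExchange`, `fanout_copies` and the `agent_fanout_<i>` queue names are hypothetical and only illustrate the multiplication.

```python
from collections import deque


class FanoutExchange:
    """Toy stand-in for an AMQP fanout exchange (hypothetical, in-memory only)."""

    def __init__(self):
        self.queues = {}

    def bind(self, queue_name):
        # Each listener gets its own queue bound to the exchange.
        self.queues[queue_name] = deque()

    def publish(self, message):
        # One publish is copied into every bound queue.
        for q in self.queues.values():
            q.append(message)
        return len(self.queues)


def fanout_copies(agents, publishes):
    """Total queue entries created when `publishes` messages reach `agents` listeners."""
    exchange = FanoutExchange()
    for i in range(agents):
        exchange.bind(f"agent_fanout_{i}")
    return sum(exchange.publish({"seq": n}) for n in range(publishes))
```

With 1000 listeners, 10 publishes already mean 10000 entries for the broker to store and deliver, which is why fanout-heavy services weigh more on the cluster than their publish rate suggests.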
- Under the hood? RPC implementation in OpenStack, aka oslo.messaging
  - pub/sub
  - RPC server: setup endpoints / queues / listeners
  - publish: rpc provided methods
    - call - reply (topic / transient for reply)
    - cast (topic queue)
    - cast / fanout=true (fanout queue)
  - notifications: kafka
> Before going further, let's take some time to understand how oslo.messaging works
> How RPC is implemented in OpenStack

* Under the hood?
  * pub/sub mechanism
  * subscriber: RPC server topic=name
    * setup class endpoints
    * create queues / setup consumer thread
  * publish with rpc provided methods
    * call - reply (topic / transient for reply)
    * cast (topic queue)
    * cast / fanout=true (fanout queue)
  * notifications for external use: kafka


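
The call/cast difference can be sketched with the standard library alone. This is not oslo.messaging code: `ToyRPCServer`, `ToyRPCClient` and `Endpoint` are hypothetical names, and plain `queue.Queue` objects stand in for the topic queue and the transient reply queue. It only shows the shape of the pattern: a consumer thread dispatching to endpoint methods, `cast` as fire-and-forget, `call` blocking on a per-request reply queue.

```python
import queue
import threading


class ToyRPCServer:
    """Consumes a topic queue and dispatches messages to endpoint methods."""

    def __init__(self, topic, endpoint):
        self.topic = topic
        self.endpoint = endpoint
        self.topic_queue = queue.Queue()
        self._thread = threading.Thread(target=self._consume, daemon=True)
        self._thread.start()

    def _consume(self):
        while True:
            msg = self.topic_queue.get()
            if msg is None:  # stop sentinel
                break
            result = getattr(self.endpoint, msg["method"])(**msg["args"])
            reply_q = msg.get("reply_q")
            if reply_q is not None:  # only a call carries a reply queue
                reply_q.put(result)

    def stop(self):
        self.topic_queue.put(None)
        self._thread.join()


class ToyRPCClient:
    def __init__(self, server):
        self.server = server

    def cast(self, method, **args):
        # Fire and forget: nothing to wait on.
        self.server.topic_queue.put({"method": method, "args": args})

    def call(self, method, timeout=5, **args):
        # Transient reply queue, created per request in this toy model.
        reply_q = queue.Queue()
        self.server.topic_queue.put(
            {"method": method, "args": args, "reply_q": reply_q}
        )
        return reply_q.get(timeout=timeout)


class Endpoint:
    def __init__(self):
        self.seen = []

    def add(self, a, b):
        return a + b

    def record(self, item):
        self.seen.append(item)
```

A `cast` returns immediately, while a `call` costs an extra queue and a round trip, so the mix of the two matters when comparing how different services load the broker.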
- Journey to get stable
  - Infra
    - split rabbit-neutron / rabbit-*
    - scale problematic clusters to 5 nodes
    - Upgrade to 3.10+
    - quorum queue recommended
  - oslo messaging improvement
    - queue fixed naming to avoid
    - move from HA queue > Quorum queues
    - replace 'fanout' queues by stream queues => reduce queue nb
    - reduce queue declared by RPC server
    - use same connection for multiple topics
> What we did to put rabbits back to their holes

* Journey to get a stable infra.
  * Infra
    * split rabbit-neutron / rabbit-\*
    * scale problematic clusters to 5 nodes
    * Upgrade to 3.10+
    * quorum queue recommended
    * put back partition strategy to pause-minority
  * oslo messaging improvements
    * queue fixed naming to avoid queue churn
    * heartbeat in pthread fix
    * move from HA queue > Quorum queues
    * fix to autodelete broken quorum queues
    * replace 'fanout' queues by stream queues
    * reduce the number of queues a lot
    * patch to avoid tcp reconnection when a queue is deleted (kombu/oslo)
    * reduce queues declared by a RPC server (3 queues by default to only 1)
    * use same connection for multiple topics
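
A rough count shows why the queue-related items matter at scale. The sketch below assumes that an RPC server declares three queues by default (a shared topic queue, a per-host queue and a per-listener fanout queue) and that stream-based fanout lets all listeners share one stream per topic. The naming scheme, the `<topic>_fanout` stream name and the `demo_topic` topic are illustrative assumptions, not taken from the plan.

```python
import uuid


def default_queues(topic, host):
    """Queue names one RPC server is assumed to declare by default (3 per server)."""
    return [
        topic,                                   # shared by all servers on the topic
        f"{topic}.{host}",                       # one per host
        f"{topic}_fanout_{uuid.uuid4().hex}",    # one per listener
    ]


def total_queues(topic, hosts, stream_fanout=False):
    """Distinct queues on the broker for one topic across all hosts."""
    names = set()
    for host in hosts:
        shared, per_host, fanout = default_queues(topic, host)
        names.update([shared, per_host])
        # With stream fanout, every listener reads the same stream.
        names.add(f"{topic}_fanout" if stream_fanout else fanout)
    return len(names)


hosts = [f"agent{i}" for i in range(1000)]
print(total_queues("demo_topic", hosts))                      # classic fanout queues
print(total_queues("demo_topic", hosts, stream_fanout=True))  # shared fanout stream
```

Under these assumptions, 1000 agents on one topic drop from 2001 queues to 1002 with a shared stream, before counting the further reduction from declaring a single queue per RPC server.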