openinfraday/Plan.md

  • Issues with rabbit?

    • flapping when rolling out agents / deploying a new agent version
      • even crashes in big regions
    • network flap / rabbit partition
      • pause-minority
    • resetting the cluster was ... the solution
  • What's going on with rabbit?

    • reproduce workload with rabbit perftest

    • oslo.metrics

    • rabbitmq exporter / grafana dashboards

    • smokeping between nodes

    • What did we learn?

      • rabbitmq copes badly with heavy queue/connection churn
      • identified issues were mostly related to neutron
  • Under the hood? The RPC implementation in OpenStack, aka oslo.messaging

    • pub/sub
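      • RPC server: set up endpoints / queues / listeners
        • topic, fanout mechanism
      • publish: methods provided by the RPC client
        • call: blocks until a server replies
        • cast: one-way, no reply
        • cast with fanout=True: one-way, delivered to every server on the topic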
    • notifications: kafka
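
To make the call / cast / fanout distinction concrete, here is a minimal in-memory sketch of the semantics. It is a toy model using only the Python standard library, not oslo.messaging itself: `ToyBroker`, `ToyServer`, `ToyClient`, `AgentEndpoint` and the topic `demo_topic` are all hypothetical names, and the queue layout (one shared topic queue, one private fanout queue per server, one reply queue per client) only loosely mirrors what the rabbit driver is assumed to do.

```python
import queue
import threading
import uuid


class ToyBroker:
    """In-memory stand-in for rabbit: named queues plus per-topic fanout bindings."""

    def __init__(self):
        self._lock = threading.Lock()
        self._queues = {}
        self._fanout = {}  # topic -> names of the private fanout queues

    def declare(self, name):
        with self._lock:
            return self._queues.setdefault(name, queue.Queue())

    def bind_fanout(self, topic, name):
        with self._lock:
            self._fanout.setdefault(topic, []).append(name)

    def publish(self, name, msg):
        self.declare(name).put(msg)

    def publish_fanout(self, topic, msg):
        with self._lock:
            names = list(self._fanout.get(topic, []))
        for name in names:  # one copy per bound server queue
            self.publish(name, msg)


class ToyServer:
    """RPC server: declares its queues and starts one listener thread per queue."""

    def __init__(self, broker, topic, host, endpoint):
        self.broker = broker
        self.endpoint = endpoint
        private = f"{topic}_fanout_{host}"
        # The topic queue is shared by all servers; the fanout queue is private.
        self.queues = [broker.declare(topic), broker.declare(private)]
        broker.bind_fanout(topic, private)
        for q in self.queues:
            threading.Thread(target=self._listen, args=(q,), daemon=True).start()

    def _listen(self, q):
        while True:
            msg = q.get()
            try:
                result = getattr(self.endpoint, msg["method"])(**msg["args"])
                if msg["reply_to"] is not None:  # only "call" expects a reply
                    self.broker.publish(msg["reply_to"], result)
            finally:
                q.task_done()

    def wait(self):
        for q in self.queues:
            q.join()


class ToyClient:
    """RPC client: call waits on a reply queue, cast publishes and returns."""

    def __init__(self, broker, topic):
        self.broker = broker
        self.topic = topic
        self.reply_to = f"reply_{uuid.uuid4().hex}"
        self.replies = broker.declare(self.reply_to)

    def call(self, method, timeout=2.0, **args):
        msg = {"method": method, "args": args, "reply_to": self.reply_to}
        self.broker.publish(self.topic, msg)
        return self.replies.get(timeout=timeout)

    def cast(self, method, fanout=False, **args):
        msg = {"method": method, "args": args, "reply_to": None}
        if fanout:
            self.broker.publish_fanout(self.topic, msg)
        else:
            self.broker.publish(self.topic, msg)


class AgentEndpoint:
    """Hypothetical endpoint exposing a single RPC method."""

    def __init__(self, name):
        self.name = name
        self.seen = []

    def ping(self, payload):
        self.seen.append(payload)
        return f"{self.name}:{payload}"


broker = ToyBroker()
a, b = AgentEndpoint("agent-a"), AgentEndpoint("agent-b")
servers = [
    ToyServer(broker, "demo_topic", "host-a", a),
    ToyServer(broker, "demo_topic", "host-b", b),
]
client = ToyClient(broker, "demo_topic")

reply = client.call("ping", payload="hello")         # one server handles it and replies
client.cast("ping", payload="one-way")               # one server handles it, no reply
client.cast("ping", fanout=True, payload="everyone") # every server handles it
for s in servers:
    s.wait()
```

In this model every server adds a private fanout queue and every client adds a reply queue, so restarting many agents at once creates and destroys many queues and connections in a short window. That is the kind of churn the notes above identify as a problem for rabbit.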
  • Journey to get stable

    • Infra
      • split rabbit-neutron / rabbit-*
      • scale some clusters to 5 nodes
      • upgrade rabbit to 3.10+
    • openstack