Compare commits
24 Commits
| Author | SHA1 | Date |
|---|---|---|
| | e2912d0067 | |
| | 6d649875b8 | |
| | ebb2341c50 | |
| | 92d3588542 | |
| | dab4bb0364 | |
| | d9da4532b9 | |
| | a49b0f28be | |
| | c4205ebe06 | |
| | fee991dfd3 | |
| | 4c3f3195cd | |
| | ab0f6574fd | |
| | eeff712455 | |
| | 0866e4a7e0 | |
| | 01c3c6d78d | |
| | 6eef026b69 | |
| | 30632f97c6 | |
| | a322b387de | |
| | 2d68da8612 | |
| | 20611ec853 | |
| | 854a37994b | |
| | 1014610a8b | |
| | 4fe6fc0eb4 | |
| | 12ba3c160f | |
| | 06509916b8 | |

@@ -0,0 +1,2 @@
.flatnotes
[flatnotes]*Changelog.md

@@ -0,0 +1,69 @@
> Overview of the infra size we operate

- Intro

> What kind of issues we faced with rabbit
> Is it a RabbitMQ setup issue or an OpenStack issue?

* Issues with rabbit?
  * flap when rolling out agents / deploying a new agent version
    * even crashes on big regions
  * network flap / rabbit partition
    * pause-minority helped crash the cluster
    * resetting the cluster was ... the solution

> Which methods did we use to troubleshoot those issues
> Observability, tools

* What's going on with rabbit?
* What we deployed to help troubleshoot issues
  * reproduce the workload with rabbitmq perftest
  * oslo.metrics
  * rabbitmq exporter / grafana dashboards
  * smokeping between nodes
  * rabbitspy
* What we learned
  * rabbitmq does not like large queue/connection churn at all
  * identified issues were mostly related to neutron
    * rabbit ddos
      * too many queue declares
      * too much tcp connection churn
    * fanout mechanism: 1 message published, duplicated to N queues
  * Nova RPC usage is clearly != neutron

> Before going further, let's take some time to understand how oslo.messaging works
> How RPC is implemented in OpenStack
> [[oslo.messaging - How it works with rabbit]]

* Under the hood?
  * pub/sub mechanism
  * subscriber: RPC server, topic=name
    * set up class endpoints
    * create queues / set up consumer thread
  * publish with the RPC-provided methods
    * call - reply (topic / transient for reply)
    * cast (topic queue)
    * cast / fanout=true (fanout queue)
  * talk about the transient stuff
  * notifications for external use: kafka

> What we did to put rabbits back into their holes

* Journey to get a stable infra
  * Infra improvements
    * split rabbit-neutron / rabbit-*
    * scale problematic clusters to 5 nodes
    * upgrade to 3.10+
      * quorum queues recommended
    * put the partition strategy back to pause-minority
  * oslo.messaging improvements
    * fixed queue naming to avoid queue churn
    * heartbeat-in-pthread fix
    * move from HA queues to quorum queues
      * fix to auto-delete broken quorum queues
    * replace 'fanout' queues with stream queues
      * reduces the number of queues a lot
    * patch to avoid TCP reconnection when a queue is deleted (kombu/oslo)
    * reduce the queues declared by an RPC server (3 queues by default down to only 1)
    * use the same connection for multiple topics

> ...

- Conclusion
  - when rabbitmq is used for what it is designed for, it works better
  - going further?
    - let's write an oslo.messaging driver for another backend?

@@ -0,0 +1 @@
[git](https://git.cosmao.info/ju/openinfraday/src/branch/master/Follow%20the%20RabbitMQ%20-%20Plan.md)

@@ -0,0 +1,78 @@
RabbitMQ is a key component in an OpenStack deployment.
Both nova and neutron heavily rely on it for internal communication (between the agents running on computes and the APIs running on the control plane).
RabbitMQ clustering is a must-have to let operators manage the lifecycle of RabbitMQ. This is also true when RabbitMQ is running in a Kubernetes environment.
OpenStack components consume RabbitMQ through oslo.messaging.

Some recent improvements have been made to oslo.messaging to allow better scaling and management of RabbitMQ queues.

**Here is a list of what we did on the OVH side to achieve better stability at large scale.**

* Better eventlet / green thread management

The AMQP protocol relies on "heartbeats" to keep idle connections open.
Two patches were made to oslo.messaging to send heartbeats correctly:
the first patch was about sending heartbeats more often, to respect the protocol definition;
the second patch was about using native threads instead of green threads to send heartbeats.
Green threads could be paused by eventlet under some circumstances, leading to connections being dropped by RabbitMQ because of missed heartbeats.
While dropping and creating a new connection is not a big deal on a small deployment, it leads to some message loss and a lot of TCP churn at large scale.

***Both patches are merged upstream and available by default.***
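
As context for the native-thread fix above, a minimal sketch, assuming a recent oslo.messaging release: `heartbeat_in_pthread` is the upstream knob for this behaviour, and whether it is already enabled by default depends on your version. In a real deployment it lives in the service's config file; it is shown here as an in-code override for brevity.

```python
from oslo_config import cfg
import oslo_messaging

# Loading the rabbit driver registers the [oslo_messaging_rabbit] options.
transport = oslo_messaging.get_rpc_transport(
    cfg.CONF, url='rabbit://guest:guest@127.0.0.1:5672/')

# Run the AMQP heartbeat loop in a native thread so eventlet cannot starve it
# and RabbitMQ stops dropping idle connections for missed heartbeats.
cfg.CONF.set_override('heartbeat_in_pthread', True,
                      group='oslo_messaging_rabbit')
```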

* Replace classic HA with quorum

RabbitMQ is moving away from classic HA queues and replacing them with quorum queues (based on the Raft algorithm).
This is a huge improvement on the RabbitMQ side: it allows better scalability as well as data redundancy.
Quorum queues were only partially implemented in oslo.messaging.

OVH wrote a patch to finish this implementation (for 'transient' queues).

**Using quorum queues is not yet the default and we would like to enable this by default.**
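
A minimal opt-in sketch, assuming an oslo.messaging release that already ships the `rabbit_quorum_queue` option (normally set in the service's config file under `[oslo_messaging_rabbit]`, shown here as an override):

```python
from oslo_config import cfg
import oslo_messaging

transport = oslo_messaging.get_rpc_transport(
    cfg.CONF, url='rabbit://guest:guest@127.0.0.1:5672/')

# Ask the rabbit driver to declare RPC queues as quorum queues instead of
# classic HA queues.
cfg.CONF.set_override('rabbit_quorum_queue', True,
                      group='oslo_messaging_rabbit')
```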

* Consistent queue naming

oslo.messaging relied on random queue naming.
While this does not seem to be a problem on small deployments, it has two bad side effects:
* it's harder to figure out which service created a specific queue
* as soon as you restart your services, new random queues are created, leaving a lot of orphaned queues in RabbitMQ

These side effects are highly visible at large scale, and even more visible when using quorum queues.

**We wrote a patch for oslo.messaging to stop using random names.**

This is now merged upstream, but disabled by default.
We would like to enable it by default in the future.
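
A purely illustrative sketch of the idea; the function names and naming scheme below are made up for the example and are not oslo.messaging's actual implementation:

```python
# Hypothetical illustration: a queue named after stable facts about its
# consumer survives restarts and identifies its owner, while a uuid4-based
# name produces a brand-new (soon orphaned) queue on every agent start.
import socket
import uuid


def random_reply_queue(topic: str) -> str:
    # old behaviour: fresh name on every start, previous queue is orphaned
    return f"reply_{uuid.uuid4().hex}"


def stable_reply_queue(topic: str, process_name: str) -> str:
    # deterministic behaviour: same service on same host -> same queue name
    return f"reply_{topic}.{process_name}.{socket.gethostname()}"
```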

* Reduce the number of queues

Both neutron and nova rely heavily on RabbitMQ communication.
While nova is the one sending the most messages (5x more than neutron), neutron is the one creating the most queues (10x more than nova).
RabbitMQ is a message broker, not a queue broker.
Neutron creates a lot of queues without even using them (neutron instantiates oslo.messaging for one queue, but oslo.messaging creates multiple queues for multiple purposes, even if neutron does not need them).
With a high number of queues, RabbitMQ does not work correctly (timeouts, CPU usage, network usage, etc.).

OVH made some patches to reduce the number of queues created by neutron, by patching oslo.messaging and neutron code (we divided neutron's number of queues by 5).

**We would like to push this upstream.**
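
For context, a sketch using the public oslo.messaging API (the topic and server names are placeholders): with the rabbit driver, a single RPC server target like this ends up consuming from roughly three queues — the topic queue, the per-server queue and a fanout queue — even when the service only ever needs one of them.

```python
import oslo_messaging
from oslo_config import cfg


class AgentEndpoint:
    """Any public method here becomes callable over RPC."""

    def ping(self, ctxt):
        return 'pong'


transport = oslo_messaging.get_rpc_transport(
    cfg.CONF, url='rabbit://guest:guest@127.0.0.1:5672/')
target = oslo_messaging.Target(topic='demo-agent', server='compute-0001')
server = oslo_messaging.get_rpc_server(transport, target, [AgentEndpoint()])
server.start()  # the consumer queues are declared on the broker at this point
```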

* Replace classic fanouts with streams

Both neutron and nova rely on fanout queues to send messages to all computes.
Neutron mostly uses this to trigger a security group update or any other update on an object (populating the remote cache).

When classic queues were used for this, messages were replicated into one queue per compute.
If you had a region with 2k computes, you would be sending 2k identical messages into 2k queues (1 message per queue). This is not efficient at all.

**OVH wrote a patch to rely on "stream" queues to replace classic fanouts.**
With stream queues, all computes listen to the same queue, so only 1 message is sent to 1 queue and is received by 2k computes.
This also reduces the number of queues on RabbitMQ.

Those patches are merged upstream but disabled by default.

**We would like to enable this by default.**
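
Hedged sketch: recent oslo.messaging releases expose an opt-in flag for stream-based fanout; to the best of our knowledge it is `rabbit_stream_fanout`, but treat the exact name as an assumption and check the release notes of your version.

```python
from oslo_config import cfg
import oslo_messaging

transport = oslo_messaging.get_rpc_transport(
    cfg.CONF, url='rabbit://guest:guest@127.0.0.1:5672/')

# Assumed option name: use a single stream queue per fanout target instead of
# one classic queue per consumer (normally set in the service's config file).
cfg.CONF.set_override('rabbit_stream_fanout', True,
                      group='oslo_messaging_rabbit')
```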

* Get rid of 'transient' queues

oslo.messaging distinguishes 'transient' queues from other queues, but this no longer makes sense.
Neutron and nova expect all queues to be fully replicated and highly available.
There is no transient concept in nova / neutron code.
This concept leads to bad practices when managing a RabbitMQ cluster, e.g. not replicating the transient queues, which is bad for both nova and neutron.

OVH stopped distinguishing transients and manages all queues in a highly available fashion (using quorum queues).
This allows us to stop a RabbitMQ server in the cluster without any impact on the service.

What we would like is to patch oslo.messaging in the future to stop considering some queues as transient.
This would simplify the code a lot.
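
Hedged sketch: recent oslo.messaging releases carry an opt-in flag to back the 'transient' destinations with quorum queues as well; we believe it is `rabbit_transient_quorum_queue`, but verify the exact name against your release before relying on it.

```python
from oslo_config import cfg
import oslo_messaging

transport = oslo_messaging.get_rpc_transport(
    cfg.CONF, url='rabbit://guest:guest@127.0.0.1:5672/')

# Assumed option name: also declare reply/'transient' queues as quorum queues
# so every queue survives the loss of a single rabbit node.
cfg.CONF.set_override('rabbit_transient_quorum_queue', True,
                      group='oslo_messaging_rabbit')
```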

@@ -0,0 +1,39 @@
# Messaging in Openstack

## oslo_messaging

In the PCI infra, oslo_messaging is configured using:

- the rabbitmq driver for RPC server/agent communication
- the kafka and log drivers for notifications (sending events to third-party apps)

### RPC implementation in rabbitmq

[RPC in openstack](https://docs.openstack.org/oslo.messaging/stein/reference/rpcclient.html) is implemented using the oslo_messaging library.

!!! note "tldr"

    - rpc call()
        - blocking call to invoke a method on a topic, with 1 reply expected
    - rpc cast()
        - invoke a method on a topic in 'best effort' mode, without reply. If fanout=true, the message is broadcast to all topic consumers (see the sketch below)
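
A short client-side sketch with the public oslo_messaging API (the transport URL, topic and method names are placeholders; a matching RPC server has to be listening for the call() to return):

```python
import oslo_messaging
from oslo_config import cfg

transport = oslo_messaging.get_rpc_transport(
    cfg.CONF, url='rabbit://guest:guest@127.0.0.1:5672/')
target = oslo_messaging.Target(topic='demo-agent')
client = oslo_messaging.RPCClient(transport, target)

ctxt = {}  # request context, normally built by the calling service

# call(): blocking, routed to the topic queue, one reply expected
state = client.call(ctxt, 'get_state', device_id='42')

# cast(): best effort, no reply
client.cast(ctxt, 'refresh_cache', device_id='42')

# cast() with fanout=True: broadcast to every consumer of the topic
client.prepare(fanout=True).cast(ctxt, 'refresh_cache')
```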

In rabbitmq, a message is published to a queue using an exchange/routing_key.
Consumers connect directly to a queue to read messages from it.

An oslo.messaging 'topic' is almost equivalent to a rabbitmq queue.

With an rpc call, the message is sent to rabbitmq through exchange=target.exchange, queue={target.topic}.{target.server}.
The response is sent back to the caller using exchange=target.exchange, queue={message.reply_queue}.

With an rpc cast (fanout=false), it's the same but there is no reply mechanism.

With an rpc cast (fanout=true), the message is sent to rabbitmq through exchange=target.exchange, queue={target.topic}_fanout.

For rpc call and rpc cast (fanout=false), we are using quorum queues (1 publisher / 1 consumer).
For rpc cast (fanout=true), stream queues are used because their purpose is to broadcast messages (1 publisher / N consumers).
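
A tiny illustrative helper that restates the routing rules above in code form (made up for this note, not actual oslo.messaging code):

```python
from typing import Optional, Tuple


def rabbit_destination(exchange: str, topic: str,
                       server: Optional[str] = None,
                       fanout: bool = False) -> Tuple[str, str]:
    """Return (exchange, queue) for an oslo.messaging publication."""
    if fanout:
        return exchange, f"{topic}_fanout"    # stream queue, N consumers
    if server:
        return exchange, f"{topic}.{server}"  # quorum queue, 1 consumer
    return exchange, topic                    # quorum queue, 1 consumer


print(rabbit_destination("neutron", "demo-agent", fanout=True))
```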

On startup, every server/agent declares the queues it will consume from. If a queue does not exist on the rabbit cluster, it is created.
The same goes for the publishing side with the exchange.

@@ -0,0 +1,24 @@
```
<section>image presentation</section>

<!-- Intro -->
<section>
  <section>Intro</section>
  <section>Vertical Slide 1</section>
</section>

<section data-markdown data-separator="^\n---\n$" data-separator-vertical="^\n--\n$">
  <textarea data-template>
    ### Issues with rabbit ?

    --

    ### common
    - flap when rolling out agent / deploying new agent version
      - even crash on big regions
    - network flap / rabbit partition
    - pause-minority helped crash the cluster
    - reset cluster was ... the solution
  </textarea>
</section>
```