From ab0f6574fd19a8f84a8fcc7cb265c6166f3ef31f Mon Sep 17 00:00:00 2001
From: Flatnotes
Date: Sun, 5 May 2024 20:31:50 +0000
Subject: [PATCH] Autocommit action=MODIFY on file=RabbitMQ recent improvments.md detected

---
 1. Follow the Rabbitmq.md      | 101 ---------------------------------
 RabbitMQ recent improvments.md |  78 +++++++++++++++++++++++++
 2 files changed, 78 insertions(+), 101 deletions(-)
 delete mode 100644 1. Follow the Rabbitmq.md
 create mode 100644 RabbitMQ recent improvments.md

diff --git a/RabbitMQ recent improvments.md b/RabbitMQ recent improvments.md
new file mode 100644
index 0000000..117d2cc
--- /dev/null
+++ b/RabbitMQ recent improvments.md
@@ -0,0 +1,78 @@
+RabbitMQ is a key component of an OpenStack deployment.
+Both nova and neutron rely heavily on it for internal communication (between the agents running on the computes and the APIs running on the control plane).
+RabbitMQ clustering is a must-have to let operators manage the lifecycle of RabbitMQ. This is also true when RabbitMQ runs in a Kubernetes environment.
+OpenStack components consume RabbitMQ through oslo.messaging.
+
+Some recent improvements have been made in oslo.messaging to allow better scaling and management of RabbitMQ queues.
+
+**Here is a list of what we did on the OVH side to achieve better stability at large scale.**
+
+* Better eventlet / green thread management
+  The AMQP protocol relies on "heartbeats" to keep idle connections open.
+  Two patches were made in oslo.messaging to send heartbeats correctly:
+  the first patch sends heartbeats more often, to respect the protocol definition;
+  the second patch uses a native thread instead of a green thread to send heartbeats.
+  Green threads can be paused by eventlet under some circumstances, leading to connections being dropped by RabbitMQ because of missed heartbeats.
+  While dropping and creating a new connection is not a big deal on a small deployment, it leads to message loss and a lot of TCP churn at large scale.
+
+***Both patches are merged upstream and available by default.***
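+
+To make the second patch concrete, here is a minimal sketch of the underlying idea, assuming a hypothetical `connection.send_heartbeat()` client call (this is not the actual oslo.messaging code): eventlet can hand back the unpatched `threading` module, and a thread created from it is a real OS thread that the green scheduler cannot pause.
+
+```python
+# Minimal sketch of the native-thread heartbeat idea; `connection` and
+# `send_heartbeat()` are illustrative placeholders, not the real
+# oslo.messaging internals.
+import time
+
+import eventlet
+
+# eventlet.patcher.original() returns the unpatched stdlib module, so
+# threads created from it are native OS threads the hub cannot pause.
+native_threading = eventlet.patcher.original("threading")
+
+
+def heartbeat_loop(connection, heartbeat_timeout, rate=2.0):
+    """Send `rate` heartbeats per negotiated timeout window.
+
+    Sending more than one heartbeat per window is the "send heartbeats
+    more often" part: a single late frame no longer kills the connection.
+    """
+    while True:
+        connection.send_heartbeat()  # placeholder for the client call
+        time.sleep(heartbeat_timeout / rate)
+
+
+def start_heartbeat(connection, heartbeat_timeout=60):
+    # daemon=True so the heartbeat thread never blocks process exit.
+    thread = native_threading.Thread(
+        target=heartbeat_loop,
+        args=(connection, heartbeat_timeout),
+        daemon=True,
+    )
+    thread.start()
+    return thread
+```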
+
+* Replace classic HA with quorum
+  RabbitMQ is moving away from classic HA queues and replacing them with quorum queues (based on the Raft algorithm).
+  This is a huge improvement on the RabbitMQ side: it allows better scalability as well as data redundancy.
+  Quorum queues were only partially implemented in oslo.messaging.
+
+OVH wrote a patch to finish this implementation (for 'transient' queues).
+
+**Using quorum queues is not yet the default, and we would like to enable it by default.**
+
+* Consistent queue naming
+  oslo.messaging relied on random queue names.
+  While this does not look like a problem on small deployments, it has two bad side effects:
+* it is harder to figure out which service created a specific queue;
+* as soon as you restart your services, new random queues are created, leaving a lot of orphaned queues in RabbitMQ.
+
+These side effects are highly visible at large scale, and even more visible when using quorum queues.
+
+**We wrote a patch for oslo.messaging to stop using random names.**
+
+This is now merged upstream, but disabled by default.
+We would like to enable it by default in the future.
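+
+Both of these behaviours are toggled through `[oslo_messaging_rabbit]` options. A sketch follows; the option names should be checked against the oslo.messaging release you deploy:
+
+```ini
+[oslo_messaging_rabbit]
+# Declare queues as quorum queues (Raft-replicated) instead of
+# classic HA queues.
+rabbit_quorum_queue = true
+
+# Build deterministic queue names (per host / process) instead of
+# random suffixes, so restarts reuse queues instead of orphaning them.
+use_queue_manager = true
+```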
+
+* Reduce the number of queues
+  Both neutron and nova rely heavily on RabbitMQ communication.
+  While nova is the one sending the most messages (5x more than neutron), neutron is the one creating the most queues (10x more than nova).
+  RabbitMQ is a message broker, not a queue broker.
+  Neutron creates a lot of queues without even using them (neutron instantiates oslo.messaging for one queue, but oslo.messaging creates multiple queues for multiple purposes, even if neutron does not need them).
+  With a high number of queues, RabbitMQ does not work correctly (timeouts, CPU usage, network usage, etc.).
+
+OVH wrote some patches to reduce the number of queues created by neutron, touching both the oslo.messaging and neutron code (we divided the number of neutron queues by 5).
+
+**We would like to push this upstream.**
+
+* Replace classic fanouts with streams
+  Both neutron and nova rely on fanout queues to send messages to all computes.
+  Neutron mostly uses them to trigger a security group update or any other update on an object (populating the remote cache).
+
+When classic queues were used for this, messages were replicated in all the queues of all the computes.
+If you had a region with 2k computes, you would send 2k identical messages to 2k queues (1 message per queue). This is not efficient at all.
+
+**OVH wrote a patch to rely on "stream" queues to replace classic fanouts.**
+With stream queues, all computes listen to the same queue, so only 1 message is sent to 1 queue and it is received by the 2k computes.
+This also reduces the number of queues on RabbitMQ.
+
+Those patches are merged upstream but disabled by default.
+
+**We would like to enable this by default.**
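+
+As with the previous items, this is switched on through configuration; again a sketch, to be verified against your oslo.messaging release:
+
+```ini
+[oslo_messaging_rabbit]
+# Use one RabbitMQ stream per fanout exchange instead of one classic
+# fanout queue per consumer: 1 message in 1 queue, read by all computes.
+rabbit_stream_fanout = true
+```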
+
+* Get rid of 'transient' queues
+  oslo.messaging distinguishes 'transient' queues from the other queues, but this distinction no longer makes sense.
+  Neutron and nova expect all queues to be fully replicated and highly available.
+  There is no transient concept in the nova / neutron code.
+  The concept leads to bad practices when managing a RabbitMQ cluster, e.g. not replicating the transient queues, which is bad for both nova and neutron.
+
+OVH stopped distinguishing transient queues and manages all queues in a highly available fashion (using quorum queues).
+This allows us to stop one RabbitMQ server of the cluster without any impact on the service.
+
+What we would like is to patch oslo.messaging in the future to stop considering some queues as transient.
+This would simplify the code a lot.
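+
+Recent oslo.messaging releases already expose an option going in this direction; one more sketch, with the option name to be verified for your release:
+
+```ini
+[oslo_messaging_rabbit]
+# Also declare the 'transient' queues (replies, fanouts) as quorum
+# queues, so they survive the loss of a RabbitMQ node.
+rabbit_transient_quorum_queue = true
+```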