Docker design: exchange data between containers or put multiple processes in one container?

In a current project I have to perform the following tasks (among others):



  • capture video frames from five IP cameras and stitch a panorama

  • run machine learning based object detection on the panorama

  • stream the panorama so it can be displayed in a UI

Currently, the stitching and the streaming run in one Docker container, and the object detection runs in another, reading the panorama stream as input.



Since I need to increase the input resolution for the object detector while maintaining the stream resolution for the UI, I have to look for alternative ways of getting the stitched (full-resolution) panorama (~10 MB per frame) from the stitcher container to the detector container.



My thoughts regarding potential solutions:



  • Shared volume. Potential downside: one extra write and read per frame might be too slow?

  • Using a message queue, e.g. Redis. Potential downside: yet another component in the architecture.

  • Merging the two containers. Potential downside(s): not only does it not feel right, but the two containers have completely different base images and dependencies. Plus I'd have to worry about parallelization.

Since I'm not the sharpest knife in the Docker drawer, what I'm asking for are tips, experiences and best practices regarding fast data exchange between Docker containers.
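
For the shared-volume option, a minimal sketch of the handoff could look like the following (this assumes both containers mount the same volume at /shared; the path, file naming and the atomic-rename step are illustrative, not an existing part of the project):

    # Minimal sketch of the shared-volume handoff. Assumes both containers
    # mount the same volume at /shared; paths and file names are illustrative.
    import os

    SHARED_DIR = "/shared"

    def write_frame(frame_bytes, frame_id):
        """Stitcher side: write to a temp file, then rename atomically so the
        detector never sees a half-written panorama."""
        tmp_path = os.path.join(SHARED_DIR, ".{:09d}.jpg.tmp".format(frame_id))
        final_path = os.path.join(SHARED_DIR, "{:09d}.jpg".format(frame_id))
        with open(tmp_path, "wb") as f:
            f.write(frame_bytes)
        os.replace(tmp_path, final_path)  # atomic rename on the same filesystem
        return final_path

    def read_latest_frame():
        """Detector side: pick up the most recent completed panorama, if any."""
        frames = sorted(name for name in os.listdir(SHARED_DIR)
                        if name.endswith(".jpg"))
        if not frames:
            return None
        with open(os.path.join(SHARED_DIR, frames[-1]), "rb") as f:
            return f.read()

Whether the extra write and read per ~10 MB frame is too slow probably depends on the volume driver and the underlying disk; on a local named volume it amounts to one extra file copy per frame.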










Tags: docker, architecture






Asked by creimers

2 Answers














Usually most communication between Docker containers is over network sockets. This is fine when you're talking to something like a relational database or an HTTP server. It sounds like your application is a little more about sharing files, though, and that's something Docker is a little less good at.

If you only want one copy of each component, or are still actively developing the pipeline: I'd probably not use Docker for this. Since each container has an isolated filesystem and its own user ID space, sharing files can be unexpectedly tricky (every container must agree on numeric user IDs). But if you just run everything on the host, as the same user, pointing at the same directory, this isn't a problem.

If you're trying to scale this in production: I'd add some sort of shared filesystem and a message queueing system like RabbitMQ. For local work this could be a Docker named volume or bind-mounted host directory; cloud storage like Amazon S3 will work fine too. The setup is like this:

  • Each component knows about the shared storage and connects to RabbitMQ, but is unaware of the other components.

  • Each component reads a message from a RabbitMQ queue that names a file to process.

  • The component reads the file and does its work.

  • When it finishes, the component writes the result file back to the shared storage, and writes its location to a RabbitMQ exchange.

In this setup each component is totally stateless. If you discover that, for example, the machine-learning component of this is slowest, you can run duplicate copies of it. If something breaks, RabbitMQ will remember that a given message hasn't been fully processed (acknowledged); and again because of the isolation you can run that specific component locally to reproduce and fix the issue.
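
As a rough illustration of that loop (a sketch, not a full implementation), one worker could look roughly like this using the pika client; the broker host, queue names and process_file() are placeholder assumptions:

    # Minimal sketch of one stateless worker in the queue + shared-storage setup.
    # Assumes RabbitMQ is reachable at host "rabbitmq" and that the shared
    # storage is mounted; queue names and process_file() are placeholders.
    import os
    import pika

    IN_QUEUE = "panoramas"      # messages name files to process
    OUT_QUEUE = "detections"    # messages name result files

    def process_file(path):
        """Placeholder for the real work (e.g. object detection on one frame)."""
        result_path = path + ".result"
        with open(result_path, "w") as f:
            f.write("processed {}\n".format(os.path.basename(path)))
        return result_path

    def on_message(channel, method, properties, body):
        input_path = body.decode()
        result_path = process_file(input_path)
        # Publish where the result landed, then acknowledge the original message
        # so RabbitMQ knows it has been fully processed.
        channel.basic_publish(exchange="", routing_key=OUT_QUEUE, body=result_path)
        channel.basic_ack(delivery_tag=method.delivery_tag)

    connection = pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq"))
    channel = connection.channel()
    channel.queue_declare(queue=IN_QUEUE, durable=True)
    channel.queue_declare(queue=OUT_QUEUE, durable=True)
    channel.basic_qos(prefetch_count=1)  # one in-flight message per worker copy
    channel.basic_consume(queue=IN_QUEUE, on_message_callback=on_message)
    channel.start_consuming()

Because the worker only acknowledges a message after publishing its result, a copy that crashes mid-frame simply leaves that message to be redelivered to another copy.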



This model also translates well to larger-scale Docker-based cluster-computing systems like Kubernetes.



Running this locally, I would absolutely keep separate concerns in separate containers (especially if individual image-processing and ML tasks are expensive). The setup I propose needs both a message queue (to keep track of the work) and a shared filesystem (because message queues tend to not be optimized for 10+ MB individual messages). You get a choice between Docker named volumes and host bind-mounts as readily available shared storage. Bind mounts are easier to inspect and administer, but on some platforms are legendarily slow. Named volumes I think are reasonably fast, but you can only access them from Docker containers, which means needing to launch more containers to do basic things like backup and pruning.






Answered by David Maze

Alright, let's unpack this:

  • IMHO a shared volume works just fine, but it gets way too messy over time, especially if you're handling stateful services.

  • MQ: this seems like the best option in my opinion. Yes, it's another component in your architecture, but it makes sense to have it rather than maintaining messy shared volumes or handling massive container images (if you manage to combine the two container images).

  • Merging the containers: you could potentially do this, but it's not a good idea. Considering your use case, I'm going to assume you have a massive list of dependencies, which could potentially lead to conflicts. Also, a lot of dependencies = a larger image = a larger attack surface, which from a security perspective is not a good thing.

If you really want to run multiple processes in one container, it's possible. There are multiple ways to achieve that; I prefer supervisord.

https://docs.docker.com/config/containers/multi-service_container/
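
For illustration, a minimal Python entrypoint in the spirit of that page could supervise both processes; the two commands below are placeholders for the real stitcher and detector programs:

    #!/usr/bin/env python3
    # Minimal sketch of a single-container entrypoint that supervises two
    # processes (a lighter-weight alternative to supervisord). The commands
    # are placeholders.
    import subprocess
    import sys
    import time

    procs = [
        subprocess.Popen(["python", "stitcher.py"]),
        subprocess.Popen(["python", "detector.py"]),
    ]

    # If either process exits, stop the other one and exit with its code, so
    # the container stops and Docker's restart policy can take over.
    exit_code = 0
    running = True
    while running:
        time.sleep(1)
        for p in procs:
            if p.poll() is not None:
                exit_code = p.returncode
                running = False

    for p in procs:
        if p.poll() is None:
            p.terminate()
    sys.exit(exit_code)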






Answered by Subramanya Vajiraya