



How to maximize throughput when processing many files


Say you want to process many files as quickly as possible, where processing time > file read time.



  • Will reading multiple files using a thread pool increase throughput, or does it just cause more disk contention?

  • If a thread pool does help, what determines how many threads are needed to achieve the maximum? Can this be calculated from the target system's specs?

  • For a single core, will a loop reading and processing asynchronously via threads be faster than doing it synchronously? I assume it would be, since disk latency is so high. But if the file read time is much smaller than the processing time, it may be better to let the processing step finish uninterrupted, without context switches.

Also, do you have any other tips for maximizing disk throughput?
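For concreteness, here is a rough sketch (Python, with placeholder names) of the two variants I'm comparing; the questions above are essentially whether B beats A, and how to pick n_threads:

from concurrent.futures import ThreadPoolExecutor

def process(data: bytes):
    ...  # placeholder: the CPU-bound step, slower than the read itself

# Variant A: plain synchronous loop
def run_sync(paths):
    for path in paths:
        with open(path, "rb") as f:
            process(f.read())

# Variant B: read files with a thread pool of size n_threads,
# processing each result as it becomes available
def run_threaded(paths, n_threads):
    def read(path):
        with open(path, "rb") as f:
            return f.read()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        for data in pool.map(read, paths):
            process(data)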










  • "any other tips for maximizing disk throughput?" Buy faster disks and be done with the problem for all time, without worrying about bugs in your processing algorithms, spending time and money writing code, and having to maintain all that code in the future.

    – Andrew Henle
    Mar 7 at 10:22












  • It depends on a lot of factors: OS, available CPUs, available memory, disk performance, and so on. When the files are small (< 1 MB) and there are just a few hundred of them, then reading them all into memory and then processing them may be faster than alternating between reading and processing. But I believe you have to test and profile things on your own.

    – user743414
    Mar 7 at 13:54












  • @AndrewHenle That's always good to keep in mind. Though, if the software is intended to run on a variety of different hardware/OS configurations, like a framework, you would still want to employ some software-based techniques as well.

    – Azmisov
    Mar 15 at 0:44











  • @user743414 Despite the numerous possible configurations, I suspect the internals of reading data from disk are implemented pretty similarly across the board for various motherboards, CPUs, RAM, etc. I was hoping someone with more expertise on the internals could describe the general principles, without having to benchmark across many rigs.

    – Azmisov
    Mar 15 at 0:48















Tags: multithreading · optimization · operating-system · filesystems · disk






asked Mar 7 at 3:23









Azmisov

1 Answer
I did some benchmarking to come up with some general guidelines. I tested with ~500k smallish (~14 KB) files. I think the results should be similar for medium-sized files, but for larger files I suspect disk contention becomes more significant. It would be appreciated if someone with deeper knowledge of OS/hardware internals could supplement this answer with more concrete explanations of why some things are faster than others.



I tested on a machine with 16 virtual cores (8 physical), dual-channel RAM, and Linux kernel 4.18.



Do multiple threads increase read throughput?



The answer is yes. I think this is either because 1) there is a hardware bandwidth limitation for single-threaded applications, or 2) the OS's disk request queue is better utilized when many threads are making requests. The best performance was with virtual_cores*2 threads. Throughput slowly degrades beyond that, perhaps because of increased disk contention. If the pages happen to be cached in RAM, it is better to have a thread pool of size virtual_cores. If, however, < 50% of pages are cached (which I think is the more common case), then virtual_cores*2 does just fine.



I think the reason virtual_cores*2 is better than just virtual_cores is that a file read also includes some non-disk-related latency: system calls, decoding, etc. So perhaps the processor can interleave the threads more effectively: while one is waiting on the disk, a second can be executing the non-disk parts of a file read. (Could it also be because the RAM is dual channel?)
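To give an idea of what was measured, here is a minimal sketch of a read-throughput benchmark along these lines (not my exact benchmark code; the file list and thread counts are placeholders):

import time
from concurrent.futures import ThreadPoolExecutor

def read_file(path):
    with open(path, "rb") as f:
        return len(f.read())

def measure_read_throughput(paths, n_threads):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        total_bytes = sum(pool.map(read_file, paths))
    elapsed = time.perf_counter() - start
    return total_bytes / elapsed  # bytes per second

# e.g. sweep thread counts to find the knee of the curve:
# for n in (1, 2, 4, 8, 16, 32, 64):
#     print(n, measure_read_throughput(paths, n))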



I tested reading files in random order vs. sequentially (by looking up each file's physical block location on disk and ordering the requests by it). Sequential access gives a pretty significant improvement with HDDs, which is to be expected. If the limiting factor in your application is file read time rather than processing, I suggest reordering the requests for sequential access to get a boost.
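As a rough illustration (not what I actually benchmarked with; I used the real block locations, which is more accurate but OS-specific), sorting by inode number is a cheap proxy that often correlates with on-disk order:

import os

def order_for_sequential_access(paths):
    # Rough proxy: inode order often correlates with on-disk layout,
    # so reads issued in this order tend to be closer to sequential.
    return sorted(paths, key=lambda p: os.stat(p).st_ino)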



[Figure: read throughput vs. thread count]



It is possible to use asynchronous disk IO instead of a thread pool. However, from my reading it appears there is no portable way to do it yet (see this reddit thread). Also, libuv, which powers Node.js, uses a thread pool to handle its file IO.
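To illustrate the point with Python: asyncio itself has no non-blocking file read either, so the usual pattern just offloads reads to a thread pool (a minimal sketch; the paths are placeholders):

import asyncio
from pathlib import Path

async def read_file(path: Path) -> bytes:
    # There is no native async file read in asyncio, so the blocking read
    # is delegated to the event loop's default thread pool.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, path.read_bytes)

async def read_all(paths):
    return await asyncio.gather(*(read_file(p) for p in paths))

# asyncio.run(read_all([Path("a.bin"), Path("b.bin")]))  # placeholder paths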



Balancing read vs processing throughput



Ideally, reading and processing would happen in separate threads: while we are processing the first file, another thread can be queuing up the next one. But the more threads we allocate for reading files, the more CPU contention there is with the processing threads. The solution is to give the faster operation (reading vs. processing) the fewest threads while still keeping the delay between files at zero. This formula seemed to give good results in my tests:



from math import ceil

prop = read_time / process_time  # read_time, process_time, virtual_cores measured beforehand
if prop > 1:
    # double virtual core count gives fastest reads, as per tests above
    read_threads = virtual_cores * 2
    process_threads = ceil(read_threads / (2 * prop))
else:
    process_threads = virtual_cores
    # double read thread pool so CPU can interleave better, as mentioned above
    read_threads = 2 * ceil(process_threads * prop)


For example: read = 2 s, process = 10 s (prop = 0.2), so have roughly 2 reading threads for every 5 processing threads.



In my tests, there is only about a 1-1.5% performance penalty for having extra reading threads: for a prop close to zero, 1 read + 16 process threads had nearly the same throughput as 32 read + 16 process threads. Modern threads should be pretty lightweight, and the read threads should be sleeping anyway if the files aren't being consumed fast enough. (The same should be true of process threads when prop is very large.)



On the other hand, having too few reading threads has a much more significant impact (my third original question). For example, for a very large prop, 1 read + 16 process threads was 36% slower than 1 read + 15 process threads. Since the process threads occupy all of the benchmark machine's cores, the read thread suffers so much CPU contention that it fails 36% of the time to queue up the next file to be processed. So my recommendation is to err in favor of too many read threads. Doubling the read thread pool size, as in my formula above, should accomplish this.



Side note: you can limit the CPU resources your application consumes by setting virtual_cores to a smaller fraction of the available cores. You may also choose to forgo the doubling, since CPU contention may be less of an issue when there are one or more spare cores not executing the more intensive processing threads.



Summary



Based on my test results, using a thread pool with virtual_cores*2 file-reading threads and virtual_cores file-processing threads will give you good performance across a variety of timing scenarios. This configuration should get you within ~2% of the maximal throughput without having to spend lots of time benchmarking.
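To make the recommendation concrete, here is a minimal sketch of the two-pool pipeline I mean, assuming Python's concurrent.futures with a bounded queue between the pools (process_file and the thread counts are placeholders, and error handling is omitted):

import os
from concurrent.futures import ThreadPoolExecutor
from queue import Queue

VIRTUAL_CORES = os.cpu_count() or 1
READ_THREADS = VIRTUAL_CORES * 2   # reader pool size, per the recommendation above
PROCESS_THREADS = VIRTUAL_CORES    # processing pool size
_SENTINEL = None                   # signals "no more files" to a worker

def process_file(data: bytes) -> int:
    return len(data)               # placeholder for the CPU-heavy step

def run_pipeline(paths):
    # Bounded queue keeps the readers from buffering the whole data set in RAM.
    queue: Queue = Queue(maxsize=READ_THREADS * 2)

    def read_one(path):
        with open(path, "rb") as f:
            queue.put(f.read())

    def process_many():
        results = []
        while (data := queue.get()) is not _SENTINEL:
            results.append(process_file(data))
        return results

    # NOTE: error handling omitted for brevity; a failure inside process_file
    # would need to be surfaced so the readers don't block on a full queue.
    with ThreadPoolExecutor(READ_THREADS) as readers, \
         ThreadPoolExecutor(PROCESS_THREADS) as workers:
        worker_futures = [workers.submit(process_many) for _ in range(PROCESS_THREADS)]
        for f in [readers.submit(read_one, p) for p in paths]:
            f.result()                     # wait for (and surface) read errors
        for _ in range(PROCESS_THREADS):
            queue.put(_SENTINEL)           # one sentinel per worker so they all exit
        return [r for f in worker_futures for r in f.result()]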






        answered Mar 15 at 0:41









Azmisov

            Add ONERROR event to image from jsp tldHow to add an image to a JPanel?Saving image from PHP URLHTML img scalingCheck if an image is loaded (no errors) with jQueryHow to force an <img> to take up width, even if the image is not loadedHow do I populate hidden form field with a value set in Spring ControllerStyling Raw elements Generated from JSP tagds with Jquery MobileLimit resizing of images with explicitly set width and height attributeserror TLD use in a jsp fileJsp tld files cannot be resolved