
How to know what kind of work each Spark task/executor runs



When my application runs on a Spark cluster, I know the following:

1) the execution plan

2) the DAG, whose nodes are RDDs or operations

3) all jobs/stages/executors/tasks

However, I cannot find out, given a task ID, what kind of work (which RDD or operation) that task performs.



From a task, I can find its executor ID and the machine it runs on. On that machine, if I grep the Java processes for that executor ID, I get:



/bin/bash -c /usr/lib/jvm/jdk1.8.0_192/bin/java -server -Xmx12288m '-XX:MaxMetaspaceSize=256M' '-Djava.library.path=/opt/hadoop/lib/native' '-Djava.util.logging.config.file=/opt/spark2/conf/parquet.logging.properties' -Djava.io.tmpdir=/tmp/hadoop-root/nmlocaldir/usercache/appcache/application_1549756402460_92964/container_1549756402460_92964_01_000012/tmp '-Dspark.driver.port=35617' '-Dspark.network.timeout=3000s' -Dspark.yarn.app.container.log.dir=/mnt/yarn-logs/userlogs/application_1549756402460_92964/container_1549756402460_92964_01_000012 -XX:OnOutOfMemoryError='kill %p' org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@10.0.72.160:35617 --executor-id 11 --hostname abc --cores 3 --app-id application_1549756402460_92964 --user-class-path file:/tmp/hadoop-root/nm-local-dir/usercache/appcache/application_1549756402460_92964/container_1549756402460_92964_01_000012/__app__.jar 1>/mnt/yarn-logs/userlogs/application_1549756402460_92964/container_1549756402460_92964_01_000012/stdout 2> /mnt/yarn-logs/userlogs/application_1549756402460_92964/container_1549756402460_92964_01_000012/stderr


But that does not tell me what the task actually does. Does Spark expose this information?
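One way to recover this mapping is through the monitoring REST API exposed by the Spark UI / history server: each stage reports a name and details (the operation and its user-code call site), and each stage's task list contains the individual task IDs. Below is a rough sketch of that lookup, not the code from this job; endpoint and field names follow the Spark 2.x "Monitoring and Instrumentation" docs, and the base URL, application ID, and task ID are placeholders.

# Rough sketch: walk the Spark REST API to find which stage a given task ID ran in,
# then read that stage's "name"/"details", which carry the operation and call site.
import requests

def describe_task(api_root, app_id, wanted_task_id):
    # api_root is e.g. "http://<driver-host>:4040/api/v1" while the app is running,
    # or the history server's /api/v1 after it has finished (placeholders here).
    stages = requests.get("%s/applications/%s/stages" % (api_root, app_id)).json()
    for stage in stages:
        stage_id = stage["stageId"]
        attempt = stage.get("attemptId", 0)  # field name per the Spark 2.x docs
        tasks = requests.get(
            "%s/applications/%s/stages/%d/%d/taskList" % (api_root, app_id, stage_id, attempt),
            params={"length": 5000},
        ).json()
        for task in tasks:
            if task["taskId"] == wanted_task_id:
                # e.g. name = "map at MyJob.py:42", details = the call-site stack trace
                return stage["name"], stage["details"], task["executorId"], task["host"]
    return None

# print(describe_task("http://driver-host:4040/api/v1", "application_1549756402460_92964", 123))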










apache-spark pyspark spark-ui






asked Mar 8 at 21:56









Joe C

1,241 · 1 gold badge · 14 silver badges · 28 bronze badges
















  • Yes, I have the same issue. I just add some logging to the worker code, and then I can see what is going on.

    – Ehud Lev
    Mar 10 at 13:04
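A minimal sketch of that logging approach in PySpark, assuming nothing about the original job: TaskContext, available inside executor code, exposes the stage ID, partition ID, and task attempt ID, so a line printed from inside the work ends up in the executor's container logs and can be matched against the IDs shown in the Spark UI. The RDD and the doubling operation below are placeholders.

# Minimal sketch: log TaskContext info from inside the work itself so executor
# logs can be tied back to stages/partitions/tasks in the Spark UI.
from pyspark import SparkContext, TaskContext

sc = SparkContext(appName="task-introspection-demo")

def tagged_work(iterator):
    tc = TaskContext.get()
    # These lines go to the executor's stdout under the YARN container logs.
    print("stageId=%d partitionId=%d taskAttemptId=%d attempt=%d"
          % (tc.stageId(), tc.partitionId(), tc.taskAttemptId(), tc.attemptNumber()))
    for x in iterator:
        yield x * 2  # stand-in for the real per-record work

data = sc.parallelize(range(100), numSlices=4)
data.mapPartitions(tagged_work).count()  # force the job so the log lines are emitted
sc.stop()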
















