spark structured stream read to hdfs files fails if data is read immediately



I'd like to load a Hive table (target_table) as a DataFrame after writing a new batch out to HDFS (target_table_dir) using Spark Structured Streaming as follows:



query = (df.writeStream
    .trigger(processingTime="5 seconds")
    # foreachBatch hands the function each micro-batch DataFrame and its
    # batch id (the second argument is a batch id, not a partition id)
    .foreachBatch(lambda batch_df, batch_id:
        batch_df.write
            .option("path", target_table_dir)
            .format("parquet")
            .mode("append")
            .saveAsTable(target_table))
    .start())


When we immediately read the same data back from the Hive table, we get a "partition not found" exception. If we read after a short delay, the data is correct.



It seems that the Hive Metastore is updated before the write finishes: the call returns and the metastore already knows about the new partition, but Spark is still writing the data files out to HDFS.



How can we know when the write to the Hive table (that is, to HDFS) is complete?



Note:
we have found that if we use processAllAvailable() after writing out, the subsequent read works fine, but processAllAvailable() will block execution forever if we are dealing with continuous streams.
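
To make the timing issue concrete, here is a minimal sketch of one workaround we are considering (not a confirmed solution). It relies on the function passed to foreachBatch running synchronously on the driver, so that by the time it returns, saveAsTable has finished and the metastore has been updated. The threading.Event, the write_batch helper, and the rate source are illustrative stand-ins, not part of our actual job:

import threading

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Stand-in values; in the real job these come from the pipeline config.
target_table = "target_table"
target_table_dir = "hdfs:///tmp/target_table_dir"
df = spark.readStream.format("rate").load()  # placeholder streaming source

batch_written = threading.Event()

def write_batch(batch_df, batch_id):
    # saveAsTable() is synchronous: when it returns, the Parquet files are
    # on HDFS and the Hive Metastore has been updated for this batch.
    (batch_df.write
        .option("path", target_table_dir)
        .format("parquet")
        .mode("append")
        .saveAsTable(target_table))
    # Signal that at least one complete batch has been committed.
    batch_written.set()

query = (df.writeStream
    .trigger(processingTime="5 seconds")
    .foreachBatch(write_batch)
    .start())

# Downstream step: instead of blocking forever with processAllAvailable(),
# wait until one batch has landed, then read the table back.
batch_written.wait()
spark.table(target_table).show()

Whether a single "some batch committed" signal is enough, or whether the workflow needs to wait for a specific batch, is exactly the open question.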










apache-spark hive hdfs spark-structured-streaming

asked Mar 8 at 16:08 by Vish, edited Mar 15 at 10:59
  • How do you do "when we immediately try to read the same data back from the Hive table we are getting a partition not found exception"? How do you know when a table has been appended (after a 5-sec trigger has executed)?

    – Jacek Laskowski
    Mar 13 at 17:47











  • We are not able to identify when the table has been appended. As part of the workflow, the next step reads the data back, and that is where we get the error. We need a way to identify that the table append has finished.

    – Vish
    Mar 14 at 17:49












  • @JacekLaskowski We have found that if we use processAllAvailable() after writing out, the subsequent read works fine, but processAllAvailable() will block execution forever if we are dealing with continuous streams.

    – Vish
    Mar 15 at 10:58
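
Following up on the processAllAvailable() discussion above, a hedged sketch of a non-blocking alternative. It assumes PySpark 3.4 or later, where the Python StreamingQueryListener API exists (at the time of the question, the equivalent listener was Scala-only); Spark invokes onQueryProgress after each micro-batch has been committed, which is the "append finished" signal the comments ask for. It reuses the spark session from the sketch above:

from pyspark.sql.streaming import StreamingQueryListener

class BatchCompletionListener(StreamingQueryListener):
    """Reports each micro-batch once it has been fully committed."""

    def onQueryStarted(self, event):
        pass

    def onQueryProgress(self, event):
        # Called after a micro-batch finishes; the table append for
        # event.progress.batchId is complete at this point.
        print(f"batch {event.progress.batchId} committed "
              f"({event.progress.numInputRows} rows)")

    def onQueryTerminated(self, event):
        pass

spark.streams.addListener(BatchCompletionListener())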


















0 Answers











