spark structured stream read to hdfs files fails if data is read immediately
I'd like to load a Hive table (target_table) as a DataFrame after writing a new batch out to HDFS (target_table_dir) using Spark Structured Streaming, as follows:
query = (df.writeStream
    .trigger(processingTime="5 seconds")
    .foreachBatch(lambda batch_df, batch_id:  # second argument is the batch id, not a partition id
        batch_df.write
            .option("path", target_table_dir)
            .format("parquet")
            .mode("append")
            .saveAsTable(target_table))
    .start())
When we immediately read the same data back from the Hive table, we get a "partition not found" exception. If we read after a short delay, the data is correct.
It appears that Spark is still writing files to HDFS even though the batch seems to have finished: the Hive metastore has already been updated, but data is still being written out to HDFS.
How can we know when the write of a batch to the Hive table (into HDFS) is complete?
Note:
We have found that if we call processAllAvailable() after writing out, the subsequent read works fine. However, processAllAvailable() blocks forever when we are dealing with continuous streams.
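Since the body of a foreachBatch callback runs synchronously within its micro-batch, one workaround (a sketch, not from the original post) is to perform the dependent read inside the same callback, after saveAsTable() has returned. Here make_batch_handler and read_back are hypothetical names; target_table and target_table_dir are the names used in the question.

```python
# Sketch only: the point is that saveAsTable() inside foreachBatch is
# synchronous, so a read issued *after* it cannot race with the write.

target_table = "target_table"            # assumed Hive table name
target_table_dir = "/data/target_table"  # assumed HDFS path

def make_batch_handler(read_back):
    """Return a foreachBatch callback that runs `read_back` only after
    the append for this micro-batch has fully committed."""
    def handle_batch(batch_df, batch_id):
        (batch_df.write
            .option("path", target_table_dir)
            .format("parquet")
            .mode("append")
            .saveAsTable(target_table))  # returns only once files and metastore are committed
        read_back(batch_id)              # safe here: the write above has finished
    return handle_batch
```

This would be wired up as df.writeStream.trigger(processingTime="5 seconds").foreachBatch(make_batch_handler(my_read_step)).start(), where my_read_step is the read that previously raced with the write.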
apache-spark hive hdfs spark-structured-streaming
How exactly do you "immediately try to read the same data back from the hive table" and get the "partition not found" exception? How do you know when the table has been appended to (after a 5-sec trigger has executed)?
– Jacek Laskowski
Mar 13 at 17:47
We are not able to identify when the table has been appended to. As part of the workflow, we try to read the data back as the next step and get the error. We need a way to identify that the table append has finished.
– Vish
Mar 14 at 17:49
@JacekLaskowski We have found that if we use processAllAvailable() after writing out, the subsequent read works fine, but processAllAvailable() will block execution forever if we are dealing with continuous streams.
– Vish
Mar 15 at 10:58
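As an alternative to processAllAvailable() (which, as noted in the comments, never returns on a continuous source), one could poll the query's lastProgress until a batch with the desired batchId has committed; with foreachBatch, a batch appearing in lastProgress means the synchronous write inside it has finished. This is a sketch, not a Spark API: wait_for_batch is a hypothetical helper, and it assumes only the lastProgress attribute that pyspark's StreamingQuery exposes.

```python
import time

def wait_for_batch(query, min_batch_id, timeout_s=60.0, poll_s=0.5):
    """Block until `query` reports progress for a batch with
    batchId >= min_batch_id, or raise TimeoutError.

    Unlike processAllAvailable(), this returns on a continuous source,
    because it waits for one committed batch rather than for the source
    to drain. `query` only needs a `lastProgress` attribute that is None
    or a dict with a "batchId" key, as pyspark's StreamingQuery provides.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        progress = query.lastProgress
        if progress is not None and progress.get("batchId", -1) >= min_batch_id:
            return progress  # this batch's foreachBatch body has completed
        time.sleep(poll_s)
    raise TimeoutError(f"no batch >= {min_batch_id} committed within {timeout_s}s")
```

A workflow step could then call wait_for_batch(query, expected_batch) before reading the table back, instead of sleeping for an arbitrary delay.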
edited Mar 15 at 10:59 by Vish
asked Mar 8 at 16:08 by Vish