How to combine two rows of a dataset into a single row in Spark using Java
I am reading transactions from a Kafka topic in JSON format. I then
applied some transformations to get aggregations based on
txn_status. Below is the schema.
    root
     |-- window: struct (nullable = true)
     |    |-- start: timestamp (nullable = true)
     |    |-- end: timestamp (nullable = true)
     |-- txn_status: string (nullable = true)
     |-- count: long (nullable = false)
After grouping by the given window, my batch output looks like this:
[![enter image description here][1]][1]
But I want the output in the JSON format below.
    {
      "start_end_time": "28/12/2018 11:32:00.000",
      "count_Total": 6,
      "count_RCVD": 5,
      "count_FAILED": 1
    }
How can I combine two rows in a Spark dataset?

[1]: https://i.stack.imgur.com/sCJuX.jpg
java apache-spark dataset
asked Mar 8 at 4:07 by Swetha, edited Mar 8 at 7:19
1 Answer
Based on the image you posted, I created a DataFrame, registered it as a temp view, and wrote a query that produces the output you asked for.
Scala Code:
    // spark-shell provides `spark`, `sc`, and the implicits needed by toDF.
    case class txn_rec(txn_status: String, count: Int, start_end_time: String)
    val txDf = sc.parallelize(Seq(
      txn_rec("FAIL", 9, "2019-03-08 16:40:00, 2019-03-08 16:57:00"),
      txn_rec("RCVD", 161, "2019-03-08 16:40:00, 2019-03-08 16:57:00"))).toDF
    txDf.createOrReplaceTempView("temp")
    // The scalar subqueries assume `temp` holds a single window (one row per status).
    val resDF = spark.sql("""select start_end_time,
      (select sum(count) from temp) as total_count,
      (select count from temp where txn_status = 'RCVD') as rcvd_count,
      (select count from temp where txn_status = 'FAIL') as failed_count
      from temp group by start_end_time""")
    resDF.show
    resDF.toJSON.collectAsList.toString
You can see the output as shown in the screen shot.
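Since the question asks for Java, here is a minimal sketch of the same reshaping in plain Java, without any Spark dependency: every row that shares a window is folded into one record with a total and one count per status. The class `CombineRows`, the holder `TxnRow`, and the method `combine` are hypothetical names made up for this illustration, and the sample values come from the desired output in the question. On a Spark `Dataset<Row>` the analogous step would probably be `groupBy(...).pivot("txn_status").sum("count")`, but that is an assumption and is not exercised here.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CombineRows {

    // Hypothetical holder for one aggregated row: (window, txn_status, count).
    static final class TxnRow {
        final String startEndTime;
        final String txnStatus;
        final long count;

        TxnRow(String startEndTime, String txnStatus, long count) {
            this.startEndTime = startEndTime;
            this.txnStatus = txnStatus;
            this.count = count;
        }
    }

    // Collapses all rows that share a window into one entry holding
    // count_Total plus one count_<status> column per status seen.
    static Map<String, Map<String, Long>> combine(List<TxnRow> rows) {
        Map<String, Map<String, Long>> byWindow = new LinkedHashMap<>();
        for (TxnRow row : rows) {
            Map<String, Long> counts =
                byWindow.computeIfAbsent(row.startEndTime, k -> new LinkedHashMap<>());
            counts.merge("count_Total", row.count, Long::sum);
            counts.merge("count_" + row.txnStatus, row.count, Long::sum);
        }
        return byWindow;
    }

    public static void main(String[] args) {
        // Sample values taken from the desired output in the question.
        List<TxnRow> rows = Arrays.asList(
            new TxnRow("28/12/2018 11:32:00.000", "RCVD", 5),
            new TxnRow("28/12/2018 11:32:00.000", "FAILED", 1));
        System.out.println(combine(rows));
    }
}
```

For the two sample rows this gives a single entry for the window with count_Total 6, count_RCVD 5, and count_FAILED 1. Because the counts are summed per window, the sketch also copes with input that spans several windows.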


Thank you. I am new to Spark. I want to implement this in Java using the Dataset API, but I couldn't find parallelize among the DataFrame functions. Can this be done in Java?
– Swetha
Mar 8 at 6:19
Can you tell me more about how the input batch data is represented, and how you want to convert it into the output?
– Sasi
Mar 8 at 6:49
I have edited the post; can you check now?
– Swetha
Mar 8 at 7:20
answered Mar 8 at 5:27 by Sasi