How to combine the two rows of a dataset into a single row in Spark using Java


I am reading transactions from a Kafka topic in JSON format. I then
applied some transformations to get aggregations based on
txn_status. Below is the schema.

    root
     |-- window: struct (nullable = true)
     |    |-- start: timestamp (nullable = true)
     |    |-- end: timestamp (nullable = true)
     |-- txn_status: string (nullable = true)
     |-- count: long (nullable = false)

After grouping by the given window, my batch output looks like this:
[![enter image description here][1]][1]

But I want the output in JSON format, like this:

    {
      "start_end_time": "28/12/2018 11:32:00.000",
      "count_Total": 6,
      "count_RCVD": 5,
      "count_FAILED": 1
    }
 


How can I combine two rows in a Spark dataset?

[1]: https://i.stack.imgur.com/sCJuX.jpg

edited Mar 8 at 7:19

asked Mar 8 at 4:07

Swetha

java apache-spark dataset


1 Answer

Based on the image you have shown, I created a DataFrame with the same rows, registered it as a temp view, and wrote a query that solves your question.

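Scala code:

    // spark-shell imports spark.implicits._ automatically; elsewhere it is needed for toDF.
    import spark.implicits._

    case class txn_rec(txn_status: String, count: Int, start_end_time: String)

    // Sample rows mirroring the batch output: one row per txn_status for the same window.
    val txDf = Seq(
      txn_rec("FAIL", 9, "2019-03-08 16:40:00, 2019-03-08 16:57:00"),
      txn_rec("RCVD", 161, "2019-03-08 16:40:00, 2019-03-08 16:57:00")
    ).toDF

    txDf.createOrReplaceTempView("temp")

    // Conditional sums collapse the per-status rows into one row per window,
    // which also works when the view holds more than one window.
    val resDF = spark.sql("""
      select start_end_time,
             sum(count) as total_count,
             sum(case when txn_status = 'RCVD' then count else 0 end) as rcvd_count,
             sum(case when txn_status = 'FAIL' then count else 0 end) as failed_count
      from temp
      group by start_end_time
    """)

    resDF.show

    // Render each combined row as a JSON string.
    resDF.toJSON.collectAsList.toString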

The output is shown in the screenshots.

Output-1

Output-2
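Since the question asks for Java, here is a minimal sketch of the same row-combining logic in plain Java, without a Spark dependency, so it runs on its own. The class `CombineRows`, the type `TxnCount`, and the method `combine` are hypothetical names chosen for illustration, and the output keys follow the `count_Total` / `count_<STATUS>` naming from the question. In Spark's Java API, the equivalent shape would usually come from grouping by the window and either pivoting on `txn_status` or summing conditional `when(...)` expressions; treat that as a pointer to verify against your Spark version, not a tested recipe.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration (no Spark) of collapsing per-status rows into
// one row per window. All names here are hypothetical.
class CombineRows {

    // One input row: a window label, a transaction status, and its count.
    static final class TxnCount {
        final String startEndTime;
        final String txnStatus;
        final long count;

        TxnCount(String startEndTime, String txnStatus, long count) {
            this.startEndTime = startEndTime;
            this.txnStatus = txnStatus;
            this.count = count;
        }
    }

    // Returns one combined row per window, keyed by the window label. Each
    // combined row holds count_Total plus one count_<STATUS> entry per status.
    static Map<String, Map<String, Long>> combine(List<TxnCount> rows) {
        Map<String, Map<String, Long>> byWindow = new LinkedHashMap<>();
        for (TxnCount row : rows) {
            Map<String, Long> combined =
                byWindow.computeIfAbsent(row.startEndTime, k -> {
                    Map<String, Long> m = new LinkedHashMap<>();
                    m.put("count_Total", 0L); // keep the total as the first key
                    return m;
                });
            combined.merge("count_Total", row.count, Long::sum);
            combined.merge("count_" + row.txnStatus, row.count, Long::sum);
        }
        return byWindow;
    }

    public static void main(String[] args) {
        String window = "28/12/2018 11:32:00.000";
        List<TxnCount> rows = Arrays.asList(
            new TxnCount(window, "RCVD", 5),
            new TxnCount(window, "FAILED", 1));
        // Prints the single combined row for the window.
        System.out.println(combine(rows));
    }
}
```

With the two sample rows from the question (RCVD = 5, FAILED = 1), the combined row carries a total of 6, matching the desired output. A status that never occurs in a window simply has no key, so a caller that needs a fixed set of columns would have to default the missing ones to 0.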

answered Mar 8 at 5:27

Sasi


Thank you. I am new to Spark. I want to implement this in Java using Dataset, but I didn't find parallelize among the DataFrame functions. Can this be done in Java?

– Swetha
Mar 8 at 6:19

Can you tell me more about how the input batch data is represented, and how you want to convert it into the output?

– Sasi
Mar 8 at 6:49

I have edited the post; can you check now?

– Swetha
Mar 8 at 7:20
