Repeating a string based on a column value (like multiplication of a string and a number in Python)



I have the following data frame (called df) with columns item_name and item_level:



item_name    item_level
------------------------
Item1        1
Item2        2
Item3        2
Item4        3


I would like to create a new column that indents each item according to its level. To do that, I would like to multiply the item_level by the string '---', with the idea that the string gets concatenated with itself as many times as the value of the integer it is multiplied by.
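As a plain Python illustration of that idea (standalone, not Spark code), multiplying a string by an integer repeats it:

level = 2
print(level * '---')             # ------
print(level * '---' + 'Item2')   # ------Item2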



My desired result is something like this:



item_name    item_level    new_column
---------------------------------------
Item1        1             ---Item1
Item2        2             ------Item2
Item3        2             ------Item3
Item4        3             ---------Item4


In PySpark, when I write the following command, the newly created column contains only null values:



from pyspark.sql import functions as F
df = df.withColumn('new_column',F.concat(F.lit(df.item_level*'---'),df.item_name))


The null values seem to come from multiplying the integers by the string. The concat function itself seems to work properly. For instance, the following works:



df = df.withColumn('new_column',F.concat(df.item_name,df.item_name))


I also tried a few other things. If I multiply the string by a constant number, the resulting string is displayed as desired:



number = 3
df = df.withColumn('new_column', F.lit(number*'---'))


Furthermore, first putting the '---' string into its own column (every row equal to '---') and then multiplying that column by the item_level column gives null values as well:



df = df.withColumn('padding', F.lit('---'))
df = df.withColumn('test', df.padding * df.item_level)


If I use pandas, however, this last piece of code does what I want. But I need to do this in PySpark.
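For reference, a minimal pandas sketch of the behavior described above (the DataFrame here is assumed, mirroring the example table):

import pandas as pd

pdf = pd.DataFrame({'item_name': ['Item1', 'Item2', 'Item3', 'Item4'],
                    'item_level': [1, 2, 2, 3]})
pdf['padding'] = '---'
# Elementwise string * int falls back to Python's string repetition in pandas.
pdf['new_column'] = pdf['padding'] * pdf['item_level'] + pdf['item_name']
print(pdf['new_column'].tolist())
# ['---Item1', '------Item2', '------Item3', '---------Item4']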










python apache-spark pyspark apache-spark-sql string-concatenation

asked Mar 6 at 14:46 by Irena Kuzmanovska; edited Mar 6 at 15:54 by pault
1 Answer






          There is a function pyspark.sql.functions.repeat that:




          Repeats a string column n times, and returns it as a new string column.




Concatenate the result of repeat with item_name, as you were already doing in your code. The only wrinkle is that you need to use pyspark.sql.functions.expr in order to pass a column value as an argument to a Spark function.



from pyspark.sql.functions import concat, expr

df.withColumn(
    "new_column",
    concat(expr("repeat('---', item_level)"), "item_name")
).show()
#+---------+----------+--------------+
#|item_name|item_level|    new_column|
#+---------+----------+--------------+
#|    Item1|         1|      ---Item1|
#|    Item2|         2|   ------Item2|
#|    Item3|         2|   ------Item3|
#|    Item4|         3|---------Item4|
#+---------+----------+--------------+


Note that show() right-justifies the displayed output, but the underlying data is as you desired.
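For comparison, a minimal sketch of the same idea using a Python UDF (not what the repeat approach above does under the hood); the built-in repeat via expr is generally preferable, since a UDF moves every row through Python:

from pyspark.sql.functions import concat, udf
from pyspark.sql.types import StringType

# Hypothetical helper: repeat '---' item_level times (empty string for nulls).
indent = udf(lambda level: '---' * int(level) if level is not None else '', StringType())

df = df.withColumn('new_column', concat(indent(df.item_level), df.item_name))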






answered Mar 6 at 15:51 by pault
• Thanks so much! This actually does the job! I was struggling so much to find the right way, and this is perfect! – Irena Kuzmanovska, Mar 7 at 9:28