Remove comma from quoted text in first column of a csv with sed2019 Community Moderator ElectionWhat's the most robust way to efficiently parse CSV using awk?How to concatenate text from multiple rows into a single text string in SQL server?How can I remove the first line of a text file using bash/sed script?Dealing with commas in a CSV fileHow do I remove all .pyc files from a project?Save PL/pgSQL output from PostgreSQL to a CSV fileShell Script - remove first and last quote (") from a variableReplace comma with newline in sed on MacOS?Remove a fixed prefix/suffix from a string in BashRemove comma as thousands separator from quoted numbers in a CSV using sedUsing awk or sed to print column of CSV file enclosed in double quotes

How is the partial sum of a geometric sequence calculated?

How to get the n-th line after a grepped one?

In what cases must I use 了 and in what cases not?

Bash - pair each line of file

Is it insecure to send a password in a `curl` command?

Variable completely messes up echoed string

Geography in 3D perspective

Maths symbols and unicode-math input inside siunitx commands

PTIJ What is the inyan of the Konami code in Uncle Moishy's song?

Generic TVP tradeoffs?

Replace four times with sed

Is this an example of a Neapolitan chord?

Does the attack bonus from a Masterwork weapon stack with the attack bonus from Masterwork ammunition?

What is the relationship between relativity and the Doppler effect?

Wrapping homogeneous Python objects

How can an organ that provides biological immortality be unable to regenerate?

Knife as defense against stray dogs

Hausdorff dimension of the boundary of fibres of Lipschitz maps

How could an airship be repaired midflight?

Is there a hypothetical scenario that would make Earth uninhabitable for humans, but not for (the majority of) other animals?

Naive Monte Carlo, MCMC and their use in Bayesian Theory

Print last inputted byte

Is honey really a supersaturated solution? Does heating to un-crystalize redissolve it or melt it?

Is it true that good novels will automatically sell themselves on Amazon (and so on) and there is no need for one to waste time promoting?

Remove comma from quoted text in first column of a csv with sed

2019 Community Moderator ElectionWhat's the most robust way to efficiently parse CSV using awk?How to concatenate text from multiple rows into a single text string in SQL server?How can I remove the first line of a text file using bash/sed script?Dealing with commas in a CSV fileHow do I remove all .pyc files from a project?Save PL/pgSQL output from PostgreSQL to a CSV fileShell Script - remove first and last quote (") from a variableReplace comma with newline in sed on MacOS?Remove a fixed prefix/suffix from a string in BashRemove comma as thousands separator from quoted numbers in a CSV using sedUsing awk or sed to print column of CSV file enclosed in double quotes

I have big_file.csv containing a bunch of company information. Here's a snippet

CompanyName, CompanyNumber,RegAddress.CareOf,...
"! # 1 AVAILABLE LOCKSMITH LTD","05905727","",...
"!NSPIRED LIMITED","06019953",""...
"CENTRE FOR COUNSELLING, PSYCHOTHERAPY AND TRAINING LTD","07981734",""...

I only need the CompanyName and CompanyNumber fields, so I did the following:

cut -d, -f 1,2 big_file.csv > big_file_names_codes_only.csv

As you can see tho (and I understand why) the third entry in the big_file.csv gets cut after the first comma which is actually part of CompanyName. I know how to remove in sed the first comma (but that would break the whole csv strucutre), so i was wondering if any of you knew how to remove the comma from the first (it's always on position 1) "string, with, commas, or not and non alphanum chars!".

So basically the intermediate output i am looking for is:

CompanyName, CompanyNumber
"! # 1 AVAILABLE LOCKSMITH LTD","05905727"
"!NSPIRED LIMITED","06019953"
"CENTRE FOR COUNSELLING PSYCHOTHERAPY AND TRAINING LTD","07981734"

But this last line becomes:

"CENTRE FOR COUNSELLING, PSYCHOTHERAPY AND TRAINING LTD"

Once I get this intermediate output I need to clean the company of all non alpha num characters in the name and leading spaces - which works very well with this:

sed -i 's/[^a-zA-Z0-9 ,]//g; s/^[ t]*//'

In the end my file should be:

CompanyName, CompanyNumber,RegAddress.CareOf,...
AVAILABLE LOCKSMITH LTD,05905727
NSPIRED LIMITED,06019953
CENTRE FOR COUNSELLING PSYCHOTHERAPY AND TRAINING LTD,07981734

edited Mar 6 at 10:13

Toto

66.3k175799

asked Mar 5 at 17:29

Tytire Recubans

476

Why keep RegAddress... in the header ?

– ctac_
Mar 5 at 20:05

No need at all!

– Tytire Recubans
Mar 6 at 9:24

If you have a working answer, put it as an answer. Don't insert it in the question!

– Toto
Mar 6 at 10:14

aaah! Yes sorry sorry!

– Tytire Recubans
Mar 6 at 10:14

add a comment |

I have big_file.csv containing a bunch of company information. Here's a snippet

CompanyName, CompanyNumber,RegAddress.CareOf,...
"! # 1 AVAILABLE LOCKSMITH LTD","05905727","",...
"!NSPIRED LIMITED","06019953",""...
"CENTRE FOR COUNSELLING, PSYCHOTHERAPY AND TRAINING LTD","07981734",""...

I only need the CompanyName and CompanyNumber fields, so I did the following:

cut -d, -f 1,2 big_file.csv > big_file_names_codes_only.csv

So basically the intermediate output i am looking for is:

CompanyName, CompanyNumber
"! # 1 AVAILABLE LOCKSMITH LTD","05905727"
"!NSPIRED LIMITED","06019953"
"CENTRE FOR COUNSELLING PSYCHOTHERAPY AND TRAINING LTD","07981734"

But this last line becomes:

"CENTRE FOR COUNSELLING, PSYCHOTHERAPY AND TRAINING LTD"

Once I get this intermediate output I need to clean the company of all non alpha num characters in the name and leading spaces - which works very well with this:

sed -i 's/[^a-zA-Z0-9 ,]//g; s/^[ t]*//'

In the end my file should be:

CompanyName, CompanyNumber,RegAddress.CareOf,...
AVAILABLE LOCKSMITH LTD,05905727
NSPIRED LIMITED,06019953
CENTRE FOR COUNSELLING PSYCHOTHERAPY AND TRAINING LTD,07981734

edited Mar 6 at 10:13

Toto

66.3k175799

asked Mar 5 at 17:29

Tytire Recubans

476

Why keep RegAddress... in the header ?

– ctac_
Mar 5 at 20:05

No need at all!

– Tytire Recubans
Mar 6 at 9:24

If you have a working answer, put it as an answer. Don't insert it in the question!

– Toto
Mar 6 at 10:14

aaah! Yes sorry sorry!

– Tytire Recubans
Mar 6 at 10:14

add a comment |

I have big_file.csv containing a bunch of company information. Here's a snippet

CompanyName, CompanyNumber,RegAddress.CareOf,...
"! # 1 AVAILABLE LOCKSMITH LTD","05905727","",...
"!NSPIRED LIMITED","06019953",""...
"CENTRE FOR COUNSELLING, PSYCHOTHERAPY AND TRAINING LTD","07981734",""...

I only need the CompanyName and CompanyNumber fields, so I did the following:

cut -d, -f 1,2 big_file.csv > big_file_names_codes_only.csv

So basically the intermediate output i am looking for is:

CompanyName, CompanyNumber
"! # 1 AVAILABLE LOCKSMITH LTD","05905727"
"!NSPIRED LIMITED","06019953"
"CENTRE FOR COUNSELLING PSYCHOTHERAPY AND TRAINING LTD","07981734"

But this last line becomes:

"CENTRE FOR COUNSELLING, PSYCHOTHERAPY AND TRAINING LTD"

Once I get this intermediate output I need to clean the company of all non alpha num characters in the name and leading spaces - which works very well with this:

sed -i 's/[^a-zA-Z0-9 ,]//g; s/^[ t]*//'

In the end my file should be:

CompanyName, CompanyNumber,RegAddress.CareOf,...
AVAILABLE LOCKSMITH LTD,05905727
NSPIRED LIMITED,06019953
CENTRE FOR COUNSELLING PSYCHOTHERAPY AND TRAINING LTD,07981734

edited Mar 6 at 10:13

Toto

66.3k175799

asked Mar 5 at 17:29

Tytire Recubans

476

I have big_file.csv containing a bunch of company information. Here's a snippet

CompanyName, CompanyNumber,RegAddress.CareOf,...
"! # 1 AVAILABLE LOCKSMITH LTD","05905727","",...
"!NSPIRED LIMITED","06019953",""...
"CENTRE FOR COUNSELLING, PSYCHOTHERAPY AND TRAINING LTD","07981734",""...

I only need the CompanyName and CompanyNumber fields, so I did the following:

cut -d, -f 1,2 big_file.csv > big_file_names_codes_only.csv

So basically the intermediate output i am looking for is:

CompanyName, CompanyNumber
"! # 1 AVAILABLE LOCKSMITH LTD","05905727"
"!NSPIRED LIMITED","06019953"
"CENTRE FOR COUNSELLING PSYCHOTHERAPY AND TRAINING LTD","07981734"

But this last line becomes:

"CENTRE FOR COUNSELLING, PSYCHOTHERAPY AND TRAINING LTD"

Once I get this intermediate output I need to clean the company of all non alpha num characters in the name and leading spaces - which works very well with this:

sed -i 's/[^a-zA-Z0-9 ,]//g; s/^[ t]*//'

In the end my file should be:

CompanyName, CompanyNumber,RegAddress.CareOf,...
AVAILABLE LOCKSMITH LTD,05905727
NSPIRED LIMITED,06019953
CENTRE FOR COUNSELLING PSYCHOTHERAPY AND TRAINING LTD,07981734

regex bash csv awk sed

edited Mar 6 at 10:13

Toto

66.3k175799

asked Mar 5 at 17:29

Tytire Recubans

476

edited Mar 6 at 10:13

Toto

66.3k175799

asked Mar 5 at 17:29

Tytire Recubans

476

edited Mar 6 at 10:13

Toto

66.3k175799

edited Mar 6 at 10:13

Toto

66.3k175799

edited Mar 6 at 10:13

Toto

66.3k175799

asked Mar 5 at 17:29

Tytire Recubans

476

asked Mar 5 at 17:29

Tytire Recubans

476

asked Mar 5 at 17:29

Tytire Recubans

476

Why keep RegAddress... in the header ?

– ctac_
Mar 5 at 20:05

No need at all!

– Tytire Recubans
Mar 6 at 9:24

If you have a working answer, put it as an answer. Don't insert it in the question!

– Toto
Mar 6 at 10:14

aaah! Yes sorry sorry!

– Tytire Recubans
Mar 6 at 10:14

add a comment |

Why keep RegAddress... in the header ?

– ctac_
Mar 5 at 20:05

No need at all!

– Tytire Recubans
Mar 6 at 9:24

If you have a working answer, put it as an answer. Don't insert it in the question!

– Toto
Mar 6 at 10:14

aaah! Yes sorry sorry!

– Tytire Recubans
Mar 6 at 10:14

Why keep RegAddress... in the header ?

– ctac_
Mar 5 at 20:05

No need at all!

– Tytire Recubans
Mar 6 at 9:24

If you have a working answer, put it as an answer. Don't insert it in the question!

– Toto
Mar 6 at 10:14

aaah! Yes sorry sorry!

– Tytire Recubans
Mar 6 at 10:14

add a comment |

10 Answers
10

active

oldest

votes

It's always better to work with structured data like CSV files with embedded commas in fields using tools that are actually aware of the format instead of trying to hack something together with things like regular expressions (Same with XML, JSON, etc.). In the long run it's a lot easier and will save you a ton of pain dealing with edge cases and odd data that doesn't exactly match your expectation.

The csvkit set of utilities has a bunch of useful command line tools and is commonly available via OS package managers:

$ csvcut -c CompanyName,CompanyNumber blah.csv 
CompanyName,CompanyNumber
! # 1 AVAILABLE LOCKSMITH LTD,05905727
!NSPIRED LIMITED,06019953
"CENTRE FOR COUNSELLING, PSYCHOTHERAPY AND TRAINING LTD",07981734

You can then continue to use sed to remove the characters you're not interested in.

(Note: I had to get rid of extra spaces in the header line of your sample data for this to work)

Edit: Also, perl version using the handy Text::AutoCSV module, that strips out characters:

$ perl -MText::AutoCSV -e 'Text::AutoCSV->new(out_fields => [ "COMPANYNAME", "COMPANYNUMBER" ],
 read_post_update_hr => sub 
 my $hr = shift;
 $hr->"COMPANYNAME" =~ s/[^[:alnum:]s]+//g;
 $hr->"COMPANYNAME" =~ s/^s+//;
 )->write();' < blah.csv | sed -e 's/"//g'
CompanyName,CompanyNumber
1 AVAILABLE LOCKSMITH LTD,05905727
NSPIRED LIMITED,06019953
CENTRE FOR COUNSELLING PSYCHOTHERAPY AND TRAINING LTD,07981734

edited Mar 5 at 18:30

answered Mar 5 at 17:52

Shawn

4,6522614

Hi Shawn - I come back to you after some time because I am now finding myself to perform the same operation (eliminating extra commas in a csv enclosed in double quotes) not just on the 1st columns (here it was company name) but on all columns. Would you be so kind to help me out again? Thanks :)

– Tytire Recubans
Mar 11 at 20:40

add a comment |

A solution with awk

$ awk -vFPAT='([^,]*)|("[^"]+")' -vOFS=, 'print $1,$2' big_file.csv
CompanyName, CompanyNumber
"! # 1 AVAILABLE LOCKSMITH LTD","05905727"
"!NSPIRED LIMITED","06019953"
"CENTRE FOR COUNSELLING, PSYCHOTHERAPY AND TRAINING LTD","07981734"

My suggestion would be to use programming languages like R, Python, Perl for such tasks

edited Mar 5 at 17:50

answered Mar 5 at 17:40

Sonny

902311

add a comment |

Similar to @Sonny's solution, but using the gsub function from GNU's awk to trim the quotes and commas from the output per your output expectation, and to prioritize fields enclosed in quotes over those that are not:

awk -vFPAT='("[^"]+")|([^,]*)' -vOFS=, ',/,"",$n);print$1,$2' big_file.csv

This outputs:

CompanyName, CompanyNumber
! # 1 AVAILABLE LOCKSMITH LTD,05905727
!NSPIRED LIMITED,06019953
CENTRE FOR COUNSELLING PSYCHOTHERAPY AND TRAINING LTD,07981734

answered Mar 5 at 17:57

blhsing

39.2k41742

add a comment |

Using Perl

$ perl -lne ' if($.>1) /^"(.+?)","(.+?)"/ ;$x=$1;$y=$2; $x=~s/[,]//g; print "$x,$y" 
 else print ' big_file.csv
CompanyName, CompanyNumber,RegAddress.CareOf,...
! # 1 AVAILABLE LOCKSMITH LTD,05905727
!NSPIRED LIMITED,06019953
CENTRE FOR COUNSELLING PSYCHOTHERAPY AND TRAINING LTD,07981734

$

answered Mar 5 at 18:15

stack0114106

4,4002422

add a comment |

You was almost there.

Since I don't know how many commas there is on the first line but if its only company name and company number this command is probably the shortest you can get if you will use bash:

The Easiest method to get rid of unwanted characters is with xargs after we run xargs -L1 things look better:

xargs -L1

Output:

CompanyName, CompanyNumber,RegAddress.CareOf,...
! # 1 AVAILABLE LOCKSMITH LTD,05905727,,...
!NSPIRED LIMITED,06019953,...
CENTRE FOR COUNSELLING, PSYCHOTHERAPY AND TRAINING LTD,07981734,...

Now we can add cut -f1,2,3 which you have tried I guess

xargs -L1 | cut -d, -f1,2,3

Output:

CompanyName, CompanyNumber,RegAddress.CareOf
! # 1 AVAILABLE LOCKSMITH LTD,05905727,
!NSPIRED LIMITED,06019953,...
CENTRE FOR COUNSELLING, PSYCHOTHERAPY AND TRAINING LTD,07981734

Okay now I got into the same problem as in your example, we got the number behind LTD as well since we added nr 3 to cut but the unwanted characters at the end still exist:

Solution, read file with sed and pipe it with xargs -L1!

sed 's/,...$//;s/,$//;s/, / /g' big_file.csv

Let's break it down:

sed 's/,...$//;s/,$//;s/, / /g' big_file.csv|xargs -L1|cut -d, -f1,2

End Result:

 CompanyName CompanyNumber,RegAddress.CareOf
 ! # 1 AVAILABLE LOCKSMITH LTD,05905727
 !NSPIRED LIMITED,06019953
 CENTRE FOR COUNSELLING PSYCHOTHERAPY AND TRAINING LTD,07981734

Edit

Since I forgot a comma before my edit I found a better solution:

sed 's/, / /g' big_file.csv|xargs -L1|cut -d, -f1,2

edited Mar 5 at 21:23

TheRealBilaal

3771514

answered Mar 5 at 18:13

wuseman

1397

Hi! Thanks for all this work! It ends up getting rid of the company number of the third column tho, and the extra comma after COUNSELLING is still there.

– Tytire Recubans
Mar 5 at 18:48

Oh, you right! I missed this comma.

– wuseman
Mar 5 at 19:07

sed 's/, / /g' big_file.csv | xargs -L1 | cut -d, -f1,2 this worked like a charm!

– Tytire Recubans
Mar 6 at 9:22

1

I tried this and it worked really well - but had to switch to Perl 'cause it was taking more than 10 min to parse the file.

– Tytire Recubans
Mar 6 at 10:13

add a comment |

You can try with this sed :

sed -E '
 :A
 s/^("[^,"]*),(.*)/12/
 # label A if CompanyName can have more than 1 comma
 tA
 s/"//g;s/([^,]*,[^,]*).*/1/
' big_file.csv

answered Mar 6 at 10:44

ctac_

1,9451310

add a comment |

awk is your friend

maybe this helps

➜ ~ awk 'BEGIN FS="","" printf "%s, %s n",$1,$2 ' big_file.csv | tr -d '"'
CompanyName, CompanyNumber,RegAddress.CareOf,...,
! # 1 AVAILABLE LOCKSMITH LTD, 05905727
!NSPIRED LIMITED, 06019953
CENTRE FOR COUNSELLING, PSYCHOTHERAPY AND TRAINING LTD, 07981734

answered Mar 5 at 17:51

Hernan Garcia

563315

This is almost what I was looking for - but the third line still includes that extra comma after counselling!

– Tytire Recubans
Mar 5 at 18:45

add a comment |

Based on the inputs of two of the answers below I tried several approaches:

The following one worked, but with 4m rows and several columns it was extremely slow:
- First get rid of the extra leading space in the second column with:
  sed -i '0,/ CompanyNumber/ s//CompanyNumber/' big_file.csv
- Then combine xargs -L1 with csvcut and sed:
  sed 's/, / /g' big_file.csv | xargs -L1 | csvcut -c CompanyName,CompanyNumber > big_file_cleaned.csv

This worked, but was super slow.

A solution in Perl from one of the kind contributors!
- First clean first line with perl: perl -lne ' if($.>1) /^"(.+?)","(.+?)"/ ;$x=$1;$y=$2; $x=~s/[,]//g; print "$x,$y" else print ' big_file.csv > big_file_clean.csv
- Then filter out only the columns I need with:
  csvcut -c CompanyName,CompanyNumber big_file_clean.csv > big_file_clean_namecodesonly.csv

THANK YOU

answered Mar 6 at 10:19

Tytire Recubans

476

add a comment |

awk 'NR>1gsub(/"/,"")sub(/.4$/,"")gsub(/!1' file

CompanyName, CompanyNumber,RegAddress.CareOf,...
AVAILABLE LOCKSMITH LTD,05905727
NSPIRED LIMITED,06019953
CENTRE FOR COUNSELLING PSYCHAPY AND TRAINING LTD,07981734

answered Mar 6 at 21:47

Claes Wikner

1,118157

add a comment |

With GNU awk for FPAT:

$ cat tst.awk
BEGIN [^,]*"; OFS="," 
NR == 1 print; next 

 for (i=1; i<=NF; i++) 
 gsub(/[^[:alnum:]]+/," ",$i)
 gsub(/^ 
 print $1, $2


$ awk -f tst.awk file
CompanyName, CompanyNumber,RegAddress.CareOf,...
1 AVAILABLE LOCKSMITH LTD,05905727
NSPIRED LIMITED,06019953
CENTRE FOR COUNSELLING PSYCHOTHERAPY AND TRAINING LTD,07981734

See What's the most robust way to efficiently parse CSV using awk? if you need to work with any more exotic CSVs.

edited Mar 11 at 21:45

answered Mar 5 at 18:50

Ed Morton

112k1245103

Hi @Ed Morton, thanks for the answer! I am working on a the same big_file.csv again - turns out other columns have extra commas in their strings - not just "CompanyName". Is there a way to perform this operation on all columns of the csv (delete all non alphanumeric values in the strings of all columns of the csv)? I just started to learn AWK!

– Tytire Recubans
Mar 11 at 21:26

You're welcome. My script already does what you're asking for in your comment, it doesn't make any assumptions/restrictions about any specific field. Did you try it?

– Ed Morton
Mar 11 at 21:29

Oh thank you! I was trying to parse it mentally bit by bit. I guess if I want to print all cols and not just the first 2 I simply replace print $1, $2 with print right? :)

– Tytire Recubans
Mar 11 at 21:36

1

Right. _____________

– Ed Morton
Mar 11 at 21:38

add a comment |

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55008440%2fremove-comma-from-quoted-text-in-first-column-of-a-csv-with-sed%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

10 Answers
10

active

oldest

votes

10 Answers
10

active

oldest

votes

The csvkit set of utilities has a bunch of useful command line tools and is commonly available via OS package managers:

$ csvcut -c CompanyName,CompanyNumber blah.csv 
CompanyName,CompanyNumber
! # 1 AVAILABLE LOCKSMITH LTD,05905727
!NSPIRED LIMITED,06019953
"CENTRE FOR COUNSELLING, PSYCHOTHERAPY AND TRAINING LTD",07981734

You can then continue to use sed to remove the characters you're not interested in.

(Note: I had to get rid of extra spaces in the header line of your sample data for this to work)

Edit: Also, perl version using the handy Text::AutoCSV module, that strips out characters:

$ perl -MText::AutoCSV -e 'Text::AutoCSV->new(out_fields => [ "COMPANYNAME", "COMPANYNUMBER" ],
 read_post_update_hr => sub 
 my $hr = shift;
 $hr->"COMPANYNAME" =~ s/[^[:alnum:]s]+//g;
 $hr->"COMPANYNAME" =~ s/^s+//;
 )->write();' < blah.csv | sed -e 's/"//g'
CompanyName,CompanyNumber
1 AVAILABLE LOCKSMITH LTD,05905727
NSPIRED LIMITED,06019953
CENTRE FOR COUNSELLING PSYCHOTHERAPY AND TRAINING LTD,07981734

edited Mar 5 at 18:30

answered Mar 5 at 17:52

Shawn

4,6522614

Hi Shawn - I come back to you after some time because I am now finding myself to perform the same operation (eliminating extra commas in a csv enclosed in double quotes) not just on the 1st columns (here it was company name) but on all columns. Would you be so kind to help me out again? Thanks :)

– Tytire Recubans
Mar 11 at 20:40

add a comment |

The csvkit set of utilities has a bunch of useful command line tools and is commonly available via OS package managers:

$ csvcut -c CompanyName,CompanyNumber blah.csv 
CompanyName,CompanyNumber
! # 1 AVAILABLE LOCKSMITH LTD,05905727
!NSPIRED LIMITED,06019953
"CENTRE FOR COUNSELLING, PSYCHOTHERAPY AND TRAINING LTD",07981734

You can then continue to use sed to remove the characters you're not interested in.

(Note: I had to get rid of extra spaces in the header line of your sample data for this to work)

Edit: Also, perl version using the handy Text::AutoCSV module, that strips out characters:

$ perl -MText::AutoCSV -e 'Text::AutoCSV->new(out_fields => [ "COMPANYNAME", "COMPANYNUMBER" ],
 read_post_update_hr => sub 
 my $hr = shift;
 $hr->"COMPANYNAME" =~ s/[^[:alnum:]s]+//g;
 $hr->"COMPANYNAME" =~ s/^s+//;
 )->write();' < blah.csv | sed -e 's/"//g'
CompanyName,CompanyNumber
1 AVAILABLE LOCKSMITH LTD,05905727
NSPIRED LIMITED,06019953
CENTRE FOR COUNSELLING PSYCHOTHERAPY AND TRAINING LTD,07981734

edited Mar 5 at 18:30

answered Mar 5 at 17:52

Shawn

4,6522614

Hi Shawn - I come back to you after some time because I am now finding myself to perform the same operation (eliminating extra commas in a csv enclosed in double quotes) not just on the 1st columns (here it was company name) but on all columns. Would you be so kind to help me out again? Thanks :)

– Tytire Recubans
Mar 11 at 20:40

add a comment |

The csvkit set of utilities has a bunch of useful command line tools and is commonly available via OS package managers:

$ csvcut -c CompanyName,CompanyNumber blah.csv 
CompanyName,CompanyNumber
! # 1 AVAILABLE LOCKSMITH LTD,05905727
!NSPIRED LIMITED,06019953
"CENTRE FOR COUNSELLING, PSYCHOTHERAPY AND TRAINING LTD",07981734

You can then continue to use sed to remove the characters you're not interested in.

(Note: I had to get rid of extra spaces in the header line of your sample data for this to work)

Edit: Also, perl version using the handy Text::AutoCSV module, that strips out characters:

$ perl -MText::AutoCSV -e 'Text::AutoCSV->new(out_fields => [ "COMPANYNAME", "COMPANYNUMBER" ],
 read_post_update_hr => sub 
 my $hr = shift;
 $hr->"COMPANYNAME" =~ s/[^[:alnum:]s]+//g;
 $hr->"COMPANYNAME" =~ s/^s+//;
 )->write();' < blah.csv | sed -e 's/"//g'
CompanyName,CompanyNumber
1 AVAILABLE LOCKSMITH LTD,05905727
NSPIRED LIMITED,06019953
CENTRE FOR COUNSELLING PSYCHOTHERAPY AND TRAINING LTD,07981734

edited Mar 5 at 18:30

answered Mar 5 at 17:52

Shawn

4,6522614

The csvkit set of utilities has a bunch of useful command line tools and is commonly available via OS package managers:

$ csvcut -c CompanyName,CompanyNumber blah.csv 
CompanyName,CompanyNumber
! # 1 AVAILABLE LOCKSMITH LTD,05905727
!NSPIRED LIMITED,06019953
"CENTRE FOR COUNSELLING, PSYCHOTHERAPY AND TRAINING LTD",07981734

You can then continue to use sed to remove the characters you're not interested in.

(Note: I had to get rid of extra spaces in the header line of your sample data for this to work)

Edit: Also, perl version using the handy Text::AutoCSV module, that strips out characters:

$ perl -MText::AutoCSV -e 'Text::AutoCSV->new(out_fields => [ "COMPANYNAME", "COMPANYNUMBER" ],
 read_post_update_hr => sub 
 my $hr = shift;
 $hr->"COMPANYNAME" =~ s/[^[:alnum:]s]+//g;
 $hr->"COMPANYNAME" =~ s/^s+//;
 )->write();' < blah.csv | sed -e 's/"//g'
CompanyName,CompanyNumber
1 AVAILABLE LOCKSMITH LTD,05905727
NSPIRED LIMITED,06019953
CENTRE FOR COUNSELLING PSYCHOTHERAPY AND TRAINING LTD,07981734

edited Mar 5 at 18:30

answered Mar 5 at 17:52

Shawn

4,6522614

edited Mar 5 at 18:30

answered Mar 5 at 17:52

Shawn

4,6522614

answered Mar 5 at 17:52

Shawn

4,6522614

answered Mar 5 at 17:52

Shawn

4,6522614

Hi Shawn - I come back to you after some time because I am now finding myself to perform the same operation (eliminating extra commas in a csv enclosed in double quotes) not just on the 1st columns (here it was company name) but on all columns. Would you be so kind to help me out again? Thanks :)

– Tytire Recubans
Mar 11 at 20:40

add a comment |

Hi Shawn - I come back to you after some time because I am now finding myself to perform the same operation (eliminating extra commas in a csv enclosed in double quotes) not just on the 1st columns (here it was company name) but on all columns. Would you be so kind to help me out again? Thanks :)

– Tytire Recubans
Mar 11 at 20:40

Hi Shawn - I come back to you after some time because I am now finding myself to perform the same operation (eliminating extra commas in a csv enclosed in double quotes) not just on the 1st columns (here it was company name) but on all columns. Would you be so kind to help me out again? Thanks :)

– Tytire Recubans
Mar 11 at 20:40

add a comment |

A solution with awk

$ awk -vFPAT='([^,]*)|("[^"]+")' -vOFS=, 'print $1,$2' big_file.csv
CompanyName, CompanyNumber
"! # 1 AVAILABLE LOCKSMITH LTD","05905727"
"!NSPIRED LIMITED","06019953"
"CENTRE FOR COUNSELLING, PSYCHOTHERAPY AND TRAINING LTD","07981734"

My suggestion would be to use programming languages like R, Python, Perl for such tasks

edited Mar 5 at 17:50

answered Mar 5 at 17:40

Sonny

902311

add a comment |

A solution with awk

$ awk -vFPAT='([^,]*)|("[^"]+")' -vOFS=, 'print $1,$2' big_file.csv
CompanyName, CompanyNumber
"! # 1 AVAILABLE LOCKSMITH LTD","05905727"
"!NSPIRED LIMITED","06019953"
"CENTRE FOR COUNSELLING, PSYCHOTHERAPY AND TRAINING LTD","07981734"

My suggestion would be to use programming languages like R, Python, Perl for such tasks

edited Mar 5 at 17:50

answered Mar 5 at 17:40

Sonny

902311

add a comment |

A solution with awk

$ awk -vFPAT='([^,]*)|("[^"]+")' -vOFS=, 'print $1,$2' big_file.csv
CompanyName, CompanyNumber
"! # 1 AVAILABLE LOCKSMITH LTD","05905727"
"!NSPIRED LIMITED","06019953"
"CENTRE FOR COUNSELLING, PSYCHOTHERAPY AND TRAINING LTD","07981734"

My suggestion would be to use programming languages like R, Python, Perl for such tasks

edited Mar 5 at 17:50

answered Mar 5 at 17:40

Sonny

902311

A solution with awk

$ awk -vFPAT='([^,]*)|("[^"]+")' -vOFS=, 'print $1,$2' big_file.csv
CompanyName, CompanyNumber
"! # 1 AVAILABLE LOCKSMITH LTD","05905727"
"!NSPIRED LIMITED","06019953"
"CENTRE FOR COUNSELLING, PSYCHOTHERAPY AND TRAINING LTD","07981734"

My suggestion would be to use programming languages like R, Python, Perl for such tasks

edited Mar 5 at 17:50

answered Mar 5 at 17:40

Sonny

902311

edited Mar 5 at 17:50

answered Mar 5 at 17:40

Sonny

902311

answered Mar 5 at 17:40

Sonny

902311

answered Mar 5 at 17:40

Sonny

902311

add a comment |

awk -vFPAT='("[^"]+")|([^,]*)' -vOFS=, ',/,"",$n);print$1,$2' big_file.csv

This outputs:

CompanyName, CompanyNumber
! # 1 AVAILABLE LOCKSMITH LTD,05905727
!NSPIRED LIMITED,06019953
CENTRE FOR COUNSELLING PSYCHOTHERAPY AND TRAINING LTD,07981734

answered Mar 5 at 17:57

blhsing

39.2k41742

add a comment |

awk -vFPAT='("[^"]+")|([^,]*)' -vOFS=, ',/,"",$n);print$1,$2' big_file.csv

This outputs:

CompanyName, CompanyNumber
! # 1 AVAILABLE LOCKSMITH LTD,05905727
!NSPIRED LIMITED,06019953
CENTRE FOR COUNSELLING PSYCHOTHERAPY AND TRAINING LTD,07981734

answered Mar 5 at 17:57

blhsing

39.2k41742

add a comment |

awk -vFPAT='("[^"]+")|([^,]*)' -vOFS=, ',/,"",$n);print$1,$2' big_file.csv

This outputs:

CompanyName, CompanyNumber
! # 1 AVAILABLE LOCKSMITH LTD,05905727
!NSPIRED LIMITED,06019953
CENTRE FOR COUNSELLING PSYCHOTHERAPY AND TRAINING LTD,07981734

answered Mar 5 at 17:57

blhsing

39.2k41742

awk -vFPAT='("[^"]+")|([^,]*)' -vOFS=, ',/,"",$n);print$1,$2' big_file.csv

This outputs:

CompanyName, CompanyNumber
! # 1 AVAILABLE LOCKSMITH LTD,05905727
!NSPIRED LIMITED,06019953
CENTRE FOR COUNSELLING PSYCHOTHERAPY AND TRAINING LTD,07981734

answered Mar 5 at 17:57

blhsing

39.2k41742

answered Mar 5 at 17:57

blhsing

39.2k41742

answered Mar 5 at 17:57

blhsing

39.2k41742

answered Mar 5 at 17:57

blhsing

39.2k41742

add a comment |

Using Perl

$ perl -lne ' if($.>1) /^"(.+?)","(.+?)"/ ;$x=$1;$y=$2; $x=~s/[,]//g; print "$x,$y" 
 else print ' big_file.csv
CompanyName, CompanyNumber,RegAddress.CareOf,...
! # 1 AVAILABLE LOCKSMITH LTD,05905727
!NSPIRED LIMITED,06019953
CENTRE FOR COUNSELLING PSYCHOTHERAPY AND TRAINING LTD,07981734

$

answered Mar 5 at 18:15

stack0114106

4,4002422

add a comment |

Using Perl

$ perl -lne ' if($.>1) /^"(.+?)","(.+?)"/ ;$x=$1;$y=$2; $x=~s/[,]//g; print "$x,$y" 
 else print ' big_file.csv
CompanyName, CompanyNumber,RegAddress.CareOf,...
! # 1 AVAILABLE LOCKSMITH LTD,05905727
!NSPIRED LIMITED,06019953
CENTRE FOR COUNSELLING PSYCHOTHERAPY AND TRAINING LTD,07981734

$

answered Mar 5 at 18:15

stack0114106

4,4002422

add a comment |

Using Perl

$ perl -lne ' if($.>1) /^"(.+?)","(.+?)"/ ;$x=$1;$y=$2; $x=~s/[,]//g; print "$x,$y" 
 else print ' big_file.csv
CompanyName, CompanyNumber,RegAddress.CareOf,...
! # 1 AVAILABLE LOCKSMITH LTD,05905727
!NSPIRED LIMITED,06019953
CENTRE FOR COUNSELLING PSYCHOTHERAPY AND TRAINING LTD,07981734

$

answered Mar 5 at 18:15

stack0114106

4,4002422

Using Perl

$ perl -lne ' if($.>1) /^"(.+?)","(.+?)"/ ;$x=$1;$y=$2; $x=~s/[,]//g; print "$x,$y" 
 else print ' big_file.csv
CompanyName, CompanyNumber,RegAddress.CareOf,...
! # 1 AVAILABLE LOCKSMITH LTD,05905727
!NSPIRED LIMITED,06019953
CENTRE FOR COUNSELLING PSYCHOTHERAPY AND TRAINING LTD,07981734

$

answered Mar 5 at 18:15

stack0114106

4,4002422

answered Mar 5 at 18:15

stack0114106

4,4002422

answered Mar 5 at 18:15

stack0114106

4,4002422

answered Mar 5 at 18:15

stack0114106

4,4002422

add a comment |

You was almost there.

Since I don't know how many commas there is on the first line but if its only company name and company number this command is probably the shortest you can get if you will use bash:

The Easiest method to get rid of unwanted characters is with xargs after we run xargs -L1 things look better:

xargs -L1

Output:

CompanyName, CompanyNumber,RegAddress.CareOf,...
! # 1 AVAILABLE LOCKSMITH LTD,05905727,,...
!NSPIRED LIMITED,06019953,...
CENTRE FOR COUNSELLING, PSYCHOTHERAPY AND TRAINING LTD,07981734,...

Now we can add cut -f1,2,3 which you have tried I guess

xargs -L1 | cut -d, -f1,2,3

Output:

CompanyName, CompanyNumber,RegAddress.CareOf
! # 1 AVAILABLE LOCKSMITH LTD,05905727,
!NSPIRED LIMITED,06019953,...
CENTRE FOR COUNSELLING, PSYCHOTHERAPY AND TRAINING LTD,07981734

Okay now I got into the same problem as in your example, we got the number behind LTD as well since we added nr 3 to cut but the unwanted characters at the end still exist:

Solution, read file with sed and pipe it with xargs -L1!

sed 's/,...$//;s/,$//;s/, / /g' big_file.csv

Let's break it down:

sed 's/,...$//;s/,$//;s/, / /g' big_file.csv|xargs -L1|cut -d, -f1,2

End Result:

 CompanyName CompanyNumber,RegAddress.CareOf
 ! # 1 AVAILABLE LOCKSMITH LTD,05905727
 !NSPIRED LIMITED,06019953
 CENTRE FOR COUNSELLING PSYCHOTHERAPY AND TRAINING LTD,07981734

Edit

Since I forgot a comma before my edit I found a better solution:

sed 's/, / /g' big_file.csv|xargs -L1|cut -d, -f1,2

edited Mar 5 at 21:23

TheRealBilaal

3771514

answered Mar 5 at 18:13

wuseman

1397

Hi! Thanks for all this work! It ends up getting rid of the company number of the third column tho, and the extra comma after COUNSELLING is still there.

– Tytire Recubans
Mar 5 at 18:48

Oh, you right! I missed this comma.

– wuseman
Mar 5 at 19:07

sed 's/, / /g' big_file.csv | xargs -L1 | cut -d, -f1,2 this worked like a charm!

– Tytire Recubans
Mar 6 at 9:22

1

I tried this and it worked really well - but had to switch to Perl 'cause it was taking more than 10 min to parse the file.

– Tytire Recubans
Mar 6 at 10:13

add a comment |

You was almost there.

Since I don't know how many commas there is on the first line but if its only company name and company number this command is probably the shortest you can get if you will use bash:

The Easiest method to get rid of unwanted characters is with xargs after we run xargs -L1 things look better:

xargs -L1

Output:

CompanyName, CompanyNumber,RegAddress.CareOf,...
! # 1 AVAILABLE LOCKSMITH LTD,05905727,,...
!NSPIRED LIMITED,06019953,...
CENTRE FOR COUNSELLING, PSYCHOTHERAPY AND TRAINING LTD,07981734,...

Now we can add cut -f1,2,3 which you have tried I guess

xargs -L1 | cut -d, -f1,2,3

Output:

CompanyName, CompanyNumber,RegAddress.CareOf
! # 1 AVAILABLE LOCKSMITH LTD,05905727,
!NSPIRED LIMITED,06019953,...
CENTRE FOR COUNSELLING, PSYCHOTHERAPY AND TRAINING LTD,07981734

Okay now I got into the same problem as in your example, we got the number behind LTD as well since we added nr 3 to cut but the unwanted characters at the end still exist:

Solution, read file with sed and pipe it with xargs -L1!

sed 's/,...$//;s/,$//;s/, / /g' big_file.csv

Let's break it down:

sed 's/,...$//;s/,$//;s/, / /g' big_file.csv|xargs -L1|cut -d, -f1,2

End Result:

 CompanyName CompanyNumber,RegAddress.CareOf
 ! # 1 AVAILABLE LOCKSMITH LTD,05905727
 !NSPIRED LIMITED,06019953
 CENTRE FOR COUNSELLING PSYCHOTHERAPY AND TRAINING LTD,07981734

Edit

Since I forgot a comma before my edit I found a better solution:

sed 's/, / /g' big_file.csv|xargs -L1|cut -d, -f1,2

edited Mar 5 at 21:23

TheRealBilaal

3771514

answered Mar 5 at 18:13

wuseman

1397

Hi! Thanks for all this work! It ends up getting rid of the company number of the third column tho, and the extra comma after COUNSELLING is still there.

– Tytire Recubans
Mar 5 at 18:48

Oh, you right! I missed this comma.

– wuseman
Mar 5 at 19:07

sed 's/, / /g' big_file.csv | xargs -L1 | cut -d, -f1,2 this worked like a charm!

– Tytire Recubans
Mar 6 at 9:22

1

I tried this and it worked really well - but had to switch to Perl 'cause it was taking more than 10 min to parse the file.

– Tytire Recubans
Mar 6 at 10:13

add a comment |

You was almost there.

Since I don't know how many commas there is on the first line but if its only company name and company number this command is probably the shortest you can get if you will use bash:

The Easiest method to get rid of unwanted characters is with xargs after we run xargs -L1 things look better:

xargs -L1

Output:

CompanyName, CompanyNumber,RegAddress.CareOf,...
! # 1 AVAILABLE LOCKSMITH LTD,05905727,,...
!NSPIRED LIMITED,06019953,...
CENTRE FOR COUNSELLING, PSYCHOTHERAPY AND TRAINING LTD,07981734,...

Now we can add cut -f1,2,3 which you have tried I guess

xargs -L1 | cut -d, -f1,2,3

Output:

CompanyName, CompanyNumber,RegAddress.CareOf
! # 1 AVAILABLE LOCKSMITH LTD,05905727,
!NSPIRED LIMITED,06019953,...
CENTRE FOR COUNSELLING, PSYCHOTHERAPY AND TRAINING LTD,07981734

Okay now I got into the same problem as in your example, we got the number behind LTD as well since we added nr 3 to cut but the unwanted characters at the end still exist:

Solution, read file with sed and pipe it with xargs -L1!

sed 's/,...$//;s/,$//;s/, / /g' big_file.csv

Let's break it down:

sed 's/,...$//;s/,$//;s/, / /g' big_file.csv|xargs -L1|cut -d, -f1,2

End Result:

 CompanyName CompanyNumber,RegAddress.CareOf
 ! # 1 AVAILABLE LOCKSMITH LTD,05905727
 !NSPIRED LIMITED,06019953
 CENTRE FOR COUNSELLING PSYCHOTHERAPY AND TRAINING LTD,07981734

Edit

Since I forgot a comma before my edit I found a better solution:

sed 's/, / /g' big_file.csv|xargs -L1|cut -d, -f1,2

edited Mar 5 at 21:23

TheRealBilaal

3771514

answered Mar 5 at 18:13

wuseman

1397

You was almost there.

Since I don't know how many commas there is on the first line but if its only company name and company number this command is probably the shortest you can get if you will use bash:

The Easiest method to get rid of unwanted characters is with xargs after we run xargs -L1 things look better:

xargs -L1

Output:

CompanyName, CompanyNumber,RegAddress.CareOf,...
! # 1 AVAILABLE LOCKSMITH LTD,05905727,,...
!NSPIRED LIMITED,06019953,...
CENTRE FOR COUNSELLING, PSYCHOTHERAPY AND TRAINING LTD,07981734,...

Now we can add cut -f1,2,3 which you have tried I guess

xargs -L1 | cut -d, -f1,2,3

Output:

CompanyName, CompanyNumber,RegAddress.CareOf
! # 1 AVAILABLE LOCKSMITH LTD,05905727,
!NSPIRED LIMITED,06019953,...
CENTRE FOR COUNSELLING, PSYCHOTHERAPY AND TRAINING LTD,07981734

Okay now I got into the same problem as in your example, we got the number behind LTD as well since we added nr 3 to cut but the unwanted characters at the end still exist:

Solution, read file with sed and pipe it with xargs -L1!

sed 's/,...$//;s/,$//;s/, / /g' big_file.csv

Let's break it down:

sed 's/,...$//;s/,$//;s/, / /g' big_file.csv|xargs -L1|cut -d, -f1,2

End Result:

 CompanyName CompanyNumber,RegAddress.CareOf
 ! # 1 AVAILABLE LOCKSMITH LTD,05905727
 !NSPIRED LIMITED,06019953
 CENTRE FOR COUNSELLING PSYCHOTHERAPY AND TRAINING LTD,07981734

Edit

Since I forgot a comma before my edit I found a better solution:

sed 's/, / /g' big_file.csv|xargs -L1|cut -d, -f1,2

edited Mar 5 at 21:23

TheRealBilaal

3771514

answered Mar 5 at 18:13

wuseman

1397

edited Mar 5 at 21:23

TheRealBilaal

3771514

edited Mar 5 at 21:23

TheRealBilaal

3771514

edited Mar 5 at 21:23

TheRealBilaal

3771514

answered Mar 5 at 18:13

wuseman

1397

answered Mar 5 at 18:13

wuseman

1397

answered Mar 5 at 18:13

wuseman

1397

Hi! Thanks for all this work! It ends up getting rid of the company number of the third column tho, and the extra comma after COUNSELLING is still there.

– Tytire Recubans
Mar 5 at 18:48

Oh, you right! I missed this comma.

– wuseman
Mar 5 at 19:07

sed 's/, / /g' big_file.csv | xargs -L1 | cut -d, -f1,2 this worked like a charm!

– Tytire Recubans
Mar 6 at 9:22

1

I tried this and it worked really well - but had to switch to Perl 'cause it was taking more than 10 min to parse the file.

– Tytire Recubans
Mar 6 at 10:13

add a comment |

Hi! Thanks for all this work! It ends up getting rid of the company number of the third column tho, and the extra comma after COUNSELLING is still there.

– Tytire Recubans
Mar 5 at 18:48

Oh, you right! I missed this comma.

– wuseman
Mar 5 at 19:07

sed 's/, / /g' big_file.csv | xargs -L1 | cut -d, -f1,2 this worked like a charm!

– Tytire Recubans
Mar 6 at 9:22

1

I tried this and it worked really well - but had to switch to Perl 'cause it was taking more than 10 min to parse the file.

– Tytire Recubans
Mar 6 at 10:13

Hi! Thanks for all this work! It ends up getting rid of the company number of the third column tho, and the extra comma after COUNSELLING is still there.

– Tytire Recubans
Mar 5 at 18:48

Oh, you right! I missed this comma.

– wuseman
Mar 5 at 19:07

sed 's/, / /g' big_file.csv | xargs -L1 | cut -d, -f1,2 this worked like a charm!

– Tytire Recubans
Mar 6 at 9:22

I tried this and it worked really well - but had to switch to Perl 'cause it was taking more than 10 min to parse the file.

– Tytire Recubans
Mar 6 at 10:13

add a comment |

You can try with this sed :

sed -E '
 :A
 s/^("[^,"]*),(.*)/12/
 # label A if CompanyName can have more than 1 comma
 tA
 s/"//g;s/([^,]*,[^,]*).*/1/
' big_file.csv

answered Mar 6 at 10:44

ctac_

1,9451310

add a comment |

You can try with this sed :

sed -E '
 :A
 s/^("[^,"]*),(.*)/12/
 # label A if CompanyName can have more than 1 comma
 tA
 s/"//g;s/([^,]*,[^,]*).*/1/
' big_file.csv

answered Mar 6 at 10:44

ctac_

1,9451310

add a comment |

You can try with this sed :

sed -E '
 :A
 s/^("[^,"]*),(.*)/12/
 # label A if CompanyName can have more than 1 comma
 tA
 s/"//g;s/([^,]*,[^,]*).*/1/
' big_file.csv

answered Mar 6 at 10:44

ctac_

1,9451310

You can try with this sed :

sed -E '
 :A
 s/^("[^,"]*),(.*)/12/
 # label A if CompanyName can have more than 1 comma
 tA
 s/"//g;s/([^,]*,[^,]*).*/1/
' big_file.csv

answered Mar 6 at 10:44

ctac_

1,9451310

answered Mar 6 at 10:44

ctac_

1,9451310

answered Mar 6 at 10:44

ctac_

1,9451310

answered Mar 6 at 10:44

ctac_

1,9451310

add a comment |

awk is your friend

maybe this helps

➜ ~ awk 'BEGIN FS="","" printf "%s, %s n",$1,$2 ' big_file.csv | tr -d '"'
CompanyName, CompanyNumber,RegAddress.CareOf,...,
! # 1 AVAILABLE LOCKSMITH LTD, 05905727
!NSPIRED LIMITED, 06019953
CENTRE FOR COUNSELLING, PSYCHOTHERAPY AND TRAINING LTD, 07981734

answered Mar 5 at 17:51

Hernan Garcia

563315

This is almost what I was looking for - but the third line still includes that extra comma after counselling!

– Tytire Recubans
Mar 5 at 18:45

add a comment |

awk is your friend

maybe this helps

➜ ~ awk 'BEGIN FS="","" printf "%s, %s n",$1,$2 ' big_file.csv | tr -d '"'
CompanyName, CompanyNumber,RegAddress.CareOf,...,
! # 1 AVAILABLE LOCKSMITH LTD, 05905727
!NSPIRED LIMITED, 06019953
CENTRE FOR COUNSELLING, PSYCHOTHERAPY AND TRAINING LTD, 07981734

answered Mar 5 at 17:51

Hernan Garcia

563315

This is almost what I was looking for - but the third line still includes that extra comma after counselling!

– Tytire Recubans
Mar 5 at 18:45

add a comment |

awk is your friend

maybe this helps

➜ ~ awk 'BEGIN FS="","" printf "%s, %s n",$1,$2 ' big_file.csv | tr -d '"'
CompanyName, CompanyNumber,RegAddress.CareOf,...,
! # 1 AVAILABLE LOCKSMITH LTD, 05905727
!NSPIRED LIMITED, 06019953
CENTRE FOR COUNSELLING, PSYCHOTHERAPY AND TRAINING LTD, 07981734

answered Mar 5 at 17:51

Hernan Garcia

563315

awk is your friend

maybe this helps

➜ ~ awk 'BEGIN FS="","" printf "%s, %s n",$1,$2 ' big_file.csv | tr -d '"'
CompanyName, CompanyNumber,RegAddress.CareOf,...,
! # 1 AVAILABLE LOCKSMITH LTD, 05905727
!NSPIRED LIMITED, 06019953
CENTRE FOR COUNSELLING, PSYCHOTHERAPY AND TRAINING LTD, 07981734

answered Mar 5 at 17:51

Hernan Garcia

563315

answered Mar 5 at 17:51

Hernan Garcia

563315

answered Mar 5 at 17:51

Hernan Garcia

563315

answered Mar 5 at 17:51

Hernan Garcia

563315

This is almost what I was looking for - but the third line still includes that extra comma after counselling!

– Tytire Recubans
Mar 5 at 18:45

add a comment |

This is almost what I was looking for - but the third line still includes that extra comma after counselling!

– Tytire Recubans
Mar 5 at 18:45

This is almost what I was looking for - but the third line still includes that extra comma after counselling!

– Tytire Recubans
Mar 5 at 18:45

add a comment |

Based on the inputs of two of the answers below I tried several approaches:

The following one worked, but with 4m rows and several columns it was extremely slow:
- First get rid of the extra leading space in the second column with:
  sed -i '0,/ CompanyNumber/ s//CompanyNumber/' big_file.csv
- Then combine xargs -L1 with csvcut and sed:
  sed 's/, / /g' big_file.csv | xargs -L1 | csvcut -c CompanyName,CompanyNumber > big_file_cleaned.csv

This worked, but was super slow.

A solution in Perl from one of the kind contributors!
- First clean first line with perl: perl -lne ' if($.>1) /^"(.+?)","(.+?)"/ ;$x=$1;$y=$2; $x=~s/[,]//g; print "$x,$y" else print ' big_file.csv > big_file_clean.csv
- Then filter out only the columns I need with:
  csvcut -c CompanyName,CompanyNumber big_file_clean.csv > big_file_clean_namecodesonly.csv

THANK YOU

answered Mar 6 at 10:19

Tytire Recubans

476

add a comment |

Based on the inputs of two of the answers below I tried several approaches:

The following one worked, but with 4m rows and several columns it was extremely slow:
- First get rid of the extra leading space in the second column with:
  sed -i '0,/ CompanyNumber/ s//CompanyNumber/' big_file.csv
- Then combine xargs -L1 with csvcut and sed:
  sed 's/, / /g' big_file.csv | xargs -L1 | csvcut -c CompanyName,CompanyNumber > big_file_cleaned.csv

This worked, but was super slow.

A solution in Perl from one of the kind contributors!
- First clean first line with perl: perl -lne ' if($.>1) /^"(.+?)","(.+?)"/ ;$x=$1;$y=$2; $x=~s/[,]//g; print "$x,$y" else print ' big_file.csv > big_file_clean.csv
- Then filter out only the columns I need with:
  csvcut -c CompanyName,CompanyNumber big_file_clean.csv > big_file_clean_namecodesonly.csv

THANK YOU

answered Mar 6 at 10:19

Tytire Recubans

476

add a comment |

Based on the inputs of two of the answers below I tried several approaches:

The following one worked, but with 4m rows and several columns it was extremely slow:
- First get rid of the extra leading space in the second column with:
  sed -i '0,/ CompanyNumber/ s//CompanyNumber/' big_file.csv
- Then combine xargs -L1 with csvcut and sed:
  sed 's/, / /g' big_file.csv | xargs -L1 | csvcut -c CompanyName,CompanyNumber > big_file_cleaned.csv

This worked, but was super slow.

A solution in Perl from one of the kind contributors!
- First clean first line with perl: perl -lne ' if($.>1) /^"(.+?)","(.+?)"/ ;$x=$1;$y=$2; $x=~s/[,]//g; print "$x,$y" else print ' big_file.csv > big_file_clean.csv
- Then filter out only the columns I need with:
  csvcut -c CompanyName,CompanyNumber big_file_clean.csv > big_file_clean_namecodesonly.csv

THANK YOU

answered Mar 6 at 10:19

Tytire Recubans

476

Based on the inputs of two of the answers below I tried several approaches:

The following one worked, but with 4m rows and several columns it was extremely slow:
- First get rid of the extra leading space in the second column with:
  sed -i '0,/ CompanyNumber/ s//CompanyNumber/' big_file.csv
- Then combine xargs -L1 with csvcut and sed:
  sed 's/, / /g' big_file.csv | xargs -L1 | csvcut -c CompanyName,CompanyNumber > big_file_cleaned.csv

This worked, but was super slow.

A solution in Perl from one of the kind contributors!
- First clean first line with perl: perl -lne ' if($.>1) /^"(.+?)","(.+?)"/ ;$x=$1;$y=$2; $x=~s/[,]//g; print "$x,$y" else print ' big_file.csv > big_file_clean.csv
- Then filter out only the columns I need with:
  csvcut -c CompanyName,CompanyNumber big_file_clean.csv > big_file_clean_namecodesonly.csv

THANK YOU

answered Mar 6 at 10:19

Tytire Recubans

476

answered Mar 6 at 10:19

Tytire Recubans

476

answered Mar 6 at 10:19

Tytire Recubans

476

answered Mar 6 at 10:19

Tytire Recubans

476

add a comment |

awk 'NR>1gsub(/"/,"")sub(/.4$/,"")gsub(/!1' file

CompanyName, CompanyNumber,RegAddress.CareOf,...
AVAILABLE LOCKSMITH LTD,05905727
NSPIRED LIMITED,06019953
CENTRE FOR COUNSELLING PSYCHAPY AND TRAINING LTD,07981734

answered Mar 6 at 21:47

Claes Wikner

1,118157

add a comment |

awk 'NR>1gsub(/"/,"")sub(/.4$/,"")gsub(/!1' file

CompanyName, CompanyNumber,RegAddress.CareOf,...
AVAILABLE LOCKSMITH LTD,05905727
NSPIRED LIMITED,06019953
CENTRE FOR COUNSELLING PSYCHAPY AND TRAINING LTD,07981734

answered Mar 6 at 21:47

Claes Wikner

1,118157

add a comment |

awk 'NR>1gsub(/"/,"")sub(/.4$/,"")gsub(/!1' file

CompanyName, CompanyNumber,RegAddress.CareOf,...
AVAILABLE LOCKSMITH LTD,05905727
NSPIRED LIMITED,06019953
CENTRE FOR COUNSELLING PSYCHAPY AND TRAINING LTD,07981734

answered Mar 6 at 21:47

Claes Wikner

1,118157

awk 'NR>1gsub(/"/,"")sub(/.4$/,"")gsub(/!1' file

CompanyName, CompanyNumber,RegAddress.CareOf,...
AVAILABLE LOCKSMITH LTD,05905727
NSPIRED LIMITED,06019953
CENTRE FOR COUNSELLING PSYCHAPY AND TRAINING LTD,07981734

answered Mar 6 at 21:47

Claes Wikner

1,118157

answered Mar 6 at 21:47

Claes Wikner

1,118157

answered Mar 6 at 21:47

Claes Wikner

1,118157

answered Mar 6 at 21:47

Claes Wikner

1,118157

add a comment |

With GNU awk for FPAT:

$ cat tst.awk
BEGIN [^,]*"; OFS="," 
NR == 1 print; next 

 for (i=1; i<=NF; i++) 
 gsub(/[^[:alnum:]]+/," ",$i)
 gsub(/^ 
 print $1, $2


$ awk -f tst.awk file
CompanyName, CompanyNumber,RegAddress.CareOf,...
1 AVAILABLE LOCKSMITH LTD,05905727
NSPIRED LIMITED,06019953
CENTRE FOR COUNSELLING PSYCHOTHERAPY AND TRAINING LTD,07981734

See What's the most robust way to efficiently parse CSV using awk? if you need to work with any more exotic CSVs.

edited Mar 11 at 21:45

answered Mar 5 at 18:50

Ed Morton

112k1245103

Hi @Ed Morton, thanks for the answer! I am working on a the same big_file.csv again - turns out other columns have extra commas in their strings - not just "CompanyName". Is there a way to perform this operation on all columns of the csv (delete all non alphanumeric values in the strings of all columns of the csv)? I just started to learn AWK!

– Tytire Recubans
Mar 11 at 21:26

You're welcome. My script already does what you're asking for in your comment, it doesn't make any assumptions/restrictions about any specific field. Did you try it?

– Ed Morton
Mar 11 at 21:29

Oh thank you! I was trying to parse it mentally bit by bit. I guess if I want to print all cols and not just the first 2 I simply replace print $1, $2 with print right? :)

– Tytire Recubans
Mar 11 at 21:36

1

Right. _____________

– Ed Morton
Mar 11 at 21:38

add a comment |

With GNU awk for FPAT:

$ cat tst.awk
BEGIN [^,]*"; OFS="," 
NR == 1 print; next 

 for (i=1; i<=NF; i++) 
 gsub(/[^[:alnum:]]+/," ",$i)
 gsub(/^ 
 print $1, $2


$ awk -f tst.awk file
CompanyName, CompanyNumber,RegAddress.CareOf,...
1 AVAILABLE LOCKSMITH LTD,05905727
NSPIRED LIMITED,06019953
CENTRE FOR COUNSELLING PSYCHOTHERAPY AND TRAINING LTD,07981734

See What's the most robust way to efficiently parse CSV using awk? if you need to work with any more exotic CSVs.

edited Mar 11 at 21:45

answered Mar 5 at 18:50

Ed Morton

112k1245103

Hi @Ed Morton, thanks for the answer! I am working on a the same big_file.csv again - turns out other columns have extra commas in their strings - not just "CompanyName". Is there a way to perform this operation on all columns of the csv (delete all non alphanumeric values in the strings of all columns of the csv)? I just started to learn AWK!

– Tytire Recubans
Mar 11 at 21:26

You're welcome. My script already does what you're asking for in your comment, it doesn't make any assumptions/restrictions about any specific field. Did you try it?

– Ed Morton
Mar 11 at 21:29

Oh thank you! I was trying to parse it mentally bit by bit. I guess if I want to print all cols and not just the first 2 I simply replace print $1, $2 with print right? :)

– Tytire Recubans
Mar 11 at 21:36

1

Right. _____________

– Ed Morton
Mar 11 at 21:38

add a comment |

With GNU awk for FPAT:

$ cat tst.awk
BEGIN [^,]*"; OFS="," 
NR == 1 print; next 

 for (i=1; i<=NF; i++) 
 gsub(/[^[:alnum:]]+/," ",$i)
 gsub(/^ 
 print $1, $2


$ awk -f tst.awk file
CompanyName, CompanyNumber,RegAddress.CareOf,...
1 AVAILABLE LOCKSMITH LTD,05905727
NSPIRED LIMITED,06019953
CENTRE FOR COUNSELLING PSYCHOTHERAPY AND TRAINING LTD,07981734

See What's the most robust way to efficiently parse CSV using awk? if you need to work with any more exotic CSVs.

edited Mar 11 at 21:45

answered Mar 5 at 18:50

Ed Morton

112k1245103

With GNU awk for FPAT:

$ cat tst.awk
BEGIN [^,]*"; OFS="," 
NR == 1 print; next 

 for (i=1; i<=NF; i++) 
 gsub(/[^[:alnum:]]+/," ",$i)
 gsub(/^ 
 print $1, $2


$ awk -f tst.awk file
CompanyName, CompanyNumber,RegAddress.CareOf,...
1 AVAILABLE LOCKSMITH LTD,05905727
NSPIRED LIMITED,06019953
CENTRE FOR COUNSELLING PSYCHOTHERAPY AND TRAINING LTD,07981734

See What's the most robust way to efficiently parse CSV using awk? if you need to work with any more exotic CSVs.

edited Mar 11 at 21:45

answered Mar 5 at 18:50

Ed Morton

112k1245103

edited Mar 11 at 21:45

answered Mar 5 at 18:50

Ed Morton

112k1245103

answered Mar 5 at 18:50

Ed Morton

112k1245103

answered Mar 5 at 18:50

Ed Morton

112k1245103

Hi @Ed Morton, thanks for the answer! I am working on a the same big_file.csv again - turns out other columns have extra commas in their strings - not just "CompanyName". Is there a way to perform this operation on all columns of the csv (delete all non alphanumeric values in the strings of all columns of the csv)? I just started to learn AWK!

– Tytire Recubans
Mar 11 at 21:26

You're welcome. My script already does what you're asking for in your comment, it doesn't make any assumptions/restrictions about any specific field. Did you try it?

– Ed Morton
Mar 11 at 21:29

Oh thank you! I was trying to parse it mentally bit by bit. I guess if I want to print all cols and not just the first 2 I simply replace print $1, $2 with print right? :)

– Tytire Recubans
Mar 11 at 21:36

1

Right. _____________

– Ed Morton
Mar 11 at 21:38

add a comment |

Hi @Ed Morton, thanks for the answer! I am working on a the same big_file.csv again - turns out other columns have extra commas in their strings - not just "CompanyName". Is there a way to perform this operation on all columns of the csv (delete all non alphanumeric values in the strings of all columns of the csv)? I just started to learn AWK!

– Tytire Recubans
Mar 11 at 21:26

You're welcome. My script already does what you're asking for in your comment, it doesn't make any assumptions/restrictions about any specific field. Did you try it?

– Ed Morton
Mar 11 at 21:29

Oh thank you! I was trying to parse it mentally bit by bit. I guess if I want to print all cols and not just the first 2 I simply replace print $1, $2 with print right? :)

– Tytire Recubans
Mar 11 at 21:36

1

Right. _____________

– Ed Morton
Mar 11 at 21:38

Hi @Ed Morton, thanks for the answer! I am working on a the same big_file.csv again - turns out other columns have extra commas in their strings - not just "CompanyName". Is there a way to perform this operation on all columns of the csv (delete all non alphanumeric values in the strings of all columns of the csv)? I just started to learn AWK!

– Tytire Recubans
Mar 11 at 21:26

You're welcome. My script already does what you're asking for in your comment, it doesn't make any assumptions/restrictions about any specific field. Did you try it?

– Ed Morton
Mar 11 at 21:29

Oh thank you! I was trying to parse it mentally bit by bit. I guess if I want to print all cols and not just the first 2 I simply replace print $1, $2 with print right? :)

– Tytire Recubans
Mar 11 at 21:36

Right. _____________

– Ed Morton
Mar 11 at 21:38

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Ufdjrw

10 Answers
10

You was almost there.

Solution, read file with sed and pipe it with xargs -L1!

End Result:

Edit

Your Answer

Post as a guest

10 Answers
10

10 Answers
10

You was almost there.

Solution, read file with sed and pipe it with xargs -L1!

End Result:

Edit

You was almost there.

Solution, read file with sed and pipe it with xargs -L1!

End Result:

Edit

You was almost there.

Solution, read file with sed and pipe it with xargs -L1!

End Result:

Edit

You was almost there.

Solution, read file with sed and pipe it with xargs -L1!

End Result:

Edit

Post as a guest

Popular posts from this blog

Алба-Юлія

Захаров Федір Захарович

10 Answers 10

You was almost there.

Solution, read file with sed and pipe it with xargs -L1!

End Result:

Edit

Your Answer

Sign up or log in

Post as a guest

Post as a guest

10 Answers 10

10 Answers 10

You was almost there.

Solution, read file with sed and pipe it with xargs -L1!

End Result:

Edit

You was almost there.

Solution, read file with sed and pipe it with xargs -L1!

End Result:

Edit

You was almost there.

Solution, read file with sed and pipe it with xargs -L1!

End Result:

Edit

You was almost there.

Solution, read file with sed and pipe it with xargs -L1!

End Result:

Edit

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Popular posts from this blog

Алба-Юлія

Захаров Федір Захарович

10 Answers
10

10 Answers
10

10 Answers
10