How to sed replace UTF-8 characters with HTML entities?
I'm running Cygwin under Windows 10.
I have a dictionary file (1-dictionary.txt) that looks like this:
    labelling	labeling
    flavour	flavor
    colour	color
    organisations	organizations
    végétales	v&#x00E9;g&#x00E9;tales
    contrôlée	contr&#x00F4;l&#x00E9;e
    "	&#x0022
The separators between the columns are TABs (\t).
The dictionary file is encoded as UTF-8.
I want to replace the words and symbols in the first column with the words and HTML entities in the second column.
My source file (2-source.txt) contains the target UTF-8 and ASCII symbols. The source file is also encoded as UTF-8.
Sample text looks like this:
    Cultivar was coined by Bailey and it is generally regarded as a portmanteau of "cultivated" and "variety" ... The International Union for the Protection of New Varieties of Plants (UPOV - French: Union internationale pour la protection des obtentions végétales) offers legal protection of plant cultivars ... Terroir is the basis of the French wine appellation d'origine contrôlée (AOC) system
I run the following sed one-liner in a shell script (./3-script.sh):
    sed -f <(sed -E 's_(.+)\t(.+)_s/\1/\2/g_' 1-dictionary.txt) 2-source.txt > 3-translation.txt
The substitution of English (en-GB) words with American (en-US) words in 3-translation.txt is successful.
However, the substitution of ASCII symbols, such as the quote symbol, and of UTF-8 words produces this result:
    vvégétales#x00E9;gvégétales#x00E9;tales)
    contrcontrôlée#x00F4;lcontrôlée#x00E9;e (AOC)
If I use only the specific symbol (not the full word) in the dictionary, I get results like this:
    vé#x00E9;gé#x00E9;tales
    "#x0022cultivated"#x0022
    contrô#x00F4;lé#x00E9;e
The ASCII quote symbol has the entity text appended to it; it is not replaced.
Similarly, each UTF-8 symbol has its HTML entity appended to it instead of being replaced by the entity.
The expected output would look like this:
    v&#x00E9;g&#x00E9;tales
    &#x0022cultivated&#x0022
    contr&#x00F4;l&#x00E9;e
How can I modify the sed script so that the target ASCII and UTF-8 symbols are replaced with their HTML entity equivalents as defined in the dictionary file?
sed
asked Mar 8 at 17:32 by Jay Gray; edited Mar 8 at 17:53
Possible duplicate of Unexpected substitution for & with sed – tripleee, Mar 8 at 17:47
Possible duplicate of stackoverflow.com/questions/407523/… – tripleee, Mar 8 at 17:48
I tried it: replacing every & with \& in your 1-dictionary.txt will solve your problem. Try it and see if it works. – Tiw, Mar 8 at 18:09
1 Answer
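I tried it: replacing every & with \& in your 1-dictionary.txt solves the problem. In the replacement part of sed's s command, an unescaped & stands for the whole matched text, which is why the matched word or symbol shows up in your output where the ampersand of each entity should be.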
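To see the effect in isolation, here is a minimal sketch that uses a single hypothetical rule for é rather than the full dictionary. The first command reproduces the symptom from the question; the second escapes the ampersand.

```shell
# Unescaped: & in the replacement expands to the matched text (here "é").
unescaped=$(printf '%s\n' 'végétales' | sed 's/é/&#x00E9;/g')

# Escaped: \& inserts a literal ampersand.
escaped=$(printf '%s\n' 'végétales' | sed 's/é/\&#x00E9;/g')

printf '%s\n' "$unescaped" "$escaped"
# prints:
# vé#x00E9;gé#x00E9;tales
# v&#x00E9;g&#x00E9;tales
```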
sed's substitute command treats its "from" part as a regular expression, so when you generate commands from a dictionary like this, watch for regex metacharacters in the first column and put a \ in front of each one to escape it.
The "to" part has special characters too, mainly \ and &; put an extra \ in front of each of those as well.
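One way to apply that advice is sketched below, assuming the dictionary entries are meant as literal strings and the generated commands use / as the delimiter with basic regular expressions. The names escape_from and escape_to are hypothetical helpers, and 'a.b*c' is a made-up entry used only to show the pattern-side escaping.

```shell
# Hypothetical helper: escape BRE metacharacters and the / delimiter
# so a dictionary word can be used as the pattern of s/from/to/g.
escape_from() { printf '%s\n' "$1" | sed 's|[][\\.*^$/]|\\&|g'; }

# Hypothetical helper: escape \, & and the / delimiter
# so a dictionary value can be used as the replacement of s/from/to/g.
escape_to() { printf '%s\n' "$1" | sed 's|[\\&/]|\\&|g'; }

from_escaped=$(escape_from 'a.b*c')                # made-up entry
to_escaped=$(escape_to 'v&#x00E9;g&#x00E9;tales')  # entity from the dictionary

printf '%s\n' "$from_escaped" "$to_escaped"
# prints:
# a\.b\*c
# v\&#x00E9;g\&#x00E9;tales
```

The two helpers differ because the two halves of s/from/to/g have different special characters: the pattern side cares about regex syntax, the replacement side only about \, & and the delimiter.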
The references above are to the GNU sed documentation; for other versions of sed, you can also check man sed.
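Putting it together, the one-liner from the question could be adapted roughly as follows. This is a sketch under a few assumptions: it writes the generated commands to a temporary file instead of using process substitution, it uses a literal TAB rather than \t so it does not rely on GNU extensions, and it escapes only \ and &, which is enough for the dictionary shown (first-column entries containing other regex metacharacters would need more). The file name dict.sed, the trimmed dictionary and the sample line are made up for the demonstration.

```shell
tab=$(printf '\t')
workdir=$(mktemp -d)

# Trimmed-down stand-ins for 1-dictionary.txt and 2-source.txt.
printf 'colour\tcolor\nvégétales\tv&#x00E9;g&#x00E9;tales\n' > "$workdir/1-dictionary.txt"
printf '%s\n' 'colour: obtentions végétales' > "$workdir/2-source.txt"

# Step 1: put a backslash in front of every \ and & on each dictionary line.
# Step 2: turn "from<TAB>to" into the command "s/from/to/g".
sed -E -e 's/[\\&]/\\&/g' \
       -e "s_(.+)${tab}(.+)_s/\\1/\\2/g_" \
       "$workdir/1-dictionary.txt" > "$workdir/dict.sed"

# Apply the generated commands to the source text.
result=$(sed -f "$workdir/dict.sed" "$workdir/2-source.txt")
printf '%s\n' "$result"
# prints: color: obtentions v&#x00E9;g&#x00E9;tales

rm -rf "$workdir"
```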
answered Mar 8 at 18:59 by Tiw; edited Mar 8 at 19:07