Extract both href and text on same line using Xidel, specific links only
I am trying to extract the link (href) and the text inside the <a> tag for a number of links in an HTML page.
I only want specific links, which I match by a substring.
Example of my html:
<a href="/this/dir/1234/">This should be 1234</a> some other html
<a href="/this/dir/1236/">This should be 1236</a> some other html
<a href="/about_us/">Not important link</a> some other html
I am using Xidel, which allows me to avoid regexp. It seems to be the simplest for the job.
What I have so far:
xidel -e "//a/(@href[contains(.,'/this/dir')],text())"
It basically works, but two issues remain:
- I get the data separated by linefeeds. I would like to have the href and the text on the same line.
- Every link text is returned, so I get the text "Not important link" as well.
What is the recommended way to get output like
/this/dir/1234 ; This should be 1234
/this/dir/1236 ; This should be 1236
Appreciate any feedback / tips.
edit:
The solution provided by Martin was 99% there. Newlines were not output, so I am using awk to replace a dummy separator with newlines.
Note: I am on Windows.
xidel myhtml.htm -e "string-join(//a[contains(@href, '/this/dir')]!(@href || ' ; ' || .), 'XXX')" | awk -F "XXX" "$1=$1" "OFS=\n"
xquery xidel
asked Mar 7 at 7:32 by MyICQ
edited Mar 12 at 20:10 by Mr Lister
1 Answer
You can move the condition into a predicate, e.g. //a[contains(@href, '/this/dir')]!(@href, string()). As for the result format, what happens if you delegate all to XQuery with
string-join(//a[contains(@href, '/this/dir')]!(@href || ' ; ' || .), '&#10;')
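For readers who want to check the logic outside Xidel, here is a minimal Python sketch of the same two steps: filter the links with a predicate on href, then join each href and link text with ' ; ' and separate the results with newlines. It is only an illustration, not part of the answer; the names LinkCollector, matching_links and SAMPLE are hypothetical, and it uses the standard-library html.parser rather than XPath.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


def matching_links(html, needle):
    """Return one 'href ; text' line per link whose href contains needle."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    # The 'if' plays the role of the XPath predicate [contains(@href, ...)];
    # the join plays the role of string-join(..., newline).
    return "\n".join(
        f"{href} ; {text}" for href, text in parser.links if needle in href
    )


# Sample markup copied from the question.
SAMPLE = """
<a href="/this/dir/1234/">This should be 1234</a> some other html
<a href="/this/dir/1236/">This should be 1236</a> some other html
<a href="/about_us/">Not important link</a> some other html
"""

if __name__ == "__main__":
    print(matching_links(SAMPLE, "/this/dir"))
```

Note that the href values keep their trailing slash, since they are emitted exactly as they appear in the markup.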
Thank you Martin! That was 99% correct. See my edit to original question. I did not know about the predicate.
– MyICQ
Mar 7 at 14:22
The use of '&#10;' is XQuery syntax, so if Xidel has any options to make sure the expression you pass in is evaluated as XQuery and not plain XPath, then try that. Or use codepoints-to-string(10) instead, e.g. string-join(//a[contains(@href, '/this/dir')]!(@href || ' ; ' || .), codepoints-to-string(10)); that should go through as XPath.
– Martin Honnen
Mar 7 at 14:50
The codepoints-to-string(10) worked. You are brilliant. Thank you!
– MyICQ
Mar 7 at 15:05
@MartinHonnen, by putting the entire query inside string-join() you can expect the entire output to be on a single line. MyICQ likes to have every @href on a separate line, so instead //a[contains(@href,'/this/dir')]/join((@href,.),' ; '), or //a[contains(@href,'/this/dir')]/concat(@href,' ; ',.) would be better.
– Reino
Mar 8 at 14:29
@Reino, can you cite anything from the XQuery spec or XQuery functions spec that supports your claim that using string-join(//a[contains(@href, '/this/dir')]!(@href || ' ; ' || .), '&#10;') as I have done puts the entire output on a single line? Not sure where your expectations come from; I certainly don't share them. And I don't see why //a[contains(@href,'/this/dir')]/concat(@href,' ; ',.) ensures output on separate lines; you construct a sequence of strings without defining any separator between them.
– Martin Honnen
Mar 8 at 15:12
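The disagreement in the last two comments is about a sequence of strings versus a single joined string. A small Python sketch of that distinction, with hypothetical names (as_sequence, as_joined, ROWS) and a plain list standing in for an XPath sequence, may help; it does not model how Xidel itself prints results.

```python
def as_sequence(items):
    """Keep the items separate; how they are printed is up to the caller."""
    return list(items)


def as_joined(items, separator):
    """Collapse the items into ONE string with an explicit separator."""
    return separator.join(items)


# Sample values built from the hrefs and link texts in the question.
ROWS = [
    "/this/dir/1234/ ; This should be 1234",
    "/this/dir/1236/ ; This should be 1236",
]

if __name__ == "__main__":
    sequence = as_sequence(ROWS)
    joined = as_joined(ROWS, "\n")         # analogous to codepoints-to-string(10)
    placeholder = as_joined(ROWS, "XXX")   # analogous to the 'XXX' workaround

    # The sequence has no separator of its own; the joined string carries one.
    print(len(sequence))
    print(joined.splitlines() == sequence)
    print("\n" in placeholder)
```

With a newline separator the single joined value still reads as one row per line, while a placeholder such as 'XXX' needs a later replacement step, which is what the awk command in the question's edit does.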
answered Mar 7 at 10:55 by Martin Honnen
edited Mar 7 at 11:28