
xpath selector doesn't work on text() nodes #1777

Closed
8 tasks done
DetachHead opened this issue Oct 24, 2021 · 19 comments
Labels: duplicate (This issue or pull request already exists)


@DetachHead

Prerequisites

I tried to reproduce the issue when...

  • uBO is the only extension
  • uBO with default lists/settings
  • using a new, unmodified browser profile

Description

XPath selectors don't seem to work on individual text nodes, for example when an element has multiple text nodes and you're trying to match one of them.

A specific URL where the issue occurs

N/A, but see the steps to reproduce for a minimal HTML file that reproduces the issue.

Steps to Reproduce

  1. Start a web server that serves the following HTML:
    <body>
      <p/> 
    </body>
    <script>
      document.querySelector('p').appendChild(document.createTextNode('hello'));
      document.querySelector('p').appendChild(document.createTextNode('hello2'));
    </script>
  2. Search for the following XPath in DevTools to verify that it's correct: //p/text()[.='hello2']
  3. Attempt to create a filter rule using the same XPath: ##body:has(:xpath(//p/text()[.='hello2']))

Expected behavior

The body element is blocked, since the //p/text()[.='hello2'] XPath should match a node within it.

Actual behavior

No match.

uBlock Origin version

1.38.6

Browser name and version

Edge 94.0.992.50

Operating System and version

Windows 10

@uBlock-user
Contributor

Text nodes cannot be queried in uBO.

@DetachHead
Author

why not?

@uBlock-user
Contributor

uBlock-user commented Oct 24, 2021

Only HTML elements are queried by uBO; non-element nodes (such as text nodes) are discarded, so this is not possible. Support for querying text nodes doesn't exist in Chromium/Firefox either.

@DetachHead - w3c/csswg-drafts#2208
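A rough sketch of the behavior described in this comment, not uBO's actual code: `collectXPathResults` and `keepElements` are hypothetical helpers. `document.evaluate()` is the standard DOM XPath API, and the numeric constants (1 for element nodes, 7 for `XPathResult.ORDERED_NODE_SNAPSHOT_TYPE`) come from the DOM specification. Because `//p/text()[.='hello2']` selects a Text node rather than the `<p>` element, nothing survives an element-only filter, so the `:has()` test has nothing to match.

```javascript
// DOM constants spelled out so the sketch also runs outside a browser
// (e.g. in Node.js with mock objects).
const ELEMENT_NODE = 1;
const ORDERED_NODE_SNAPSHOT_TYPE = 7;

// Hypothetical helper: evaluate an XPath expression against a DOM Document
// and return every resulting node as a plain array.
function collectXPathResults(doc, expr) {
  const snapshot = doc.evaluate(expr, doc, null, ORDERED_NODE_SNAPSHOT_TYPE, null);
  const nodes = [];
  for (let i = 0; i < snapshot.snapshotLength; i++) {
    nodes.push(snapshot.snapshotItem(i));
  }
  return nodes;
}

// Hypothetical helper: keep Elements only. Text nodes (nodeType 3) and any
// other non-element nodes are discarded.
function keepElements(nodes) {
  return nodes.filter((node) => node.nodeType === ELEMENT_NODE);
}

// In a browser console on the repro page:
//   keepElements(collectXPathResults(document, "//p/text()[.='hello2']"))
// returns an empty array, because the only XPath result is a Text node.
```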

uBlock-user added the "something to address" label on Oct 24, 2021
@uBlock-user
Contributor

Duplicate of #1654

uBlock-user marked this as a duplicate of #1654 on Oct 24, 2021
uBlock-user added the "duplicate" label (This issue or pull request already exists) and removed the "something to address" label on Oct 24, 2021
@gorhill
Member

gorhill commented Oct 24, 2021

> why not?

The question should be the other way around, you need to provide cases where that would be genuinely useful in the real world.

@DetachHead
Author

@gorhill Facebook ads

@gorhill
Member

gorhill commented Oct 24, 2021

Use :has-text()? Looks to me like a typical case of the XY problem.

@DetachHead
Author

That's what I went with, but I don't like that approach because it seems less accurate: it seems more likely that some completely unrelated div containing the word "Sponsored" might get filtered. That's why I try to avoid non-exact matching where I can.

I also tried it with a regex to get an exact match (:has-text(^Sponsored$)), but that didn't work either, presumably for the same reason.

I was simply pointing out a feature that wasn't working as expected, and I provided a minimal example to reproduce it.

@gorhill
Member

gorhill commented Oct 24, 2021

> I also tried it with regex to get an exact match (:has-text(^Sponsored$)) but that didn't work either

You are not reading the documentation properly. Read carefully, especially before opening invalid issues. If you had read carefully (and it's very clearly explained in the documentation), you would have understood that what you want is :has-text(/^Sponsored$/).
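A rough sketch of the distinction being made here, assuming that a `:has-text()` argument wrapped in slashes is treated as a regular expression and anything else as literal text to search for. `compileNeedle` is a hypothetical helper, not uBO's implementation, and the accepted flags (`i`, `m`, `u`) are an assumption. Without the slashes, `^Sponsored$` is searched for as a literal string, caret and dollar sign included, so it cannot match plain "Sponsored".

```javascript
// Escape regex metacharacters so a plain string is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: turn a :has-text() argument into a RegExp.
// "/pattern/flags" is treated as a regex literal; anything else as literal text.
function compileNeedle(arg) {
  const literal = /^\/(.+)\/([imu]*)$/.exec(arg);
  if (literal !== null) {
    return new RegExp(literal[1], literal[2]);
  }
  return new RegExp(escapeRegExp(arg));
}

console.log(compileNeedle("^Sponsored$").test("Sponsored"));   // false: searches for the literal "^Sponsored$"
console.log(compileNeedle("/^Sponsored$/").test("Sponsored")); // true
```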

@DetachHead
Author

That was a typo.

I get that you probably have to deal with dozens of invalid issues every day. I apologize if I didn't make my use case clear in the OP, but I'm just wondering if there's a way around situations like this, because from what I can tell, exact matching with a regex just doesn't work here.

@gorhill
Member

gorhill commented Oct 24, 2021

> because from what i can tell exact matching with regex just doesn't work here

It appears you purposefully designed your regex not to match, while avoiding revealing what is under that first p tag in the inspector, which would have let us point out specifically why your regex does not match.

@DetachHead
Author

Here is the full HTML:

<body>
    <p/> 
    <p>this is not a Sponsored post. don't block me</p>
</body>
<script>
document.querySelector('p').appendChild(document.createTextNode('Sponsored'));
document.querySelector('p').appendChild(document.createTextNode('.'));
document.querySelector('p').appendChild(document.createTextNode('some other text'));
</script>

I didn't mean for the p to be collapsed in that screenshot, but it's the same as what I had at the start; I just changed the text.

@DetachHead
Author

So it turns out the innerHTML has whitespace at the start, which seems to be the cause. I can use ##p:has-text(/^\s+Sponsored/), which works.
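The leading whitespace most likely comes from the markup itself: an HTML parser ignores the self-closing slash on a non-void element like `<p/>`, so the whitespace after it becomes the first text node of that paragraph, ahead of the three nodes the script appends. A minimal sketch, assuming `:has-text()` tests the element's full text content (the concatenation of its text nodes); the node values below are reconstructed by hand from the HTML above, not captured from a browser.

```javascript
// Text nodes of the first <p>, in document order (reconstructed: whitespace
// from the markup, then the three nodes appended by the script).
const firstParagraph = [" \n    ", "Sponsored", ".", "some other text"];
// The second <p> has a single text node.
const secondParagraph = ["this is not a Sponsored post. don't block me"];

// An element's text content is the concatenation of its text nodes.
const textOf = (nodes) => nodes.join("");

const patterns = [/^Sponsored$/, /^\s+Sponsored/, /Sponsored/];
for (const re of patterns) {
  console.log(String(re), re.test(textOf(firstParagraph)), re.test(textOf(secondParagraph)));
}
// /^Sponsored$/ false false
// /^\s+Sponsored/ true false
// /Sponsored/ true true
```

The anchored pattern matches only the first paragraph, while the unanchored `/Sponsored/` also hits the decoy paragraph.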


@gorhill
Member

gorhill commented Oct 24, 2021

In any case, you can use ##body:xpath(//p[text()="Sponsored"]); this works already.
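This works because the predicate tests the `<p>` element instead of selecting a text node: in XPath 1.0, comparing a node-set (here `text()`, the element's child text nodes) with a string is true if at least one node in the set has exactly that string value, so the result is an element that uBO can act on. Below is a small JavaScript model of that comparison; `hasChildTextEqualTo` is a hypothetical name, the node values are reconstructed by hand from the example HTML, and a real browser evaluates the XPath natively.

```javascript
// Model of the XPath 1.0 test p[text()="Sponsored"]: true when any child
// text node's string value equals the target exactly.
function hasChildTextEqualTo(childTextNodes, target) {
  return childTextNodes.some((value) => value === target);
}

// Child text nodes of the two <p> elements in the example HTML
// (reconstructed by hand, not captured from a browser).
const firstParagraph = [" \n    ", "Sponsored", ".", "some other text"];
const secondParagraph = ["this is not a Sponsored post. don't block me"];

console.log(hasChildTextEqualTo(firstParagraph, "Sponsored"));  // true
console.log(hasChildTextEqualTo(secondParagraph, "Sponsored")); // false
```

The leading whitespace does not matter here, because "Sponsored" is its own text node and is compared on its own.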

@u-RraaLL
Contributor

> That's what I went with, but I don't like that approach because it seems less accurate, as in it seems more likely that some completely unrelated div that has the word sponsored in it might get filtered. that's why I try to avoid non-exact matching where I can

Not if you narrow down the matches with other attributes and ancestor nodes.

facebook.com##[role="feed"] span[id]>[role=button]:has-text(/^Sponsored|Paid for by/):upward([role="feed"]>div)

https://www.reddit.com/r/uBlockOrigin/wiki/solutions#wiki_facebook
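Reading this filter step by step: it starts from `[role=button]` elements under a `span[id]` inside the feed, `:has-text()` keeps those whose text starts with "Sponsored" or contains "Paid for by" (the `^` anchor applies only to the first alternative), and `:upward([role="feed"]>div)` then targets the nearest ancestor that is a direct `div` child of the feed, i.e. the whole post. The sketch below models only that last step, assuming `:upward(selector)` picks the closest strict ancestor matching the selector. `upward`, the plain-object tree, and the `isPost` predicate standing in for the CSS selector are all hypothetical; in a browser, a similar lookup can be written as `el.parentElement.closest(selector)`.

```javascript
// Hypothetical model of :upward(selector): walk up from the parent of `node`
// and return the first ancestor for which `matches` returns true.
function upward(node, matches) {
  for (let el = node.parentElement; el !== null; el = el.parentElement) {
    if (matches(el)) return el;
  }
  return null;
}

// Tiny hand-built tree: feed > div (post) > span > div[role=button].
const feed = { role: "feed", tag: "div", parentElement: null };
const post = { tag: "div", parentElement: feed };
const label = { tag: "span", parentElement: post };
const button = { role: "button", tag: "div", parentElement: label };

// Stand-in for the CSS selector [role="feed"]>div.
const isPost = (el) =>
  el.tag === "div" && el.parentElement !== null && el.parentElement.role === "feed";

console.log(upward(button, isPost) === post); // true
```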

@DetachHead
Author

> DetachHead:
>
> > Prerequisites
> >
> > [x] I performed a cursory search of the issue tracker to avoid opening a duplicate issue
>
> Really? Simply typing xpath in the tracker search field returns the thread you duplicated (#1654) as the first result.

As it says, I performed a cursory search. The word "xpath" wasn't in either the title or the description of that issue.

5 participants
@gorhill @uBlock-user @DetachHead @u-RraaLL and others