-
It's been so long since I've struggled with XPath. XPath under Python makes me shudder.

I would think that the second expression, //a[contains(@href, "/company/")]//text()[normalize-space()], would return an array of two strings, "CashModels" and "Johnny Rapid". Is this what you're looking for? It seems to me to be the closest one to what you want. This is from examining the page source on GEVI in Firefox. (And again, I'm far from an XPath expert, so I may be all wet.)

I would suggest debugging this the old-fashioned way. Start with the first match step on the left and print out the element returned, then add the next step and print, and so on, walking the chain until it fails to match what you expect. Tedious, but it's the only way I know.

Sorry I can't be more help, but I have always struggled with XPath. Also be aware that browsers will insert missing tags and otherwise pretty-print malformed HTML, so what you parse with XPath may not always match what you would expect from looking at the page in the browser. That one has thrown me for a loop in the past.
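A minimal sketch of that prefix-by-prefix walk, using only Python's built-in xml.etree.ElementTree. Assumptions: the markup is hypothetical and well-formed (ElementTree is an XML parser, so real-world HTML usually needs something like lxml), and ElementTree supports only a limited XPath subset (no contains() or text()), so the steps are simplified. With lxml, each prefix would go through tree.xpath() instead of findall(). The hypothetical walk_path helper prints how many nodes each longer prefix matches and stops at the first step that matches nothing.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the downloaded page (not GEVI's real markup).
SAMPLE = """<html><body>
<table>
  <tr><td>Studio:</td><td><a href="/company/1">Studio A</a></td></tr>
  <tr><td>Distributor:</td><td><a href="/company/2">Studio B</a></td></tr>
</table>
</body></html>"""


def walk_path(root, steps):
    """Evaluate ever-longer prefixes of a path, printing how many nodes each matches.

    Stops at the first prefix that matches nothing: that step is where the chain breaks.
    """
    results = []
    prefix = ""
    for step in steps:
        prefix = f"{prefix}/{step}" if prefix else step
        count = len(root.findall(prefix))
        results.append((prefix, count))
        print(f"{prefix}: {count} match(es)")
        if count == 0:
            break
    return results


root = ET.fromstring(SAMPLE)

# A path copied from browser dev tools often includes a tbody that the browser
# inserted itself; the raw markup has none, so this walk stops at that step.
walk_path(root, [".//table", "tbody", "tr", "td", "a"])

# Without the tbody step, every prefix matches something.
walk_path(root, [".//table", "tr", "td", "a"])
```

The first call stops at .//table/tbody with 0 matches, which is exactly the kind of "works in the browser, fails in Python" break the walk is meant to expose.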
-
That was what I was expecting...
I end up with only CashModels... twice...
This is rather irritating... I have a feeling that the web browser auto-corrects bad HTML, but this does not happen when accessing it in Python...
I will try this with other titles and see...
Thanks for taking the time to look at it...
(Replying by email to fourstix's comment of Wed, 22 Mar 2023, 12:23.)
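That hunch is easy to check. A browser builds its DOM with the HTML parsing algorithm, which, for example, inserts a tbody around table rows and closes cells that were never closed. Python's standard-library html.parser just reports tags as they are written. This sketch feeds it a hypothetical sloppy snippet and records the start tags it sees; the TagRecorder class and the markup are illustrative only.

```python
from html.parser import HTMLParser


class TagRecorder(HTMLParser):
    """Records start tags in the order they appear in the raw HTML, repairing nothing."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


# Hypothetical sloppy markup: no <tbody>, and the first <td> is never closed.
RAW = '<table><tr><td>Studio:<td><a href="/company/1">Studio A</a></table>'

recorder = TagRecorder()
recorder.feed(RAW)
recorder.close()
print(recorder.tags)  # ['table', 'tr', 'td', 'td', 'a']
```

A browser's Elements panel would show table > tbody > tr for the same markup, so an XPath written against the inspected DOM can assume structure that Python never sees.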
-
Try this XPath:
For consistency, ensure your Python scraping code handles dynamic content. Consider using Crawlbase for easier scraping.
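One way to tell whether dynamic content is the issue is to check whether the expected text is present at all in the HTML that Python downloads, since no JavaScript runs there. A minimal standard-library sketch follows; the function names are hypothetical, and the User-Agent header is an assumption (some sites treat Python's default one differently).

```python
import html
import urllib.request


def fetch_raw_html(url: str) -> str:
    """Download a page as the server sends it; unlike a browser, no JavaScript runs."""
    # Assumed browser-like User-Agent; adjust if the site needs something else.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def missing_from_raw(raw: str, expected: list[str]) -> list[str]:
    """Return the expected strings that do not occur in the raw HTML.

    Entities such as &amp; are decoded first, so "A&amp;B" counts as "A&B".
    """
    text = html.unescape(raw)
    return [item for item in expected if item not in text]
```

Usage: missing_from_raw(fetch_raw_html("https://gayeroticvideoindex.com/video/67509"), ["Johnny Rapid", "CashModels"]). If "Johnny Rapid" comes back as missing, no XPath can find it in that download; the text is either added later (for instance by JavaScript) or the server sent the script different HTML than the browser received.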
-
For this film - https://gayeroticvideoindex.com/video/67509
I cannot seem to get the studio via XPath (the studio being Johnny Rapid).
I have tried the following in GEVI, __init__.py, line 282:
Original - //a[contains(@href, "/company/")]/parent::td//text()[normalize-space()]
First change - //a[contains(@href, "/company/")]//text()[normalize-space()]
Second change - //td[@class="ad"]//text()[normalize-space()]
All of them work when I use Inspect in Chrome/Edge, but not through Python.
Anyone with any ideas?
Jason
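A cross-check that takes XPath out of the picture: run roughly the same selection as the first change with a different parser over the HTML the script actually receives. The sketch below is a hypothetical standard-library approximation of //a[contains(@href, "/company/")]//text()[normalize-space()]; it collects non-blank text inside every a element whose href contains "/company/". The class, the helper, and the sample markup are all illustrative, not GEVI's real page structure.

```python
from html.parser import HTMLParser


class CompanyLinkText(HTMLParser):
    """Rough stand-in for //a[contains(@href, "/company/")]//text()[normalize-space()]."""

    def __init__(self):
        super().__init__()
        self.in_company_link = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            self.in_company_link = "/company/" in href

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_company_link = False

    def handle_data(self, data):
        # Skip whitespace-only text, which is what [normalize-space()] filters out.
        # The XPath keeps the original spacing; this sketch strips it for readability.
        if self.in_company_link and data.strip():
            self.texts.append(data.strip())


def company_link_texts(raw_html: str) -> list[str]:
    parser = CompanyLinkText()
    parser.feed(raw_html)
    parser.close()
    return parser.texts


# Hypothetical markup, not GEVI's actual page structure.
SAMPLE = (
    '<td>Studio: <a href="/company/1">Studio A</a></td>'
    '<td><a href="/company/2">\n  Studio B\n</a> <a href="/company/3"> </a>'
    '<a href="/video/9">Not a company</a></td>'
)
print(company_link_texts(SAMPLE))  # ['Studio A', 'Studio B']
```

If this also returns only CashModels for the downloaded page, the XPath is probably fine and the studio link is simply not in the HTML the script receives.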