Floki.find doesn't support non-alpha characters #620

@rahultumpala

Description

I'm using Floki to read an HTML document and extract some elements from it. One element has an id that contains a forward slash. Calling Floki.find with the selector #element/abc returns an empty list, even though an element with that id is present in the document.

I used Floki.get_by_id with the id element/abc and this fetched the correct element.

To Reproduce

The following Elixir script reproduces the issue.

Mix.install([
:floki
])

raw_html = """
<html lang="en">
  <head>
      <meta charset="UTF-8">
      <meta name="viewport" content="width=device-width, initial-scale=1.0">
      <title>Document</title>
  </head>
  <body>
          <p id="text"> text </p>
          <p id="hello/there"> hello/there </p>
          <p id="hello.there"> hello.there </p>
      </div>
  </body>
</html>

"""

{:ok, document} = Floki.parse_document(raw_html)


Floki.find(document, "#text") |> IO.inspect() # works
Floki.find(document, "#hello/there") |> IO.inspect() # does not work; this is not documented
Floki.get_by_id(document, "hello/there") |> IO.inspect() # works
Floki.find(document, "#hello.there") |> IO.inspect() # does not work; this is documented
Floki.find(document, "#hello\\.there") |> IO.inspect() # works; this is documented
Floki.get_by_id(document, "hello.there") |> IO.inspect() # works

Extra info: Floki emits a debug log stating that the forward slash token is not recognized.


19:59:17.209 [debug] Unknown token ~c"/". Ignoring.
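
A possible workaround, sketched here as an assumption rather than documented behavior for this case: since the id selector is what trips the tokenizer, matching on the id attribute with a quoted value may sidestep the problem, because the slash then sits inside a quoted string. The trimmed-down HTML and the variable names below are illustrative only.

```elixir
# Assumption: Floki is fetched from Hex; no version is pinned here.
Mix.install([:floki])

html = """
<html>
  <body>
    <p id="text"> text </p>
    <p id="hello/there"> hello/there </p>
  </body>
</html>
"""

{:ok, document} = Floki.parse_document(html)

# get_by_id takes the raw id value, not a CSS selector string,
# so the slash never goes through the selector tokenizer.
by_id = Floki.get_by_id(document, "hello/there")

# Assumed workaround: an attribute selector with a quoted value.
# Not verified against every Floki version.
by_attr = Floki.find(document, "[id='hello/there']")

IO.inspect(by_id, label: "get_by_id")
IO.inspect(by_attr, label: "attribute selector")
```

Note that get_by_id returns a single node (or nil), while find returns a list, so the two results differ in shape even when they match the same element.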

Expected behavior

I would expect Floki.find and Floki.get_by_id to behave the same way. Failing that, the overview page of the Floki docs should note that such ids are not supported by Floki.find, especially since Floki.get_by_id is not listed anywhere on that page.

or

We could add support to escape non-alpha characters in the selector passed to Floki.find. I am willing to contribute if you could guide me.
