Closed
Description
Bug Description
We are trying to use exclude_tags: to keep header and footer content out of the index, but the content is still inserted.
To Reproduce
Steps to reproduce the behavior:
- Create config including:
domains:
  - url: https://test.example.com
    exclude_tags:
      - address
      - header
- create test page:
<html>
<head>
<title> Eclude tag test </title>
<meta name="keywords" content="excludetag"/>
</head>
<body>
<header>
HEADER TEXT Should not be indexed
</header>
<nav>
<ul>
<li><a href="/about">About</a></li>
<li><a href="/contact">Contact</a></li>
</ul>
</nav>
<h2 >tittle</h2>
BODY
<article>
<h1>Introduction to HTML</h1>
<p>HTML is a markup language that is used for creating web pages.</p>
</article>
<address>
main street 123 to be ignored too
</address>
<footer>
FOOOOOOOOOOOOOOOOOOOOOOTERRRRRRRRR
</footer>
</body>
</html>
- config is seen in logs:
domains=[{:url=>\"https://test.example.com\", :exclude_tags=>[\"address\", \"header\"]}];
- crawl the site and check the indexed document in Elasticsearch:
"body": [
"HEADER TEXT Should not be indexed About Contact tittle BODY Introduction to HTML HTML is a markup language that is used for creating web pages. main street 123 to be ignored too FOOOOOOOOOOOOOOOOOOOOOOTERRRRRRRRR"
],
Expected behavior
Header and address text should be excluded from body text.
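To make the expected result concrete, here is a minimal sketch in Python of tag-based exclusion during text extraction. It is not the crawler's implementation; the class name, the `extract_body_text` helper, and the shortened `PAGE` fixture are all hypothetical and exist only to illustrate which text should survive when `address` and `header` are excluded.

```python
from html.parser import HTMLParser


class ExcludingTextExtractor(HTMLParser):
    """Collects text inside <body>, skipping subtrees rooted at excluded tags.

    Illustration only: this is NOT how the crawler extracts content.
    """

    def __init__(self, exclude_tags):
        super().__init__()
        self.exclude_tags = set(exclude_tags)
        self.in_body = False
        self.skip_depth = 0  # > 0 while inside an excluded element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in self.exclude_tags:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in self.exclude_tags and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only body text that is not inside an excluded element.
        if self.in_body and self.skip_depth == 0:
            text = " ".join(data.split())  # collapse whitespace
            if text:
                self.chunks.append(text)


def extract_body_text(html, exclude_tags):
    """Hypothetical helper: return body text with excluded tags removed."""
    parser = ExcludingTextExtractor(exclude_tags)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Shortened version of the test page from the reproduction steps.
PAGE = """<html><body>
<header>HEADER TEXT Should not be indexed</header>
<h2>tittle</h2>
BODY
<address>main street 123 to be ignored too</address>
<footer>FOOTER</footer>
</body></html>"""

print(extract_body_text(PAGE, ["address", "header"]))  # prints "tittle BODY FOOTER"
```

With an empty exclusion list the same helper returns the header and address text as well, which matches the body observed in the index above.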
Additional context
Using <address data-elastic-exclude> in the page markup works as expected.
product_version
0.4.2