This repository has been archived by the owner on May 3, 2024. It is now read-only.

Commit

Merge pull request #45 from alfredbaudisch/feature_generate-post-summary-description

Generate post summary/description if a custom summary is not provided (useful for SEO)
alfredbaudisch authored Sep 25, 2021
2 parents 4a2f987 + e0bbef3 commit 78526df
Showing 9 changed files with 123 additions and 12 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -91,7 +91,7 @@ See PardallMarkdown in action and learn how to use it by following this video:
Add dependency and application into your `mix.exs`:
```elixir
defp deps do
[{:pardall_markdown, "~> 0.3.2"} ...]
[{:pardall_markdown, "~> 0.3.3"} ...]
end

def application do
@@ -208,7 +208,7 @@ The following configuration properties are available (all optional):
- `title`: the post title. If not provided, a title will be generated from the post slug.
- `date`: the date or date-time to be considered for the post, string, ISO format. If not provided, the file modification date will be considered as the post date.
- `published`: a post without `published: true` set will be considered a draft. This default can be inverted by setting the configuration `:is_content_draft_by_default` to `false`; posts will then always be considered published unless they contain `published: false`.
- `summary`: post description or short content.
- `summary`: post description or short content. If `summary` is not provided, a summary will be generated from the Post's content/body.
- `position`: if the post topmost taxonomy has a `:sort_by` rule set to `:position`, this is the value that will be used to sort the post (see below).
- `slug`: override the post slug. As seen above, by default, slugs are generated from the file names and are the main, unique identifier of posts.
- If you override the slug with this property, make sure to put the full path, prepended by a slash, example: `slug: "/my/custom/slug"`.
57 changes: 57 additions & 0 deletions lib/pardall_markdown/content/html_utils.ex
@@ -1,6 +1,63 @@
defmodule PardallMarkdown.Content.HtmlUtils do
alias PardallMarkdown.Content.Utils

def generate_summary_from_html(html, expected_length \\ 157)
def generate_summary_from_html(html, _) when html == nil or html == "", do: nil

@doc """
Extract text from paragraphs (`<p>`) of an HTML `html` string,
and assemble a string up until it reaches `expected_length` length.
If the generated string length matches `expected_length`, an ellipsis
will be appended to it. If the generated string is smaller than `expected_length`,
then no ellipsis is added.
If no text could be extracted from the input html, returns nil.
## Examples
iex> PardallMarkdown.Content.HtmlUtils.generate_summary_from_html("<h1>Post Title</h1><main><article><div><p>So, <a href='link'>a description</a> will be generated from it. Even a <span>nested span</span>.</p></div></article></main><p>As you can see, this a long paragraph outside.</p>This is <a name='anchor'>an anchor</a>.")
"So, a description will be generated from it. Even a nested span. As you can see, this a long paragraph outside."
iex> PardallMarkdown.Content.HtmlUtils.generate_summary_from_html("<h1>Post Title</h1><main><article><div><p>So, <a href='link'>a description</a> will be generated from it. Even a <span>nested span</span>.</p><p>Another paragraph?</p><p>Another paragraph 2?</p><p>Another paragraph 3?</p><p>As you can see, this a very long paragraph. As you can see, this a very long paragraph.</p></div></article></main>")
"So, a description will be generated from it. Even a nested span. Another paragraph? Another paragraph 2? Another paragraph 3? As you can see, this a very long..."
"""
def generate_summary_from_html(html, expected_length) do
document = Floki.parse_fragment!(html)

Floki.find(document, "p")
|> Enum.reduce("", fn
{"p", _, children}, "" ->
truncate(String.trim(children |> Floki.text()), expected_length)

{"p", _, children}, final ->
if String.length(final) < expected_length do
truncate(final <> " " <> String.trim(children |> Floki.text()), expected_length)
else
final
end

_, final -> final
end)
|> trim_and_maybe_ellipsis(expected_length)
end

defp truncate(string, length) do
if String.length(string) <= length do
string
else
String.slice(string, 0..length)
end
end

defp trim_and_maybe_ellipsis(string, _)
when string == "" or is_nil(string), do: nil
defp trim_and_maybe_ellipsis(string, expected_length) do
string = String.trim(string)
if String.length(string) < expected_length,
do: string, else: string <> "..."
end

def convert_internal_links_to_live_links(html) do
{updated_tree, _} =
Floki.parse_fragment!(html)
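The accumulate-then-truncate behavior of `generate_summary_from_html/2` can be tried without Floki. The sketch below is not the library's code: `SummarySketch` and `summarize/2` are hypothetical names, and the function takes a list of already extracted paragraph texts in place of parsed HTML. It keeps the same `truncate` and ellipsis rules as the diff, including the inclusive range `0..length`, which retains `length + 1` characters.

```elixir
defmodule SummarySketch do
  # Hypothetical stand-in: `paragraphs` are plain strings, assumed to be the
  # text of each <p> element (the commit obtains them via Floki).
  def summarize(paragraphs, expected_length \\ 157) do
    paragraphs
    |> Enum.reduce("", fn
      # First paragraph: start the summary.
      text, "" ->
        truncate(String.trim(text), expected_length)

      # Later paragraphs: append only while the summary is still too short.
      text, acc ->
        if String.length(acc) < expected_length do
          truncate(acc <> " " <> String.trim(text), expected_length)
        else
          acc
        end
    end)
    |> trim_and_maybe_ellipsis(expected_length)
  end

  # Same rule as the diff: the inclusive range keeps length + 1 characters.
  defp truncate(string, length) do
    if String.length(string) <= length, do: string, else: String.slice(string, 0..length)
  end

  # Nothing could be extracted: no summary.
  defp trim_and_maybe_ellipsis("", _expected_length), do: nil

  # An ellipsis is added whenever the result is at least expected_length long.
  defp trim_and_maybe_ellipsis(string, expected_length) do
    string = String.trim(string)
    if String.length(string) < expected_length, do: string, else: string <> "..."
  end
end

IO.inspect(SummarySketch.summarize(["Hello world", "Second one"], 8))
IO.inspect(SummarySketch.summarize(["abc", "def"], 10))
```

With a limit of 8, the first call keeps nine characters before the ellipsis, which is the same one-character overshoot that lets the doctest above end on the whole word "long". A string that is exactly `expected_length` long also receives an ellipsis even though nothing was cut, as the `@doc` text describes.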
10 changes: 5 additions & 5 deletions lib/pardall_markdown/file_parser.ex
@@ -55,14 +55,14 @@ defmodule PardallMarkdown.FileParser do
with {:ok, raw_content} <- File.read(path),
{:ok, attrs, body} <- parse_contents(path, raw_content, is_index?),
{:ok, body_html, _} <- markdown_to_html(body),
{:ok, summary_html, _} <- maybe_summary_to_html(attrs),
{:ok, summary} <- get_summary(attrs, body_html),
{:ok, date} <- parse_or_get_date(attrs, path) do
attrs =
attrs
|> maybe_extract_and_put_slug(path)
|> extract_and_put_categories(path)
|> maybe_put_title(path, is_index?)
|> Map.put(:summary, summary_html)
|> Map.put(:summary, summary)
|> Map.put(:date, date)
|> Map.put(:is_index, is_index?)

@@ -106,10 +106,10 @@ defmodule PardallMarkdown.FileParser do
])
end

defp maybe_summary_to_html(%{summary: summary}) when is_binary(summary) and summary != "",
do: summary |> markdown_to_html()
defp get_summary(%{summary: summary}, _) when is_binary(summary) and summary != "",
do: {:ok, summary}

defp maybe_summary_to_html(_), do: {:ok, nil, :ignore}
defp get_summary(_, body_html), do: {:ok, generate_summary_from_html(body_html)}

defp markdown_to_html(content), do: content |> Earmark.as_html(escape: false)

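The fallback in `get_summary/2` relies on clause order: a non-empty custom `summary` wins and is now returned as-is rather than being passed through `markdown_to_html/1`, and anything else falls through to generation from the body. A minimal, self-contained sketch of that dispatch follows. `SummaryFallbackSketch` and its `generate/1` helper are hypothetical; `generate/1` strips tags with a crude regex purely for illustration and is not how `HtmlUtils.generate_summary_from_html/1` works.

```elixir
defmodule SummaryFallbackSketch do
  # A custom, non-empty summary from the front matter takes priority.
  def get_summary(%{summary: summary}, _body_html)
      when is_binary(summary) and summary != "",
      do: {:ok, summary}

  # Missing, nil, or empty summary: derive one from the rendered body.
  def get_summary(_attrs, body_html), do: {:ok, generate(body_html)}

  # Hypothetical placeholder for the real generator (assumption, not the
  # library's implementation): drop tags and trim whitespace.
  defp generate(html) when html in [nil, ""], do: nil

  defp generate(html) do
    html
    |> String.replace(~r/<[^>]*>/, "")
    |> String.trim()
  end
end

IO.inspect(SummaryFallbackSketch.get_summary(%{summary: "Custom post summary"}, "<p>Body</p>"))
IO.inspect(SummaryFallbackSketch.get_summary(%{title: "No summary"}, "<p>Do not delete the Default Cube!</p>"))
```

Both clauses return a two-element `{:ok, value}` tuple, which is why the `with` chain in the hunk above matches on `{:ok, summary}` instead of the earlier three-element shape.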
2 changes: 1 addition & 1 deletion lib/pardall_markdown/repository.ex
@@ -108,7 +108,7 @@ defmodule PardallMarkdown.Repository do
if slug not found.
"""
def get_by_slug!(slug) do
get_by_slug(slug) || raise PardallMarkdown.Content.NotFoundError, "Page not found: #{slug}"
get_by_slug(slug) || raise PardallMarkdown.Content.NotFoundError, "Post not found: #{slug}"
end

def push_post(path, %{slug: slug, is_index: is_index?} = attrs, content, _type \\ :post) do
2 changes: 1 addition & 1 deletion mix.exs
@@ -2,7 +2,7 @@ defmodule PardallMarkdown.MixProject do
use Mix.Project

@url "https://github.com/alfredbaudisch/pardall_markdown"
@version "0.3.2"
@version "0.3.3"

def project do
[
@@ -5,4 +5,4 @@
}
---

Do not delete Blender's Default Cube!
Do not delete the Default Cube!
3 changes: 2 additions & 1 deletion test/content/blog/dailies/first-day.md
@@ -1,7 +1,8 @@
%{
title: "This is the beginning of the project PardallMarkdown!",
date: "2020-08-30",
published: true
published: true,
summary: "Custom post summary"
}
---

42 changes: 42 additions & 0 deletions test/pardall_markdown/html_test.exs
@@ -2,6 +2,48 @@ defmodule PardallMarkdown.HtmlTest do
use ExUnit.Case, async: true
alias PardallMarkdown.Content.HtmlUtils

@moduletag :html_utils
doctest(PardallMarkdown.Content.HtmlUtils)

@tag :post_summary
test "generate post summary" do
html = ~S"""
<h1>Post Title</h1>
<main>
<article>
<div>
<p>So, <a href="link">a description</a> will be generated from it. Even a <span>nested span</span>.</p>
<p>Another paragraph?</p>
<p>Another paragraph 2?</p>
<p>Another paragraph 3?</p>
<p>As you can see, this a very long paragraph. As you can see, this a very long paragraph. As you can see, this a very long paragraph. As you can see, this a very long paragraph. As you can see, this a very long paragraph. As you can see, this a very long paragraph. As you can see, this a very long paragraph. As you can see, this a very long paragraph. </p>
</div>
</article>
</main>
<p>As you can see, this a paragraph outside.</p>
This is <a name="anchor">an anchor</a>.
"""

assert HtmlUtils.generate_summary_from_html(html) == "So, a description will be generated from it. Even a nested span. Another paragraph? Another paragraph 2? Another paragraph 3? As you can see, this a very long..."

html = ~S"""
<h1>Post Title</h1>
<main><article><div><p>So, <a href="link">a description</a> will be generated from it. Even a <span>nested span</span>.</p></div></article></main>
<p>As you can see, this a long paragraph outside.</p>This is <a name="anchor">an anchor</a>.
"""

assert HtmlUtils.generate_summary_from_html(html) == "So, a description will be generated from it. Even a nested span. As you can see, this a long paragraph outside."

html = "<p>Do not delete Blender's Default Cube!</p>"

assert HtmlUtils.generate_summary_from_html(html) == "Do not delete Blender's Default Cube!"
end

test "make internal <a/> links as live links" do
html = ~S"""
This <a href="docs">is</a> <a href="/blog" class="foo" id="boo">a link</a> to <a href="../../wiki">an</a> internal <a href="v1.0_release">post</a>.
13 changes: 12 additions & 1 deletion test/pardall_markdown/repository_test.exs
@@ -5,7 +5,18 @@ defmodule PardallMarkdown.RepositoryTest do
setup do
Application.ensure_all_started(:pardall_markdown)
# wait for the Markdown content to be parsed and built
Process.sleep(100)
Process.sleep(300)
end

@tag :post_summary
test "custom post summary and generated post summary" do
# Custom
post = Repository.get_by_slug!("/blog/dailies/first-day")
assert post.summary == "Custom post summary"

# Generated
post = Repository.get_by_slug!("/blog/dailies/3d/blender/default-cube-not-deleted")
assert post.summary == "Do not delete the Default Cube!"
end

# still not accounting for per-folder indexing