Broken Image conversion on compressed <figure> tag #466

Open
yagudaev opened this issue Jun 18, 2024 · 2 comments

Comments

@yagudaev

yagudaev commented Jun 18, 2024

First off, thank you so much for making this excellent library. It has been pretty much flawless 💜.

I found a bug when converting images from Substack specifically.

The HTML is minified and uses the <figure> tag.

The conversion works fine if the HTML contains whitespace, but as soon as that whitespace is removed it fails.

Codesandbox Example

The output has newlines inserted between the link brackets and the image inside them, which breaks the linked-image Markdown.

@yagudaev
Author

Quick workaround for now:

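  // assumes turndownService is a TurndownService instance and html is the source string
  let markdown = turndownService.turndown(html)

  // only post-process when the source HTML contains a <figure>
  if (/<figure.*?<\/figure>/s.test(html)) {
    markdown = markdown
      // \s already matches newlines, so \s* covers all the stray whitespace
      .replace(/\[\s*!/g, '[!')
      .replace(/\)\s*\]/g, ')]')
  }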
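
To try the workaround in isolation, here is a minimal sketch that wraps it in a function and runs it on a hand-written string. The name `fixFigureLinks` and the sample `html` and `broken` values are assumptions for illustration: the sample only mimics the shape the regexes target (blank lines between `[` and `!`, and between `)` and `]`) and is not captured Turndown output. The patterns are shortened to `\s*`, which already matches newlines.

```javascript
// Hypothetical helper: collapses the whitespace that separates a link's
// brackets from the image nested inside it.
function fixFigureLinks(html, markdown) {
  // Leave the Markdown alone unless the source contained a <figure>.
  if (!/<figure.*?<\/figure>/s.test(html)) return markdown

  return markdown
    .replace(/\[\s*!/g, '[!') // "[\n\n![" becomes "[!["
    .replace(/\)\s*\]/g, ')]') // ")\n\n]" becomes ")]"
}

// Assumed example input, written by hand for this sketch.
const html =
  '<figure><a href="https://example.com/full.png">' +
  '<img src="https://example.com/thumb.png" alt="demo"></a></figure>'
const broken =
  '[\n\n![demo](https://example.com/thumb.png)\n\n](https://example.com/full.png)'

const fixed = fixFigureLinks(html, broken)
console.log(fixed)
// [![demo](https://example.com/thumb.png)](https://example.com/full.png)
```

Keeping the fix in one function makes it easy to delete once the library handles this case itself.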

@yagudaev
Author

yagudaev commented Jul 18, 2024

Another edge case is a figure inside a blockquote, like this:

<blockquote><div class="captioned-image-container"><figure><a class="image-link is-viewable-img image2" target="_blank" href="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe79300b7-ab3e-4f59-bbd3-cc334accceee_770x676.webp" data-component-name="Image2ToDOM" rel=""><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe79300b7-ab3e-4f59-bbd3-cc334accceee_770x676.webp 424w, https://substackcdn.com/image/fetch/w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe79300b7-ab3e-4f59-bbd3-cc334accceee_770x676.webp 848w, https://substackcdn.com/image/fetch/w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe79300b7-ab3e-4f59-bbd3-cc334accceee_770x676.webp 1272w, https://substackcdn.com/image/fetch/w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe79300b7-ab3e-4f59-bbd3-cc334accceee_770x676.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe79300b7-ab3e-4f59-bbd3-cc334accceee_770x676.webp" width="770" height="676" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/e79300b7-ab3e-4f59-bbd3-cc334accceee_770x676.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:676,&quot;width&quot;:770,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;An online property listing for an empty lot for sale at the price of $2.5 million.&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null}" class="sizing-normal" alt="An online property listing for an empty lot for sale at the price of $2.5 million." title="An online property listing for an empty lot for sale at the price of $2.5 million." srcset="https://substackcdn.com/image/fetch/w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe79300b7-ab3e-4f59-bbd3-cc334accceee_770x676.webp 424w, https://substackcdn.com/image/fetch/w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe79300b7-ab3e-4f59-bbd3-cc334accceee_770x676.webp 848w, https://substackcdn.com/image/fetch/w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe79300b7-ab3e-4f59-bbd3-cc334accceee_770x676.webp 1272w, https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe79300b7-ab3e-4f59-bbd3-cc334accceee_770x676.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 " style="--darkreader-inline-stroke: currentColor;" data-darkreader-inline-stroke=""><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></div></div></a></figure></div></blockquote>

I updated the workaround to handle figures inside blockquotes and to strip image tooltips (the title attribute):

  let markdown = turndownService.turndown(html)

  // fix for: https://github.com/mixmark-io/turndown/issues/466
  if (/<figure.*?<\/figure>/s.test(html)) {
    markdown = markdown
      // remove newlines and whitespace around linked images
      .replace(/\[\s*!/g, '[!')
      .replace(/\)\s*\]/g, ')]')
      // remove the tooltip (title) from images
      .replace(/(!\[.*?\]\(.*?)( ".*?")(\))/gs, '$1$3')
  }

  // if the figure is inside a blockquote, apply a similar fix
  if (/<blockquote.*?<figure.*?<\/figure>.*?<\/blockquote>/s.test(html)) {
    markdown = markdown.replace(
      />\s*\[\s*>\s*>\s*!\[(.*?\))\s*>\s*>\s*\]/gs,
      '> [![$1]',
    )
  }
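
The blockquote pattern is hard to read, so here is a small sketch that applies it to a hand-written sample. It also includes a tighter variant of the tooltip replacement: with the `s` flag and lazy `.*?`, the tooltip pattern above can run past an untitled image's closing parenthesis and strip a later quoted phrase that happens to be followed by `)`. The names `fixBlockquoteFigure` and `stripImageTitles` and both sample strings are assumptions for illustration, not captured Turndown output.

```javascript
// Hypothetical helpers; the sample strings below are written by hand.

// Rejoins a linked image that was split across "> " blockquote lines.
function fixBlockquoteFigure(markdown) {
  return markdown.replace(
    />\s*\[\s*>\s*>\s*!\[(.*?\))\s*>\s*>\s*\]/gs,
    '> [![$1]',
  )
}

// Strips the optional "title" from image syntax without leaving the image's
// own parentheses. Assumes the URL has no spaces or ")" and the title has
// no double quotes.
function stripImageTitles(markdown) {
  return markdown.replace(/(!\[[^\]]*\]\([^)\s]*)\s+"[^"]*"(\))/g, '$1$2')
}

const quoted =
  '> [\n> \n> ![alt](https://example.com/a.webp)\n> \n> ](https://example.com/a)'
const joined = fixBlockquoteFigure(quoted)
console.log(joined)
// > [![alt](https://example.com/a.webp)](https://example.com/a)

const titled = '![alt](https://example.com/a.webp "A tooltip") (she said "hi")'
const untitled = stripImageTitles(titled)
console.log(untitled)
// ![alt](https://example.com/a.webp) (she said "hi")
```

The tighter pattern trades generality for safety: it skips titles that contain double quotes, so it may leave some tooltips in place rather than risk removing prose.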
