My Experience Troubleshooting "TheArtOfDev.HtmlRenderer.PdfSharp"

A recent requirement forced me to move away from "iTextSharp" because of licensing issues with its free version, so I went looking for other free PDF libraries and settled on "PDFsharp." Because "PDFsharp" cannot convert an HTML string to PDF on its own, I paired it with "TheArtOfDev.HtmlRenderer.PdfSharp." That package was last updated on "2015/5/6"; the maintained version, "HtmlRenderer.PdfSharp.NetStandard2," targets .NET Standard 2.1, which .NET Framework projects cannot reference. This article does not cover how to use the two packages, since I only use the HTML-to-PDF functionality; instead, it lists the problems you may run into.

Package Introduction

  • PDFsharp: A PDF library. I didn't make a typo in the capitalization; that's just how it's named.
  • HtmlRenderer.Core: This package is used to parse HTML elements and uses custom classes to store relevant information.
  • HtmlRenderer.PdfSharp: Converts the objects defined by "HtmlRenderer.Core" into the information required by "PDFsharp" to generate a PDF.

Troubleshooting Journey

Attempting to Run

First, I followed the example provided in the README.md to run the code:

csharp
PdfDocument pdf = PdfGenerator.GeneratePdf("<p><h1>Hello World</h1>This is html rendered text</p>", PageSize.A4);
pdf.Save("document.pdf");

A problem occurred at the very first step... the following Exception was thrown:

exception error message

It looked like a version issue, so I tried downgrading "TheArtOfDev.HtmlRenderer.PdfSharp," but the problem persisted. Reading the source code did not reveal the cause, so I downloaded the source to debug it. After reconfiguring the references, however, the project would not compile. Checking the project file (.csproj), I found that, compared with a typical project, it contained the following extra settings:

xml

  <Import Project="$(SolutionDir)\.nuget\NuGet.targets" Condition="Exists('$(SolutionDir)\.nuget\NuGet.targets')" />
  <Target Name="EnsureNuGetPackageBuildImports" BeforeTargets="PrepareForBuild">
    <PropertyGroup>
      <ErrorText>This project references NuGet package(s) that are missing on this computer. Enable NuGet Package Restore to download them.  For more information, see http://go.microsoft.com/fwlink/?LinkID=322105. The missing file is {0}.</ErrorText>
    </PropertyGroup>
    <Error Condition="!Exists('$(SolutionDir)\.nuget\NuGet.targets')" Text="$([System.String]::Format('$(ErrorText)', '$(SolutionDir)\.nuget\NuGet.targets'))" />
  </Target>

After removing the settings related to NuGet restoration and rebuilding, the PDF was generated normally. This meant the code itself wasn't the problem; it really was a version configuration issue. Later, someone reminded me to try downgrading the "PDFsharp" version. After downgrading it to "1.32.3057," I finally succeeded in taking the first step =.=a.

Chinese Fonts

When attempting to output Chinese, you may encounter issues where it doesn't display correctly, as shown below:

csharp
PdfDocument pdf = PdfGenerator.GeneratePdf("<p>中文</p>", PageSize.A4);
pdf.Save("document.pdf");

pdf generation issue 1

The solution is to set a CSS font-family in the HTML to 標楷體 (DFKai-SB), which lets the text display correctly:

csharp
string html = @"
    <style>
        * {
            font-family: 標楷體;
        }
    </style>
    <p>中文</p>
";
PdfDocument pdf = PdfGenerator.GeneratePdf(html, PageSize.A4);
pdf.Save("document.pdf");

pdf generation issue 2

TIP

  • Among the default fonts, only "標楷體" and "Malgun Gothic" can display Chinese correctly.
  • In a Windows English environment, please change "標楷體" to "DFKai-SB". Conversely, do not use "DFKai-SB" in a Chinese environment.
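The font-name difference in the second tip could also be handled in code. The following is a minimal sketch, not part of either library: `ChineseFont`, `FamilyFor`, and `WithFont` are hypothetical names, and the sketch assumes that the Windows display language is what decides which font name works, which goes beyond what the tip itself states.

```csharp
using System.Globalization;

public static class ChineseFont
{
    // Assumption: a Chinese display language means the localized name "標楷體" is recognized;
    // any other display language falls back to the English name "DFKai-SB".
    public static string FamilyFor(CultureInfo uiCulture)
    {
        return uiCulture.TwoLetterISOLanguageName == "zh" ? "標楷體" : "DFKai-SB";
    }

    // Prepends a <style> block that applies the chosen font to every element.
    public static string WithFont(string bodyHtml, CultureInfo uiCulture)
    {
        return "<style>* { font-family: " + FamilyFor(uiCulture) + "; }</style>" + bodyHtml;
    }
}
```

For example, `ChineseFont.WithFont("<p>中文</p>", CultureInfo.InstalledUICulture)` would produce an HTML string that can be passed to `PdfGenerator.GeneratePdf` in the same way as the example above.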

Truncated Tables

When the following conditions are all met, the bottom border of a <table> might be truncated:

  1. Using nested <table> elements.
  2. The <td> element of the inner <table> has a specified height attribute that exceeds a certain number, while the height of the elements inside the <td> does not exceed the <td> itself.
  3. The inner <table> has borders set. Without borders it is hard to tell whether anything is wrong, because the truncation only shows on the border; I have not yet seen the content inside the <td> itself being truncated.

The following is a demonstration case:

csharp
string rowsHtml = "";
for (int i = 0; i < 5; i++) {
    rowsHtml += $"<tr><td>Text{i}0</td><td>Text{i}1</td><td>Text{i}2</td></tr>";
}

string html = $@"
    <style>
        table.list, table.list td {{
            border-collapse: collapse;
            border: 1px solid black;
        }}
        td {{
            height: 30px;
        }}
    </style>
    <html>
    <body>
        <table>
            <tr>
                <td>
                    <table class=""list"">
                        {rowsHtml}
                    </table>
                </td>
            </tr>
        </table>
    </body>
    </html>
";

PdfDocument pdf = PdfGenerator.GeneratePdf(html, PageSize.A4);
pdf.Save("document.pdf");

pdf generation issue 3

I suspect the height of the outer <td> is calculated incorrectly, so the inner <table> extends beyond the bounds of the outer <td>. I have not found a proper fix yet, so my workaround is to append an extra <tr> at the end of the inner <table> and set the height of its <td> to 0px at first. If the border is still truncated, I raise that height in 0.5px increments until the border displays correctly without visibly adding extra height. The top border of the inner <table> can be truncated as well; in that case I add an extra <tr> at the beginning and tune it the same way. Here is an example:

csharp
string rowsHtml = "";
for (int i = 0; i < 5; i++) {
    rowsHtml += $"<tr><td>Text{i}0</td><td>Text{i}1</td><td>Text{i}2</td></tr>";
}

string html = $@"
    <style>
        table.list, table.list td {{
            border-collapse: collapse;
            border: 1px solid black;
        }}
        td {{
            height: 30px;
        }}
    </style>
    <html>
    <body>
        <table>
            <tr>
                <td>
                    <table class=""list"">
                        {rowsHtml}
                        <!-- Extra tr added -->
                        <tr>
                            <td colspan=""3"" style=""height: 9px;""></td>
                        </tr>
                    </table>
                </td>
            </tr>
        </table>
    </body>
    </html>
";

PdfDocument pdf = PdfGenerator.GeneratePdf(html, PageSize.A4);
pdf.Save("document.pdf");

pdf generation issue 4
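Because the spacer height has to be found by trial and error, it may help to generate the extra row from a parameter instead of editing the HTML string by hand. This is only a sketch of that idea; `TableWorkaround` and `SpacerRow` are hypothetical names, not part of "HtmlRenderer.PdfSharp."

```csharp
using System.Globalization;

public static class TableWorkaround
{
    // Builds the extra <tr> used to push the truncated border back into view.
    // heightPx is the value to tune, e.g. 0, 0.5, 1, ... in 0.5px steps.
    public static string SpacerRow(int columnCount, double heightPx)
    {
        // InvariantCulture keeps "." as the decimal separator regardless of the OS locale.
        string height = heightPx.ToString("0.##", CultureInfo.InvariantCulture);
        return "<tr><td colspan=\"" + columnCount + "\" style=\"height: " + height + "px;\"></td></tr>";
    }
}
```

In the example above, the hand-written extra row could then be replaced by `{TableWorkaround.SpacerRow(3, 9)}` placed right after `{rowsHtml}`, so that only the second argument changes between attempts.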

Final Thoughts

For HTML-to-PDF conversion, "TheArtOfDev.HtmlRenderer.PdfSharp" supports styling more completely than "iTextSharp." While switching libraries, I often found that the rendered styling was not what I expected. This was because some CSS styles that had no effect in "iTextSharp" do take effect in "TheArtOfDev.HtmlRenderer.PdfSharp," which changed the parts I was working on.

Note that when the HTML structure is non-standard, it is hard to say which library's output is closer to what a browser renders. For example, when an <hr /> is placed inside a <td> element, current versions of Chrome accept the structure, and "iTextSharp" draws the horizontal line inside the <td>. With "TheArtOfDev.HtmlRenderer.PdfSharp," the position of the horizontal line might shift.

As for "HtmlRenderer.PdfSharp.NetStandard2," since I haven't used it, I won't offer an opinion here.

Change Log

  • 2023-12-05 Initial document created.