Winnovative PDF to Text Converter for .NET

Winnovative PDF to Text Converter is a library for .NET that can be used in ASP.NET and MVC websites or in Windows Forms and WPF desktop applications to extract the text from existing PDF documents or to search text in a PDF document. After PDF to Text conversion you get a String object in memory.

The PDF to Text Converter does not depend on Adobe Reader or on any other third party tool. The main features of the Winnovative PDF to Text converter are:

Extract text from PDF documents
Search text in PDF documents
Save the extracted text using various text encodings
Case sensitive and whole word options for text search
Support for password protected PDF documents
Extract the text or search only a range of PDF pages
Extract text preserving the original PDF layout
Extract text in PDF reading order or PDF internal order
Get the number of pages in a PDF document
Get the PDF document title, keywords, author and description
Does not require Adobe Reader or other third party tools
Support for .NET 4.0 framework and later
Documentation and C# samples for all the features

You can find a complete description of the Winnovative PDF to Text Converter for .NET on the product web page.

C# Code Sample to Convert PDF to Text

The C# code below is taken from the demo application that comes with the software package. An object of the PdfToTextConverter class is created to extract the text from an existing PDF document. The extracted text is then saved to a file on disk using the text encoding selected in the demo application, for example UTF-8.

private void btnConvertToText_Click(object sender, EventArgs e)
{
    if (pdfFileTextBox.Text.Trim().Equals(String.Empty))
    {
        MessageBox.Show("Please choose a PDF file to convert", "Choose PDF file", MessageBoxButtons.OK);
        return;
    }

    // the pdf file to convert
    string pdfFileName = pdfFileTextBox.Text.Trim();
            
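    // start page number; reject input that is not a valid integer
    int startPageNumber;
    if (!int.TryParse(textBoxStartPage.Text.Trim(), out startPageNumber))
    {
        MessageBox.Show("Please enter a valid start page number", "Invalid page number", MessageBoxButtons.OK);
        return;
    }
    // end page number
    // when it is 0 the extraction will continue up to the end of document
    int endPageNumber = 0;
    if (textBoxEndPage.Text.Trim() != String.Empty
        && !int.TryParse(textBoxEndPage.Text.Trim(), out endPageNumber))
    {
        MessageBox.Show("Please enter a valid end page number", "Invalid page number", MessageBoxButtons.OK);
        return;
    }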

    // the output text layout
    TextLayout textLayout = SelectedTextLayout();

    // the output text encoding
    System.Text.Encoding textEncoding = SelectedTextEncoding();

    // page breaks
    bool markPageBreaks = cbMarkPageBreaks.Checked;

    string outputFileName = System.IO.Path.Combine(Application.StartupPath, @"DemoFiles\Output", 
            System.IO.Path.GetFileNameWithoutExtension(pdfFileName) + ".txt");

    // create the converter object and set the user options
    PdfToTextConverter pdfToTextConverter = new PdfToTextConverter();

    pdfToTextConverter.LicenseKey = "C4WUhJaRhJSEkoqUhJeVipWWip2dnZ2ElA==";

    pdfToTextConverter.Layout = textLayout;
    pdfToTextConverter.MarkPageBreaks = markPageBreaks;

    Cursor = Cursors.WaitCursor;
    try
    {
        // extract text from PDF
        string extractedText = pdfToTextConverter.ConvertToText(pdfFileName, startPageNumber, endPageNumber);

        // write the resulting string to the output file
        // under the application directory using the selected encoding
        System.IO.File.WriteAllText(outputFileName, extractedText, textEncoding);
    }
    catch (Exception ex)
    {
        MessageBox.Show(String.Format("An error occurred. {0}", ex.Message), "Error");
        return;
    }
    finally
    {
        Cursor = Cursors.Arrow;
    }


    try
    {
        System.Diagnostics.Process.Start(outputFileName);
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
        return;
    }
}
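
The sample above reads the page range and the output encoding from form controls, and the `SelectedTextEncoding()` helper it calls is not shown. The following is a minimal sketch of how such input handling might look, using only the .NET base class library. The `ConversionInput` class, its method names, the encoding display names and the fallback to page 1 for an invalid start page are all assumptions made for illustration; they are not part of the Winnovative demo application or API.

```csharp
using System;
using System.Text;

// Hypothetical helpers for illustration only; not part of the Winnovative library.
public static class ConversionInput
{
    // Parses the start page. Assumption: pages are numbered from 1,
    // so empty or invalid input falls back to 1.
    public static int ParseStartPage(string text)
    {
        int page;
        bool ok = int.TryParse((text ?? string.Empty).Trim(), out page);
        return ok && page > 0 ? page : 1;
    }

    // Parses the end page. As in the sample above, 0 means
    // "continue up to the end of the document".
    public static int ParseEndPage(string text)
    {
        int page;
        bool ok = int.TryParse((text ?? string.Empty).Trim(), out page);
        return ok && page > 0 ? page : 0;
    }

    // Maps a display name (assumed to come from a combo box) to an encoding.
    // Unknown names default to UTF-8.
    public static Encoding ResolveEncoding(string name)
    {
        switch ((name ?? string.Empty).Trim().ToUpperInvariant())
        {
            case "ASCII": return Encoding.ASCII;
            case "UTF-16": return Encoding.Unicode;
            case "UTF-32": return Encoding.UTF32;
            default: return Encoding.UTF8;
        }
    }
}
```

The values returned by these helpers could then be passed to `ConvertToText` and `System.IO.File.WriteAllText` in the same way as `startPageNumber`, `endPageNumber` and `textEncoding` in the sample.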