Turan Furkan Topak edited this page Oct 9, 2020 · 5 revisions

Composer

Add PdfParser to your composer.json file:

{
    "require": {
        "smalot/pdfparser": "*"
    }
}

Now ask Composer to download the package by running this command:

$ composer update smalot/pdfparser

As standalone library

First, download the library from GitHub, choosing either a specific release or the master branch.

Once done, unzip it and run the following command with Composer:

$ composer update

This command downloads the dependencies (the Atoum library) and creates the 'autoload.php' file.

Now create a new file in the same folder with this content:

<?php
 
// Include 'Composer' autoloader.
include 'vendor/autoload.php';
 
// Your code
// ...
 
?>

Without Composer

Last checked: August 2020

Updated file: vendor-autoload.zip - See #117 (comment)

Composer generates ../vendor/autoload.php, and our scripts include that file to get access to PdfParser. To freeze an install and manage it without Composer, create this file by hand with the following content:

<?php
/**
 * this file acts as vendor/autoload.php
 */

/*
Using PDFParser without Composer
Folder structure
================
webroot
  pdfdemos
    INV001.pdf # test PDF file to extract text from for demo
    test.php # our operational demo file
  vendor
    autoload.php
    smalot
      pdfparser # unpack from git master https://github.com/smalot/pdfparser/archive/master.zip release is 0.9.25 dated 2015-09-15
        docs # optional
        samples # optional
        src
          Smalot
            PdfParser
*/

$prerequisites = array();

/**
 * TODO: ADAPT THIS PATH to the folder that contains Parser.php,
 * i.e. the src/Smalot/PdfParser folder of your pdfparser copy.
 */
$pdfparser = '/host/path/to/pdfparser';

$prerequisites['pdfparser'] = array (
    $pdfparser.'/Parser.php',
    $pdfparser.'/Document.php',
    $pdfparser.'/Header.php',
    $pdfparser.'/PDFObject.php',
    $pdfparser.'/Element.php',
    $pdfparser.'/Encoding.php',
    $pdfparser.'/Font.php',
    $pdfparser.'/Page.php',
    $pdfparser.'/Pages.php',
    $pdfparser.'/Element/ElementArray.php',
    $pdfparser.'/Element/ElementBoolean.php',
    $pdfparser.'/Element/ElementString.php',
    $pdfparser.'/Element/ElementDate.php',
    $pdfparser.'/Element/ElementHexa.php',
    $pdfparser.'/Element/ElementMissing.php',
    $pdfparser.'/Element/ElementName.php',
    $pdfparser.'/Element/ElementNull.php',
    $pdfparser.'/Element/ElementNumeric.php',
    $pdfparser.'/Element/ElementStruct.php',
    $pdfparser.'/Element/ElementXRef.php',
    $pdfparser.'/Encoding/StandardEncoding.php',
    $pdfparser.'/Encoding/ISOLatin1Encoding.php',
    $pdfparser.'/Encoding/ISOLatin9Encoding.php',
    $pdfparser.'/Encoding/MacRomanEncoding.php',
    $pdfparser.'/Encoding/WinAnsiEncoding.php',
    $pdfparser.'/Font/FontCIDFontType0.php',
    $pdfparser.'/Font/FontCIDFontType2.php',
    $pdfparser.'/Font/FontTrueType.php',
    $pdfparser.'/Font/FontType0.php',
    $pdfparser.'/Font/FontType1.php',
    $pdfparser.'/RawData/FilterHelper.php',
    $pdfparser.'/RawData/RawDataParser.php',
    $pdfparser.'/XObject/Form.php',
    $pdfparser.'/XObject/Image.php'
);

// Load every listed file once, project by project.
foreach ($prerequisites as $project => $includes) {
    foreach ($includes as $mapping => $file) {
        require_once $file;
    }
}

/*
// Information for comparison with composer
use Datamatrix;
use PDF417;
use QRcode;
use TCPDF;
use TCPDF2DBarcode;
use TCPDFBarcode;
use TCPDF_COLORS;
use TCPDF_FILTERS;
use TCPDF_FONTS;
use TCPDF_FONT_DATA;
use TCPDF_IMAGES;
use TCPDF_IMPORT;
use TCPDF_PARSER;
use TCPDF_STATIC;
*/
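
As an alternative to listing every file by hand, the same loading can be sketched with `spl_autoload_register`, which loads a class file the first time the class is used. This is a minimal sketch, not something the library ships: the helper name `pdfparserClassToPath` is hypothetical, and `$baseDir` is assumed to point at the `src` folder from the folder structure above (the folder that contains `Smalot`).

```php
<?php
/**
 * Sketch: load PdfParser classes on demand instead of requiring each file.
 * ASSUMPTION: class names map to paths as in the folder structure above,
 * e.g. Smalot\PdfParser\Parser lives in <baseDir>/Smalot/PdfParser/Parser.php.
 */

// Hypothetical helper (not part of PdfParser): map a class name to a file path.
function pdfparserClassToPath(string $class, string $baseDir): ?string
{
    $prefix = 'Smalot\\PdfParser\\';

    // Ignore classes that belong to other namespaces.
    if (strncmp($class, $prefix, strlen($prefix)) !== 0) {
        return null;
    }

    // Namespace separators become directory separators.
    return rtrim($baseDir, '/') . '/' . str_replace('\\', '/', $class) . '.php';
}

// ASSUMPTION: this file sits in "vendor"; adapt the path to your install.
$baseDir = __DIR__ . '/smalot/pdfparser/src';

spl_autoload_register(function (string $class) use ($baseDir): void {
    $path = pdfparserClassToPath($class, $baseDir);

    // Only require files that actually exist, so other autoloaders can still run.
    if ($path !== null && is_file($path)) {
        require_once $path;
    }
});
```

With this approach the file list does not need updating when the library adds or renames classes, at the cost of relying on the directory layout matching the namespaces.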

We can now create a test.php in the deployment folder (pdfdemos here) with:

<?php
include "../vendor/autoload.php";

$directory = getcwd();
$file = 'INV001.pdf';
$fullfile = $directory . '/' . $file;
$content = '';
$out = '';
$parser = new \Smalot\PdfParser\Parser();

// Parse the file, then read the text of the first page.
$document = $parser->parseFile($fullfile);
$pages    = $document->getPages();
$page     = $pages[0];
$content  = $page->getText();
$out      = $content;
echo '<pre>' . $out . '</pre>';
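
The test.php script prints only the first page and echoes the extracted text unescaped. A possible extension, sketched here under assumptions, walks every page and escapes the text before output. The helper `renderPagesAsHtml` is hypothetical (not part of PdfParser); it only assumes that each page object exposes `getText()`, as the pages used in test.php do.

```php
<?php
// Hypothetical helper (not part of PdfParser): render the text of every page.
// ASSUMPTION: each item in $pages has a getText() method, like the page
// objects returned by $document->getPages() in test.php.
function renderPagesAsHtml(iterable $pages): string
{
    $html = '';
    $number = 0;

    foreach ($pages as $page) {
        $number++;

        // Escape the text so characters such as "<" or "&" display literally;
        // ENT_SUBSTITUTE replaces invalid byte sequences instead of returning ''.
        $text = htmlspecialchars($page->getText(), ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

        $html .= "<h2>Page {$number}</h2>\n<pre>{$text}</pre>\n";
    }

    return $html;
}
```

In test.php this could replace the last five lines with `echo renderPagesAsHtml($document->getPages());`.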

EDIT 1 by k00ni: added updated PHP code from @ndmax. Also removed tecnickcom/tcpdf (not needed anymore) and added code highlighting.

Wiki author's note: This post is for the transfer of issue #117 to the wiki.
