Installation
Add PDFParser to your composer.json file:
{
"require": {
"smalot/pdfparser": "*"
}
}
Now ask Composer to download the package by running the command:
$ composer update smalot/pdfparser
Alternatively, download the library from GitHub, choosing either a specific release or the master branch directly.
Once done, unzip it and run the following command in the unzipped folder using Composer.
$ composer update
This command will download any dependencies (Atoum library) and create the 'autoload.php' file.
Now create a new file with this content, in the same folder:
<?php
// Include 'Composer' autoloader.
include 'vendor/autoload.php';
// Your code
// ...
?>
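As a quick sanity check after either installation route, the following sketch reports whether the autoloader can resolve the parser class. The helper name `pdfParserAvailable` is hypothetical (it is not part of PdfParser), and the path assumes the script sits next to the `vendor` folder.

```php
<?php
// Hypothetical helper (not part of PdfParser): returns true when the
// given autoloader file exists and can resolve the Parser class.
function pdfParserAvailable(string $autoloadPath): bool
{
    if (!is_file($autoloadPath)) {
        return false;
    }
    include_once $autoloadPath;

    return class_exists('Smalot\\PdfParser\\Parser');
}

// ASSUMPTION: this script lives in the same folder as 'vendor'.
echo pdfParserAvailable(__DIR__ . '/vendor/autoload.php')
    ? "PdfParser is ready\n"
    : "PdfParser not found; run composer first\n";
```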
Last checked in 2020 Aug
Updated file: vendor-autoload.zip - See #117 (comment)
The ../vendor/autoload.php file is generated by Composer, and we include it in our scripts to access PdfParser. To freeze an install and manage it without Composer, this file can be created by hand with the following content:
<?php
/**
* this file acts as vendor/autoload.php
*/
/*
Using PDFParser without Composer
Folder structure
================
webroot
    pdfdemos
        INV001.pdf      # test PDF file to extract text from for demo
        test.php        # our operational demo file
    vendor
        autoload.php
        smalot
            pdfparser   # unpack from git master https://github.com/smalot/pdfparser/archive/master.zip release is 0.9.25 dated 2015-09-15
                docs    # optional
                samples # optional
                src
                    Smalot
                        PdfParser
*/
$prerequisites = array();
/**
* TODO: ADAPT THIS PATH TO pdfparser (the folder that contains Parser.php,
* i.e. src/Smalot/PdfParser in the folder structure above)
*/
$pdfparser = '/host/path/to/pdfparser';
$prerequisites['pdfparser'] = array (
$pdfparser.'/Parser.php',
$pdfparser.'/Document.php',
$pdfparser.'/Header.php',
$pdfparser.'/PDFObject.php',
$pdfparser.'/Element.php',
$pdfparser.'/Encoding.php',
$pdfparser.'/Font.php',
$pdfparser.'/Page.php',
$pdfparser.'/Pages.php',
$pdfparser.'/Element/ElementArray.php',
$pdfparser.'/Element/ElementBoolean.php',
$pdfparser.'/Element/ElementString.php',
$pdfparser.'/Element/ElementDate.php',
$pdfparser.'/Element/ElementHexa.php',
$pdfparser.'/Element/ElementMissing.php',
$pdfparser.'/Element/ElementName.php',
$pdfparser.'/Element/ElementNull.php',
$pdfparser.'/Element/ElementNumeric.php',
$pdfparser.'/Element/ElementStruct.php',
$pdfparser.'/Element/ElementXRef.php',
$pdfparser.'/Encoding/StandardEncoding.php',
$pdfparser.'/Encoding/ISOLatin1Encoding.php',
$pdfparser.'/Encoding/ISOLatin9Encoding.php',
$pdfparser.'/Encoding/MacRomanEncoding.php',
$pdfparser.'/Encoding/WinAnsiEncoding.php',
$pdfparser.'/Font/FontCIDFontType0.php',
$pdfparser.'/Font/FontCIDFontType2.php',
$pdfparser.'/Font/FontTrueType.php',
$pdfparser.'/Font/FontType0.php',
$pdfparser.'/Font/FontType1.php',
$pdfparser.'/RawData/FilterHelper.php',
$pdfparser.'/RawData/RawDataParser.php',
$pdfparser.'/XObject/Form.php',
$pdfparser.'/XObject/Image.php'
);
// Load every listed file once, project by project.
foreach ($prerequisites as $project => $includes) {
    foreach ($includes as $mapping => $file) {
        require_once $file;
    }
}
/*
// Information for comparison with composer
use Datamatrix;
use PDF417;
use QRcode;
use TCPDF;
use TCPDF2DBarcode;
use TCPDFBarcode;
use TCPDF_COLORS;
use TCPDF_FILTERS;
use TCPDF_FONTS;
use TCPDF_FONT_DATA;
use TCPDF_IMAGES;
use TCPDF_IMPORT;
use TCPDF_PARSER;
use TCPDF_STATIC;
*/
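The explicit file list above has to be kept in sync with the library by hand. As an alternative, a minimal sketch of an on-demand autoloader using PHP's `spl_autoload_register` is shown below. It is not the approach from issue #117, and it assumes that class names map directly onto file paths under the `src` folder, as the folder structure above suggests. The helper name `pdfparserClassToPath` and the `$srcDir` value are placeholders for illustration.

```php
<?php
/**
 * Sketch of an autoloader standing in for vendor/autoload.php.
 * ASSUMPTION: $srcDir points at the library's "src" folder, which
 * contains Smalot/PdfParser/... as in the folder structure above.
 */

// Hypothetical helper: maps a class name to its expected file path,
// or returns null for classes outside the Smalot\PdfParser namespace.
function pdfparserClassToPath(string $class, string $srcDir): ?string
{
    $prefix = 'Smalot\\PdfParser\\';
    if (strncmp($class, $prefix, strlen($prefix)) !== 0) {
        return null;
    }

    return rtrim($srcDir, '/') . '/' . str_replace('\\', '/', $class) . '.php';
}

// TODO: adapt this placeholder path to your own install.
$srcDir = '/host/path/to/pdfparser/src';

// Load a class file only when PHP first needs that class.
spl_autoload_register(function (string $class) use ($srcDir) {
    $path = pdfparserClassToPath($class, $srcDir);
    if ($path !== null && is_file($path)) {
        require_once $path;
    }
});
```

With this variant, files are loaded only when a class is first used, so files added to the library later need no change here, provided they follow the same namespace-to-folder layout.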
We can now create a test.php in the deployment folder (pdfdemos here) with:
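<?php
// Load the hand-written autoloader from the sibling 'vendor' folder.
include "../vendor/autoload.php";

// Build the full path to the demo PDF in the current folder.
$directory = getcwd();
$file = 'INV001.pdf';
$fullfile = $directory . '/' . $file;
$content = '';
$out = '';

// Parse the PDF and extract the text of its first page.
$parser = new \Smalot\PdfParser\Parser();
$document = $parser->parseFile($fullfile);
$pages = $document->getPages();
$page = $pages[0];
$content = $page->getText();
$out = $content;
echo '<pre>' . $out . '</pre>';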
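The demo above reads only the first page and assumes parsing succeeds. A slightly more defensive sketch is given below: it collects the text of every page, escapes it for HTML output, and catches parse errors. The helper `collectPageTexts` is hypothetical (it is not part of PdfParser). The sketch relies only on `parseFile()`, `getPages()`, and `getText()`, which the demo already uses, and assumes the same folder layout.

```php
<?php
// Hypothetical helper (not part of PdfParser): joins the text of every
// page object, one page per line block.
function collectPageTexts(array $pages): string
{
    $texts = array();
    foreach ($pages as $page) {
        $texts[] = $page->getText();
    }

    return implode("\n", $texts);
}

// ASSUMPTION: same layout as the demo (script in pdfdemos, vendor beside it).
$autoload = __DIR__ . '/../vendor/autoload.php';
$pdfFile = __DIR__ . '/INV001.pdf';

if (is_file($autoload) && is_file($pdfFile)) {
    include $autoload;
    try {
        $parser = new \Smalot\PdfParser\Parser();
        $document = $parser->parseFile($pdfFile);
        // Escape the text so characters such as '<' display literally.
        echo '<pre>' . htmlspecialchars(collectPageTexts($document->getPages())) . '</pre>';
    } catch (\Exception $e) {
        echo 'Could not parse PDF: ' . htmlspecialchars($e->getMessage());
    }
}
```

Looping over the pages also avoids an undefined-index notice when a document has no pages, which `$pages[0]` would trigger.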
EDIT 1 by k00ni: added updated PHP code from @ndmax. Also removed tecnickcom/tcpdf (not needed anymore) and added code highlighting.
Wiki author's note: This post is for the transfer of issue #117 to the wiki.