pdf_text_extraction 2.0.0
Bindings and convenience wrappers around a fork of xpdf that let you extract text and metadata from PDF files using Dart. The native libraries are available for Linux and Windows only.
ℹ️ The project depends on a fork of xpdf maintained at https://github.com/insinfo/xpdf.
Platform requirements
- Windows: ship the compiled pdftotext.dll and TextExtraction.dll alongside your executable.
- Linux: ensure the GNU C++ runtime (libstdc++6) is installed before using the package. On Debian or Ubuntu:
sudo apt-get install libstdc++6
Getting started
Add the package as a dependency (pdf_text_extraction: ^2.0.0 in pubspec.yaml) and make sure the native libraries are available on the library search path or in the working directory. The package exposes two APIs:
- Low-level bindings generated by package:ffigen, mirroring the C API.
- High-level wrappers that take care of memory management and validation.
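If the libraries are missing, DynamicLibrary.open fails with a fairly generic error, so it can help to check for them up front. The sketch below is a hypothetical helper, not part of the package API. It only looks in the working directory (not the system search path), and it assumes the file names pdftotext.dll and TextExtraction.dll on Windows, as listed under Platform requirements, and pdftotext.so on Linux, as loaded in the low-level example.

```dart
import 'dart:io';

/// Library file names this sketch expects (an assumption based on this
/// README). Adjust the lists if your build produces different names.
List<String> expectedNativeLibraries() {
  if (Platform.isWindows) return ['pdftotext.dll', 'TextExtraction.dll'];
  if (Platform.isLinux) return ['pdftotext.so'];
  // The native libraries are only provided for Windows and Linux.
  throw UnsupportedError(
      'pdf_text_extraction: unsupported platform ${Platform.operatingSystem}');
}

/// Hypothetical helper: returns the entries of [names] that do not exist
/// as files inside [directory].
List<String> missingNativeLibraries(String directory, List<String> names) {
  return names
      .where((name) =>
          !File('$directory${Platform.pathSeparator}$name').existsSync())
      .toList();
}

void main() {
  final dir = Directory.current.path;
  final missing = missingNativeLibraries(dir, expectedNativeLibraries());
  if (missing.isEmpty) {
    print('All native libraries found in $dir');
  } else {
    print('Missing native libraries in $dir: ${missing.join(', ')}');
  }
}
```

Running this check at startup, before constructing the bindings, turns a missing DLL or shared object into an explicit message listing the absent files.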
Low-level usage
import 'dart:ffi';
import 'dart:io' show Platform, Directory;
import 'package:ffi/ffi.dart';
import 'package:path/path.dart' as path;
import 'package:pdf_text_extraction/pdf_text_extraction.dart';
import 'package:pdf_text_extraction/src/pdf_to_text_bindings.dart';

// Native callback that forwards log messages to the Dart console.
void logCallback(Pointer<Int8> msg) {
  print(nativeInt8ToString(msg));
}

void main() {
  // Load the native library from the current working directory.
  var libraryPath = path.join(Directory.current.path, 'pdftotext.dll');
  if (Platform.isLinux) {
    libraryPath = path.join(Directory.current.path, 'pdftotext.so');
  }
  final dylib = DynamicLibrary.open(libraryPath);
  final pdfLib = PDFToTextBindings(dylib);

  // Input PDF file.
  final uriPointer = stringToNativeInt8('pdf_file.pdf', allocator: calloc);
  // Character encoding of the extracted text.
  final textOutEnc = stringToNativeInt8('UTF-8', allocator: calloc);
  // Text layout mode.
  final layout = stringToNativeInt8('rawOrder', allocator: calloc);
  // Function pointer the native code calls to report log messages.
  final lgf = Pointer.fromFunction<Void Function(Pointer<Int8>)>(logCallback);
  // Receives a pointer to the extracted text.
  final textOut = calloc<Pointer<Int8>>();

  // The two integer arguments select the first and last page (page 1 only).
  final result = pdfLib.extractText(
      uriPointer, 1, 1, textOutEnc, layout, textOut, lgf, nullptr, nullptr);

  if (result == 0) {
    // Only read the output pointer once extraction has succeeded.
    print('result ok: ${nativeInt8ToString(textOut.value)}');
  } else {
    print('error during text extraction (code $result)');
  }

  // Release the native input buffers allocated above.
  calloc.free(uriPointer);
  calloc.free(textOutEnc);
  calloc.free(layout);
  calloc.free(textOut);
}
High-level usage
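import 'package:pdf_text_extraction/pdf_text_extraction.dart';

void main() {
  // The wrapper loads the native library and manages native memory.
  final wrapper = PDFToTextWrapping();
  final text = wrapper.extractText(
    'pdf_file.pdf',
    startPage: 1,
    endPage: 1,
  );
  print('result: $text');
}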
PDFToTextWrapping also exposes getPagesCount and reports any native errors
through the static lastError property.
Testing
The repository ships with unit and integration tests. To run the integration
tests you need a fixture PDF (for example 1417.pdf) and the native
libraries in the root of the project.
dart test
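When the fixture or native libraries are absent, an integration test can skip itself instead of failing. The following is a minimal sketch, not one of the repository's own tests. It assumes package:test is a dev dependency, uses the fixture name 1417.pdf mentioned above and the library names from the low-level example (checking only pdftotext.dll on Windows, not TextExtraction.dll), and assumes extractText returns the text synchronously, as the high-level example suggests.

```dart
import 'dart:io';

import 'package:pdf_text_extraction/pdf_text_extraction.dart';
import 'package:test/test.dart';

// Fixture and library names taken from this README (assumptions).
const fixture = '1417.pdf';
final nativeLib = Platform.isWindows ? 'pdftotext.dll' : 'pdftotext.so';

void main() {
  // `dart test` runs from the project root, where the fixture and the
  // native library are expected to live.
  final ready = File(fixture).existsSync() && File(nativeLib).existsSync();

  test('extracts text from the first page of the fixture', () {
    final wrapper = PDFToTextWrapping();
    final text = wrapper.extractText(fixture, startPage: 1, endPage: 1);
    expect(text, isNotEmpty);
  }, skip: ready ? false : 'fixture PDF or native library not found');
}
```

Passing a string to skip makes `dart test` report the reason, so a missing fixture shows up as a skipped test rather than a crash in native code.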
Regenerating bindings
If you need to regenerate the FFI bindings after updating the native headers, run:
dart run ffigen --config ffigen.yaml