Skip to content

fityannugroho/idn-area-extractor

Repository files navigation

idn-area-extractor

Extract Indonesia area data from the raw sources to CSV.

This package was developed to ease and speed up the data processing stage of idn-area-data.

Prerequisite

  • Node.js 18 or later
  • npm 9 or later

Installation

idn-area-extractor can be installed in the global scope (if you'd like to have it available and use it on the whole system) or locally for a specific package (especially if you'd like to use it programmatically):

Install globally:

npm install -g idn-area-extractor

Install locally:

npm install idn-area-extractor

Usage

Access the manual with idnxtr --help command:

USAGE
  $ idnxtr [regencies|districts|islands|villages] </path/to/file.[pdf|txt]> [OPTIONS]

OPTIONS
  -c, --compare             Compare the extracted data with the latest data
  -d, --destination=<path>  Set the folder destination. Default: current working directory
  -o, --output=<filename>   Set a specific output file name without the file extension
  -r, --range=<range>       Extract specific PDF pages (e.g. 1-2,5,7-10)
  -R, --save-raw            Save the extracted raw data into .txt file (only works with PDF data)
      --silent              Disable all logs

EXAMPLE
  $ idnxtr
  $ idnxtr regencies ~/data/regencies.pdf
  $ idnxtr regencies ~/data/regencies.pdf -r 1-2,5,7-10 -R
  $ idnxtr regencies ~/data/regencies.pdf --range 1-2,5,7-10 --save-raw
  $ idnxtr regencies ~/data/raw-regencies.txt

Interactive UI

Run idnxtr without arguments to launch the interactive UI that guides you to extracting the data.

Screenshot UI

API

idn-area-extractor can be used programmatically by using the API documented below:

idnxtr(options)

Extract the data from the PDF file.

options

Required:

  • options.data: Which kind of data should be extracted, either 'regencies', 'districts', 'islands', or 'villages'.
  • options.filePath: The path to the PDF or TXT file.

Optional:

  • options.compare: Compare the extracted data with the latest data. Default: false.
  • options.destination: The destination folder to save the CSV file. Default: process.cwd().
  • options.output: The output file name without the file extension. Default: options.data.
  • options.range: Extract specific PDF pages (e.g. 1-2,5,7-10). If not set, all pages will extracted.
  • options.saveRaw: Save the extracted raw data into .txt file (only works with PDF data). Default: false.
  • options.silent: Disable all logs. Default: false.

Example

ESM

// .js
import idnxtr from 'idn-area-extractor';

(async () => {
  await idnxtr({
    data: 'regencies',
    filePath: '/path/to/regencies.pdf',
    compare: true,
    destination: '/path/to/destination',
    output: 'regencies',
    range: '1-2,5,7-10',
    saveRaw: true,
    silent: true,
  });
})();

CommonJS

For CommonJS user, you need to use dynamic import like this:

// .js
(async () => {
  const {default: idnxtr} = await import('idn-area-extractor')

  await idnxtr({
    // options...
  })
})()

Problem Reporting

We have different channels for each problem, please use them by following these conditions :

Reporting a Bug

To report a bug, please open a new issue following the guide.

Requesting a New Feature

If you have a new feature in mind, please open a new issue following the guide.

Asking a Question

If you have a question, you can search for answers in the GitHub Discussions Q&A category. If you don't find a relevant discussion already, you can open a new discussion.

Support This Project

Give a ⭐️ if this project helped you!

You can support this project by donating via GitHub Sponsor, Trakteer, or Saweria.