civis-name-parser
is a Node.js command-line app designed to be used in conjunction with the
Civis Platform. It efficiently parses full names into their constituent parts using the
another-name-parser package. (Details on the specifics of name parsing can be found in
that package's listing.) Given a table with a unique identifier column and a full name
column, the app exports those two columns to an S3 bucket, streams the export, parsing
names along the way, and imports the result into a common table, parsed_names, which is
keyed on the input's unique identifier and the ID of the first query job created
via the Civis API.
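For a sense of what the parsing step produces, here is a minimal sketch of calling another-name-parser directly. The output property names shown are assumptions drawn from that package's listing; the app maps them onto the columns of the parsed_names table described below.

```js
// Minimal sketch of calling the another-name-parser package directly.
// The commented output shape is an assumption; see the package's npm
// listing for the authoritative field names.
var parse = require('another-name-parser');

var parts = parse('Dr. Jane Q. Public, Jr.');
// parts is an object along the lines of:
// { prefix: 'Dr.', first: 'Jane', middle: 'Q.', last: 'Public', suffix: 'Jr.', ... }
console.log(parts);
```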
You'll need:
- A Civis Platform API key
- Node.js v4+
- An Amazon Web Services S3 credential loaded into the Civis Platform
- A bucket readable and writeable by the loaded S3 credential
First, install the app's dependencies:
$ npm install
Then, create the following table in a schema of your choice:
create table schema.parsed_names (
query_job_id int not null,
source_id varchar not null,
full_name varchar(100),
title varchar(10),
first_name varchar(25),
middle_name varchar(25),
last_name varchar(30),
suffix varchar(10),
parsed_on timestamp default sysdate,
primary key(query_job_id, source_id)
)
distkey(query_job_id)
compound sortkey(query_job_id, source_id);
If running locally, call npm start for usage instructions.
If running as a Civis Custom Script, use the following settings:
Setting | Value |
---|---|
Git Repo URL | github.com/cleanchoice/civis-name-parser.git |
Git Repo Reference | master |
Docker Image Name | node:5.0.0 |
Command | bash /app/run_script.sh some_schema.names_to_parse name_of_unique_id_column name_of_full_name_column name-of-accessible-bucket (where name-of-accessible-bucket is a bucket accessible to the S3 credential you've loaded into the Civis Platform) |
Memory Usage | Standard |
Credential | none needed |
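The four positional arguments in the Command above are, in order: the schema-qualified source table, the unique ID column, the full name column, and the S3 bucket. As a hypothetical sketch (not the repo's actual code) of how a Node.js entry point might pick them up after run_script.sh forwards them:

```js
// Hypothetical sketch only; the repo's actual entry point may differ.
// The argument order matches the Command setting above.
var args = process.argv.slice(2);
var tableSchemaAndName = args[0];
var idColumn = args[1];
var nameColumn = args[2];
var bucketName = args[3];

if (!tableSchemaAndName || !idColumn || !nameColumn || !bucketName) {
  console.error('Usage: <schema.table> <id_column> <name_column> <bucket>');
  process.exit(1);
}
```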
After you've run the job, you can inspect its results (along with those of all other jobs stored in the parsed_names table) using:
select query_job_id, count(1), max(parsed_on)
from some_schema.parsed_names
group by query_job_id
order by max(parsed_on) desc;
To load the parsed data back into your source table (assuming it has the appropriate columns), use a query similar to:
UPDATE some_schema.names_to_parse
SET title=p.title, first_name=p.first_name, middle_name=p.middle_name, last_name=p.last_name, suffix=p.suffix
FROM some_schema.parsed_names p
WHERE p.source_id = names_to_parse.id
  AND p.query_job_id = 1234;
If you'd like to simulate a Civis Custom Script, use this docker run
command; it mimics a standard Civis Custom Script configuration as closely as possible:
$ docker run -i -t --rm \
-e "CIVIS_API_KEY=YOURapiKEYhere" -v $(pwd):/app -v /tmp:/data -w /app \
--name civis-name-parser -m 512M node:5.0.0 \
bash /app/run_script.sh tableSchemaAndName idColumn nameColumn bucketName
To run the test suite, you'll need Mocha:
npm install -g mocha
Then, run npm test.
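If you're unfamiliar with Mocha, a test looks roughly like the following. This is an illustrative sketch rather than a file from the repo, and the asserted property names are assumptions about another-name-parser's output.

```js
// Illustrative Mocha test only; the property names on the parsed result
// are assumptions. Confirm them against another-name-parser's listing.
'use strict';

var assert = require('assert');
var parse = require('another-name-parser');

describe('name parsing', function () {
  it('splits a full name into parts', function () {
    var parts = parse('Jane Q. Public');
    assert.ok(parts.first, 'expected a first name');
    assert.ok(parts.last, 'expected a last name');
  });
});
```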
- Inspect tables for data types
- Add a setup command that:
  - Creates the destination table
  - Loads S3 credentials to Civis
  - Creates a Custom Script
- Break out the name-parser.js streams into separate components to make them easier to test (see the sketch below)
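As a rough illustration of that last item, a separated stream component might look something like this hypothetical sketch. It is not code from this repository, and the parsed-field names (prefix, first, middle, last, suffix) are assumptions about another-name-parser's output.

```js
// name-parts-transform.js -- a hypothetical sketch, not code from this repo.
'use strict';

var Transform = require('stream').Transform;
var util = require('util');
var parse = require('another-name-parser');

// Turns objects like { id: '...', full_name: '...' } into rows shaped like
// the parsed_names table. Confirm the parsed property names against the
// package's listing before relying on them.
function NamePartsTransform(queryJobId) {
  Transform.call(this, { objectMode: true });
  this.queryJobId = queryJobId;
}
util.inherits(NamePartsTransform, Transform);

NamePartsTransform.prototype._transform = function (record, encoding, callback) {
  var parsed = parse(record.full_name);
  callback(null, {
    query_job_id: this.queryJobId,
    source_id: record.id,
    full_name: record.full_name,
    title: parsed.prefix,
    first_name: parsed.first,
    middle_name: parsed.middle,
    last_name: parsed.last,
    suffix: parsed.suffix
  });
};

module.exports = NamePartsTransform;
```

A component shaped like this could be piped between the S3 export stream and the import stream, and unit-tested in isolation by writing records into it and asserting on what it emits.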