
Implement import from .csv files #25

Open
cecilia-donnelly opened this issue Aug 4, 2015 · 9 comments
@cecilia-donnelly

See the "examples" directory on the 14-export-clients branch. For now, we'll show an "import" button that brings up a prompt that asks a user to choose a .csv file to import. We'll POST the rows from that file via the API. For each row, we'll check to make sure that a record with that SSN does not already exist. If one does, we will not POST that row but will instead store it in a list of (possible) duplicates to be shown to the user after the import.

Later, we'll probably have smarter behavior for duplicate records.
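
For reference, a rough sketch of that flow might look like the following; the endpoint path and field names here are placeholders, not the project's actual API.

```js
// Hypothetical sketch of the import flow -- '/api/clients' and the field
// names are assumptions, not the real API.
var duplicates = [];

function importRows(rows) {
  rows.forEach(function (row) {
    // Ask the API whether a record with this SSN already exists.
    $.get('/api/clients', { ssn: row.ssn }, function (existing) {
      if (existing && existing.length > 0) {
        // Don't POST it; keep it to show the user after the import.
        duplicates.push(row);
      } else {
        $.post('/api/clients', row);
      }
    });
  });
}
```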

@cecilia-donnelly

cecilia-donnelly added a commit that referenced this issue Aug 4, 2015
Add a conditional to the controller to handle POST requests.  For now,
this reads the file into a variable and then sends it back to the
browser.  We'll do something different in the next commit.  Add a POST
route pointing to the controller.  Add a form to the front page to
receive the file for import (I'll have to move this, since it's right
above the search bar now).
cecilia-donnelly added a commit that referenced this issue Aug 4, 2015
Instead of just submitting a form, I'm now sending the file to be
imported via AJAX so that we can interpret the data once it is returned
from the server, and send our POSTs via the API (for non-duplicate
rows).

Note that future commits will assume that we're importing a csv of a
format like "Client.csv" in the examples directory.  Does that sound
right?
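
A minimal sketch of that AJAX submission, assuming a form with id `import-form` and an `/import` route (both names are illustrative, not necessarily what the branch uses):

```js
// Illustrative only -- the element id and route are assumptions.
$('#import-form').on('submit', function (event) {
  event.preventDefault();
  var formData = new FormData(this);   // the form element itself
  $.ajax({
    url: '/import',            // hypothetical route
    type: 'POST',
    data: formData,
    processData: false,        // let the browser build the multipart body
    contentType: false,
    success: function (csvText) {
      // Interpret the returned data here, then POST non-duplicate rows
      // via the API.
      handleImportedCsv(csvText);   // hypothetical handler
    }
  });
});
```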
@cecilia-donnelly

Or probably this: https://github.com/mholt/PapaParse

The jquery-csv project fails in Firefox with a "csv.replace is not a function" error. Somebody opened a ticket for it here, but there's no response yet, so I think we're out of luck there.
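
For what it's worth, parsing with PapaParse is roughly this simple (the column name below is illustrative):

```js
// Parse a CSV string into an array of row objects, one per line.
var results = Papa.parse(csvText, {
  header: true,          // use the first row as field names
  skipEmptyLines: true
});
// e.g. results.data[0].SSN, assuming the file has an "SSN" column
```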

cecilia-donnelly added a commit that referenced this issue Aug 4, 2015
As of this commit, the imported file is sitting in a JS array in the
browser.  We still need to check the SSN against the API for duplicates,
and to POST each non-duplicate record via the API.

I've done some of the parsing inside the controller, which may be bad
form.  I just split the csv so that the headers and footers aren't sent
back to the browser.  Then, in the client-client (as we're calling it),
I use the MIT-licensed "PapaParse" library to transform the csv lines
into a JS array.

I also fixed a call in index.jade so that it refers to PapaParse instead
of to a different jquery library that I decided not to use.

I added papaparse.js so that we could use the library.  It has a license
header and I don't think we need any other files from that repo.  They
do offer a minified version, if we prefer.
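
A sketch of the controller-side split described above; the actual code in the controller may differ, this is just the general idea:

```js
// Assumed sketch: drop the first (header) and last (footer) lines before
// sending the remaining CSV lines back to the browser.
function stripHeaderAndFooter(fileContents) {
  var lines = fileContents.split('\r\n');
  return lines.slice(1, lines.length - 1).join('\r\n');
}
```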
cecilia-donnelly added a commit that referenced this issue Aug 5, 2015
Test whether a record to be imported has an SSN that matches an existing
record.  This commit adds several lines for interpreting the response
from the Node server.  Those may change.  I just tested with a file that
was exported from our db and PapaParse didn't split the lines into their
own arrays as it should, so it looked like the whole file was in one
array (which meant that the rest of the code did not work correctly).
Next commit will resolve that and possibly add the POST lines for new
data.  I need to turn the csv record into a correctly formatted Client,
before POSTing.
cecilia-donnelly added a commit that referenced this issue Aug 5, 2015
Simplify how the controller splits the header and footer away from the
file contents.  I think that separating on `\r\n` will generally work for
files we create, but I worry somewhat that other files will have `\r\n`
at the end of every line of data.  Add a div to hold notifications of
duplicates from the import (this will look nicer in the next commit;
consider it something of a placeholder for now).  Track duplicates in a
variable instead of just outputting them to the console.

Insert the data from each (non-duplicate) file line into a client
object.  I'm a little worried that explicitly naming the column index
for each element is not a portable solution, *but* HMIS explicitly
details the correct format for a compliant CSV file, so maybe it's okay.
What do you think, @kfogel?  (I'm looking at lines 1019-1041 of
index.js, here).

POST the data!  It saves correctly in my DB.
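
The index-based mapping in question looks roughly like this; the indices and field names below are purely illustrative, the real ones follow the HMIS CSV spec in index.js:

```js
// Illustrative only -- real column positions come from the HMIS CSV spec.
function rowToClient(fields) {
  return {
    firstName: fields[3],   // hypothetical index
    lastName:  fields[5],   // hypothetical index
    ssn:       fields[8],   // hypothetical index
    dob:       fields[10]   // hypothetical index
  };
}
// Then POST it via the API, e.g. $.post('/api/clients', rowToClient(fields));
```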
@cecilia-donnelly

Import now works! The next steps are to make it look nicer:

  • Instead of two buttons (browse and import) just show one, which Does The Right Thing
  • Show duplicates in a more beautiful format

Note that I should also be checking whether the selected file is even a .csv, and probably doing more quality assurance checks inside the import function.
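
A simple guard along these lines would cover the file-type check (the input id, helper names, and message are assumptions):

```js
// Reject anything whose name doesn't end in .csv before starting the import.
function isCsvFile(fileName) {
  return /\.csv$/i.test(fileName);
}

var fileName = $('#file-input').val();   // hypothetical input id
if (!isCsvFile(fileName)) {
  alert('Please choose a .csv file to import.');
} else {
  startImport();   // hypothetical entry point
}
```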

@cecilia-donnelly

Note that I should also more explicitly map the csv columns to their API counterparts, as @kfogel mentions in this comment: a7df443#commitcomment-12549220

cecilia-donnelly added a commit that referenced this issue Aug 5, 2015
Move the import button to the bottom of the page, with the other
buttons.  Set the "import" button to click the file input (the former
"Browse") and then submit the form.  Manage the duplicate warning by
placing it in the "results" div for now.  @nttaylor points out that we
could show users a "Possible duplicates detected" link and then, if they
clicked it, give them a popup with more information.
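
A sketch of that one-button behavior (element ids are illustrative):

```js
// The visible "import" button opens the hidden file chooser...
$('#import-button').on('click', function () {
  $('#file-input').click();
});
// ...and picking a file kicks off the import.
$('#file-input').on('change', function () {
  $('#import-form').submit();
});
```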
@cecilia-donnelly

So, this definitely works with the "Client.csv" file that is generated by our export. That file matches the HUD CSV specification. I'm not sure what else to test against. It will break if the columns are not in exactly this order, which is what I'm referring to by saying that I need to "more explicitly map the csv columns to their API counterparts."
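
One way to decouple the import from column order would be to parse with the header row enabled and look fields up by name instead of by position; this is only a sketch, and the column names below are assumed rather than taken from the HUD spec:

```js
// Assumes the file's header row uses these column names.
var results = Papa.parse(csvText, { header: true, skipEmptyLines: true });
var clients = results.data.map(function (row) {
  return {
    firstName: row['FirstName'],
    lastName:  row['LastName'],
    ssn:       row['SSN'],
    dob:       row['DOB']
  };
});
```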

@cecilia-donnelly

It does seem to fail to detect the first duplicate, but maybe I'll open that as a separate issue.

cecilia-donnelly added a commit that referenced this issue Aug 6, 2015
Read the file in the browser, using the FileReader API.  Check if dates
are in Excel format and convert to YYYY-MM-DD if so (better method
coming soon).  Show duplicate warning -- presently there is an error
where the duplicate warning header may appear even when there are no
duplicates.  Fixing that is the next commit.

More to come soon: removing the now-obsolete route and controllers, since
we are handling the file in the browser, and checking to make sure that
the chosen file is in fact a csv.
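
The FileReader part looks roughly like this, and Excel's serial dates (days since 1900; serial 25569 is 1970-01-01) can be converted as shown; handler and element names are illustrative:

```js
// Read the chosen file in the browser instead of uploading it.
var file = document.getElementById('file-input').files[0];   // hypothetical id
var reader = new FileReader();
reader.onload = function (event) {
  handleImportedCsv(event.target.result);   // hypothetical handler
};
reader.readAsText(file);

// Convert an Excel serial date (25569 == 1970-01-01) to YYYY-MM-DD.
function excelDateToIso(serial) {
  var ms = (serial - 25569) * 86400 * 1000;
  return new Date(ms).toISOString().slice(0, 10);
}
```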
cecilia-donnelly added a commit that referenced this issue Aug 6, 2015
Only show the "warning" header for duplicates if duplicates were
detected.
cecilia-donnelly added a commit that referenced this issue Aug 6, 2015
I added a new controller and route for uploads in commit b34aeba.
However, as of commit cc7fdb2, we're processing the file on the browser
side, so there's no longer any need for the "upload" route.  The file is
never uploaded.  This commit just removes that no-longer-needed code.
@cecilia-donnelly cecilia-donnelly self-assigned this Aug 24, 2015
cecilia-donnelly pushed a commit that referenced this issue Jul 26, 2016
Show any API errors returned by POST-ing records from the import.  Also
let the user know when items were imported correctly.

Unrelated, but also move the function that gets information about the
logged-in user so that it is only run on page load, not every time we
switch to the search page view.
@cecilia-donnelly

The import still fails silently if given something that isn't a CSV file, or is a CSV file in a different format from the one it expects. @kfogel, how high-priority is it to fix that?

@kfogel commented Jul 26, 2016

Well, how easy is it to fix? :-) Do we already get an internal error that we're just not yet propagating up to user-visible land, or is the situation more complicated than that?

@cecilia-donnelly

@kfogel we do have an internal error, but it's something of an artifact of the fragility of the import process. Currently, if the "date" column isn't where we expect it to be then one of our functions complains, and the import fails -- silently, from the user's perspective. I'm looking into how to make this more robust, and will commit/comment shortly.
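
One way to fail loudly instead of silently is to check for the expected columns up front before touching any dates; the column names and UI helper here are assumptions:

```js
// results comes from Papa.parse(csvText, { header: true, ... })
var requiredColumns = ['SSN', 'DOB', 'FirstName', 'LastName'];   // assumed names
var missing = requiredColumns.filter(function (name) {
  return results.meta.fields.indexOf(name) === -1;
});
if (missing.length > 0) {
  // Surface the problem to the user instead of throwing deeper down.
  showImportError('This does not look like a client CSV file; missing columns: '
                  + missing.join(', '));   // hypothetical UI helper
}
```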

cecilia-donnelly pushed a commit that referenced this issue Aug 1, 2016
If the imported file is not a CSV in the correct format, tell the user
that.  If it is, then show the user the outcome for each line of the
imported file: each line will either be POSTed successfully, be detected
as a duplicate, or fail -- display a message for each scenario.
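
The per-line reporting could be shaped roughly like this; the route, helpers, and messages are illustrative, not the commit's actual code:

```js
// Report one outcome per imported line: imported, duplicate, or failed.
function importRow(client, lineNumber) {
  if (isDuplicate(client)) {                     // hypothetical duplicate check
    reportLine(lineNumber, 'skipped: possible duplicate');
    return;
  }
  $.post('/api/clients', client)                 // hypothetical route
    .done(function () { reportLine(lineNumber, 'imported'); })
    .fail(function (xhr) { reportLine(lineNumber, 'failed: ' + xhr.statusText); });
}
```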