Releases: palewire/django-calaccess-raw-data

v1.4.7

24 Dec 03:43
  • Fixed search field on admins for models with ForeignKey fields (#1498).

v1.4.6

23 Nov 22:59
  • Upgraded to the latest version of django-postgres-copy
  • Small improvements to CAL-ACCESS field documentation
  • Small expansion of unittests
  • Cleanup of migrations

v1.4.5

12 Sep 01:07
  • Copyediting of CAL-ACCESS form documentation

v1.4.1

29 Aug 14:22
  • Increase max character length on ReceivedFilingsCd fields.
  • Prevent unnecessary download of zip when resuming updatecalaccessrawdata.
  • Include release datetimes in log when downloadcalaccessrawdata and updatecalaccessrawdata versions are incompatible.

v1.4.0

23 Aug 18:18
  • Added zipping up and archiving of cleaned CSVs and error logs.
    • Added RawDataVersion.clean_zip_archive FileField.
    • Renamed RawDataVersion.zip_file_archive to RawDataVersion.download_zip_archive.
  • Smaller clean data files (removed unnecessary quote characters).
  • Improvements to tracking models
    • Replaced RawDataCommand model with datetime fields and related properties
      • Added to RawDataVersion instances
        • .update_start_datetime and .update_finish_datetime to store version's most recent update start and finish datetimes.
        • .update_completed returns True if most recent update to version started and finished.
        • .update_stalled returns True if most recent update to version started but did not finish.
        • .download_start_datetime and .download_finish_datetime to store version's most recent download start and finish datetimes.
        • .download_completed returns True if most recent download of version started and finished.
        • .download_stalled returns True if most recent download of version started but did not finish.
        • .completed() QuerySet method to get all versions whose most recent update completed.
      • Added to RawDataFile instances
        • .clean_start_datetime and .clean_finish_datetime to store raw file's most recent clean start and finish datetimes.
        • .load_start_datetime and .load_finish_datetime to store raw file's most recent load start and finish datetimes.
    • Expanded file size tracking
      • Renamed .size to .expected_size on RawDataVersion instances.
      • Added .download_zip_size to RawDataVersion instances.
      • Added .clean_zip_size to RawDataVersion instances.
      • Added methods to get a pretty version (e.g., 723M) of each file size field
        • Added to RawDataVersion instances
          • .pretty_expected_size()
          • .pretty_download_size()
          • .pretty_clean_size()
        • Added to RawDataFile instances
          • .pretty_download_file_size()
          • .pretty_clean_file_size()
      • Raise CommandError if the completed download's file size does not match the expected size.
      • Added RawDataVersion properties to calculate file and record counts:
        • .download_file_count
        • .download_record_count
        • .clean_file_count
        • .clean_record_count
        • .error_file_count
        • .error_count
  • Added extractcalaccessrawfiles management command for unzipping and extracting raw data files from downloaded CAL-ACCESS database export.
    • Start and finish times are stored in .start_extract_datetime and .finish_extract_datetime on RawDataVersion instances.
  • Bug fixes.
    • In downloadcalaccessrawdata, skip the download if the size of the local zip file is equal to or bigger than the expected zip file size.
    • Because the server hosting the ZIP doesn't always provide the most up-to-date resource (as we have documented in https://github.com/california-civic-data-coalition/django-calaccess-raw-data/issues/1487), a CommandError will be raised under any of the following conditions:
      • If downloadcalaccessrawdata is not called from the command-line (presumably, then, it was called by updatecalaccessrawdata), and the RawDataVersion instance of the download command doesn't match the most recently started update.
      • If the ETag in the initial HEAD request made by downloadcalaccessrawdata does not match the ETag in the subsequent GET request.
      • If the actual size of the ZIP does not match the value of the Content-Length in the HEAD response.
    • If downloadcalaccessrawdata raises any of the above errors, updatecalaccessrawdata will wait five minutes and try again.
    • When archiving zips and files, open in binary ('rb') mode.
    • In cleancalaccessrawfile, fixed skipping of empty lines for Python 3.5.
  • Support for Django 1.10.
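
The start and finish datetime fields above imply a simple rule for the "completed" and "stalled" properties. The sketch below approximates that rule in plain Python rather than reproducing the app's model code: the class name VersionStatus and the function completed() are hypothetical stand-ins, and it assumes an update counts as finished only when its finish datetime is set and is not earlier than its most recent start.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class VersionStatus:
    """Hypothetical stand-in for the update-tracking fields on RawDataVersion."""

    update_start_datetime: Optional[datetime] = None
    update_finish_datetime: Optional[datetime] = None

    @property
    def update_completed(self) -> bool:
        # Assumption: "finished" means a finish time recorded at or after the latest start.
        return (
            self.update_start_datetime is not None
            and self.update_finish_datetime is not None
            and self.update_finish_datetime >= self.update_start_datetime
        )

    @property
    def update_stalled(self) -> bool:
        # Started, but no finish recorded after the most recent start.
        return self.update_start_datetime is not None and not self.update_completed


def completed(versions):
    """Rough list-based analogue of the .completed() QuerySet method."""
    return [v for v in versions if v.update_completed]
```

Under this reading, a version that was never started is neither completed nor stalled, and a re-run whose start is newer than the last recorded finish counts as stalled until it finishes.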

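The download safeguards above come down to a few comparisons between the HEAD response, the GET response and the file on disk, plus a wait-and-retry loop. The sketch below shows only that decision logic. It is not the command's actual code, which raises Django's CommandError and makes real HTTP requests. The names DownloadMismatch, should_skip_download, verify_download and download_with_retry are hypothetical. The headers are plain dicts, whereas real HTTP header mappings are usually case-insensitive. The max_attempts limit is an assumption, since the notes don't say how many times updatecalaccessrawdata retries.

```python
import time


class DownloadMismatch(Exception):
    """Hypothetical stand-in for the CommandError raised by downloadcalaccessrawdata."""


def should_skip_download(local_size, expected_size):
    # Skip when a local zip exists and is at least as large as expected.
    return local_size is not None and local_size >= expected_size


def verify_download(head_headers, get_headers, actual_size):
    """Raise DownloadMismatch if the server served an inconsistent resource."""
    if head_headers.get("ETag") != get_headers.get("ETag"):
        raise DownloadMismatch("ETag changed between the HEAD and GET requests")
    expected = int(head_headers["Content-Length"])
    if actual_size != expected:
        raise DownloadMismatch(
            "Downloaded %s bytes, but Content-Length was %s" % (actual_size, expected)
        )


def download_with_retry(attempt_download, wait_seconds=300, max_attempts=3, sleep=time.sleep):
    """Retry after a pause (five minutes by default); max_attempts is an assumption."""
    for attempt in range(1, max_attempts + 1):
        try:
            return attempt_download()
        except DownloadMismatch:
            if attempt == max_attempts:
                raise
            sleep(wait_seconds)
```

Passing sleep in as a parameter keeps the five-minute wait out of tests, which can substitute a function that merely records the requested delay.
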
v1.3.0

07 Jul 22:04
  • Added error_count to the output of reportcalaccessrawdata and excluded any unspecified fields.
  • Added model property to RawDataFile that returns the CAL-ACCESS model class.

v1.2.0

07 Jul 17:10
  • Enhancements to tracking models
    • Zero pad datetime parts of the archive directory for better sorting
    • Calculate and store load_columns_count and load_records_count in the RawDataFile model.
    • Added error_count and error_log_archive fields to RawDataFile in order to track bad line parses during the cleancalaccessrawfile command.
    • Added download_file_size and clean_file_size fields to the RawDataFile model.
  • Enhancements to CAL-ACCESS models
    • Added "inactive" models group for CAL-ACCESS tables that are empty or apparently no longer in use.
    • Added a CalAccessMetaClass to automatically configure meta attributes common to all models.
    • Added a custom admin for every model.
    • Model verbose names are prefixed with their model group.
    • Edits to model doc strings.
  • Enhancements to management commands
    • Added standard logging to the header, log and success methods.
    • Added a logger.info call at the end of the updatecalaccessrawdata command so that emails can be sent when it finishes.
    • Edits to command doc strings.
  • More tests
    • Test to confirm that any field included in a model's UNIQUE_KEY attribute actually exists on the model.
    • Test to confirm that every model has a custom admin.
    • Added flake8_docstrings plugin to the testing routine
    • New unittest modules providing 100% coverage to most of the app's components
  • Bug fixes
    • Fixed numbers in clean_records_count for the RawDataFile model.
    • Fixed line numbers logged in errors.csv files.
    • reportcalaccessrawdata now writes output to the data directory instead of REPO_DIR.
  • Distribution now packaged in wheel format
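
Zero padding matters because archive directories are listed and sorted as strings. The short sketch below assumes a hypothetical "YYYY-MM-DD_HH-MM-SS" naming pattern, and the app's real directory layout may differ. strftime's %m, %d, %H, %M and %S codes always emit two digits, so padded names sort chronologically, while unpadded ones can sort out of order.

```python
from datetime import datetime


def archive_dir_name(release_datetime: datetime) -> str:
    # Hypothetical pattern; every part after the year is zero-padded to two digits.
    return release_datetime.strftime("%Y-%m-%d_%H-%M-%S")


def unpadded_dir_name(release_datetime: datetime) -> str:
    # For contrast: without padding, "2016-7-..." sorts after "2016-10-..." as a string.
    d = release_datetime
    return "%s-%s-%s_%s-%s-%s" % (d.year, d.month, d.day, d.hour, d.minute, d.second)
```

For example, archive_dir_name(datetime(2016, 7, 7, 17, 10)) yields "2016-07-07_17-10-00", which sorts before the October directory.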

v1.1.0

28 Jun 18:23
  • When --noinput is invoked for updatecalaccessrawdata, exit if previously updated to the currently available version.
  • Enforce lowercase UNIQUE_KEY settings on models.
  • Removed unnecessary pretty_amount model methods as part of driving common.py models file test coverage up to 100%.
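
The UNIQUE_KEY rules (lowercase names here, and the v1.2.0 test that each key field exists on its model) can be sketched without Django. This assumes one plausible reading of "enforce lowercase": key names are lowercased before being compared with Django field names. The helper names normalize_unique_key and missing_key_fields are hypothetical. In a real test, field_names would come from something like {f.name for f in Model._meta.get_fields()}.

```python
def normalize_unique_key(unique_key):
    """Return UNIQUE_KEY as a tuple of lowercase field names (assumed convention)."""
    # Accept a single field name as well as a tuple/list of names.
    if isinstance(unique_key, str):
        unique_key = (unique_key,)
    return tuple(name.lower() for name in unique_key)


def missing_key_fields(unique_key, field_names):
    """List UNIQUE_KEY entries that do not correspond to a field on the model."""
    return [name for name in normalize_unique_key(unique_key) if name not in field_names]
```

A test in this style would then assert that missing_key_fields(model.UNIQUE_KEY, field_names) is an empty list for every model.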

v1.0.2

08 Jun 17:53
  • Include migrations in official package.
  • Fix verbose_name for RawDataFile.clean_file_archive.

v1.0.0

27 May 18:35
  • Enhanced resume behavior
    • Allow previously interrupted updates to resume at any stage of the process: downloading, cleaning or loading.
    • Users will be prompted to resume (if possible). User may decline and re-start the entire update.
    • Removed --resume-download option from updatecalaccessrawdata and downloadcalaccessrawdata in favor of prompting the user to resume.
    • Removed --database option from all commands. Multi-database users are encouraged to use Django's database routers.
  • Raw data file archiving
    • Added CALACCESS_STORE_ARCHIVE setting. When enabled, management commands will save each version of the downloaded .zip file, the extracted .tsv files and cleaned .csv files to the Django project's MEDIA_ROOT.
    • Added FileFields to RawDataVersion and RawDataFile in order to link the database records with the archived files they reference.
  • Completed documentation of all 80 raw data models and 1,467 fields
    • Defined hundreds of choices for 182 look-up fields.
    • Published expanded Django project documentation. Added re-directs from old app-specific documentation.
    • Integrated references to official documents and filing forms into data models. The PDFs are hosted on DocumentCloud.
  • Expanded unit testing of data model documentation
    • Wider scope of choice field testing.
    • Verify that each model has a UNIQUE_KEY attribute set.
    • Verify that each model has a document reference.
    • Verify that each choice field has a document reference.
    • Verify that each model with a form_type or form_id field (with a few exceptions) is linked to filing forms.
    • Introduced reportcalaccessrawdata command, which generates a report outlining the number / proportion of files / records cleaned and loaded.
  • Model Re-modeling:
    • Moved BallotMeasuresCd from other.py to campaign.py. Same with admin.
    • Moved remaining models in other.py to common.py. Removed other.py. Same with admins.
    • Re-ordered models into related groups.
  • Bug fixes
    • Truncate time portions of raw datetime values #1457.
    • Strip newlines when loading into MySQL.
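
Enabling the archive described above is a settings change. The excerpt below is a minimal sketch. It assumes the app is installed under its Django app label calaccess_raw, and the MEDIA_ROOT path is purely illustrative. Only CALACCESS_STORE_ARCHIVE and MEDIA_ROOT are named in these notes.

```python
# settings.py (excerpt)

INSTALLED_APPS = [
    # ... Django's default apps omitted for brevity ...
    "calaccess_raw",
]

# Archived .zip, .tsv and .csv files are saved beneath MEDIA_ROOT.
# Illustrative path; point this at storage with room for several versions.
MEDIA_ROOT = "/var/www/calaccess/media"

# Save each downloaded version and its extracted and cleaned files.
CALACCESS_STORE_ARCHIVE = True
```

Because every version's download and derived files are kept, the directory under MEDIA_ROOT grows with each update.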