Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify properties of processes #3335

Open
5 tasks
matthias-ronge opened this issue Mar 23, 2020 · 1 comment
Open
5 tasks

Simplify properties of processes #3335

matthias-ronge opened this issue Mar 23, 2020 · 1 comment

Comments

@matthias-ronge
Copy link
Collaborator

matthias-ronge commented Mar 23, 2020

Processes can have three types of properties: process properties, workpiece properties, and template properties. There are several things that are strangely implemented (as legacy issues from Production 2) and that require to do something about it:

  • The corresponding fields of the Process class that contain these properties are incorrectly referred to as properties, templates and workpieces (not processProperties, templateProperties and workpieceProperties as they should).
  • The same misnomer applies to getters and setters.
  • The fields are assigned to the database using three (also incorrectly labeled) crosstabs: process_x_property, template_x_property and workpiece_x_property. However, the relationship is 1:n, each property record always belongs to exactly one process.
  • We wanted to remove the properties completely, but at the moment properties are still used in one project, so we cannot do this. templates and workpieces can be removed.
  • workpieces properties are still used when generating the docket and the Tiff header.

This should be cleaned up. Necessary steps:

  1. Remove template properties.
  2. Change the generation of runnotes so that they are based on metadata (requires a different type of docket.xsl).
  3. Change generation of TIFF headers
  4. Remove properties of the type workpiece.
  5. Change the relationship of the properties to 1:n.
@matthias-ronge
Copy link
Collaborator Author

Simplify the tablellar properties

At the moment, there is a table of properties and three crosstabs. Example:

Table: property

id | title     | value            (obligatory | dataType | choice | creationDate)
---+-----------+-----------------
1  | NumImages | 99
2  | Cost      | 24
3  | NumImages | 799
4  | Title     | Moby Dick
5  | ISBN      | 123-45678-912X
6  | Title     | Fairy Tales
7  | Dummy     | nothing new here


Table: process_x_property

process_id | property_id
-----------+------------
1          | 1
1          | 2
2          | 3


Table: workpiece_x_property

process_id | property_id
-----------+------------
1          | 4
2          | 5
2          | 6

Crosstabs should only be used for n:n relationships, but here the relationship is 1:n. Also, you don't need three tables, but the fact of the properties to which domain they belong.

Here's a piece of SQL code to change that. It was thoughts. I'll leave it here in case it becomes useful:

-- * Add column for foreign key process_id. Temporarily setting this to DEFAULT
--   NULL; will be fixed, and constraint be added after data migration below.
-- * Rename column "title" to "key"
-- * Add column for domain. Temporarily DEFAULT NULL, until data was added.
-- * Remove other columns (what are they for?)
--
ALTER TABLE properties
    ADD COLUMN process_id int(11) DEFAULT NULL AFTER id,
    CHANGE title `key` varchar(255) NOT NULL,
    ADD COLUMN domain varchar(255) DEFAULT NULL AFTER `key`,
    DROP COLUMN obligatory,
    DROP COLUMN dataType,
    DROP COLUMN choice,
    DROP COLUMN creationDate;

-- Populate "process_id" for workpiece properties and set domain "description"
--
UPDATE property
    INNER JOIN workpiece_x_property ON property.id = workpiece_x_property.property_id
    SET property.process_id = workpiece_x_property.process_id,
        property.domain = "description";

-- Populate "process_id" for template properties and set domain "source"
--
UPDATE property
    INNER JOIN template_x_property ON property.id = template_x_property.property_id
    SET property.process_id = template_x_property.process_id,
        property.domain = "source";

-- Populate "process_id" for process properties and set domain "technical"
--
UPDATE property
    INNER JOIN process_x_property ON property.id = process_x_property.property_id
    SET property.process_id = process_x_property.process_id,
        property.domain = "technical";

-- Delete crosstabs
--
DROP TABLE workpiece_x_property,
           template_x_property,
           process_x_property;

-- delete orpahns (if any)
--
DELETE FROM property WHERE (process_id IS NULL);

-- * Add foreign key constraint for "process_id"
-- * Add possible enum values check for "domain" (as per
--   org.kitodo.api.dataeditor.rulesetmanagement.Domain)
--
ALTER TABLE properties
    MODIFY process_id int(11) NOT NULL,
    ADD CONSTRAINT FK_property_x_process_id FOREIGN KEY (process_id) REFERENCES process(id),
    MODIFY domain varchar(255) NOT NULL DEFAULT "description",
    ADD CONSTRAINT CHK_domain CHECK (domain IN
        ("description", "digitalProvenance", "rights", "source", "technical", "mets:div"));

This has not yet been tested as I don't have any test data at the moment. It's where can the journey go, maybe. The result should then be:

id | process_id | key       | domain      | value
---+------------+-------------+-----------+---------------
1  | 1          | NumImages | technical   | 99
2  | 1          | Cost      | technical   | 24
3  | 2          | NumImages | technical   | 799
4  | 1          | Title     | description | Moby Dick
5  | 2          | ISBN      | description | 123-45678-912X
6  | 2          | Title     | description | Fairy Tales

From here, a migration of the data into the METS files would be conceivable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants