forked from dataquest-dev/dspace-import
Home
Ondřej Košarko edited this page Jun 17, 2024 · 5 revisions
Note: check README.md for up-to-date instructions
- Install CLARIN-DSpace7.* with a running database, Solr, and Tomcat
- Clone python-api: https://github.com/ufal/dspace-python-api (branch internal/data-migration-items; it's still in progress) and DSpace: https://github.com/ufal/DSpace (branch internal/migrate-clarin-dspace5-to-clarin-dspace7)
- Get the database dump (old CLARIN-DSpace) and unzip it into <PSQL_PATH>/bin (or wherever you want)
- Create CLARIN-DSpace5.* databases (dspace, utilities) from dump.
// clarin-dspace database: create it with the dspace owner
createdb --username=postgres --owner=dspace --encoding=UNICODE clarin-dspace
// load the dump (it ran only on the second try for us)
psql -U postgres clarin-dspace < <CLARIN_DUMP_FILE_PATH>
// clarin-utilities database: create it with the dspace owner
createdb --username=postgres --owner=dspace --encoding=UNICODE clarin-utilities
// load the dump (it ran only on the second try for us)
psql -U postgres clarin-utilities < <UTILITIES_DUMP_FILE_PATH>
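The createdb and psql pairs above can also be scripted when both databases have to be restored repeatedly. The following is a minimal sketch, assuming createdb and psql are on the PATH and that the postgres user can authenticate without a prompt. The function names restore_commands and restore are hypothetical and are not part of dspace-python-api.

```python
import subprocess


def restore_commands(db_name, dump_path, owner="dspace", superuser="postgres"):
    """Return the createdb and psql argument lists for one database."""
    createdb = [
        "createdb",
        f"--username={superuser}",
        f"--owner={owner}",
        "--encoding=UNICODE",
        db_name,
    ]
    # psql -f reads the dump from a file, similar to `psql ... < dump`
    psql = ["psql", "-U", superuser, "-d", db_name, "-f", dump_path]
    return [createdb, psql]


def restore(db_name, dump_path, run=subprocess.run):
    """Create the database and load the dump.

    Raises CalledProcessError if a command exits with a non-zero status.
    """
    for cmd in restore_commands(db_name, dump_path):
        run(cmd, check=True)
```

For example, restore("clarin-dspace", "<CLARIN_DUMP_FILE_PATH>") followed by restore("clarin-utilities", "<UTILITIES_DUMP_FILE_PATH>") would mirror the two command pairs above.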
- Recreate your local CLARIN-DSpace7.* database. NOTE: all data will be deleted
- Install the database again following the official tutorial steps: https://wiki.lyrasis.org/display/DSDOC7x/Installing+DSpace#InstallingDSpace-PostgreSQL11.x,12.xor13.x(withpgcryptoinstalled)
- Or try to run these commands in the <PSQL_PATH>/bin:
createdb --username=postgres --owner=dspace --encoding=UNICODE dspace
// create database
psql --username=postgres dspace -c "CREATE EXTENSION pgcrypto;"
// Add pgcrypto extension
If it throws a warning that the -c parameter was ignored, just run the CREATE EXTENSION pgcrypto; command in the database cmd:
CREATE EXTENSION pgcrypto;
// Now the clarin database for DSpace7 should be created
- Run the database by the command:
pg_ctl start -D "<PSQL_PATH>\data\"
- (Your DSpace project must be installed) Go to dspace/bin and run the command
dspace database migrate force
// force because of local types
NOTE: dspace database migrate force creates default database data that may not be in the database dump, so after migration some tables may have more data than the database dump. Data from the database dump that already exists in the database is not migrated.
- Create an admin by running the command dspace create-administrator in dspace/bin
- Prepare the dspace-python-api project for migration. IMPORTANT: If the data folder doesn't exist in the project, create it
Update const.py
user = "<ADMIN_NAME>"
password = "<ADMIN_PASSWORD>"
# http or https
use_ssl = False
host = "<YOUR_SERVER>"  # e.g., localhost
# host = "dev-5.pc"
fe_port = "<YOUR_FE_PORT>"
# fe_port = ":4000"
be_port = "<YOUR_BE_PORT>"
# be_port = ":8080"
be_location = "/server/"
Update migration_const.py
REPOSITORY_PATH = "<PROJECT_PATH>"
DATA_PATH = REPOSITORY_PATH + "data/"
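To make the format of these settings concrete, here is a minimal sketch of how such values could be joined into base URLs. How dspace-python-api actually combines them is not shown on this page, so the helper build_url is a hypothetical illustration; the example values come from the commented-out lines in const.py.

```python
def build_url(use_ssl, host, port, location=""):
    """Join const.py-style settings into a base URL."""
    scheme = "https" if use_ssl else "http"
    # port is expected to carry its leading colon, e.g. ":8080"
    return f"{scheme}://{host}{port}{location}"


# Example values taken from the commented-out lines in const.py
use_ssl = False
host = "localhost"
fe_port = ":4000"
be_port = ":8080"
be_location = "/server/"

backend_url = build_url(use_ssl, host, be_port, be_location)
frontend_url = build_url(use_ssl, host, fe_port)
print(backend_url)   # http://localhost:8080/server/
print(frontend_url)  # http://localhost:4000
```

Note that the port values include the leading colon, and that DATA_PATH = REPOSITORY_PATH + "data/" only yields a valid path if REPOSITORY_PATH ends with a path separator.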
- Create JSON files from the database tables. NOTE: You must do it for both databases, clarin-dspace and clarin-utilities (JSON files are stored in the data folder)
- Go to dspace-python-api in the cmd
- Run pip install -r requirements.txt
- Run python data_migration.py <DATABASE NAME> <HOST> postgres <PASSWORD FOR POSTGRES>
e.g., python data_migration.py clarin-dspace localhost postgres pass
(arguments for the database connection: database, host, user, password) for BOTH databases
// NOTE: the data folder must exist in the project structure
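Because the export must be run once per database and fails without the data folder, both steps can be prepared in one place. The sketch below only builds the command lines and creates the folder. The helper names ensure_data_dir and export_commands are hypothetical, and the argument order follows the example invocation above.

```python
import sys
from pathlib import Path

# Both databases named on this page must be exported
DATABASES = ["clarin-dspace", "clarin-utilities"]


def ensure_data_dir(project_path):
    """Create <project>/data if it is missing and return its path."""
    data_dir = Path(project_path) / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    return data_dir


def export_commands(host, user, password, python=sys.executable):
    """Build one data_migration.py invocation per database."""
    return [
        [python, "data_migration.py", db, host, user, password]
        for db in DATABASES
    ]
```

Each returned command could then be passed to subprocess.run(cmd, cwd=project_path, check=True) from the dspace-python-api directory.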
- Copy assetstore from dspace5 to dspace7 (for bitstream import)
- Import data from the JSON files (python-api/data/) into the dspace database (CLARIN-DSpace7.*)
- NOTE: the database must be up to date (dspace database migrate force must be called in dspace/bin)
- NOTE: the dspace server must be running
- From dspace-python-api run the command python dspace_import.py
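Before starting the import, it can help to confirm that the exported files in the data folder are readable. The following is a minimal sketch, assuming the exports use a .json extension (this page does not state the file naming). The function check_json_exports is hypothetical and is not part of dspace-python-api.

```python
import json
from pathlib import Path


def check_json_exports(data_dir):
    """Return (valid, broken) lists of JSON file names found in data_dir."""
    valid, broken = [], []
    for path in sorted(Path(data_dir).glob("*.json")):
        try:
            with path.open(encoding="utf-8") as handle:
                json.load(handle)
            valid.append(path.name)
        except (json.JSONDecodeError, UnicodeDecodeError):
            # The file exists but cannot be parsed as UTF-8 JSON
            broken.append(path.name)
    return valid, broken
```

An empty valid list or a non-empty broken list suggests rerunning data_migration.py before calling dspace_import.py.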
Migration notes:
- Table attributes that record the last modification time of a DSpace object (for example, the attribute last_modified in the table Item) hold the time when the object was migrated, not the value from the migrated database dump.