The Django Import package provides a rich Django models import feature for the admin interface as well as using any API.
The following options are present:
- synchronous as well as asynchronous (using Celery) import procedure depending on the project-wide settings
- customizing project-wide storage settings to store imported files
- import any model present in the project
- restricting project-wide list of models available to import
- wide range of input file formats (with help of pandas package)
- online customizing column-to-field reflection
- wide and extendable set of conversion and checking functions for imported values
The idea of the package has been inspired by the django-csvimport package which unfortunately was too hard to modify because of very long history.
For the stable version:
Stable version from the PyPi package repository
pip install django-import
Last development version from the GitHub source version control system
pip install git+git://github.com/nnseva/django-import.git
Note that the Django Import package itself doesn't depend on the Celery project.
if you are planning to start import procedures asynchronously, install Celery accordingly installation instructions in the documentation.
The user should create or save an instance of the ImportJob
model to start the import. It may be done using any available way - from the code,
from the admin page, etc.
The ImportJob
instance has an upload_file
attribute which contains an imported file, options
attribute to provide import customization options,
and model
atrribute which references to the ContentType
instance determining a model whose instances will be created by the import procedure.
Just add a new Import Job
, fill the record appropriately, and save the instance. You will see an import log entry below.
You can repeat the import procedure. Every time when the Import Job
is saved, the import procedure is repeated. You can change any attribute
of the Import Job
between calls to save.
Delete unnecessary import log entries between Import Job
saves just switching the delete flag on
to the right of the import log entry.
Every start of the import procedure generates a new instance of the ImportLog
model where you can find timestamp and detailed log of the import.
Every time when you create or update the ImportJob
, the import procedure is started, and an instance of the ImportLog
is created.
If you don't provide any options for the ImportJob
, the import procedure will:
- open and read file using passed
mode
, orrb
if not set - import all data and headers from the file using the passed
format
, orcsv
if not set - search columns in the input file corresponding to the field names
- create model instances filling the instance field values by the data found in the correspondent input data columns
Every of these steps is customizable.
options
attribute value:
{
...
"mode": "rt",
...
}
The mode
option allows to set the file opening mode explicitly. It can receive either rb
(read binary), or rt
(read text) value.
The default value is rb
.
In case of read text mode, the utf-8
encoding is assumed. You are free to detect and convert the text input file encoding
yourself using some external tools like iconv, enconv, etc. Some of formats, as modern XML, are self-descriptive, so you can try
to import the file unchanged without explicit mode parameter.
options
attribute value:
{
...
"format": "csv"
...
}
You can set the format
value to any appropriate suffix for the pandas.read_*
function. Tested examples are csv
with different formatting (using additional parameters), and excel
- for xls
and xlsx
files. Note that you can use additional parameters to select a sheet from the excel file to import.
options
attribute value:
{
...
"headers": [
"id",
"name",
"code",
...
]
...
}
This headers
option overrides header names. Note that number of headers in the option should correspond to the number of columns in the input file.
When the headers
option is not set, and no any header names are recognized for some reason, headers like sequential numbers with leading zeros are used
in the following importing steps: "0001", "0002", "0003", ...
model
The most poweful option of the import procedure is customizing where and how the input values should appear in the output instances.
This customization is called reflection and appears in the reflections
section of the options
job attribute:
{
...
"reflections": {
"name": "direct",
"surname": {
"function": "direct",
"parameters": {
"column": "second name"
}
},
"code": {
"function": "clean",
"parameters": {
"column": "passcode"
}
},
"access_level": {
"function": "constant",
"parameters": {
"value": 1
}
},
"nationality": "avoid",
"address": {
"function": "format",
"parameters": {
"format": "%(post_code)s, %(building)s-%(apartment)s, %(street), %(city)s, %(country)s"
}
},
"has_car": {
"function": "enum",
"parameters": {
"column": "car",
"mapping": {
"yes": true,
"no": false
}
}
}
...
}
...
}
Every reflection in the reflections
option is a key-value pair, where the key is a field name,
while the value is a reflection option which determines a way how to reflect data row
values to this field value.
The reflection option determines a reflection function, and optional parameters section for this function.
The easiest reflection option is a string. If the reflection option is a string, it corresponds
to the reflection function name, and assumes that the data contains a column having the name
equal to the field name (see the name
reflection in the example above).
Alternatively, reflection option may have a form of the subsection, having the following structure:
function
key determines a reflection function nameparameters
key determines a subsection of additional reflection function parameters
A default reflection function is direct
, it is used if the field is not listed in the
reflections
section (the same reflection function is assigned for the name
reflection
in the example above explicitly).
When the data row is got, all reflection functions are called, and returned values are collected.
The returned values may be used in two stages:
- create - an instance is created or updated from these values using
update_or_create()
function call - update - every value collected for this stage is assigned to the correspondent instance attribute, and after the all, the instance is saved again
Every reflection function returns data for create and update stages separately.
The most reflection functions return data for the create stage. The update stage may be used when the existence of the instance is important, for some custom field types for example.
List of core reflection functions:
direct
- sends data column value directly to the field on the instance create stage, parameters:column
- name of the data column to get a value, field name by defalut
update
- sends data column value to the field on the instance update stage. This reflection, as a rule, is used to update instance attributes which are not fields really, f.e. properties, or some custom fields. Parameters:column
- name of the data column to get a value, field name by defalut
constant
- sends a constant value to the field on the instance create stage, parameters:value
- value to be sent, None by default
avoid
- forces ignore of the field by it's name, avoiding a default direct reflectionclean
- sends data column value to the field on the instance create stage, cleaning it by the field clean method, parameters:column
- name of the data column to get a value, field name by defalut
substr
- sends substring of the data column value to the field on the instance create stage, parameters:column
- name of the data column to get a value, field name by defalutstart
- start index in the string, 0 by defaultlength
- maximal number of characters to get from the value, max_length attribute of the field by default
replace
- sends a value from the data column withsearch
value replaced by thereplace
value, parameters:column
- column name where to find a value, field name by defaultsearch
- substring to be searchedreplace
- value to replacecount
- number of occurences to replace, all by default
format
- sends a formatted value collected from the data row to the field on the instance create stage, parameters:format
- %-style format to create a value from the dictionary of data with column names as keys
xformat
- sends a formatted value collected from the data row to the field on the instance create stage, parameters:format
- {}-style format to create a value from the dictionary of data with column names as keys
enum
- sends a enum value accordingly to the mapping from data value to the field value, parameters:column
- column name where to find a value, field name by defaultmapping
- column-to-field value mapping (column value as a key)
combine
- applies several reflections sequentially, parameters:reflections
- a list of reflections to be applied
lookup
- lookups the column value in thelookup_field
on the opposite side of the reference, parameters:column
- column name where to find a value, field name by defaultlookup_field
- field name of the opposide side model with the optional lookup suffix, 'pk' by default
You can see the detailed help with the actual list of all registered reflections at the change page of the ImportJob
instance.
The field list to be imported is determined as a union of two name sets:
- a model field names set
- a reflection names set
Every Django model determines a list of fields. Every such field has a name. The package tries to import all fields found in the model.
The import options contain a set of reflections see before, with unique names. Every such a name also determines a model field or property which should be imported.
The reflection whose name is equal to the model field name, determines a reflection for this field.
The default import column name is equal to the field name.
When the import file contains a column whose name is equal to the field name, it is imported even the reflection is not declared. To avoid such an import, use the special avoid
reflection with the name equal to the column name.
options
attribute value:
{
...
"identity": [
"name",
"surname",
"birthday"
]
...
}
When the import is evaluated several times, it is important to find and update the existent instance imported before
instead of creating a new one. The identity
option allows to select list of fields which will be used to find an existent
instance.
Any existent model fields may be included into this identity list. But note that only those of them which are really imported will be used for the instance identification in the particular import procedure.
The import procedure is started synchronously by default. It also can be started asynchronously
in the separate Celery worker,
depending on the async
value in the DJANGO_IMPORT
attribute of the settings
file:
settings.py
DJANGO_IMPORT = {
...
"async": True
...
}
The django_import.tasks
module contains a definition of the run_import
shared task, if you need to register it yourself instead
of the automatic procedure.
You can check whether the procedure is finished using a special is_finished
flag of the ImportLog
instance.
Note that as minimum one Celery Worker process should be started in order to start asynchronous import procedure. If no Workers are started, the import procedure will never finished.
Two keys containing lists in settings, models
and except
mean, what models are allowed to import.
If models
key is not empty, only those models, which are included into this list, are allowed.
settings.py
DJANGO_IMPORT = {
...
"models": [
'accounting.Debit',
'accounting.Credit',
'accounting.Orders',
'accounting.Accounts',
...
],
"except": []
...
}
Else, the except
key enlists models which are forbidden to import using this packet. All models except enlisted are allowed to import in this case.
settings.py
DJANGO_IMPORT = {
...
"models": [],
"except": [
'contenttypes.ContentType',
'admin.LogEntry',
'auth.Permission',
'sessions.Session',
'django_import.ImportJob',
'django_import.ImportLog',
...
]
...
}
Storage configuration is configured using the storage
key.
The class
key is used to setup a storage class.
The settings
key is used to setup keyword parameters to call the storage instance constructor.
The upload_to
key is used to setup upload_to
keyword parameter of the ImportJob.upload_file
field.
There is a sample of the custom storage settings for the package below:
settings.py
DJANGO_IMPORT = {
...
'storage': {
'class': 'storages.backends.s3boto3.S3Boto3Storage',
'settings': {
'access_key': PROJECT_AWS_ACCESS_KEY_ID,
'secret_key': PROJECT_AWS_SECRET_ACCESS_KEY,
'bucket_name': PROJECT_AWS_STORAGE_BUCKET_NAME,
'querystring_auth': PROJECT_AWS_QUERYSTRING_AUTH,
'use_ssl': PROJECT_AWS_S3_USE_SSL,
'default_acl': PROJECT_AWS_DEFAULT_ACL,
'location': PROJECT_AWS_LOCATION,
'custom_domain': PROJECT_AWS_S3_CUSTOM_DOMAIN,
},
'upload_to': 'imports',
...
}
The default settings are the folowing:
DJANGO_IMPORT = {
'storage': {
'class': 'django.core.files.storage.FileSystemStorage',
'settings': {
'location': '',
},
'upload_to': 'uploads',
},
'models': [],
'except': [
'contenttypes.ContentType',
'admin.LogEntry',
'auth.Permission',
'sessions.Session',
'django_import.ImportJob',
'django_import.ImportLog',
],
'sync': True,
}
Note that the default options are united with the custom options from the settings.py
file to get the actual value, like:
options = {
**default_options,
**custom_options
}
We would like to import files into the model ImportModel which has the following structure:
class ImportExample(models.Model):
""" Import Example """
name = models.CharField(
max_length=128,
verbose_name=_('Name'),
)
quantity = models.IntegerField(
null=True, blank=True,
verbose_name=_('Quantity'),
)
weight = models.FloatField(
null=True, blank=True,
verbose_name=_('Weight'),
)
price = models.DecimalField(
max_digits=15,
decimal_places=2,
null=True, blank=True,
verbose_name=_('Price'),
)
kind = models.CharField(
max_length=32,
choices=[
('wood', _('Wood')),
('steel', _('Steel')),
('oil', _('Oil')),
],
verbose_name=_('Kind'),
)
user = models.ForeignKey(
settings.AUTH_USER_MODEL,
on_delete=models.CASCADE,
related_name='examples',
null=True, blank=True,
verbose_name=_('User'),
help_text=_('User who regards to this example'),
)
def __str__(self):
return self.name
class Meta:
verbose_name = _('Import Example')
verbose_name_plural = _('Import Examples')
We also will assume that we have (as minimum) two users in the User model with usernames u1
and u2
.
The Import Example
model exists in the tests
application of the package if you have cloned it locally using:
git clone git@github.com:nnseva/django-import.git
or download from the repository https://github.com/nnseva/django-import/archive/master.zip and unpack it.
The package itself should be installed as described at the top here.
Just go to the dev
subdirectory and start the following sequentially:
python manage.py migrate
python manage.py createsuperuser
python manage.py runserver
Input superuser name and password, and see the WEB admin interface on the http://127.0.0.1:8000/admin/
URL.
Let we have a CSV file like this (you will find it in the dev/tests/data
directory).
"name","quantity","weight","price","type","user"
"etewrt",123,10.3,11.11,S,"u1"
"cvbncv",112,54.333,34.12,W,"u2"
Create an ImportJob
instance without options, select ImportExample
model, upload the file and try to save it.
You will see the following log in the import log:
2020-05-12 08:29:49.712041+00:00: [INFO(4)] Starting import for: uploads/test_WyIyugj.csv
2020-05-12 08:29:50.051297+00:00: [INFO(4)] Trying to import uploads/test_WyIyugj.csv
2020-05-12 08:29:50.068780+00:00: [INFO(4)] Import file has been recognized, 6 columns, 2 rows: uploads/test_WyIyugj.csv
2020-05-12 08:29:50.071130+00:00: [ERROR(2)] Unexpected error: Cannot assign "'u1'": "ImportExample.user" must be a "User" instance.
2020-05-12 08:29:50.071589+00:00: [INFO(4)] Finished
The ERROR
string means that we can not import for some reason, which is explained in this line: the user
column is a user reference, and import package doesn't know, how to import it.
Let we exclude a user
column a while to make import possible. You can find an avoid
reflection in the help page below the record which does exactly what we need now.
Change view of the options
field from the Tree
to the Code
and input:
{
"reflections": {
"user": "avoid"
}
}
then press "Save and continue editing" button at the bottom.
You will find a new import log record at the bottom with the following log text:
2020-05-12 09:02:23.176985+00:00: [INFO(4)] Starting import for: uploads/test_WyIyugj.csv
2020-05-12 09:02:23.178720+00:00: [INFO(4)] Trying to import uploads/test_WyIyugj.csv
2020-05-12 09:02:23.182253+00:00: [INFO(4)] Import file has been recognized, 6 columns, 2 rows: uploads/test_WyIyugj.csv
2020-05-12 09:02:23.184719+00:00: [INFO(4)] Import has been processed, 2 rows successfully imported
2020-05-12 09:02:23.185278+00:00: [INFO(4)] Finished
The text Import has been processed, 2 rows successfully imported
means that all 2 records are successfully imported.
You can ensure it looking to the Import Example
instance list of the admin.
We can see that neither user
(which has been avoided), nor kind
column have been imported.
Looking into the model, we can see that the kind
column has a choice selector. Let we convert the type
CSV column to the kind
model field. Searching in the help on the admin page, we can see a enum
reflection which does what we would like to have.
{
"reflections": {
"user": "avoid",
"kind": {
"function": "enum",
"parameters": {
"column": "type",
"mapping": {
"S": "steel",
"W": "wood",
"O": "oil"
}
}
}
}
}
After saving the job, we can see success log. Check the imported instances. We can see, that instances have been imported successfully, but duplicated.
Remove all imported instances of the Import Example
model. We should avoid such a problem in a future, the identity
option will help us to do it. Change options appropriately:
{
"reflections": {
"user": "avoid",
"kind": {
"function": "enum",
"parameters": {
"column": "type",
"mapping": {
"S": "steel",
"W": "wood",
"O": "oil"
}
}
}
},
"identity": [ "name" ]
}
You can also notice that there are too many log records in the interface. Just select every of them using the flag Delete
on the right side of every such a record.
Then we will save the Import Job
and see what happens.
The import happens successfully, all import log entries have been deleted, and a new one created with the log of the last import.
Let we save the job again and check the Import Example
admin page. We will see that no any additional records are added, so the identity
option helps us as we expected.
Now it's a time to make a reference to the user in the imported records. We will replace the avoid
reflection by the special lookup
reflection which does what we would like to have.
{
"reflections": {
"user": {
"function": "lookup",
"parameters": {
"lookup_field": "username"
}
},
"kind": {
"function": "enum",
"parameters": {
"column": "type",
"mapping": {
"S": "steel",
"W": "wood",
"O": "oil"
}
}
}
},
"identity": [ "name" ]
}
Let we have a different format of the CSV, like the Excel
application produces if the Russian localization is a default in the system (you will find an example in the dev/tests/data/test-ru.csv
file).
"name";"quantity";"weight";"price";"type";"user"
"etewrt";123;10,3;11,11;S;"u1"
"cvbncv";112;54,333;34,12;W;"u2"
We can pass explicit CSV convertor parameters to the conversion function (all available parameters for the CSV conversion are on the pandas page:
{
"reflections": {
"user": {
"function": "lookup",
"parameters": {
"lookup_field": "username"
}
},
"kind": {
"function": "enum",
"parameters": {
"column": "type",
"mapping": {
"S": "steel",
"W": "wood",
"O": "oil"
}
}
}
},
"identity": [ "name" ],
"format": "csv",
"parameters": {
"sep": ";",
"decimal": ","
}
}
Delete all previously imported records and save the job again. You will see that all records are imported successfully.
You can import the Excel
file directly (the example is present in the /dev/tests/data/test.xls
file). Upload a file and use the appropriate format in the options:
{
"reflections": {
"user": {
"function": "lookup",
"parameters": {
"lookup_field": "username"
}
},
"kind": {
"function": "enum",
"parameters": {
"column": "type",
"mapping": {
"S": "steel",
"W": "wood",
"O": "oil"
}
}
}
},
"identity": [ "name" ],
"format": "excel"
}