AmCAT - Amsterdam Content Analysis Toolkit

Installation and Configuration

Prerequisites

Most of the Python prerequisites for AmCAT are installed automatically by pip (see below). To install the non-Python requirements on Ubuntu, run:

$ sudo apt-get install antiword unrtf rabbitmq-server python-pip python-dev libxml2-dev libxslt-dev lib32z1-dev postgresql postgresql-server-dev-9.4 postgresql-contrib-9.4

It is probably best to install AmCAT in a virtual environment. On Ubuntu, run the following commands to set up and activate a virtual environment for AmCAT:

$ sudo apt-get install python-virtualenv
$ virtualenv amcat-env
$ source amcat-env/bin/activate

If you use a virtual environment, you need to repeat the source line every time you start working with AmCAT to load the environment. If you don't use a virtual environment, you will need to run most of the pip commands below using sudo.
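
If you are unsure whether the environment is currently loaded, a small check like the one below can help. This is only a sketch and not part of AmCAT; the function name `in_virtualenv` is made up for illustration. It relies on the fact that older virtualenv releases set `sys.real_prefix`, while newer ones (and the built-in venv module) make `sys.base_prefix` differ from `sys.prefix`.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix on the interpreter.
    if hasattr(sys, "real_prefix"):
        return True
    # Newer virtualenv and the venv module keep the system prefix in
    # sys.base_prefix; outside an environment it equals sys.prefix.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix
```

If `in_virtualenv()` returns False, run the source line above again before using pip.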

Database

AmCAT requires a database to store its documents in. The default settings look for a PostgreSQL database named 'amcat' on localhost. To set up the current user as a superuser in PostgreSQL and create the database, use:

$ sudo -u postgres createuser -s $USER
$ createdb amcat
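
To confirm that the database was created, you can list the databases with `psql -lqt` and look for 'amcat'. The sketch below does the same from Python; the helper names are hypothetical and not part of AmCAT, and it assumes that `psql` is on your PATH and that the current user can connect without a password.

```python
import subprocess


def parse_database_names(psql_output):
    """Return the set of database names found in the output of `psql -lqt`."""
    names = set()
    for line in psql_output.splitlines():
        # The first pipe-separated column holds the database name;
        # continuation lines have an empty first column and are skipped.
        name = line.split("|")[0].strip()
        if name:
            names.add(name)
    return names


def database_exists(name="amcat"):
    """Return True if `name` is among the databases listed by psql."""
    output = subprocess.check_output(["psql", "-lqt"])
    return name in parse_database_names(output.decode("utf-8"))
```

Calling `database_exists()` should return True once `createdb amcat` has succeeded.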

Elastic

AmCAT uses Elasticsearch for searching articles. Since we use a custom similarity to provide hit counts instead of relevance scores, Elasticsearch needs to be installed 'by hand'. You can probably skip this and rely on a pre-packaged Elasticsearch if you don't care about hit counts, although you still need to install the Elasticsearch plugins.

First, install Oracle Java. For Java 7, see http://www.webupd8.org/2012/01/install-oracle-java-jdk-7-in-ubuntu-via.html; for Java 8, see http://www.webupd8.org/2012/09/install-oracle-java-8-in-ubuntu-via-ppa.html. Run only one of the two installer commands below, depending on the Java version you want:

$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java7-installer #for java 7
$ sudo apt-get install oracle-java8-installer #for java 8

Next, download and extract elasticsearch and our custom hitcount jar, and install the required plugins:

cd /tmp

# Download and install elasticsearch
wget "https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.4.4.deb"
sudo dpkg -i elasticsearch-1.4.4.deb

# Install plugins
cd /usr/share/elasticsearch
# sudo bin/plugin -install elasticsearch/elasticsearch-lang-python/2.4.1 (no longer needed for master)
sudo bin/plugin -install elasticsearch/elasticsearch-analysis-icu/2.4.2
sudo bin/plugin -install mobz/elasticsearch-head
sudo wget http://hmbastiaan.nl/martijn/amcat/hitcount.jar

# Allow dynamic scripting (no longer needed for master)
# cd /etc/elasticsearch
# echo -e "\nscript.disable_dynamic: false" | sudo tee -a elasticsearch.yml

# Make sure elasticsearch detects hitcount.jar by editing the init script
sudo editor /etc/init.d/elasticsearch

# In the editor, add these two lines after ES_HOME:
#   ES_CLASSPATH=$ES_HOME/hitcount.jar
#   export ES_CLASSPATH

# And add this option to DAEMON_OPTS:
#   -Des.index.similarity.default.type=nl.vu.amcat.HitCountSimilarityProvider

# Save file and close editor
# Restart elasticsearch
sudo service elasticsearch restart
cd
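
After the restart, you may want to check that Elasticsearch is actually answering. The following is a minimal sketch, not part of AmCAT: it assumes Elasticsearch listens on its default address http://localhost:9200, and the function names are made up for illustration. It only reads the version from the root endpoint; it does not verify that the hit count similarity is active.

```python
import json

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2


def parse_es_version(body):
    """Extract the version string from the JSON served at the root endpoint."""
    info = json.loads(body)
    return info["version"]["number"]


def elasticsearch_version(url="http://localhost:9200", timeout=5):
    """Query a running Elasticsearch node and return its version string."""
    response = urlopen(url, timeout=timeout)
    try:
        return parse_es_version(response.read().decode("utf-8"))
    finally:
        response.close()
```

With the package installed above, `elasticsearch_version()` would be expected to report 1.4.4; a connection error means the service is not running yet.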

Installing AmCAT (pip install from git)

Now you are ready to install AmCAT. The easiest way to do this is to pip install it directly from GitHub. This is not advised unless you use a virtual environment.

pip install git+https://github.com/amcat/amcat.git

Installing AmCAT (clone)

Alternatively, clone the project from GitHub and pip install the requirements. If you plan to make changes to AmCAT, this is probably the best option.

git clone https://github.com/amcat/amcat.git
pip install -r amcat/requirements.txt

If you install AmCAT by cloning, be sure to add the new directory to the PYTHONPATH. Also, set the AMCAT_ES_LEGACY_HASH environment variable. If you add the lines below to amcat-env/bin/activate, they will be set automatically when you activate the environment. The example assumes that you cloned AmCAT into your home directory.

export PYTHONPATH=$PYTHONPATH:$HOME/amcat
export AMCAT_ES_LEGACY_HASH=N
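
To double-check that both variables are in place, something like the following sketch can be used. It is not part of AmCAT, and the function name `check_clone_environment` is hypothetical. It only checks that the clone directory is on PYTHONPATH and that AMCAT_ES_LEGACY_HASH is set to some value.

```python
import os


def check_clone_environment(environ, amcat_dir):
    """Return a list of problems with the environment for a cloned AmCAT."""
    problems = []
    # Split PYTHONPATH into normalized entries, ignoring empty ones
    # (an unset PYTHONPATH leaves a leading separator after the export).
    entries = [
        os.path.normpath(entry)
        for entry in environ.get("PYTHONPATH", "").split(os.pathsep)
        if entry
    ]
    if os.path.normpath(amcat_dir) not in entries:
        problems.append("PYTHONPATH does not contain " + amcat_dir)
    if "AMCAT_ES_LEGACY_HASH" not in environ:
        problems.append("AMCAT_ES_LEGACY_HASH is not set")
    return problems
```

For example, `check_clone_environment(os.environ, os.path.expanduser("~/amcat"))` returns an empty list when both settings are present.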

Collecting static files

AmCAT uses bower to install JavaScript/CSS libraries. On Ubuntu, you need to install the legacy version of nodejs first, and then install bower using npm:

sudo apt-get install nodejs-legacy npm
sudo npm install -g bower

On older Ubuntu versions, if the above does not work, try installing nodejs via Chris Lea's PPA:

sudo add-apt-repository ppa:chris-lea/node.js
sudo apt-get update
sudo apt-get install nodejs
sudo apt-get upgrade nodejs
sudo npm install -g bower

Then, in the top-level directory of AmCAT itself, run:

bower install

Setting up the database

Whichever way you installed AmCAT, you need to call the syncdb command to populate the database and set the Elasticsearch mapping:

python -m amcat.manage syncdb

Start AmCAT web server

For debugging, it is easiest to start AmCAT using runserver:

python -m amcat.manage runserver

Start celery worker

Finally, to use the query screen you need to start a celery worker. In a new terminal, type:

DJANGO_SETTINGS_MODULE=settings celery -A amcat.amcatcelery worker -l info -Q amcat

(If you are using a virtual environment, make sure to activate it first.)

Configuring AmCAT

The main configuration parameters for AmCAT reside in the settings folder. In many places, these settings are defaults that can be overridden with environment variables.
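
The general pattern of a default that an environment variable can override looks roughly like the sketch below. The helper `env_setting` and the variable names `EXAMPLE_DB_NAME` and `EXAMPLE_DB_HOST` are hypothetical and chosen only for illustration; check the settings folder for the names AmCAT actually reads. The default values 'amcat' and 'localhost' are the database defaults mentioned earlier.

```python
import os


def env_setting(name, default, environ=os.environ):
    """Return the value of environment variable `name`, or `default` if unset."""
    return environ.get(name, default)


# Hypothetical variable names, for illustration only.
DB_NAME = env_setting("EXAMPLE_DB_NAME", "amcat")
DB_HOST = env_setting("EXAMPLE_DB_HOST", "localhost")
```

With this pattern, exporting a variable in the shell (or in amcat-env/bin/activate) changes the setting without editing any file in the settings folder.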