Forked from Subhasis Dutta's original GitHub pages so that we have it under the OEDA / openeventdata pages for future reference. The files referenced here are proprietary and can be accessed by OEDA participants under agreement.
This project is built to code events about political conflict and cooperation among different political actors in multiple languages (primarily English, Arabic and Spanish).
It is part of "Modernizing Political Event Data for Big Data Social Science Research".
Implemented under the guidance of:
- Prof. Patrick Brandt
- Prof. Latifur Khan
- Prof. Vincent Ng
- Prof. Jennifer Holmes
Built at the University of Texas at Dallas, funded by the U.S. National Science Foundation under NSF-SBE-SMA-1539302.
Details here:
This repo contains scripts to parse the JRC-Names entity file and the CAMEO actor data, and to organize them in a structured format in a database.
Conflict and Mediation Event Observations dataset obtained from
Details about CAMEO:
JRC-Names is a highly multilingual named entity resource for person and organisation names (called 'entities'). It consists of large lists of names and their many spelling variants (up to hundreds for a single person), including variants across scripts (e.g., Latin, Greek, Arabic, Cyrillic, Japanese, Chinese). For details, see:
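To make the "many spelling variants per entity" structure concrete, here is a minimal parsing sketch. It assumes each line of the JRC-Names entity file is tab-separated as `id`, `type` (`P` person, `O` organisation), `language` (`u` when unidentified) and the name with tokens joined by `+`; verify this layout against the actual file before relying on it. `NameVariant`, `parse_jrc_lines` and `group_by_entity` are hypothetical names, not functions from the ir-scripts package.

```python
from typing import Dict, Iterable, List, NamedTuple


class NameVariant(NamedTuple):
    """One spelling variant of a JRC entity (assumed record layout)."""
    entity_id: int
    entity_type: str  # assumed: "P" = person, "O" = organisation
    lang: str         # assumed: language code, "u" when unidentified
    name: str


def parse_jrc_lines(lines: Iterable[str]) -> List[NameVariant]:
    """Parse lines of the assumed form 'id<TAB>type<TAB>lang<TAB>Name+Tokens'."""
    variants = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank and comment/header lines
        parts = line.split("\t")
        if len(parts) != 4 or not parts[0].isdigit():
            continue  # skip malformed lines rather than abort the whole load
        entity_id, entity_type, lang, name = parts
        # Tokens are assumed to be joined with '+'; turn them back into spaces.
        variants.append(NameVariant(int(entity_id), entity_type, lang, name.replace("+", " ")))
    return variants


def group_by_entity(variants: Iterable[NameVariant]) -> Dict[int, dict]:
    """Collect all variants of an entity into one document keyed by language."""
    docs: Dict[int, dict] = {}
    for v in variants:
        doc = docs.setdefault(v.entity_id, {"_id": v.entity_id, "type": v.entity_type, "names": {}})
        doc["names"].setdefault(v.lang, []).append(v.name)
    return docs
```

Grouping by entity keeps every variant of one person in a single document, so `list(docs.values())` could be handed to PyMongo's `Collection.insert_many` in one call.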
BabelNet is both a multilingual encyclopedic dictionary, with lexicographic and encyclopedic coverage of terms, and a semantic network which connects concepts and named entities in a very large network of semantic relations, made up of about 14 million entries, called Babel synsets. Each Babel synset represents a given meaning and contains all the synonyms which express that meaning in a range of different languages.
Contains all the scripts to:
- Extract information from CAMEO dataset and load it into MongoDB.
- Extract information from JRC Entity dataset and load it into MongoDB.
- Find all the relations between the CAMEO and JRC actors and store them in a relation table.
Details on running the scripts are in ir-scripts/ReadME
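The README does not say how CAMEO and JRC actors are related. Since `editdistance` is a listed dependency, one plausible approach is fuzzy name matching; the sketch below assumes that approach, with a hypothetical threshold `max_dist=1` and a hypothetical record layout. A pure-Python Levenshtein function stands in for `editdistance.eval` so the example has no third-party dependency.

```python
import unicodedata
from typing import Dict, List


def normalize(name: str) -> str:
    """Lowercase, strip accents and turn '_'/'+' separators into single spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.replace("_", " ").replace("+", " ").lower().split())


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def find_relations(cameo: Dict[str, str], jrc: Dict[int, List[str]],
                   max_dist: int = 1) -> List[dict]:
    """Relate CAMEO actors (name -> code) to JRC entities (id -> name variants)."""
    relations = []
    for cameo_name, cameo_code in cameo.items():
        key = normalize(cameo_name)
        for jrc_id, variants in jrc.items():
            if not variants:
                continue
            # Keep the closest spelling variant of this JRC entity.
            best = min(levenshtein(key, normalize(v)) for v in variants)
            if best <= max_dist:
                relations.append({"cameo_name": cameo_name, "cameo_code": cameo_code,
                                  "jrc_id": jrc_id, "distance": best})
    return relations
```

For example, `find_relations({"BARACK_OBAMA": "USAELI"}, {1: ["Barack+Obama", "Barak Obama"], 2: ["Angela Merkel"]})` relates the CAMEO actor to JRC entity 1 with distance 0. The nested loop compares every CAMEO actor with every JRC entity, so its cost grows quickly with the data size.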
This package will contain the MapReduce implementation of the join operation between CAMEO and JRC data.
Details present in mr-script/ReadME
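A common way to express such a join in MapReduce is a reduce-side (repartition) join: each mapper tags its records with their source, the shuffle groups records by join key, and the reducer crosses the CAMEO and JRC values that share a key. The sketch below simulates the three phases in plain Python. It assumes the join key is the lowercased actor name; the key actually used by mr-script is not documented here, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (join_key, source_tag, payload)


def map_cameo(rows: Iterable[Tuple[str, str]]) -> Iterator[Record]:
    """Emit (key, 'cameo', code) for each (name, code) CAMEO row."""
    for name, code in rows:
        yield name.lower(), "cameo", code


def map_jrc(rows: Iterable[Tuple[str, str]]) -> Iterator[Record]:
    """Emit (key, 'jrc', entity_id) for each (name, entity_id) JRC row."""
    for name, entity_id in rows:
        yield name.lower(), "jrc", entity_id


def shuffle(records: Iterable[Record]) -> Dict[str, List[Tuple[str, str]]]:
    """Group tagged values by key, as the framework's shuffle/sort phase would."""
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for key, tag, value in records:
        groups[key].append((tag, value))
    return groups


def reduce_join(key: str, values: List[Tuple[str, str]]) -> Iterator[Tuple[str, str, str]]:
    """Inner join: cross the CAMEO and JRC values that share one key."""
    cameo = [v for tag, v in values if tag == "cameo"]
    jrc = [v for tag, v in values if tag == "jrc"]
    for code in cameo:
        for entity_id in jrc:
            yield key, code, entity_id


def run_join(cameo_rows: Iterable[Tuple[str, str]],
             jrc_rows: Iterable[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Run map, shuffle and reduce in-process, with keys in sorted order."""
    groups = shuffle(chain(map_cameo(cameo_rows), map_jrc(jrc_rows)))
    return [out for key in sorted(groups) for out in reduce_join(key, groups[key])]
```

Keys that occur on only one side produce no output, which is the inner-join behaviour. On a Hadoop cluster the map and reduce functions would run as separate tasks, with the framework doing the shuffle.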
This package will contain the Spark implementation of the join operation between CAMEO and JRC data.
Details present in spark-script/ReadME
This package will contain the machine-learning classifier that identifies the language of names whose language is marked as unidentified for the different political actors in the JRC entity dataset.
Details present in jrc-classification/ReadME
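The README does not name the model used by jrc-classification. As an illustration only, the sketch below uses a swapped-in standard baseline for short-text language identification: multinomial Naive Bayes over character bigrams with add-one smoothing. The class name and the training names are hypothetical. Names whose JRC language is unidentified could then be passed to `predict`.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def char_ngrams(text: str, n: int = 2) -> List[str]:
    """Character n-grams of the lowercased name, padded with boundary markers."""
    padded = f"^{text.lower()}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesLangID:
    """Multinomial Naive Bayes over character n-grams with add-one smoothing."""

    def __init__(self, n: int = 2) -> None:
        self.n = n
        self.counts: Dict[str, Counter] = defaultdict(Counter)  # n-gram counts per language
        self.totals: Counter = Counter()                         # total n-grams per language
        self.docs: Counter = Counter()                           # training names per language
        self.vocab: set = set()

    def fit(self, labelled: Iterable[Tuple[str, str]]) -> "NaiveBayesLangID":
        for name, lang in labelled:
            grams = char_ngrams(name, self.n)
            self.counts[lang].update(grams)
            self.totals[lang] += len(grams)
            self.docs[lang] += 1
            self.vocab.update(grams)
        return self

    def predict(self, name: str) -> Optional[str]:
        grams = char_ngrams(name, self.n)
        n_docs = sum(self.docs.values())
        v = len(self.vocab)
        best_lang, best_score = None, -math.inf
        for lang in self.docs:
            # log prior + sum of smoothed log likelihoods of each n-gram
            score = math.log(self.docs[lang] / n_docs)
            for g in grams:
                score += math.log((self.counts[lang][g] + 1) / (self.totals[lang] + v))
            if score > best_score:
                best_lang, best_score = lang, score
        return best_lang
```

Character n-grams are a natural fit here because names are short and often mix tokens across languages. A real classifier would need far more training data than this toy setup.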
Contains a web server that allows different client systems to access the data.
Demo Server:
Available API:
- Check Status:
<Server IP:port>/
- Should return the response:
{"Status": "Server Running", "multilangAPI": "Welcome"}
Ex:
- Full Raw Output:
<Server IP:port>/search?query=<Person Name>
- Returns the combined search results from CAMEO, JRC, BabelNet and DBpedia.
Ex:
- Filtered Output:
a. <Server IP:port>/filter?query=<Person Name>
- Returns all translations found for the person in the default languages (Arabic, Spanish). The defaults are set in config/config.cnf.
b. <Server IP:port>/filter?query=<Person Name>&source=bablenet
- Returns the person's names in the default languages from a single data source. By default, results from all data sources are returned. Currently supported sources: jrc, bablenet.
c. <Server IP:port>/filter?query=<Person Name>&lang=<Language Code>
- Returns all of the person's names in a particular language.
E.g.: (Names in German) (Names in Japanese) (Names in Hindi) (Names in Bengali)
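A client for the endpoints listed above needs little more than URL construction and JSON decoding. The standard-library sketch below is a hedged example: `build_url` and `get_json` are hypothetical helpers, the address `http://localhost:8888` is a placeholder for the real `<Server IP:port>`, and it assumes the server answers with a JSON body, as the status example suggests.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen


def build_url(base: str, endpoint: str, query: Optional[str] = None,
              source: Optional[str] = None, lang: Optional[str] = None) -> str:
    """Build a URL for the '/', '/search' or '/filter' endpoints."""
    # Only include the parameters that were actually given.
    params = {k: v for k, v in (("query", query), ("source", source), ("lang", lang)) if v}
    url = base.rstrip("/") + "/" + endpoint.lstrip("/")
    return f"{url}?{urlencode(params)}" if params else url


def get_json(url: str, timeout: float = 10.0) -> dict:
    """Fetch a URL and decode its JSON body (assumes a UTF-8 JSON response)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `get_json(build_url("http://localhost:8888", "/"))` would be expected to return the status document shown above. `build_url(base, "filter", query="Barack Obama", source="bablenet")` produces `.../filter?query=Barack+Obama&source=bablenet`. `urlencode` escapes spaces and non-ASCII names, so Arabic or Spanish queries can be passed as plain strings.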
Details on setting up and running the server are in rest-server/ReadME
This package contains a user interface client to visualize the data accessible through the API.
Details present in webappn/ReadME
The scripts depend on PyMongo (for the database connection), Tornado (for the API server) and editdistance.
To install all dependencies, run:
pip install -r requirements.txt