This is the repository for the Lazada Q&A Intent Classification Project of the DLSU Center for Language Technologies. The data is sourced from the question and answer section from products in Lazada.

dlsucelt/lazadaQA

lazadaQA

The Lazada Q&A Intent Classification Project of the DLSU Center for Language Technologies aims to classify the dialog acts of Taglish queries, with the eventual goal of helping computers understand Taglish.

Description

This repository currently contains the annotated dataset LazadaQA-Taglish-7k, which consists of 7,265 posts scraped from the Question and Answer section of product pages on Lazada Philippines. The corpus was created to address the lack of Filipino corpora as well as the lack of e-commerce dialog act corpora. Existing corpora are usually general in nature and were collected before 2005. Because the available tagsets for dialog act classification are general-purpose, a new set of tags was created for the e-commerce domain. The dataset currently contains the following columns:

  • question (customer utterance)
  • answer (seller utterance)
  • customer (customer username)
  • seller (seller name)
  • time (relative time)
  • tags (annotation)
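As an illustration of the column layout listed above, here is a minimal Python sketch that reads rows into a typed record. It assumes the dataset is distributed as a CSV file with a header row, which this README does not state. The names `QAPost` and `read_posts`, the sample row, and the tag value `placeholder_tag` are hypothetical and do not come from the dataset.

```python
import csv
import io
from dataclasses import dataclass

# Column names as listed in this README.
COLUMNS = ["question", "answer", "customer", "seller", "time", "tags"]


@dataclass
class QAPost:
    """One scraped post (hypothetical record type, not part of the repository)."""
    question: str  # customer utterance
    answer: str    # seller utterance
    customer: str  # customer username
    seller: str    # seller name
    time: str      # relative time, kept as raw text
    tags: str      # annotation, kept as raw text; its encoding is not assumed


def read_posts(handle):
    """Read posts from a CSV file object with a header row (assumed format)."""
    reader = csv.DictReader(handle)
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [QAPost(**{name: row[name] for name in COLUMNS}) for row in reader]


# Invented sample row, used only to exercise the reader.
SAMPLE = (
    "question,answer,customer,seller,time,tags\n"
    "Available pa po ba?,Yes po available,user_a,shop_x,2 weeks ago,placeholder_tag\n"
)

posts = read_posts(io.StringIO(SAMPLE))
print(posts[0].question)
```

To read a real file, pass an open file object instead of the in-memory sample, adjusting the delimiter and encoding to match the actual file.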

The tags correspond only to the question asked, so annotators disregarded the answer, the customer, the seller, and the time during annotation. Tags were assigned by majority agreement (i.e., a tag is assigned if at least 2 out of 3 annotators agree).
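The majority-agreement rule can be sketched in Python as follows. This is an illustration only, not the project's annotation tooling: it assumes each annotator may give a question more than one tag, and the function name `majority_tags` and the tag names `tag_a`, `tag_b`, and `tag_c` are hypothetical placeholders.

```python
from collections import Counter


def majority_tags(annotations, min_votes=2):
    """Return the tags chosen by at least `min_votes` annotators.

    `annotations` holds one collection of tags per annotator for a
    single question.
    """
    votes = Counter()
    for tags in annotations:
        # set() ensures an annotator counts at most once per tag.
        votes.update(set(tags))
    return sorted(tag for tag, count in votes.items() if count >= min_votes)


# Three annotators labelling one question (placeholder tag names).
example = [{"tag_a", "tag_b"}, {"tag_a"}, {"tag_c", "tag_b"}]
print(majority_tags(example))  # ['tag_a', 'tag_b']
```

In this example, `tag_c` is dropped because only one annotator chose it.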

Paper

This corpus is part of a paper presented at ECONLP 2019 (EMNLP-IJCNLP 2019, Hong Kong). The pre-print is available on ResearchGate, and the final paper is available in the ACL Anthology.

Poster

The poster, which was also presented at ECONLP 2019, is available on ResearchGate.

License

gpl-3.0
