How to make a processor
4CAT is a modular tool. Its modules come in two varieties: data sources and processors. This article covers the latter.
Processors are bits of code that produce a dataset. Typically, their input is another dataset. As such, they can be used to analyse data. For example, a processor can take a CSV file containing posts as input, count how many posts occur per month, and output another CSV file with the number of posts per month (one month per row).
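As a rough illustration of the analysis step described above, here is a sketch that counts posts per month in plain Python, outside of 4CAT. It assumes each post is a dictionary with a hypothetical `timestamp` key. That key holds a Unix timestamp as a string, as values read from a CSV file would be. Actual 4CAT datasets may name or format this column differently. `count_posts_per_month` is an illustrative helper, not part of 4CAT's API.

```python
from collections import Counter
from datetime import datetime, timezone


def count_posts_per_month(posts):
    """
    Hypothetical helper: count posts per month.

    Assumes each post is a dict with a "timestamp" key holding a Unix
    timestamp (seconds, UTC), possibly as a string. Returns one row per
    month, sorted chronologically.
    """
    counts = Counter()
    for post in posts:
        moment = datetime.fromtimestamp(int(post["timestamp"]), tz=timezone.utc)
        counts[moment.strftime("%Y-%m")] += 1  # e.g. "2020-01"

    # one dictionary per output row: {"month": ..., "num_posts": ...}
    return [{"month": month, "num_posts": counts[month]} for month in sorted(counts)]


posts = [
    {"body": "first", "timestamp": "1577836800"},   # 2020-01-01 00:00 UTC
    {"body": "second", "timestamp": "1578000000"},  # 2020-01-02 21:20 UTC
    {"body": "third", "timestamp": "1580515200"},   # 2020-02-01 00:00 UTC
]
print(count_posts_per_month(posts))
# [{'month': '2020-01', 'num_posts': 2}, {'month': '2020-02', 'num_posts': 1}]
```

Each returned dictionary corresponds to one output row, so the result maps directly onto a CSV file with one month per row.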
4CAT's API handles most of the scaffolding around this for you. Processors can therefore stay lightweight and focus on the analysis. Meanwhile, 4CAT's back-end takes care of scheduling, determining where the output should go, and so on.
This is a minimal example of a 4CAT processor:
from backend.abstract.processor import BasicProcessor


class ExampleProcessor(BasicProcessor):
    type = "example-processor"
    category = "Examples"
    title = "A simple example"
    description = "This doesn't do much"
    extension = "csv"

    input = "csv:body"
    output = "csv:value"

    def process(self):
        data = [{"value": "Hello world!"}]
        self.write_csv_and_finish(data)
Or, annotated:
"""
A minimal example 4CAT processor
"""
from backend.abstract.processor import BasicProcessor


class ExampleProcessor(BasicProcessor):
    """
    Example Processor
    """
    type = "example-processor"  # job type ID
    category = "Examples"  # category
    title = "A simple example"  # title displayed in UI
    description = "This doesn't do much"  # description displayed in UI
    extension = "csv"  # extension of result file, used internally and in UI

    input = "csv:body"
    output = "csv:value"

    def process(self):
        """
        Saves a CSV file with one column ("value") and one row with a value
        ("Hello world!") and marks the dataset as finished.
        """
        # a list of rows; each row is a dictionary of column name -> value
        data = [{"value": "Hello world!"}]
        self.write_csv_and_finish(data)
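To try out a processor's `process()` method without a running 4CAT back-end, you could replace `BasicProcessor` with a stub. The `StubProcessor` class in the sketch below is hypothetical. It only assumes that `write_csv_and_finish` receives a list of dictionaries sharing the same keys, writes them as CSV rows, and marks the dataset as finished. It is not 4CAT's actual implementation. It also skips everything the back-end normally handles, such as scheduling and choosing where the result file goes.

```python
import csv
import io


class StubProcessor:
    """
    Hypothetical stand-in for 4CAT's BasicProcessor. It only mimics writing
    a list of dictionaries as CSV rows; it is NOT 4CAT's implementation.
    """

    def __init__(self):
        self.output = io.StringIO()  # in-memory stand-in for the result file
        self.finished = False

    def write_csv_and_finish(self, data):
        # assumption: all rows share the same keys; use the first row's as header
        writer = csv.DictWriter(self.output, fieldnames=list(data[0].keys()))
        writer.writeheader()
        writer.writerows(data)
        self.finished = True  # stand-in for marking the dataset as finished


class ExampleProcessor(StubProcessor):
    type = "example-processor"
    extension = "csv"

    def process(self):
        data = [{"value": "Hello world!"}]
        self.write_csv_and_finish(data)


processor = ExampleProcessor()
processor.process()
print(processor.output.getvalue())  # header "value", then one row "Hello world!"
```

Only the base class differs from the example above. The body of `process()` stays the same, because all it needs to do is produce rows.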