ACMA (AI Content Moderation Analysis) is an advanced AI-driven content moderation system designed to detect and analyze toxicity, inappropriate visuals, violence, and other harmful content in various media types including text, images, audio, and video. This system helps maintain safe online environments by enforcing community guidelines, legal compliance, and ethical standards while respecting user privacy and freedom of expression.
This project was developed as a final year project to demonstrate the application of machine learning and computer vision techniques in content moderation.
- Text Moderation: Detects toxic language and analyzes sentiment using NLP techniques
- Image Moderation: Extracts text from images using OCR, classifies visual content for inappropriate material (porn, hentai, sexy content), and detects violence
- Audio Moderation: Transcribes speech to text and analyzes for toxicity
- Video Moderation: Samples video frames to detect nudity and violence, and analyzes the audio track for toxic content
- Toxicity detection using machine learning classifiers
- Image classification for NSFW content using deep learning models
- Violence detection in images and videos
- OCR (Optical Character Recognition) for text extraction from images
- Speech-to-text conversion for audio analysis
- Sentiment analysis for text content
- Real-time content analysis through web interface
- Backend: Python Flask
- Machine Learning: TensorFlow, Keras, scikit-learn
- Computer Vision: OpenCV, EasyOCR
- Natural Language Processing: NLTK
- Audio Processing: SpeechRecognition, MoviePy
- Frontend: HTML, CSS (Tailwind CSS), JavaScript
- Data Processing: NumPy, Joblib
```mermaid
graph TD
    A[User Input: <br/> Text, Image, Audio, or Video] --> B{Input Type?}
    B -->|Text| C[Preprocess Text <br/> Remove special characters]
    C --> D[TF-IDF Vectorization]
    D --> E[Toxicity Classifier <br/> Predict Toxic/Non-Toxic]
    E --> F[Sentiment Analysis <br/> Positive/Negative/Neutral]
    F --> G[Return Result]
    B -->|Image| H[OCR Text Extraction <br/> From Image]
    H --> I[Preprocess Extracted Text]
    I --> J[TF-IDF Vectorization]
    J --> K[Toxicity Check on Text]
    K --> L[Image Classification <br/> NSFW Detection <br/> porn/hentai/sexy]
    L --> M[Violence Detection <br/> in Image]
    M --> N{All Checks Pass?}
    N -->|Yes| O[Return: Can be published]
    N -->|No| P[Return: Cannot be published]
    O --> G
    P --> G
    B -->|Audio| Q[Speech to Text <br/> Transcription]
    Q --> R[Preprocess Transcribed Text]
    R --> S[TF-IDF Vectorization]
    S --> T[Toxicity Classifier]
    T --> G
    B -->|Video| U[Extract Frames <br/> Every 3 seconds]
    U --> V[Classify Each Frame <br/> NSFW Detection]
    V --> W[Calculate Average <br/> NSFW Percentages]
    W --> X[Extract Audio Track]
    X --> Y[Speech to Text <br/> Transcription]
    Y --> Z[Preprocess Text]
    Z --> AA[TF-IDF Vectorization]
    AA --> BB[Toxicity Check on Audio]
    BB --> CC{Video Safe?}
    CC -->|Yes| DD[Can be Published]
    CC -->|No| EE[Cannot be Published]
    DD --> G
    EE --> G
    G --> FF[Display Result <br/> to User]
```
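The branching above maps naturally onto a single Flask route that inspects the request. The following is a minimal sketch, assuming the documented `/detect_toxicity` endpoint; the `analyze_*` helpers are hypothetical stubs standing in for the per-media pipelines described later in this README, not the project's actual code:

```python
# Minimal sketch of the input-type dispatch in the flowchart above.
# The analyze_* helpers are hypothetical stubs standing in for the
# text/image/audio/video pipelines described later in this README.
import os
from flask import Flask, request, jsonify
from werkzeug.utils import secure_filename

app = Flask(__name__)

IMAGE_EXTS = {".png", ".jpg", ".jpeg"}
AUDIO_EXTS = {".wav", ".mp3"}
VIDEO_EXTS = {".mp4", ".avi"}

def analyze_text(text):   # stub: TF-IDF + toxicity classifier + sentiment
    return {"text": text, "toxicity_result": "Non-Toxic"}

def analyze_image(path):  # stub: OCR + NSFW + violence checks
    return {"toxicity_result": "Can be published"}

def analyze_audio(path):  # stub: speech-to-text + toxicity check
    return {"toxicity_result": "Non-Toxic"}

def analyze_video(path):  # stub: frame sampling + audio toxicity check
    return {"toxicity_result": "Can be published"}

@app.route("/detect_toxicity", methods=["POST"])
def detect_toxicity():
    # Text input takes priority; otherwise dispatch on the file extension
    if request.form.get("text"):
        return jsonify(analyze_text(request.form["text"]))
    file = request.files.get("file")
    if file is None:
        return jsonify({"error": "no input provided"}), 400
    os.makedirs("uploads", exist_ok=True)
    path = os.path.join("uploads", secure_filename(file.filename))
    file.save(path)
    ext = os.path.splitext(path)[1].lower()
    if ext in IMAGE_EXTS:
        return jsonify(analyze_image(path))
    if ext in AUDIO_EXTS:
        return jsonify(analyze_audio(path))
    if ext in VIDEO_EXTS:
        return jsonify(analyze_video(path))
    return jsonify({"error": "unsupported file type"}), 400
```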
```
ACMA/
├── app.py                    # Main Flask application
├── toxicity_classifier.pkl   # Trained toxicity detection model
├── tfidf_vectorizer.pkl      # TF-IDF vectorizer for text processing
├── IMG_MODEL.299x299.h5      # Image & video classification model
├── VIOLENCE_DETECTION.h5     # Violence detection model
├── template/                 # HTML templates
│   ├── index.html            # Home page
│   ├── predict.html          # Content analysis interface
│   └── aboutus.html          # About page
├── static/                   # Static assets
│   ├── img/                  # Images and icons
│   ├── scripts/              # JavaScript files
│   └── styles/               # CSS stylesheets
└── uploads/                  # Directory for uploaded files
```
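At startup, `app.py` presumably loads these artifacts once. A minimal sketch, assuming joblib for the pickled scikit-learn objects and Keras for the `.h5` models (the variable names are illustrative):

```python
# Sketch of how the bundled model artifacts could be loaded at startup
# (file names from the project tree above; the loading code is an assumption).
import joblib
from tensorflow.keras.models import load_model

toxicity_clf = joblib.load("toxicity_classifier.pkl")  # scikit-learn classifier
vectorizer = joblib.load("tfidf_vectorizer.pkl")       # fitted TfidfVectorizer
img_model = load_model("IMG_MODEL.299x299.h5")         # NSFW classifier, 299x299 input
violence_model = load_model("VIOLENCE_DETECTION.h5")   # violence detector
```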
Text Moderation:
- Input text is preprocessed (special characters are removed)
- Features are extracted using TF-IDF vectorization
- The toxicity classifier predicts whether the content is toxic
- Sentiment analysis labels the text as positive, negative, or neutral (see the sketch after this list)
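A minimal sketch of this pipeline, assuming the pickled scikit-learn artifacts shipped with the project and NLTK's VADER analyzer; the `moderate_text` helper, its thresholds, and the toxic-label handling are illustrative assumptions, not the project's exact code:

```python
# Sketch of the text pipeline, assuming the pickled scikit-learn
# vectorizer/classifier shipped with the project and NLTK's VADER
# sentiment analyzer; thresholds and label handling are assumptions.
import re
import joblib
from nltk.sentiment import SentimentIntensityAnalyzer

vectorizer = joblib.load("tfidf_vectorizer.pkl")
classifier = joblib.load("toxicity_classifier.pkl")
sia = SentimentIntensityAnalyzer()  # needs nltk.download('vader_lexicon')

def moderate_text(text: str) -> dict:
    cleaned = re.sub(r"[^a-zA-Z0-9\s]", "", text).lower()  # drop special characters
    features = vectorizer.transform([cleaned])
    is_toxic = classifier.predict(features)[0] == 1        # assumes 1 == toxic
    compound = sia.polarity_scores(cleaned)["compound"]    # VADER score in [-1, 1]
    sentiment = ("Positive" if compound > 0.05
                 else "Negative" if compound < -0.05
                 else "Neutral")
    return {"toxicity_result": "Toxic" if is_toxic else "Non-Toxic",
            "sentiment": sentiment}
```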
Image Moderation:
- OCR extracts any embedded text from the image
- The extracted text is analyzed for toxicity via the text pipeline
- The image is classified by a deep learning model for inappropriate (NSFW) content
- A separate model checks for violent content
- The content is flagged as "Cannot be published" if any check fails (see the sketch after this list)
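A sketch of these checks, assuming EasyOCR for text extraction and the two bundled Keras models. The class indices, the violence-model input size, and the decision thresholds are assumptions for illustration (only the 299x299 NSFW input size is hinted by the model file name):

```python
# Sketch of the image checks, assuming EasyOCR and the bundled Keras models;
# class indices, the violence-model input size, and thresholds are assumptions.
import cv2
import easyocr
import numpy as np
from tensorflow.keras.models import load_model

reader = easyocr.Reader(["en"])
nsfw_model = load_model("IMG_MODEL.299x299.h5")
violence_model = load_model("VIOLENCE_DETECTION.h5")
UNSAFE_CLASSES = [0, 1, 3]  # hypothetical indices for porn/hentai/sexy

def moderate_image(path: str) -> str:
    # 1. OCR any embedded text; the real flow feeds this to the text pipeline
    extracted = " ".join(reader.readtext(path, detail=0))
    # (toxicity check on `extracted` omitted here; see moderate_text above)

    # 2. NSFW classification on a 299x299 input
    img = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
    x = cv2.resize(img, (299, 299)).astype("float32") / 255.0
    nsfw_scores = nsfw_model.predict(x[np.newaxis])[0]

    # 3. Violence detection (input size here is a guess)
    v = cv2.resize(img, (128, 128)).astype("float32") / 255.0
    violent = violence_model.predict(v[np.newaxis])[0][0] > 0.5

    if nsfw_scores[UNSAFE_CLASSES].max() > 0.7 or violent:  # illustrative threshold
        return "Cannot be published"
    return "Can be published"
```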
Audio Moderation:
- Speech recognition converts the audio to text
- The transcribed text is analyzed for toxicity using the same text pipeline (a transcription sketch follows this list)
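A sketch of the transcription step using the SpeechRecognition library; the Google Web Speech recognizer is shown here as one possible backend, which may or may not be the one the project uses:

```python
# Sketch of the audio step using the SpeechRecognition library
# (Google Web Speech recognizer shown as one possible backend).
import speech_recognition as sr

def transcribe(path: str) -> str:
    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:  # expects WAV/AIFF/FLAC input
        audio = recognizer.record(source)
    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:        # speech was unintelligible
        return ""

# The transcription is then fed into the same text pipeline (moderate_text).
```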
Video Moderation:
- Video frames are sampled at regular intervals (every 3 seconds)
- Each frame is analyzed for nudity/inappropriate content
- The audio track is extracted and analyzed for toxicity
- The video is flagged if the sampled frames or the audio contain prohibited content (see the sketch after this list)
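A sketch of the two video steps, assuming OpenCV for frame sampling and MoviePy (1.x import path) for audio extraction; the helper names and the output path are illustrative:

```python
# Sketch of the video pipeline: sample a frame every 3 seconds with OpenCV,
# then pull the audio track via MoviePy for transcription and toxicity checks.
import cv2
from moviepy.editor import VideoFileClip  # moviepy 1.x import path

def sample_frames(path: str, every_sec: float = 3.0):
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30  # fall back if FPS is unreadable
    step = int(fps * every_sec)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frames.append(frame)  # each frame then goes through moderate_image
        idx += 1
    cap.release()
    return frames

def extract_audio(path: str, out_wav: str = "uploads/audio.wav") -> str:
    clip = VideoFileClip(path)
    clip.audio.write_audiofile(out_wav)  # then transcribe + toxicity-check
    clip.close()
    return out_wav
```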
- Python 3.7+
- pip package manager
- Hardware: Windows 10 or later, at least 8 GB RAM, a CPU of 2 GHz or faster, and 500 GB of HDD/SSD storage; a GPU is recommended for faster performance.
- Software: Python 3.11.5 (recommended) or higher and TensorFlow 2.12.0 (recommended).
- Libraries: flask, tensorflow, keras, easyocr, opencv-python, SpeechRecognition, joblib, numpy, nltk, moviepy (text cleaning uses the re module from the Python standard library).
- IDE: VS Code or another Python IDE for running the project.
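For reference, a requirements.txt covering the libraries above might look like the following; only the TensorFlow pin comes from the recommendation above, and the unpinned entries are assumptions:

```
flask
tensorflow==2.12.0
keras
easyocr
opencv-python
SpeechRecognition
joblib
numpy
nltk
moviepy
```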
Install the required packages using the provided requirements.txt file:

```bash
pip install -r requirements.txt
```

Download the required NLTK data:

```python
import nltk
nltk.download('vader_lexicon')
```

Ensure all model files are present in the root directory:

- toxicity_classifier.pkl
- tfidf_vectorizer.pkl
- IMG_MODEL.299x299.h5
- VIOLENCE_DETECTION.h5
- Open the project folder (ACMA).
- Right-click inside the folder and open it with VS Code.
- Run the Flask application in the terminal:

```bash
python app.py
```

- Ctrl+Click the link shown in the terminal to follow it:

  http://localhost:5000

- You will be directed to the ACMA front end, where you can analyze content.
- Home Page: Overview of the system and its features
- Classify Page: Upload content for analysis
- Enter text directly or upload files (images, audio, video)
- Click "Test" to analyze the content
- View results showing detected text and toxicity status
The system provides a REST API endpoint:

```
POST /detect_toxicity
```

Parameters:

- `text` (optional): Text content to analyze
- `file` (optional): File upload (image, audio, or video)

Response format:

```json
{
  "text": "extracted or input text",
  "toxicity_result": "Toxic/Non-Toxic/Cannot be published"
}
```
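As a usage illustration (not part of the project's own docs), calling the endpoint from Python with the requests library against a locally running instance might look like this:

```python
# Hypothetical client for the /detect_toxicity endpoint described above,
# assuming the app is running locally on port 5000.
import requests

BASE = "http://localhost:5000"

# Text analysis
resp = requests.post(f"{BASE}/detect_toxicity", data={"text": "sample comment"})
print(resp.json())  # e.g. {"text": "sample comment", "toxicity_result": "Non-Toxic"}

# File analysis (image, audio, or video)
with open("sample.jpg", "rb") as f:
    resp = requests.post(f"{BASE}/detect_toxicity", files={"file": f})
print(resp.json())
```

The vision of ACMA is to create safer online communities by: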
- Maintaining user safety and well-being
- Enforcing community guidelines
- Ensuring legal compliance
- Protecting privacy and ethical standards
- Leveraging AI and machine learning for efficient moderation
- Balancing content control with freedom of expression
This project may contain explicit language, adult themes, or sensitive material, including audio, video, images, and text. Such content is included solely for testing purposes within the project.
Developers: Aditya Singh, Harshit Saxena, Ayush Sharma, Ayush Vishnoi
College: MIT Moradabad, India
© 2023-Present ACMA - All rights reserved.
This project demonstrates the integration of multiple AI technologies for comprehensive content moderation and is intended for educational and research purposes.
