Skills DIY Collection

A collection of custom Claude Code skills for NLP model development and data quality analysis.

📚 Available Skills

1. NLP Dataset Analyzer

Analyze NLP datasets before model training or annotation to identify quality issues and get improvement recommendations.

Features:

📊 Comprehensive dataset statistics (samples, size, format)
📝 Text content analysis (length distribution, vocabulary, quality)
🏷️ Label distribution analysis (class balance, rare class detection)
🔍 Data quality assessment with scoring
💡 Actionable improvement recommendations
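
The kind of checks listed above can be illustrated with a minimal sketch using only the Python standard library. This is not the skill's actual implementation: the function name `basic_quality_checks` and the `text` / `label` field names are assumptions made for the example.

```python
import json
from collections import Counter


def basic_quality_checks(lines, text_key="text", label_key="label"):
    """Count records, empty texts, exact duplicate texts, and labels in JSONL lines.

    The field names are hypothetical defaults, not a documented schema.
    """
    seen_texts = set()
    stats = {"total": 0, "empty_text": 0, "duplicates": 0, "labels": Counter()}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        record = json.loads(line)
        stats["total"] += 1
        text = (record.get(text_key) or "").strip()
        if not text:
            stats["empty_text"] += 1
        elif text in seen_texts:
            stats["duplicates"] += 1
        else:
            seen_texts.add(text)
        stats["labels"][record.get(label_key)] += 1
    return stats


sample = [
    '{"text": "great product", "label": "positive"}',
    '{"text": "great product", "label": "positive"}',
    '{"text": "", "label": "neutral"}',
    '{"text": "not for me", "label": "negative"}',
]
report = basic_quality_checks(sample)
print(report["total"], report["empty_text"], report["duplicates"])
```

Passing an open file object instead of `sample` works the same way, since the function only iterates over lines.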

Installation:

npx skills add https://github.com/Difficult-Burger/skills-diy.git --skill nlp-dataset-analyzer --global --yes

Usage:

Analyze this NLP dataset: sentiment_data.jsonl

→ Full Documentation

2. Bad Case Analyzer

Analyze model prediction errors with comprehensive metrics and interactive HTML reports for efficient error review.

Features:

📊 Complete classification metrics (Accuracy, Precision, Recall, F1)
🔢 Confusion matrix visualization
❌ FP/FN identification and categorization
🔍 Error pattern detection
🌐 Interactive HTML reports with advanced filtering
🎯 Confidence calibration analysis
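
As a rough illustration of the metrics listed above, the following sketch computes accuracy, confusion counts, and per-class precision, recall, and F1 from two parallel lists of labels. The function name `classification_metrics` and the sample labels are assumptions for the example; the skill itself may compute or report these differently.

```python
from collections import Counter


def classification_metrics(gold, pred):
    """Return accuracy, (gold, predicted) confusion counts, and per-class P/R/F1."""
    confusion = Counter(zip(gold, pred))  # (gold, predicted) -> count
    labels = sorted(set(gold) | set(pred))
    correct = sum(confusion[(c, c)] for c in labels)
    per_class = {}
    for c in labels:
        tp = confusion[(c, c)]
        fp = sum(confusion[(g, c)] for g in labels if g != c)  # predicted c, gold differs
        fn = sum(confusion[(c, p)] for p in labels if p != c)  # gold c, predicted differs
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = correct / len(gold) if gold else 0.0
    return {"accuracy": accuracy, "confusion": confusion, "per_class": per_class}


gold = ["positive", "positive", "negative", "neutral", "negative"]
pred = ["positive", "negative", "negative", "neutral", "positive"]
metrics = classification_metrics(gold, pred)
print(f"accuracy = {metrics['accuracy']:.2f}")
for label, scores in metrics["per_class"].items():
    print(label, scores)
```

False positives and false negatives for a class fall out of the same confusion counts, which is why a single `Counter` is enough here.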

Installation:

npx skills add https://github.com/Difficult-Burger/skills-diy.git --skill bad-case-analyzer --global --yes

Usage:

Analyze bad cases from model_predictions.jsonl

→ Full Documentation

🚀 Quick Start

Install Both Skills

# Install NLP Dataset Analyzer
npx skills add https://github.com/Difficult-Burger/skills-diy.git --skill nlp-dataset-analyzer --global --yes

# Install Bad Case Analyzer
npx skills add https://github.com/Difficult-Burger/skills-diy.git --skill bad-case-analyzer --global --yes

Verify Installation

npx skills list

You should see both skills in the list.

💡 Typical Workflow

End-to-End NLP Project Workflow

graph LR
    A[Collect Data] --> B[Dataset Analysis]
    B --> C[Fix Quality Issues]
    C --> D[Train Model]
    D --> E[Bad Case Analysis]
    E --> F[Improve Model/Data]
    F --> D

Step-by-step:

1. Before Training - Use NLP Dataset Analyzer
   ```
   Analyze this NLP dataset: training_data.jsonl
   ```
   - Check data quality
   - Identify class imbalance
   - Find missing/duplicate samples
   - Get improvement suggestions
2. After Training - Use Bad Case Analyzer
   ```
   Analyze bad cases from predictions.jsonl
   ```
   - Calculate performance metrics
   - Identify systematic errors
   - Generate interactive HTML report
   - Filter and review specific error types
3. Iterate - Based on bad case insights:
   - Collect more data for confused classes
   - Fix annotation inconsistencies
   - Adjust model architecture
   - Apply targeted data augmentation

📖 Documentation

Each skill has comprehensive documentation:

NLP Dataset Analyzer Documentation:
- Data format requirements
- Analysis workflow
- Quality checklist
- Common issues and solutions

Bad Case Analyzer Documentation:
- Input format requirements
- Metrics explained
- HTML report features
- Error pattern guide

🎯 Use Cases

Data Quality Assurance

Pre-training dataset validation
Annotation quality monitoring
Data collection gap identification

Model Evaluation

Post-training error analysis
Confusion pair identification
Confidence calibration check
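
One common way to run a confidence calibration check is expected calibration error (ECE): predictions are grouped into confidence bins, and the gap between mean confidence and accuracy in each bin is averaged, weighted by bin size. The sketch below assumes this definition and uses a hypothetical function name and sample values; the README does not say which calibration measure the skill actually uses.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |mean confidence - accuracy| over equal-width bins."""
    total = len(confidences)
    if total == 0:
        return 0.0
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[index].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / total * abs(mean_conf - accuracy)
    return ece


# Hypothetical predictions: model confidence and whether each prediction was right.
confidences = [0.95, 0.95, 0.55, 0.55]
correct = [True, True, True, False]
ece = expected_calibration_error(confidences, correct)
print(f"ECE = {ece:.3f}")
```

A value near zero means confidence tracks accuracy closely; larger values indicate over- or under-confidence.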

Iterative Improvement

Error-driven data collection
Systematic bias detection
Model comparison across versions

Team Collaboration

Share HTML reports for review
Document data quality findings
Track improvement over iterations

📊 Example Reports

NLP Dataset Analyzer Output

# NLP Dataset Analysis Report

## 📊 Dataset Overview
- Total Samples: 5,000
- Data Format: JSONL
- Overall Quality Score: 78/100

## 🏷️ Label Distribution
- positive: 2,800 (56%)
- negative: 1,500 (30%)
- neutral: 700 (14%)
- Imbalance Ratio: 4.0:1 (Medium)

## 💡 Recommendations
- [ ] Consider oversampling neutral class
- [ ] Remove 15 duplicate samples
- [ ] Fix 8 empty text samples
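
The label percentages and the 4.0:1 imbalance ratio in this example report are consistent with dividing the largest class count by the smallest (2,800 / 700). The sketch below reproduces that arithmetic under this assumed definition; the function name is hypothetical, and the threshold that maps a ratio to "Medium" is not specified in this README, so it is left out.

```python
def label_distribution(counts):
    """Return rounded percentage shares and the largest-to-smallest class ratio."""
    total = sum(counts.values())
    shares = {label: round(100 * n / total) for label, n in counts.items()}
    ratio = max(counts.values()) / min(counts.values())
    return shares, ratio


# Class counts taken from the example report above.
counts = {"positive": 2800, "negative": 1500, "neutral": 700}
shares, imbalance_ratio = label_distribution(counts)
for label, share in shares.items():
    print(f"{label}: {counts[label]:,} ({share}%)")
print(f"Imbalance Ratio: {imbalance_ratio:.1f}:1")
```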

Bad Case Analyzer Output

Interactive HTML report with:

Overall Accuracy: 85.3%
Top Confusion: positive→negative (34 cases)
Filterable bad case browser
Clickable confusion matrix
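
A "top confusion" line such as the one above can be derived by counting (gold, predicted) pairs among the misclassified samples. The following is a minimal sketch with a hypothetical function name and made-up sample labels, not the skill's actual report code.

```python
from collections import Counter


def top_confusions(gold, pred, n=3):
    """Return the n most frequent (gold, predicted) pairs among errors."""
    errors = Counter((g, p) for g, p in zip(gold, pred) if g != p)
    return errors.most_common(n)


gold = ["positive", "positive", "positive", "negative", "neutral"]
pred = ["negative", "negative", "positive", "negative", "positive"]
ranked = top_confusions(gold, pred)
for (g, p), count in ranked:
    print(f"{g}→{p} ({count} cases)")
```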

🛠️ Technical Details

Language: English (all documentation and outputs)

Supported Formats:

JSONL (recommended)
JSON
CSV/TSV
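
All of the formats listed above can be read with the Python standard library alone. The sketch below shows one way to normalize them into a list of dictionaries; `parse_records` is a hypothetical helper written for this example, and the skills may load files differently.

```python
import csv
import io
import json


def parse_records(text, fmt):
    """Parse dataset text in JSONL, JSON, CSV, or TSV form into a list of dicts."""
    fmt = fmt.lower()
    if fmt == "jsonl":
        # One JSON object per non-blank line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if fmt == "json":
        data = json.loads(text)
        return data if isinstance(data, list) else [data]
    if fmt in ("csv", "tsv"):
        delimiter = "," if fmt == "csv" else "\t"
        # The first row is treated as the header.
        return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))
    raise ValueError(f"unsupported format: {fmt}")


jsonl_rows = parse_records('{"text": "a", "label": "x"}\n{"text": "b", "label": "y"}\n', "jsonl")
csv_rows = parse_records("text,label\nhello,pos\n", "csv")
tsv_rows = parse_records("text\tlabel\nhello\tpos\n", "tsv")
print(len(jsonl_rows), csv_rows, tsv_rows)
```

JSONL is convenient here because each line parses independently, so a single malformed record can be located by line number.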

Dependencies: None beyond the Python standard library

Compatibility: Claude Code with skills framework

🤝 Contributing

Contributions are welcome! To add a new skill:

1. Fork the repository
2. Create a new skill directory following the structure of the existing skill directories (bad-case-analyzer, nlp-dataset-analyzer)
3. Include SKILL.md, README.md, and any necessary references
4. Submit a pull request

📝 License

Apache 2.0

🙏 Credits

Created with Claude Code and the skills framework.

Both skills leverage the Well-known Agent Skill Discovery (WASD) protocol for seamless integration with Claude Code.

Questions or Issues? Please open an issue on GitHub or check the individual skill documentation.
