Skills DIY Collection

A collection of custom Claude Code skills for NLP model development and data quality analysis.

📚 Available Skills

NLP Dataset Analyzer

Analyze NLP datasets before model training or annotation to identify quality issues and get improvement recommendations.

Features:

  • 📊 Comprehensive dataset statistics (samples, size, format)
  • 📝 Text content analysis (length distribution, vocabulary, quality)
  • 🏷️ Label distribution analysis (class balance, rare class detection)
  • 🔍 Data quality assessment with scoring
  • 💡 Actionable improvement recommendations

Installation:

npx skills add https://github.com/Difficult-Burger/skills-diy.git --skill nlp-dataset-analyzer --global --yes

Usage:

Analyze this NLP dataset: sentiment_data.jsonl

→ Full Documentation


Bad Case Analyzer

Analyze model prediction errors with comprehensive metrics and interactive HTML reports for efficient error review.

Features:

  • 📊 Complete classification metrics (Accuracy, Precision, Recall, F1)
  • 🔢 Confusion matrix visualization
  • ❌ FP/FN identification and categorization
  • 🔍 Error pattern detection
  • 🌐 Interactive HTML reports with advanced filtering
  • 🎯 Confidence calibration analysis
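
For readers who want to sanity-check the reported metrics by hand, the following is a minimal sketch of how accuracy and per-class precision, recall, and F1 can be computed with only the standard library. It is not the skill's own implementation; the function name `classification_metrics` and the list-based inputs are assumptions made for illustration.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return (accuracy, per-class precision/recall/F1) for two label lists.

    Hypothetical helper for illustration; not part of the skill itself.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # the predicted class gains a false positive
            fn[gold] += 1  # the gold class gains a false negative

    per_class = {}
    for label in sorted(set(y_true) | set(y_pred)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}

    accuracy = sum(tp.values()) / len(y_true) if y_true else 0.0
    return accuracy, per_class
```

Classes with no predictions or no gold samples get a score of 0.0 here rather than raising an error; other conventions exist, and the skill may use a different one.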

Installation:

npx skills add https://github.com/Difficult-Burger/skills-diy.git --skill bad-case-analyzer --global --yes

Usage:

Analyze bad cases from model_predictions.jsonl

→ Full Documentation


🚀 Quick Start

Install Both Skills

# Install NLP Dataset Analyzer
npx skills add https://github.com/Difficult-Burger/skills-diy.git --skill nlp-dataset-analyzer --global --yes

# Install Bad Case Analyzer
npx skills add https://github.com/Difficult-Burger/skills-diy.git --skill bad-case-analyzer --global --yes

Verify Installation

npx skills list

You should see both skills in the list.

💡 Typical Workflow

End-to-End NLP Project Workflow

graph LR
    A[Collect Data] --> B[Dataset Analysis]
    B --> C[Fix Quality Issues]
    C --> D[Train Model]
    D --> E[Bad Case Analysis]
    E --> F[Improve Model/Data]
    F --> D

Step-by-step:

  1. Before Training - Use NLP Dataset Analyzer

    Analyze this NLP dataset: training_data.jsonl
    
    • Check data quality
    • Identify class imbalance
    • Find missing/duplicate samples
    • Get improvement suggestions
  2. After Training - Use Bad Case Analyzer

    Analyze bad cases from predictions.jsonl
    
    • Calculate performance metrics
    • Identify systematic errors
    • Generate interactive HTML report
    • Filter and review specific error types
  3. Iterate - Based on bad case insights:

    • Collect more data for confused classes
    • Fix annotation inconsistencies
    • Adjust model architecture
    • Apply targeted data augmentation

📖 Documentation

Each skill has its own documentation, linked as "Full Documentation" from its entry under Available Skills.

🎯 Use Cases

Data Quality Assurance

  • Pre-training dataset validation
  • Annotation quality monitoring
  • Data collection gap identification

Model Evaluation

  • Post-training error analysis
  • Confusion pair identification
  • Confidence calibration check
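
One common way to run a confidence calibration check is expected calibration error (ECE), which compares average confidence with observed accuracy inside equal-width confidence bins. The sketch below is an illustration under that assumption; this README does not say which calibration measure the Bad Case Analyzer uses, and the function name and the default of 10 bins are hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Compute ECE from per-sample confidences in [0, 1] and correctness flags.

    Hypothetical helper for illustration; not the skill's implementation.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        return 0.0

    # Assign each sample to an equal-width bin; confidence 1.0 goes to the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, bool(ok)))

    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(ok for _, ok in bucket) / len(bucket)
        # Weight each bin's confidence/accuracy gap by its share of samples.
        ece += len(bucket) / total * abs(avg_conf - accuracy)
    return ece
```

A value near 0 means the model's confidence tracks its accuracy; larger values indicate over- or under-confidence.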

Iterative Improvement

  • Error-driven data collection
  • Systematic bias detection
  • Model comparison across versions

Team Collaboration

  • Share HTML reports for review
  • Document data quality findings
  • Track improvement over iterations

📊 Example Reports

NLP Dataset Analyzer Output

# NLP Dataset Analysis Report

## 📊 Dataset Overview
- Total Samples: 5,000
- Data Format: JSONL
- Overall Quality Score: 78/100

## 🏷️ Label Distribution
- positive: 2,800 (56%)
- negative: 1,500 (30%)
- neutral: 700 (14%)
- Imbalance Ratio: 4.0:1 (Medium)

## 💡 Recommendations
- [ ] Consider oversampling neutral class
- [ ] Remove 15 duplicate samples
- [ ] Fix 8 empty text samples
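
The label figures in this example report can be reproduced with a few lines of standard-library Python. The sketch below assumes the imbalance ratio is the largest class count divided by the smallest, which is consistent with the 4.0:1 shown above (2,800 / 700); the helper names are hypothetical, and the quality score and the "Medium" threshold are not reproduced because the README does not define them.

```python
from collections import Counter


def label_report(labels):
    """Return (counts, shares, imbalance_ratio) for a list of labels.

    Assumes imbalance ratio = largest class count / smallest class count.
    """
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    total = sum(counts.values())
    shares = {label: n / total for label, n in counts.items()}
    ratio = max(counts.values()) / min(counts.values())
    return counts, shares, ratio


def quality_issues(texts):
    """Count empty texts and exact duplicates (after stripping whitespace)."""
    empty = sum(1 for t in texts if not t or not t.strip())
    seen = Counter(t.strip() for t in texts if t and t.strip())
    duplicates = sum(n - 1 for n in seen.values() if n > 1)
    return {"empty": empty, "duplicates": duplicates}


if __name__ == "__main__":
    labels = ["positive"] * 2800 + ["negative"] * 1500 + ["neutral"] * 700
    counts, shares, ratio = label_report(labels)
    print(counts["neutral"], f"{shares['neutral']:.0%}", f"{ratio:.1f}:1")
```

Duplicate detection here is exact-match only; near-duplicate detection would need a fuzzier comparison.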

Bad Case Analyzer Output

Interactive HTML report with:

  • Overall Accuracy: 85.3%
  • Top Confusion: positive→negative (34 cases)
  • Filterable bad case browser
  • Clickable confusion matrix
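
A "top confusion" entry like the one above can be derived by counting mismatched (gold, predicted) pairs. This is a minimal sketch, assuming the arrow reads gold label → predicted label; the README does not state the direction, and `top_confusions` is a hypothetical helper rather than part of the skill.

```python
from collections import Counter


def top_confusions(y_true, y_pred, k=3):
    """Return the k most frequent (gold, predicted) pairs among the errors."""
    pairs = Counter(
        (gold, pred) for gold, pred in zip(y_true, y_pred) if gold != pred
    )
    return pairs.most_common(k)


if __name__ == "__main__":
    gold = ["positive", "positive", "negative", "neutral"]
    pred = ["negative", "negative", "negative", "positive"]
    for (g, p), n in top_confusions(gold, pred):
        print(f"{g}→{p} ({n} cases)")
```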

🛠️ Technical Details

Language: English (all documentation and outputs)

Supported Formats:

  • JSONL (recommended)
  • JSON
  • CSV/TSV
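
To inspect a dataset in any of these formats before handing it to a skill, a small loader like the one below may help. It is a sketch that picks the parser from the file extension and uses only the standard library; the function name `load_records` and the assumption that CSV/TSV files have a header row are illustrative, since the README does not specify the expected schema.

```python
import csv
import json
from pathlib import Path


def load_records(path):
    """Load a JSONL, JSON, CSV, or TSV file into a list of dicts.

    Hypothetical helper; CSV/TSV files are assumed to have a header row.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    with path.open(encoding="utf-8", newline="") as f:
        if suffix == ".jsonl":
            # One JSON object per line; blank lines are skipped.
            return [json.loads(line) for line in f if line.strip()]
        if suffix == ".json":
            data = json.load(f)
            return data if isinstance(data, list) else [data]
        if suffix in (".csv", ".tsv"):
            delimiter = "\t" if suffix == ".tsv" else ","
            return list(csv.DictReader(f, delimiter=delimiter))
    raise ValueError(f"Unsupported format: {suffix}")
```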

Dependencies: None beyond the Python standard library

Compatibility: Claude Code with skills framework

🤝 Contributing

Contributions are welcome! To add a new skill:

  1. Fork the repository
  2. Create a new skill directory following the structure of the existing skills
  3. Include SKILL.md, README.md, and necessary references
  4. Submit a pull request

📝 License

Apache 2.0

🙏 Credits

Created with Claude Code and the skills framework.

Both skills leverage the Well-known Agent Skill Discovery (WASD) protocol for seamless integration with Claude Code.


Questions or Issues? Please open an issue on GitHub or check the individual skill documentation.
