
[IMPROVE] Agent: Early deduplication check and token budget management #22

@DavideDaniel

Observed Issues During Collection Run

During the 2026-02-15T20 collector run, several workflow inefficiencies emerged:

1. Late Deduplication Discovery

Problem: The agent spent significant time and tokens searching for videos with the "Today" filter, only to discover afterward that all recent content was already covered in Issues #13-16 (all from the same day).

Impact:

  • Wasted 20+ minutes of search time
  • Consumed tokens on searches that couldn't yield usable results
  • Agent had to restart with a different time window

Recommendation:

  • Check recent issues FIRST, before any video searching
  • Extract key topics/companies from last 2-3 issues
  • Use that information to guide search terms and avoid covered ground
  • Consider: "Recent issues covered OpenAI, Anthropic, Meta, Google DeepThink. Search for: xAI, NVIDIA, Microsoft, or different angles on same companies."
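
A minimal sketch of the early deduplication check described above, assuming the bodies of the last few issues are already available as plain strings. The `WATCHLIST` constant, the function names, and the sample issue texts are hypothetical; the matching is a naive case-insensitive substring test and would need refinement for short names.

```python
# Hypothetical watchlist; the company names are taken from this issue.
WATCHLIST = ["OpenAI", "Anthropic", "Meta", "Google", "xAI", "NVIDIA", "Microsoft"]

def covered_companies(recent_issue_texts, watchlist=WATCHLIST):
    """Return the watchlist companies mentioned in any recent issue text."""
    covered = set()
    for text in recent_issue_texts:
        lowered = text.lower()
        for company in watchlist:
            # Naive substring match; "Meta" would also match "metadata".
            if company.lower() in lowered:
                covered.add(company)
    return covered

def coverage_gaps(recent_issue_texts, watchlist=WATCHLIST):
    """Return watchlist companies not yet covered, in watchlist order."""
    covered = covered_companies(recent_issue_texts, watchlist)
    return [company for company in watchlist if company not in covered]

# Illustrative issue texts only; not the real contents of Issues #13-16.
recent = [
    "Issue #13: OpenAI and Anthropic announcements",
    "Issue #14: Meta model release",
    "Issue #15: Google DeepThink overview",
]
gaps = coverage_gaps(recent)
print("Search for:", ", ".join(gaps))
```

Running this before any video search gives the agent a list of search targets instead of letting it rediscover the overlap after the fact.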

2. Token Budget Not Considered Upfront

Problem: Full video analysis workflow requires:

  • Navigate to video
  • Expand description (click "...more")
  • Click "Show transcript"
  • Read transcript
  • Write 200-500 word summary
  • Extract references
  • Identify unverified claims

For 2 videos, this is substantial token usage, but the agent did not assess its token budget before committing to full analysis.

Current token usage this run: ~109,000 / 200,000 (54%)

Recommendation:

  • Check token budget at start of run
  • If < 100K tokens remaining, consider:
    • Simplified collection format
    • Single video instead of 2
    • Note-taking mode rather than full transcript analysis
  • Plan token allocation: ~40K per video for full analysis
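
One possible way to encode this budget check is sketched below. The 100K threshold and the ~40K per-video cost come from the recommendation above; the `Plan` type, the `plan_collection` name, and the exact policy (drop to one video below the threshold, fall back to note-taking when not even one full analysis fits) are assumptions for illustration.

```python
from dataclasses import dataclass

FULL_ANALYSIS_COST = 40_000       # ~40K tokens per video (from this issue)
REDUCED_PLAN_THRESHOLD = 100_000  # below this, plan a reduced collection

@dataclass(frozen=True)
class Plan:
    videos: int
    mode: str  # "full" or "notes" (hypothetical mode names)

def plan_collection(remaining_tokens: int, target_videos: int = 2) -> Plan:
    """Pick a video count and analysis mode that fits the remaining budget."""
    affordable = remaining_tokens // FULL_ANALYSIS_COST
    if remaining_tokens >= REDUCED_PLAN_THRESHOLD and affordable >= target_videos:
        return Plan(target_videos, "full")
    if affordable >= 1:
        # Tight budget: one fully analyzed video instead of two.
        return Plan(1, "full")
    # Not enough for even one full analysis: note-taking mode.
    return Plan(1, "notes")

# 200,000 - 109,000 = 91,000 tokens remained in the run described above.
print(plan_collection(91_000))
```

With the numbers from this run, the sketch would have chosen a single fully analyzed video rather than committing to two.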

3. Search Strategy Not Adaptive

Problem: The agent kept using the same search pattern even after discovering that the "Today" filter wasn't working:

  • Multiple different search terms with the same "Today" filter
  • Each search returned results dominated by the same event (India AI Summit)
  • Took many attempts before trying the "This week" filter

Recommendation:

  • After 2-3 searches with the same filter return results dominated by the same event, switch strategies:
    • Try different time windows (This week, This month)
    • Try more specific terms (CEO names, company-specific)
    • Try different content sources (specific channel names)
  • Adaptive search: "Today filter dominated by Event X. Trying This week filter..." rather than repeating failed strategy
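
The switching rule above could look roughly like the following. This is a sketch under assumptions: `search` is a hypothetical callable that returns one topic label per result, "dominated" is defined here as a single topic making up at least 60% of the results, and only the time-window fallback is modeled (not CEO names or channel-specific searches).

```python
from collections import Counter

TIME_FILTERS = ["Today", "This week", "This month"]

def is_dominated(result_topics, threshold=0.6):
    """True when one topic accounts for at least `threshold` of the results."""
    if not result_topics:
        return True  # an empty result set is also a dead end
    top_count = Counter(result_topics).most_common(1)[0][1]
    return top_count / len(result_topics) >= threshold

def adaptive_search(search, terms, max_dominated=2):
    """Try terms under each time filter; widen the window after repeated dead ends."""
    log = []  # documents every (term, filter) pair attempted
    for time_filter in TIME_FILTERS:
        dominated = 0
        for term in terms:
            topics = search(term, time_filter)
            log.append((term, time_filter))
            if not is_dominated(topics):
                return time_filter, term, log
            dominated += 1
            if dominated >= max_dominated:
                break  # stop repeating the failed filter
    return None, None, log

# Stand-in for a real search, mimicking the run described in this issue.
def fake_search(term, time_filter):
    if time_filter == "Today":
        return ["India AI Summit"] * 5
    return ["xAI", "NVIDIA", "Microsoft", "India AI Summit"]

time_filter, term, log = adaptive_search(fake_search, ["AI news", "xAI Grok", "NVIDIA AI"])
print(time_filter, term, log)
```

The returned `log` doubles as the record of search terms attempted, which a failed or partial run can include in its report.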

4. No Graceful Degradation Path

Problem: When the ideal workflow (2 videos, full analysis, 48-hour window) wasn't feasible, the agent had no clear degradation path and had to ask the user for guidance.

Recommendation: Define fallback options in advance:

  • Tier 1 (ideal): 2 videos, full analysis, 48-hour window
  • Tier 2 (good): 2 videos, full analysis, 72-96 hour window
  • Tier 3 (acceptable): 1-2 videos, simplified summaries, this week
  • Tier 4 (minimal): Partial collection noting videos found, workflow issues
  • Tier 5 (failed run): No qualifying videos, document search attempts

Agent should auto-detect which tier is feasible and proceed accordingly, only asking user for clarification at decision points.
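
Tier auto-detection could be sketched as a simple cascade. The tier definitions follow the list above; the function signature, the cumulative video counts per window, and the two budget flags are assumptions introduced for this example.

```python
from enum import IntEnum

class Tier(IntEnum):
    IDEAL = 1       # 2 videos, full analysis, 48-hour window
    GOOD = 2        # 2 videos, full analysis, 72-96 hour window
    ACCEPTABLE = 3  # 1-2 videos, simplified summaries, this week
    MINIMAL = 4     # partial collection noting videos found, workflow issues
    FAILED = 5      # no qualifying videos, document search attempts

def select_tier(found_48h, found_96h, found_week, full_ok, simplified_ok=True):
    """Pick the best feasible tier.

    Counts are cumulative (found_96h includes found_48h, and so on).
    full_ok / simplified_ok say whether the token budget allows that mode.
    """
    if full_ok and found_48h >= 2:
        return Tier.IDEAL
    if full_ok and found_96h >= 2:
        return Tier.GOOD
    if simplified_ok and found_week >= 1:
        return Tier.ACCEPTABLE
    if found_week >= 1:
        return Tier.MINIMAL
    return Tier.FAILED

# Two qualifying videos exist, but only within the 96-hour window.
print(select_tier(found_48h=0, found_96h=2, found_week=3, full_ok=True).name)
```

Because each check is strictly weaker than the one before it, the agent always lands on some tier and only needs to involve the user when the inputs themselves are uncertain.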

5. Workflow Steps Not in Optimal Order

Current order:

  1. Review recent issues (done late, after wasted searches)
  2. Search for videos
  3. Analyze videos
  4. Create issue
  5. File problems (if any)

Recommended order:

  1. Check token budget
  2. Review recent issues FIRST (extract already-covered topics)
  3. Plan search strategy based on gaps
  4. Search for videos with adaptive strategy
  5. Select videos
  6. Assess if full analysis is feasible given tokens
  7. Analyze videos (full or simplified based on budget)
  8. Create collection issue
  9. File improvement issues for observed problems

Implementation Suggestions

Add to Collector Instructions:

**Before searching for videos:**
1. Check token budget - if < 100K remaining, plan for simplified collection
2. Review last 3 issues in collector label - note companies and topics covered
3. Identify gaps: which major companies (OpenAI, Google, Anthropic, Meta, xAI, NVIDIA, Microsoft) haven't been covered recently?
4. Plan search strategy to fill gaps

**During video search:**
- If 2-3 searches with same parameters yield dominated/duplicate results, switch time window or search approach
- Document search terms attempted

**Before full video analysis:**
- Confirm token budget supports full analysis for selected number of videos
- If not, use simplified format or reduce video count

Add Fallback Tiers to Instructions:

Include the tier system above so agent knows when to adapt and doesn't need to ask user for guidance at each decision point.

Benefits of These Changes

  1. Reduced wasted effort: Early deduplication prevents searching for content that won't qualify
  2. Better token efficiency: Planning token usage upfront prevents running out mid-analysis
  3. Faster adaptation: Recognizing dead-end search strategies quickly and pivoting
  4. More autonomous operation: Clear fallback paths mean agent can handle sub-ideal conditions without user intervention
  5. Consistent quality: Even partial runs provide value by documenting what was attempted and why it didn't work

Example Improved Flow

1. Token check: 180K available - good for full analysis
2. Review issues #13-16: All from Feb 15, covered: OpenAI, Anthropic, Google, Meta. Gap: xAI, NVIDIA, Microsoft
3. Search strategy: Look for xAI/Grok, NVIDIA chips, Microsoft AI. Start with "This week" filter since today's content may overlap.
4. Search "xAI Grok news 2026" + This week filter
5. Found: Forbes on Grok market position (3 days ago) ✓
6. Search "NVIDIA AI 2026" + This week filter  
7. Found: Fortune on DeepMind CEO (4 days ago) ✓
8. Select 2 videos, different companies, good diversity
9. Proceed with full analysis - token budget sufficient
10. Create issue

Clean, efficient, no wasted motion.
