@irvingpop

Summary

Replaces manual session.json file management with Playwright's persistent browser context for more reliable LinkedIn authentication and session persistence.

Motivation

  • Re-authenticating was a hassle
  • LinkedIn security features were triggered more frequently

Changes

Core Implementation

  • New: PersistentBrowserManager class using launch_persistent_context()
  • Storage: Sessions now persist in ~/.linkedin-mcp/browser-profile/ directory
  • Automatic: Cookies and state persist automatically (no manual save/load)
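
For readers unfamiliar with persistent contexts, here is a minimal sketch of the underlying Playwright call. It is not the PR's PersistentBrowserManager; the helper names (resolve_profile_dir, open_linkedin) and the feed URL are assumptions made for illustration. Only launch_persistent_context() and the default profile path come from this PR.

```python
from pathlib import Path
from typing import Optional

# Default location named in this PR.
DEFAULT_PROFILE_DIR = Path.home() / ".linkedin-mcp" / "browser-profile"


def resolve_profile_dir(user_data_dir: Optional[str] = None) -> Path:
    """Return the profile directory, falling back to the default (hypothetical helper)."""
    if user_data_dir:
        return Path(user_data_dir).expanduser()
    return DEFAULT_PROFILE_DIR


async def open_linkedin(user_data_dir: Optional[str] = None) -> None:
    """Open LinkedIn in a context whose state lives on disk (hypothetical helper)."""
    # Imported lazily so this module still loads where Playwright is not installed.
    from playwright.async_api import async_playwright

    profile_dir = resolve_profile_dir(user_data_dir)
    profile_dir.mkdir(parents=True, exist_ok=True)

    async with async_playwright() as p:
        # The browser itself writes cookies and storage to profile_dir,
        # so there is no explicit save/load step.
        context = await p.chromium.launch_persistent_context(
            str(profile_dir), headless=True
        )
        # A persistent context usually opens with one page already present.
        page = context.pages[0] if context.pages else await context.new_page()
        await page.goto("https://www.linkedin.com/feed/")  # URL is an assumption
        await context.close()
```

Running it twice with `asyncio.run(open_linkedin())` should reuse the same profile, which is the behavior this change relies on.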

Migration

  • Automatic migration from legacy session.json on first run
  • Validates session still works before committing migration
  • Backs up old session to session.json.backup
  • Graceful fallback with clear error messages
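
The migration steps listed above could look roughly like the sketch below. It assumes the legacy session.json is in Playwright's storage-state format (a JSON object with a "cookies" list), which may not hold for every legacy file. The function names and the injected transfer/verify callables are hypothetical stand-ins; in the real code those steps are asynchronous browser calls.

```python
import json
from pathlib import Path
from typing import Callable, List


def needs_migration(legacy_file: Path) -> bool:
    """A legacy session.json on disk is the signal to migrate."""
    return legacy_file.is_file()


def load_legacy_cookies(legacy_file: Path) -> List[dict]:
    """Read cookies, assuming a Playwright storage-state style file."""
    state = json.loads(legacy_file.read_text(encoding="utf-8"))
    return state.get("cookies", [])


def migrate(
    legacy_file: Path,
    transfer: Callable[[List[dict]], None],
    verify: Callable[[], bool],
) -> bool:
    """Move cookies into the new profile; back up the old file only on success."""
    if not needs_migration(legacy_file):
        return False

    transfer(load_legacy_cookies(legacy_file))

    # Validate before committing: if the session is dead, leave session.json
    # in place so the user can fall back to re-authenticating.
    if not verify():
        return False

    backup = legacy_file.with_name(legacy_file.name + ".backup")
    legacy_file.replace(backup)  # session.json -> session.json.backup
    return True
```

Renaming rather than copying means the legacy file no longer triggers migration on the next start.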

Configuration

  • Added --user-data-dir CLI option for custom profile locations
  • Fixed default path confusion (was incorrectly session.json, now browser-profile)
  • Updated all documentation (README, AGENTS.md)
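
As an illustration of the new option, this is a sketch of how a --user-data-dir flag with the browser-profile default could be declared. It uses argparse as an assumption; the project's actual CLI wiring may differ, and build_parser is a hypothetical name.

```python
import argparse
from pathlib import Path

DEFAULT_PROFILE_DIR = Path.home() / ".linkedin-mcp" / "browser-profile"


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing only the option discussed here (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--user-data-dir",
        type=Path,
        default=DEFAULT_PROFILE_DIR,
        help="Directory holding the persistent browser profile",
    )
    return parser
```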

Breaking Changes

This is a breaking change (v3.0.0):

  • Session location changed: ~/.linkedin-mcp/session.json → ~/.linkedin-mcp/browser-profile/
  • Migration is automatic for existing users
  • Fallback: Run --get-session to re-authenticate if migration fails

Benefits

  1. More reliable: Cookies persist like a real browser
  2. Less maintenance: No manual save/load cycles
  3. Better Docker support: Standard volume mount pattern
  4. LinkedIn-friendly: Behaves more like normal browser usage (fewer CAPTCHAs)

Testing

  • ✅ All existing tests updated and passing
  • ✅ Migration logic tested with mock legacy sessions
  • ✅ Fresh install flow verified
  • ✅ Session persistence verified across restarts

Verification Checklist

  • Fresh install: --get-session creates profile
  • Persistence: No re-authentication needed on restart
  • Session info: --session-info reports correct status
  • Clear session: --clear-session removes profile
  • Migration: Legacy session.json migrates successfully
  • Documentation: README, AGENTS.md updated

Migration Guide for Users

Existing users (v2.x → v3.0):
Migration is automatic! On first run with v3.0, the server will:

  1. Detect your existing session.json
  2. Migrate cookies to new browser profile
  3. Verify the session still works
  4. Back up the old session to session.json.backup

Btw, I'm happy to go with whatever version numbering you want here.

@greptile-apps

greptile-apps bot commented Jan 27, 2026

Greptile Overview

Greptile Summary

Replaces manual session.json file management with Playwright's persistent browser context (launch_persistent_context) for automatic session persistence. This is a breaking change (v2.3.0 → v3.0.0) that migrates from ~/.linkedin-mcp/session.json to ~/.linkedin-mcp/browser-profile/ directory.

Key Changes:

  • New PersistentBrowserManager class handles browser lifecycle with automatic state persistence
  • Automatic migration from legacy session.json on first run with validation and backup
  • Added --user-data-dir CLI option for custom profile locations
  • Updated all documentation and tests to reflect new architecture
  • Proper error handling for profile corruption and concurrent access

Benefits:

  • More reliable session persistence (cookies persist like a real browser)
  • Eliminates manual save/load cycles
  • Better Docker support with standard volume mount pattern
  • Reduces LinkedIn security triggers (behaves more like normal browser usage)

Migration:
The migration flow automatically detects legacy session.json, validates the session is still active, transfers cookies to the new persistent context, verifies login status, and backs up the old file. Users with existing sessions will be automatically migrated on first run of v3.0.0.

Confidence Score: 4.5/5

  • Safe to merge with minimal risk - well-tested breaking change with automatic migration
  • Score reflects solid implementation with comprehensive migration logic, thorough test coverage, and proper documentation updates. Minor uncertainty around BrowserManager.context interface compatibility during migration, but overall architecture is sound and follows Playwright best practices.
  • Check that linkedin_scraper's BrowserManager has a .context property before merging (used in migration at browser.py:266)
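
One way to make the migration robust to this concern is to fail with a clear message when the attribute is missing, rather than with an AttributeError deep inside the migration. The sketch below assumes nothing about linkedin_scraper beyond the attribute name mentioned here; require_context is a hypothetical helper.

```python
from typing import Any


def require_context(browser_manager: Any) -> Any:
    """Return browser_manager.context or raise a descriptive error."""
    context = getattr(browser_manager, "context", None)
    if context is None:
        raise RuntimeError(
            f"{type(browser_manager).__name__} exposes no usable 'context'; "
            "cannot read the legacy storage state. Re-authenticate instead."
        )
    return context
```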

Important Files Changed

Filename | Overview
linkedin_mcp_server/drivers/persistent_browser.py | New class implementing Playwright's persistent browser context with proper error handling for corruption and concurrent access
linkedin_mcp_server/drivers/browser.py | Replaced BrowserManager with PersistentBrowserManager, added migration logic from legacy session.json format
linkedin_mcp_server/authentication.py | Updated to use profile_exists() instead of session_exists(); clear_session now removes a directory instead of a file
linkedin_mcp_server/cli_main.py | Added automatic migration flow from legacy session.json on startup with user feedback
linkedin_mcp_server/setup.py | Updated to create persistent browser context instead of saving session.json manually
linkedin_mcp_server/config/schema.py | Added user_data_dir config field with validation to ensure parent directory is creatable
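
To make the directory-based checks above concrete, here is a sketch of what profile_exists(), clear_session() and a "parent is creatable" validation could look like. The bodies are assumptions and not the PR's implementation; only the names profile_exists and clear_session come from the overview, and parent_is_creatable is hypothetical.

```python
import os
import shutil
from pathlib import Path


def profile_exists(profile_dir: Path) -> bool:
    """Treat a non-empty directory as an existing profile."""
    return profile_dir.is_dir() and any(profile_dir.iterdir())


def clear_session(profile_dir: Path) -> bool:
    """Remove the whole profile directory; return False if there was nothing to remove."""
    if not profile_dir.exists():
        return False
    shutil.rmtree(profile_dir)
    return True


def parent_is_creatable(profile_dir: Path) -> bool:
    """Check that the nearest existing ancestor is a writable directory."""
    ancestor = profile_dir.expanduser().resolve().parent
    # Walk upward until something exists; the filesystem root always does.
    while not ancestor.exists():
        ancestor = ancestor.parent
    return ancestor.is_dir() and os.access(ancestor, os.W_OK)
```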

Sequence Diagram

sequenceDiagram
    participant User
    participant CLI as cli_main.py
    participant Browser as browser.py
    participant Persistent as PersistentBrowserManager
    participant Playwright

    User->>CLI: Start server
    CLI->>CLI: Check needs_migration()
    alt Legacy session exists
        CLI->>Browser: migrate_from_legacy_session()
        Browser->>Browser: Load legacy BrowserManager
        Browser->>Browser: Extract storage state
        Browser->>Persistent: Create new context
        Browser->>Persistent: Transfer state
        Browser->>Browser: Verify login
        Browser-->>CLI: Migration successful
    end
    
    CLI->>Browser: get_or_create_browser()
    Browser->>Persistent: Initialize with user_data_dir
    Browser->>Persistent: start()
    Persistent->>Playwright: Start playwright
    Persistent->>Playwright: Launch persistent context
    Note over Playwright: State persists automatically
    Playwright-->>Persistent: BrowserContext with Page
    Persistent-->>Browser: PersistentBrowserManager
    
    Browser->>Browser: Navigate to LinkedIn
    Browser->>Browser: Verify authentication
    Browser-->>CLI: Authenticated browser
    
    CLI->>CLI: Start FastMCP server
    Note over CLI: Tools use singleton browser
    
    User->>CLI: Shutdown
    CLI->>Browser: close_browser()
    Browser->>Persistent: close()
    Persistent->>Playwright: Close context
    Persistent->>Playwright: Stop playwright
    Note over Persistent: Session persisted
@greptile-apps greptile-apps bot left a comment

1 file reviewed, 1 comment

await persistent.start()

# Copy cookies from old session to new persistent context
storage_state = await temp_browser.context.storage_state()

Verify BrowserManager.context property exists - this relies on an undocumented interface from linkedin_scraper

Author

fixed

…ement

Replace manual session.json file management with Playwright's persistent
browser context. Sessions now persist automatically in browser profile
directory, eliminating need for manual save/load cycles.

**Major Changes:**
- Add PersistentBrowserManager using launch_persistent_context()
- Change session storage: session.json file → browser-profile/ directory
- Add automatic migration for existing session.json users
- Update configuration with --user-data-dir option
- Fix CLI default path (session.json → browser-profile)

**Breaking Changes:**
- Session location changed from ~/.linkedin-mcp/session.json to
  ~/.linkedin-mcp/browser-profile/
- Automatic migration provided for existing users
- Version bumped to 3.0.0

**Benefits:**
- More reliable cookie persistence (behaves like real browser)
- No manual save/load cycles needed
- Better Docker support with standard volume mount pattern
- More LinkedIn-friendly (reduces CAPTCHA triggers)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@stickerdaniel
Owner

Hey, can you explain the re-authenticating issues you had? The session management is implemented in the upstream scraper; maybe create an issue there suggesting the use of Playwright's persistent browser context.

@stickerdaniel stickerdaniel marked this pull request as draft January 27, 2026 19:13
@irvingpop
Author

Hey Daniel, thanks for the quick response! Maybe I got caught in a weird moment: I was getting bitten by the is_logged_in() issue and had to re-authenticate every time, which was getting rather tedious.

But that had me thinking: the session IDs won't last forever, and the is_logged_in() detection is bound to break in the future because it is inherently fragile. So why not make it a little bit easier on myself and others by reusing the same browser session, rather than a fresh one every time?

I do agree this could be implemented more elegantly in the upstream scraper, but that project had a really long queue of unreviewed PRs, and I wanted to verify this was even the right solution, so I implemented it here. Totally understand if you'd rather see it upstreamed. If so, I can work on that, but it'll be a much more circuitous route.

@stickerdaniel
Owner

I see where you're coming from, but I think the upstream PR backlog is mostly stale v2 code. My recent issues there were resolved quite fast.

@stickerdaniel
Owner

My main constraint is avoiding the maintenance burden of custom session management within this repository.

@irvingpop
Author

Fair point, and totally understandable. If I refactored this such that persistent context stuff went into the scraper library, would you accept a PR to utilize that?

@stickerdaniel
Owner

Yes absolutely

@irvingpop
Author

Upstream PR: joeyism/linkedin_scraper#270
