Author: Ming Liu (刘铭)
This project extracts conference contribution data from the SRF2025 conference PDF and creates an interactive web-based data explorer. The scraper processes the complete contributions.pdf file (357 pages) and extracts 278 conference papers with full metadata including titles, abstracts, authors, presenters, dates, and session information.
- srf2025_pdf_extractor.py - Main PDF processing script
- srf2025_data_explorer.html - Interactive web data explorer
- srf2025_data.js - External JavaScript data file (301KB)
- index.html - Project homepage with navigation
- requirements.txt - Python dependencies
- README.md - This documentation
- Python 3.7+
- PyPDF2 library
- Modern web browser with JavaScript enabled
- Ensure Python 3.7 or higher is installed
- Install dependencies:
  pip install -r requirements.txt
- Run the extraction script:
  python srf2025_pdf_extractor.py
Open index.html in your web browser to access:
- Project homepage with navigation
- Interactive data explorer with multiple filters
- Complete PDF text extraction (pages 2-357)
- 278 conference contributions processed
- Author and presenter information
- Session and track classifications
- Date/time information
- Abstract content
- Contribution codes and types
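The extraction steps listed above can be sketched in Python as follows. This is a minimal illustration, not the code in srf2025_pdf_extractor.py: it assumes the PyPDF2 2.x/3.x `PdfReader` API, and it assumes (hypothetically) that in the extracted text each contribution begins with its contribution code, such as MOA01, at the start of a line. The helper names and the regular expression are illustrative only.

```python
import re
from typing import Dict, List

# Hypothetical layout assumption: each contribution starts with a code such as
# "MOA01" (2-4 uppercase session letters + 2-3 digits) at the start of a line.
CODE_RE = re.compile(r"^(?P<session>[A-Z]{2,4})(?P<number>\d{2,3})\b", re.MULTILINE)


def pdf_page_indices(first_page: int, last_page: int) -> range:
    """Map a 1-based inclusive page range (e.g. pages 2-357) to 0-based indices."""
    return range(first_page - 1, last_page)


def extract_text(pdf_path: str, first_page: int = 2, last_page: int = 357) -> str:
    """Concatenate the text of the selected pages using PyPDF2's PdfReader."""
    from PyPDF2 import PdfReader  # imported here so the parser below works without PyPDF2

    reader = PdfReader(pdf_path)
    last_page = min(last_page, len(reader.pages))  # do not run past the end of the file
    texts = [reader.pages[i].extract_text() or "" for i in pdf_page_indices(first_page, last_page)]
    return "\n".join(texts)


def split_contributions(text: str) -> List[Dict[str, str]]:
    """Split raw text into one record per contribution code (assumed layout)."""
    matches = list(CODE_RE.finditer(text))
    records = []
    for i, match in enumerate(matches):
        # Each record runs from the end of its code to the start of the next code.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        records.append({
            "contribution_code": match.group(0),
            "session_id": match.group("session"),  # assumed: "MOA" from "MOA01"
            "text": text[match.end():end].strip(),
        })
    return records
```

A call such as `split_contributions(extract_text("contributions.pdf"))` would return one record per detected code; inspect the raw text of a few pages first, since the real PDF layout may need a different pattern.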
- Search: Full-text search across titles, codes, and abstracts
- Filters:
  - Session filter (16 different sessions)
  - Type filter (Oral, Poster, Student presentations)
  - Author filter (278 unique authors)
  - Presenter filter (278 unique presenters)
  - Date filter (conference dates)
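The search and filter behavior described above runs in JavaScript inside srf2025_data_explorer.html. The Python sketch below only illustrates the same idea on paper records from the JSON index: a case-insensitive full-text search over title, code, and abstract, combined with any number of field filters that must all match. Matching filters by case-insensitive substring, and filtering by author through the `footnotes` field, are assumptions made for this example, not a description of the explorer's actual code.

```python
from typing import Dict, List


def matches_search(paper: Dict[str, str], query: str) -> bool:
    """Case-insensitive full-text search over title, contribution code and abstract."""
    needle = query.strip().lower()
    if not needle:
        return True  # an empty search box matches everything
    haystack = " ".join(
        paper.get(key, "") for key in ("title", "contribution_code", "abstract")
    ).lower()
    return needle in haystack


def filter_papers(papers: List[Dict[str, str]], query: str = "", **filters: str) -> List[Dict[str, str]]:
    """Apply the search plus every non-empty filter (logical AND).

    Each keyword maps a record field to text it must contain, e.g.
    type="Oral" or footnotes="SAKAMOTO" (an assumed way to filter by author).
    """
    active = {field: value.lower() for field, value in filters.items() if value}
    return [
        paper
        for paper in papers
        if matches_search(paper, query)
        and all(value in str(paper.get(field, "")).lower() for field, value in active.items())
    ]


def filter_stats(filtered: List[Dict[str, str]], total: List[Dict[str, str]]) -> str:
    """Text for a 'filtered vs total' statistics display."""
    return f"Showing {len(filtered)} of {len(total)} contributions"
```

Because every active filter must match, adding a filter can only narrow the result, which is what makes combining "any number of filters" predictable.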
- Gradient backgrounds with particle animations
- Responsive design for mobile devices
- Smooth hover effects and transitions
- Card-based layout with expandable abstracts
- Real-time filter statistics
SRF2025_Data/
├── SRF2025_All_Contributions.csv # Complete CSV dataset
├── SRF2025_Complete_Index.json # JSON data index
└── SRF2025_Extraction_Report.txt # Processing statistics
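One way to produce a CSV file like SRF2025_All_Contributions.csv is to flatten the `sessions[*].papers` lists of the JSON index. The sketch below does this with Python's standard `csv` module; the column list is an assumption taken from the JSON field names documented in this README, and the real CSV may use different or additional columns.

```python
import csv
import io
from typing import Any, Dict

# Assumed column order, taken from the paper fields in the JSON index example.
FIELDS = ["contribution_id", "contribution_code", "type", "title",
          "date_time", "session", "abstract", "footnotes"]


def index_to_csv(index: Dict[str, Any]) -> str:
    """Flatten sessions[*].papers into CSV text with one row per contribution."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for session in index.get("sessions", []):
        for paper in session.get("papers", []):
            # Missing fields become empty cells instead of raising KeyError.
            writer.writerow({field: paper.get(field, "") for field in FIELDS})
    return buffer.getvalue()
```

When writing the result to disk, open the file with `newline=""` and `encoding="utf-8"` so the `csv` module's line endings are not translated and non-ASCII author names survive.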
- Real-time Search: Instant filtering as you type
- Multiple Filters: Combine any number of filters
- Statistics Display: Shows filtered vs total results
- Expandable Abstracts: Click "Read more" for full content
- Responsive Cards: Beautiful card layout with session tags
- Clear Filters: One-click reset functionality
- Project overview and navigation
- Author attribution
- Links to data explorer and GitHub
- Modern gradient design
{
"scrape_info": {
"extraction_time": "2025-09-29 20:29:00",
"total_contributions": 278,
"sessions_processed": 16
},
"sessions": [
{
"session_info": {"id": "MOA", "name": "Monday Opening and Awards"},
"papers": [
{
"contribution_id": "2",
"contribution_code": "MOA01",
"type": "Invited Oral Presentation",
"title": "5 year operation of RIKEN super-conducting linac",
"date_time": "Monday, September 22, 2025 8:30 AM",
"abstract": "...",
"footnotes": "Author: SAKAMOTO, Naruhiko...",
"session": "Monday Opening and Awards"
}
]
}
]
}
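Because `scrape_info` records the expected totals, a loaded index can be cross-checked against its own contents. The sketch below relies only on the field names shown in the example structure above; the function names are hypothetical and not part of the project.

```python
import json
from collections import Counter
from typing import Any, Dict, List


def papers_per_session(index: Dict[str, Any]) -> Counter:
    """Count papers in each session, keyed by session_info.id (e.g. "MOA")."""
    counts: Counter = Counter()
    for session in index["sessions"]:
        counts[session["session_info"]["id"]] += len(session["papers"])
    return counts


def check_index(index: Dict[str, Any]) -> List[str]:
    """Return inconsistencies between scrape_info and the actual data."""
    info = index["scrape_info"]
    found = sum(papers_per_session(index).values())
    problems = []
    if found != info["total_contributions"]:
        problems.append(
            f"total_contributions is {info['total_contributions']}, but {found} papers were found"
        )
    if len(index["sessions"]) != info["sessions_processed"]:
        problems.append(
            f"sessions_processed is {info['sessions_processed']}, "
            f"but {len(index['sessions'])} sessions were found"
        )
    return problems


def load_index(path: str) -> Dict[str, Any]:
    """Load the JSON index as UTF-8, since names may contain non-ASCII text."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

For a consistent file, `check_index(load_index("SRF2025_Data/SRF2025_Complete_Index.json"))` should return an empty list.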
- Session Filter: 16 sessions (MOA, MOP, TUA, etc.)
- Type Filter: Oral, Poster, Student presentations
- Author Filter: All contributing authors
- Presenter Filter: All presenters
- Date Filter: Conference dates
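The Date filter needs calendar dates, while the index stores `date_time` as text such as "Monday, September 22, 2025 8:30 AM". A minimal sketch of deriving filter options, assuming every value follows that exact pattern and that Python runs with its default C locale so `%A` and `%B` match English day and month names. Values that do not parse are skipped rather than guessed. The helper names are hypothetical.

```python
from datetime import date, datetime
from typing import Dict, List

# Pattern of the example value "Monday, September 22, 2025 8:30 AM".
DATE_FORMAT = "%A, %B %d, %Y %I:%M %p"


def parse_date_time(value: str) -> datetime:
    """Parse a date_time string; raises ValueError if it does not match."""
    return datetime.strptime(value.strip(), DATE_FORMAT)


def date_filter_options(papers: List[Dict[str, str]]) -> List[date]:
    """Sorted, de-duplicated conference dates for a date drop-down."""
    dates = set()
    for paper in papers:
        try:
            dates.add(parse_date_time(paper["date_time"]).date())
        except (KeyError, ValueError):
            continue  # missing or unrecognized value: leave it out of the options
    return sorted(dates)


def field_filter_options(papers: List[Dict[str, str]], field: str) -> List[str]:
    """Sorted unique non-empty values of a field, e.g. "session" or "type"."""
    return sorted({paper[field] for paper in papers if paper.get(field)})
```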
- Chrome 80+
- Firefox 75+
- Safari 13+
- Edge 80+
- Data file: 301KB (external loading)
- Initial load: <2 seconds
- Filter response: <100ms
- Memory usage: Minimal (client-side processing)
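The data is loaded as a separate script file rather than embedded in the HTML. A sketch of how such a file could be generated from the JSON index follows; the global name `SRF2025_DATA` is hypothetical, so check srf2025_data.js for the name the explorer actually reads. U+2028 and U+2029 are escaped because JavaScript engines predating ES2019 reject them inside string literals, even though they are valid in JSON.

```python
import json
from typing import Any


def to_js_data_file(data: Any, var_name: str = "SRF2025_DATA") -> str:
    """Serialize data as a JavaScript assignment for a <script src="..."> file.

    var_name is a hypothetical global name; it must match what the HTML expects.
    """
    # Compact separators keep the file small; ensure_ascii=False keeps names readable.
    payload = json.dumps(data, ensure_ascii=False, separators=(",", ":"))
    # Valid in JSON, but not in pre-ES2019 JavaScript string literals.
    payload = payload.replace("\u2028", "\\u2028").replace("\u2029", "\\u2029")
    return f"const {var_name} = {payload};\n"


def js_file_size_kb(text: str) -> float:
    """Size of the generated file in kilobytes when written as UTF-8."""
    return len(text.encode("utf-8")) / 1024
```

Write the result with `encoding="utf-8"` and include the data script in the HTML before the script that reads the variable.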
The web explorer is fully client-side and requires no server configuration. Simply open the HTML files in any modern web browser.
- Data Source: The original contributions.pdf must be present for extraction
- File Size: The JavaScript data file is 301KB; ensure sufficient bandwidth
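- Browser Security: Some browsers block JavaScript from loading local files when pages are opened via file://; if the data does not load, serve the folder from a local web server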
- Mobile Friendly: Responsive design works on all screen sizes
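Since all files are static, serving the folder over HTTP avoids browser restrictions on file:// pages. A minimal sketch using only Python's standard library (Python 3.7+, which added `ThreadingHTTPServer` and the handler's `directory` argument); the `make_server` name and port 8000 are illustrative choices.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(directory: str = ".", port: int = 8000) -> ThreadingHTTPServer:
    """Create (but do not start) a local static-file server for the project folder."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    # Bind to the loopback interface only, so the server is not exposed on the network.
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Calling `make_server(".").serve_forever()` and opening http://127.0.0.1:8000/index.html serves the explorer; stop it with Ctrl+C. The built-in command `python -m http.server 8000 --bind 127.0.0.1` does the same without any code.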
A: Place contributions.pdf in the same directory and run python srf2025_pdf_extractor.py
A: Ensure srf2025_data.js is in the same directory as the HTML files
A: Yes, all files are static and can be served from any web server
A: Edit the JavaScript in srf2025_data_explorer.html to add custom filters
A: Re-run the PDF extraction script to regenerate clean data files
If you encounter issues, please check:
- Verify Python 3.7+ is installed
- Check that all dependencies are installed
- Ensure contributions.pdf is accessible
- Open the browser console to check for JavaScript errors
- Check file permissions for data access
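Part of this checklist can be automated. The preflight sketch below checks the Python version, whether PyPDF2 is importable, whether contributions.pdf is readable, and whether srf2025_data.js is present; it assumes all files live in one project directory, as the FAQ answers describe. The function name is hypothetical, and browser console errors still have to be checked by hand.

```python
import importlib.util
import os
import sys
from typing import List


def preflight(project_dir: str = ".") -> List[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 7):
        problems.append("Python 3.7 or higher is required")
    # find_spec returns None when the package cannot be imported.
    if importlib.util.find_spec("PyPDF2") is None:
        problems.append("PyPDF2 is not installed: run pip install -r requirements.txt")
    pdf_path = os.path.join(project_dir, "contributions.pdf")
    if not os.access(pdf_path, os.R_OK):
        problems.append(f"{pdf_path} is missing or not readable")
    data_path = os.path.join(project_dir, "srf2025_data.js")
    if not os.path.isfile(data_path):
        problems.append(f"{data_path} not found next to the HTML files")
    return problems
```

Running `for problem in preflight(): print("-", problem)` from the project folder lists anything that needs fixing.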
This project is for academic and research purposes. Please respect copyright and conference data usage policies.
- Complete SRF2025 data extraction (278 contributions)
- Interactive web explorer with 5 filter categories
- Modern UI with animations and responsive design
- External data file for performance
- Comprehensive documentation
Ming Liu (刘铭)
- GitHub: @iuming
- Project: SRF2025 Data Scraper and Explorer
- SRF2025 Conference organizers
- PyPDF2 library developers
- Open source community