Feature: Head Office Location Detection #9

**TL;DR:**

To implement "Head Office Location Detection," complete the sub-issues in this order:
1. DuckDuckGo robots.txt compliance (#16)
2. Google-dorking/contact page scraping (#15)
3. Output format/validation (#13)
4. Manual lookup fallback (#14)
5. Config/docs update (#12)
6. Testing/QA (#11)

Compliance comes first, then extraction, output, manual fallback, docs/config, and finally QA.

---

**Feature Name**
Head Office Location Detection

**Is your feature request related to a problem? Please describe.**
Currently, Elvis does not identify or record the Head Office location of companies in the call list. This information would be valuable for users who need to know the main office location for outreach or reporting.

**Describe the solution you'd like**
Elvis should extract and include the Head Office location for each company, if available, during the scraping process. This could be an additional field in the output or a new column in the call list.

**Implementation details (optional)**
- Add logic to the extraction scripts (e.g., `data_input.sh`, `lib/*.awk`, `lib/*.sed`) to capture the Head Office location. Run multiple Google-dorking-style queries through [https://lite.duckduckgo.com/lite](https://lite.duckduckgo.com/lite) to find the company's head office address in the search results, then refine the result by visiting the company's own web pages (contact page) and business listings.
- If no result is found via automated methods, flag the entry with a tag such as "Manual lookup needed" to indicate that human intervention is required.
- Update the output format and validation scripts to support the new field and flag.
- Consider configuration options in `etc/elvisrc` for toggling this feature.
- Ensure all queries to DuckDuckGo Lite comply with [robots.txt](https://lite.duckduckgo.com/robots.txt).
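As a rough illustration of the fallback flag and the feature toggle described above, here is a minimal sketch. It assumes pipe-delimited call-list records and a toggle variable named `HEAD_OFFICE_DETECTION`; both are hypothetical, as is the helper name `add_head_office`, and none of them are existing Elvis conventions.

```shell
#!/bin/sh
# Hypothetical sketch: append a head-office field to one call-list record.
# Assumption: records are pipe-delimited; the real Elvis format may differ.

# HEAD_OFFICE_DETECTION is an assumed toggle name, not an existing elvisrc key.
: "${HEAD_OFFICE_DETECTION:=true}"

# add_head_office RECORD ADDRESS
# Prints RECORD with ADDRESS appended as a new field, or with the
# "Manual lookup needed" flag when ADDRESS is empty.
add_head_office() {
  record=$1
  address=$2
  if [ "$HEAD_OFFICE_DETECTION" != "true" ]; then
    # Feature disabled: pass the record through unchanged.
    printf '%s\n' "$record"
    return 0
  fi
  if [ -z "$address" ]; then
    # Automated lookup found nothing: flag for human follow-up.
    address='Manual lookup needed'
  fi
  printf '%s|%s\n' "$record" "$address"
}
```

Keeping the flag in the same column as the address means the validation scripts only need to learn about one new field; a separate flag column would be an equally reasonable choice.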

**Alternatives considered**
- Manual lookup of Head Office locations after scraping as a fallback when automated methods do not yield results (flagged as "Manual lookup needed").

**Additional context**
This request was received as feedback from a user via the GitHub MCP server.

**Compliance Check**
- [ ] I have reviewed the compliance settings in [etc/elvisrc](../../etc/elvisrc).
- [ ] I have read the [Security policy](../../SECURITY.md).
- [ ] I have read the [Contribution guidelines](../../CONTRIBUTING.md).
- [ ] This feature does not bypass robots.txt or violate any security guidelines.

**Project Board:**
https://github.com/users/2MuchC0ff33/projects/1

---

### Recommended Sub-Issue Completion Order

To implement the "Head Office Location Detection" feature efficiently, please address the sub-issues in the following order:

1. **#16: DuckDuckGo robots.txt compliance**
   - Ensure all scraping and data extraction methods are compliant with robots.txt and project compliance policies.
2. **#15: Google-dorking/contact page scraping**
   - Implement logic to extract head office/contact info from company websites using compliant search and scraping methods.
3. **#13: Output format/validation**
   - Define and validate the output format for head office location data in the call list.
4. **#14: Manual lookup fallback**
   - Add a manual lookup or override mechanism for cases where automated extraction fails.
5. **#12: Config/docs update**
   - Update configuration files and documentation to reflect new options, toggles, and compliance notes.
6. **#11: Testing/QA**
   - Implement and run tests to ensure all new logic is robust, compliant, and meets quality standards.

**Rationale:**
- Compliance must be established first to avoid rework and ensure all subsequent work is policy-aligned.
- Extraction logic should be implemented before output and manual fallback mechanisms.
- Output format and validation should be defined before integrating manual overrides.
- Documentation and configuration should be updated after core logic is in place.
- Testing and QA should be the final step to validate the complete feature.
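Because compliance (#16) comes first, a small pre-flight check could gate every request path before it is fetched. The sketch below is one possible approach, not the project's existing method: the function name `is_path_allowed` is hypothetical, it only reads the `User-agent: *` group, and it uses plain prefix matching (no `*` or `$` wildcard support), with the longest matching rule winning and `Allow` winning ties.

```shell
#!/bin/sh
# Hypothetical sketch: simplified robots.txt check for "User-agent: *".
# Usage: is_path_allowed PATH < robots.txt
# Exit status 0 if PATH is allowed, 1 if it is disallowed.
# Simplifications: prefix matching only, wildcards are not interpreted.
is_path_allowed() {
  awk -v path="$1" '
    {
      line = $0
      sub(/#.*/, "", line)            # drop comments
      sub(/\r$/, "", line)            # tolerate CRLF line endings
      idx = index(line, ":")
      if (idx == 0) next              # not a "field: value" line
      field = tolower(substr(line, 1, idx - 1))
      value = substr(line, idx + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", field)
      gsub(/^[ \t]+|[ \t]+$/, "", value)

      if (field == "user-agent") {
        if (prev_was_rule) in_star = 0   # a new group starts here
        if (value == "*") in_star = 1
        prev_was_rule = 0
      } else if (field == "allow" || field == "disallow") {
        prev_was_rule = 1
        len = length(value)
        if (in_star && len > 0 && index(path, value) == 1) {
          # Longest matching rule wins; Allow wins a tie.
          if (len > best_len || (len == best_len && field == "allow")) {
            best_len = len
            verdict = field
          }
        }
      }
    }
    END {
      if (verdict == "disallow") exit 1
      exit 0
    }
  '
}
```

A caller would fetch robots.txt once, cache it, and skip any query whose path makes `is_path_allowed` return non-zero. A production check would also need wildcard handling and agent-specific groups.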

---

Please follow this order to maximize efficiency and minimize rework. If dependencies or blockers arise, update this list accordingly.