Commit

Merge pull request #13 from Fundacio-i2CAT/10-integration-of-ollama-llm-container

Add ollama container for LLM capabilities and enhanced employee profiling
xampla authored Feb 7, 2024
2 parents 4d12bf5 + 7772390 commit 1c96186
Showing 16 changed files with 243 additions and 131 deletions.
6 changes: 5 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
@@ -1,8 +1,10 @@
# InfoHound - OSINT tool for domain profiling
During the reconnaissance phase, an attacker searches for any information about their target to build a profile that will later help identify possible ways to get into an organization. InfoHound performs passive analysis techniques (which do not interact directly with the target) using OSINT to extract a large amount of data given a web domain name. This tool will retrieve emails, people, files, subdomains, usernames and URLs that will later be analyzed to extract even more valuable information.

In addition, InfoHound uses an Ollama instance to run Large Language Models (LLMs). By default, InfoHound uses llama2, a pre-trained LLM, to generate brief descriptions of the roles of individuals within an organization. These descriptions give insight into the organizational structure and help identify key personnel. Additional features and enhancements are under development.

## :house: Infohound architecture
<p align="center"><img src="https://github.com/Fundacio-i2CAT/InfoHound/blob/main/infohound_diagram.jpg" alt="Infohound diagram" ></p>
<p align="center"><img src="https://github.com/Fundacio-i2CAT/InfoHound/blob/main/new_infohound_diagram.jpg" alt="Infohound diagram" ></p>

## 🛠️ Installation
```
@@ -38,6 +40,7 @@ InfoHound has 2 different types of modules, those which retreives data and those
| Find Emails From URLs | Sometimes, the discovered URLs can contain sensitive information. This task retrieves all the emails from URL paths. |
| Execute Dorks | It will execute the dorks defined in the dorks folder. Remember to group the dorks by categories (filename) to understand their objectives. |
| Find Emails From Dorks | By default, InfoHound has some dorks defined to discover emails. This task will look for them in the results obtained from dork execution. |
| Find People From Google | Uses the Google Custom Search JSON API to find people who work at the company associated with the domain. |

### :microscope: Analysis
| Name | Description |
@@ -51,6 +54,7 @@ InfoHound has 2 different types of modules, those which retreives data and those
| Get Emails From Files Content | Usually, emails can be included in corporate files, so this task will retrieve all the emails from the downloaded files' content. |
| Find Registered Services using Emails | It is possible to find services or social networks where an email has been used to create an account. This task will check if an email InfoHound has discovered has an account in Twitter, Adobe, Facebook, Imgur, Mewe, Parler, Rumble, Snapchat, Wordpress, and/or Duolingo. |
| Check Breach | This task checks Firefox Monitor service to see if an email has been found in a data breach. Although it is a free service, it has a limitation of 10 queries per day. If Leak-Lookup API key is set, it also checks it. |
| AI-Powered Profile Analysis | Uses an AI-powered tool to examine the metadata collected for each person and create a description of their role. |

## :pill: Custom modules
InfoHound lets you create custom modules: you just need to add your script inside `infohound/tool/custom_modules`. One custom module has been added as an example. It uses the [Holehe](https://github.com/megadose/holehe) tool to check whether the previously discovered emails are attached to an account on sites like Twitter, Instagram, Imgur and more than 120 others.
20 changes: 20 additions & 0 deletions docker-compose.yml
@@ -12,13 +12,15 @@ services:
- db
- redis
- celery_worker
- ollama
environment:
- POSTGRES_USER=postgres
- POSTGRES_PASSWORD=postgres
- POSTGRES_DB=infohound_db
- REDIS_HOST=redis
- REDIS_PORT=6379
command: sh -c "python manage.py makemigrations infohound && python manage.py migrate && python manage.py runserver 0.0.0.0:8000"

celery_worker:
build:
context: .
@@ -35,10 +37,12 @@ services:
- REDIS_HOST=redis
- REDIS_PORT=6379
command: sh -c "celery -A infohound_project worker --loglevel=info"

redis:
image: redis:latest
ports:
- '6378:6379'

db:
image: postgres:12
volumes:
@@ -47,5 +51,21 @@ services:
- POSTGRES_USER=postgres
- POSTGRES_PASSWORD=postgres
- POSTGRES_DB=infohound_db

ollama:
image: ollama/ollama:latest
ports:
- '11434:11434'
# Uncomment if you want to use GPU. More info: https://ollama.ai/blog/ollama-is-now-available-as-an-official-docker-image
#environment:
# - gpus=all
#deploy:
# resources:
# reservations:
# devices:
# - driver: nvidia
# count: 1
# capabilities: [gpu]
volumes:
postgres_data:
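
The `ollama` service above listens on port 11434, and `infohound/tool/ai_assistant/ollama.py` imports `OLLAMA_URL` and `OLLAMA_MODEL` from `infohound_project.settings`; the settings change itself is not shown in this part of the diff. A minimal sketch of how those two values could be resolved, assuming they are read from environment variables. The variable names, the defaults and the helper `resolve_ollama_settings` are hypothetical and not taken from the commit:

```python
import os

# Hypothetical defaults: inside the compose network a service is normally
# reachable by its service name ("ollama") on the container port 11434.
DEFAULT_OLLAMA_URL = "http://ollama:11434"
DEFAULT_OLLAMA_MODEL = "llama2"


def resolve_ollama_settings(env=None):
    """Return (url, model), letting environment variables override the defaults."""
    env = os.environ if env is None else env
    # Strip a trailing slash so the URL has one canonical form.
    url = env.get("OLLAMA_URL", DEFAULT_OLLAMA_URL).rstrip("/")
    model = env.get("OLLAMA_MODEL", DEFAULT_OLLAMA_MODEL)
    return url, model


if __name__ == "__main__":
    print(resolve_ollama_settings({}))
```

When the web application runs outside the compose network, the published port mapping `'11434:11434'` suggests `http://localhost:11434` would be the value to set instead.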

5 changes: 4 additions & 1 deletion infohound/models.py
@@ -9,8 +9,11 @@ class Domain(models.Model):

class People(models.Model):
name = models.CharField(max_length=255)
phones = models.JSONField(default=list, null=True)
phones = models.JSONField(null=True,default=list)
social_profiles = models.JSONField(default=list)
ocupation_summary = models.TextField(default="This profile doesn't have a description yet. You can use the profile analysis task to employ an AI-powered tool that examines the metadata and creates a description for you.")
raw_metadata = models.TextField(null=True,default=None)
url_img = models.TextField(default="https://static.thenounproject.com/png/994628-200.png")
source = models.CharField(max_length=255)
domain = models.ForeignKey(Domain, on_delete=models.CASCADE)

76 changes: 40 additions & 36 deletions infohound/static/infohound/js/index.js
@@ -186,34 +186,34 @@ function loadPeople() {
person_name = person.name.length == 0 ? "[Not found]" : person_name

card.innerHTML = `
<div class="row g-0">
<div class="col-md-4 d-flex align-items-center justify-content-center">
<svg xmlns="http://www.w3.org/2000/svg" width="80%" height="80%" fill="currentColor" class="bi bi-person-circle" viewBox="0 0 16 16">
<path d="M11 6a3 3 0 1 1-6 0 3 3 0 0 1 6 0z"/>
<path fill-rule="evenodd" d="M0 8a8 8 0 1 1 16 0A8 8 0 0 1 0 8zm8-7a7 7 0 0 0-5.468 11.37C3.242 11.226 4.805 10 8 10s4.757 1.225 5.468 2.37A7 7 0 0 0 8 1z"/>
</svg>
<div class="card-body">
<div class="row">
<div class="col-md-3 p-1">
<img src="${person.url_img}" class="img-fluid float-left">
</div>
<div class="col-md-9">
<h5 class="card-title">${person_name}</h5>
<div class="d-flex align-items-center mb-2">
<i class="bi bi-envelope-fill me-2"></i>
<span class="me-3">${person.emails}</span>
<i class="bi bi-telephone-fill me-2"></i>
<span class="me-3">${person.phones}</span>
<i class="bi bi-key-fill me-2"></i>
<span class="me-3">${person.keys}</span>
<i class="bi bi-person-fill me-2"></i>
<span>${person.accounts}</span>
</div>
<div class="col-md-8">
<div class="card-body">
<h5 class="card-title">${person_name}</h5>
<div class="d-flex align-items-center mb-2">
<i class="bi bi-envelope-fill me-2"></i>
<span class="me-3">${person.emails}</span>
<i class="bi bi-telephone-fill me-2"></i>
<span class="me-3">${person.phones}</span>
<i class="bi bi-key-fill me-2"></i>
<span class="me-3">${person.keys}</span>
<i class="bi bi-person-fill me-2"></i>
<span>${person.accounts}</span>
</div>
<hr>
<div class="d-flex align-items-center justify-content-center">
${socialIcons}
</div>
<div class="personID d-none">${person.id}</div>
</div>
<div class="col-md-12">
<small>${person.ocupation_summary}</small>
</div>
<hr>
<div class="d-flex align-items-center justify-content-center">
${socialIcons}
</div>
<div class="personID d-none">${person.id}</div>
</div>
</div>
</div>
`;
col.appendChild(card)
cardContainer.append(col);
@@ -310,18 +310,14 @@ function loadTasks() {
"findEmailsTask", "findEmailsFromURLsTask", "findSocialProfilesByEmailTask"]
data.forEach(task => {
const card = document.createElement('div');
card.className = 'card shadow mb-3';
card.className = 'col-md-4 p-3';
b = `
<div class="col-1 d-flex justify-content-center align-items-center">
<button id="${task.id}" type="button" class="btn btn-primary task-executer">Execute</button>
</div>
<button id="${task.id}" type="button" class="btn btn-primary task-executer">Execute</button>
`;
pb = "";
if(task.state == "PENDING") {
b = `
<div class="col-1 d-flex justify-content-center align-items-center">
<button type="button" class="btn btn-info" disabled>${task.state}</button>
</div>
<button type="button" class="btn btn-info" disabled>${task.state}</button>
`
pb = `
<div class="progress" role="progressbar" aria-label="Animated striped example" aria-valuenow="100" aria-valuemin="0" aria-valuemax="100">
@@ -363,16 +359,24 @@ function loadTasks() {
}

card.innerHTML = `
<div class="card-body">
<div class="card shadow h-100">
<div class="card-body d-flex flex-column">
<div class="row">
<div class="col-11">
<div class="col-md-12">
${h5}
<p class="card-text">${task.description}</p>
${pb}
</div>
${b}
</div>
<div class="row flex-fill">
<div class="col-md-12 d-flex justify-content-end align-items-end">
${b}
</div>
</div>
<div class="col-md-12 pt-1">
${pb}
</div>
</div>
</div>
`;
if (task.type == "Retrieve") {
taskRetrievalContainer.appendChild(card);
18 changes: 15 additions & 3 deletions infohound/tasks.py
@@ -1,9 +1,12 @@
from infohound.tool.retriever_modules import domains,subdomains,urls,files,emails,people,dorks
from infohound.tool.analysis_modules import domain_analysis,email_analysis,files_analysis,usernames_analysis
from infohound.tool.analysis_modules import domain_analysis,email_analysis,files_analysis,usernames_analysis,people_analisys
from celery import shared_task
import trio
import importlib

# ------------------------------------- #
# ------------- RETRIEVAL ------------- #
# ------------------------------------- #

@shared_task(bind=True, name="get_whois_info")
def getWhoisInfoTask(self, domain):
@@ -50,9 +53,14 @@ def executeDorksTask(self, domain):
def findEmailsFromDorksTask(self, domain):
emails.findEmailsFromDorks(domain)

@shared_task(bind=True, name="find_people_from_google")
def findPeopleFromGoogleTask(self, domain):
people.findPeopleFromGoogle(domain)


# -------------ANALYSIS-------------
# ------------------------------------- #
# ------------- ANALYSIS -------------- #
# ------------------------------------- #

@shared_task(bind=True, name="subdomain_take_over_analysis")
def subdomainTakeOverAnalysisTask(self, domain):
domain_analysis.subdomainTakeOverAnalysis(domain)
@@ -89,6 +97,10 @@ def findRegisteredSitesTask(self, domain):
def checkBreachTask(self, domain):
email_analysis.checkBreach(domain)

@shared_task(bind=True, name="summarize_profile")
def summarize_profile(self, domain):
people_analisys.summarize_profile(domain)

# --------------CUSTOM--------------

@shared_task(bind=True, name="custom_task")
24 changes: 24 additions & 0 deletions infohound/tool/ai_assistant/ollama.py
@@ -0,0 +1,24 @@
from ollama import Client
from infohound_project.settings import OLLAMA_URL,OLLAMA_MODEL

def check_or_pull_model(client):
    models = client.list()
    present = False
    for model in models["models"]:
        if OLLAMA_MODEL == model["name"].split(":")[0]:
            present = True
    if not present:
        client.pull(OLLAMA_MODEL)

def ollama_flexible_prompt(in_prompt):
    client = Client(host=OLLAMA_URL)
    check_or_pull_model(client)
    desc = None
    try:
        res = client.generate(model=OLLAMA_MODEL,prompt=in_prompt)
    except Exception as e:
        print(f"Could not call Ollama instance: {e}")
        return desc
    if "response" in res:
        desc = res["response"].strip()
    return desc
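
`check_or_pull_model` compares `OLLAMA_MODEL` with the part of each installed model name that precedes the colon, so a configured `llama2` matches an installed `llama2:latest`. A small sketch of that comparison in isolation, assuming the installed models are available as a plain list of names. The helper `is_model_present` and the sample names are illustrative only and are not part of the commit:

```python
def is_model_present(wanted, installed_names):
    """True if any installed name, with its ":tag" suffix removed, equals wanted."""
    return any(name.split(":")[0] == wanted for name in installed_names)


if __name__ == "__main__":
    installed = ["llama2:latest", "mistral:7b"]  # made-up listing
    print(is_model_present("llama2", installed))
    print(is_model_present("llama2:13b", installed))
```

One consequence of this rule: a configured name that itself carries a tag, such as `llama2:13b`, never equals the stripped name, so the check would report the model as missing and request a pull on every call.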
17 changes: 17 additions & 0 deletions infohound/tool/analysis_modules/people_analisys.py
@@ -0,0 +1,17 @@
import time
from infohound.models import People
from infohound.tool.ai_assistant import ollama

def summarize_profile(domain_id):
    queryset = People.objects.filter(domain_id=domain_id, ocupation_summary__contains="This profile doesn't have a description yet")

    for entry in queryset.iterator():
        try:
            summarize_prompt = "Summarize the occupation of the person in just 150 words given the following data: "
            raw_data = entry.raw_metadata or ""
            print ("Executing AI-Powered Profile Analysis of: " + entry.name)
            entry.ocupation_summary = ollama.ollama_flexible_prompt(summarize_prompt + raw_data)
            print ("Summary: " +entry.ocupation_summary)
            entry.save()
        except Exception as e:
            print(f"Unexpected error: {str(e)}")
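
`summarize_profile` concatenates a fixed instruction with each person's `raw_metadata`, a field the model declares with `null=True`. A sketch of building that prompt defensively, assuming it is acceptable to skip people without metadata and to truncate very long metadata. The helper name `build_summary_prompt` and the `max_chars` limit are assumptions, not part of the commit:

```python
SUMMARIZE_PROMPT = (
    "Summarize the occupation of the person in just 150 words "
    "given the following data: "
)


def build_summary_prompt(raw_metadata, max_chars=4000):
    """Return the prompt text, or None when there is no metadata to summarize."""
    if not raw_metadata or not raw_metadata.strip():
        return None
    # Truncate so an unusually large search result does not produce a huge prompt.
    return SUMMARIZE_PROMPT + raw_metadata.strip()[:max_chars]
```

A caller could skip the Ollama request when the function returns `None`, leaving the default description in place so the person is picked up again on a later run.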
47 changes: 45 additions & 2 deletions infohound/tool/data_sources/google_data.py
@@ -1,6 +1,7 @@
import requests
import json
import html
import time
import urllib.parse
import infohound.tool.infohound_utils as infohound_utils
from bs4 import BeautifulSoup
@@ -50,6 +51,48 @@ def getUrls(query):
#- files
#- url

def discoverPeople (query):
    start = 1
    total_results = 0
    total_gathered = 0
    limit = False
    results = True
    people = []

    print("Testing query: " + query)

    while results and start < 100 and not limit:
        payload = {"key":API_KEY,"cx":ID,"start":start,"q":query}
        res = requests.get("https://www.googleapis.com/customsearch/v1",params=payload)
        data = json.loads(res.text)
        if "error" in data:
            print(data["error"]["status"])
            limit = True
        else:
            if start == 1:
                total_results = data["searchInformation"]["totalResults"]
            if "items" in data:
                for item in data["items"]:
                    try:
                        url = item["link"]
                        first_name = item["pagemap"]["metatags"][0]["profile:first_name"]
                        last_name = item["pagemap"]["metatags"][0]["profile:last_name"]
                        url_img = item["pagemap"]["cse_image"][0]["src"]
                        name = f"{first_name} {last_name}"
                        people.append((name,url,json.dumps(item),url_img))
                        print("Added: " + name)
                        total_gathered = total_gathered + 1
                    except KeyError as e:
                        print(f"Error: The key '{e.args[0]}' is not present in the results.")
                    except Exception as e:
                        print(f"Unexpected error: {str(e)}")
            else:
                results = False
            start = start + 10
            time.sleep(1)

    print("Found "+str(total_results)+" and added "+str(total_gathered))
    return (people)

def discoverEmails(domain):
emails = []
@@ -101,7 +144,7 @@ def discoverSocialMedia(domain,email):
scope = email.split("@")[1]

url = f"https://www.google.com/search?q='{username}' {scope}"
cookies = {"CONSENT": "YES+srp.gws"}
cookies = {"CONSENT": "YES+","SOCS":"CAISHAgCEhJnd3NfMjAyNDAxMzEtMF9SQzQaAmVzIAEaBgiAkIuuBg"}

try:
user_agent = infohound_utils.getUserAgents()
@@ -179,4 +222,4 @@ def discoverSocialMediaByDorks(domain,email):
return data
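
`discoverPeople` reads the name, profile URL and image of each Custom Search result from the keys used in this diff: `pagemap.metatags[0]["profile:first_name"]`, `pagemap.metatags[0]["profile:last_name"]` and `pagemap.cse_image[0]["src"]`. A sketch of that extraction as a standalone function, assuming the same item layout. `parse_person` is a hypothetical helper, and the sample item is made up for illustration rather than real API output:

```python
import json


def parse_person(item):
    """Return (name, url, raw_json, image_url), or None if a field is missing."""
    try:
        meta = item["pagemap"]["metatags"][0]
        name = f'{meta["profile:first_name"]} {meta["profile:last_name"]}'
        url_img = item["pagemap"]["cse_image"][0]["src"]
        return (name, item["link"], json.dumps(item), url_img)
    except (KeyError, IndexError):
        # Results without profile metadata are skipped rather than treated as errors.
        return None


if __name__ == "__main__":
    sample = {  # made-up item for illustration
        "link": "https://example.com/in/jane-doe",
        "pagemap": {
            "metatags": [{"profile:first_name": "Jane", "profile:last_name": "Doe"}],
            "cse_image": [{"src": "https://example.com/jane.png"}],
        },
    }
    print(parse_person(sample)[0])
```

Keeping the parsing separate from the HTTP loop makes it possible to exercise the key handling with canned dictionaries, without an API key or network access.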




2 changes: 0 additions & 2 deletions infohound/tool/infohound_utils.py
@@ -33,8 +33,6 @@ def extractSocialInfo(text):
if t is not None:
data.append(t.group(0))



# Twitter
regex = r"(http(s)?:\/\/)?([\w]+\.)?twitter\.com\/[^&\/?\"\%]*"
t = re.search(regex, text)