Commit 6e6f82e

refactor: remove score_decay_rate and update formatting
Removed unused score_decay_rate from schema and documentation. Cleaned up code formatting and improved consistency across methods. Updated markdown documentation to align with current schema changes.
1 parent 1d199fe commit 6e6f82e

3 files changed: +40 -52 lines changed


docs/db/schema.md

Lines changed: 10 additions & 43 deletions
@@ -1,9 +1,3 @@
-Here’s an updated markdown version with explanations for `SemanticVector` and `score_decay_rate`:
-
----
-
-# Graph Representation
-
 ## Nodes
 
 ### Channel
@@ -24,9 +18,9 @@
 
 - **topic_id**: Unique identifier
 - **name**: Summary of the topic
-- **keywords**: List of key terms with scores
-- **overall_score**: Average or cumulative score
+- **keywords**: List of key terms with associated weights (e.g., `[{"term": "AI", "weight": 0.35}, {"term": "neural networks", "weight": 0.28}]`)
 - **bertopic_metadata**: BerTopic metadata
+- **topic_embedding**: Topic embedding
 - **updated_at**: Last updated timestamp
 
 ---
@@ -45,58 +39,31 @@
 ### SemanticVector
 
 - **vector_id**: Unique identifier
-- **semantic_vector**: Aggregated representation of recent message semantics in a channel. This vector captures the
-summarized, anonymized essence of new content without storing individual messages, aligning with privacy requirements.
+- **semantic_vector**: Aggregated representation of recent message semantics in a channel, preserving privacy by summarizing content instead of storing individual messages.
 - **created_at**: Creation date
 
-> **Explanation**: The `SemanticVector` represents the semantic profile of recent messages in a channel, allowing
-> Concord to adjust topic relevance without storing each message. Each vector aggregates the semantics of recent content
-> into a general representation, which can influence the `channel_score` in `ASSOCIATED_WITH` relationships between
-> channels and topics. This approach maintains user privacy while updating topic relevance dynamically.
+> **Explanation**: The SemanticVector node represents a general semantic profile of recent messages in a channel, supporting dynamic topic relevance without storing each message individually. This approach aligns with privacy requirements while allowing for the adjustment of topic relevance.
 
 ---
 
 ## Relationships
 
 ### ASSOCIATED_WITH (Channel → Topic)
 
-- **channel_score**: Cumulative or weighted score representing a topic’s importance or relevance to the channel
-- **keywords_weights**: Channel-specific keywords and their weights, reflecting the unique relationship between the
-channel and topic
+- **topic_score**: Cumulative or weighted score representing a topic’s importance or relevance to the channel
+- **keywords_weights**: Channel-specific keywords and their weights, highlighting the unique relationship between the channel and topic
 - **message_count**: Number of messages analyzed in relation to the topic
 - **last_updated**: Timestamp of the last update
-- **score_decay_rate**: Rate at which `channel_score` decreases over time if no new relevant messages are analyzed. This
-decay rate allows topic scores to adjust gradually, so less active or outdated topics diminish in relevance without
-active content.
 - **trend**: Indicator of topic trend over time within the channel
 
-> **Explanation**: `score_decay_rate` ensures that topics associated with a channel decrease in relevance if no new
-> messages support their ongoing importance. This helps maintain an accurate and current reflection of active discussions
-> in a channel, giving more weight to trending or frequently discussed topics while allowing older or less relevant topics
-> to fade naturally.
+> **Explanation**: This relationship captures the importance of each topic to specific channels, with channel-specific keyword weights providing additional insight into unique topic-channel dynamics. `trend` enables tracking how each topic's relevance changes over time within the channel.
 
 ---
 
 ### RELATED_TO (Topic ↔ Topic)
 
 - **similarity_score**: Degree of similarity between two topics
-- **temporal_similarity**: Time-based similarity metric to track changing topic relationships over time
-- **co-occurrence_rate**: Frequency with which two topics are discussed together across channels
+- **temporal_similarity**: Metric to track similarity over time
+- **co-occurrence_rate**: Frequency of concurrent discussion of topics across channels
 - **common_channels**: Number of shared channels discussing both topics
-- **topic_trend_similarity**: Similarity in trends or changes in relevance for each topic
-
-```mermaid
-graph TD
-%% Nodes
-Channel["Channel<br>-------------------------<br>channel_id: Unique identifier<br>platform: Platform (e.g., Telegram)<br>name: Name of the channel<br>description: Description of the channel<br>created_at: Creation date<br>active_members_count: Number of active members<br>language: Language of the channel<br>region: Geographical region<br>activity_score: Posting activity score"]
-Topic["Topic<br>-------------------------<br>topic_id: Unique identifier<br>name: Summary of the topic<br>keywords: List of key terms with scores<br>overall_score: Average or cumulative score<br>bertopic_metadata: BerTopic metadata<br>updated_at: Last updated timestamp"]
-TopicUpdate["TopicUpdate<br>-------------------------<br>update_id: Unique identifier<br>channel_id: Associated channel<br>topic_id: Associated topic<br>keywords: Keywords from the update<br>score_delta: Change in topic score<br>timestamp: Update time"]
-SemanticVector["SemanticVector<br>-------------------------<br>vector_id: Unique identifier<br>semantic_vector: Aggregated semantics<br>created_at: Creation date"]
-%% Relationships
-Channel -.-> ASSOCIATED_WITH["ASSOCIATED_WITH Relationship<br>-------------------------<br>channel_score: Cumulative or weighted score<br>keywords_weights: Channel-specific keywords and weights<br>message_count: Number of messages analyzed<br>last_updated: Timestamp of last update<br>score_decay_rate: Rate of score decay<br>trend: Topic trend over time"] --> Topic
-Topic -.-> RELATED_TO["RELATED_TO Relationship<br>-------------------------<br>similarity_score: Degree of similarity<br>temporal_similarity: Time-based similarity<br>co-occurrence_rate: Co-occurrence of keywords<br>common_channels: Number of shared channels<br>topic_trend_similarity: Trend alignment"] --> Topic
-TopicUpdate --> Topic
-SemanticVector --> Channel
-```
-
----
+- **topic_trend_similarity**: Measure of similarity in topic trends across channels
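The updated `keywords` format documented above pairs each term with a weight. As a minimal sketch of consuming that structure (plain Python, no graph database involved; the `top_keyword` helper is ours for illustration, not part of the schema):

```python
# Keywords as documented in the schema: a list of term/weight pairs.
keywords = [
    {"term": "AI", "weight": 0.35},
    {"term": "neural networks", "weight": 0.28},
]


def top_keyword(keywords):
    """Return the term with the highest weight (illustrative helper only)."""
    return max(keywords, key=lambda k: k["weight"])["term"]


print(top_keyword(keywords))  # → AI
```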

src/bert/concord.py

Lines changed: 6 additions & 2 deletions
@@ -1,9 +1,13 @@
 # concord.py
 
 from bert.pre_process import preprocess_documents
+from graph.schema import Topic
 
 
-def concord(topic_model, documents):
+def concord(
+    topic_model,
+    documents,
+):
     # Load the dataset and limit to 100 documents
     print(f"Loaded {len(documents)} documents.")
 
@@ -40,4 +44,4 @@ def concord(topic_model, documents):
     print(f" {word_score_str}")
 
     print("\nTopic modeling completed.")
-    return len(documents), None
+    return len(documents), Topic.create_topic()

src/graph/schema.py

Lines changed: 24 additions & 7 deletions
@@ -10,11 +10,10 @@
 
 # Relationship Models
 class AssociatedWithRel(StructuredRel):
-    channel_score = FloatProperty()
+    topic_score = FloatProperty()
     keywords_weights = ArrayProperty()
     message_count = IntegerProperty()
     last_updated = DateTimeProperty()
-    score_decay_rate = FloatProperty()
     trend = StringProperty()
 
 
@@ -60,14 +59,13 @@ def create_channel(cls, platform: str, name: str, description: str,
 
     def associate_with_topic(self, topic: 'Topic', channel_score: float,
                              keywords_weights: List[str], message_count: int,
-                             score_decay_rate: float, trend: str) -> None:
+                             trend: str) -> None:
         self.topics.connect(
             topic, {
                 'channel_score': channel_score,
                 'keywords_weights': keywords_weights,
                 'message_count': message_count,
                 'last_updated': datetime.utcnow(),
-                'score_decay_rate': score_decay_rate,
                 'trend': trend
             })
 
@@ -83,8 +81,8 @@ class Topic(StructuredNode):
     topic_id = UniqueIdProperty()
     name = StringProperty()
     keywords = ArrayProperty()
-    overall_score = FloatProperty()
     bertopic_metadata = JSONProperty()
+    topic_embedding = ArrayProperty()
     updated_at = DateTimeProperty(default_now=True)
 
     # Relationships
@@ -96,17 +94,24 @@ class Topic(StructuredNode):
 
     # Wrapper Functions
     @classmethod
-    def create_topic(cls, name: str, keywords: List[str], overall_score: float,
+    def create_topic(cls, name: str, keywords: List[str],
                      bertopic_metadata: Dict[str, Any]) -> 'Topic':
+        """
+        Create a new topic node with the given properties.
+        """
         return cls(name=name,
                    keywords=keywords,
-                   overall_score=overall_score,
                    bertopic_metadata=bertopic_metadata).save()
 
     def relate_to_topic(self, other_topic: 'Topic', similarity_score: float,
                         temporal_similarity: float, co_occurrence_rate: float,
                         common_channels: int,
                         topic_trend_similarity: float) -> None:
+        """
+        Create a relationship to another topic with various similarity metrics.
+        """
+        if not isinstance(other_topic, Topic):
+            raise ValueError("The related entity must be a Topic instance.")
         self.related_topics.connect(
             other_topic, {
                 'similarity_score': similarity_score,
@@ -118,10 +123,22 @@ def relate_to_topic(self, other_topic: 'Topic', similarity_score: float,
 
     def add_update(self, update_keywords: List[str],
                    score_delta: float) -> 'TopicUpdate':
+        """
+        Add an update to the topic with keyword changes and score delta.
+        """
         update = TopicUpdate.create_topic_update(update_keywords, score_delta)
         update.topic.connect(self)
         return update
 
+    def set_topic_embedding(self, embedding: List[float]) -> None:
+        """
+        Set the topic embedding vector, ensuring all values are floats.
+        """
+        if not all(isinstance(val, float) for val in embedding):
+            raise ValueError("All elements in topic_embedding must be floats.")
+        self.topic_embedding = embedding
+        self.save()
+
 
 class TopicUpdate(StructuredNode):
     update_id = UniqueIdProperty()
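The new `set_topic_embedding` method rejects any non-float element, so plain integers (e.g. `0` instead of `0.0`) fail the check rather than being coerced. A standalone sketch of just that validation, without the neomodel `save()` call or a Neo4j connection (the `validate_embedding` name is ours, for illustration only):

```python
def validate_embedding(embedding):
    # Mirrors the check in Topic.set_topic_embedding: every element must be
    # a float; ints such as 1 are rejected, not coerced.
    if not all(isinstance(val, float) for val in embedding):
        raise ValueError("All elements in topic_embedding must be floats.")
    return embedding


validate_embedding([0.12, -0.05, 0.33])  # passes

try:
    validate_embedding([1, 2, 3])  # ints, not floats
except ValueError as exc:
    print(exc)  # → All elements in topic_embedding must be floats.
```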
