Implemented IO for file reading and then data generation
1 parent c37d01f, commit 47dccc4
Showing 6 changed files with 60 additions and 26 deletions.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
Once upon a time, in a land far away, there was a small village. The villagers were known for their kindness and generosity. Every year, they celebrated the harvest festival with music, dance, and delicious food. One day, a traveler came to the village. He was tired and hungry, but the villagers welcomed him with open arms. The traveler shared stories of his adventures as the villagers listened intently. He told them about distant lands and strange creatures. The villagers were fascinated by his tales. As the evening drew to a close, the traveler offered to leave the village, but the villagers insisted he stay for another night. The next morning, the traveler said goodbye and continued his journey. The villagers waved him off, grateful for the stories and the company. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
use regex::Regex; | ||
use std::fs; | ||
|
||
fn split_into_sentences(text: String) -> Vec<String> { | ||
let re = Regex::new(r"[.!?]").unwrap(); // Matches sentence-ending punctuation | ||
let mut sentences: Vec<String> = Vec::new(); // We want to store owned Strings, not &str | ||
|
||
let mut last_index = 0; | ||
for mat in re.find_iter(&text) { | ||
let end = mat.end(); | ||
// Extract the sentence up to the matched punctuation | ||
let sentence = text[last_index..end].trim().to_string(); // Convert to String | ||
if !sentence.is_empty() { | ||
sentences.push(sentence); | ||
} | ||
last_index = end; | ||
} | ||
|
||
// Add any remaining text as a sentence | ||
if last_index < text.len() { | ||
let remaining = text[last_index..].trim().to_string(); // Convert remaining to String | ||
if !remaining.is_empty() { | ||
sentences.push(remaining); | ||
} | ||
} | ||
|
||
sentences | ||
} | ||
|
||
pub fn get_input() -> Vec<String> { | ||
let file_path = "src/data/in/training_input.txt"; | ||
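let content: String = fs::read_to_string(file_path).unwrap_or_else(|e| panic!("failed to read {file_path}: {e}")); // Read the file content; panic with the path and cause if it cannot be read | ||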
split_into_sentences(content) // Call the function to split into sentences | ||
} |
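
The sentence splitting and file loading added in this file can also be sketched without the `regex` dependency. The version below is an illustrative alternative, not the code in this commit: `split_sentences_std` and `read_sentences` are hypothetical names, the splitter uses the standard library's `str::split_inclusive` in place of `Regex::find_iter`, and the reader returns an `io::Result` so the caller decides how to handle a missing file. It is intended to produce the same sentences as `split_into_sentences` for the terminators `.`, `!` and `?`.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical std-only counterpart to `split_into_sentences`.
/// Each piece keeps its terminating '.', '!' or '?'; pieces are trimmed
/// and empty ones are dropped. Trailing text without a terminator is kept
/// as a final sentence.
fn split_sentences_std(text: &str) -> Vec<String> {
    text.split_inclusive(|c: char| matches!(c, '.' | '!' | '?'))
        .map(str::trim)
        .filter(|piece| !piece.is_empty())
        .map(str::to_string)
        .collect()
}

/// Hypothetical counterpart to `get_input` that takes the path as a
/// parameter and propagates I/O errors to the caller instead of panicking.
fn read_sentences(path: impl AsRef<Path>) -> io::Result<Vec<String>> {
    let content = fs::read_to_string(path)?;
    Ok(split_sentences_std(&content))
}
```

A caller could write `read_sentences("src/data/in/training_input.txt")?` and receive one `String` per sentence. Borrowing the input as `&str` avoids moving the file contents into the splitter, which keeps the function usable on string literals as well.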