The following C# code finds a string pattern (which may include any character) in a text file, using the MapReduce algorithm and threads.
The program takes the following inputs:
Text: the text file to analyze.
Keyword: the string pattern to search for in the text file.
ThreadsNumber: how many threads will be used for the search.
Delta: the distance between letters in the search pattern.
The program returns the following output:
The program outputs the locations of the keyword in the text file, in the following format:
[row number, location in the row], taking the Delta parameter into account.
NetaLavi.txt:
[Row 0]:Neta Lavi is the Best
[Row 1]:Football player in the world
[Row 2]:Neta Lavi6 is a legend
Input:
•Text: NetaLavi.txt •Keyword: Neta •ThreadsNumber:2 •Delta:0
Output:
[0,0] [2,0]
Input:
•Text: NetaLavi.txt •Keyword: Ni •ThreadsNumber:4 •Delta:7
Output:
[0,3], [1,0]
Input:
•Text: NetaLavi.txt •Keyword: al •ThreadsNumber:3 •Delta:2
Output:
[0,3], [1,5], [2,3], [2,14]
My program is based on the MapReduce algorithm and regular expressions.
•Regular Expression: A regular expression is a sequence of characters that specifies a search pattern in text. Usually, such patterns are used by string-searching algorithms for "find" or "find and replace" operations on strings, or for input validation.
•The MapReduce algorithm: MapReduce is a programming model and an associated implementation for processing and generating big data sets in parallel. The MapReduce algorithm contains two important tasks, namely Map and Reduce. The Mapper class takes the input and maps it to key-value pairs. The output of the Mapper class is used as input by the Reducer class, which in turn searches for matching pairs and reduces them.
Why use these two methods?
- MapReduce implements various mathematical algorithms to divide a task into small parts and assign them to multiple systems. In technical terms, the MapReduce algorithm helps in sending the Map & Reduce tasks to appropriate servers. In my program, I perform the search using threads. The threads play the role of the multiple systems.
- The main use of regular expressions is to match patterns in the text so that the program can easily recognize and manipulate the text file. Our text has to be searched quickly, and regular expressions make that possible.
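To illustrate how threads can stand in for the "multiple systems", here is a minimal sketch that splits the rows of the text into contiguous blocks, one per thread. The helper name `SearchInParallel`, the block-splitting scheme, and the per-thread result lists are assumptions made for this illustration and are not taken from the program itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Threading;

// Hypothetical helper (not from the original program): each thread scans its own
// contiguous block of rows and records <row, location> pairs in a private list.
static List<(int Row, int Col)> SearchInParallel(string[] textLines, Regex rx, int threadsNumber)
{
    if (threadsNumber < 1) throw new ArgumentOutOfRangeException(nameof(threadsNumber));

    var partial = new List<(int Row, int Col)>[threadsNumber];
    var workers = new Thread[threadsNumber];
    int chunk = (textLines.Length + threadsNumber - 1) / threadsNumber; // rows per thread, rounded up

    for (int t = 0; t < threadsNumber; t++)
    {
        // Declared inside the loop so every closure captures its own copies.
        int first = Math.Min(t * chunk, textLines.Length);
        int last = Math.Min(first + chunk, textLines.Length);
        var found = partial[t] = new List<(int Row, int Col)>();

        workers[t] = new Thread(() =>
        {
            for (int row = first; row < last; row++)
                foreach (Match m in rx.Matches(textLines[row]))
                    found.Add((row, m.Index));
        });
        workers[t].Start();
    }

    // Wait for every thread, then merge the private lists in row order.
    var all = new List<(int Row, int Col)>();
    for (int t = 0; t < threadsNumber; t++)
    {
        workers[t].Join();
        all.AddRange(partial[t]);
    }
    return all;
}

string[] lines =
{
    "Neta Lavi is the Best",
    "Football player in the world",
    "Neta Lavi6 is a legend",
};
var hits = SearchInParallel(lines, new Regex("Neta"), 2);
Console.WriteLine(string.Join(" ", hits.Select(h => $"[{h.Row},{h.Col}]"))); // [0,0] [2,0]
```

Giving every thread its own list avoids locking, and a single `Regex` instance can be shared because matching with it is thread-safe.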
My algorithm includes three main steps:
Step A: Data pre-processing
Step B: The Map phase
Step C: The Reduce phase
In the data pre-processing step, we calculate the following four elements:
1.(Regex) rx_For_Pre_Processing: the regex that identifies the candidate strings.
2.(Regex) rx_For_Match: the regex that identifies the matching strings.
3.(int) CandidateStringLen: the length of a candidate string.
4.(string[]) Text Lines: the contents of the text file read into an array. Each cell represents a row in the text file.
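The four elements above could be computed roughly as follows. This is only a sketch under stated assumptions: Delta is taken to mean the number of arbitrary characters between consecutive keyword letters, a "candidate" is taken to be any window of the right length that starts with the keyword's first letter, and the helper name `BuildPatterns` is hypothetical. The actual program may define these differently.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: builds the two regexes and the candidate length for one search.
static (Regex RxForPreProcessing, Regex RxForMatch, int CandidateStringLen)
    BuildPatterns(string keyword, int delta)
{
    if (string.IsNullOrEmpty(keyword)) throw new ArgumentException("Keyword must not be empty.");
    if (delta < 0) throw new ArgumentOutOfRangeException(nameof(delta));

    // Assumption: a match spans the keyword letters plus Delta characters in every gap.
    int candidateStringLen = keyword.Length + delta * (keyword.Length - 1);

    // Regex.Escape keeps metacharacters in the keyword (".", "*", ...) literal.
    string first = Regex.Escape(keyword[0].ToString());
    string gap = ".{" + delta + "}";
    string letters = string.Join(gap, keyword.Select(c => Regex.Escape(c.ToString())));

    // Candidate: any window of the right length starting with the first letter.
    // The lookahead is zero-width, so overlapping candidates are all reported.
    var rxForPreProcessing = new Regex("(?=" + first + ".{" + (candidateStringLen - 1) + "})");
    // Match: the whole candidate must be the keyword with Delta-wide gaps.
    var rxForMatch = new Regex("^" + letters + "$");

    return (rxForPreProcessing, rxForMatch, candidateStringLen);
}

// Text Lines: one array cell per row of the file (a temp file stands in for the input).
string path = Path.GetTempFileName();
File.WriteAllLines(path, new[] { "Neta Lavi is the Best", "Football player in the world" });
string[] fileLines = File.ReadAllLines(path);
File.Delete(path);

var (_, rxMatch, len) = BuildPatterns("Nt", 1);
Console.WriteLine(len);                                          // 3
Console.WriteLine(rxMatch.IsMatch(fileLines[0].Substring(0, len))); // True ("Net")
```

Escaping each letter is what lets the keyword contain any character, including ones that are regex metacharacters.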
The map task is performed by the Mapper class. The Map phase takes the input text file and emits the strings that are candidates for a match, in the following representation:
(<key, Val> : <Row_In_the_text_file, Location_In_the_row>).
The Map process uses a regular expression.
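A Map step of this kind might look like the sketch below. The function name `Map`, the use of `KeyValuePair<int, int>` for the <row, location> pairs, and the sample keyword "the" with Delta 0 are assumptions chosen for illustration; the candidate regex here simply accepts any three-character window that starts with 't'.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical Map step: emits one <row, location> pair per candidate string.
static List<KeyValuePair<int, int>> Map(string[] textLines, Regex rxForPreProcessing)
{
    var pairs = new List<KeyValuePair<int, int>>();
    for (int row = 0; row < textLines.Length; row++)
        foreach (Match candidate in rxForPreProcessing.Matches(textLines[row]))
            pairs.Add(new KeyValuePair<int, int>(row, candidate.Index));
    return pairs;
}

string[] sampleLines =
{
    "Neta Lavi is the Best",
    "Football player in the world",
    "Neta Lavi6 is a legend",
};

// Assumed candidate regex for Keyword "the", Delta 0: a 't' followed by two more characters.
var rxCandidates = new Regex("(?=t.{2})");
var mapOutput = Map(sampleLines, rxCandidates);
Console.WriteLine(string.Join(" ", mapOutput.Select(p => $"[{p.Key},{p.Value}]")));
// [0,2] [0,13] [1,3] [1,19] [2,2]
```

Not every candidate is a real match (for example, [0,2] is "ta "); filtering them is left to the Reduce phase.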
The reduce task is performed by the Reducer class:
The Reduce phase (searching technique) accepts the output of the Map phase as key-value pairs of Row_In_the_text_file and Location_In_the_row.
Using the searching technique, the combiner checks every key-value pair and prints to the screen the strings that match the matching regular expression.
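The following sketch shows one way such a Reduce step could filter the candidate pairs. The function name `Reduce`, its parameters, and the hard-coded candidate list (standing in for Map output for Keyword "the", Delta 0) are assumptions for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical Reduce step: keeps only the candidates whose text satisfies the match regex.
static List<string> Reduce(IEnumerable<KeyValuePair<int, int>> candidates,
                           string[] textLines, Regex rxForMatch, int candidateStringLen)
{
    var results = new List<string>();
    foreach (var pair in candidates)
    {
        string line = textLines[pair.Key];
        // Skip candidates that would run past the end of the row.
        if (pair.Value + candidateStringLen > line.Length) continue;

        string candidate = line.Substring(pair.Value, candidateStringLen);
        if (rxForMatch.IsMatch(candidate))
            results.Add($"[{pair.Key},{pair.Value}]");
    }
    return results;
}

string[] sampleLines =
{
    "Neta Lavi is the Best",
    "Football player in the world",
    "Neta Lavi6 is a legend",
};

// Assumed Map output: every position where a 't' starts a three-character window.
var mapOutput = new List<KeyValuePair<int, int>>
{
    new KeyValuePair<int, int>(0, 2),
    new KeyValuePair<int, int>(0, 13),
    new KeyValuePair<int, int>(1, 3),
    new KeyValuePair<int, int>(1, 19),
    new KeyValuePair<int, int>(2, 2),
};

var matches = Reduce(mapOutput, sampleLines, new Regex("^the$"), 3);
Console.WriteLine(string.Join(" ", matches)); // [0,13] [1,19]
```

Returning the formatted locations instead of printing inside the loop keeps the step easy to test; the caller decides how to write them to the screen.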