Mahdieh #152

Closed
wants to merge 4 commits into from
289 changes: 289 additions & 0 deletions source/NeoCortexUtils/docs/ApproveMultiSequenceLearning.md
@@ -0,0 +1,289 @@
# ML22/23-15 Approve Prediction of Multisequence Learning

## Introduction

In this project, we implemented new methods alongside the `MultisequenceLearning` algorithm. The new methods automatically read the dataset from a given path with `HelperMethods.ReadDataset(datasetPath)`. The test data is stored in a separate file and is read in the same way with `HelperMethods.ReadDataset(testsetPath)`. The sequences (`sequences`) and the test subsequences (`sequencesTest`) are then passed to `RunMultiSequenceLearningExperiment(sequences, sequencesTest)`. After learning is completed, the accuracy of the predicted elements is calculated.

## Implementation

![image](./images/overview.png)

Fig: Architecture of Approve Prediction of Multisequence Learning

The figure above shows the implementation flow of our project.

`Sequence` is the model we use to process and store the dataset:

```csharp
public class Sequence
{
    public String name { get; set; }
    public int[] data { get; set; }
}
```

For example:
- Dataset

```json
[
{
"name": "S1",
"data": [ 0, 2, 5, 6, 7, 8, 10, 11, 13 ]
},
{
"name": "S2",
"data": [ 1, 2, 3, 4, 6, 11, 12, 13, 14 ]
},
{
"name": "S3",
"data": [ 1, 2, 3, 4, 7, 8, 10, 12, 14 ]
}
]
```

- Test Dataset

```json
[
{
"name": "T1",
"data": [ 1, 2, 4 ]
},
{
"name": "T2",
"data": [ 2, 3, 4 ]
},
{
"name": "T3",
"data": [ 4, 5, 7 ]
},
{
"name": "T4",
"data": [ 5, 8, 9 ]
}
]

```

Our implemented methods are in `HelperMethods.cs` and can be found [here](../HelperMethods.cs):

1. FetchHTMConfig()

Creates the `HtmConfig` that is used to build the Hierarchical Temporal Memory `Connections`.

```csharp
/// <summary>
/// HTM Config for creating Connections
/// </summary>
/// <param name="inputBits">input bits</param>
/// <param name="numColumns">number of columns</param>
/// <returns>Object of HTMConfig</returns>
public static HtmConfig FetchHTMConfig(int inputBits, int numColumns)
{
    HtmConfig cfg = new HtmConfig(new int[] { inputBits }, new int[] { numColumns })
    {
        Random = new ThreadSafeRandom(42),

        CellsPerColumn = 25,
        GlobalInhibition = true,
        LocalAreaDensity = -1,
        NumActiveColumnsPerInhArea = 0.02 * numColumns,
        PotentialRadius = (int)(0.15 * inputBits),
        MaxBoost = 10.0,
        DutyCyclePeriod = 25,
        MinPctOverlapDutyCycles = 0.75,
        MaxSynapsesPerSegment = (int)(0.02 * numColumns),
        ActivationThreshold = 15,
        ConnectedPermanence = 0.5,
        PermanenceDecrement = 0.25,
        PermanenceIncrement = 0.15,
        PredictedSegmentDecrement = 0.1,
    };

    return cfg;
}
```

All the fields are self-explanatory as per HTM theory.
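To make the size-dependent fields concrete, the following sketch evaluates the same expressions for example sizes of `inputBits = 100` and `numColumns = 1024`. These sizes are assumptions chosen for illustration; the document does not fix them. Note that the `(int)` casts truncate toward zero.

```csharp
using System;

// Hypothetical example sizes; the document does not prescribe these values.
int inputBits = 100;
int numColumns = 1024;

// Same expressions as in FetchHTMConfig(); (int) truncates toward zero.
double numActiveColumnsPerInhArea = 0.02 * numColumns; // 20.48 columns
int potentialRadius = (int)(0.15 * inputBits);         // 15 input bits
int maxSynapsesPerSegment = (int)(0.02 * numColumns);  // 20 synapses

Console.WriteLine($"NumActiveColumnsPerInhArea = {numActiveColumnsPerInhArea}");
Console.WriteLine($"PotentialRadius = {potentialRadius}");
Console.WriteLine($"MaxSynapsesPerSegment = {maxSynapsesPerSegment}");
```

With these sizes and global inhibition, roughly 20 of the 1024 columns are active at a time, i.e. about 2% sparsity.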

2. GetEncoder()

We use `ScalarEncoder` because all the values we encode are numeric.

Remember that `inputBits` must be the same value that is passed to `FetchHTMConfig()`.

```csharp
/// <summary>
/// Get the encoder with settings
/// </summary>
/// <param name="inputBits">input bits</param>
/// <returns>Object of EncoderBase</returns>
public static EncoderBase GetEncoder(int inputBits)
{
    double max = 20;

    Dictionary<string, object> settings = new Dictionary<string, object>()
    {
        { "W", 15},
        { "N", inputBits},
        { "Radius", -1.0},
        { "MinVal", 0.0},
        { "Periodic", false},
        { "Name", "scalar"},
        { "ClipInput", false},
        { "MaxVal", max}
    };

    EncoderBase encoder = new ScalarEncoder(settings);

    return encoder;
}
```

Note that `MaxVal` for the encoder is set to `20`. It can be changed, but the values in the synthetic dataset must then stay within the new `[MinVal, MaxVal]` range.
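The following simplified sketch illustrates why `MaxVal` bounds the dataset values. It is not the NeoCortexApi `ScalarEncoder` source: it assumes the usual non-periodic scheme, in which a block of `W` consecutive active bits slides across `N` bits as the value moves from `MinVal` to `MaxVal`, and it assumes out-of-range input is rejected when `ClipInput` is `false`. The function `EncodeActiveBits` and the example `inputBits = 100` are hypothetical.

```csharp
using System;
using System.Linq;

// Simplified non-periodic scalar encoding, for illustration only.
// Mirrors the roles of W (active bits), N (total bits), MinVal and MaxVal.
int[] EncodeActiveBits(double value, int n, int w, double minVal, double maxVal)
{
    // Assumed behaviour for ClipInput = false: reject values outside the range.
    if (value < minVal || value > maxVal)
        throw new ArgumentOutOfRangeException(nameof(value),
            $"{value} is outside [{minVal}, {maxVal}]; raise MaxVal or clip the input.");

    // There are n - w + 1 possible start positions for the block of w active bits.
    int start = (int)Math.Round((value - minVal) / (maxVal - minVal) * (n - w));
    return Enumerable.Range(start, w).ToArray();
}

int inputBits = 100; // hypothetical N; must match the HtmConfig input size
int[] bits0 = EncodeActiveBits(0, inputBits, 15, 0.0, 20.0);
int[] bits20 = EncodeActiveBits(20, inputBits, 15, 0.0, 20.0);
Console.WriteLine($"0  -> bits {bits0.First()}..{bits0.Last()}");
Console.WriteLine($"20 -> bits {bits20.First()}..{bits20.Last()}");
```

In this sketch `MinVal` activates the first 15 bits and `MaxVal` the last 15; a dataset value above `MaxVal` has no valid position, which is why `endVal` must not exceed it.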

3. ReadDataset()

Reads the JSON file at the given full path and returns a list of `Sequence` objects.

```csharp
/// <summary>
/// Reads dataset from the file
/// </summary>
/// <param name="path">full path of the file</param>
/// <returns>Object of list of Sequence</returns>
public static List<Sequence> ReadDataset(string path)
{
    Console.WriteLine("Reading Sequence...");
    String lines = File.ReadAllText(path);
    List<Sequence> sequence = System.Text.Json.JsonSerializer.Deserialize<List<Sequence>>(lines);

    return sequence;
}
```

4. CreateDataset()

We added automatic dataset creation so that datasets do not have to be written by hand. The dataset is created with the parameters `numberOfSequence` (number of sequences to create), `size` (length of each sequence), `startVal` (lower bound of the value range), and `endVal` (upper bound of the value range).

```csharp
/// <summary>
/// Creates a list of Sequence as per configuration
/// </summary>
/// <returns>Object of list of Sequence</returns>
public static List<Sequence> CreateDataset()
{
    int numberOfSequence = 3;
    int size = 12;
    int startVal = 0;
    int endVal = 15;
    Console.WriteLine("Creating Sequence...");
    List<Sequence> sequence = HelperMethods.CreateSequences(numberOfSequence, size, startVal, endVal);

    return sequence;
}
```

Note that `endVal` should be less than or equal to the `MaxVal` of the `ScalarEncoder` used above.
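The body of `HelperMethods.CreateSequences()` is not shown in this document. As a hypothetical sketch only, a generator consistent with the sample dataset (distinct values, sorted ascending, within `[startVal, endVal]`) could look like this. The local function `CreateSequences`, its `seed` parameter, and the use of `(Name, Data)` tuples instead of `Sequence` are our assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for HelperMethods.CreateSequences(); not the project's code.
// Each sequence gets `size` distinct values from [startVal, endVal], sorted ascending.
List<(string Name, int[] Data)> CreateSequences(int count, int size, int startVal, int endVal, int seed = 42)
{
    if (size > endVal - startVal + 1)
        throw new ArgumentException("size cannot exceed the number of distinct values in the range.");

    var random = new Random(seed);
    var result = new List<(string Name, int[] Data)>();
    for (int i = 1; i <= count; i++)
    {
        int[] values = Enumerable.Range(startVal, endVal - startVal + 1)
            .OrderBy(_ => random.Next()) // shuffle the candidate values
            .Take(size)                  // keep `size` distinct values
            .OrderBy(v => v)             // sort ascending, like the sample dataset
            .ToArray();
        result.Add(($"S{i}", values));
    }
    return result;
}

// Same parameters as CreateDataset() above.
var sequences = CreateSequences(3, 12, 0, 15);
foreach (var (name, data) in sequences)
    Console.WriteLine($"{name}: [{string.Join(", ", data)}]");
```

Sampling without replacement keeps every value unique within a sequence; whether the project's generator enforces this is not stated in the document.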

5. SaveDataset()

Saves the dataset as `dataset_{ticks}.json` in the `dataset` directory under the application's base directory (`AppDomain.CurrentDomain.BaseDirectory`), creating the directory if it does not exist.

```csharp
// Requires: using System; using System.Collections.Generic; using System.IO; using Newtonsoft.Json;

/// <summary>
/// Saves the dataset in the 'dataset' folder in BasePath of the application
/// </summary>
/// <param name="sequences">Object of list of Sequence</param>
/// <returns>Full path of the dataset</returns>
public static string SaveDataset(List<Sequence> sequences)
{
    string BasePath = AppDomain.CurrentDomain.BaseDirectory;
    string reportFolder = Path.Combine(BasePath, "dataset");
    if (!Directory.Exists(reportFolder))
        Directory.CreateDirectory(reportFolder);
    string reportPath = Path.Combine(reportFolder, $"dataset_{DateTime.Now.Ticks}.json");

    Console.WriteLine("Saving dataset...");

    if (!File.Exists(reportPath))
    {
        using (StreamWriter sw = File.CreateText(reportPath))
        {
            sw.WriteLine(JsonConvert.SerializeObject(sequences));
        }
    }

    return reportPath;
}
```
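`SaveDataset()` writes with Newtonsoft's `JsonConvert`, while `ReadDataset()` reads with `System.Text.Json`. The two are compatible here because both use the property names `name` and `data` as declared. As an alternative sketch (an assumption, not what the project does), `System.Text.Json` alone can produce and inspect the same layout; the anonymous objects below are hypothetical stand-ins for `Sequence` instances.

```csharp
using System;
using System.Linq;
using System.Text.Json;

// Hypothetical sequences; property names match the Sequence model ("name", "data").
var sequences = new[]
{
    new { name = "S1", data = new[] { 0, 2, 5 } },
    new { name = "S2", data = new[] { 1, 2, 3 } },
};

// Serialize with System.Text.Json instead of Newtonsoft's JsonConvert.
string json = JsonSerializer.Serialize(sequences);
Console.WriteLine(json);

// Read the layout back without a model type to check it.
using JsonDocument doc = JsonDocument.Parse(json);
foreach (JsonElement seq in doc.RootElement.EnumerateArray())
{
    string name = seq.GetProperty("name").GetString();
    int[] data = seq.GetProperty("data").EnumerateArray().Select(e => e.GetInt32()).ToArray();
    Console.WriteLine($"{name}: {string.Join(", ", data)}");
}
```

Using a single serializer for both directions removes the Newtonsoft.Json dependency, but either choice reads back correctly with `ReadDataset()`.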

6. Calculating accuracy in PredictNextElement() in `Program.cs`

![image](./images/approve_prediction.png)

Fig: Predictions and calculating accuracy

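```csharp
int matchCount = 0;
int predictions = 0;
double accuracy = 0.0;

// `list`, `Predict()` and `prediction` belong to the omitted prediction code;
// `prediction` holds the predicted elements as strings.
foreach (var item in list)
{
    Predict();
    // compare the current element with the prediction made for the previous element
    if (item == Int32.Parse(prediction.Last()))
    {
        matchCount++;
    }
    predictions++;
    accuracy = (double)matchCount / predictions * 100;
}
```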

Note that the prediction code is omitted.
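Since the prediction code is omitted, the following is a minimal, self-contained sketch of the accuracy calculation only. It assumes a hypothetical `predictNext` function that returns the predicted next element for the elements seen so far; the toy predictor `PlusOne` stands in for the trained HTM model. The first element is skipped because there is nothing before it to predict it from.

```csharp
using System;

// Each element is compared with the prediction made from the elements before it.
// `predictNext` is a hypothetical stand-in for the trained HTM predictor.
double CalculateAccuracy(int[] sequence, Func<int[], int> predictNext)
{
    int matchCount = 0;
    int predictions = 0;
    for (int i = 1; i < sequence.Length; i++)
    {
        int[] prefix = sequence[..i]; // elements seen so far
        if (predictNext(prefix) == sequence[i])
            matchCount++;
        predictions++;
    }
    return predictions == 0 ? 0.0 : (double)matchCount / predictions * 100;
}

// Toy predictor for illustration only: always guesses "last value + 1".
int PlusOne(int[] prefix) => prefix[^1] + 1;

// Test subsequence T1 from the test dataset above.
double accuracy = CalculateAccuracy(new[] { 1, 2, 4 }, PlusOne);
Console.WriteLine($"Accuracy: {accuracy}%");
```

For `T1 = [1, 2, 4]` the toy predictor gets `2` right and misses `4`, giving 50%.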

## How to run the project

### To create a synthetic dataset

1. Open the [sln](../../../NeoCortexApi.sln) and select `MultiSequenceLearning` as the startup project.

2. `Main()` is in `Program.cs`. Uncomment the following code to create a synthetic dataset:

```csharp
//to create a synthetic dataset
string path = HelperMethods.SaveDataset(HelperMethods.CreateDataset());
Console.WriteLine($"Dataset saved: {path}");
```

*and comment out the rest of the lines*.

3. Run the project to create the dataset, and note the path and file name of the saved dataset.

![dataset](./images/dataset.jpg)

### To run the experiment

1. Open the [NeoCortexApi.sln](../../../NeoCortexApi.sln) and select `MultiSequenceLearning` as the startup project.

2. `Main()` is in `Program.cs`. Change the dataset file name to the one saved in the previous run, as shown below:

```csharp
//to read dataset
string BasePath = AppDomain.CurrentDomain.BaseDirectory;
string datasetPath = Path.Combine(BasePath, "dataset", "dataset_03.json"); //edit name of dataset here
Console.WriteLine($"Reading Dataset: {datasetPath}");
List<Sequence> sequences = HelperMethods.ReadDataset(datasetPath);
```

Also *copy the [test data](../dataset/test_01.json) to the dataset folder* (`{BASEPATH}\neocortexapi\source\MySEProject\MultiSequenceLearning\bin\Debug\net6.0\dataset`).

## Results

We ran the experiment as many times as was feasible, with different datasets. Because execution takes a long time, we kept the datasets small, with few and short sequences.

![results](./images/result.png)
Binary file added source/NeoCortexUtils/docs/images/dataset.jpg
Binary file added source/NeoCortexUtils/docs/images/overview.png
Binary file added source/NeoCortexUtils/docs/images/result.png
7 changes: 4 additions & 3 deletions source/NeoCortexUtils/docs/working-with-sdrs.md
@@ -1,12 +1,13 @@
# working-with-sdrs

## Introduction:
Neural network is a focal element in the area of machine learning. Inspired by the biological neurons that are present in the human brain, an artificial neural network is designed that mimics the human brain’s behavior, helping computer programs to identify patterns and answers to these related issues. It would be able to perform actions like the human brain and have the capability of learning things. These neural networks work on the principle of learning input/output operations. In this project, SDR representation has been implemented in a variety of ways, including SDR as indices and bitmaps. Furthermore, we developed methods for comparing two SDRs by using intersection, union, and overlap. In addition, we have added a new representation of Spatial pooler learning as a "Column/Overlap" ratio, which is another representation of a heatmap.
Neural networks represent a cornerstone in the field of machine learning, drawing inspiration from the biological neurons in the human brain. These artificial neural networks are engineered to emulate the brain's ability to recognize patterns and solve complex problems. Central to this capability is their potential to learn and perform tasks akin to human cognition, grounded in the principle of learning from input/output operations.

The inputs that we are using are scalar values and images. We specified how these inputs are converted to SDR. Furthermore, this procedure of SDR representations involves the use of Encoders, Spatial Pooler (SP), and Temporal Memory (TM). Encoders are the basic components used in this network, which takes human justifiable information as input data i.e. (image, scalar value), and changes it to machine-readable format, binary array with n size. SP uses these encoded binary arrays from encoders as input for the generation of SDRs.
In our project, we have explored various implementations of Sparse Distributed Representation (SDR), including using SDRs as indices and bitmaps. Our methods for comparing SDRs involve techniques like intersection, union, and overlap calculations. Additionally, we've introduced a novel concept: representing Spatial Pooler learning through a "Column/Overlap" ratio, akin to a heatmap representation.

TM is used to learn the sequence of these generated SDRs which are given as input from the Spatial Pooler (SP).
The inputs for our neural network are scalar values and images. We have detailed the process of converting these inputs into SDRs, which is a crucial step in our methodology. This conversion involves several key components: Encoders, the Spatial Pooler (SP), and Temporal Memory (TM). Encoders serve as the initial processing unit, transforming human-interpretable data (such as images or scalar values) into a binary array format that is machine-readable. The Spatial Pooler then takes these encoded arrays and generates SDRs.

Finally, Temporal Memory plays a pivotal role in learning the sequences of these SDRs, which are fed from the Spatial Pooler. This learning process is fundamental in enabling the neural network to understand and predict patterns in the data, a critical aspect of machine learning.

#### What is an SDR:
According to recent research in neuroscience, our brain uses SDRs to process information. SDRs are binary representations of data in which only about 2% of the bits are active. In an SDR each bit carries meaning, so active bits in the same positions of two different vectors make them semantically similar. By comparing the SDRs of different samples, the similarity between them can be estimated. To store an SDR, only the list of indices of its active bits is kept, which saves a lot of space.
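A minimal sketch of the index representation and the comparison operations mentioned above (intersection, union, overlap), using LINQ set operations. The two 10-bit SDRs are hypothetical and far denser than the ~2% sparsity of real SDRs, to keep the output readable; the project's own SDR helpers may be implemented differently.

```csharp
using System;
using System.Linq;

// Store an SDR as the indices of its active bits.
int[] ToIndices(int[] bitmap) =>
    Enumerable.Range(0, bitmap.Length).Where(i => bitmap[i] == 1).ToArray();

// Two hypothetical 10-bit SDRs.
int[] a = ToIndices(new[] { 1, 0, 0, 1, 0, 0, 1, 0, 0, 0 }); // [0, 3, 6]
int[] b = ToIndices(new[] { 1, 0, 0, 0, 0, 0, 1, 0, 0, 1 }); // [0, 6, 9]

int[] intersection = a.Intersect(b).ToArray();       // bits active in both
int[] union = a.Union(b).OrderBy(i => i).ToArray();  // bits active in either
int overlap = intersection.Length;                   // number of shared active bits

Console.WriteLine($"A = [{string.Join(", ", a)}], B = [{string.Join(", ", b)}]");
Console.WriteLine($"Intersection = [{string.Join(", ", intersection)}], overlap = {overlap}");
Console.WriteLine($"Union = [{string.Join(", ", union)}]");
```

A higher overlap means the two inputs share more active bits and are therefore more semantically similar.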