clustering algorithm evaluation #43

ddfridley · 2023-03-15T18:44:31Z

Create a standalone Node program that generates Mongo-document-like test data into an array.
Build a clustering algorithm and run it on the data.
Evaluate the results.

"Mongo-document-like" means each object has an _id property that is a unique string.
import ObjectID from 'isomorphic-mongo-objectid/src/isomorphic-mongo-objectid'
Use this import to generate ObjectIDs.

const statements = [
  {
    _id: ObjectID(),
    description: "3", // a random number, stored as a string
    userId: ObjectID(), // an ObjectId
  },
  // ...
]

const groups = [
  {
    _id: ObjectID(),
    userId: ObjectID(), // an ObjectId
    groupings: [
      [statementId1, statementId2],
      [statementId7, statementId3],
    ],
    allStatements: [
      statementId1,
      statementId2,
      // ...
    ],
  },
  // ...
]
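
A minimal sketch of how the two arrays above could be generated. To stay self-contained it uses a hypothetical `makeId` helper (a random 24-character hex string) as a stand-in for `ObjectID()`. The grouping rule (statements with equal descriptions go together) and the function names are assumptions for illustration only, not part of the spec.

```javascript
const { randomBytes } = require('node:crypto')

// Stand-in for ObjectID(): a random 24-character hex string (assumption, not the real package).
const makeId = () => randomBytes(12).toString('hex')

// Build statements for `userCount` users, `perUser` each, with a random number as description.
function generateStatements(userCount, perUser) {
  const statements = []
  for (let u = 0; u < userCount; u++) {
    const userId = makeId()
    for (let s = 0; s < perUser; s++) {
      statements.push({
        _id: makeId(),
        description: String(Math.floor(Math.random() * 100)),
        userId,
      })
    }
  }
  return statements
}

// Build one groups document for a user from the statements shown to that user.
// Assumed rule: statements with the same description are placed in the same grouping.
function generateGroup(userId, shown) {
  const byDescription = new Map()
  for (const st of shown) {
    if (!byDescription.has(st.description)) byDescription.set(st.description, [])
    byDescription.get(st.description).push(st._id)
  }
  return {
    _id: makeId(),
    userId,
    // Only keep groupings with at least two statements.
    groupings: [...byDescription.values()].filter(ids => ids.length > 1),
    allStatements: shown.map(st => st._id),
  }
}
```

Swapping `makeId` for the real `ObjectID` import should only require changing that one helper.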

gengjianye1997 · 2023-03-20T18:46:52Z

Last week, I read the MongoDB documentation and started writing the test data. I've finished generating the User and Statements data.
This week, I plan to finish generating the test data and apply one or two clustering algorithms to it to obtain clustering results.
For now, I don't have any blockers.

gengjianye1997 · 2023-04-03T18:59:54Z

Last week, I finished generating test data for the clustering algorithm, mainly working on the groups data. I selected 20 statements for each user according to the rule, grouped the statements according to the user type, and saved them in the groups data. I also checked the generated test data and made sure it meets the requirements.
The relevant files have already been pushed to the clustering branch.

For the rest of the week, I will write the clustering file, try several different clustering algorithms, and run them on the test data to get results. Then we can evaluate, analyze, and compare the results of the different algorithms and select the most suitable one for this project.

gengjianye1997 · 2023-04-10T19:24:47Z

Last week I wrote the data-generation and clustering functions and called them from the clustering_algorithm_evaluation file, so that it generates the data and then clusters it. At present, I use three clustering methods; two of them, DBSCAN and OPTICS, are density-based clustering algorithms. The hierarchical clustering implementation still has problems that cause the function to loop indefinitely.
Next, I need to fix the hierarchical clustering implementation so that I can get its clustering results.
In addition, for the unpoll project, the input to the clustering algorithm should be the groups data generated in the first step, so I need to change the clustering input. My current plan is to first produce a density result for each group in groups and feed that into the clustering algorithm. Then, using the index of each input item, I will map the clustering result back to the set of statements in each cluster. Finally, I will display the statement sets so that the accuracy and running time of the different clustering algorithms can be compared clearly.
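
The index-based mapping step could look roughly like the sketch below. It assumes the clustering algorithm returns one integer label per input item, in input order, with -1 meaning "noise" (a common DBSCAN convention, but an assumption here). The function name `statementsByCluster` is hypothetical.

```javascript
// labels[i] is the cluster label for statementIds[i]; -1 marks noise (assumed convention).
function statementsByCluster(labels, statementIds) {
  const clusters = new Map() // label -> array of statement ids
  labels.forEach((label, i) => {
    if (label === -1) return // skip items the algorithm left unclustered
    if (!clusters.has(label)) clusters.set(label, [])
    clusters.get(label).push(statementIds[i])
  })
  return clusters
}
```

Because every algorithm's output is reduced to the same label-to-statements map, the statement sets from different algorithms can be printed side by side for comparison.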

gengjianye1997 · 2023-04-18T21:05:04Z

Last week I found a suitable package for implementing hierarchical clustering. I also tried changing the input of the clustering algorithm to the groupings data, but there are still some problems with the clustering results.
Next, I need to research how to map the groupings data into the input format required by the clustering algorithm so that it produces correct clustering results.

gengjianye1997 · 2023-04-24T18:30:30Z

Last week, I researched how to map the groupings data into the input data required by the clustering algorithm but didn't find an effective way to do it.
So next, I will take every two statements in a group generated by each user and form them into a pair, generating all such pairs. Then I will go through all the groups: if the current pair appears together in the groups generated by more than a certain proportion of users, the pair is treated as belonging in the same group. I will continue with the next pair until I have the final result.
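
A rough sketch of the pair-counting idea, under stated assumptions: statement ids are strings that do not contain `|`, each groups document has `groupings` and `allStatements` as in the issue description, and the proportion is measured against the users who were shown both statements. The names `pairKey`, `countPairs`, and `agreedPairs` are hypothetical.

```javascript
// Canonical key for an unordered pair of statement ids.
const pairKey = (a, b) => (a < b ? `${a}|${b}` : `${b}|${a}`)

// For every pair, count how many users saw both statements and how many grouped them together.
function countPairs(groups) {
  const counts = new Map() // pairKey -> { shown, together }
  const bump = (key, field) => {
    if (!counts.has(key)) counts.set(key, { shown: 0, together: 0 })
    counts.get(key)[field]++
  }
  const eachPair = (ids, fn) => {
    for (let i = 0; i < ids.length; i++)
      for (let j = i + 1; j < ids.length; j++) fn(pairKey(ids[i], ids[j]))
  }
  for (const g of groups) {
    eachPair(g.allStatements, key => bump(key, 'shown'))
    for (const grouping of g.groupings) eachPair(grouping, key => bump(key, 'together'))
  }
  return counts
}

// Keep the pairs that more than `threshold` of the users who saw both placed together.
function agreedPairs(groups, threshold = 0.5) {
  const result = []
  for (const [key, { shown, together }] of countPairs(groups)) {
    if (shown > 0 && together / shown > threshold) result.push(key.split('|'))
  }
  return result
}
```

If the ids are ObjectID instances rather than strings, they would need to be converted with `String(id)` before being passed in.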

gengjianye1997 · 2023-05-05T06:54:42Z

Finished generating the statement-pair data and assigning agreed pairs to the same group, then printing the group result. An agreed pair is one where more than 50% of the users who were assigned both statements agree they should be in the same group.
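
One possible way to turn agreed pairs into groups is a union-find (disjoint-set) merge, sketched below. This is an assumption about the merging step, not necessarily what the pushed code does, and `groupFromPairs` is a hypothetical name.

```javascript
// Merge agreed pairs into groups with a small union-find (disjoint-set) structure.
function groupFromPairs(pairs) {
  const parent = new Map() // statement id -> parent id (a root points to itself)
  const find = id => {
    if (!parent.has(id)) parent.set(id, id)
    let root = id
    while (parent.get(root) !== root) root = parent.get(root)
    parent.set(id, root) // point this id straight at its root for later lookups
    return root
  }
  for (const [a, b] of pairs) {
    const ra = find(a)
    const rb = find(b)
    if (ra !== rb) parent.set(ra, rb) // join the two sets
  }
  // Collect the members of each set under its root.
  const byRoot = new Map()
  for (const id of parent.keys()) {
    const root = find(id)
    if (!byRoot.has(root)) byRoot.set(root, [])
    byRoot.get(root).push(id)
  }
  return [...byRoot.values()]
}

console.log(groupFromPairs([['s1', 's2'], ['s2', 's3'], ['s4', 's5']]))
```

Note that this merge is transitive: if (s1, s2) and (s2, s3) are agreed pairs, s1 and s3 end up in the same group even when that pair itself did not pass the 50% threshold. Whether that is desirable is a design choice to evaluate against the test data.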

ddfridley added the Javascript label Mar 15, 2023

ddfridley assigned gengjianye1997 Mar 15, 2023
