@@ -42,8 +42,31 @@ idx = knncolle.build_index(params, y)

# Performing the search:
res = knncolle.find_knn(idx, num_neighbors=10)
- res.index
- res.distance
+
+ res.index # each row is an observation, each column is a neighbor
+ ## array([[881, 74, 959, ..., 917, 385, 522],
+ ## [586, 8, 874, ..., 895, 52, 591],
+ ## [290, 215, 298, ..., 148, 627, 443],
+ ## ...,
+ ## [773, 44, 669, ..., 775, 287, 819],
+ ## [658, 847, 691, ..., 630, 861, 434],
+ ## [796, 158, 11, ..., 606, 815, 882]],
+ ## shape=(1000, 10), dtype=uint32)
+
+ res.distance # distances to the neighbors in 'index'
+ ## array([[1.12512471, 1.12792771, 1.15229055, ..., 1.21499808, 1.2176659 ,
+ ## 1.23952456],
+ ## [0.9988856 , 1.03782045, 1.08870223, ..., 1.16899062, 1.17007634,
+ ## 1.17147675],
+ ## [1.2471501 , 1.26328659, 1.2643019 , ..., 1.32229768, 1.32679721,
+ ## 1.33451926],
+ ## ...,
+ ## [1.05765983, 1.08981287, 1.11295647, ..., 1.18395012, 1.1976068 ,
+ ## 1.21577234],
+ ## [0.96758957, 1.02363497, 1.05326212, ..., 1.21518925, 1.22847612,
+ ## 1.24106054],
+ ## [1.17846147, 1.22299985, 1.2248128 , ..., 1.35088373, 1.39274142,
+ ## 1.40207528]], shape=(1000, 10))
```

Check out the [reference documentation](https://knncolle.github.io/knncolle-py) for details.
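
To make the layout of `index` and `distance` concrete, here is a minimal brute-force sketch in plain NumPy. It is not how knncolle performs the search: the `brute_force_knn()` helper, the smaller 200-by-20 dataset and the use of Euclidean distance are all assumptions made for illustration. Following the output above, where even the smallest reported distances are non-zero, the sketch never reports an observation as its own neighbor.

```python
import numpy

rng = numpy.random.default_rng(0)
y_small = rng.random((200, 20))  # hypothetical data: rows are observations


def brute_force_knn(data, k):
    """Illustrative helper (not part of knncolle): exhaustive k-nearest neighbors."""
    # Pairwise Euclidean distances between all rows (an assumed metric).
    diff = data[:, None, :] - data[None, :, :]
    dist = numpy.sqrt((diff ** 2).sum(axis=2))
    # Exclude each observation from its own neighbor list.
    numpy.fill_diagonal(dist, numpy.inf)
    # Keep the k closest observations per row, ordered by increasing distance.
    index = numpy.argsort(dist, axis=1)[:, :k]
    distance = numpy.take_along_axis(dist, index, axis=1)
    return index, distance


knn_index, knn_distance = brute_force_knn(y_small, 10)
print(knn_index.shape)     # (200, 10): one row per observation, one column per neighbor
print(knn_distance.shape)  # (200, 10)
```

Each row of `knn_index` lists neighbors in order of increasing distance, and the same position in `knn_distance` holds the corresponding distance.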
@@ -77,27 +100,62 @@ Given a separate query dataset of the same dimensionality, we can find the neare
```python
q = numpy.random.rand(50, 20)
qres = knncolle.query_knn(idx, q, num_neighbors=10)
- qres.index
- qres.distance
+
+ qres.index.shape # each row is an observation in 'q'
+ ## (50, 10)
+ qres.distance.shape
+ ## (50, 10)
+
+ qres.index[0,:]
+ ## array([712, 947, 924, 506, 640, 228, 424, 662, 299, 473], dtype=uint32)
+
+ qres.distance[0,:]
+ ## array([0.9846863 , 0.99493741, 1.01642662, 1.02303339, 1.02915264,
+ ## 1.05241022, 1.0690309 , 1.09889404, 1.1327715 , 1.14832321])
```
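
The shape of a query result can be reproduced with a small brute-force sketch in plain NumPy. This only illustrates the semantics and is not knncolle's implementation: `brute_force_query_knn()`, the dataset sizes and the Euclidean metric are assumptions. The result has one row per query point, and the indices refer to rows of the indexed dataset.

```python
import numpy

rng = numpy.random.default_rng(1)
y_small = rng.random((200, 20))  # hypothetical indexed data
q_small = rng.random((5, 20))    # hypothetical query data, same number of columns


def brute_force_query_knn(data, query, k):
    """Illustrative helper (not part of knncolle): exhaustive kNN for query points."""
    # Distance from every query row to every data row (assumed Euclidean).
    diff = query[:, None, :] - data[None, :, :]
    dist = numpy.sqrt((diff ** 2).sum(axis=2))
    # No self-exclusion here: query points are not members of the indexed data.
    index = numpy.argsort(dist, axis=1)[:, :k]
    distance = numpy.take_along_axis(dist, index, axis=1)
    return index, distance


q_index, q_distance = brute_force_query_knn(y_small, q_small, 10)
print(q_index.shape)     # (5, 10): one row per query point
print(q_distance.shape)  # (5, 10)
```

In this sketch, a query point that also occurs in the indexed data comes back as its own nearest neighbor at distance zero, in contrast to searching among the indexed observations themselves.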

We can ask `find_knn()` to report variable numbers of neighbors for each observation:

```python
- variable_k = (numpy.random.rand(y.shape[1]) * 10).astype(numpy.uint32)
+ variable_k = (numpy.random.rand(y.shape[0]) * 10).astype(numpy.uint32)
var_res = knncolle.find_knn(idx, num_neighbors=variable_k)
- var_res.index
- var_res.distance
+
+ len(var_res.index)
+ ## 1000
+
+ len(var_res.distance)
+ ## 1000
+
+ variable_k[0]
+ ## np.uint32(7)
+
+ var_res.index[0]
+ ## array([881, 74, 959, 135, 148, 946, 276], dtype=uint32)
+
+ var_res.distance[0]
+ ## array([1.12512471, 1.12792771, 1.15229055, 1.16210922, 1.19067866,
+ ## 1.19773984, 1.21375003])
```
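
With a different `k` per observation the result can no longer be a rectangular array, which is why the output above is a sequence with one entry per observation. The brute-force sketch below mimics that structure in plain NumPy. The helper logic, the 200-by-20 dataset and the Euclidean metric are assumptions for illustration, not knncolle's implementation. Note that `k` must have one entry per observation (row), which is what the `y.shape[0]` fix above ensures.

```python
import numpy

rng = numpy.random.default_rng(2)
y_small = rng.random((200, 20))  # hypothetical data: rows are observations

# One neighbor count per observation, in the range 0..9 (zero is allowed).
variable_k = (rng.random(y_small.shape[0]) * 10).astype(numpy.uint32)

# Exhaustive pairwise Euclidean distances (an assumed metric), excluding self.
diff = y_small[:, None, :] - y_small[None, :, :]
dist = numpy.sqrt((diff ** 2).sum(axis=2))
numpy.fill_diagonal(dist, numpy.inf)
order = numpy.argsort(dist, axis=1)

# Ragged result: entry i holds the variable_k[i] closest observations to row i.
var_index = [order[i, :k] for i, k in enumerate(variable_k)]
var_distance = [dist[i, ix] for i, ix in enumerate(var_index)]

print(len(var_index))  # 200: one entry per observation
print(len(var_index[0]) == variable_k[0])  # True
```

Because neighbors are ordered by distance, asking for fewer neighbors yields a prefix of the longer list. The outputs above are consistent with this: the row for observation 0 starts with 881, 74 and 959 in both the fixed and the variable search.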

We can find all observations within a distance threshold of each observation via `find_neighbors()`.
The related `query_neighbors()` function handles querying of observations in a separate dataset.
Both functions also accept a variable threshold for each observation.

```python
- range_res = knncolle.find_neighbors(idx, threshold=10)
- range_res.index
- range_res.distance
+ range_res = knncolle.find_neighbors(idx, threshold=1.2)
+
+ len(range_res.index)
+ ## 1000
+
+ len(range_res.distance)
+ ## 1000
+
+ range_res.index[0]
+ ## array([881, 74, 959, 135, 148, 946], dtype=uint32)
+
+ range_res.distance[0]
+ ## array([1.12512471, 1.12792771, 1.15229055, 1.16210922, 1.19067866,
+ ## 1.19773984])
```
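
A range search returns however many observations fall within the threshold, so the result is again ragged. The sketch below shows one way to express this by brute force in plain NumPy, accepting either a single threshold or one threshold per observation. It is an illustration under stated assumptions rather than knncolle's implementation: `brute_force_find_neighbors()` is a hypothetical helper, the metric is assumed to be Euclidean, and treating the threshold as inclusive (`<=`) is a guess.

```python
import numpy

rng = numpy.random.default_rng(3)
y_small = rng.random((200, 20))  # hypothetical data: rows are observations


def brute_force_find_neighbors(data, threshold):
    """Illustrative helper (not part of knncolle): exhaustive range search."""
    n = data.shape[0]
    diff = data[:, None, :] - data[None, :, :]
    dist = numpy.sqrt((diff ** 2).sum(axis=2))  # assumed Euclidean metric
    numpy.fill_diagonal(dist, numpy.inf)        # exclude self
    # A scalar is expanded to one threshold per observation.
    thresholds = numpy.broadcast_to(threshold, (n,))
    index, distance = [], []
    for i in range(n):
        # Inclusive comparison is an assumption, not documented behavior.
        hits = numpy.flatnonzero(dist[i] <= thresholds[i])
        hits = hits[numpy.argsort(dist[i, hits])]  # order by increasing distance
        index.append(hits)
        distance.append(dist[i, hits])
    return index, distance


range_index, range_distance = brute_force_find_neighbors(y_small, 1.2)
print(len(range_index))  # 200: one entry per observation

# One threshold per observation (hypothetical values).
per_obs = numpy.linspace(1.0, 1.4, y_small.shape[0])
var_range_index, var_range_distance = brute_force_find_neighbors(y_small, per_obs)
```

Some entries may be empty when no other observation lies within the threshold. This is also why the threshold has to be chosen relative to the scale of the distances: in the example above, a threshold of 1.2 retains only six neighbors for the first observation.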

## Use with C++