forked from usnistgov/VTSG
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathTODO
345 lines (256 loc) · 13 KB
/
TODO
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
*created "Wed Feb 16 08:52:17 2022" *by "Paul E. Black"
*modified "Tue Apr 16 15:10:09 2024" *by "Paul E. Black"
Ideas for Enhancement of and Things To Do for VTSG
7 March 2024
Factor out file "header" (and trailer) stuff from file_template and the complexities
with <body> code that will go in other files. This includes comments, license,
location for imports and local declarations. This way all generated files are sure
to have the same, common header and can be rendered with the same, common code.
7 March 2024
Document what "in", "out", and "traversal" do. Document when variable assignments
are created (other then for declaration & initialization).
26 February 2024
In complexities_generator.py (and test_case.py that uses it) keep ALL code and stuff
in the list of class codes, instead of treating the main code special and putting
auxiliary stuff in class codes.
13 February 2024
Allow a function body to be in another file. Currently only class (definitions) are
in other files.
Make sure VTSG can generate code where a class definition is in the SAME file.
8 February 2024
Report unknown jinja variables. That is, if an xml file has, say, <impOT> (instead
of <import>), VTSG silently nulls it out. It can be quite hard to determine *why*
your change isn't working. In the above case, VTSG should say
Unknown variable: {{impot}} in sinks.xml, line 85
or something.
23 January 2024
sample.py looks for and saves _need_id. That means inputs, filters, and sinks can
have a "need_id". Code in test_case.py increments the var_id if an input, filter, or
sink needs an id.
But need_id is NOT documented in the manual (except for complexities), nor is it ever
used in PHP, C#, or Python language files (again, except for complexities).
Remove this capability or else document and test it.
9 January 2024
Improve variable assignment. Often there are redundant assignments. For example,
Python/Exception/CWE369/safe/CWE369__I_readline__F_nonzero__S_div_zero__1-1.1.py
tainted_2 = int(input())
if 1==1:
if tainted_2 == 0:
sys.exit("Zero input not allowed")
tainted_3 = tainted_2
user_input = tainted_3
print(f'The reciprocal of {user_input} is {1/user_input}')
Notice that tainted_3 = tainted_2 occurs twice.
A human would write this more like
tainted_2 = int(input())
if 1==1:
if tainted_2 == 0:
sys.exit("Zero input not allowed")
user_input = tainted_2
print(f'The reciprocal of {user_input} is {1/user_input}')
or even use "user_input" instead of tainted_2.
At a minimum, carefully document when/where assignments are synthesized. Warn about
*unsafe* variants or function bodies in which tainted_3 is *not* initialized.
31 July 2023
Change <sample></sample> to <module></module>.
30 March 2023
Add some kind of --label or --source option so that every line of code in the final
generated case has some comment indicating where it comes from. Something like
try: # input
tainted_0 = int(sys.argv[1]) # input
except ValueError: # input
tainted_0 = 1776 # input
# input
tainted_1 = tainted_0 # synthesized
# filter
# No filtering (sanitization) # filter
tainted_1 = tainted_0 # filter
# filter
array = [0, 1, 2, 3, 4] # sink
(The file name tells us *which* input, filter, or sink was used.)
This should be fairly easy to do.
Step 1. Add a --label or --source switch to the command line and structure
Step 2. In complexities-generator.py and test_case.py, add the comments to the line
when synthesizing make_assign and edit the code replacing \n with \n # source.
It might be really hard to get comments to line up as above.
21 March 2023
Have some method of variants. They are vaguely like complexities, but apply broadly.
Juliet has the same code instantiated with different types. Kratkiewcz has array
access at the end, just beyond, some beyond, and far beyond. These variants have to
potentially be incorporated in all modules. Kratkiewcz also has a few basic loop
templates, but then does variants on them where the initial value, the terminating
test value, or the increment value (and combinations of all of them!) are passed
through variables instead of being hard coded.
Sinks should be able to supply code to other modules (kinda' like <body> puts code in
other files). In particular, certain sinks need a hard coded "input" to be a certain
value in order to be safe or to exploit the weakness.
21 March 2023
Warn if the chosen sink doesn't generate any test cases. It's relatively easy to
warn if there are no matching filters. It gets progressively harder to count that
there are no filter/input or filter/input/exec_query combinations.
Maybe just count the number of cases generated (for an input). If the number is
zero, produce a warning.
4 Jan 2023
Change <dir> in modules to <des> (for description) or something. This is only used
in test case names. Maybe files were originally placed in separate directories
depending on their input or sink. Thus <path> ... <dir> made sense. (This may be
why multiple <dir> entries are allowed.)
Maybe replace the <path> <dir> stuff with one <description> or <name> keyword.
(You'd have to fix a few modules with multiple <dir> entries.)
Remember to update DTDs.
3 Mar 2023
Allow the user to specify the variable name root, that is, something other than
"tainted".
Of course, a programmer could customize var_name (in complexities_generator.py) to
build a distinct module to generate names. For instance, it could pick from some
large list (e.g. row, column, flaw, counts, tree, elapsed, etc.), assign random
labels (e.g. rT6huz), use an alphabetic counter (e.g. vara, varb, varc, vard), or
something.
Caution: many of the variable "matching" routines (input out_var to filter in_var)
are based on counters being incremented or decremented by one.
4 Jan 2023
Maybe there should be a new value "maybe" or "random" or "indeterminate" for test
conditions code when the analyzer cannot determine. For example, Juliet,
if (random()) {
code
}
23 Nov 2022
Thoroughly document what various attributes in complexities do. For example,
• indirection: “1” if the code is split into two chunks (call and declaration) or
calls a function. The body tag should be present when calling a function.
What does indirection="1" DO to the final code?? Does indirection="0" do anything,
that is, is this an optional flag? Other attributes that need more explanation are
• need_id
• in_out_var (especially the magic that happens when the value is "traversal")
16 Nov 2022
Explain in manual what "supported" complexity types and groups do. That is, *any*
string can be placed in type or group, but some strings trigger special functions in
VTSG.
Add examples of how to code things or how to use options, like need_id.
7 Nov 2022
Complexities are only wrapped around filters. In Juliet they can wrap the sink (and
the input, I think). Maybe add markers in file_template.xml to indicate where
complexities can go? VTSG's current scheme would be represented as:
{{begin_complexity}}
INDENT
{{filtering_content}}
DEDENT
{{end_complexity}}
{{sink_content}}
Here's Juliet's scheme:
{{filtering_content}}
{{begin_complexity}}
INDENT
{{sink_content}}
DEDENT
{{end_complexity}}
7 Nov 2022
An unrecognized input module input_type, that is, one that doesn't appear in the
file_template, results in variables with no initialization:
tainted_0 =
tainted_1 =
There should be a check or warning about this.
Note: get_init_var_code() in file_template.py return empty string if type is not
found. The only place it is called, complexities_generator.py line 283, checks for
invalid return, but it checks for (not) None.
Different or custom types make it easier to connect certain flaw templates with
the proper initialization or input.
A higher level capability is to allow "qualified" types, e.g.
<input_type>int,primesOnly</input_type>
so flaws that need a particular "input" to be "hooked" to modules that produce that
particular output.
9 Nov 2022
Allow modules, other than complexities, to have <body></body> code to be placed in
another file.
9 Nov 2022
VTSG doesn't have any notion of a fixed "support" file. That is, a file written by
the developer that the test cases (may) need in order to execute. This is like
testcasesupport files, for instance, io.c or std_testcase.h.
This support file would have to be copied into any directory where test cases are
placed.
7 Nov 2022
There should be some way to put several pieces of generated code in the same case, to
mimic Juliet's structure of having a bad version and one or more good versions in the
same file.
Note: that we can mimic Juliet's corresponding good and bad cases by having TWO
sinks: one with the good code and one with the bad code. They go in separate test
cases (in different subdirectories: safe and unsafe), but at least it's a way for
VTSG to generate matched pairs (or triples if we need two good versions). Idea: the
sinks could have comments that connect the two cases, e.g., "This case has a
corresponding bad [good] case with the same name, but OBOb [OBOg] as the sink."
23 Nov 2022
Enhance the Juliet/Python tests with THREE variable scopes if you want to make sure
analyzers follow Python scope rules.
Change localFive and globalFive to localConst and globalConst, and add an
enclosingConst. These can be combinations of 5 or not 5. Refer to scopeConst, e.g.
def f():
scopeConst = 6
def g():
scopeConst = 5
if scopeConst == 5:
# this will always execute . . .
or perhaps
scopeConst = globalConst
7 Nov 2022
Adding a // flaw line is only done for sink modules. (In generator.py, around line
430; search for # add comment into code ...) To be more general, perhaps VTSG should
be capable of adding a //flaw line in any module.
10 Nov 2022
Add a "new case hook". That is, every time a new case is written, VTSG will run
a script(?) you give as a "hook". This takes advantage of modern
multiprocessors. While VTSG is generating cases, the generated cases can be
checked or executed.
4 Nov 2022
Make the names of language modules consistent. In particular, filter modules and
functions are often referred to as "filtering".
Change language entries from {{ filtering_content }} to {{ filter_content }}?
C#
- check all str functions
- Complexities :
- ternary : None (possible?)
- switch : Testing
- goto : Testing
- loop : Testing
- for : Testing
- foreach : Testing
- function : Testing (only traversal/in/out)
- classes : Testing (only traversal/in/out)
# Hard to do
- add C# print debug to test if safe/unsafe or if it's executed
Maybe add language-specific code in VTSG itself instead of trying to specify the
syntax (and composition rules) in, say, file_template.xml. For example, C++ and C#
are far more similar in syntax than C# and Python. This will help lessen the amount
of code that the generator runs specifically running only the functions it needs for
each language.
GENERATE MORE KINDS OF CASES
Add swapped argument errors, see Vineeth Kashyap https://arxiv.org/abs/2009.09117
Generalize the structure of VTSG. Now it looks for inputs, filters, and queries in
rigid combinations. Maybe VTSG should read the template, then go hunting for each
kind of module mentioned. Connections between them, via variables or call/return,
could be generalized.
GENERATE CASES SMARTER
Create s01, s02, s03, etc. subdirectories so there aren't hundreds of thousands of
cases in one directory.
OTHER? GENERAL?
Shorten the generated file name. Or, add a --short-case-name option, to produce
Juliet-like names.
A better alternative is to allow the user to specify generated file name in a
general format, like strftime, ISO/IEC 9899 (C90), or find -printf. Here is a sample:
%A is the full weekday name.
%a is the abbreviated weekday name.
%B is the full month name.
%b is the abbreviated month name.
%d is the day of the month as a decimal number (01-31).
%H is the hour (24-hour clock) as a decimal number (00-23).
%I is the hour (12-hour clock) as a decimal number (01-12).
%M is the minute as a decimal number (00-59).
%m is the month as a decimal number (01-12).
The built-in-test rules in the Makefile leave a lot of "junk" files and directories
in the current directory. Put all the self-test stuff in another directory so the
top-level directory doesn't get cluttered.
Write manifest in our current SARIF-style. There is sarif_writer.py, which is a
start from August 2020. A newer implementation (Dec 2021) of sarif writer is in Paul's
Work/Ockham.
Update contact and license information in setup.py. (Re)register in PyPI. Let
Bertrand know about this new GitHub repository, so he can "point" his to this one.
end of TODO