Abstract

This is a gdb-like debugger focusing on Python bytecode. So far as I know, this is the only debugger available specifically for Python bytecode.

To do this, however, it runs on top of x-python, a Python interpreter written in Python.

This project builds off of a previous Python 3 debugger called trepan3k.

Example

demo

Below we'll try to break down what's going on above.

We'll invoke a Greatest Common Divisor program, gcd.py, using our debugger. The source is found in test/example/gcd.py.

In this section we'll see some interesting debugger commands that are not common in Python debuggers:

  • stepi to step a bytecode instruction
  • set loglevel to show x-python's "info"-level log tracing
  • info stack to show the current frame's evaluation stack
$ trepan-xpy test/example/gcd.py 3 5
 Running x-python test/example/gcd.py with ('3', '5')
 (test/example/gcd.py:10): <module>
 -> 2 """
 (trepan-xpy)
 

Above we are stopped before we have even run the first instruction. The -> icon before 2 means we are stopped calling a new frame.

(trepan-xpy) step
(test/example/gcd.py:2): <module>
-- 2 """Greatest Common Divisor"""
@ 0: LOAD_CONST 'Greatest Common Divisor'

Ok, now we are stopped before the first instruction LOAD_CONST which will load a constant onto the evaluation stack. The icon changed from -> 2 to -- 2 which indicates we are on a line-number boundary at line 2.

The Python construct we are about to perform is setting the program's docstring. Let's see how that is implemented.

First we see that the variable __doc__ which will eventually hold the docstring isn't set:

We see here that the first part is loading this constant onto an evaluation stack.

At this point, to better see the execution progress we'll issue the command set loglevel which will show the instructions as we step along.

Like trepan3k, trepan-xpy has extensive, nicely formatted help right in the debugger. Let's get the help for the set loglevel command:

(trepan-xpy) help set loglevel
set loglevel [ on | off | debug | info ]

Show loglevel PyVM logger messages. Initially logtracing is off.

However running set loglevel will turn it on and set the log level to debug.
So it's the same thing as set loglevel debug.

If you want the less verbose messages, use info. And to turn off, (except
critical errors), use off.

Examples:

     set loglevel         # turns x-python on info logging messages
     set loglevel info    # same as above
     set loglevel debug   # turn on info and debug logging messages
     set loglevel off     # turn off all logging messages except critical ones


So now let's set that:

(trepan-xpy) set loglevel
(trepan-xpy)

A command that you won't find in most Python debuggers, but that is common in low-level debuggers, is stepi, which steps a single instruction. Let's use that:

(trepan-xpy) stepi
(test/example/gcd.py:2 @2): <module>
.. 2 """Greatest Common Divisor"""
@ 2: STORE_NAME ('Greatest Common Divisor') __doc__

The .. at the beginning indicates that we are on an instruction which is in between lines.
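
The difference between a line boundary (--) and an instruction in the middle of a line (..) can be approximated statically with the standard library's dis module. This is only a sketch, not how trepan-xpy computes its markers; the sample source and the marker_for helper are assumptions made for illustration, and the exact instruction list varies by Python version.

```python
import dis

# Hypothetical two-line sample, similar in spirit to the start of gcd.py.
SOURCE = '"""Greatest Common Divisor"""\nimport sys\n'
code = compile(SOURCE, "<sample>", "exec")

# Bytecode offsets at which a new source line begins.
line_starts = {offset for offset, _lineno in dis.findlinestarts(code)}


def marker_for(offset):
    """Return '--' on a line boundary, '..' for an instruction inside a line."""
    return "--" if offset in line_starts else ".."


markers = [
    (ins.offset, marker_for(ins.offset), ins.opname)
    for ins in dis.get_instructions(code)
]

for offset, marker, opname in markers:
    print(f"{marker} @{offset:3}: {opname}")
```

The STORE_NAME that follows the docstring's LOAD_CONST does not start a new line, so it gets the .. marker.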

We've now loaded the docstring onto the evaluation stack with LOAD_CONST. Let's see the evaluation stack with info stack:

(trepan-xpy) info stack
0: <class 'str'> 'Greatest Common Divisor'

Here we have pushed the docstring for the program but haven't yet stored it in __doc__. To see this, we can use the auto-eval feature of trepan-xpy: it automatically evaluates input it doesn't recognize as a debugger command:

(trepan-xpy) __doc__ is None
True
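
The idea behind auto-eval can be sketched in a few lines: look the first word up in a command table and, if it isn't there, evaluate the whole line as a Python expression in the debugged frame. The COMMANDS table and process_line function below are hypothetical names for illustration; trepan-xpy's real command processor is more involved.

```python
# Hypothetical command table; a real debugger has many more commands.
COMMANDS = {
    "step": lambda args: "stepping a statement",
    "stepi": lambda args: "stepping an instruction",
}


def process_line(line, frame_globals, frame_locals):
    """Run a debugger command if recognized, otherwise evaluate the line."""
    words = line.split()
    if words and words[0] in COMMANDS:
        return COMMANDS[words[0]](words[1:])
    # Not a known command: fall back to evaluating it as a Python expression.
    return eval(line, frame_globals, frame_locals)


# Simulate a frame in which __doc__ has not been stored yet.
frame_globals = {"__doc__": None}
print(process_line("__doc__ is None", frame_globals, {}))
print(process_line("stepi", frame_globals, {}))
```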

Let's step the remaining instruction, STORE_NAME, to complete the instructions making up line 2.

(trepan-xpy) stepi
INFO:xpython.vm:L. 10  @  4: LOAD_CONST 0
(test/example/gcd.py:10 @4): <module>
-- 10 import sys
@ 4: LOAD_CONST 0

The leading -- before 10 import... indicates we are on a line boundary now. Let's see the stack now that we have run STORE_NAME:

(trepan-xpy) info stack
Evaluation stack is empty
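
To make the push and pop explicit, here is a toy stack machine that handles only the two opcodes we have stepped through so far. It is a deliberately minimal sketch and not x-python's implementation; run_toy and the tuple encoding of instructions are assumptions for illustration.

```python
def run_toy(instructions):
    """Interpret a tiny subset of bytecode; yield the state after each step."""
    stack = []
    names = {}
    for opname, operand in instructions:
        if opname == "LOAD_CONST":
            stack.append(operand)         # push the constant
        elif opname == "STORE_NAME":
            names[operand] = stack.pop()  # pop the top entry into a name
        else:
            raise ValueError(f"unsupported opcode: {opname}")
        yield opname, list(stack), dict(names)


program = [
    ("LOAD_CONST", "Greatest Common Divisor"),
    ("STORE_NAME", "__doc__"),
]
trace = list(run_toy(program))
for opname, stack, names in trace:
    print(opname, "stack:", stack, "names:", names)
```

After LOAD_CONST the stack holds one string; after STORE_NAME the stack is empty again and the value lives under the name __doc__.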

And to see that we've stored this in __doc__ we can run eval to see its value:

(trepan-xpy) eval __doc__
"Greatest Common Divisor"

(Entering just __doc__ is the same thing as eval __doc__ when auto-evaluation is on.)

Now let's step a statement (not an instruction) to see how a module becomes visible.

(trepan-xpy) step
INFO:xpython.vm:       @  6: LOAD_CONST None
INFO:xpython.vm:       @  8: IMPORT_NAME (0, None) sys
INFO:xpython.vm:       @ 10: STORE_NAME (<module 'sys' (built-in)>)
INFO:xpython.vm:L. 12  @ 12: LOAD_CONST <code object check_args at 0x7f2a0a286f60, file "test/example/gcd.py", line 12>
(test/example/gcd.py:12 @12): <module>
-- 12 def check_args():
@ 12: LOAD_CONST <code object check_args at 0...est/example/gcd.py", line 12>

The INFO lines are emitted by the x-python VM interpreter. As a result of set loglevel, the logging verbosity of the interpreter's logger was increased. This in turn causes a callback to be made to a formatting routine provided by the debugger to nicely colorize the information, which is why parts of this are colorized in a terminal session. In x-python you can get the same information, just not colorized.
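
The general mechanism, a logger whose level is changed and whose records pass through a custom formatter, can be sketched with Python's standard logging module. The logger name xpython.vm is taken from the INFO lines above; the ColorFormatter class and the colors chosen are assumptions for illustration and are not trepan-xpy's actual formatting code.

```python
import io
import logging


class ColorFormatter(logging.Formatter):
    """Hypothetical formatter that wraps each record in an ANSI color."""

    COLORS = {logging.DEBUG: "\033[36m", logging.INFO: "\033[32m"}
    RESET = "\033[0m"

    def format(self, record):
        text = super().format(record)
        color = self.COLORS.get(record.levelno, "")
        return f"{color}{text}{self.RESET}" if color else text


log = logging.getLogger("xpython.vm")
stream = io.StringIO()  # stands in for the terminal
handler = logging.StreamHandler(stream)
handler.setFormatter(ColorFormatter("%(levelname)s:%(name)s:%(message)s"))
log.addHandler(handler)
log.propagate = False
log.setLevel(logging.INFO)  # show INFO messages, still hide DEBUG ones

log.info("       @  6: LOAD_CONST None")
log.debug("hidden while the level is INFO")
print(stream.getvalue(), end="")
```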

One thing to note is the value in parentheses after the opcode, like after STORE_NAME. Compare that line with what you'll see from a static disassembler such as Python's dis or the xdis version of it:

10 STORE_NAME                1 (sys)

In a static disassembler, the "1" indicates the name index in the code object. The value in parentheses is what that name, here at index 1, is, namely sys.

In trepan-xpy and x-python, however, we omit the name index, 1, since that isn't of much interest. Instead we show the dynamic stack entries or operands that STORE_NAME is going to work on. In particular, the object that is going to be stored in variable sys is the built-in module sys.
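
For the static side of this comparison, the standard library's dis module exposes both pieces: arg is the name index and argval is the name it resolves to. The sample source below is a hypothetical stand-in for the first lines of gcd.py, and offsets may differ between Python versions.

```python
import dis

# Hypothetical sample: a docstring followed by an import, as in gcd.py.
SOURCE = '"""Greatest Common Divisor"""\nimport sys\n'
code = compile(SOURCE, "<sample>", "exec")

store_sys = next(
    ins
    for ins in dis.get_instructions(code)
    if ins.opname == "STORE_NAME" and ins.argval == "sys"
)

# ins.arg is the static index into co_names; ins.argval is the name itself.
print(f"{store_sys.offset} STORE_NAME {store_sys.arg} ({store_sys.argval})")
print("name table:", code.co_names)
```

A static view can only report the index and the name; the module object that ends up stored exists only while the program runs, which is what the dynamic trace shows instead.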

Now let's step another statement to see how a function becomes available:

(trepan-xpy) step
INFO:xpython.vm:       @ 14: LOAD_CONST 'check_args'
INFO:xpython.vm:       @ 16: MAKE_FUNCTION (check_args) Neither defaults, keyword-only args, annotations, nor closures
INFO:xpython.vm:       @ 18: STORE_NAME (<Function check_args at 0x7fdb1d4d49f0>) check_args
INFO:xpython.vm:L. 25  @ 20: LOAD_CONST <code object gcd at 0x7fdb1d55fed0, file "test/example/gcd.py", line 25>
(test/example/gcd.py:25 @20): <module>
-- 25 def gcd(a,b):
@ 20: LOAD_CONST <code object gcd at 0x7fdb1d...est/example/gcd.py", line 25>

A difference between a dynamic language like Python and a statically compiled language like C or Java is that there is no linking step in the compilation; modules and functions are imported or created and linked as part of the execution of the code.
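
A small experiment shows that a def statement is executed like any other statement: the name is not bound until the statement runs. The gcd body here is a generic Euclid loop written for illustration and is not necessarily the code in test/example/gcd.py.

```python
namespace = {}
steps = []

# Before the "def" statement runs, the name does not exist.
steps.append("gcd" in namespace)

# Executing the statement creates the function object and binds the name.
exec(
    "def gcd(a, b):\n"
    "    while b:\n"
    "        a, b = b, a % b\n"
    "    return a\n",
    namespace,
)
steps.append("gcd" in namespace)

print(steps)
print(namespace["gcd"](3, 5))
```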

Notice again what's in the parentheses after the opcode and how that differs from a static disassembly. For comparison, here is what the 2nd and 3rd instructions look like from pydisasm:

16 MAKE_FUNCTION             0 (Neither defaults, keyword-only args, annotations, nor closures)
18 STORE_NAME                2 (check_args)

Again, indices into a name table are dropped and in their place are the evaluation stack items. For MAKE_FUNCTION the name of the function that is created is shown, while for STORE_NAME, as before, the item that gets stored (a function object) is shown.

The rest of the screencast shows that, in addition to the step (step into) and stepi (step instruction) debugger commands, there is a next (step over) command and a slightly buggy finish (step out) command.

I don't have breakpoints hooked in yet.

But in contrast to any other Python debugger I know about, we can cause an immediate return with a value and that is shown in the screencast.

We've only shown a few of the many debugger features.

Bytecode-specific commands

Here are some interesting commands not typically found in Python debuggers like pdb:

  • info blocks lets you see the block stack
  • set pc <offset> lets you set the program counter within the frame
  • set autopc runs info pc to show the debugged program's program counter each time before the debugger's command-loop REPL is run
  • set autostack runs info stack to show the debugged program's evaluation stack each time before the debugger's command-loop REPL is run
  • vmstack {peek | push | pop} inspects or modifies the evaluation stack

See Also