Friday, 3 August 2012

Quick Cython debugging

Recently I needed to debug a Cython module that computes derivatives generated by SymPy. There were too many variables to use printf statements effectively, and I wanted a quick way to watch the variables change as the code ran. The purpose of this post is to share a quick and dirty method for debugging Cython modules.

Take the example code below, which creates a module enorm exposing a function that calculates the Euclidean norm of a one-dimensional NumPy vector (and optionally normalises the vector in place).

# enorm.pyx
# cython: boundscheck=False
# cython: cdivision=True

# Imports
import numpy as np
cimport numpy as np

# Types
ctypedef np.float64_t DTYPE_t
DTYPE = np.float64

# External
cdef extern from "math.h":
    double sqrt(double)

# enorm
def enorm(np.ndarray[DTYPE_t, ndim=1] x, bint normalise=False):
    cdef DTYPE_t l = 0.
    cdef Py_ssize_t i

    for i in range(x.shape[0]):
        l += x[i]*x[i]

    l = sqrt(l)

    if normalise:
        for i in range(x.shape[0]):
            x[i] /= l

    return l
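
When the debugger shows a surprising value for l (__pyx_v_l in the generated C), it helps to know what value to expect. The following is a minimal pure-Python sketch that mirrors the loop in enorm.pyx and records l after each iteration; enorm_trace is a hypothetical name, not part of the module.

```python
import math


def enorm_trace(x, normalise=False):
    """Pure-Python mirror of enorm() in enorm.pyx (hypothetical helper).

    Returns (norm, partial_sums), where partial_sums[i] is the value of l
    after iteration i of the first loop. Like the Cython version, x is
    normalised in place when normalise is True. Unlike the Cython version
    (compiled with cdivision=True), an all-zero plain Python list with
    normalise=True raises ZeroDivisionError here instead of giving nan.
    """
    l = 0.0
    partial_sums = []
    for i in range(len(x)):
        l += x[i] * x[i]
        partial_sums.append(l)  # l after this iteration
    l = math.sqrt(l)
    if normalise:
        for i in range(len(x)):
            x[i] /= l  # in-place update, as in enorm.pyx
    return l, partial_sums


if __name__ == "__main__":
    x = [1.0, 1.0, 1.0]
    print(enorm_trace(x))
```

Stepping over l += x[i]*x[i] in the debugger should then show __pyx_v_l taking the values in partial_sums.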

setup.py:
#!/usr/bin/env python

# Imports
from distutils.core import setup
from distutils.extension import Extension

from Cython.Distutils import build_ext

from numpy import get_include
NP_INCLUDE = get_include()

setup(
    ext_modules=[Extension('enorm', 
                           ['enorm.pyx'], 
                           include_dirs=[NP_INCLUDE, '.'])],
    cmdclass = {'build_ext' : build_ext}
)

main.py:
#!/usr/bin/env python

# Imports
from __future__ import print_function
import numpy as np
from enorm import enorm

# main
def main():
    # check enorm
    X = np.array([1., 1., 1.])
    print(X, enorm(X), np.sqrt(np.sum(X**2)))

    # check `normalise`
    enorm(X, normalise=True)
    print(X, enorm(X))

if __name__ == '__main__':
    main()

To step through enorm in enorm.pyx, first produce enorm.c:

cython enorm.pyx

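Now open enorm.c and find the code generated for the first line of the body of enorm. Cython precedes each block of generated C with a comment quoting the corresponding .pyx line: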

/* "enorm.pyx":19
* # enorm
* def enorm(np.ndarray[DTYPE_t, ndim=1] x, bint normalise=False):
*     cdef DTYPE_t l = 0.             # <<<<<<<<<<<<<<
*     cdef Py_ssize_t i
* 
*/
__pyx_v_l = 0.;
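
The marker comments also work in the other direction: when the debugger stops at some line of enorm.c, the nearest marker above it names the .pyx line being executed. A minimal sketch of that lookup, assuming the marker format shown above (pyx_line_for is a hypothetical helper, and the answer is only approximate for error-handling and cleanup code, which Cython may place under an unrelated marker):

```python
import re
import sys

# Matches Cython's marker comments, e.g.  /* "enorm.pyx":19
MARKER = re.compile(r'/\*\s*"([^"]+)":(\d+)')


def pyx_line_for(c_source, c_lineno):
    """Return (pyx_file, pyx_line) for the nearest marker comment at or
    above 1-based line `c_lineno` of a Cython-generated C file, or None
    if no marker precedes it."""
    lines = c_source.splitlines()
    for line in reversed(lines[:c_lineno]):
        match = MARKER.search(line)
        if match:
            return match.group(1), int(match.group(2))
    return None


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python pyx_line_for.py enorm.c 1049
    with open(sys.argv[1]) as f:
        print(pyx_line_for(f.read(), int(sys.argv[2])))
```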


Now insert a software breakpoint (the x86 int 3 instruction) directly above the line of interest. For GCC this is:

asm("int $0x3");

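For MSVC (inline assembly is only supported when targeting 32-bit x86; on x64, call the __debugbreak() intrinsic instead):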

__asm { int 0x3 };
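
If you need to do this repeatedly, the two manual steps (finding Cython's marker comment for a .pyx line, then adding the interrupt below it) can be scripted. The following is a minimal sketch, assuming the marker format shown above (/* "enorm.pyx":19 ... */); insert_breakpoint and the script name add_breakpoint.py are hypothetical. Run it after cython enorm.pyx, since re-running cython regenerates enorm.c and discards the edit.

```python
import re
import sys

# Breakpoint statements from this post; pass BREAKPOINT_MSVC for MSVC builds.
BREAKPOINT_GCC = 'asm("int $0x3");'
BREAKPOINT_MSVC = "__debugbreak();"


def insert_breakpoint(c_source, pyx_file, pyx_line, stmt=BREAKPOINT_GCC):
    """Return c_source with `stmt` inserted on the line after the comment
    block Cython emits for line `pyx_line` of `pyx_file` (hypothetical helper).

    Assumes markers of the form  /* "enorm.pyx":19 ... */  and uses the
    first matching marker in the file.
    """
    marker = re.compile(r'/\*\s*"%s":%d\b' % (re.escape(pyx_file), pyx_line))
    lines = c_source.splitlines(True)  # keep line endings
    for i, line in enumerate(lines):
        if marker.search(line):
            # Find the line that closes the marker comment.
            for j in range(i, len(lines)):
                if "*/" in lines[j]:
                    lines.insert(j + 1, stmt + "\n")
                    return "".join(lines)
            raise ValueError("unterminated comment for %s:%d" % (pyx_file, pyx_line))
    raise ValueError("no Cython comment for %s:%d" % (pyx_file, pyx_line))


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage: python add_breakpoint.py enorm.c enorm.pyx 19
    c_path, pyx_file, pyx_line = sys.argv[1], sys.argv[2], int(sys.argv[3])
    with open(c_path) as f:
        text = f.read()
    with open(c_path, "w") as f:
        f.write(insert_breakpoint(text, pyx_file, pyx_line))
```

Taking the first matching marker is enough for a simple statement such as line 19; a .pyx line whose code Cython spreads over several places in the C file would need more care.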

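If you are inspecting control flow, it can also help to turn off optimisations at this point; for GCC, for example, add the argument extra_compile_args=['-O0'] to Extension in setup.py. Next, build the extension module with debugging information. Because enorm.c is now newer than enorm.pyx, the build should compile the edited C file as-is; re-running cython enorm.pyx (or editing enorm.pyx) will regenerate enorm.c and discard the breakpoint.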

python setup.py build_ext --inplace -g

On Windows you will have to rename the extension module from enorm_d.pyd to enorm.pyd. Now run python main.py under GDB, Visual Studio or OllyDbg to begin stepping through the function.
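
If you rebuild often on Windows, the rename can be scripted. A minimal sketch, assuming the enorm_d.pyd naming described above (strip_debug_suffix is a hypothetical helper, and it overwrites an existing enorm.pyd):

```python
from pathlib import Path


def strip_debug_suffix(directory="."):
    """Rename each <name>_d.pyd in `directory` to <name>.pyd, replacing any
    existing file of that name, and return the new paths (hypothetical helper)."""
    renamed = []
    for path in sorted(Path(directory).glob("*_d.pyd")):
        target = path.with_name(path.name[: -len("_d.pyd")] + ".pyd")
        path.replace(target)  # overwrites target if it already exists
        renamed.append(target)
    return renamed


if __name__ == "__main__":
    for target in strip_debug_suffix():
        print("renamed ->", target)
```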

For example, running main.py in GDB:

$ gdb python
GNU gdb (Ubuntu/Linaro 7.2-1ubuntu11) 7.2
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/python...(no debugging symbols found)...done.
(gdb) run main.py 
Starting program: /usr/bin/python main.py
[Thread debugging using libthread_db enabled]

Program received signal SIGTRAP, Trace/breakpoint trap.
__pyx_pf_5enorm_enorm (__pyx_self=0x0, __pyx_args=0xb7bd18cc, __pyx_kwds=0x0)
    at enorm.c:1049
1049   __pyx_v_l = 0.;
=> 0xb72c9c51 <__pyx_pf_5enorm_enorm+858>:  d9 ee fldz   
   0xb72c9c53 <__pyx_pf_5enorm_enorm+860>:  dd 5d a0 fstp   QWORD PTR [ebp-0x60]
(gdb) l
1044  *     cdef DTYPE_t l = 0.             # <<<<<<<<<<<<<<
1045  *     cdef Py_ssize_t i
1046  * 
1047  */
1048  asm("int $0x3");
1049   __pyx_v_l = 0.;
1050 
1051   /* "enorm.pyx":22
1052  *     cdef Py_ssize_t i
1053  * 
(gdb) 
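
The session above can also be started in one step: GDB's -ex option runs a command at start-up and --args passes the rest of the command line to the program being debugged. A small sketch that builds that command line (gdb_command is a hypothetical helper; python and main.py are the names used in this post):

```python
import shlex


def gdb_command(script, python="python", gdb_commands=("run",)):
    """Return an argv list that starts `python script` under GDB and runs
    each entry of `gdb_commands` at start-up via -ex (hypothetical helper)."""
    argv = ["gdb"]
    for command in gdb_commands:
        argv += ["-ex", command]
    # Everything after --args is the program to debug and its arguments.
    argv += ["--args", python, script]
    return argv


if __name__ == "__main__":
    argv = gdb_command("main.py")
    print(shlex.join(argv))  # prints: gdb -ex run --args python main.py
    # To launch GDB interactively instead: subprocess.call(argv)
```

Extra start-up commands can be passed through gdb_commands, e.g. ("set disassemble-next-line on", "run") to also show the disassembly of the next source line at each stop. Without -batch, GDB stays at its prompt after the SIGTRAP, so you can keep stepping as above.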

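To make stepping through the source easier, switch GDB to its TUI source window with layout src (see Beej's Quick Guide to GDB). The window shows enorm.c, which still carries Cython's comments quoting each original .pyx line above the C generated for it.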