Post-processing filters in C: leak and segmentation fault

  • MartyX Grover

    I researched your Pi 4 problem again by searching for Valgrind problems on the Pi 4 in general. There was a forum thread where other Pi users were encountering errors about uninitialised values. They found that Raspbian Buster may be using an old version of Valgrind by default, and that compiling an up-to-date version of Valgrind could correct the errors.

    https://www.raspberrypi.org/forums/viewtopic.php?t=275485 

  • Fabrizio Dini

    Ok, let's see if we can figure this out...

     

    I have managed to run gdb on the RPI4, so I could inspect each thread's call stack when the segfault happens. Here's what I found.

     

    When the segfault occurs, there are many threads running (20). Here's the summary from gdb:

    (gdb) info thread
    Id Target Id Frame
    1 Thread 0xb3e178a0 (LWP 1218) "VehicleCounter" macap_util_resize (src=..., width=640, height=360, method=0) at macap_util.c:264
    2 Thread 0xb3e13f40 (LWP 1235) "VehicleCounter" __GI___select (timeout=0xbeffed1a, exceptfds=0x0, writefds=0xb3e11890, readfds=0xb3e11810, nfds=8) at ../sysdeps/unix/sysv/linux/select.c:41
    3 Thread 0xb3612f40 (LWP 1236) "VehicleCounter" __GI___poll (timeout=-1, nfds=1, fds=0xb360a818) at ../sysdeps/unix/sysv/linux/poll.c:29
    4 Thread 0xb2e11f40 (LWP 1237) "VehicleCounter" futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0xb2e117d0, expected=0, futex_word=0x5ffc90) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
    7 Thread 0xb2610f40 (LWP 1240) "VehicleCounter" futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0xb26107d0, expected=0, futex_word=0x61e648) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
    8 Thread 0xb1e0ff40 (LWP 1241) "VehicleCounter" futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0xb1e0f7d0, expected=0, futex_word=0x4b8e88) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
    9 Thread 0xb160ef40 (LWP 1242) "VehicleCounter" futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0xb160e7d0, expected=0, futex_word=0x61b4a0) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
    10 Thread 0xb0e0df40 (LWP 1243) "VehicleCounter" futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0xb0e0d7d0, expected=0, futex_word=0x621ae0) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
    11 Thread 0xb060cf40 (LWP 1244) "VehicleCounter" futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0xb060c7d0, expected=0, futex_word=0x615c28) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
    12 Thread 0xafe0bf40 (LWP 1245) "VehicleCounter" futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0xafe0b7d0, expected=0, futex_word=0x6162c8) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
    14 Thread 0xaf60af40 (LWP 1247) "VehicleCounter" futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0xaf60a7d0, expected=0, futex_word=0x618b28) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
    16 Thread 0xaedc8f40 (LWP 1249) "VehicleCounter" futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0xaedc86e8, expected=0, futex_word=0x6240d8) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
    17 Thread 0xae5c7f40 (LWP 1250) "VehicleCounter" futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0xae5c77d0, expected=0, futex_word=0x62d6f8) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
    18 Thread 0xaddc6f40 (LWP 1251) "VehicleCounter" futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0xaddc67d0, expected=0, futex_word=0x62e450) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
    19 Thread 0xad5c5f40 (LWP 1252) "VehicleCounter" futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0xad5c57d0, expected=0, futex_word=0x62dda0) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
    20 Thread 0xacdc4f40 (LWP 1253) "VehicleCounter" futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0xacdc47d0, expected=0, futex_word=0x623660) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
    21 Thread 0xac5c3f40 (LWP 1254) "VehicleCounter" futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0xac5c37d0, expected=0, futex_word=0x626a50) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
    * 22 Thread 0xabbfef40 (LWP 1255) "VehicleCounter" librealsense::frame_source::invoke_callback (this=0x6abd28, frame=...) at /home/realsense/librealsense/src/source.cpp:117
    23 Thread 0xaadfef40 (LWP 1256) "ZMQbg/0" 0xb47719d0 in epoll_wait (epfd=<optimized out>, events=0xaadfd868, maxevents=256, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
    24 Thread 0xaa5fdf40 (LWP 1257) "ZMQbg/1" 0xb47719d0 in epoll_wait (epfd=<optimized out>, events=0xaa5fc868, maxevents=256, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30

    (VehicleCounter is my application's name). The call stack on thread 22 is the following:

    (gdb) bt
    #0 0xb6249834 in librealsense::frame_source::invoke_callback(librealsense::frame_holder) const (this=0x6abd28, frame=...) at /home/realsense/librealsense/src/source.cpp:117
    #1 0xb5f22144 in librealsense::synthetic_source::frame_ready(librealsense::frame_holder) (this=0x6abd98, result=...) at /home/realsense/librealsense/src/proc/synthetic-stream.cpp:312
    #2 0xb61a4310 in rs2_synthetic_frame_ready(rs2_source*, rs2_frame*, rs2_error**) (source=0x6b4888, frame=0x811878, error=0xabbfd99c) at /home/realsense/librealsense/src/rs.cpp:1586
    #3 0xb5f2b4bc in rs2::frame_source::frame_ready(rs2::frame) const (this=0xabbfda78, result=...) at /home/realsense/librealsense/_build/../include/librealsense2/hpp/rs_processing.hpp:102
    #4 0xb5f1f7ac in librealsense::generic_processing_block::<lambda(rs2::frame, const rs2::frame_source&)>::operator()(rs2::frame, const rs2::frame_source &) const (__closure=0x6b4b7c, f=..., source=...) at /home/realsense/librealsense/src/proc/synthetic-stream.cpp:91
    #5 0xb5f2a98c in rs2::frame_processor_callback<librealsense::generic_processing_block::generic_processing_block(char const*)::<lambda(rs2::frame, const rs2::frame_source&)> >::on_frame(rs2_frame *, rs2_source *) (this=0x6b4b78, f=0x811878, source=0x6b4888)
    at /home/realsense/librealsense/_build/../include/librealsense2/hpp/rs_processing.hpp:128
    #6 0xb5f1f2a4 in librealsense::processing_block::invoke(librealsense::frame_holder) (this=0x6abcdc, f=...) at /home/realsense/librealsense/src/proc/synthetic-stream.cpp:43
    #7 0xb62034a4 in librealsense::synthetic_sensor::<lambda(librealsense::frame_holder)>::operator()(librealsense::frame_holder) const (__closure=0x7cee24, f=...) at /home/realsense/librealsense/src/sensor.cpp:1476
    #8 0xb620c4d4 in librealsense::internal_frame_callback<librealsense::synthetic_sensor::start(librealsense::frame_callback_ptr)::<lambda(librealsense::frame_holder)> >::on_frame(rs2_frame *) (this=0x7cee20, fref=0x811878) at /home/realsense/librealsense/src/types.h:943
    #9 0xb624991c in librealsense::frame_source::invoke_callback(librealsense::frame_holder) const (this=0x62be4c, frame=...) at /home/realsense/librealsense/src/source.cpp:125
    #10 0xb61f8788 in librealsense::uvc_sensor::<lambda(librealsense::platform::stream_profile, librealsense::platform::frame_object, std::function<void()>)>::operator()(librealsense::platform::stream_profile, librealsense::platform::frame_object, std::function<void()>)
    (__closure=0x7d0218, p=..., f=..., continuation=...) at /home/realsense/librealsense/src/sensor.cpp:379
    #11 0xb62059f0 in std::_Function_handler<void(librealsense::platform::stream_profile, librealsense::platform::frame_object, std::function<void()>), librealsense::uvc_sensor::open(const stream_profiles&)::<lambda(librealsense::platform::stream_profile, librealsense::platform::frame_object, std::function<void()>)> >::_M_invoke(const std::_Any_data &, librealsense::platform::stream_profile &&, librealsense::platform::frame_object &&, std::function<void()> &&) (__functor=..., __args#0=..., __args#1=..., __args#2=...)
    at /usr/include/c++/8/bits/std_function.h:297
    #12 0xb5ecc900 in std::function<void (librealsense::platform::stream_profile, librealsense::platform::frame_object, std::function<void ()>)>::operator()(librealsense::platform::stream_profile, librealsense::platform::frame_object, std::function<void ()>) const
    (this=0x62a3d4, __args#0=..., __args#1=..., __args#2=...) at /usr/include/c++/8/bits/std_function.h:687
    #13 0xb604eccc in librealsense::platform::v4l_uvc_device::poll() (this=0x62a2dc) at /home/realsense/librealsense/src/linux/backend-v4l2.cpp:965
    #14 0xb6050cc8 in librealsense::platform::v4l_uvc_device::capture_loop() (this=0x62a2dc) at /home/realsense/librealsense/src/linux/backend-v4l2.cpp:1305
    #15 0xb604d49c in librealsense::platform::v4l_uvc_device::<lambda()>::operator()(void) const (__closure=0x623d9c) at /home/realsense/librealsense/src/linux/backend-v4l2.cpp:755
    #16 0xb60554c8 in std::__invoke_impl<void, librealsense::platform::v4l_uvc_device::stream_on(std::function<void(const librealsense::notification&)>)::<lambda()> >(std::__invoke_other, librealsense::platform::v4l_uvc_device::<lambda()> &&) (__f=...)
    at /usr/include/c++/8/bits/invoke.h:60
    #17 0xb605403c in std::__invoke<librealsense::platform::v4l_uvc_device::stream_on(std::function<void(const librealsense::notification&)>)::<lambda()> >(librealsense::platform::v4l_uvc_device::<lambda()> &&) (__fn=...) at /usr/include/c++/8/bits/invoke.h:95
    #18 0xb6058958 in std::thread::_Invoker<std::tuple<librealsense::platform::v4l_uvc_device::stream_on(std::function<void(const librealsense::notification&)>)::<lambda()> > >::_M_invoke<0>(std::_Index_tuple<0>) (this=0x623d9c) at /usr/include/c++/8/thread:244
    #19 0xb6058918 in std::thread::_Invoker<std::tuple<librealsense::platform::v4l_uvc_device::stream_on(std::function<void(const librealsense::notification&)>)::<lambda()> > >::operator()(void) (this=0x623d9c) at /usr/include/c++/8/thread:253
    #20 0xb60588f0 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<librealsense::platform::v4l_uvc_device::stream_on(std::function<void(const librealsense::notification&)>)::<lambda()> > > >::_M_run(void) (this=0x623d98) at /usr/include/c++/8/thread:196
    #21 0xb43949b0 in () at /usr/lib/arm-linux-gnueabihf/libstdc++.so.6
    #22 0xb6ef1494 in start_thread (arg=0xabbfef40) at pthread_create.c:486
    #23 0xb4771578 in () at ../sysdeps/unix/sysv/linux/arm/clone.S:73

    Frame #0 is at line 117, here's the context:

    112         } 
    113         void frame_source::invoke_callback(frame_holder frame) const
    114         {
    115             if (frame)
    116             {
    117                 auto callback = frame.frame->get_owner()->begin_callback();
    118                 try
    119                 {
    120                     frame->log_callback_start(_ts ? _ts->get_time() : 0);
    121                     if (_callback)

    So, it seems that the invoke_callback() function receives the parameter "frame" which contains:

    (gdb) p frame 
    $24 = {frame = 0x811878}

    Inspecting it we found:

    (gdb) p frame.frame 
    $25 = (librealsense::frame_interface *) 0x811878
    (gdb) p frame.frame->get_owner()
    $26 = (librealsense::archive_interface *) 0x0

    And here we are: get_owner() returns a NULL pointer, but line 117 still wants to dereference it to call begin_callback().
    I don't know whether this is due to a misuse of the librealsense API on my side or whether it is an actual bug. Probably it is not expected that a frame has "no owner"... still, this is what happens.
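
    To make the failure mode concrete, here is a minimal C sketch of the same pattern: a frame carries a pointer to its owner, and the callback path dereferences that pointer. The types `frame` and `archive` and the function `begin_callback_checked` are hypothetical stand-ins invented for illustration. They are not librealsense code, and this is not necessarily how the library should be fixed. The sketch only shows what a guard around the dereference on line 117 could look like.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins, NOT the real librealsense types. */
struct archive {
    int callbacks_in_flight;
};

struct frame {
    struct archive *owner; /* may be NULL, as observed in gdb */
};

/*
 * Same shape as the call on line 117, but the owner is checked
 * before it is dereferenced. Returns 0 on success and -1 when the
 * frame is NULL or has no owner.
 */
int begin_callback_checked(const struct frame *f)
{
    if (f == NULL || f->owner == NULL) {
        fprintf(stderr, "frame %p has no owner, skipping callback\n",
                (const void *)f);
        return -1;
    }
    f->owner->callbacks_in_flight++;
    return 0;
}
```

    Calling `begin_callback_checked()` on a frame whose `owner` is NULL returns -1 instead of crashing. Without the check, the increment would be a NULL dereference, which is the kind of access that produces the segfault seen in frame #0.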

    What do I do now MartyG? Should I file a bug on your bug tracker? Can you provide a link, or a work-around for this issue?

    Thanks a lot!

     

  • Fabrizio Dini

    MartyG, I have to make a correction to what I wrote above.

     

    The previous analysis was made on the Raspbian image you linked a few days ago. There, the librealsense version is 2.34. While inspecting the problem, I also found a few bugs on my side, and fixed them. Still, on THAT version of librealsense the application crashes, and that is probably due to the bug I described.

     

    But now I have gone back to librealsense v2.40, and the application is running without crashing. So before reporting a new bug, I think it's better to verify whether that bug has already been fixed. Can I ask you to check this? I searched the issues of the GitHub repo but could not find anything.

     

    Thank you for your valuable help!

  • MartyX Grover

    Research into futex_abstimed_wait_cancelable indicates that it is involved in multi-threading, which would explain the 20 threads listed above.  Example:

    https://stackoverflow.com/questions/54766479/logging-multithreading-deadlock-in-python 

    If your program works on your laptop but not on the Pi 4, I wonder if that suggests a conflict with running multi-threaded code on the Pi's Arm processor rather than on the x64-type processor that your laptop likely uses.
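
    For context on what those futex entries usually mean: as far as I know, on glibc versions like the one shipped with Raspbian Buster, a thread blocked in `pthread_cond_timedwait()` typically appears in gdb as sitting in `futex_abstimed_wait_cancelable`. The sketch below is an illustration only. The names `wait_for_work`, `post_work` and `has_work` are hypothetical and are not taken from the application or from librealsense. A thread inside `wait_for_work()` is simply idle until it is signalled or the timeout expires, so seeing many threads in that state is normal for a worker pool and does not by itself indicate a fault.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical shared state; names are illustrative only. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ready = PTHREAD_COND_INITIALIZER;
static int has_work = 0;

/*
 * Wait up to timeout_ms for work to be posted.
 * Returns 0 if work arrived, or an error code such as ETIMEDOUT.
 * While blocked here, the thread sleeps inside a futex wait.
 */
int wait_for_work(long timeout_ms)
{
    struct timespec deadline;
    int rc = 0;

    /* pthread_cond_timedwait takes an absolute CLOCK_REALTIME deadline. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&lock);
    /* Loop to cope with spurious wakeups. */
    while (!has_work && rc == 0)
        rc = pthread_cond_timedwait(&ready, &lock, &deadline);
    if (has_work) {
        has_work = 0; /* consume the work item */
        rc = 0;
    }
    pthread_mutex_unlock(&lock);
    return rc;
}

/* Post one unit of work and wake a waiting thread. */
void post_work(void)
{
    pthread_mutex_lock(&lock);
    has_work = 1;
    pthread_cond_signal(&ready);
    pthread_mutex_unlock(&lock);
}
```

    In the thread list above, the thread worth focusing on is the one gdb marks with `*` (thread 22), not the ones parked in futex waits.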

  • MartyX Grover

    I would add that using SDK 2.34 is not recommended if it can be avoided, as it had a problem with continuously generated timing errors.

    https://github.com/IntelRealSense/librealsense/issues/6189 

  • Fabrizio Dini

    Hi everybody! I am back on this issue because I finally managed to spend some time on this code after quite a while. In the meantime, SDK version 2.41 was released. I updated to it (both on my laptop and on the Raspberry Pi), but that didn't solve the problem. I performed further analysis by building the SDK from sources (on tag v2.41, of course) so as to be able to debug both the application and the RealSense SDK. I am still using Valgrind, since it seems to speed things up a lot (the application crashes very quickly under Valgrind, whereas it takes some time to crash when running without it). However, here's the point: Valgrind shows an illegal read. There's no doubt that a NULL pointer has been dereferenced inside librealsense.

    Having built librealsense in debug mode, I have a code snippet for each line of the stack trace. Let me report just the last one, for brevity:

    On line 195, there are 2 pointers: 'this' and 'owner'. The former cannot be NULL, so I guess the problem is with the latter, 'owner'. I already found previously (see the posts above) that there were cases where 'owner' seemed to be NULL and was still dereferenced. This new finding strongly suggests that this can indeed happen.
    Is there anyone who can shed light on the problem? Should I open a bug? Could this depend on a misuse of the API?

    Thanks in advance!

    Regards

    Fabrizio

     
