Post-processing filters in C: leak and segmentation fault
Hello everybody! First post in this community!
I am developing a pure C application that uses librealsense (with D415 or D435 cameras) to get a depth frame and perform some calculations (which calculations is irrelevant for now).
Before passing the depth frame to the functions that do the computation, I need to convert it to a single-channel 8bpp image. To do this I defined a chain of post-processing filters as described here:
https://dev.intelrealsense.com/docs/post-processing-filters
and ended the chain with a colorizer filter that returns a gray level image.
Unfortunately, the application crashes with a segmentation fault after a while. Watching the process memory with tools like top or htop suggests that there is a memory leak in the code, and valgrind confirms it. But all the lost memory is within librealsense, which actually sounds strange. Maybe I am missing a step, but the documentation is not very detailed.
The code is large, so I am not sure I can post it here. However, I suspect that the issue is in how I clean up the pointers I get from librealsense. Each post-processing filter returns a filtered image that is the input to the next filter in the chain. After all the processing is done and the 8bpp image has been copied to the rest of the application, I release them all with rs2_release_frame().
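The release pattern described above can be modelled with a small, self-contained sketch. It assumes that a filter takes over the reference to its input frame, which is how the header comment on rs2_process_frame() in rs_processing.h reads ("ownership is moved to the block object"); this is worth checking against the header of the installed SDK version. Everything prefixed mock_ is a hypothetical stand-in written for this illustration, not a librealsense call, and the sketch only counts releases without touching a camera.

```c
#include <assert.h>

/* Hypothetical stand-in for a reference-counted frame (not a librealsense type). */
typedef struct {
    int refs;   /* references still outstanding; 0 means already returned */
} mock_frame;

/* Counts releases of frames that had no reference left. In real code such a
 * release is undefined behaviour and may crash later, far from its cause. */
static int mock_bad_releases = 0;

static void mock_release_frame(mock_frame *f)
{
    if (f->refs == 0) {
        mock_bad_releases++;
        return;
    }
    f->refs--;
}

/* Hypothetical filter: consumes the reference to `in` (ownership moves to the
 * filter) and hands the caller one reference to a freshly produced `out`. */
static void mock_filter(mock_frame *in, mock_frame *out)
{
    out->refs = 1;
    mock_release_frame(in);
}

/* Runs a three-stage chain and returns the number of bad releases.
 * release_all != 0: release every frame of the chain afterwards.
 * release_all == 0: release only the final output. */
int mock_run_chain(int release_all)
{
    mock_frame depth = { 1 }, filtered_a = { 0 }, filtered_b = { 0 }, gray = { 0 };

    mock_bad_releases = 0;
    mock_filter(&depth, &filtered_a);
    mock_filter(&filtered_a, &filtered_b);
    mock_filter(&filtered_b, &gray);

    if (release_all) {
        mock_release_frame(&depth);
        mock_release_frame(&filtered_a);
        mock_release_frame(&filtered_b);
    }
    mock_release_frame(&gray);

    /* Every reference must be gone by now, otherwise the chain leaks. */
    assert(depth.refs == 0 && filtered_a.refs == 0 &&
           filtered_b.refs == 0 && gray.refs == 0);
    return mock_bad_releases;
}
```

In this model mock_run_chain(1) reports three bad releases and mock_run_chain(0) reports none. If the real chain follows the same ownership rule, a frame that must stay alive after being handed to a filter would need an extra reference (rs2_frame_add_ref()) taken first.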
Does anybody here have some insights about how memory is managed in librealsense? Should I release the memory or not? Am I missing something in using the filters? Any complete example available?
Thanks in advance!
regards
Fabrizio
-
I researched your problem with Pi 4 again by simply searching for Valgrind Pi 4 problems in general. There was a forum where other Pi users were encountering problems with uninitialised errors. They found that Raspbian Buster may be using an old version of Valgrind by default, and that compiling an up to date version of Valgrind could correct the errors.
-
Ok, let's see if we can figure this out...
I have managed to run gdb on the RPI4, so I could inspect each thread's call stack when the segfault happens. Here's what I found.
When the segfault occurs, there are many threads running (20), here's the summary from gdb:
(gdb) info thread
Id Target Id Frame
1 Thread 0xb3e178a0 (LWP 1218) "VehicleCounter" macap_util_resize (src=..., width=640, height=360, method=0) at macap_util.c:264
2 Thread 0xb3e13f40 (LWP 1235) "VehicleCounter" __GI___select (timeout=0xbeffed1a, exceptfds=0x0, writefds=0xb3e11890, readfds=0xb3e11810, nfds=8) at ../sysdeps/unix/sysv/linux/select.c:41
3 Thread 0xb3612f40 (LWP 1236) "VehicleCounter" __GI___poll (timeout=-1, nfds=1, fds=0xb360a818) at ../sysdeps/unix/sysv/linux/poll.c:29
4 Thread 0xb2e11f40 (LWP 1237) "VehicleCounter" futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0xb2e117d0, expected=0, futex_word=0x5ffc90) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
7 Thread 0xb2610f40 (LWP 1240) "VehicleCounter" futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0xb26107d0, expected=0, futex_word=0x61e648) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
8 Thread 0xb1e0ff40 (LWP 1241) "VehicleCounter" futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0xb1e0f7d0, expected=0, futex_word=0x4b8e88) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
9 Thread 0xb160ef40 (LWP 1242) "VehicleCounter" futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0xb160e7d0, expected=0, futex_word=0x61b4a0) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
10 Thread 0xb0e0df40 (LWP 1243) "VehicleCounter" futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0xb0e0d7d0, expected=0, futex_word=0x621ae0) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
11 Thread 0xb060cf40 (LWP 1244) "VehicleCounter" futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0xb060c7d0, expected=0, futex_word=0x615c28) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
12 Thread 0xafe0bf40 (LWP 1245) "VehicleCounter" futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0xafe0b7d0, expected=0, futex_word=0x6162c8) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
14 Thread 0xaf60af40 (LWP 1247) "VehicleCounter" futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0xaf60a7d0, expected=0, futex_word=0x618b28) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
16 Thread 0xaedc8f40 (LWP 1249) "VehicleCounter" futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0xaedc86e8, expected=0, futex_word=0x6240d8) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
17 Thread 0xae5c7f40 (LWP 1250) "VehicleCounter" futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0xae5c77d0, expected=0, futex_word=0x62d6f8) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
18 Thread 0xaddc6f40 (LWP 1251) "VehicleCounter" futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0xaddc67d0, expected=0, futex_word=0x62e450) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
19 Thread 0xad5c5f40 (LWP 1252) "VehicleCounter" futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0xad5c57d0, expected=0, futex_word=0x62dda0) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
20 Thread 0xacdc4f40 (LWP 1253) "VehicleCounter" futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0xacdc47d0, expected=0, futex_word=0x623660) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
21 Thread 0xac5c3f40 (LWP 1254) "VehicleCounter" futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0xac5c37d0, expected=0, futex_word=0x626a50) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
* 22 Thread 0xabbfef40 (LWP 1255) "VehicleCounter" librealsense::frame_source::invoke_callback (this=0x6abd28, frame=...) at /home/realsense/librealsense/src/source.cpp:117
23 Thread 0xaadfef40 (LWP 1256) "ZMQbg/0" 0xb47719d0 in epoll_wait (epfd=<optimized out>, events=0xaadfd868, maxevents=256, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
24 Thread 0xaa5fdf40 (LWP 1257) "ZMQbg/1" 0xb47719d0 in epoll_wait (epfd=<optimized out>, events=0xaa5fc868, maxevents=256, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
(VehicleCounter is my application's name.) The call stack on thread 22 is the following:
(gdb) bt
#0 0xb6249834 in librealsense::frame_source::invoke_callback(librealsense::frame_holder) const (this=0x6abd28, frame=...) at /home/realsense/librealsense/src/source.cpp:117
#1 0xb5f22144 in librealsense::synthetic_source::frame_ready(librealsense::frame_holder) (this=0x6abd98, result=...) at /home/realsense/librealsense/src/proc/synthetic-stream.cpp:312
#2 0xb61a4310 in rs2_synthetic_frame_ready(rs2_source*, rs2_frame*, rs2_error**) (source=0x6b4888, frame=0x811878, error=0xabbfd99c) at /home/realsense/librealsense/src/rs.cpp:1586
#3 0xb5f2b4bc in rs2::frame_source::frame_ready(rs2::frame) const (this=0xabbfda78, result=...) at /home/realsense/librealsense/_build/../include/librealsense2/hpp/rs_processing.hpp:102
#4 0xb5f1f7ac in librealsense::generic_processing_block::<lambda(rs2::frame, const rs2::frame_source&)>::operator()(rs2::frame, const rs2::frame_source &) const (__closure=0x6b4b7c, f=..., source=...) at /home/realsense/librealsense/src/proc/synthetic-stream.cpp:91
#5 0xb5f2a98c in rs2::frame_processor_callback<librealsense::generic_processing_block::generic_processing_block(char const*)::<lambda(rs2::frame, const rs2::frame_source&)> >::on_frame(rs2_frame *, rs2_source *) (this=0x6b4b78, f=0x811878, source=0x6b4888)
at /home/realsense/librealsense/_build/../include/librealsense2/hpp/rs_processing.hpp:128
#6 0xb5f1f2a4 in librealsense::processing_block::invoke(librealsense::frame_holder) (this=0x6abcdc, f=...) at /home/realsense/librealsense/src/proc/synthetic-stream.cpp:43
#7 0xb62034a4 in librealsense::synthetic_sensor::<lambda(librealsense::frame_holder)>::operator()(librealsense::frame_holder) const (__closure=0x7cee24, f=...) at /home/realsense/librealsense/src/sensor.cpp:1476
#8 0xb620c4d4 in librealsense::internal_frame_callback<librealsense::synthetic_sensor::start(librealsense::frame_callback_ptr)::<lambda(librealsense::frame_holder)> >::on_frame(rs2_frame *) (this=0x7cee20, fref=0x811878) at /home/realsense/librealsense/src/types.h:943
#9 0xb624991c in librealsense::frame_source::invoke_callback(librealsense::frame_holder) const (this=0x62be4c, frame=...) at /home/realsense/librealsense/src/source.cpp:125
#10 0xb61f8788 in librealsense::uvc_sensor::<lambda(librealsense::platform::stream_profile, librealsense::platform::frame_object, std::function<void()>)>::operator()(librealsense::platform::stream_profile, librealsense::platform::frame_object, std::function<void()>)
(__closure=0x7d0218, p=..., f=..., continuation=...) at /home/realsense/librealsense/src/sensor.cpp:379
#11 0xb62059f0 in std::_Function_handler<void(librealsense::platform::stream_profile, librealsense::platform::frame_object, std::function<void()>), librealsense::uvc_sensor::open(const stream_profiles&)::<lambda(librealsense::platform::stream_profile, librealsense::platform::frame_object, std::function<void()>)> >::_M_invoke(const std::_Any_data &, librealsense::platform::stream_profile &&, librealsense::platform::frame_object &&, std::function<void()> &&) (__functor=..., __args#0=..., __args#1=..., __args#2=...)
at /usr/include/c++/8/bits/std_function.h:297
#12 0xb5ecc900 in std::function<void (librealsense::platform::stream_profile, librealsense::platform::frame_object, std::function<void ()>)>::operator()(librealsense::platform::stream_profile, librealsense::platform::frame_object, std::function<void ()>) const
(this=0x62a3d4, __args#0=..., __args#1=..., __args#2=...) at /usr/include/c++/8/bits/std_function.h:687
#13 0xb604eccc in librealsense::platform::v4l_uvc_device::poll() (this=0x62a2dc) at /home/realsense/librealsense/src/linux/backend-v4l2.cpp:965
#14 0xb6050cc8 in librealsense::platform::v4l_uvc_device::capture_loop() (this=0x62a2dc) at /home/realsense/librealsense/src/linux/backend-v4l2.cpp:1305
#15 0xb604d49c in librealsense::platform::v4l_uvc_device::<lambda()>::operator()(void) const (__closure=0x623d9c) at /home/realsense/librealsense/src/linux/backend-v4l2.cpp:755
#16 0xb60554c8 in std::__invoke_impl<void, librealsense::platform::v4l_uvc_device::stream_on(std::function<void(const librealsense::notification&)>)::<lambda()> >(std::__invoke_other, librealsense::platform::v4l_uvc_device::<lambda()> &&) (__f=...)
at /usr/include/c++/8/bits/invoke.h:60
#17 0xb605403c in std::__invoke<librealsense::platform::v4l_uvc_device::stream_on(std::function<void(const librealsense::notification&)>)::<lambda()> >(librealsense::platform::v4l_uvc_device::<lambda()> &&) (__fn=...) at /usr/include/c++/8/bits/invoke.h:95
#18 0xb6058958 in std::thread::_Invoker<std::tuple<librealsense::platform::v4l_uvc_device::stream_on(std::function<void(const librealsense::notification&)>)::<lambda()> > >::_M_invoke<0>(std::_Index_tuple<0>) (this=0x623d9c) at /usr/include/c++/8/thread:244
#19 0xb6058918 in std::thread::_Invoker<std::tuple<librealsense::platform::v4l_uvc_device::stream_on(std::function<void(const librealsense::notification&)>)::<lambda()> > >::operator()(void) (this=0x623d9c) at /usr/include/c++/8/thread:253
#20 0xb60588f0 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<librealsense::platform::v4l_uvc_device::stream_on(std::function<void(const librealsense::notification&)>)::<lambda()> > > >::_M_run(void) (this=0x623d98) at /usr/include/c++/8/thread:196
#21 0xb43949b0 in () at /usr/lib/arm-linux-gnueabihf/libstdc++.so.6
#22 0xb6ef1494 in start_thread (arg=0xabbfef40) at pthread_create.c:486
#23 0xb4771578 in () at ../sysdeps/unix/sysv/linux/arm/clone.S:73
Frame #0 is at line 117; here's the context:
112 }
113 void frame_source::invoke_callback(frame_holder frame) const
114 {
115 if (frame)
116 {
117 auto callback = frame.frame->get_owner()->begin_callback();
118 try
119 {
120 frame->log_callback_start(_ts ? _ts->get_time() : 0);
121 if (_callback)
So, it seems that the invoke_callback() function receives the parameter "frame", which contains:
(gdb) p frame
$24 = {frame = 0x811878}
Inspecting it, we found:
(gdb) p frame.frame
$25 = (librealsense::frame_interface *) 0x811878
(gdb) p frame.frame->get_owner()
$26 = (librealsense::archive_interface *) 0x0
And here we are: get_owner() returns a NULL pointer, but line 117 still dereferences it to call begin_callback().
I don't know if this is due to a misuse of the librealsense API on my side or if it is an actual bug. A frame with "no owner" is probably not expected... still, this is what happens.
What do I do now, MartyG? Should I file a bug on your bug tracker? Can you provide a link, or a workaround for this issue?
Thanks a lot!
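The failure at line 117 can be reduced to a chained call through a pointer that happens to be NULL. The sketch below models it in C with hypothetical mock_ types standing in for frame_interface and archive_interface. It only illustrates the failure mode and a defensive check; it is not a patch to librealsense and says nothing about why the owner was NULL in the first place.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the archive that owns frames. */
typedef struct {
    int callbacks_started;
} mock_archive;

/* Hypothetical stand-in for a frame; a NULL owner models get_owner()
 * returning 0x0 as seen in the gdb session. */
typedef struct {
    mock_archive *owner;
} mock_owned_frame;

static mock_archive *mock_get_owner(const mock_owned_frame *f)
{
    return f->owner;
}

/* Guarded version of the call on line 117: returns 0 on success and -1 when
 * the frame is missing or has no owner, instead of dereferencing NULL. */
int mock_invoke_callback(mock_owned_frame *f)
{
    mock_archive *owner;

    if (f == NULL)
        return -1;
    owner = mock_get_owner(f);
    if (owner == NULL)
        return -1;
    owner->callbacks_started++;   /* stands in for begin_callback() */
    return 0;
}
```

A guard like this would only hide the symptom: a frame that reaches the callback without an owner still points to a lifetime problem somewhere earlier in the pipeline.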
-
MartyG, I have to make a correction to what I wrote above.
The previous analysis was made on the Raspbian image you linked a few days ago. There, the librealsense version is 2.34. While inspecting the problem, I also found a few bugs on my side, and fixed them. Still, on THAT version of librealsense the application crashes, and that is probably due to the bug I described.
But now I have gone back to librealsense v2.40, and the application is running without crashing. So before reporting a new bug I think it's better to verify whether that bug has already been fixed. Can I ask you to check this? I have searched the issues of the GitHub repo but could not find anything.
Thank you for your valuable help!
-
Research into futex_abstimed_wait_cancelable indicates that it is involved in multi-threading, which would explain the 20 threads listed above. Example:
https://stackoverflow.com/questions/54766479/logging-multithreading-deadlock-in-python
If your program works on your laptop but not on the Pi 4, I wonder if that suggests a conflict with running multi-threaded code on the Pi's Arm processor rather than on the x64-type processor that your laptop likely uses.
-
I would add that using SDK 2.34 is not recommended if it can be avoided, as it had problems with continuously generated timing errors.
-
Hi everybody! I am back on this issue because I managed to spend some time on this code after quite a while. In the meantime, SDK version 2.41 was released. I updated to it (both on my laptop and on the Raspberry Pi), but that didn't solve the problem. I performed further analysis by building the SDK from source (on tag v2.41, of course) so as to be able to debug both the application and the RealSense SDK. I am still using Valgrind, since it seems to speed things up a lot (the application crashes very quickly under Valgrind, whereas it takes some time before crashing without it). However, here's the point. Valgrind shows an illegal read: there's no doubt, a NULL pointer has been dereferenced inside librealsense.

Having built librealsense in debug mode, I have a code snippet for each line of the stack trace. Let me report just the last one, for brevity:

On line 195, there are 2 pointers: 'this' and 'owner'. The former cannot be NULL, so I guess the problem is with the latter, 'owner'. I had already found (see the posts above) that there were cases where 'owner' seemed to be NULL and was still dereferenced. This new finding strongly suggests that it can happen.
Can anyone shed some light on the problem? Should I open a bug? Could this depend on a misuse of the API?
Thanks in advance!
Regards
Fabrizio