I am getting a crash when a DLL written in C++ unloads. It looks like this has something to do with heap corruption. Based on the following partial stack trace, I think my application is trying to use Fault Tolerant Heap. I have tried to turn this off to reveal the original source of heap corruption, but I don't think this is working.

ntdll.dll!RtlReportCriticalFailure()  + 0x56 bytes	
ntdll.dll!RtlpHeapHandleError()  + 0x12 bytes	
ntdll.dll!RtlpHpHeapHandleError()  + 0x7a bytes	
ntdll.dll!RtlpLogHeapFailure()  + 0x45 bytes	
ntdll.dll!RtlpFreeHeapInternal()  + 0x4e0 bytes	
ntdll.dll!RtlFreeHeap()  + 0x51 bytes	
AcLayers.dll!NS_FaultTolerantHeap::
	APIHook_RtlFreeHeap()  + 0x431 bytes	
ucrtbase.dll!_free_base()  + 0x1b bytes	
Physics_Output_Suite_DLL.dll!std::_Deallocate(
	void * _Ptr, unsigned __int64 _Count, 
	unsigned __int64 _Sz)
Physics_Output_Suite_DLL.dll!
	std::vector<int,std::allocator<int> >::_Tidy()
Physics_Output_Suite_DLL.dll!
	`dynamic atexit destructor for 'DPSetupStatus''()
	+ 0x19 bytes	C++
ucrtbase.dll!<lambda_f03950bc5685219e0bcd2087efbe011e>::
	operator()()  + 0xa6 bytes	
ucrtbase.dll!__crt_seh_guarded_call<int>::operator()
	<<lambda_7777bce6b2f8c936911f934f8298dc43>,
	<lambda_f03950bc5685219e0bcd2087efbe011e> &,
	<lambda_3883c3dff614d5e0c5f61bb1ac94921c> >()
ucrtbase.dll!_execute_onexit_table()  + 0x34 bytes	
Physics_Output_Suite_DLL.dll!
	dllmain_crt_process_detach(const bool is_terminating)
Physics_Output_Suite_DLL.dll!
	dllmain_dispatch(HINSTANCE__ * const instance, 
	const unsigned long reason, void * const reserved)

Based on the call to NS_FaultTolerantHeap::APIHook_RtlFreeHeap(), I think the application is still using the fault tolerant heap. I have verified that the vector being cleaned up is not itself corrupt (the inner pointer passed to std::_Deallocate has the same value as it originally did). I don't see how this vector could possibly have been cleaned up twice, since it is a static global variable and would only get cleaned up once, when the DLL unloads.

Here are the steps I've tried to use to turn off fault tolerant heap:

  • I've stopped and disabled the Diagnostic Policy Service.
  • In the registry, I set HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\FTH\Enabled to 0.
  • I ran the following command as admin:
    Rundll32.exe fthsvc.dll,FthSysprepSpecialize
  • I restarted my computer several times.
  • In the Event Viewer, I've verified that there's no recent activity in the Fault Tolerance Heap event log (under Applications and Services / Microsoft / Windows).

Is there some other service that would use the fault tolerant heap API? What else should I be doing to troubleshoot?

    I suggest you try disabling it for a single application: delete the application's FTH entry under HKEY_LOCAL_MACHINE. Then set this value to zero: HKEY_LOCAL_MACHINE\Software\Microsoft\FTH\Enabled

    Refer to the thread: https://stackoverflow.com/questions/5020418/how-do-i-turn-off-the-fault-tolerant-heap

    I don't know the details of your code, how it is compiled, etc. But please keep in mind that, if you have say DLLs with a C++ interface, that expose C++ objects and STL classes at the interface boundaries, before VS 2015 you are not guaranteed binary compatibility between EXE and DLLs built with different versions of the MSVC++ compiler. Things have changed since VS 2015, as you can read from this article:

    C++ binary compatibility between Visual Studio versions
    https://learn.microsoft.com/en-us/cpp/porting/binary-compat-2015-2017?view=msvc-170

    So, if for example you are trying to use a DLL built with VS2013 from an EXE built with a different version of the MSVC compiler, I would encourage you to either rebuild everything using the same MSVC compiler version (and same flavor of the CRT), or move to a more recent version of MSVC and enjoy the added binary compatibility.

    Other options for using DLLs from EXEs built with potentially different compiler versions are: #1 exposing a pure C interface (like Win32 APIs do), or #2 exposing COM interfaces at the DLL boundaries.
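
    A minimal sketch of option #1, assuming a DLL that keeps a std::vector<int> internally. The names PHYS_API, Phys_GetCount and Phys_CopyData are made up for illustration and are not from the code in question. Only C types cross the boundary, and the caller supplies the buffer, so memory is never allocated in one module and freed in another.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical export macro: adds __declspec(dllexport) on Windows builds.
#if defined(_WIN32)
#  define PHYS_API extern "C" __declspec(dllexport)
#else
#  define PHYS_API extern "C"
#endif

namespace {
// Internal state stays private to the DLL; no STL type crosses the boundary.
std::vector<int>& Data() {
    static std::vector<int> data{1, 2, 3};  // placeholder contents
    return data;
}
}  // namespace

// Returns the number of elements available.
PHYS_API std::size_t Phys_GetCount() {
    return Data().size();
}

// Copies up to `capacity` elements into a caller-owned buffer
// and returns the number of elements copied.
PHYS_API std::size_t Phys_CopyData(int* buffer, std::size_t capacity) {
    if (buffer == nullptr) {
        return 0;
    }
    const std::size_t n = std::min(capacity, Data().size());
    std::copy_n(Data().begin(), n, buffer);
    return n;
}
```

    A caller would first ask for the count, allocate its own buffer of that size, and then call Phys_CopyData() to fill it.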

    @Giovanni Dicanio
    Our code is a mix of managed and unmanaged code. The unmanaged DLLs are all currently compiled with VS2015, I think. Also, the crash I am seeing appears to be a relatively recent problem. If different DLLs had been compiled with different versions of Visual Studio, I would have expected this problem to appear sooner, since we have a standard (automated) build process. Our core simulation uses the Unity engine (version 5.3.4f1). Some DLLs are loaded directly, as Unity plugins (basically interop). Other DLLs are loaded dynamically by one particular unmanaged DLL loaded early in the startup process.

    @Dan Simkin It's kind of hard to make a good diagnosis without seeing the actual code, as in a detailed code review. We are more in the "psychic debugging" domain here. Anyway, since it seems you are not sure that all the modules (EXE and DLLs) are built with the same compiler, I would suggest double-checking that and, if possible, doing a rebuild-all of all the modules using the same VC++ compiler version, the same settings, and the same flavor of the CRT (e.g. don't mix debug-build DLLs with release-build EXEs).

    Moreover, if you have a global std::vector variable in your DLL, I would suggest making it a static local variable and exporting an accessor function from the DLL, e.g.:

    // Export the MyDllGetDataVector() accessor function from the DLL
    std::vector<int>& MyDllGetDataVector() {
        static std::vector<int> singletonDataVector;
        return singletonDataVector;
    }

    Clients of the DLLs will invoke that function to access the vector object, instead of relying on direct access to a global variable exported by the DLL.

    @Giovanni Dicanio
    I did confirm that the DLLs were being built with VS2015. The vector in which corruption is occurring is private to the DLL, but I could try accessing it internally with an accessor function if you think it would help. I will keep checking recent changes to see if reverting any of them fixes the problem.

    @Giovanni Dicanio
    Mixed news. I realized that my DLL was still briefly holding on to an invalid memory pointer after the problem code was called. Since this is only used read-only, I don't see how this would be causing memory corruption, but I nulled out the pointer anyway, and now I am seeing different behavior. 

    The bad news is I am still crashing on shutdown. The good news is that I am no longer seeing the fault tolerant heap call in the call stack after the crash (just the normal RtlFreeHeap call). Also, the global vector being corrupted is a different one. The fact that a different vector is being corrupted makes me think that the corruption has nothing to do with code that accesses either vector, and the corruption is just collateral damage.

    @Dan Simkin Thanks for the update.

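    You may have problems because you are using global (or static) variables that have non-trivial destructors, like std::vector. Their destructors run while the DLL is being unloaded (as in your stack trace, under dllmain_crt_process_detach), and any earlier heap damage tends to surface at that point, which can make these bugs subtle.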

    I wonder if you could change your code, for example using a constexpr std::array<int, N> instead of a std::vector<int>. In fact, in that case the std::array would have a trivial destructor that would not cause problems.
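
    As a quick check of that claim, here is a small sketch using standard type traits. The name kSetupStatus and its contents are placeholders; this only applies if your data has a fixed size known at compile time.

```cpp
#include <array>
#include <type_traits>
#include <vector>

// Hypothetical fixed-size replacement for a global std::vector<int>.
constexpr std::array<int, 4> kSetupStatus{0, 1, 2, 3};

// std::array<int, N> stores its elements inline, so nothing has to be
// freed when the DLL unloads.
static_assert(std::is_trivially_destructible<std::array<int, 4>>::value,
              "std::array<int, N> is trivially destructible");

// std::vector<int> owns heap memory, so its destructor must run and free it.
static_assert(!std::is_trivially_destructible<std::vector<int>>::value,
              "std::vector<int> is not trivially destructible");
```

    If both assertions compile, the array version registers no exit-time destructor, so there is no free call for a damaged heap to trip over at unload.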

    You may also export functions from your DLL to access that array, instead of exposing the array object itself. For example:

    size_t MyDll_GetArraySize() {
        return theArray.size();
    }

    and similarly to access the array's elements for read access.
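
    A possible shape for the element accessor, as a sketch only. The array contents and the name MyDll_GetArrayElement are placeholders I made up; the bounds check means a bad index from a caller cannot read outside the array.

```cpp
#include <array>
#include <cstddef>

// Stand-in for the DLL's internal array; the contents are made up.
constexpr std::array<int, 3> theArray{10, 20, 30};

// Size accessor.
std::size_t MyDll_GetArraySize() {
    return theArray.size();
}

// Hypothetical read accessor: stores the element in *value and returns true,
// or returns false when the index is out of range or value is null.
bool MyDll_GetArrayElement(std::size_t index, int* value) {
    if (value == nullptr || index >= theArray.size()) {
        return false;
    }
    *value = theArray[index];
    return true;
}
```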

    Still in the realm of hypotheses (as I don't have access to your real code to verify), you could also try a static local variable that is a reference to a dynamically allocated std::vector, like this:

    // Inside a function in your DLL use a local static reference (note the &)
    static std::vector<int>& myData = *new std::vector<int>{};
    

    For example:

    std::vector<int>& GetMyData() {
        static std::vector<int>& myData = *new std::vector<int>{};
        return myData;
    }
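
    To see what this pattern does, here is a sketch with an instrumented type. Tracked and GetTracked are hypothetical names used only to count constructor and destructor calls.

```cpp
#include <vector>

// Hypothetical instrumented type, used only to observe construction
// and destruction.
struct Tracked {
    static int constructed;
    static int destroyed;
    std::vector<int> values;
    Tracked() { ++constructed; }
    ~Tracked() { ++destroyed; }
};
int Tracked::constructed = 0;
int Tracked::destroyed = 0;

// Same pattern as GetMyData(): the object is created on first use and is
// never deleted, so no destructor is registered to run at exit or unload.
Tracked& GetTracked() {
    static Tracked& instance = *new Tracked{};
    return instance;
}
```

    The trade-off is a deliberate leak: the memory is reclaimed only when the process exits, not when the DLL is unloaded. It also does not fix any underlying corruption; it only removes one place where that corruption shows up as a crash.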

    Heap corruption issues can be quite challenging to diagnose and resolve. It seems like you have already taken some steps to disable the Fault Tolerant Heap (FTH) but are still experiencing crashes related to heap corruption. Here are a few suggestions for further troubleshooting:

  • Check for other dependencies: Verify if any other DLLs or components in your application are using FTH. It's possible that another component is triggering the fault tolerant heap, leading to the heap corruption. Review the dependencies and investigate if any of them have FTH enabled.
  • Review your code for potential issues: Examine your codebase for any potential memory-related bugs or incorrect memory management practices that could lead to heap corruption. This includes checking for buffer overflows, uninitialized variables, use-after-free, double-free, or other memory access issues.
  • Enable debug options: Enable debugging options in your development environment and build the DLL with debug symbols. This will provide more detailed information about the crash, including memory addresses and call stack information, which can help pinpoint the source of the problem.
  • Use memory analysis tools: Utilize memory analysis tools such as Valgrind (for Linux) or Application Verifier (for Windows) to detect memory-related issues. These tools can help identify memory leaks, buffer overflows, and other memory errors that might be causing heap corruption.
  • Perform code review and testing: Review the code thoroughly, paying attention to areas that deal with memory management, especially in the context of global variables and static objects. Consider running extensive tests and stress tests to reproduce the issue consistently and narrow down its cause.
  • Consult the community or experts: Seek help from relevant developer communities, forums, or online platforms where experienced developers can provide insights or guidance on similar issues. They might have encountered similar problems and can offer suggestions specific to your programming language, environment, or libraries.

    Remember that troubleshooting heap corruption issues can be a complex process. It often requires a combination of techniques, including code analysis, debugging, and testing, to identify and resolve the underlying problem.
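
    To illustrate the buffer overflow check mentioned above, here is a minimal sketch of the guard-byte idea that debug heaps apply around allocations. GuardedBuffer is a hypothetical helper written for this example, not part of any library, and the guard size and pattern are arbitrary choices.

```cpp
#include <array>
#include <cstddef>
#include <cstring>

// Hypothetical helper: a fixed-size buffer with guard bytes on both sides.
template <std::size_t N>
class GuardedBuffer {
public:
    GuardedBuffer() {
        // Fill everything with the guard pattern, then zero the usable part.
        std::memset(storage_.data(), kGuardByte, storage_.size());
        std::memset(data(), 0, N);
    }

    // Pointer to the N usable bytes between the two guard regions.
    unsigned char* data() { return storage_.data() + kGuardSize; }
    static constexpr std::size_t size() { return N; }

    // Returns true while every guard byte still holds its original pattern.
    bool intact() const {
        for (std::size_t i = 0; i < kGuardSize; ++i) {
            if (storage_[i] != kGuardByte) return false;
            if (storage_[kGuardSize + N + i] != kGuardByte) return false;
        }
        return true;
    }

private:
    static constexpr std::size_t kGuardSize = 8;
    static constexpr unsigned char kGuardByte = 0xFD;
    std::array<unsigned char, kGuardSize + N + kGuardSize> storage_;
};
```

    Calling intact() right after each suspect write narrows down which write went out of bounds, rather than waiting for the damage to be noticed much later when some unrelated object is freed.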