header image

CRT in DLLs and memory allocation

This is how a debug build behaves, compiled with VS2013. Release build is different and the differences are a bit baffling. I’ll write about that later.

I’ve spent a lot of time hunting an elusive memory corruption bug. At least it looked like memory corruption: release build just crashed on access violation, debug build was throwing a CRT assert on free().

CRT debug heap assertion

CRT debug heap assertion

Evidently something was wrong with the heap when free() was called — debug builds perform heap checking after a certain number of heap operations. You can force heap checking at every (de)allocation with _CrtSetDbgFlag(), it’s a good idea in cases like this when it wasn’t immediately obvious why the free() would cause problems. The memory was good, no double-freeing was occurring. Yet commenting that free() out seemed to remove all problems (memory leaks notwithstanding).

I’ve been scratching my head for a bit, tried the Application Verifier, nothing seemed to bring me closer to diagnosing the core issue. Finally I analyzed where this freed memory was coming from. It was allocated in a call imported from a utility DLL. Gears started to move faster in my head when I checked the DLL’s project settings, specifically how the C runtime was linked. It was linked statically.

Why was that causing problems? Well, when the CRT is statically linked to a DLL, that DLL creates its own CRT heap for your malloc()s and free()s. That means if you malloc() something in a DLL with static CRT you can’t free() it from the main program because the free() will operate on a different CRT heap. You can see the virtual memory breakdown of a process with VMMap, an excellent Sysinternals utility.

Heaps with static CRT DLL

Heaps with static CRT DLL

Heaps with dynamic CRT DLL

Heaps with dynamic CRT DLL

We can see the additional heaps when the DLL contains statically linked CRT.

What’s the moral of the story?

  • Never link the CRT statically unless you really know what you’re doing. It may be tempting to avoid the need of CRT run-time DLLs but it’s almost always not worth it. Why duplicate the CRT code in your executables? Just ship the CRT redistributables with your installer.
  • If you really need static CRT in your DLL, free all DLL-allocated memory in the DLL, not outside of it.

Misleading Windows error messages

I know that these are relics of the 3.x era and of course can’t be changed because ~compatibility~ but come on.

// MessageText:
// Not enough storage is available to process this command.

// MessageText:
// Not enough storage is available to complete this operation.

// MessageText:
// A device attached to the system is not functioning.

// MessageText:
// The file name is too long.

How not to design APIs and other coding horrors

Exhibit one: Xen’s libxc, a control library for the hypervisor.

 * Return an fd that can be select()ed on.
 * Note that due to bugs, setting this fd to non blocking may not
 * work: you would hope that it would result in xc_evtchn_pending
 * failing with EWOULDBLOCK if there are no events signaled, but in
 * fact it may block.  (Bug is present in at least Linux 3.12, and
 * perhaps on other platforms or later version.)
 * To be safe, you must use poll() or select() before each call to
 * xc_evtchn_pending.  If you have multiple threads (or processes)
 * sharing a single xce handle this will not work, and there is no
 * straightforward workaround.  Please design your program some other
 * way.
int xc_evtchn_fd(xc_evtchn *xce);

That’s very cross-platform. I’m sure I can select() on a fd on Windows.

 * Creates and shares a page with another domain, with unmap notification.
 * @parm xcg a handle to an open grant sharing instance
 * @parm domid the domain to share memory with
 * @parm refs the grant reference of the pages (output)
 * @parm writable true if the other domain can write to the page
 * @parm notify_offset The byte offset in the page to use for unmap
 *                     notification; -1 for none.
 * @parm notify_port The event channel port to use for unmap notify, or -1
 * @return local mapping of the page
void *xc_gntshr_share_page_notify(xc_gntshr *xcg, uint32_t domid,
                                  uint32_t *ref, int writable,
                                  uint32_t notify_offset,
                                  evtchn_port_t notify_port);

I love magic parameters! Especially if you compare an unsigned value to -1.

~ ~ ~

Exhibit two: libxenvchan, a inter-domain communication library for Xen. Let’s look at a few function prototypes:

struct libxenvchan *
    xentoollog_logger *logger,
    int domain,
    const char* xs_path,
    size_t read_min,
    size_t write_min

 * Packet-based receive: always reads exactly $size bytes.
 * @param ctrl The vchan control structure
 * @param data Buffer for data that was read
 * @param size Size of the buffer and amount of data to read
 * @return -1 on error, 0 if nonblocking and insufficient data is available, or $size
    struct libxenvchan *ctrl,
    void *data,
    size_t size

/** Amount of data it is possible to send without blocking */
    struct libxenvchan *ctrl

Notice anything strange? All size input parameters are of type size_t. libxenvchan_recv returns -1 on error, 0 if nonblocking and insufficient data is available, or $size. I wonder what does it return if I pass size that’s bigger than INT_MAX?

 * reads exactly size bytes from the vchan.
 * returns 0 if insufficient data is available, -1 on error, or size on success
int libxenvchan_recv(struct libxenvchan *ctrl, void *data, size_t size)
    while (1) {
        int avail = fast_get_data_ready(ctrl, size);
        if (size <= avail)
            return do_recv(ctrl, data, size);
        if (!libxenvchan_is_open(ctrl))
            return -1;
        if (!ctrl->blocking)
            return 0;
        if (size > rd_ring_size(ctrl))
            return -1;
        if (libxenvchan_wait(ctrl))
            return -1;

static int do_recv(struct libxenvchan *ctrl, void *data, size_t size)
    int real_idx = rd_cons(ctrl) & (rd_ring_size(ctrl) - 1);
    int avail_contig = rd_ring_size(ctrl) - real_idx;
    if (avail_contig > size)
        avail_contig = size;

    // ...snip...

    return size;


Internally all sizes are truncated to int without any warning. I’ve also learned that gcc allows pointer arithmetic on void* without any warnings by default:

    ctrl->read.buffer = ((void*)ctrl->ring) + SMALL_RING_OFFSET;
    ctrl->read.buffer = ((void*)ctrl->ring) + LARGE_RING_OFFSET;

Oh, and the only way to set the communication mode to blocking is by setting a field in the control structure directly. Why is the structure exposed to clients instead of being an opaque pointer again?

~ ~ ~

And lastly, a great implementation of __sync_fetch_and_and on Windows.

These builtins perform the operation suggested by the name, and returns the value that had previously been in memory. That is,
{ tmp = *ptr; *ptr op= value; return tmp; }
/* unfortunatelly no 8-bit equivalent for Interlocked{And,Or} */
#define __sync_or_and_fetch(a, b) (*(a)) |= (b)
#define __sync_fetch_and_and(a, b) (*(a)) &= (b)

I don’t even know what to say at this point. And yes, InterlockedAnd8 exists.

Tinkering with Arduino

I decided to play a bit with my old Arduino again. It’s a pretty dated model (Duemilanove). Based on the ATmega328P microcontroller it’s clocked at 16MHz, has 30kB of usable flash and 2kB of SRAM.

Arduino Duemilanove

Arduino Duemilanove

That’s not much to go with but I always wanted to make something that would transform the board into some kind of security device for the PC: an authentication token, encryption device, whatever. Of course the ATmega328P doesn’t have any “secure memory” so it can’t be anything serious, but I like to learn new things. In the past I’ve made some circuits with sensors, LCDs, the usual basic stuff. I’ve never really tried a two-way communication with PC though.

Continue reading ‘Tinkering with Arduino’

Read at your own discretion

MmGetMdlPfnArray macro has an interesting remark in the description:

Changing the contents of the array can cause subtle system problems that are difficult to diagnose. We recommend that you do not read or change the contents of this array.

Right. I didn’t want to read the information I requested anyway.

Serious question: why would reading the PFN array cause problems?

Down with the sickness

Yeah, I like Disturbed (in small doses).

Being sick sucks. Especially when you don’t really get sick that often and this flu-like thing hits you suddenly with full force. Dehydration, weakness, trouble to keep your thoughts together.

Oh well, at least I can go through my backlog of The Old New Thing. Why did I stop following Raymond? I guess when I finally ditched Opera 12 and forgot to update my RSS feeds. I miss you sweet princess.

New friends

Well, not really new — they are with us for a few months now. This post was long overdue but here we go.

The black and white male is Sutomir Stanisław (named after a friend). He was found near a garbage dump in a cage, someone just left him there. He’s the most curious one (well, he and Szafa) and has no survival instinct — goes everywhere, checks everything out.

The small black female is Szafa. She’s probably the smartest of the bunch.

The big black male is Zabrak (named after another friend of ours). He’s a bit… slow, not sure if that’s some mental condition or he’s just lazy.

The two albino twin females are Złomiarka and Kuklux. They are lab rats. They’re either blind or see very poorly, but that doesn’t stop them from being very active (and fast). They were pretty leery of humans at the beginning but now are very friendly.

Kernel debugging Windows HVMs under Xen

Having troubles setting up kernel debugging between two Windows HVMs in a Xen environment? I’ve spent a lot of time lately troubleshooting this, it just didn’t work. Finally it turned out that Xen was corrupting serial port data (yeah, I really neaded that CRLF conversion). Check out this article for details. It’s written with Qubes in mind, but it should be easy to adapt for other environments.

Running processes on the Winlogon desktop

Disclaimer: this is a Bad Idea unless you know exactly what you’re doing, why you’re doing it and there are no alternatives. Please use responsibly.

There may be circumstances where you’d like to programmatically interact with the Winlogon desktop (the one that houses the LogonUI process responsible for displaying logon tiles and the rest of, well, logon UI). Test automation, seamless VM integration, whatever. It’s not easy though, and for good reasons:

  • Logon desktop created by Winlogon has ACL that only grants access to the SYSTEM account. We need a service to access it. Changing that ACL to allow other accounts is a very bad idea.
  • When a user chooses “Switch user” from the Start Menu or when the system first boots and displays the logon UI, it’s all done in a separate session. If a new user logs on, the Winlogon session is reused for that user’s interactive logon. If there is no new logon (but a user switch or unlocking a locked session for example), the Winlogon session is destroyed.
Temporary processes in a Winlogon session after quick user switch

Temporary processes in a Winlogon session after locking/unlocking user session

So, our service needs to monitor session changes and act when a Winlogon session is created (and made interactive). The code below demonstrates how you can create a process that runs on the Winlogon desktop and can interact with it.

Service code Show

You can install the service using following command (mind the spaces):

C:\>sc create QTestService binPath= d:\test\QService.exe type= own start= demand
[SC] CreateService SUCCESS

Then start it with:

sc start QTestService

After starting, choose “Switch user” or “Lock” from the Start Menu. You should see a console window on the logon screen.

A crack on the glass

A paper I’ve been working on lately is now public. In it I dissect various Windows security mechanisms and how effective they really are for providing sandbox-like isolation for groups of arbitrary applications. This work was performed for Invisible Things Lab and Qubes OS.