Detecting Memory Leaks in Kernel & Managed code OSes

This blog post has two sections. The first covers "Memory Leak detection in the Linux kernel in 10 easy steps" and the second is about "Implementing operating systems in managed code (like C#/Java)". Feel free to go to whichever section you prefer.

Introduction

A memory leak occurs when a program allocates memory but never releases it. In user-space, these days, new applications are written mostly in sophisticated, evolved, modern programming languages like C#, Java etc. This relieves the programmer of the burden of memory management. Programmers need to manage memory themselves only when they code in pre-historic programming languages like C ;-) There are nice tools (like Valgrind) that can detect memory leaks if they happen in user-space. Valgrind won't work in kernel space.

The Linux kernel is predominantly written in C, so that programmers can stay close to the hardware. Also there were no large-scale managed languages at the time the project was started.

There are some interesting projects that help in writing FUSE filesystems via Mono/C# etc. However your grumpy blogger didn't find them to be active and for all practical purposes, C is the default Kernel programming language.

A few days back, I was tasked with detecting memory leaks in a legacy kernel module that is not in Linus' tree. Code that goes into Linus' tree is usually of high quality and is far less likely to have memory leaks because of the rigorous reviews that are performed on LKML. So, any kernel code that is not merged upstream has a higher chance of having leaks, among other bad things (so upstream your code, NOW). The simple tutorial below will explain how to detect memory leaks in kernel modules.

kmemleak

There is a nice tool named 'kmemleak' available in the Linux kernel since 2.6.31 to detect memory leaks. This tool is known to report some false positives, but that should not stop anyone from using it. I was trying to find a kernel-space leak detector but did not find any links via Google. Some kernel hackers told me about this tool over IRC, and this post is meant more as a pointer to the "kmemleak" docs for when someone googles for "kernel memory leak detection".

Pre-requisites:
+ You need to know how to build and install your own kernel.
+ You need to have version 2.6.31 or newer of the Linux kernel.
+ It is good if you know how to compile a kernel module. But don't worry if you don't know. You can refer to my previous tutorial for a simple hello-world kernel module.

So, without further ado, the steps are:

Step 1: Compile the kernel with the "CONFIG_DEBUG_KMEMLEAK" option enabled. You can get to this option while configuring your kernel via: make menuconfig, "Kernel Hacking", "Kernel Memory Leak Detector".

Step 2: Increase the value of the config option "Maximum kmemleak early log entries" to a sufficiently large number like 1200. The default value of 400 may not work correctly in all configurations.

Step 3: Install this kernel and reboot into it. Do not be alarmed if your machine is slow; the leak detector adds overhead.

Step 4: Upon reboot, check if your debugfs is mounted. If it is not, mount it. If all is well, you should see a file named kmemleak under your debugfs mount point.

mount -t debugfs nodev /sys/kernel/debug/
cat /sys/kernel/debug/kmemleak

The above /sys/kernel/debug/kmemleak file will contain information about any memory leak that has been detected so far since the machine booted. Ideally there should be none, until this point in time.

Step 5: Now we will see how we can detect a memory leak in a dummy kernel module as follows. Write a dummy kernel module with the following source (hello.c):


#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/vmalloc.h>

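/*
 * Deliberately leaky: each vmalloc() result overwrites the previous
 * pointer and nothing is ever passed to vfree().
 * Never write a function like this ;)
 */
void myfunc(void)
{
        char *ptr;
        ptr = vmalloc(512);
        ptr = vmalloc(512);
        ptr = vmalloc(512);
}

static int __init hello_init(void)
{
        printk(KERN_ALERT "Hello World\n");
        myfunc();
        return 0;
}

static void __exit hello_exit(void)
{
        printk(KERN_ALERT "Goodbye World\n");
}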

module_init(hello_init);
module_exit(hello_exit);

MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Your Name");


Now the most important line in the above code snippet is:
ptr = vmalloc(512);
Each call allocates 512 bytes, but ptr is overwritten by the next call and nothing is ever passed to vfree(), so every one of these allocations is leaked.
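To see why a tool can flag this at all, here is a rough user-space sketch of the idea behind kmemleak: remember every allocation, then report the blocks that no known pointer refers to any more. This is only an illustration, not how the kernel implements it. All the names (tracked_alloc, tracked_free, count_unreferenced, leaky, tidy) are made up for this example, and the "roots" array stands in for the memory that the real detector scans for pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_TRACKED 16

/* Hypothetical stand-in for the detector's table of live allocations. */
static void *tracked[MAX_TRACKED];
static size_t tracked_count;

/* Allocate a block and remember it, as a leak detector's hook would. */
void *tracked_alloc(size_t size)
{
        void *p;

        assert(tracked_count < MAX_TRACKED);
        p = malloc(size);
        if (p != NULL)
                tracked[tracked_count++] = p;
        return p;
}

/* Free a block and forget it; unknown pointers are ignored. */
void tracked_free(void *p)
{
        size_t i;

        for (i = 0; i < tracked_count; i++) {
                if (tracked[i] == p) {
                        tracked[i] = tracked[--tracked_count];
                        free(p);
                        return;
                }
        }
}

/* Count tracked blocks that none of the given root pointers refer to. */
size_t count_unreferenced(void *const *roots, size_t nroots)
{
        size_t i, j, leaks = 0;

        for (i = 0; i < tracked_count; i++) {
                int found = 0;

                for (j = 0; j < nroots; j++)
                        if (roots[j] == tracked[i])
                                found = 1;
                if (!found)
                        leaks++;
        }
        return leaks;
}

/* Same shape as myfunc(): the only pointer is overwritten, then lost. */
void leaky(void)
{
        char *ptr;

        ptr = tracked_alloc(512);
        ptr = tracked_alloc(512);
        ptr = tracked_alloc(512);
        (void)ptr;
}

/* The well-behaved variant: every allocation is paired with a free. */
void tidy(void)
{
        char *ptr = tracked_alloc(512);

        tracked_free(ptr);
}
```

Calling leaky() and then count_unreferenced(NULL, 0) reports three orphaned blocks, while tidy() adds none. In the kernel module, the equivalent fix is to call vfree() on each pointer before it is overwritten or goes out of scope.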

Step 6: Create a Makefile (for example, with vi Makefile) with the following contents:
EXTRA_CFLAGS=-g
obj-m := hello-kernel.o
hello-kernel-objs := hello.o

Step 7: Generate the kernel-object file hello-kernel.ko in your current directory:
make -C /lib/modules/`uname -r`/build M=`pwd`

All commands from now on require root permission.

Step 7.5: [Optional] At any stage, if you want to clear the leak reports collected so far, so that you can focus just on the leaks reported from then on, you can do:
echo clear > /sys/kernel/debug/kmemleak

Step 8: Now insert the kernel object
insmod hello-kernel.ko

Step 9: The memory leak detection thread runs periodically. If you want to trigger a scan right away, do:
echo scan > /sys/kernel/debug/kmemleak

Step 10: Now we will check if the leak is detected. Do:
cat /sys/kernel/debug/kmemleak
You should see:

unreferenced object 0xf9061000 (size 512):
comm "insmod", pid 12750, jiffies 14401507 (age 110.217s)
hex dump (first 32 bytes):
1c 0f 00 00 01 12 00 00 2a 0f 00 00 01 12 00 00 ........*.......
38 0f 00 00 01 12 00 00 bc 0f 00 00 01 12 00 00 8...............
backtrace:
[< c10b0001>] create_object+0x114/0x1db
[< c148b4d0>] kmemleak_alloc+0x21/0x3f
[< c10a43e9>] __vmalloc_node+0x83/0x90
[< c10a44b9>] vmalloc+0x1c/0x1e
[< f9055021>] myfunc+0x21/0x23 [hello_kernel]
[< f9058012>] 0xf9058012
[< c1001226>] do_one_initcall+0x71/0x113
[< c1056c48>] sys_init_module+0x1241/0x1430
[< c100284c>] sysenter_do_call+0x12/0x22
[< ffffffff>] 0xffffffff


As you can see from the backtrace entry "myfunc+0x21/0x23 [hello_kernel]" above, the leak is traced to the myfunc function in our module.
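Each resolved backtrace entry above has the form symbol+offset/size: the offset of the address within the function, followed by the total size of that function, both in hex, with the module name in square brackets when the function lives in a module. The following is a small sketch of pulling those fields apart in user space. The struct and function names (bt_entry, parse_bt_line) are invented for this example, and the format string assumes the exact layout shown in the output above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One decoded backtrace entry, e.g. "[< f9055021>] myfunc+0x21/0x23". */
struct bt_entry {
        unsigned long addr;    /* address recorded in the backtrace */
        char symbol[64];       /* function name */
        unsigned long offset;  /* offset of addr inside the function */
        unsigned long size;    /* total size of the function in bytes */
};

/*
 * Parse a single backtrace line. Returns 0 on success and -1 when the
 * line has no symbol+offset/size part (e.g. an unresolved raw address).
 * Anything after the size, such as "[hello_kernel]", is ignored.
 */
int parse_bt_line(const char *line, struct bt_entry *out)
{
        int n;

        assert(line != NULL && out != NULL);
        n = sscanf(line, " [< %lx>] %63[^+]+%lx/%lx",
                   &out->addr, out->symbol, &out->offset, &out->size);
        return n == 4 ? 0 : -1;
}
```

For the myfunc line, this yields symbol "myfunc", offset 0x21 and size 0x23, meaning the call to vmalloc returned to a point 0x21 bytes into a function that is 0x23 bytes long. A line such as "[< f9058012>] 0xf9058012" has no symbol information and is rejected.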

Caution: The memory leak detector code may take some time to identify the leaks. So repeat steps 9 and 10 after a few minutes if you don't get the leaks reported the first time. You can try to kiss your elbow to pass the time meanwhile ;-)
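If you end up repeating steps 9 and 10 often, a tiny helper that counts the reports can save some squinting. The sketch below assumes that every report starts with a line beginning with "unreferenced object", as in the output above. The function name count_leak_reports is made up, and the caller is expected to pass an already opened stream, such as fopen("/sys/kernel/debug/kmemleak", "r") run as root.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Count kmemleak reports in a stream by counting the lines that start
 * with "unreferenced object". Lines longer than the buffer are read in
 * pieces, which is fine for the short lines kmemleak prints.
 */
size_t count_leak_reports(FILE *fp)
{
        static const char prefix[] = "unreferenced object";
        char line[512];
        size_t count = 0;

        assert(fp != NULL);
        while (fgets(line, sizeof line, fp) != NULL) {
                if (strncmp(line, prefix, sizeof prefix - 1) == 0)
                        count++;
        }
        return count;
}
```

A count of zero right after insmod does not prove the module is clean; as noted above, the detector may need another scan or two before it reports anything.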

Further Reading
+ LWN Article about kmemleak - http://lwn.net/Articles/187979/
+ Under Kernel sources directory: Documentation/kmemleak.txt

Thanks a lot to Catalin Marinas for kmemleak and to the people at kernelnewbies for helping, not just me with this problem but nuuuumerous other people too.

If you were looking for just kernel memory leak detection, the blog post is over. If you don't mind reading about some other (un)related projects, continue reading on to Part 2.

Operating Systems in Managed Code

There are some hobby open-source projects aimed at implementing an OS using managed programming languages, like C# (on Mono) or Java. But none of them have any official corporate backing yet, so they remain in an experimental state. Examples include SharpOS, Cosmos, JNode etc.

+ Sun/Oracle has a product named JavaOS developed along with IBM, but I am not really sure how active this is. Its business model is unclear as well.

+ Singularity - The most high-profile name in this research is Microsoft. However, even they don't seem to be too active. Singularity is their project aimed at creating a managed-code OS based on a microkernel architecture.

It is sad that this is not purely open source. If MSFT used an (L)GPL-like license and shared more of the technical vision / docs of this project, maybe it could generate enough enthusiasm in the student and research communities.

With the increasing number of CPU cores, excellent libraries like the parallel extensions to C#, and functional programming languages like F#, it may be fascinating and easy to do crazy things (like LINQ as an IPC mechanism) if you are implementing an OS in a managed language. You could extend your kernel in any language, say Python or Ruby, using projects like IronPython and IronRuby.

Or maybe it is just that I am over-expecting managed code to do wonders on behalf of the programmer.

The biggest benefit of writing an operating system in managed code is that it will be more secure. Whole classes of bugs, such as buffer-overflow vulnerabilities and pointer exploits, largely go away.

Ten or fifteen years ago, there were far more operating systems, like Windows, Linux, Solaris, Symbian, Mac OS, OS/2, VMS, Haiku, BSD, etc., and the field was rich, competitive and interesting. Now there are just three major players, Windows, Linux and Mac, on all devices ranging from mainframes to datacenters to mobiles. As these commercial operating systems mature, they are becoming more boring to learn and hack on, so the students among you (my blog readers) can try to spend time on these managed-code OSes. Who knows, there could be the next Linus Torvalds in you.

Academics love to brag about microkernel architecture, and Linux hackers love to ridicule the cost of implementing a messaging system for such an architecture. However, if you use a high-level language with facilities like LINQ and protocol buffers, this messaging/interaction system, interface versioning etc. will be far easier to implement, at least as per my understanding. This is probably the reason why all the managed-code OSes are based on a microkernel architecture.

There will be performance problems in such OSes, but performance is not the only criterion for an OS, as there are other benefits like Security (no pointers), Reliability (no null pointer dereference crashes, double free crashes, etc), Extensibility (Extend the OS in Python, C#, F#), etc.

Writing an Operating System may not be the most business-savvy decision but it will definitely help in understanding the science better. And in student life, one can afford to have a hobby project that may not have immediate day-job relevance.

Send in your feedback/comments/opinions about this post or talks/links/research-papers about operating systems in managed code. I would love to hear from you.

12 comments:

Anonymous said...

As a newbie, i would say that your article is very2 good! Thx alot

Sam TH said...

Lots and lots of people have written operating systems in what are now referred to as "managed languages". For example, Smalltalk [1], Haskell [2], Lisp [3] and so on. Safe languages were not invented in 1995.

[1] http://wiki.squeak.org/squeak/1762
[2] http://programatica.cs.pdx.edu/House/
[3] http://linuxfinances.info/info/lisposes.html

Spudd86 said...

I think there's a way to use valgrind on the kernel... but I'm not sure where I read that, or even remotely how it works...

Also I think the other reason managed code OS's would be microkernel style is that it leads to much smaller GC arenas and therefore shorter pauses. It also makes it easier to bootstrap with drivers from someone else's OS. Plus these are about security/reliability, both of which microkernels have distinct advantages in, if you're willing to sacrifice a bit of performance (mostly latency, throughput shouldn't be much worse than monolithic if you do a good job)

Spudd86 said...

JavaOS is discontinued according to Wikipedia

Spudd86 said...

Also there's the open source JNode: http://www.jnode.org/ which is all Java except some small bits of assembly

Tretle said...

SharpOS has been on hold for a long time now. It's been replaced by mosa-project.

Joseph Cooney said...

Midori is the code-name for the commercialized version of Singularity. A number of high-profile MS developers including Joe Duffy, Rico Mariani, Chris Brumme & Daniel Lehenbauer are on the team. By all accounts they're a long way off shipping.

Radhika said...

Deadly combo; coding n writing..i should re-write your recommendation :P

Sankar said...

@Anony: Thanks

@Sam: Thanks. That is a nice list. Should try to explore it sometime.

@Spudd86: I was told about user-space-linux and that playing around with that could help in using it with valgrind and detecting leaks. I should try that soon. Thanks for visiting and the Java news

@Tretle: Oh okay. Thanks for the information.

@Joseph: Yes, it seems we may not see it in the near future.

@Radhika: Thanks. My minuscule writing skills are as a result of working with doc-writers who have good language skills :-)

Anonymous said...

Sankar, I am sorry to bring bad news. JNode and others are not safe from memory issues. Bad code is bad code.

Java and .net both suffer from what is called object leaking. It is kind of like memory leaking, but it is where the GC collector believes that particular objects are still in use, so you simply run out of memory. That is the same issue memory leaks in Linux will cause at some point. Note it normally takes longer for a Linux kernel leak to become critical compared to a .net or java one. The reason is that an object is a full data struct, normally larger than what would be leaking from C.

You are failing to understand what java and .net saves you from. Not memory leaks but pointers pointing to no where. And pointers pointing no where is rare particularly on well maintained code.

Basically your example is bogus. .net and java have exactly the same issue just written a different way.

Also running http://coccinelle.lip6.fr/ or https://sparse.wiki.kernel.org/index.php/Main_Page looking for miss allocations will find those memory allocation faults.

Guess what running coccinelle or sparse over you driver code is Linux kernel recommend prac if you want to main line you driver. If you have not you will have one pissed off subsystem maintainer on your hands for wasting there time if a fault like your example contains exists in your code since its detectable.

So Linux has the tools to deal with this issue. Where are the .net and java to deal with object issues.

Sorry someone has lead you up the garden path. If you get to needing to do kmemcheck and find faults you have already screwed up. Yes a managed code OS still need equal to kmemcheck to detect object leaks.

Also the most scare form of Linux is tccboot Linux. Linux running from pure C source on harddrive built at runtime.

Sankar said...

Anon: Thanks for the comment. I will look into tccboot. I dont clearly understand the object-leakage, gc-brokenness that you mentioned. I will explore more about it.

However, my main point is, a large number of programmers in the world are very ordinary. If the burden of memory management is removed from them, they have less chance of screwing up the whole system. However, managed code OS-es are neither a panacea nor a clean solution.

Anonymous said...

Hi,

I have a query on Kmemleak output. There is some time given in jiffies "jiffies 14401507 (age 110.217s)".
1) What exactly does this time represent?
2) Also please explain the Hex dump?
3) And what are we to deduce from the back trace?
[< c10b0001>] create_object+0x114/0x1db

what does "+0x114/0x1db" represent?