Today, I ran across an interesting problem: I have a large set (10GB) of sql queries that I wanted to filter so I could import a small subset of the data. Obviously, some kind of stream filtering would be appropriate, but because the INSERT queries were in large batches, it seemed to be a bigger mental exercise than I wanted to tackle (mainly, writing a complex regular expression). After ruling that simplest approach out, I thought that lua-based mysql-proxy might be able to do the processing for me, providing a nice place in front of the database to do that filtering. Unfortunately, it doesn’t look like it will deal with these batch inserts in a friendly manner (which is probably a good thing, certainly for the proper use of mysql-proxy).
Finally, I thought about the nuclear option, a vastly over-engineered and rather insane solution: A pluggable storage engine that can take in the data and simply filter the material it chooses to store. I’m positive this breaks all manner of standards, but it would be a delightfully easy solution and give me a chance to play with the new MySQL plugin API. Sure, some people would hack the example c module to do these tasks, nice and easy. Boring!
Sometime back, I ran across Johannes Schlüter’s PHP-based MySQL Storage engine. I come from a PHP background, so this would obviously be the easiest engine to hack to my purposes. But this suffers from a serious lack of documentation, which, yes, could be overcome with a little work, but my first few attempts at compiling this ended in failure.
So, in an effort to add complexity, I realized that lisp/scheme might be a more efficient (read: unique) way to deal with this processing. To that end, I propose a lisp-backed MySQL storage engine (mainly, I also want to have a chance to tinker with lisp). I’ve gathered the documentation, I just need a chance to write some of the glue code.
- Source code for the MySQL PHP storage engine
- MySQL documentation for custom storage engines, actually a very interesting read
- Inside: PLT Scheme C API,which allows the interpreter to be extended by a dynamically-loaded library, or embedded within an arbitrary C/C++ program.
So, all that is missing is another long weekend for hacking this together.. February, I suppose.