Projects

Here is a (most probably out-of-date and incomplete) list of projects I'm currently working on.

Please check out my GitHub for a more complete list and do get in touch if you want to collaborate on something.


Work Projects

I work for OCCRP as a data engineer. We build and maintain a bunch of open-source projects for the journalism and OSINT community. A significant perk of working on open-source projects as my day job is that I'm free to share my work and brag about it 😉 Here's a list of projects that I'm particularly proud of:

Aleph

Aleph is a tool for indexing large amounts of both documents (PDF, Word, HTML) and structured data (CSV, XLS, SQL) for easy browsing and search. It is built with investigative reporting as its primary use case. Aleph allows cross-referencing mentions of well-known entities (such as people and companies) against watchlists, e.g. from prior research or public datasets.

I've been involved with building Aleph's data ingestion pipeline, its Redis-based task queuing and monitoring system, and parts of its API, as well as deploying and maintaining OCCRP's very own Aleph instance.

Memorious

Memorious is a distributed web scraping toolkit. It is a lightweight tool that schedules, monitors and supports scrapers that collect structured or unstructured data.

I am a maintainer of the Memorious toolkit. In addition to adding new features to Memorious itself, I also look after a fleet of crawlers called OpenSanctions, an open-source repository of sanctions data, politically exposed persons, and other entities of interest.

PDFLib

PDFLib is a Python library that I wrote to provide Python bindings for Poppler. PDFLib offers a simple API to extract text and images from PDF files.

urlnormalizer

As you have probably guessed, it's a Python library that normalizes URL strings pragmatically.
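
To give a rough idea of what normalizing a URL involves, here is a minimal sketch using only Python's standard library. It is not urlnormalizer's actual code or API: the function name `normalize_url_sketch` and the specific rules applied (lowercasing the host, dropping default ports, sorting query parameters, removing fragments) are assumptions chosen for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Ports that are implied by the scheme and can safely be dropped.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url_sketch(url):
    """Return a canonical form of `url`, or None if it lacks a scheme or host.

    Hypothetical helper for illustration only; it ignores userinfo and
    does not handle IPv6 literals or internationalized domain names.
    """
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    if not scheme or not host:
        return None

    # Keep the port only when it differs from the scheme's default.
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(scheme):
        netloc = host
    else:
        netloc = f"{host}:{port}"

    # An empty path and "/" refer to the same resource.
    path = parts.path or "/"

    # Sort query parameters so equivalent URLs compare equal.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))

    # The fragment is never sent to the server, so drop it.
    return urlunsplit((scheme, netloc, path, query, ""))
```

With these rules, `normalize_url_sketch("HTTP://Example.COM:80/a?b=2&a=1#frag")` and `normalize_url_sketch("http://example.com/a?a=1&b=2")` produce the same string, which is the property that makes normalized URLs useful for deduplicating crawled links.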

Personal Projects

qt5reactor

Twisted and PyQt5 event loop integration. I no longer maintain the project; it's maintained by other volunteers. It is used by a bunch of real-world projects such as Splash, Gridsync, VirtScreen, Inkcut and more.

PirateMap

A procedural treasure map generator web app. While this is not the most sophisticated map generator, I had lots of fun building it over a weekend.

... and many more weekend hacks and open-source contributions available on GitHub.