Dr. Vadim Zaytsev


BabyCobol (2019–2023) is a language design project. Its aim is to create a language that is, on the one hand, small enough to be implemented quickly (fully or partially) in any framework that can support its features, and, on the other hand, complex enough to cover the typical problems of legacy language processing. If you learn how to compile MiniJava, you stand a good chance of implementing a reasonably good compiler for any contemporary programming language. If you show how your language extension works on Featherweight Java, it has a good chance of being applicable to any reasonable modern object-oriented programming language. If you can handle BabyCobol with your tools and skills, you are ready to face the challenges of software modernisation, codebase migration and legacy language processing in general. In this day and age, being future-proof means being able to handle the software of the past.

Engage! (2019–2021) is an event-based parser generator. As far as I know, it is the only event-based parser generator in existence (please tell me if I am wrong; I would be glad to hear of similar projects), even though event-based parsers such as SAX or RxParse are not rare. It was designed and developed as an experiment and published at the REBLS workshop at SPLASH 2019. The PDF of the paper is freely available. It gives a more detailed and precise description of the idea, some implementation details and an empirical comparison of parsers. The code of Engage! was later improved thanks to a successful research project by Frank Groeneveld.

Software Language Engineering Body of Knowledge, or SLEBoK (2017–2023), is a community-wide effort to provide a unique and comprehensive description of the concepts, tools and methods developed by the Software Language Engineering community. It covers artefacts, definitions, methods, techniques, best practices, open challenges, case studies, teaching material and other components. These are meant to help students, researchers, teachers and practitioners learn from the intellectual contributions and practical tools and techniques of the SLE field, and to better leverage, contribute to and disseminate them. I am the current Editor-in-Chief of the project. In that role I maintain the website and a collection of scripts that generate different parts of it from backend data.

High Level Assembler, or HLASM (2015–2020), is a Raincode compiler that takes IBM HLASM programs as input, treats them as if they were written in a properly high-level language, and compiles them to .NET Core or .NET Framework. Its design is sophisticated enough both to compile self-modifying code correctly and to produce efficient .NET code that runs only one order of magnitude slower than the original does on a mainframe. Within the project, I was responsible for implementing the instruction set, which resulted in 423 commits and a great deal of code, models and generators. I have spoken on this topic at several closed meetings (IFIP, Dagstuhl, Shonan) and at open conferences. I also co-authored several papers on it (ECMFA’20, BENEVOL’19, MoreVMs’17, SLE’16) and served as an internal expert (on the Raincode side) for a Gartner Group inspection. The release of this compiler led to Raincode receiving the Top Performer for Mainframe Migration award from Microsoft in 2016.

TIALAA, or There Is A Life After AppBuilder (2017–2020), is another Raincode compiler, proposed as a replacement for Magic Software Enterprises’ AppBuilder. AppBuilder is a 4GL that compiles to COBOL on the server side and to Java on the client side. I was one of the analyst/developers during the consulting phase of the project, and became the tech team lead when we moved into design and development. By the time I left the company, the repository with the compiler sources had 4355 commits, 2625 of which were mine. The project demanded an extremely intense engineering effort, and there was relative secrecy around the product itself and its first main customer. As a result, very little information escaped the walls of the Raincode offices. Nevertheless, a few publications came out of it: the GPCE’17 paper on parsing HpsBindFile notation, a PX/17.2 essay generalising on the nature of such projects, and the SLE’18 paper and its Dagstuhl variant on testing TIALAA. The project also led to some insights scattered across other abstracts, papers, lectures and keynote speeches.

Grammar Zoo (2009–2016) is a project to accumulate grammars (in a broad sense) of various software languages and make them available in a range of formats. The grammars are extracted and recovered from language documentation, parser specifications and other artefacts. It started as two projects, Grammar Zoo and Grammar Tank. The latter was meant for smaller grammars, ones that cannot "speak" much, which is why its name points to a fish tank rather than a zoo. The two were later merged when a system of navigational tags was introduced. The tools contributing to the Grammar Zoo can be found across SLPS. Large contributions came from the Grammar Hunter (cf. the LDTA’12 paper) and from GrammarLab. When the project reached critical mass, I wrote an extended SCP article about it.

GrammarLab (2010–2013) was a project first conceived in the SWAT team of CWI. Throughout my employment there, I used it as a testbed for many new grammar-based and grammar-driven techniques, covering:

- testing (SLE’12)
- notation analysis (SAC’12)
- coevolution (BX’12)
- recovery (LDTA’12)
- convergence (SLE’13)
- renarration (MPM’12, XM’13)
- negotiated evolution (XM’12, JOT’14)
- pending evolution (XM’13)
- micropatterns (SLE’13)
- parsing (SANER’14)
- mutation (SQM’14)
- maturity (ME’14)
- and more.

The repository contains 248 commits, mostly authored by me. At one point GrammarLab was the largest Rascal library, but it has long since been overtaken by others. GrammarLab has also occasionally been used outside CWI, independently of any influence from or interaction with me (e.g., SPE’19).

Bibliography of Software Language Engineering in Generated Hypertext, or BibSLEIGH (2014–2020), is a project on facilitated browsing of scientific knowledge objects in software language engineering. It also bridges to other domains of software engineering, computer science and artificial intelligence. It is a work in progress with great potential, and so far has only one associated publication, at SATToSE’15.

(Un)Parsing in a Broad Sense (2014) is a megamodel of 12 classes of artefacts found in software language processing, depicting the usual mappings and transformations among them. It can be used in teaching and other forms of knowledge sharing, and in general for technological space travel. That is the situation where one complex piece of language-processing software (such as a language workbench) is compared to another similarly complex piece of grammarware, and one quickly needs to understand the differences:

- Do you need a separate tokeniser and parser?
- Are abstract syntax classes generated automatically?
- Is there visualisation?
- How adjustable is it?

The idea resulted in a MoDELS’14 paper and a series of smaller promotional presentations at various workshops, seminars and symposia. Its supporting repository contains about 3200 lines of Rascal code, written by me in more than 100 commits.

Software Language Processing Suite, or SLPS (2008–2012), was the first project of its kind to facilitate the exposition and comparison of approaches and techniques for software language processing. It was designed to be relevant to computer science and software engineering students, teachers, scientists, engineers and practitioners. Projects that can be seen as its conceptual successors include 101(companies), YAS, softlangbook and GrammarLab.