Sunday, November 16, 2014

Course review: Language Engineering with MPS

Last week I attended a two-day course called "Language Engineering with MPS". The course was given by Markus Voelter.

MPS is a free software framework (released under the Apache 2.0 license) built on top of IntelliJ IDEA. Both MPS and IntelliJ IDEA are actively developed by JetBrains. MPS can be used for implementing Domain-Specific Languages (DSLs), usually by extending a base language, which by default is Java. Extending Java is not a requirement. In fact, Markus is involved in the development of mbeddr, which uses a clean version of the C language as the base for targeting embedded system development.

According to Markus, text-based language development tools such as Yacc, lex, Bison, ANTLR, and so forth are fading out because they lack the support of an intelligent IDE. Although I'm not fully convinced by this statement, I agree that IDE support when developing DSLs is a big plus. Do not overlook IDE support. It gives you (for free) autocompletion, a nice user interface, very readable error messages, instant deployment and debugging, and much more.

During the course we covered only external DSLs (standalone languages with their own syntax), because Markus considers internal DSLs (those embedded in a host language) hacky, since they usually rely on the metaprogramming features of a specific language (Ruby, Lisp, etc.). This approach is usually either very limited or too complex (for example, you end up with unreadable error messages).

Markus knows language design well. He gave us some good tips regarding DSL development, such as forbidding Turing-completeness in the DSL to make the static analysis of a code block possible. Another tip was to support many keywords in the DSL (instead of having as few keywords as possible, which is considered good in general-purpose languages like C), so that the DSL user can give hints about the performance and behavior of a code block. For example, provide two keywords for loops: the default for is (or actually tries to be) concurrent, while the alternative forseq is always sequential.
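
The for/forseq tip can be illustrated with a minimal Python sketch. This is not MPS code and not something shown in the course: run_loop and square are hypothetical names, and a thread pool merely stands in for whatever "tries to be concurrent" would mean in a real DSL runtime.

```python
from concurrent.futures import ThreadPoolExecutor


def run_loop(keyword, items, body):
    """Run `body` on every item, picking the strategy from the loop keyword."""
    if keyword == "forseq":
        # The user explicitly asked for sequential execution.
        return [body(item) for item in items]
    if keyword == "for":
        # The default loop tries to be concurrent; map() preserves input order.
        with ThreadPoolExecutor() as pool:
            return list(pool.map(body, items))
    raise ValueError(f"unknown loop keyword: {keyword!r}")


def square(n):
    return n * n


concurrent_result = run_loop("for", [1, 2, 3], square)
sequential_result = run_loop("forseq", [1, 2, 3], square)
```

Both calls produce the same values; the keyword only tells the runtime how much freedom it has when scheduling the loop body.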

Our main course activity was to use MPS for developing an Entities DSL. An Entity is an abstraction that can have a variable number of attributes with validated types. We created our own type system for that (using Java's type system as a basis), which supports strings and numbers. An Entity can also have references to other Entities. Finally, we can define functions inside an Entity using the fun keyword. Here's an example of an Entity:

Notice how we can create custom error messages to inform DSL users when they try to do something erroneous, such as defining a variable with the same name twice. Another error reported (underlined in red in the picture) occurs when the user tries to return an incorrect type from a function, in this case a string from a function that should return an integer (notice the :number part).
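
The two checks described above (duplicate names and a mismatched return type) can be sketched in plain Python. This is only an illustration of the idea, not how MPS implements it: Entity, Function, check, and the Person example are all hypothetical names, and the types are reduced to plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class Function:
    name: str
    return_type: str    # declared type, e.g. "number"
    returned_type: str  # type of the value the body actually returns


@dataclass
class Entity:
    name: str
    attributes: list = field(default_factory=list)  # (name, type) pairs
    functions: list = field(default_factory=list)


def check(entity):
    """Return a list of human-readable error messages for one entity."""
    errors = []
    seen = set()
    for attr_name, _attr_type in entity.attributes:
        if attr_name in seen:
            errors.append(f"attribute '{attr_name}' is defined twice")
        seen.add(attr_name)
    for fun in entity.functions:
        if fun.returned_type != fun.return_type:
            errors.append(
                f"function '{fun.name}' returns a {fun.returned_type}, "
                f"but is declared as :{fun.return_type}"
            )
    return errors


person = Entity(
    name="Person",
    attributes=[("name", "string"), ("age", "number"), ("name", "string")],
    functions=[Function("getAge", return_type="number", returned_type="string")],
)
errors = check(person)
```

Here errors holds one message per problem, which is roughly what the DSL user would see attached to the offending code.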

From what I've seen in the course I feel that MPS is an interesting tool with the following pros and cons.

Pros:

  • Autocompletion.
  • Readable error messages. Even if a message is not very readable, you can jump to the source code immediately with a single click.
  • Nice user interface.
  • In general it offers all the goodies of an IDE: integrated debugging, many ways of searching, refactoring, and so forth.

Cons:

  • The DSL user (domain expert) needs to install MPS to use our DSL. This usually requires some effort, because we need to create a customized (clean) version of MPS with all development features hidden/disabled to avoid confusing the user.
  • Like all tools, MPS requires time and effort before you feel confident with it. Typing in the MPS editor in particular can be confusing and frustrating, because it is very different from free-text typing, which is the usual way of writing code.
  • Documentation. There is only one book explicitly targeting MPS so far.
  • Lag on Windows. The rented laptops that we used during the course were quite powerful, but MPS was still lagging on Windows. I have tested it on GNU/Linux and didn't have any issues (and neither did Markus on his MacBook). It seems that MPS has performance issues on Windows.

Saturday, November 8, 2014

Two less common tricks for improving unexplained slow MySQL queries

Recently I faced an SQL performance issue. What I wanted to do was rather common: apply the (set) difference operation on two tables.

In relational algebra, the difference operation applied on two tables A and B gives as a result a new table C that contains all the elements that are in table A but aren't in table B.

A typical example is having a Students table and a Grades table. To find all students that have not been graded yet, you can use the difference operation. Or, in terms of set theory and using ∖ as the notation for difference:

{1,2,3} ∖ {2,3,4} = {1}

Some RDBMSs have the difference operation built in, using the EXCEPT keyword. So the query in this case would be something like:
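
As a minimal sketch of such a query, the snippet below runs EXCEPT against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for an RDBMS that supports EXCEPT, and the students/grades schema and its rows are assumptions chosen to mirror the {1,2,3} ∖ {2,3,4} example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY);
    CREATE TABLE grades (student_id INTEGER, grade INTEGER);
    INSERT INTO students (id) VALUES (1), (2), (3);
    INSERT INTO grades (student_id, grade) VALUES (2, 8), (3, 6), (4, 9);
""")

# All student ids, minus the ids that already appear in grades.
rows = conn.execute("""
    SELECT id FROM students
    EXCEPT
    SELECT student_id FROM grades
""").fetchall()
ungraded = [row[0] for row in rows]  # → [1]
```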

But that's not the case for MySQL. MySQL does not support EXCEPT (at least not at the time of writing), but we can get the same result using a LEFT (OUTER) JOIN:
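
A sketch of the LEFT JOIN variant follows, again using an in-memory SQLite database through Python's sqlite3 module rather than MySQL. The students/grades schema is an assumption and not the original tables; the SELECT statement itself uses only standard SQL, so it should carry over to MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY);
    CREATE TABLE grades (student_id INTEGER, grade INTEGER);
    INSERT INTO students (id) VALUES (1), (2), (3);
    INSERT INTO grades (student_id, grade) VALUES (2, 8), (3, 6), (4, 9);
""")

# Keep every student, attach matching grades where they exist, and then
# keep only the students for which no grade row was found.
rows = conn.execute("""
    SELECT s.id
    FROM students AS s
    LEFT JOIN grades AS g ON g.student_id = s.id
    WHERE g.student_id IS NULL
""").fetchall()
ungraded = [row[0] for row in rows]  # → [1]
```

The WHERE ... IS NULL filter is what turns the outer join into a difference: unmatched students are the only rows whose right-hand columns are NULL.
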

OK, so I used a LEFT JOIN, only to find out that the query was painfully slow. One table had 700 thousand records and the other 130 thousand records. For a relational database that's not a big deal (the query should take only a few seconds, let's say 3 at most).

If you search the Web for slow LEFT JOIN query you'll see that everyone recommends (a) adding indexes and (b) checking what the SQL optimizer does with the query. Well, I had already done both without achieving my goal. I added the indexes using CREATE INDEX, then put EXPLAIN in front of my query to see the optimizer's execution plan and made sure that the indexes were used properly.
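
The index-then-inspect workflow can be sketched with SQLite as a stand-in. Note the assumptions: SQLite's EXPLAIN QUERY PLAN is used here, and its output differs from what MySQL's EXPLAIN prints; the schema and the index name idx_grades_student are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY);
    CREATE TABLE grades (student_id INTEGER, grade INTEGER);
    CREATE INDEX idx_grades_student ON grades (student_id);
""")

# Ask the planner how it would execute the query instead of running it.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT s.id
    FROM students AS s
    LEFT JOIN grades AS g ON g.student_id = s.id
    WHERE g.student_id IS NULL
""").fetchall()

# The last column of each row is a textual description of one plan step.
plan_steps = [row[-1] for row in plan]
index_used = any("idx_grades_student" in step for step in plan_steps)
```

If index_used is false, the join falls back to scanning the table, which is the situation the usual advice tries to rule out.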

So, what's left? Actually there are two other important things to check. The first is to inspect the output of SHOW PROCESSLIST. This will show you the list of active processes on the server. When writing queries, it is not unusual to kill the SQL client because it crashed or became unresponsive due to a bad query. But killing the client does not necessarily mean that the query is killed. There might still be orphan queries that eat the resources of your server without you having any control over them. You can kill them using KILL PROCESS_ID (replace PROCESS_ID with the actual ID of the orphan process).

The second thing that really impacts the performance of MySQL is joining tables that use different collations. MySQL uses the legacy latin1_swedish_ci collation by default, so if one of the tables you are trying to join uses a different collation (for example utf8_unicode_ci, which makes much more sense as a default nowadays) the joins become terribly slow, most likely because the values have to be converted before they can be compared, which can prevent the indexes from being used. Just make sure that all database tables use the same collation.

Happy querying!

Monday, September 29, 2014

Book review: SQL Antipatterns

In SQL Antipatterns, Bill Karwin does a great job of explaining how to make efficient use of the relational model, instead of abusing it as is usually done. Many developers abuse relational databases using antipatterns such as ID required, entity-attribute-value, index overkill, and so forth (you should read this book without a second thought). Some developers go as far as trying to implement a search engine based on the LIKE keyword.

All those are examples of not using the relational model properly. If we want to bypass referential integrity or save everything in one table, then we should not use a relational DB in the first place. Cursing the performance of a relational DB that is not used properly is very wrong.

Sunday, May 4, 2014

Being a technical reviewer (again)

A few months ago I experienced (for the first time) how it feels to be part of a technical reviewing team. I reviewed an introductory Packt book about Design Patterns.

Today I'm glad to see that another Packt book, for which I was once again one of the technical reviewers, has been published. The book is called Mastering Object-oriented Python. It focuses on writing OOP code in Python 3.

I would recommend this book to all Python programmers, both beginners and advanced. It covers all aspects of the language (to mention a few: special methods, unit testing, decorators, serialization, etc.) and shows different possible designs, explaining the pros and cons of each design. What I really like is that the code in the book is written in a Pythonic style, and the author does a good job of explaining how Python differs from Java/C++.

A few warnings: this is a big book (~600 pages). You can read the whole thing, but I believe that it will be much more useful as a handbook. Also note that the book assumes familiarity with Python 3 and Design Patterns.

You might wonder why I agreed to do a technical review again. Some people find technical reviewing a waste of time, but I disagree. To become a good programmer, you need to read a lot of code instead of just writing it. In fact, programmers tend to read much more existing code than they write new code. If reading code is important, reading good code is much more important. And I believe that the code in this book is well written.

Saturday, April 19, 2014

BASH: syntax error near unexpected token `('

After making some portability and readability improvements to shell-utils, I used BASH, sh, and dash to test it. While sh and dash were fine, BASH returned the error:

line 358: syntax error near unexpected token `('
line 358: `ls ()'

That is strange. BASH usually introduces shell portability issues because of the extra features it provides, so I would have expected problems with the other shells instead.

It turns out that BASH did a pretty good job of reporting the source of the error. Note that shell-utils redefines a few everyday commands as functions, to make them more verbose and safe (e.g. ls becomes ls --color=auto, rm becomes rm -i, etc.). But usually those everyday commands are already defined as aliases in .bashrc. BASH expands aliases while it reads a line, before it parses the function definition, so ls () turns into something like ls --color=auto (), which is not valid syntax. And that's what BASH is trying to tell me in the error message. Commenting out all aliases in .bashrc fixed the issue :)

Saturday, March 8, 2014

Joy of Coding 2014 - My impressions

I haven't been to a conference for years, but this year I decided to join Joy of Coding. And I don't regret it!

The conference started with a keynote by Dan North: "Accelerating Agile: hyper-performing without the hype". Dan described what he learnt about Agile while working in the trading domain. The most interesting advice that I kept from his presentation is that being a good programmer is not enough: What really makes a difference is to become a domain expert. For example, if you are working as a stock market programmer, your superior programming skills don't matter if you have no clue about what the numbers mean.

Next, I joined the "Let Me Graph That For You: An Introduction to Neo4j" workshop, by Ian Robinson. The first part of the workshop was an introduction to Graph Databases and Neo4j. In the second part we used Neo4j and its query language Cypher to create a few graphs and query them. I am impressed by how easy it is to get started with Neo4j. I find its web interface very intuitive. We had a few questions for Ian (S = Sakis, I = Ian, O = Other conference participant):
  • S: Is there any relation between Neo4j and RDF?
  • I: Not really. In RDF you typically end up with more connections because everything is modelled as a triple. But there are libraries that can extract a Neo4j graph as RDF.
  • S: Are all the common graph algorithms (e.g. SPF, BFS, Bellman-Ford, etc.) available?
  • I: Most of the well-known graph algorithms are available, and furthermore there is a Java API that is exposed and can be extended with your own algorithms.
  • S: Is Cypher case-sensitive?
  • I: Partly. The identifiers of a query are case-sensitive, but the rest of the query isn't.
  • O: Are there any cases where RDBMS should still be used instead of Graph Databases?
  • I: If you have tabular data and you want to focus on set theory operations (e.g. union, intersection, etc.) an RDBMS is preferable.

I'll definitely look more into Graph Databases and Neo4j.

The next keynote was "Contravariance is the Dual of Covariance", by Erik Meijer. I'm not very familiar with Reactive Programming and Rx, so I couldn't follow everything. But at least I enjoyed the jokes and the funny examples that Erik used. Using Scala as a reference, he explained the meanings of covariance and contravariance, and showed how they can be used to create reusable code.

The second workshop that I joined was "An Introduction to Actors and Concurrency", by Michel Rijnders and Matthijs Ooms. The first part of the workshop was basically an introduction to Erlang, so nothing special if you are familiar with Prolog. The fun started in the second part, where we experienced how straightforward it is to communicate asynchronously through the network using Erlang's actor model and message-passing primitives.

Last but not least was the keynote "The Tao, of the Joy, of Coding", by Dick Wall. This was by far the most inspiring keynote. Dick, using ancient Chinese philosophy as a reference, talked about many interesting topics, including programming honesty (saying "I don't understand this" and learning from your colleagues is a good thing), looking back as a programmer (e.g. if you used your programming skills to find a cure for a disease, you really changed the world), and getting a life (being proud of working until 2 AM is a very bad mentality).

All in all, a great conference that I will keep in my agenda every year!

Thursday, November 28, 2013

My first book review

During the last month(s) I participated in the review process of a book about Design Patterns in Python. I am happy to see that the book has been released. Its title is Learning Python Design Patterns.

Reviewing a book is definitely not harder than writing one, but that doesn't mean that it's not challenging. The comments must be clear in context, useful, and not offensive. The process requires a considerable amount of time since all comments/remarks/notes must be backed by research and reliable references.

Nevertheless, I enjoyed the whole procedure, which, thanks to the people of Packt Publishing (packtpub), was flexible and clear. Plus, I'll receive a free hard copy of the book that I reviewed and a copy of my favourite packtpub ebook. Not bad!

In the future I hope that I'll have the chance (and the time) to write my own book about a topic that I like.