
Friday, March 3, 2017

Fluent Interfaces and Method Chaining

The first time I saw a "fluent" interface was many years ago, when I studied Java NIO.
Today the fluent interface is considered good practice in many programming languages and, somewhat ridiculously, something still not accepted in Java itself.
The idea behind a fluent interface is quite simple: mutators should return a reference to the object they modify, so as to enable a method chain. In essence:

MyObject foo = ...;
foo.setTitle( "My Object" ).setVisible( true ).setDescription( "Here the description" );

which is certainly more compact than the classic

MyObject foo = ...;
foo.setTitle( "My Object" );
foo.setVisible( true );
foo.setDescription( "Here the description" );

Now, whether this is the more readable version or not is a matter of taste; overly long lines certainly do not improve readability, but since many languages handle whitespace flexibly, a middle ground between the method chain and the classic approach becomes

MyObject foo = ...;
foo.setTitle( "My Object" )
   .setVisible( true )
   .setDescription( "Here the description" );

which is fairly compact and readable, thanks to good indentation.

But the real advantage of a fluent interface shows up, in my opinion, when it is combined with conditionals. Consider the following example:


MyObject foo = ...;
if ( init ) {
  foo.setTitle( "My Object" );
  foo.setVisible( true );
  foo.setDescription( "Here the description" );
}

which can easily be rewritten, eliminating the need for multiple statements and for the curly braces:

MyObject foo = ...;
if ( init )
  foo.setTitle( "My Object" )
     .setVisible( true )
     .setDescription( "Here the description" );


And I personally like this a lot!

Alas, Java Beans stubbornly refuse to accept the fluent interface and enable method chaining. The specification requires mutators to be "void", i.e., to return nothing. And frameworks often adhere strictly to this rule, thus failing to recognize mutators that return a reference to the instance itself.
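
For reference, a fluent mutator is trivial to write. The following is a minimal sketch, assuming a hypothetical MyObject class with the three properties used above: each setter updates its field and returns the instance, which is what enables the chain.

public class MyObject{
       private String title;
       private boolean visible;
       private String description;

       // each mutator updates its field and returns this,
       // so that calls can be chained
       public MyObject setTitle( String title ){
              this.title = title;
              return this;
       }

       public MyObject setVisible( boolean visible ){
              this.visible = visible;
              return this;
       }

       public MyObject setDescription( String description ){
              this.description = description;
              return this;
       }
}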

Friday, August 5, 2016

Name consistency: a (bad) example from PHP

Consistency in the names of variables and functions is very important; I realize it more every day, and above all I keep noticing mistakes, even glaring ones, in the implementations of various programming languages.
One of the, in my opinion, most blatant mistakes in the PHP implementation is the naming inconsistency between the functions empty() and is_null(). While the latter has a mnemonic that reads intuitively as a boolean test, the former does not. So I quite often find myself writing code like:

if ( is_empty( $foo ) ){ ... }

which is obviously wrong, since the function is named empty(). Where does this inconsistency come from? It is quite easy to understand: "null" is a keyword in PHP, so the corresponding function could not simply be called null(), hence the need for the "is_" prefix. But this should have led to the dual function empty() being treated with the same naming rule, even though "empty" is not (at the moment) a keyword. After all, as Perl Best Practices teaches, boolean conditions should have a name starting with "is" or "has", and "if is empty" is certainly easier to read than "if empty".
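
To make the rule concrete, here is a small Java sketch (the class and its methods are hypothetical) where every boolean accessor starts with "is" or "has", so that each call site reads like the condition it tests:

import java.util.ArrayList;
import java.util.List;

public class ShoppingCart{
       private final List<String> items = new ArrayList<>();

       public void add( String item ){ items.add( item ); }

       // boolean accessors named with "is"/"has" read naturally in a test
       public boolean isEmpty(){ return items.isEmpty(); }
       public boolean hasItems(){ return ! items.isEmpty(); }
}

// usage: the condition reads as "if the cart is empty"
// if ( cart.isEmpty() ){ ... }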

Wednesday, February 26, 2014

Can style prevent bugs?

A lot of words have been spent on the security incident in Apple's certificate verification in iOS, promptly fixed by Apple with an iOS update (see here and also here).
The bug was, as often happens, trivial: one statement too many in a chain of "if"s, which interrupted the evaluation logic of the conditionals. In practice:

if ( condition 1 )
  goto fail;
if ( condition 2 )
  goto fail;
  goto fail; // <<< disaster!!!!!!
if ( condition 3 )
  goto fail;


fail:
  // something nasty happened


One could argue that more testing was needed, that block braces should be used for every conditional, that goto should not be used at all... but in my opinion the main thing that produced the bug in that code was the missing use of "else". The conditions at hand were logically related and should have been written as such, i.e.:


if ( condition 1 )
  goto fail;
else if ( condition 2 )
  goto fail;
  goto fail;
else if ( condition 3 )
  goto fail;

which would have produced a compilation error because of the dangling "else". I am not saying that the compiler should be used to find errors; that is what tests are for. In any case, the bug arose from a simple code-style problem, not a logic one. Even better would have been to write:

if ( condition 1
    || condition 2
    || condition 3 )
    goto fail;

which would have "logically" tied together all the failure conditions.
This bug is an example of how bad code style can easily lead to trivial errors; note once again that the code style to adopt in this case was not so much the use of block braces as the joining (with else, or in a single if) of the conditions to be tested.
Note also that the compiler is not always able to report unreachable code: as reported here, GCC does not always implement this check, while Clang keeps it disabled by default.

Wednesday, July 3, 2013

Do you like booleans?

Booleans: you love them or you hate them.
Personally, I hate them.
What is wrong with booleans? Nothing, except that they are often used in the wrong way.
A boolean (that is a condition that is either true or false) can be used only to test a single, autonomous condition. The key word here is "autonomous": the boolean can indicate only one truth at a time.
And developers, of course, know the truth!
So you can end up with some code like the following:

void foo( boolean conditionA, boolean conditionB ){

  if ( conditionA ){ doA(); }
  else if ( conditionB ){ doB(); }
  else doPanic();

}


which works fine as long as conditionA and conditionB are totally independent. That means not only that neither value affects the other, but also that there must be no priority between conditionA and conditionB.
To better understand, suppose a new constraint is introduced imposing that whenever conditionA is false, conditionB cannot be true either. The code therefore becomes:


void foo( boolean conditionA, boolean conditionB ){
  if ( ! conditionA )
    conditionB = false;

  if ( conditionA ){ doA(); }
  else if ( conditionB ){ doB(); }
  else doPanic();

}

Due to the above constraint, there are now fewer cases:
  • case 1: conditionA = true and conditionB = true;
  • case 2: conditionA = false and conditionB = false;
  • case 3: conditionA = true and conditionB = false.

So there are now three possible cases out of four using two booleans.
Sooner or later you will miss some check against the possible values, and in any case your code will end up difficult to read. Consider someone who wants to use your API and has access only to the prototype of the code:

void foo( boolean conditionA, boolean conditionB )

You have to carefully maintain the documentation, explaining that the combination {false, true} cannot be applied.
So what is the solution?
Instead of having to deal with not-so-boolean booleans, consider using another way of representing a finite, well-known set of possibilities, like enumerations or "old-style" hex variables. They will make your code cleaner and easier to read, and will scale once you have to introduce new constraints.
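
As a minimal sketch of the enumeration approach, keeping the hypothetical doA()/doPanic() helpers from the snippet above: the three legal combinations become a single enum, so the forbidden state {conditionA = false, conditionB = true} cannot even be expressed by a caller.

enum Mode {
    A_AND_B,   // conditionA = true,  conditionB = true
    A_ONLY,    // conditionA = true,  conditionB = false
    NEITHER    // conditionA = false, conditionB = false
}

void foo( Mode mode ){
    switch ( mode ){
        case A_AND_B:
        case A_ONLY:  doA();     break;
        case NEITHER: doPanic(); break;
    }
}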

Friday, October 19, 2012

[Mailing List | Forum]-ize yourself!

This may sound weird coming from me, having been a fan of mailing lists until almost a year ago. Nowadays I believe much more in technical forums than in other collaborative spaces.
First of all, allow me to explain the need: you are developing/maintaining a piece of code and you hit a problem you cannot resolve, so you need assistance. I'm not going to discuss here the types of support you can get for proprietary or open source software; I'm assuming you are alone, without a support contract with anyone.
In the programming space there are different ways of getting support, mainly:
  • IRC (Internet Relay Chat)
  • Mailing Lists
  • Forums
  • Social Networks
Since I'm not a fan of social networks, in whatever form they come, I discourage you from using them as a way of getting support. Social networks are very dynamic, high traffic and confusing for readers, so it is quite unlikely that someone will notice your help request in the ocean of messages flowing through the network.
IRC is an old-fashioned way of chatting, and is very respected in the Unix and Unix-related environments. The idea is to have a virtual room that hosts people discussing a specific subject. One room means one subject; users do not have to register anywhere, can gain access to a server quite easily and can join several rooms at once. My experience is that traffic can quickly become really high, or you could land in a room without any (interested) guests, so your question gets dropped and forgotten. Moreover, room logs may not be saved, so even if someone helps you the answer may not be stored anywhere and you may be unable to access it again. Usually IRC is used to synchronize committers and developers and to hold virtual meetings, but getting help through it is, in my opinion, very difficult. Moreover, you are required to stay connected for a while, so you need access to the Internet (even if the required bandwidth is not much). These are the main reasons why I don't like IRC very much.
As already stated, I was a fan of mailing lists. Mailing lists are channels to which users have to subscribe in order to receive mail. A user can then drop an email to the channel (which is nothing more than a special e-mail address) and all the other subscribers will get it in their inbox. It is easy to use, it is safe (email travels as plain text), you are sending a message to everyone who is subscribed (so chances are someone will read it) and you just have to wait for a reply. Moreover, configuring the right set of filters will allow you to manage a ton of e-mails easily. Last but not least, you get a reply from a real user behind a valid e-mail address, so you can contact that user directly. This is somewhat wrong, since you should try to keep the knowledge on the mailing list, so that it stays shared and does not become private, but it does give you the chance to connect directly with someone else.
Forums are very popular; I started using them when working behind a stupid firewall that blocked my mailing list accounts. Forums can be very good if they host very technical and focused topics and if the moderators do their job. Moreover, I found that forums that give you badges as indicators of how much/how often/how well you contribute attract developers and committers, who are usually proud of their public image compared to other people. Another aspect of forums is that they tend to carry less traffic than mailing lists, and are therefore a good place for mid-to-advanced users.
To recap, I suggest you use forums and mailing lists as often as possible and in the correct way: always specify the system and properties of the context you are working in, the problem you have, the solutions you have tried, and any other detail that can help people solve your problem. Remember that on any channel people are volunteers donating their free time to you. The best you can do is make sure you are not wasting it!

Saturday, May 26, 2012

Do not merge the singleton pattern with the factory one!

Do you remember programming school and what the teacher kept repeating every lesson? "Using global state is a bad programming technique!" Singletons are nothing more than global state, even if in a more elegant form.

With the advent of many Java frameworks the singleton pattern has become so popular that pretty much every developer now knows it. For those who still have doubts, a singleton is an object that exists as a "single" instance throughout the whole application/framework. As an example, many GUI libraries provide a "Display" class, which is a singleton (the library needs a single display to draw onto); other examples could be a shared resource or service. The idea is simple: does the program always need access to the same object? If the answer is yes, chances are a singleton will be used.
How is a singleton implemented? Often it is implemented using a factory pattern, as in the following snippet:

public class Display{
       // make constructor private
       // so that nobody can build
       // an instance autonomously
       private Display(){ ... }


       // a static reference to the
       // ONLY available instance
       private static Display mySelf = new Display();


       // use a factory method to get
       // an instance of the Display
       public final static Display getInstance(){
              return mySelf;
       }
      
}

The above example is kept simple just to show what a singleton is and how it is implemented; it is possible to write much more complex and elegant implementations, as well as to port the above code to any other programming language.
What is wrong with the above piece of code? That two patterns are involved: the singleton one and the factory one.
Now, singleton is bad, factory is good.
To explain the latter statement, allow me to introduce a piece of client code that is going to use the above Display class:

void showPopupWindow(){
     Display.getInstance().showPopupDialog( "Hello World" );
}

What is strange with the above code? The method has no parameters and does not access any private field of its class. Let's look at it from another perspective: the client code knows where to find and how to get the Display instance, and therefore it goes and obtains it by itself. This is very strong coupling, and it makes the code hard to read and test. Why? Let's start with the coupling: the client code accesses the Display class directly, so there is no way to "inject" another Display instance to use. The two classes are therefore tightly coupled. This leads to the hard-to-test problem: how are you supposed to test the showPopupWindow() method if you cannot inject a mock Display object? The only way is to replace the real Display with a fake one, which is not so simple. Finally, the hard-to-read problem: imagine you have access to the Javadoc documentation, which is something like:

     /**
       * Shows a popup window on the user display.
       */
     void showPopupWindow();

You don't see what the code does in its implementation, so you have to guess that it is going to access the Display class to get an instance. Please note the use of the word "guess".
How is it possible to solve such problems? Let's start from the client code: adopt dependency injection as follows

void showPopupWindow( Display display ){
     display.showPopupDialog( "Hello World" );
}

or use an internal variable to keep an injected instance of Display and access it. Then refactor the Display code to use a separate factory:

public class Display{
       public Display(){ ... }
}


public class DisplayFactory{
       private static Display instance = new Display();


       public static final Display getDisplayInstance(){
              return instance;               
       }
}


Note that the instance is now cached in the factory, and that the Display class is a pretty normal class now. You can add some fancier protection mechanism, like making the Display constructor package-visible instead of public, but you get the idea. Now, no matter how you implement the Display class, the factory pattern is implemented the right way and the coupling between classes is strongly reduced. Moreover, you can also distribute the Display in binary form and let clients reimplement the factory pattern, which you could not do in the former case. The problem lies in the fact that the first implementation of the Display class was playing a double role: it was the singleton and the factory for itself. This is very common in many programs (I have done it too!) but it is wrong. Note I'm not saying it "could be" wrong. A class must be thought of as a unit with as few aims as possible, ideally a single one. So the right way to go is to use one class as the singleton and another as the factory. This requires a bit more code and some effort for the initial setup, but it will pay off in the long term.
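
To see why the injected version is easier to test, here is a sketch, assuming the refactored Display above has a public (or package-visible) constructor and an overridable showPopupDialog( String ) method: a test can pass in a fake instance and verify the interaction without touching any real display.

// a fake Display that records the message instead of showing it
class FakeDisplay extends Display{
       String lastMessage;

       @Override
       public void showPopupDialog( String message ){
              lastMessage = message;   // no real GUI involved
       }
}

// the test injects the fake instead of asking a singleton for the real one
void testShowPopupWindow(){
       FakeDisplay fake = new FakeDisplay();
       showPopupWindow( fake );
       assert "Hello World".equals( fake.lastMessage );
}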

Tuesday, March 6, 2012

Use assertions!

You use assertions, don't you?
Today almost every language/framework/library provides developers with an assertion-like set of tools. Use them!
The point is not only that assertions are good for testing critical conditions; it is that they help document the code. Allow me to explain with an example:

private String doSomething( String input ){
     assert input != null;
     return input.trim();   // illustrative work on the input
}

What can you see here? That the input object cannot be null. Trivial. But there's more: the developers are working here under the assumption that nobody will call doSomething with a null argument; a null input is not only bad, it is a catastrophe! So if you are a doSomething client you are supposed to provide the right value for the input argument.
Now, please remember that assertions are totally different from exceptions. The above method could have been written as:


private String doSomething( String input ){
     if( input == null )
         throw new Error();

     return input.trim();   // illustrative work on the input
}

This is totally different: in the former method the developers clearly stated that they do not want to deal with a null input; it's on you to provide a non-null input object. In the above example, instead, the developers are somehow dealing with null values, even if only by letting you know that you did something wrong.
So when should you use assertions and when not? Assertions should be used for all the internal stuff, e.g., private methods, that is, things that are called when the input is not tainted. On the other hand, if you think you are going to get bad input, deal with it!
And remember: assertions are going to vanish when you are in release mode, so do not place anything more complex than a boolean test in them. For instance:

private void doSomethingElse(){
    assert( loadDataFromDatabase() != true );
    // ...
}
This is plain wrong! Once assertions are disabled, chances are (depending on the assertion mechanism of the language/framework/library) that the call to loadDataFromDatabase disappears too, and with it all your data loading!
So please use assertions, and use them against single variables!
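
A minimal sketch of that last advice, keeping the hypothetical loadDataFromDatabase() above: perform the side-effecting call unconditionally, store its result, and assert on the plain variable, so that disabling assertions removes only the check.

private void doSomethingElse(){
    // the call always runs, with or without assertions enabled
    boolean loaded = loadDataFromDatabase();

    // only this check vanishes in release mode
    assert loaded;

    // ... continue working with the loaded data
}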

Wednesday, January 25, 2012

Avoid downcast!

Who has never written a piece of code like the following (pseudo Java)?


if( myObject instanceof BASE )
  ((BASE) myObject).doBaseBehavior();
else if( myObject instanceof DERIVED1 )
  ((DERIVED1) myObject).doDerivedBehavior1();
else if( myObject instanceof DERIVED2 )
  ((DERIVED2) myObject).doDerivedBehavior2();

It is called a downcast, and it goes against the Liskov Substitution Principle. The idea is that, at run time, you have to invoke a specialized behavior through a generic base class. Now, read that sentence again and emphasize the words "specialized", "through" and "generic". Note the order of those words: it is not the right one! While it is true that a specialized implementation should offer a generic behavior, this does not mean that you can reorder the words and obtain something that works. "Hey, Java offers the instanceof operator and C++ offers dynamic_cast" - I hear you screaming. And so what? Do you think that just because such an operator exists in your language you should use it this way? Not at all!
What happens when you add a new derived subclass? You have to add another branch to your selection statement. That is a really bad-looking piece of code, and trust me, sooner or later you will forget to add such a branch and find out your code is buggy.
I've seen this code over and over in every kind of project, and it seems to be embedded also in libraries, especially commercial ones (at least in my experience). You have to avoid this kind of programming. Reconsider your problem and create more abstractions, since what you are really expressing with the above code is that a few instances are pretty much the same except for a single behavior. So either they are not the same, and you should not manage them as if they were, or you should generalize such behavior so that all instances share a common interface or prototype and are free to implement it as they need. That way, instead of downcasting an instance you are simply casting it to one of its generic interfaces:

if( myObject instanceof MyPrototype )
  ((MyPrototype) myObject).doBehavior();
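
Pushing the idea to its conclusion, here is a sketch with hypothetical names: once every concrete class implements the common prototype, the instanceof test and the cast disappear entirely and the call becomes purely polymorphic.

// the common prototype shared by all the concrete classes
interface MyPrototype {
    void doBehavior();
}

class Derived1 implements MyPrototype {
    public void doBehavior(){ /* specialized behavior 1 */ }
}

class Derived2 implements MyPrototype {
    public void doBehavior(){ /* specialized behavior 2 */ }
}

// the caller never needs to know the concrete type
void run( MyPrototype myObject ){
    myObject.doBehavior();   // dispatched at run time, no cast needed
}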

Interestingly, my experience is that this kind of programming template (I would not call it a pattern!) arises from a poor understanding of the factory pattern: you have a factory that creates different instances of the same interface, but since the interface lacks the specialization you need and you have no control over which implementation is going to be created, you try to downcast. Again, this is not the solution, or at least it is a solution that may work in the short term but that you will need to refactor in order to get robust code.
Finally, please consider that when casting, the language operators are usually smart enough to inform you about an error in the cast. For instance, the dynamic_cast operator returns 0 (a null pointer) if the cast is impossible, and the instanceof operator returns false if the object is null. Take advantage of this information to write better quality code!

Saturday, January 21, 2012

Use late declaration

While developing code you will have to declare one or more variables before using them. If the language allows it, declare them only when you are about to use them! There are languages, such as C (at least before C99), that require you to declare a variable at the beginning of a code block, which may be far from where it is first used. Other languages allow you to declare a variable exactly where you are going to use it (Java, C++, Perl, ...), and if possible you should get into this habit. To explain why, let us consider the following piece of pseudo Java code:

AuthToken login( String username, String password ) throws AuthException{
    AuthToken token;


    if( username == null || username.isEmpty() )
         throw new AuthException();
    else if( password == null || password.isEmpty() )
         throw new AuthException();
    else{
        // validate login and build a new AuthToken
        token = new AuthToken();
        return token;
    }
}


The code is quite simple: the method receives two parameters as input, checks them against some rules and then wraps them into another object called AuthToken, which is then returned. What is wrong with this piece of code? Nothing. But that is because it is Java. Now imagine the same code expressed in C++. Can you see what is wrong? The AuthToken object is built as soon as it is declared, and there are two paths that will never use it (the two paths that throw an exception). In those cases the object is simply wasted.
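
A late-declaration version of the same method (a sketch, keeping the hypothetical AuthToken and AuthException types) simply moves the declaration into the only branch that needs it:

AuthToken login( String username, String password ) throws AuthException{
    if( username == null || username.isEmpty() )
         throw new AuthException();
    else if( password == null || password.isEmpty() )
         throw new AuthException();
    else{
        // the token is declared (and built) only on the path that uses it
        AuthToken token = new AuthToken();
        // validate login and return the new AuthToken
        return token;
    }
}
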
There is more to this than avoiding a memory waste: placing the variable exactly where you are going to use it makes it simpler to comment out a whole piece of code. In the first example, if you refactor the method to return void, you have to comment out the return statement and also the variable declaration, and the two instructions are quite far apart. While commenting out the declaration will mark the return statement as invalid, doing the opposite will not, and it can lead to a situation where the code still compiles even though the now-unused variable is left behind.
Finally, declaring the variable where it is going to be used makes it simpler to refactor the code introducing new code blocks, and therefore new scope contexts.

Wednesday, September 28, 2011

Do not be public!

This is a well known rule of OOP: encapsulate!
What this really means is that, as in all fields of computer science, you should start by granting no rights and then add a few allowances. In other words, each property you declare in a class should be private, each method should be final/const, and so on. Of course I'm excluding struct-like objects from this paradigm.
Why be so reluctant to use protected? Well, you can always give up your rights later and convert your private field to protected, or even to public. You can also give others the ability to override your methods, but if you don't have a real need to allow that, don't!
Consider the following example:

     public class Foo{
         protected int counter = 0;
       
        public void sendEmail(){ /* do stuff */ }

        public Foo(){
               counter = 10;
               sendEmail();
        }
     }


What is wrong with this kind of code? A lot of things... First of all, the counter variable can be accessed directly from Foo subclasses, and this may not be what you want. Imagine you want to ensure that all instances of Foo (and of its subclasses) have a counter initialized to 10: how can you enforce this? You have to declare counter as private and provide no setter method for it. Leaving the field protected is a call for trouble.
An even worse error in the above code is the sendEmail method signature, which is not final and therefore can be overridden. Why is this wrong? Because the constructor of Foo will call a polymorphic method, and trust me, this can lead to endless debugging sessions!
Summarizing, you should declare everything private and, where it cannot be private, declare it at least final; only when you are sure of what you are doing should you allow direct access to methods/fields.
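
A tightened version of the example might look like the following sketch: the counter is private with no setter, and sendEmail is final, so the constructor can no longer end up calling an overridden method.

     public class Foo{
         private int counter = 0;        // no subclass can touch it directly

         public final void sendEmail(){ /* do stuff */ }  // cannot be overridden

         public Foo(){
                counter = 10;    // every instance is guaranteed to start at 10
                sendEmail();     // safe: always calls Foo's own implementation
         }
      }
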
Now consider what C++ does with methods: every method not declared virtual behaves like a final method in Java. In other words, C++ does it right: it gives you the stricter behaviour by default. I agree that having to write "virtual" for each method is extra typing that could be avoided, but it is the only way to avoid awkward errors, and it is a good way to prevent you from releasing a broken API.

Sunday, September 25, 2011

Do not be afraid of using structs for your (internal) implementations!

With the advent of OOP languages, developers seem to have forgotten that not everything needs to be a fully implemented class. In a typical program you have bunches of data that must be collected together, but that do not always need to expose all the OOP features such as encapsulation (i.e., hidden properties), accessors (i.e., getters and setters) and polymorphic behaviour. Sometimes you only need to store a few fields together, for internal use and just for your own convenience. A typical example is when you deal with external resources, for instance something that comes from an underlying piece of software or a library. In such cases you will probably not be able to change the data that is passed back to you; you only have to read it and take action depending on its value. In this scenario it does not make any sense to build a full object with a lot of getters/setters/constructors/destructors: a simple struct-like packaging will suffice.
Another example is when you have to deal with a few annotations at the same time; instead of accessing each annotation to get each value, you can pack values into a struct-like object and access such fields directly. Since you are not supposed to change an annotation at run-time, creating setters does not make sense, and therefore creating getters does not make sense either. So just skip them and use a struct!
I know that your OOP soul is screaming at the above, but trust me, each method you add requires a new stack frame (if not inlined, of course) and does not provide you with much more control than you actually need. After all, consider the following simple example (Java code):

class Person{
       private String name;
       private String surname;

       public String getName(){ return name; }
       public void   setName(String n){ name = n; }

       public String getSurname(){ return surname; }
       public void   setSurname(String n){ surname = n; }
}


How often have you changed the accessor methods? I guess that for almost 95% of your beans you never have to customize the generic logic of getters and setters. In such a scenario, the following is much simpler to write and cheaper to use at run time:

class Person{
       public String name;
       public String surname;
}



There is of course a problem: if you need to change your access rules, you will not be able to without breaking compatibility. In other words, once you expose a public struct property, you can never take it back!
However, as you can see, the class itself is not public, which in Java means that it cannot be accessed outside the current package. In other words, the class does not represent a public API but a private one, and you can always change it as you wish without clients even noticing.

There are also other cases where using struct-like objects is fine, for example when your OOP API is an adaptor over an external API. An example can be found in the SWT Java GUI API: being a kind of reimplementation of the operating system API, it makes heavy use of structures because they map really well onto the operating system data structures.

In conclusion, the main idea of this article is that structs are good, but they must be used with caution, to avoid making public something that sooner or later may require a different control path. But if you are sure you are using such objects only for your internal implementation and/or to map other languages' or libraries' structs, use them without being scared!

Friday, September 23, 2011

Avoid the use of this, super and in general class qualifiers when not strictly required!

This may sound odd coming from me, who since my early days of OOP programming has always advocated the use of such qualifiers. The main reason I liked them, and the main reason I presented to my students when teaching OOP, was that the resulting code would be easier to read and, most notably, a lot of IDEs bring up popups and code helpers to show you the available completions. In other words, simply typing "this" activates a popup that presents you with a list of methods and fields to choose from.
So what changed my mind? Why am I now advocating avoiding qualifiers? Well, it is really simple: I switched back to some C code and found myself comfortable writing "unqualified" code. So I thought that, in order to keep the code clean and get a more coherent syntax, avoiding qualifiers can help.
As an example consider the following piece of (imaginary) Java code:

   this.description = this.computeANewDescription();
   this.setTimeAndDate( super.getTodayDate() );
   this.setStatus( Status.STATUS_OK );


Now the above code can be rewritten in the following way, assuming you also use static imports:

   description = computeANewDescription();
   setTimeAndDate( getTodayDate() );
   setStatus( STATUS_OK );


Doesn't it look a lot like a C code snippet? Apart from keeping your fingers relaxed, thanks to not typing extra keywords, the code is much cleaner and can be read by any C developer.

Wednesday, September 21, 2011

Avoid naming variables with the name of their type

Often you can see code where variables are named after their type, such as for instance:

      CrudDAO crudDAO = new CrudDAO();
      String  string  = "Hello";

and so on, you get the idea. While this makes sense if you are writing a "Hello World" program, it does not scale when you have complex and long source listings. In particular it becomes harder to refactor such code, because if you want to change the variable types you have to change their names too. While this can be done by an automatic refactoring tool, it almost always requires manual intervention. Therefore, choose mnemonic names that keep their meaning regardless of the specific type they belong to, such as:

      CrudDAO dao      = new CrudDAO();
      String  message  = "Hello";

In the above code it becomes very easy to change the type of the "dao" variable to another implementation while the name still expresses what it is: a Data Access Object. Similarly, if the "message" variable is changed to a char array, you don't have to rename it to reflect the change, since the name is much more meaningful than a plain "string".

Tuesday, September 20, 2011

The Java Beans specification sucks!

Ok, the title is quite harsh, but it is so true...
The Java Beans specification requires that each object property (i.e., variable) be accessed through a pair of setter and getter methods. In particular, the methods must be named with a set/get prefix (lowercase) followed by the name of the property (with the first letter capitalized and the rest as in the original name). The return type and the arguments depend on the kind of method (often called accessor): a getter takes no arguments and its return type is the type of the property; a setter has no return type and takes a single argument of the type of the property.
As an example, the following is a correct Java Bean class definition:

class Person{
      private String name;
      private String surname;

      public void setName( String newName ){ name = newName; }
      public String getName(){ return name; }
      public void setSurname( String newSurname ){ surname = newSurname; }
      public String getSurname(){ return surname; }
}


As you can see, the "name" property generates the "setName" and "getName" methods, just as the "surname" property generates the "setSurname" and "getSurname" methods. All the getXX methods return a value of the same type as the property and accept no arguments; all the setXX methods return nothing and accept a single argument of the same type as the property they refer to.

Why don't I like this naming convention?

Let's start with the naming scheme: while it is clear that I have to distinguish a getter name from the setter one, why should I need the get prefix? I mean, if I want to access the "name" property, that is exactly what I want to type. Which of the following is the simpler and more beautiful code?

      person.getName();     // Java Bean style
      person.name();        // Qt style


The latter does exactly what I want: I want to access the name property, so I don't want to explicitly say "get the name property", but simply "the name property". Easier to type, simpler to read: I'm accessing the person's name, and I don't need to emphasize that I want to "get" it, because it is implicit that I am asking for the person's name!
On the other hand the setter can remain the same, so that my bean becomes:


class Person{
      private String name;
      private String surname;

      public void setName( String newName ){ name = newName; }
      public String name(){ return name; }
      public void setSurname( String newSurname ){ surname = newSurname; }
      public String surname(){ return surname; }
}


A drawback of this approach is that with a standard prefix I have all the methods grouped together: all the getXX and all the setXX. Removing the get prefix, only the setters remain grouped together, while the other methods are sorted by property name. This is annoying only if you are used to inspecting a bean by reading its getter method names rather than its documentation.
Moreover, it is not clear what to do with boolean properties: should you use the "is" or the "has" prefix in the method names? A lot of frameworks seem to stick to the "is" convention, but I'm sure that a getter called "isMoney", returning true when there is still money, sounds awkward even to the strictest Sun engineer.

Anyway, another big, even huge, problem of the Java Bean specification is that it does not consider side effects. How can you test a setter method? You have to create a getter method! While this may sound ok to you, it does not look very good to me. When you execute a method call like:

    person.setName( "Luca" );

how can you be sure that the name has really been set to "Luca"? You have to call getName() and compare the two values. While setters and getters are usually simple and straightforward methods (ones that can even be inlined), a lot of things can go wrong in the execution path of one or the other. What is the solution? To have a setter that returns the actual value of the property, so that it is defined as returning the same type as its parameter. In other words:


class Person{
      ...
      public String setName( String newName ){ 
          name = newName; return name; 
      }
      public String setSurname( String newSurname ){ 
         surname = newSurname; return surname;    
      }
}


In this way each setter method represents a single unit of testing, and can therefore be tested as it is without requiring anything else (in particular a getter method). Moreover, in this way the classes expose to the outside world a clear side effect of the setter methods.
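
For instance, a test for such a value-returning setter might look like the following sketch (a plain assert is used here instead of any specific testing framework):

void testSetName(){
     Person person = new Person();

     // the returned value is the actual state of the property,
     // so no getter is needed to verify the side effect
     String result = person.setName( "Luca" );

     assert "Luca".equals( result );
}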

In conclusion, I disagree with the Java Bean specification mainly because of:
- its naming convention;
- its definition of the setter methods.
The worst thing in the whole story is that you cannot even think of changing the Java Bean convention to use a different naming scheme, or of changing the return type of the setter methods. This is because a lot of Java frameworks use reflection to inspect and access properties, and in doing so they will stupidly search for setter methods that return void!

Saturday, September 17, 2011

Return zero on success!

Imagine you have a method that returns a state marked as success or failure, something that can easily be accomplished by returning true or false if the language supports booleans natively. Now imagine that your programming language does not support booleans, so you have to return another kind of value. What value should you return?
The first consideration is about the type of value: integer or string? While strings are easier to debug (they can easily be printed in the logs or in the console), integers are the best choice since they are easier to compare and require less memory than a string.
Ok, so you have decided to return an integer; which value should represent "success"? Well, by a strange convention, in programming languages like C every value different from zero represents a true condition, and therefore success, while zero represents a false condition, and so failure. In other words, your method could look like:

  int createFile( char* name ){
    // do stuff here
    if( ok )
      return 1;    // success
    else
      return 0;    // failure
  }


and your method could be used like the following:

  if( createFile( "/tmp/blog.txt" ) == 0 )
    perror("Error! Failure!");

There are a few problems with this kind of approach. The first is a semantic one: you usually have only one success condition but several possible failures. In the above example success is reached when the file is created, while the failures could range from a permission problem to a disk space limitation. So you have a single success condition and several causes of failure, yet in the above code only one failure value is available (0) and several success values are possible (every value different from 0): the semantics is flipped!
This is also why a lot of system calls and library functions rely on a global variable, errno, to set a more detailed error code in case of failure. In other words, in case of failure another variable contains the details of the failure itself. While this can be nice and elegant from a syntactic point of view, it requires a double effort in writing and testing the code.
There is also another, more technical, problem with the above approach: while 0 is represented the same way on machines of all architectures, values different from zero can be represented differently (little endian, big endian?). This really matters when dealing with very low-level code, such as operating system code (system calls especially).
For this reason, it is better to return a zero value on success, so that the above method can be rewritten as:

  int createFile( char* name ){
    // do stuff here
    if( ok )
      return 0;    // success
    else
      return error; // e.g., 1 for a permission problem, 2 for disk space, ...
  }


and the code that uses the method call becomes:

  if( createFile( "/tmp/blog.txt" ) != 0 )
    perror("Error! Failure!");


So there is no extra typing in the caller code, but there is in the callee, since you have to keep very good documentation in order to explain the meaning of each return value (when not zero, of course). Note also that a global variable like errno becomes unnecessary with this approach, since the method can immediately return a value that describes the exact cause of the failure.

As a side note, consider that while C treats zero as a false (failure) condition, the Bourne Shell and its derivatives treat a zero exit status as success. This is awkward, since shells are much higher-level interpreters than the C language!

Friday, September 16, 2011

Avoid Booleans!

Booleans are a tempting thing introduced by modern languages, and they have always been emulated in older languages. After all, who hasn't written at least once a couple of C macros like the following?

 #define TRUE  1
 #define FALSE 0

A lot of libraries, including GNU ones, define them too!
The problem with booleans is that they can easily become a source of mess, making the code a lot less readable and understandable. Of course booleans are easy to understand when they are used in conditionals, such as if, while, do-while, and so on, but they can hide a lot of behaviour when they are passed as arguments in method calls. To start with a simple example, consider the following method call:

  createNewFileOverwriting( "/tmp/blog.txt", true );

What does the method do? Well, we can expect that the method will create a new file, overwriting it if the file already exists. Only good documentation can tell what that "true" really means. But what if the method call were:

  createNewFile( "/tmp/blog.txt", true );

Less clear, huh? Here the "true" could mean "overwrite the file if it exists", or "create the directory if it does not exist", or something else. There are also worse examples:

    repaint( false );

What does it mean? That it should not repaint at all? Most probably that it should not repaint immediately, but as soon as possible. In this case a more descriptive method name would help:

    repaintImmediately( false );

The above are just simple examples, but libraries are full of such calls. How can the code be improved? Simply by creating constants with meaningful names, replacing booleans with enumerations. Now read the code that follows and decide whether it is clearer to understand than the above examples:


   repaint( REPAINT_IMMEDIATELY );   // corresponds to repaint( true );
   repaint( REPAINT_LAZY );          // corresponds to repaint( false );
   createNewFile( "/tmp/blog.txt", 
          CREATE_DIRECTORY_IF_NOT_EXISTS | OVERWRITE_FILE_IF_EXISTS );               

So with a little extra typing you can produce code that is much clearer to read and understand, and therefore to maintain.
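
In Java the same idea might be sketched with an enum for the repaint mode and an EnumSet for the combinable file-creation options (all the names here are hypothetical):

import java.util.EnumSet;

enum RepaintMode { IMMEDIATELY, LAZY }

enum CreateOption { CREATE_DIRECTORY_IF_NOT_EXISTS, OVERWRITE_FILE_IF_EXISTS }

class Gui{
    void repaint( RepaintMode mode ){
        // the call site now reads repaint( RepaintMode.LAZY )
        // instead of an opaque repaint( false )
    }

    void createNewFile( String path, EnumSet<CreateOption> options ){
        if ( options.contains( CreateOption.OVERWRITE_FILE_IF_EXISTS ) ){
            // overwrite the existing file
        }
    }
}

// usage:
// new Gui().createNewFile( "/tmp/blog.txt",
//        EnumSet.of( CreateOption.CREATE_DIRECTORY_IF_NOT_EXISTS,
//                    CreateOption.OVERWRITE_FILE_IF_EXISTS ) );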

fluca's programming best practices

I'm starting a new set of articles about my programming experiences and what I believe are programming best practices. You may like them or not, or you may just find them useful.