Saturday, 20 May 2017

PostgreSQL will drop FTP download support

Starting 15 August 2017 it will no longer be possible to download PostgreSQL via FTP!
As explained on the pgsql-announce mailing list, given the low traffic
over FTP, as well as the age of the protocol and of the related programs,
the team that maintains the infrastructure has decided to shut down FTP access.
This should not cause any particular disruption for end users; at most
it may affect some applications and automated scripts.

Noisli: a sound service to improve concentration at work

Staying focused while working is very important; I think everybody knows that.
I am lucky enough to work at a desk, with no public around, so to isolate myself from my surroundings I use
music and, rarely, the radio. My brain latches onto the melody and does not get distracted by other things and noises, letting me
carry on with my work. However, I do not recommend this as a study strategy: studying requires more silence
and, in general, sounds that the brain cannot rationally "understand" (unlike, e.g., words).

So, I discovered that there is a service, Noisli, that lets you compose an ambient sound that may increase individual
productivity, for example by mixing the sound of pouring rain with that of a fireplace. And someone has even thought
of integrating the service into Gnome.

Friday, 19 May 2017

Arabic to Roman number converter in Perl 5

Puzzled by a post on Planet KDE about GCompris and Roman numerals, and needing an easy way to explain Roman numerals to my six-year-old son, I thought it would be an easy task to write a simple Perl program to convert an Arabic number into its Roman notation.
Not so easy, pal!
Well, the problem is not Perl, of course; rather, it took me quite some brainpower to write down the conversion rules, and I did not search the web for a copy-and-paste algorithm. Please note: if you need a rock-solid way to handle conversions, have a look at CPAN, which is full of modules for this particular purpose.
Here I'm going to discuss the solution I found and how I implemented it. It is not supposed to be the best one, or the fastest one; it's just my solution from scratch.

The program

I split the problem of converting an arabic number into a roman one into three steps, with one dedicated subroutine for each step, so that the main loop reduces to something like the following:
say "$_ = " . $roman_string->( $reassemble->( $disassemble->( $_ ) ) )
              for ( 1..30 );
that produces the following output:
1 = I
2 = II
3 = III
4 = IV
5 = V
6 = VI
7 = VII
8 = VIII
9 = IX
10 = X
11 = XI
12 = XII
13 = XIII
14 = XIV
15 = XV
16 = XVI
17 = XVII
18 = XVIII
19 = XIX
20 = XX
21 = XXI
22 = XXII
23 = XXIII
24 = XXIV
25 = XXV
26 = XXVI
27 = XXVII
28 = XXVIII
29 = XXIX
30 = XXX
The steps must be read from the inner subroutine to the outer, of course, and therefore we have:
  • disassemble translates an Arabic number into its Roman basis, that is, it computes how many units, tens, hundreds and thousands are required. In this phase no Roman rules are applied, so numbers are decomposed into a linear string of letters. As an example, the number 4 is translated into IIII, which is of course not a valid Roman number.
  • reassemble applies the Roman rules, in particular promoting numbers so that groups are translated, when needed, into higher order letters. For instance IIII is promoted into two groups: I and V.
  • roman_string composes the promoted groups into the final string. The main difficulty of this part is to understand when a letter has to be placed on the right (addition) or on the left (subtraction) of another letter. For instance, given the groups I and V, the function must understand whether the output has to be VI (6) or IV (4).
To speed up the writing of the code, I placed the main Roman letters and their correspondence with Arabic numbers into a global hash:
my $roman = {
    1    => 'I',
    5    => 'V',
    10   => 'X',
    50   => 'L',
    100  => 'C',
    500  => 'D',
    1000 => 'M',
};
Each method references $roman when it needs to convert an Arabic number to its Roman letter. To allow the methods to cooperate, they accept and return a hash keyed by Roman letter, whose values are the number of occurrences of that letter in the final string. The following is an example of the hash for a few numbers:
# 4 (IV)
{ 'I' => 1, 'V' => 1 }
# 19 (XIX)
{ 'I' => 1, 'X' => 2 }
# 5 (V)
{ 'V' => 1 }
# 17 (XVII)
{ 'X' => 1, 'V' => 1, 'I' => 2 }

The disassemble function

The following is the code for the disassemble function, which accepts the Arabic number as its only input.
# Accepts the arabic number and provides a hash
# keyed by each letter, with the value of how many times
# such letter should be summed in order to obtain the
# starting number.
my $disassemble = sub{
    my ( $number ) = @_;
    my $items = {};

    # sort the keys, that are arabic thresholds, from
    # the greater to the smaller one
    for my $current_value ( sort { $b <=> $a } keys $roman->%* ){

        my $how_many = int( $number / $current_value );
        next unless ( $how_many );

        # plain element access; ->%{ } would be a key/value slice
        my $letter = $roman->{ $current_value };
        $items->{ $letter } = $how_many;

        $number -= $current_value * $how_many;
    }

    return $items;
};
The first thing the method does is create the hash $items, which is what it returns for the other methods to consume. The keys of the $roman hash are walked from the biggest to the smallest (please note that sort has $b first!). This way the number is decomposed into thousands, hundreds, tens and units, in this exact order. The $how_many variable contains how many whole times the current letter fits into what is left of the number. For example the number 29 is processed as follows:
  1. 29 / 10 gives a $how_many of 2 and leaves a remainder of 9;
  2. 9 / 5 gives a $how_many of 1 and leaves a remainder of 4;
  3. 4 / 1 gives a $how_many of 4 and there's nothing more to do.
At each step the Roman letter and its $how_many value are inserted into the $items hash, which in the above example becomes:
# 29 (XXIX)
{ 'X' => 2,
  'V' => 1,
  'I' => 4
}

The reassemble method

The reassemble method takes as input the hash produced by disassemble and checks whether any letter requires a promotion. Here is the code:
# Accepts a hash with keys the letters and values the number
# of times each letter should appear.
# Traverse the hash from the smaller to the greater
# in order to "promote" smaller aggregates. For instance
# 'IIII' (4) is aggregated and therefore the hash is modified
# so there's only an 'I' and another 'V', in such case
# the quantity of the promoted letter is negative to indicate
# it has been promoted.
my $reassemble = sub{
    my ( $items ) = @_;
    my @sorted_thresholds = sort { $a <=> $b } keys $roman->%*;

    for ( my $i = 0; $i < @sorted_thresholds; $i++ ){
        my $current_value = $sorted_thresholds[ $i ];
        my $key      = $roman->{ $current_value };
        my $how_many = $items->{ $key };

        next unless ( $how_many );

        # the last letter has no greater one: fall back to 1000
        # instead of reading past the end of the array
        my $greater_value = ( $i + 1 >= @sorted_thresholds ? 1000 : $sorted_thresholds[ $i + 1 ] );
        my $greater_key   = $roman->{ $greater_value };


        my $need_to_promote = $how_many == 4
            || ( $greater_value / $current_value == $how_many );

        if ( $need_to_promote ){
            $items->{ $greater_key }++;
            $how_many = $greater_value - $how_many * $current_value;
            $items->{ $key } = $how_many * -1;

        }

    }

    return $items;
};
The promotion must be done from the smaller letter to the greater one, so this time the letters are walked in ascending order (i.e., sort has $a first!). Since promoting a letter requires access to the following one, I need a C-style for loop.
A letter needs to be promoted if its quantity is 4, or if it is 2 and the next bigger value is exactly double the current one, that is, when ( $greater_value / $current_value == $how_many ). This causes, for instance, IIII to be promoted (the quantity is 4), and VV to be promoted into X (because the quantity is 2 and X is exactly double V). The promotion manipulates the hash by increasing the next bigger letter by one and leaving what remains of the current letter (a single I in the case of IIII). To flag the promoted letter, I decided to use a negative quantity (where the absolute value is the exact one).
So for instance, the hash for 29 from the previous paragraph is processed as follows:
# input to the method
{ 'X' => 2,
  'V' => 1,
  'I' => 4
}


# first for step (I)
{ 'X' => 2,
  'V' => 2,
  'I' => -1  # promoted, keep 1 and increase 'V'
}

# second step (V)
{ 'X' => 3,
  'V' => 0,  # promoted, increase X by one
  'I' => -1
}
At the end of the method we know the final string will be made of three X and one I; the point now is to understand how to render them in the correct order. This is the aim of the roman_string method.

The roman_string method

The method accepts the normalized hash (i.e., groups are already formed) and composes the final string, placing each letter to the left or to the right of the others depending on its quantity. The following is the code of the method:
# Do the hard work of composing
# each group of letters in order to compose the roman string.
my $roman_string = sub {
    my ( $items ) = @_;
    my @chars;

    for my $current_value ( sort { $b <=> $a } keys $roman->%* ){
        my $letter   = $roman->{ $current_value };
        my $how_many = $items->{ $letter };

        next unless ( $how_many );


        if ( $how_many > 0 ){
            push @chars, $letter for ( 1 .. $how_many );
        }
        else{
            # this is promoted, so it has to be inserted as last-to-last
            # in the previous chain
            # example: @chars( X, X ) and here I've 'I' to make XIX (19)
            push @chars, ( $letter, pop @chars );
        }

    }

    return join "", @chars;
};
To be able to manipulate the final string easily, moving letters from left to right and vice versa, I decided to place each single letter into the @chars array, which is then joined into a single string.
Let's suppose we just need to add letters: in this case we need to write letters from the greatest to the smallest, from left to right, and this is the order in which I traverse the letters of $roman (again, note that sort has $b first!). If the quantity of the letter is positive, the letter has not been promoted and therefore will not be placed to the left of another letter, so I just insert $letter into @chars $how_many times. On the other hand, if $how_many is negative, the letter has been promoted and therefore has to be printed to the left of the last printed letter. This is as easy as doing:
push @chars, ( $letter, pop @chars );
which inserts into @chars the $letter followed by the previous last character, which has been removed via pop.
With regard to the previous example of 29 we have that:
# method input
{ 'X' => 3,
  'I' => -1
}

# first step: prints X
# with quantity 3 (not promoted)
@chars = ( 'X', 'X', 'X' );

# second step: prints I
# that has been promoted
# and must be inserted ascending
# as last-to-last
@chars = ( 'X', 'X' ,
           ( 'I', # $letter
             'X'  # pop @chars
         ) );

Conclusions

Well, it has been much more code than I expected to write. Using an object notation, instead of plain hashes, could surely make the program more robust. I'm pretty sure there's a way to shrink the code down and to avoid that ugly C-style for loop, and the promotion part could be simplified keeping in mind that it often reduces to -1 for the current letter and +1 for the greater one. Anyway, it does what I need and seems correct!
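
For comparison, here is a minimal sketch of a different, widely used technique: a greedy conversion driven by a lookup table that already contains the subtractive pairs. This is not the three-step approach described above; the name to_roman and the table layout are assumptions made only for this illustration. It can be handy for cross-checking the output of the three subroutines, including numbers outside the 1..30 range.

```perl
use v5.24;

# Hypothetical helper, not the code discussed in the post: the table lists
# every value together with its letters, subtractive pairs included,
# ordered from the greatest to the smallest.
my @table = (
    [ 1000, 'M' ], [ 900, 'CM' ], [ 500, 'D' ], [ 400, 'CD' ],
    [ 100,  'C' ], [ 90,  'XC' ], [ 50,  'L' ], [ 40,  'XL' ],
    [ 10,   'X' ], [ 9,   'IX' ], [ 5,   'V' ], [ 4,   'IV' ],
    [ 1,    'I' ],
);

sub to_roman {
    my ( $number ) = @_;
    my $result = '';

    for my $pair ( @table ) {
        my ( $value, $letters ) = $pair->@*;

        # append the letters as long as the value still fits
        while ( $number >= $value ) {
            $result .= $letters;
            $number -= $value;
        }
    }

    return $result;
}

say "$_ = " . to_roman( $_ ) for ( 4, 19, 29, 49 );
```

Because the subtractive pairs are part of the table, no promotion step and no left/right placement logic is needed; the trade-off is that the Roman rules are encoded as data instead of being derived.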

PostgreSQL 10 beta 1!

Here we are!
PostgreSQL 10 finally makes its first appearance with the release, yesterday, of the first beta.
The download includes binary packages for the major distributions, besides of course the possibility
of compiling the sources, which can also be downloaded as archives.

Saturday, 13 May 2017

KDE and cgit

The KDE project uses cgit as the web frontend for its git repositories.
The cgit project provides super fast web access to git repositories, with
some specific utility features such as on-the-fly repository detection.
It is certainly a valid alternative to the many web interfaces
already available.

Friday, 12 May 2017

Postfix dereference operator

Starting from Perl 5.20 it is possible to use a postfix dereference notation, first as an explicit feature and, since Perl 5.24, enabled by default.
The postfix dereference notation allows you to use the arrow operator -> on a reference (as is common when working with references), specifying the type of dereferencing you want. Types are indicated by their well-known sigils, so for instance $ means a scalar, @ an array, & a subroutine, % a hash and * a typeglob. The sigil on the right of the arrow operator is followed by a star, so it really becomes type and *, as in $*. In the special case of arrays and hashes the star can be dropped in favor of a slicing operator, e.g., the square brackets.
The following is a sample program that prints out the values of an array using full access (->@*) and slicing (->@[]):

#!/usr/bin/env perl
use v5.24;

my @array = ( 'a' .. 'z' );
my $ref   = \@array;
say "Here comes the array @{ $ref }";
say "And post dereferences is $ref->@*";

say "Here comes a slice @{ $ref }[ 0 .. 5 ]";
say "And using it as post dereference is $ref->@[ 0 .. 5 ]";
 
 
As you can see, $ref->@* is the equivalent of @{ $ref }, while $ref->@[] is the same as @{ $ref }[]. The same applies to hash references.
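
As a small sketch of the hash case (the variable names and sample data are made up for illustration), the following shows the whole-hash form ->%*, the value slice ->@{ } and the key/value slice ->%{ }:

```perl
use v5.24;

my %hash = ( a => 1, b => 2, c => 3 );
my $ref  = \%hash;

# whole hash: $ref->%* is the same as %{ $ref }
my @keys = sort keys $ref->%*;

# value slice: $ref->@{ ... } is the same as @{ $ref }{ ... }
my @values = $ref->@{ qw( a b ) };

# key/value slice: $ref->%{ ... } is the same as %{ $ref }{ ... }
my %subset = $ref->%{ qw( a c ) };

say "keys:   @keys";
say "values: @values";
say "subset: ", join ', ', map { "$_=$subset{$_}" } sort keys %subset;
```

Note that a single element is still accessed with the plain arrow, as in $ref->{ a }; the ->%{ } form returns key/value pairs, not a single value.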
Code and subroutine references are a little more controversial, at least to my understanding of references. First of all there is only the star operator, so ->&*, and the behavior is such that $ref->&* is the same as &{ $ref }. This makes me think there is no short way of passing arguments through a postfix dereference, as the following code demonstrates:

#!/usr/bin/env perl
use v5.24;

my $sub_ref = sub { say "Hello sir @_ !" };
$sub_ref->( 'Luca' );         # Hello sir Luca !
&{ $sub_ref }( 'Luca' );      # Hello sir Luca !
$sub_ref->&* ;                # Hello sir  !
{ $sub_ref->&* }( 'Luca' );   # Hello sir  !
&$sub_ref( 'Luca' );          # Hello sir Luca !

If you try to pass arguments in any way to the ->&* dereferencing, you get either a compile-time error or code that does not do what you expect:

{ $sub_ref->&* }( 'Luca ');    # Hello sir  !
$sub_ref->&( 'Luca' );        # syntax error at tmp/ref.pl line 19, near "->&"
$sub_ref->&*( 'Luca' );       # syntax error at tmp/ref.pl line 20, near "&*( "
{ $sub_ref->& }( 'Luca ');    # syntax error at tmp/ref.pl line 21, near "->&"

The only catch-all way of passing arguments, as suggested on irc.perl.org, is to use @_, so I suggest localizing it:

{
  local @_ = qw( Luca );
  $sub_ref->&*; # Hello sir Luca !
}
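
The behavior can be seen from another angle with a minimal sketch (the two wrapper subroutines are hypothetical names introduced only for this example): since $sub_ref->&* behaves like &$sub_ref; without parentheses, the called code sees the @_ of the subroutine that performs the call.

```perl
use v5.24;

my $sub_ref = sub { return "Hello sir @_ !" };

# hypothetical wrapper: its own @_ is what the postfix call sees
sub greet_via_postfix {
    return $sub_ref->&*;       # same as &$sub_ref; (no parentheses)
}

# hypothetical wrapper: the argument list is passed explicitly
sub greet_explicit {
    return $sub_ref->( @_ );
}

say greet_via_postfix( 'Luca' );
say greet_explicit( 'Luca' );
```

Both calls produce the same greeting; the difference is that the first one relies on the implicit sharing of @_, which is easy to overlook when reading the code.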

I must admit this is a little too much for my poor brain to use references efficiently, so if anyone has a comment, it is truly welcome!
Now, back to the main topic: it seems to me that the postfix dereference notation is a way of handling references more like objects; at least it reminds me of a kind of as_xx method (yes, Rubyists, I'm looking at you). If we read the arrow operator as "as" and the sigil on the right as the type name, we have that, for instance:
  • $ref->$* is $ref as scalar;
  • $ref->@* is $ref as array, and $ref->@[] is $ref as array slice;
and so on. While I'm pretty sure this is not the real meaning of this notation, it is surely more readable than the "usual" one, where we have the type on the left and the reference on the right (or better, in the middle), as in @$ref or @{ $ref }.

Wednesday, 10 May 2017

Diamonds in Perl

An ASCII diamond is a geometric structure represented as, ehm, a diamond. For example, in the case of letters it becomes something like:


     a
    b  b
   c    c
  d      d
 e        e
f          f
 e        e
  d      d
   c    c
    b  b
     a

How hard can it be to build a Perl program to represent a diamond like the above one?
Well, not that hard, but we have to observe some geometric properties:

  • the diamond is symmetric (in the sense that it begins and ends with the same letters), but the
    central row is printed only once (that is, the f appears on one line only, not two!);
  • each letter or pair of letters is centered around the vertical axis of the diamond, whose position depends on the number of letters:
    the letter a (on the vertical centre) lands in column 6 (the total number of letters is a..f = 6);
  • each pair of letters has a left and a right position, and both are equally distant from the vertical
    centre of the diamond.

Ok, so here comes my shorter solution:



#!/usr/bin/env perl

use v5.20;

my @letters = qw( a b c d e f );

# map each letter to its position within @letters
my %index;
@index{ @letters } = ( 0 .. $#letters );

# top half (all letters), then bottom half (all but the last, reversed)
say {*STDOUT}
    " " x ( $#letters - $index{ $_ } )
    , $_
    , " " x ( $index{ $_ } * 2 )
    , ( $index{ $_ } > 0 ? $_ : '' )
    for ( @letters, reverse @letters[ 0 .. $#letters - 1 ] );

Allow me to explain it in all its pieces.


First of all, @letters contains the letters to be printed in the right order, and this of course could come from user input, a sequence, an array slice, or whatever; it does not matter here. Since I have to place letters depending on where they are in @letters, I need a handy way to get the index of each letter within the array, so I build a hash where the keys are the letters themselves and the values are the positions of such letters. In other words, $index{a} = 0, $index{f} = 5 and so on.


Finally, I print a line with say every time I need one. Let's dissect the say statement:

  • " " x ( $#letters - $index{ $_ } ) shifts to the right by the number of spaces required to reach the vertical centre or the right distance from it; in other words, it is the left position. For example, for the letter a it comes
    down to 5 - 0 = 5, while for b it comes to 5 - 1 = 4 and so on.
  • then I print $_, that is the current char;
  • then I move again to the right by " " x ( $index{ $_ } * 2 ), that is, by nothing in the case of a, by 2 in the case of b, and so on;
  • then, if required, I print the char again. Here "required" means the char is not the first one (i.e., not the one at index 0), since that is the only character printed exactly once per line.

The say is repeated over the whole @letters array, so this produces the first part of the diamond:


     a
    b  b
   c    c
  d      d
 e        e
f          f

Then I need the bottom half, so I iterate over the reversed @letters except for its last element; that is, I iterate over a reversed slice of @letters without the f: reverse @letters[ 0 .. $#letters - 1 ]. This gives me the whole bottom of the diamond.
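
As a variation on the same idea, here is a minimal sketch that builds the rows instead of printing them, so that the result can be inspected or tested. The subroutine name diamond_rows is an assumption for this example; it walks the positions directly rather than going through the %index hash, which also means repeated letters in the input would not confuse it.

```perl
use v5.20;

# Hypothetical helper: returns the diamond rows instead of printing them.
sub diamond_rows {
    my @letters = @_;
    my @rows;

    # positions of the top half, then of the bottom half (last one excluded)
    for my $i ( 0 .. $#letters, reverse 0 .. $#letters - 1 ) {
        # left padding and first occurrence of the letter
        my $row = ' ' x ( $#letters - $i ) . $letters[ $i ];

        # inner padding and second occurrence, except for the first letter
        $row .= ' ' x ( $i * 2 ) . $letters[ $i ] if ( $i > 0 );

        push @rows, $row;
    }

    return @rows;
}

say for diamond_rows( 'a' .. 'f' );
```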

The importance of the qq operator

You never stop learning!
Looking at a Perl script for managing PostgreSQL textual backups,
I was intrigued by the heavy use of printf combined with the qq operator:
A double-quoted, interpolated string. Well, one benefit of this operator
lies in the use of double quotes, which obviously no longer need to be
written as escape sequences. So instead of writing:


printf "Un esempio di stringa \"%s\" ", 'quotata';

you can write the much simpler and more readable version


printf qq( Un esempio di stringa "%s" ), 'quotata';

It seems trivial, but not being used to qq the way I am
to q and qw, I never asked myself how to
further simplify my strings containing double quotes.


Now I too will become an avid qq user!
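
A minimal sketch of the same idea, using sprintf so that the results can be compared (the sample text and variable names are made up for illustration). It also shows that qq accepts other delimiters, which helps when the string itself contains parentheses, and that it still interpolates variables:

```perl
use v5.20;

my $word = 'quoted';

# the same string twice: escaped double quotes versus qq with parentheses
my $escaped = sprintf "A sample \"%s\" string", $word;
my $parens  = sprintf qq(A sample "%s" string), $word;

# braces as delimiters leave parentheses free; $word is interpolated by qq
my $braces  = sprintf qq{A sample "%s" string ($word)}, $word;

say $escaped;
say $parens;
say $braces;
```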

Applying geolocation in bulk in Digikam 5.5

Digikam is a really powerful image management tool but, paradoxically, it has some shortcomings (in my opinion)
precisely in the bulk handling of images, in particular metadata and geolocation.
I had to fight with the latter in order to set the location in bulk starting from the coordinates.
So here is how to do it (current version 5.5.0):


  1. select a single image and from the Item menu choose Edit Geolocation…, which opens the geolocation
    dialog. From here enable the Details panel on the right and, after selecting the image again in the list
    that appears inside the dialog, enter the coordinates.

  2. click the Apply button in the Details panel (not the one next to Close!).
  3. in the image list of the dialog, which always contains exactly one image, right-click and choose
    Copy coordinates from the context menu, then close the dialog.

  4. now select all the images you want to apply the changes to and, again from the Item menu, choose
    Edit Geolocation…. In the dialog that appears select the whole list of images again,
    right-click, and choose Paste coordinates from the context menu.

  5. click Apply, this time the button at the bottom of the dialog, and the system will start writing the coordinates
    into all the images.

New website for Digikam!

Digikam, one of the programs I currently use the most, has a completely redesigned website that is certainly much more appealing
than the previous one!
After all, the website matters for a project too, because it helps attract new users and potential developers.
Truly an elegant and well done job.

Tuesday, 9 May 2017

The importance of ticket descriptions...

Handling tickets properly is, as we all know, a difficult matter. Or rather, the difficulty is certainly not in filling in
a few fields in a web form, but in the mental shift required to organize
the individual tickets and the various activities well.


A mistake I see very often, even from people who call themselves IT professionals, is
using the ticket description as if it were the body of an email. Here is an example, made up but not far
from several tickets I see regularly:


Hi Mr. X,
I need you to check the printing problem with the automatically generated PDFs, which have
landscape orientation instead of portrait. This is not urgent anyway; what is very
urgent is fixing the authentication problem of our new colleague G., who cannot
log in to the system.

Regards,
Mr. Y

How many mistakes are there in the description of this ticket?


First of all the "conversational" and almost formal tone, with opening and closing greetings: correct in an email, but not
in a ticket, which already has an owner (Mr. Y) and an assignee (Mr. X). Such details only waste time and space,
and are not relevant at all to the resolution of the ticket.


The second mistake is stating the priority of the activity in words: […]This is not urgent anyway[…].
Every ticketing system lets you specify the priority of the individual activities linked
to the ticket, and the right way to do it is obviously to use those fields. Why? Well, simply because whoever picks up
your ticket might do so through a priority filter, so if your ticket is filed with "normal"
priority, writing "urgent" in the text will not make it jump to the top of the list.


Finally, an even subtler mistake: the description above covers two distinct activities (with two distinct
priority levels):

  1. fix the PDF orientation problem, low priority;
  2. fix the authentication problem of a single user, high priority.

The correct approach is therefore to open two different tickets, with different priorities and with descriptions limited
to the specific problem.
Here is what I would have written:


Ticket 1
Priority: low

PDF print layout problem: currently all PDFs come out with landscape orientation,
while the orientation must be PORTRAIT.

Ticket 2
Priority: high

User G. (username=g) cannot authenticate. The error message
he gets is "wrong username or password".
This is blocking for him: he cannot connect to the system.


As you can see, besides "splitting" a single message in two, different priority levels are used
and the problems and error messages are described in detail but without frills (greetings, etc.).


Using a ticketing system can result in a particularly powerful and structured mechanism for managing
activities, but it requires an adequate and rigorous mindset.
One piece of advice I would give to anyone who wants to "train" and prepare properly for this approach is
to use a ticketing system even for single-developer projects. In this respect, integrated systems such as
Fossil make it easy to combine code and ticket management.

Monday, 8 May 2017

Java API for PostgreSQL replication

With the advent of version 42 of the JDBC driver, used by Java applications to access PostgreSQL databases,
support for replication has been added.
A dedicated API has been created to manage replication; its entry point is getReplicationAPI() on the PGConnection object, as
described in the documentation:


The entire replication API is grouped
in org.postgresql.replication.PGReplicationConnection
and is available via org.postgresql.PGConnection#getReplicationAPI.

This feature makes it possible to control and manage replication from external (Java) applications as well.
Alas, the corresponding Perl driver (DBD::Pg) does not support replication management, even though
the DBI infrastructure allows replication to be handled in a generalized way. The DBIx
framework does not seem to offer a solution either, although there is "minimal" support
for logical replication at the table level. The odd thing is that Bucardo is a replication system implemented in Perl!

Sunday, 7 May 2017

Do you remember Perl formats?

Who remembers the $~ variable?
perlvar comes to the rescue:


HANDLE->format_name(EXPR)
$FORMAT_NAME
$~      The name of the current report format for the currently selected
        output channel. The default format name is the same as the
        filehandle name. For example, the default format name for the
        "STDOUT" filehandle is just "STDOUT".

When I was young I used Perl formats a lot, even though I found them hard to read. Probably,
coming from the world of paper printed on continuous forms, I had the mental habit of
wanting to "picture" the output in tabular form (I could not think of any other
use for formats).
That is why my programs often looked like this:


format most_called_format=
|@<<<<<<<<<<<<<<<<<<<<<|@>>>>>>|
$number,                  $call
.

By now, at least for me (but I think for a great many Perl developers as well), formats are nothing more
than a memory!
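
For those who have never seen one at work, here is a minimal sketch of a format selected by name through $~. The format name MOST_CALLED, the helper report_line and the sample values are assumptions made for this example; the output goes to an in-memory filehandle only so that the rendered row can be returned as a string.

```perl
use v5.20;

our ( $number, $call );    # package variables, visible inside the format

format MOST_CALLED =
|@<<<<<<<<<<<<<<<<<<<<<|@>>>>>>|
$number,                $call
.

# Hypothetical helper: renders one row through the format above.
sub report_line {
    ( $number, $call ) = @_;

    open my $fh, '>', \my $buffer
        or die "Cannot open in-memory handle: $!";

    # $~ applies to the currently selected handle, so select it first
    my $previous = select( $fh );
    $~ = 'MOST_CALLED';
    select( $previous );

    write( $fh );
    close $fh;

    return $buffer;
}

print report_line( '555-1234', 42 );
```

The first field is left-aligned and the second one right-aligned, which is exactly the kind of tabular layout formats were designed for.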

Saturday, 6 May 2017

PostgreSQL toolchain and quality

I found an interesting presentation about maintaining the PostgreSQL code base which, as is well known,
is rather large and has high code quality.
The presentation, whose slides are not particularly detailed (and therefore easy to read),
walks through and lists the main tools and terms used within the project, from the news and the mailing lists
to the way a patch is submitted and then reviewed and automatically tested.
I think it is always useful to look at which tools big projects use and which workflow, even at a high level,
they adopt.

Friday, 5 May 2017

perlcritic and allowing simple double sigils dereferencing

While not looking for it, I came across an interesting policy for Perl::Critic
named Perl::Critic::Policy::References::ProhibitComplexDoubleSigils. The idea is quite simple and I find myself agreeing with it:
when dereferencing a reference you can omit the braces; while this is often not a good habit (because a complex
reference can become quite difficult to read), it works well in simpler cases:



#!/usr/bin/env perl
use v5.20;

my %hash = (
    key1 => "value1"
    , key2 => "value2"
    , key3 => "value3"
    );

my $hash_ref = \%hash;

while ( my ( $k, $v ) = each %$hash_ref ){
    say "$k --> $v";
}

In the above code it is quite simple to understand that %$hash_ref refers to the whole hash, without the need for
surrounding braces.


I recognize the advantages of using the braces as:

  • a good habit;
  • a way to do a good refactoring in the case the reference points to a multidimensional structure.

Anyway, simple idioms should stay simple, and checkers should treat them accordingly.
If I run perlcritic against the above piece of code, the Perl::Critic::Policy::References::ProhibitDoubleSigils
will report


% perlcritic --single-policy=ProhibitDoubleSigils p.pl
Double-sigil dereference at line 12, column 30.  See page 228 of PBP.  (Severity: 2)

Now, using the above mentioned Perl::Critic::Policy::References::ProhibitComplexDoubleSigils the result becomes:


% perlcritic --single-policy=ProhibitComplexDoubleSigils p.pl
p.pl source OK

Ok, let's refactor the program and transform the plain hash in a multidimensional one:




#!/usr/bin/env perl
use v5.20;

my %hash = (
    key1 => "value1"
    , key2 => "value2"
    , key3 => "value3"
    );

my %outer_hash = ( config1 => \%hash );
my $hash_ref = \%outer_hash;

while ( my ( $k, $v ) = each %{ $hash_ref->{config1} } ){
    say "$k --> $v";
}

The above is the right way of dereferencing the reference to the inner hash: it uses curly braces
and neither Perl::Critic policy complains about it.


If, instead, we insist on using double sigils (that is, removing the arrow operator), the code becomes:



#!/usr/bin/env perl
use v5.20;

my %hash = (
    key1 => "value1"
    , key2 => "value2"
    , key3 => "value3"
    );

my %outer_hash = ( config1 => \%hash );
my $hash_ref = \%outer_hash;

while ( my ( $k, $v ) = each %{ $$hash_ref{config1} } ){
    say "$k --> $v";
}

and of course both policies complain about:


% perlcritic --single-policy=ProhibitDoubleSigils p.pl
Double-sigil dereference at line 13, column 33.  See page 228 of PBP.  (Severity: 2)

% perlcritic --single-policy=ProhibitComplexDoubleSigils p.pl
Complex double-sigil dereferences at line 13, column 33.  Found complex double-sigil dereference without curly braces.  (Severity: 2)


Something similar happens when you try to extract a single scalar value from the nested hash:



...
# line 20
my $k1 = $$hash_ref{ config1 }->{ key1 };
say "key1 is $k1";

# line 23
$k1 = $hash_ref->{ config1 }->{ key1 };
say "key1 is $k1";

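Both of the above instructions place the correct value into the $k1 variable, and both policies are expected to complain about the first one
(the one with the $$). Note that the second command below misspells the policy name (ProhibitDoubleComplexSigils instead of ProhibitComplexDoubleSigils), so its output shows perlcritic failing to match a policy rather than the actual report: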


% perlcritic --single-policy=ProhibitDoubleSigils p.pl
Double-sigil dereference at line 20, column 10.  See page 228 of PBP.  (Severity: 2)

% perlcritic --single-policy=ProhibitDoubleComplexSigils p.pl
The value for the global "single-policy" option ("ProhibitDoubleComplexSigils") did not match any policies (in combination with other policy restrictions).


Summing up, the idea is to allow double sigils only when you are dereferencing a single level, not for complex data structures.

Rotating a PDF file from the command line

Sometimes I end up with PDF documents rotated upside down because of a bad scan.
How to fix them? Nothing easier with pdftk:



% pdftk fileSbagliato.pdf cat 1-enddown output ok.pdf

The command above takes the badly rotated file, named fileSbagliato.pdf,
and rotates it by 180 degrees, saving the result into ok.pdf. The rotation is performed
by the cat (concatenate) operation, whose page range is made of three parts (see pdftk(1)):

  • the first page;
  • the last page, or the special string end to mean the last page without knowing its number;
  • a string that indicates the rotation (if needed), with values such as north, down, east, left, etc.

Therefore the string 1-enddown splits into 1, end and down, that is "from the first page to the last one, rotating
every page upside down (180 degrees)".
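
To make the splitting of the range string concrete, here is a minimal sketch in Perl. The split_range helper is hypothetical and the pattern is deliberately simplified: it only covers the first-last-rotation form described above, not the full range syntax documented in pdftk(1).

```perl
use strict;
use warnings;

# Simplified on purpose: real pdftk ranges allow more (single pages,
# even/odd qualifiers, and so on), which this sketch ignores.
my $ROTATIONS = qr/north|south|east|west|left|right|down/;

# Hypothetical helper: split a range such as "1-enddown" into its parts.
sub split_range {
    my ($range) = @_;
    my ( $first, $last, $rotation ) =
        $range =~ /^(\d+|end)-(\d+|end)($ROTATIONS)?$/
        or return;
    return ( $first, $last, $rotation // 'none' );
}

print join( ', ', split_range('1-enddown') ), "\n";
```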

More on the PostgreSQL and Uber "case"...

Does anyone remember the case of Uber, which after careful evaluation decided to move to MySQL and abandon PostgreSQL?
There was no shortage of comments on the matter, and the PostgreSQL community showed from the very beginning
a constructive approach, dissecting and explaining in detail
how the limitations presented by Uber were actually design choices and/or
obstacles that could be overcome with extensions (see for example the post by Simon Riggs).


Now Christophe Pettus has produced a very interesting and detailed presentation
that summarizes what Uber claimed and how this, although true in theory, was never properly documented and,
above all, never adequately evaluated.
I do not want to defend PostgreSQL at all costs; I think a company like Uber surely has
qualified people to make the best decision, even if, as explained in the presentation
mentioned above, there was often confusion in comparing different things within the same area (for example
replication, comparing logical replication with streaming replication).

Sunday, April 30, 2017

Baby-Perl

Here is another example, this time with Italian identifiers, of a childish use of Perl and, more generally, of a wrong approach to string formatting.

The idea is simple: lines of values must be printed into a fixed-format file. Note that a fixed format lends itself well to printf, and certainly to Perl formats, even if with very long fields the latter become awkward and frankly hard to read.

But ignorance and inexperience ruled my early years of programming, so, following the fantastic principle of reinventing the wheel, I implemented a sort of poor man's sprintf:


my $INIZIO_CAMPO = "|";
my $FINE_CAMPO   = "|";

sub formatta($$$){
    my ($stringa, $lunghezza, $fineRiga) = @_;

    $stringa =~ s/,/./g;

    if( length($stringa) > $lunghezza ){
        $stringa = substr($stringa, 0, $lunghezza);
    }


    if( $fineRiga != 1 ){
        return $INIZIO_CAMPO . $stringa . $FINE_CAMPO . $SEPARATORE_CAMPO;
    } else {
        return $INIZIO_CAMPO . $stringa . $FINE_CAMPO . $FINE_LINEA;
    }


}

Pretty elementary:
  1. the , character is replaced by ., which has little to do with the printing itself;
  2. if the input string $stringa is longer than the field (whose length is given in $lunghezza),
    the string is truncated (while there is no padding when the string is shorter);
  3. the string is concatenated with the field delimiters and, if needed, an end-of-line character
    is appended (otherwise a field separator).

How could a compact version be written? Or better, how would I approach the problem today?
Well, as already said, printf(3p) comes to the rescue:


sub formatta{
    my ( $stringa, $lunghezza, $fine_riga ) = @_;

    my $format_string = sprintf "%%s%%%ds%%s%%s", $lunghezza;
    return sprintf $format_string, $INIZIO_CAMPO, $stringa, $FINE_CAMPO, ( $fine_riga ? "\n" : "" );
}

The only evident complication is handling the length of the string, which must be specified within the format string.
Suppose the formatta function is called as follows:

say formatta 'foo', 5, 1;


What happens now is that:
  1. $format_string is set to %s%5s%s%s, that is sprintf "%%s%%%ds%%s%%s", $lunghezza; converts the string passed
    as its argument:
    • each %%s is converted into %s: the first is the begin-of-field delimiter, the second to last is the end-of-field delimiter, and the last is the optional
      end of line;
    • %%%ds is converted into % followed by the value of $lunghezza and s, that is %5s;
  2. the string built into $format_string is still a valid printf format, which is then filled with
    the appropriate values.

This version is surely more compact and more maintainable, provided you correctly read the printf format and the "double format"
built in the first step (e.g., %%s).

It is also possible to compact it even further, trying to avoid the double steps, or rather to limit them as much as possible:


sub formatta{
    my ( $stringa, $lunghezza, $fine_riga ) = @_;

    my $format_string = sprintf "%1s%%%ds%1s%%s", $INIZIO_CAMPO, $lunghezza, $FINE_CAMPO;
    return sprintf $format_string,  $stringa, ( $fine_riga ? "\n" : "" );
}

The concept stays the same, but the $format_string produced by the first sprintf invocation is something like |%5s|%s
and therefore requires only two parameters (the string and the end of line). A plain %s is used for the end of line because %1s would pad an empty string to a single space.

It is then possible to align the string to the left (instead of to the right) simply by using a negative width (e.g., %-5s).

So the lesson learned is: printf is very versatile, and in most cases it can solve a lot of formatting headaches!
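
As a further variation, not part of the original script, Perl's sprintf also accepts * as width and as precision, taking both values from the argument list. This avoids building the format in two steps and restores the truncation done by the very first version. A minimal sketch, assuming the same | delimiters (the English names are mine):

```perl
use strict;
use warnings;

my $FIELD_START = '|';    # assumed delimiters, as in the examples above
my $FIELD_END   = '|';

# Hypothetical single-pass variant of formatta.
sub format_field {
    my ( $string, $width, $end_of_line ) = @_;

    $string =~ tr/,/./;    # same comma-to-dot replacement as the original

    # %*.*s takes the width and the precision from the argument list:
    # the width pads short strings, the precision truncates long ones.
    return sprintf '%s%*.*s%s%s',
        $FIELD_START, $width, $width, $string, $FIELD_END,
        ( $end_of_line ? "\n" : '' );
}

print format_field( 'foo',       5, 1 );
print format_field( 'foobarbaz', 5, 1 );
```

Passing the same value as width and precision yields a field that is always exactly that wide, which is what a fixed-format file needs.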

Saturday, April 29, 2017

Another baby Perl program from a backup in the attic

This is a simple script that takes all arguments on the command line and returns a list of directories for each argument.
I don't remember at all what the purpose of this was, probably something to process a set of directories via a batch starting from
a set of filenames.
By the way, let's see how much of a baby I was at that time…


1.1 Baby Code

Here it is, please note how long the code is…



#!/usr/bin/perl

# This script manipulates all its arguments returning a list of string which
# are the directory of all files. For example if the argument is
# /home/luca/file.txt only /home/luca/ is returned (note the last backslash).


# return array
@ret;

foreach $name (@ARGV){


    if( -f $name ){

        # check if the name contains a /, if no the directory must be the current one (./)
        if(not  $name =~ /\// ){
            $ret[++$#ret] = "./";
        }
        else{

            @parts;
            $dir;

            # split into the section with /
            @parts = split("/",$name);


            if( defined(@parts) ){

                for($i=0; $i < $#parts; $i++){
                    $dir .= $parts[ $i ]."/";
                }
            }


            # put the result into the array
            $ret[++$#ret] = $dir;
        }
    }
    elsif( -d $name ){


        # this is a directory, simply check if ends with the slash
        if( not $name =~ /\/$/ ){
            $name .= "/";
        }

        $ret[++$#ret] = $name;
    }
}


# all done, print the array
foreach $name (@ret){
    print $name."\n";

}

1.2 The teen-ager code

What if I had to write a similar script today?
This is the very first code that comes to my mind:



#!/usr/bin/env perl

use v5.20;
use File::Spec;

my @dirs;
for ( @ARGV ){
    push @dirs, $_ if ( -d $_ );
    push @dirs, ( File::Spec->splitpath( $_ ) )[ 1 ] if ( -f $_ );
}

{
    local $" = "/\n";
    say "@dirs";
}

A lot less, uh?
It could be simpler, but so far I don't know any module to return the directory part of a path when a file is provided as argument, and
this is the reason I need to distinguish between the -f and -d cases.


The list separator $" is set specifically to add the trailing slash and to place every single directory on its own line. Of course this means, as $" suggests, that the array has to be printed within a double-quoted string. Keep in mind that the separator is placed only between elements, so the last directory gets no trailing slash, and that the directory part returned by splitpath already ends with a slash. Here the adoption of say
or print is totally equivalent.


Let's be honest here: local and the surrounding block could be avoided, since the script exits implicitly right after
these two instructions, and this means the code could be even simpler, or better, shorter, but let's keep some good habits!
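
For the directory part alone, the core module File::Basename provides dirname. It treats a directory argument like any other path (returning its parent), so the -d test remains. A minimal sketch, where dir_of is a hypothetical helper name and the normalization to a single trailing slash is my own choice:

```perl
use strict;
use warnings;
use File::Basename qw(dirname);

# Hypothetical helper: the directory of a path, with exactly one trailing slash.
sub dir_of {
    my ($path) = @_;

    # An existing directory is returned as is, anything else goes through dirname.
    my $dir = -d $path ? $path : dirname($path);

    $dir =~ s{/*\z}{/};    # normalize to a single trailing slash
    return $dir;
}

print dir_of($_), "\n" for @ARGV;
```

With this approach every line is printed with its own slash, so there is no need to play with the list separator.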

Friday, April 14, 2017

A baby Perl program from a backup in the attic!

…I was assigned one of my very first programs to develop: a way to ensure a user can run no more than a specific number of instances
of the same executable.


The real story was this: a company was using a Unix ncurses-based ERP, and users were forced to connect via remote shell and
launch it automatically. The problem was that several users were opening multiple terminals (and therefore ERP clients), limiting
the resources available to other users.


I wanted to demonstrate my skills by implementing the solution in a couple of hours, and instead of spending time
searching for a production-ready solution on the Internet, I decided to do it all on my own.


It was clear to me that I needed a kind of wrapper to control the execution of external programs. In fact, since users were
automatically switched from the remote shell into the ERP client, it sufficed to place the wrapper as the
user shell.


But how to implement it? At that time I knew Java, C++, C, Perl and some Bourne Shell. Java was too heavy, and
most notably required installing a JVM on the server (it was the time of the Red Hat 5 branch). C and C++ required too much time
to come up with a working solution, and Perl was the most promising, allowing me to do everything I was able to do with
the other languages while overcoming the limitations of the shell (e.g., arithmetic).


But at that time I was not a fluent Perl developer.


A few days ago I found that program in an old backup, and so I spent five minutes reading it again. Not surprisingly, I found
it awful compared to the code I write today, and moreover I see a lot of code ugliness due to baby steps. But hey, it was fun
to see how many things have changed in my coding style, idioms, adoption of the language and of its operators, and most notably
how I lean today toward compact code instead of long and bloated code. At that time I was convinced that the longer the code was, the
simpler it must be to read; today I believe the opposite (with very few exceptions, idioms I tend not to adopt until I
can remember them very well).


In this article I will dissect a few parts of that program to show you how I was writing code back in those days, and it was
a very long time ago (at least on the computer timeline): 15+ years ago.


2 Baby Steps

Well, as you can imagine, when I started doing a real job I was nothing more than a programmer-to-be, and my knowledge
of Perl was really tiny. I mean, I had read the Camel Book (of course!), and I was experimenting with Perl as much as I could,
but I was not an avid participant in the Perl community (or in other communities), and so I did not have many chances
to learn from other people's experience and code.


More than that, I had just finished my university degree, and so my mind was literally full of theories about writing
correct and beautiful code, but those were just theories without any implementation! And last but not least, I was convinced
I could do everything on my own (that is, who needs modules?).


Today I recognize the following as baby steps in the Perl world.

2.1 Method Prototypes

Yes, I was using prototypes everywhere.


Why? I suspect they were comfortable to me, since I was used to languages where each method has a prototype (C, C++, Java), and
so I was trying to use a kind of common approach to every language, applying prototypes to Perl even when I did not need
them at all!


Of course, with time and experience, I found that prototypes are usually not useful at all in my programs, and that they made refactoring
a lot harder.
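
To illustrate one way prototypes can bite, here is a minimal sketch (the sub names are hypothetical, not taken from the wrapper): a ($) prototype forces scalar context on its argument, so passing an array silently hands over the element count instead of the first element.

```perl
use strict;
use warnings;

# With a ($) prototype the argument is evaluated in scalar context.
sub first_with_prototype($) { return $_[0] }

# Without a prototype the array is flattened into @_ as usual.
sub first_plain { return $_[0] }

my @values = ( 'a', 'b', 'c' );

my $with_prototype = first_with_prototype(@values);    # number of elements
my $plain          = first_plain(@values);             # first element

print "with prototype: $with_prototype, plain: $plain\n";
```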

2.2 Methods instead of Operators

Perl allows operators to be used without parentheses, but that was something my eyes were not able to parse!
Again, coming from languages where methods and operators both need parentheses, I tried to bend Perl to my will
and use operators the very same way.

2.3 Distrusting the Truth

Perl has the great ability to evaluate a variable depending on the context, but that was far too complex for me.
So for instance, instead of simply testing whether a scalar was true, I was using the defined operator:


# if ( $scalar )
if ( defined $scalar ){ ... }

Worse: I was not using the ability of an array to be evaluated in scalar context in tests:


#if ( ! @array )
if ( not defined @array || $#array <= 0 ){ ... }
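
A minimal sketch of the context-based tests mentioned in the comments above; the variable names are illustrative. Note that, as far as I know, defined @array is no longer accepted by recent Perl versions (it became a fatal error in 5.22), which is one more reason to rely on the plain boolean test.

```perl
use strict;
use warnings;

my @empty  = ();
my @filled = ('one');

# An array in boolean (scalar) context evaluates to its element count.
my $empty_is_true  = @empty  ? 1 : 0;
my $filled_is_true = @filled ? 1 : 0;

# "Defined" and "true" are different questions: 0 is defined but false.
my $zero            = 0;
my $zero_is_defined = defined $zero ? 1 : 0;
my $zero_is_true    = $zero ? 1 : 0;

print "empty: $empty_is_true, filled: $filled_is_true\n";
print "zero defined: $zero_is_defined, zero true: $zero_is_true\n";
```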

2.4 Global Variables

my what?


I was not using scoped variables, and the reasons were:

  1. the real difference between my and local was not clear to me, and I have to confess I thought local should have been the
    name for scoped variables (maybe because of the way some shells declare local variables), but since local was usually a bad idea
    I decided not to use either;
  2. my scripts were quite small and self-contained, so there was no risk at all of variable clashes.
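
The difference that confused me back then can be sketched as follows (names are illustrative): local temporarily changes the value of a global variable for everything called from that point, while my creates a brand new variable visible only in the enclosing block.

```perl
use strict;
use warnings;

our $level = 'global';

# Always reads the package (global) variable.
sub show_level { return $level }

sub with_local {
    local $level = 'localized';    # temporarily replaces the global value
    return show_level();           # the callee sees the temporary value
}

sub with_my {
    my $level = 'lexical';         # a new variable, private to this block
    return show_level();           # the callee still sees the global one
}

print with_local(), "\n";
print with_my(),    "\n";
print show_level(), "\n";          # the global value is restored
```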

2.5 Backtick as a Rescue!

I knew a number of command line Unix tools pretty well (e.g., grep, awk, sed, find), and sometimes I needed
to do something in Perl that I only knew how to do with shell tools. So, how could I merge the two?


Easy, pal: backtick the Unix tools and manage the result in Perl!

2.6 No Modules, please

I was not using modules, both by design and by fear.


By design, I mean that I was writing scripts for machines loosely connected to the Internet (just think, I was working with servers behind an ISDN router!),
and therefore even downloading and installing a module could be quite a pain.


By fear, I mean that I was quite scared of importing other people's code. I was still learning how to use CPAN correctly, how to read
the documentation, and so on. And I did not want to install things in the wrong way for code that had to run. After all, I was half an hour
away from implementing some kind of module functionality on my own (ah, the ego…).

2.7 Regexp Sometimes…

I tried to use regexps, but I was too inexperienced. Therefore I usually replaced them with friendlier tools, like
split and the like.


I fear regexps no more, and even if I'm not a master at all, I find it interesting to place them in programs not only because
they allow me to write clear and compact code, but also because they are still somewhat cryptic stuff that other developers
have a hard time reading.


3 The Code

3.1 University Distortion

There are a couple of places where you can see evidence of the teachings at my university, most notably the
command line argument loop:


sub parseCommandLine()
{

    # The script must be at least two arguments.
    if ( $#ARGV < 1 or $#ARGV > 4 )
    {
        help();
        exit(0);
    }

    $POLICY_FILE = $DEFAULT_POLICY_FILE;

    foreach $argument (@ARGV)
    {
        if ( $argument =~  /(-program=)/ )
        {
            $argument =~ s/(-program=)//;
            $PROGRAM = $argument;
        }
        elsif ( $argument =~ /(-policy=)/ )
        {
            $argument =~ s/(-policy=)//;
            $POLICY_FILE = $argument;
        }
        elsif ( $argument =~ /(-username=)/ )
        {
            $argument =~ s/(-username=)//;
            $USERNAME = $argument;
        }
        elsif ( $argument =~ /(-message=)/ )
        {
            $argument =~ s/(-message=)//;
            $MESSAGE_FILE = $argument;
        }
    }

    # check main parameters
    if ( not defined $USERNAME or $USERNAME eq "")
    {
        warn("\nCannot find username !!\n\n");
        help();
        exit(1);
    }
    elsif ( not defined $PROGRAM or $PROGRAM eq "")
    {
        warn("\nCannot find the program name \n\n");
        help();
        exit(2);
    }
    elsif ( not defined $POLICY_FILE or $POLICY_FILE eq "")
    {
        warn("\nI need a policy file to run!\n\n");
        help();
        exit(3);
    }


}

As you can see, I was hardwiring the arguments directly into the program, and I was not using any getopt-like module.
That was cultural: university never told me there was a getopt way of getting parameters!


Please note also that I was checking the number of arguments as well as exiting from the program with a different exit code for each branch.
The style closely resembles the C style I was used to working with during my exams.
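
For comparison, this is roughly how the same parsing could look with the core module Getopt::Long. It is only a sketch: the option names mirror the ones above, while the parse_command_line name, the default policy path and the error messages are assumptions of mine.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical replacement for parseCommandLine: returns a hash reference.
sub parse_command_line {
    my (@args) = @_;
    my %opt = ( policy => '/etc/wrapper.policy' );    # assumed default path

    GetOptionsFromArray(
        \@args,
        'program=s'  => \$opt{program},
        'policy=s'   => \$opt{policy},
        'username=s' => \$opt{username},
        'message=s'  => \$opt{message},
    ) or die "Invalid command line\n";

    # Same mandatory parameters as in the original checks.
    for my $required (qw(username program policy)) {
        die "Missing --$required\n"
            unless defined $opt{$required} && length $opt{$required};
    }
    return \%opt;
}

# Sample invocation with made-up values.
my $options = parse_command_line( '--program=/usr/bin/erp', '--username=luca' );
print "$options->{username} runs $options->{program}\n";
```

Unknown options and missing values are reported by the module itself, so the long if/elsif chain disappears.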

3.2 Ready? Go!

How can I execute another program from within the Perl wrapper?


Well, the simplest way that I was aware of was to call exec, in other words, to work as in low-level C:


sub launch($)
{
    # take arguments
    my ($program) = @_;

    # execute the program
    sleep(2);
    exec($program);
}

3.3 How Many Instances can you Run?

How to define how many instances a user can run?


Here I decided to take inspiration from the sudo policy file, and so I invented my own policy file with a pretty
simple syntax:


username@executable=instances


where:

  • username was the Unix username;
  • executable was the name of the executable to run;
  • instances was the number of the maximum allowed instances of the executable for the specified username.

And to make it look more professional, I allowed the policy file to include a star in any field
in order to mean anything or any instance.


Of course, the above was complete crap. For instance, if you were able to create a link from the executable to another
with the right name you could change your allowance. But luckily, no user was able to do that, and to some
extent neither were the other sysadmins!


Having defined the policy file syntax, reading the file was as simple as:


sub getAllowedInstances($$$)
{
    # take params
    my ($policy_file, $username, $program) = @_;
    my $instances = -1;

    # try to open the policy file
    open (POLICY_FILE_HANDLER, "<".$policy_file) || die("\nCannot parse policy file <$policy_file> !\n");
    print "\nParsing configuration file $policy_file for program $program, username $username...";

    $instances = 0;           # by default do not allow a user to execute a program

    # get each line from the policy file, and then search for the username
    while ( $line = <POLICY_FILE_HANDLER>  )
    {

        print "\n\t Configuration line: $line ";


        # take only lines with the program name specified
        if( grep("$program", $line) ){

            @lineParts = split("@",$line);
            $configUsername = $lineParts[0];
            print "\ncontrollo se $username == $configUsername";

            if ( $username eq $configUsername )
            {
                # this is the right line
                # take the instances number
                @pieces = split("=" , $line);
                $instances = $pieces[1];
                # remove the '\n'
                chomp($instances);
                print "\n\t\t\tUser allowance: $instances";
                return $instances;
            }
            elsif ( $configUsername eq "*" )
            {
                # a valid entry for all users
                # take the instances number
                @pieces = split("=" , $line);
                $instances = $pieces[1];
                # remove the '\n'
                chomp($instances);
                print "\n\t\t\tGlobal allowance: $instances";
            }
        }
    }


    # no lines found, the user has no restrictions
    return $instances;
}

What an amazing piece of code, uh?


Instead of reading the policy file once (or seldom), I was reading it fully every time I needed to check a user; it was
very unoptimized code.
Moreover, I was reading and grepping it one line at a time, quite a keyboard effort.
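
Also note that grep("$program", $line) does not match anything: the string is evaluated as a plain expression, so every line passes as long as the program name is not empty. A minimal sketch of how I might approach it today, parsing the lines once into a nested hash; the sub names are hypothetical, and the lookup order (exact entries before stars) is an assumption, since the original only handled the star on the username:

```perl
use strict;
use warnings;

# Parse lines of the form username@executable=instances into
# $policy->{username}{executable} = instances. Other lines are skipped.
sub parse_policy {
    my (@lines) = @_;
    my %policy;
    for my $line (@lines) {
        next unless $line =~ /^\s*([^\@\s]+)\@([^=\s]+)=(\d+|\*)\s*$/;
        $policy{$1}{$2} = $3;
    }
    return \%policy;
}

# Exact entries win over stars; a '*' value is returned as is, and the
# caller has to treat it as "no limit".
sub allowed_instances {
    my ( $policy, $username, $program ) = @_;
    for my $user ( $username, '*' ) {
        for my $executable ( $program, '*' ) {
            return $policy->{$user}{$executable}
                if exists $policy->{$user}{$executable};
        }
    }
    return 0;    # like the original: no matching line, no execution
}

# Made-up policy lines standing in for the content of the policy file.
my $policy = parse_policy( 'luca@erp=2', '*@erp=1' );
print allowed_instances( $policy, 'luca', 'erp' ), "\n";
```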

3.4 How much are you Running?

How to get the number of instances a user was running?


Piece of cake: let's do it in Unix, backtick it in Perl and add one to the last index:



sub getRunningInstancesCount($$)
{
    # get parameters
    my ( $program, $username ) = @_;

    # get all processes for this program and the username
    @processes = `ps -u $username | grep $program | grep -v "grep"`;


    # get all lines count
    $running_instances = 0;
    if ( $#processes >= 0 )
    {
        $running_instances = $#processes + 1;
    }
    else
    {
        return 0;
    }

    return $running_instances;
}

Please note also the adoption of literals instead of variables: for example return 0; instead of return $running_instances where
the latter has been already initialized to zero.
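
The counting itself does not need the grep pipeline. A minimal sketch, assuming the command names arrive one per line (for instance from something like ps -u luca -o comm=, which is not run here); count_instances is a hypothetical name and the sample data is made up:

```perl
use strict;
use warnings;

# Count how many of the given command names are exactly $program.
sub count_instances {
    my ( $program, @commands ) = @_;
    chomp @commands;
    return scalar grep { $_ eq $program } @commands;
}

# Canned data standing in for the output of ps.
my @sample = ( "bash\n", "erp\n", "erp\n", "grep\n" );
print count_instances( 'erp', @sample ), "\n";
```

Comparing with eq instead of grepping also avoids counting programs whose name merely contains the one being searched.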

3.5 All Pieces Together

Now, putting the pieces together made the main loop as simple as:


$RUNNING_INSTANCES = getRunningInstancesCount($BASE_NAME, $USERNAME);
print "\nYou're running $RUNNING_INSTANCES $BASE_NAME\n";

# if the user can, launch the program
if ( $RUNNING_INSTANCES < $ALLOWED_INSTANCES )
{
    print "\nExecuting $PROGRAM...";
    launch($PROGRAM);
    print "\nBye!\n";
}
else
{
    print "\nYou can't run no more instances of $PROGRAM";
    print "\nBye\n";
    displayMessage();
}


4 Lesson Learned

There is no lesson to learn here, but a lot of lessons learned in between.


The only thing I could say is that you should never, never, throw away your code. Keep it, it is cheap, and someday you could
see how you progressed from a green programmer to the very guru developer you probably are today.

Table sizes versus plain-text dump sizes, a few insignificant experiments

I found myself dealing with an old and obsolete PostgreSQL 8.4 instance, a rather big one: the disk space of the tablespace
turned out to be about 13 GB! So I took the chance to do a small investigation on what was taking so much space,
focusing only on relations (tables):


SELECT c.oid,nspname AS table_schema
      , relname AS TABLE_NAME
      , c.reltuples AS row_estimate
      , to_char( pg_total_relation_size(c.oid)::real/(1024*1024), '99G999D99' ) AS MB
FROM pg_class c LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE
      relkind = 'r'
      AND
      nspname = 'public'
ORDER BY 4 DESC, 3 ASC;

Well, three tables in particular immediately stood out (fictional names):


  oid   | table_schema | table_name | row_estimate |     mb
---------+--------------+------------+--------------+------------
   63740 | public       | tab1      |  8.74153e+06 |   2.248,58
   66161 | public       | tab2      |   2.9728e+06 |   1.192,00
   65032 | public       | tab3      |  2.44735e+06 |   1.280,77

As you can see, each of these three tables easily exceeds 1 GB, with up to about 9 million tuples!
In short, nothing exceptional for PostgreSQL, but surely not routine either, and in any case it may suggest the need
for a redesign or for partitioning.
Anyway, how much would the same tables take on disk in text format?



% pg_dump -h localhost -U luca -t tab1 testdb > tab1.sql

Dumping, with a lot of patience, the three tables above gives:


% ls -lh tab?.sql
-rw-r--r-- 1 luca luca 579M 2017-04-12 11:32 tab1.sql
-rw-r--r-- 1 luca luca 494M 2017-04-12 11:37 tab2.sql
-rw-r--r-- 1 luca luca 571M 2017-04-12 11:36 tab3.sql

and therefore the space used inside PostgreSQL is from 2 to 4 times larger than the disk space of the textual data.
Clearly this is not an inefficiency of PostgreSQL, but rather a natural need of the database to keep data aligned,
to have data pages (8 kB) with enough room to allow for updates, etc. Note also that pg_total_relation_size includes indexes and TOAST data.
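
The "2 to 4 times" figure can be checked with a little arithmetic. A minimal sketch in Perl, using the sizes (in MB) reported by the query and by ls above; the hash and sub names are mine:

```perl
use strict;
use warnings;

# Sizes in MB as reported above: inside PostgreSQL and as a text dump.
my %table_mb = ( tab1 => 2248.58, tab2 => 1192.00, tab3 => 1280.77 );
my %dump_mb  = ( tab1 => 579,     tab2 => 494,     tab3 => 571 );

# How many times larger the table is compared to its text dump.
sub ratio {
    my ($table) = @_;
    return $table_mb{$table} / $dump_mb{$table};
}

printf "%s: %.1f times\n", $_, ratio($_) for sort keys %table_mb;
```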


Running a vacuum full on the tables above gives the following result:


> vacuum full verbose tab1;
...
0 index pages have been deleted, 0 are currently reusable.

which indicates that the database was already "good" and fairly compact. As expected, the numbers that PostgreSQL reports about tuples and data size
stayed unchanged.

Monday, April 10, 2017

On learning thru Open Source Software

I read an interesting blog post on adopting open source at university as a way of teaching computer science, and I also posted a comment
on that article.
Here I would like to extend my point of view on the subject.
Having worked in a university for a few years, having done computer science on both sides of the desk, and having quite good
experience in what became my day-to-day work, I believe I can express some opinions about it.

Adopting open source as a teaching methodology should just be.
Period!
Instead of asking students to produce the same crappy software over and over again (a database layer, a GUI layer, a music store,
a calculator), put them on a real piece of software that someone in the world could use and adopt.
Why?
  1. motivation: students will be happy to work on some real stuff with the main author(s) thanking them for their help, time,
    and work;
  2. learn something useful: real programs do things in the real world, and things in the real world must work, and work fast, and work
    accurately. That is:
    • work on real data: nobody will ever notice that your crappy homework does a full table scan each time you need to display one of your
      ten fake music titles, but a real program will freeze once you try to do a full table scan just to display a single detail
      out of a million records. This is something university cannot teach you, trust me!
    • deal with problems: university will teach you a lot of theory about using natural keys, algorithms to sort structures, avoiding
      data replication, and so on. These approaches will not always lead you to manageable software: learn to deal
      with surrogate keys, and to duplicate data when it makes sense (e.g., network latency, historical reasons and so on).
    • learn the tools: developers around the world need to coordinate. Learn to use bug reports, stay on mailing lists, IRC channels,
      and the like. Don't ask stackoverflow for someone to do your homework, learn how to find documentation and search for answers. Become
      acquainted with revision control, compilers, linkers, and different platforms.
    • document for someone else: it is easy for you to explain what you have done to your teacher, in particular if you did it in the very
      previous period of time (typically a semester). But can you document something so that another developer, even another student like
      you, can understand one year later why and how you did a task?
  3. do not start a project from scratch: typically the university cycle during a semester is something like design-implement-compile-run-explain-throw away,
    and then you start over again every time you get an assignment, homework, or project. This is wrong! Real life does not work like that:
    in real life you are often assigned to maintain or refactor an existing piece of code without being able to throw it away.
  4. learn idioms: developers around the globe are smarter than you. It is not that they are more intelligent, it is just that they are more experienced
    and know the subject better. Reading someone else's (smarter) code is a great way to improve your knowledge and become smarter too. Learning idioms and
    seeing patterns applied to real-world code is a great start to becoming a professional coder.
  5. fun: developers have their habits. They meet at so-called conferences, usually have beers while talking about code, travel
    around the world, and have a lot of fun. And even if you stay closed in your room, doing your homework, having a chat, a video call
    or an email in your inbox with a "thank you" or "good job!" is really fun.

There are other reasons, but the above are my main choices to push students to the open source world.
So far, it seems that you are going to have only advantages and no drawbacks, but that's not true.
Becoming an open source contributor, you are going to become smarter than your own university teacher, and this is a drawback as long as
the teacher signs your curriculum. It is something that is really hard for a teacher to keep in mind, but it is so true.
I always told my students, at the very beginning of each class, that by the end they would know the subject better than me, and the reason
for that is that "I'm not going to spend whole nights studying before the exam!".
So, if you are a teacher, simply accept that.
Accept that a student could come up with a better algorithm, or an idiom you don't know that works. Ask them for the details, learn from them.
Learning is not a one-way process, with a god-like teacher and an idiot-like student; learning is a cooperation where someone expert provides the spark to someone else.
Wouldn't it be nice to see both grow along the path?

There is also another drawback: open source is not something you can teach without knowledge. You have to know the tools: revision control, IDEs,
bug tracking, issue tracking, wikis, testing and the like.
Most teachers insist on teaching C pointer arithmetic instead of basic tools, and that's not good.
Allow me to provide my very own example: I spent five years in a computer science degree, and nobody told me about revision control. While
doing my master thesis, being afraid of losing some change or of mistakenly making a single-line change (that will not blow up your project, right?),
I wrote my very own backup shell script that ran every hour to keep a historical copy of my work.
Shame on me: in the very same time I could have learnt rcs or cvs (no, it was before git).

So my advice for students is: be a part of an open source community, you will surely learn something that will make the difference in
your real job.
And my advice for teachers is: accept smarter students and promote the adoption of an open source code base. There are a lot of "mini-hackers"
initiatives around the world (CPAN Pull Request, Hacktoberfest, etc.); pick one and let your students do the rest.
You'll be happier, your students will be happier, the open source community will be happier and, who knows, your employer could also
become a partner in an open source community.