Puzzled by a post on the Planet KDE about
GCompris and roman numbers,
and needing an easy way to explain to my six-years old son about roman
numbers, I thought it would be an easy task to make a simple program in
Perl to convert an arabic number into its roman notation.
Not so easy, pal!
Well, it's not a problem about Perl, of course, rather I found it
required a quite brainpower for me to write down rules to convert
numbers, and I did not search for the web for a
copy-and-paste alghoritm. Please note: if you need a rock-solid way to handle conversions, have a look at
CPAN that is full of modules for this particular aim.
Here I'm going to discuss the solution I found and how I implemented
it. It is not supposed to be the best one, or the faster one, it's just
my solution from scratch.
The program
I split the problem of converting an arabic number into a roman one
into three steps, with one dedicated subroutine for each step, so that
the main loop reduces to something like the following:
say "$_ = " . $roman_string->( $reassemble->( $disassemble->( $_ ) ) )
for ( 1..30 );
that produces the following output:
1 = I
2 = II
3 = III
4 = IV
5 = V
6 = VI
7 = VII
8 = VIII
9 = IX
10 = X
11 = XI
12 = XII
13 = XIII
14 = XIV
15 = XV
16 = XVI
17 = XVII
18 = XVIII
19 = XIX
20 = XX
21 = XXI
22 = XXII
23 = XXIII
24 = XXIV
25 = XXV
26 = XXVI
27 = XXVII
28 = XXVIII
29 = XXIX
30 = XXX
The steps must be read from the inner subroutine to the outer, of course, and therefore we have:
disassemble
that translates an arabic number into roman
basis, that is computes how many units, tens, hundreds and thousands
are required. In this phase there is no application of roman rules, so
numbers are decomposed into a linear string of letters. As an example the number 4
is translated into IIII
, which is of course a non-existent roman number.
reassemble
applies roman rules, in particular promoting numbers so that groups are translated, when needed, into higher order letters. For instance IIII
is promoted into two groups: I
and V
.
roman_string
compose the promoted groups into the final
string. The main difficulty of this part is to understand when a letter
has to be placed on the right (addition) or on the left (subtraction)
of another letter. For instance, having the groups I
and V
the function must understand if the output have to be VI
(6) or IV
(4).
To speed up the writing of the code, I placed main roman letters and their correspondance with arabic numbers into a
global hash:
my $roman = {
1 => 'I',
5 => 'V',
10 => 'X',
50 => 'L',
100 => 'C',
500 => 'D',
1000 => 'M',
};
Each method references
$roman
when needing to convert
from an arabic number to its roman letter. In order to allow method to
cooperate together, they accept and return an hash keyed by a roman
letter and the number of occurences such letter must appear in the final
string. The following is an example of the hash for a few numbers:
# 4 (IV)
{ 'I' => 1, 'V' => 1 }
# 19 (XIX)
{ 'I' => 1, 'X' => 2 }
# 5 (V)
{ 'V' => 1 }
# 17 (XVII)
{ 'X' => 1, 'V' => 1, 'I' => 2 }
The disassemble
function
The following is the code for the
disassemble
function, that accepts as only input the arabic number.
# Accepts the arabic number and provides an hash
# keyed by each letter, with the value of how many times
# such letter should be summed in order to obtain the
# starting number.
my $disassemble = sub{
my ( $number ) = @_;
my $items = {};
# sort the keys, that are arabic thresolds, from
# the greater to the smaller one
for my $current_value ( sort { $b <=> $a } keys $roman->%* ){
my $how_many = int( $number / $current_value );
next unless ( $how_many );
my $letter = $roman->%{ $current_value };
$items->{ $letter } = $how_many;
$number -= $current_value * $how_many;
}
return $items;
};
The first thing the method does it to create the hash
$items
that is what it will return to allow other methods to consume. Each key of the
$roman
hash is passed ordered by the bigger to the smaller (please note that
sort
has
$b
first!). In this way we can surely scompose the number from the thousands, hundreds, tens, and units in this exact order. The
$how_many
variable contains the integer part of each letter. For example the number
29
is processed as follows:
29 / 10
that drives $how_many
to be 2
and the remaining to be a 9
;
9 / 5
that makes $how_many
to be 1
and the remaining to be a 4
;
4 / 1
that makes $how_many
to be 4
and there's nothing more to do.
At each step the roman letter and the
$how_many
value is inserted into the
$items
has, that in the above ecample becomes:
# 29 (XIX)
{ 'X' => 2,
'V' => 1,
'I' => 4
}
The reassemble
method
The
reassemble
method takes as input the hash produced by
disassemble
and checks if any letter requires a promotion. Here it is the code:
# Accepts an hash with keys the letters and values the number
# of times each letter should appear.
# Traverse the hash from the smaller to the greater
# in order to "promote" smaller aggregates. For instance
# 'IIII' (4) is aggregated and therefore the hash is modified
# so there's only an 'I' and another 'V', in such case
# the quantity of the promoted letter is negative to indicate
# it has been promoted.
my $reassemble = sub{
my ( $items ) = @_;
my @sorted_thresolds = sort { $a <=> $b } keys $roman->%*;
for ( my $i = 0; $i < @sorted_thresolds; $i++ ){
my $current_value = $sorted_thresolds[ $i ];
my $key = $roman->%{ $current_value };
my $how_many = $items->%{ $key };
next unless ( $how_many );
my $greater_value = ( $i + 1 > @sorted_thresolds ? 1000 : $sorted_thresolds[ $i + 1 ] );
my $greater_key = $roman->%{ $greater_value };
my $need_to_promote = $how_many == 4
|| ( $greater_value / $current_value == $how_many );
if ( $need_to_promote ){
$items->{ $greater_key }++;
$how_many = $greater_value - $how_many * $current_value;
$items->{ $key } = $how_many * -1;
}
}
return $items;
};
The
promotion must be done from the smaller letter to the greater one, so this time the letters are walked in ascending order (i.e.,
sort
has
$a
first!). Since to promote a letter I need to access the following one, I need a C-style
for
loop.
A letter requires to be promoted if its quantity is
4
or /it is 2 and the right bigger value is exactly the double of the current one~, that is while
( $greater_value / $current_value == $how_many )
. This makes, for instance
IIII
to be promoted (the quantity is 4), and
VV
to be promoted into
X
(because the quantity is 2 and the
X
is exactly the double of
V
).
The promotion manipulates the hash increasing by one the right bigger
letter and leaving a single current letter. In order to flag the
promoted letter, I decided to use a negative quantity (where the
absolute value is the exact one).
So for instance, the 29 hash of the previous paragraph is passed as follows:
# input to the method
{ 'X' => 2,
'V' => 1,
'I' => 4
}
# first for step (I)
{ 'X' => 2,
'V' => 2,
'I' => -1 # promoted, keep 1 and increase 'V'
}
# second step (V)
{ 'X' => 3,
'V' => 0, # promoted, increase X by one
'I' => -1
}
At the end of method we know the final string will be made by three
X
and one
I
, the point now is to understand how to render them in the correct order. This is the aim of the
roman_string
method.
The roman_string
method
The method accepts the normalized hash (i.e., groups are already
formed) and compose the final string placing letter on the left or the
right of each other depending on their quantity. The following is the
code of the method:
# Do the hard work of composing
# each group of letters in order to compose the roman string.
my $roman_string = sub {
my ( $items ) = @_;
my @chars;
for my $current_value ( sort { $b <=> $a } keys $roman->%* ){
my $letter = $roman->%{ $current_value };
my $how_many = $items->%{ $letter };
next unless ( $how_many );
if ( $how_many > 0 ){
push @chars, $letter for ( 1 .. $how_many );
}
else{
# this is promoted, so it has to be inserted as last-to-last
# in the previous chain
# example: @chars( X, X ) and here I've 'I' to make XIX (19)
push @chars, ( $letter, pop @chars );
}
}
return join "", @chars;
};
In order to be able to manipulate easily the final string, moving
letters from left to right and vice-versa, I decided to place each
single letter into the
@chars
array, that is then
join
-ed into a single string.
Let's suppose we need just to add letters: in this case we need to
write letters from the greater to the smaller from left to right, and
this is the order I traverse the letters of
$roman
(again, note that
sort
has
$b
first!). If the quantity of the letter is positive
the letter has not been promoted and therefore it will not be placed to the left of another letter, so just insert into
@chars
the
$letter
for the
$how_many
quantity. On the other hand, if
$how_many
is negative, the letter has been promoted and therefore have to be
printed on the left of the last printed letter. This is as easy as
doing:
push @chars, ( $letter, pop @chars );
that inserts into
@chars
the
$letter
and the previous last character that has been removed via
pop
.
With regards to the previous example of
29 we have that:
# method input
{ 'X' => 3,
'I' => -1
}
# first step: prints X
# with quantity 3 (not promoted)
@chars = ( 'X', 'X', 'X' );
# second step: prints I
# that has been promoted
# and must be inserted ascending
# as last-to-last
@chars = ( 'X', 'X' ,
( 'I', # $letter
'X' # pop @chars
) );
Conclusions
Well, it has been much code that I expected to write. Using an object
notation, instead of plain hashes, could surely make the program more
robust. I'm pretty sure there's a way to shrink the code down and to
avoid that ugly C-style
for
loop, as well as the promotion
part could be simplified keeping in mind that it often reduces to -1 for
the current letter and +1 for the greater one. Anyway, it does what I
need and seems correct!