Jim Hague’s code, unraveling the mystery

Renato Leon
Mar 9, 2021


There comes a time in the life of a C programmer when they have to face macros. Understanding them is useful, so today I’m going to walk through a program written by Jim Hague that uses macros to disguise an ordinary C program.

For starters, let’s take a look at the code (hague.c).

#define DIT (
#define DAH )
#define __DAH ++
#define DITDAH *
#define DAHDIT for
#define DIT_DAH malloc
#define DAH_DIT gets
#define _DAHDIT char
_DAHDIT _DAH_[]="ETIANMSURWDKGOHVFaLaPJBXCYZQb54a3d2f16g7c8a90l?e'b.s;i,d:"
;main DIT DAH{_DAHDIT
DITDAH _DIT,DITDAH DAH_,DITDAH DIT_,
DITDAH _DIT_,DITDAH DIT_DAH DIT
DAH,DITDAH DAH_DIT DIT DAH;DAHDIT
DIT _DIT=DIT_DAH DIT 81 DAH,DIT_=_DIT
__DAH;_DIT==DAH_DIT DIT _DIT DAH;__DIT
DIT'\n'DAH DAH DAHDIT DIT DAH_=_DIT;DITDAH
DAH_;__DIT DIT DITDAH
_DIT_?_DAH DIT DITDAH DIT_ DAH:'?'DAH,__DIT
DIT' 'DAH,DAH_ __DAH DAH DAHDIT DIT
DITDAH DIT_=2,_DIT_=_DAH_; DITDAH _DIT_&&DIT
DITDAH _DIT_!=DIT DITDAH DAH_>='a'? DITDAH
DAH_&223:DITDAH DAH_ DAH DAH; DIT
DITDAH DIT_ DAH __DAH,_DIT_ __DAH DAH
DITDAH DIT_+= DIT DITDAH _DIT_>='a'? DITDAH _DIT_-'a':0
DAH;}_DAH DIT DIT_ DAH{ __DIT DIT
DIT_>3?_DAH DIT DIT_>>1 DAH:'\0'DAH;return
DIT_&1?'-':'.';}__DIT DIT DIT_ DAH _DAHDIT
DIT_;{DIT void DAH write DIT 1,&DIT_,1 DAH;}

Wait, what? What is this? Well, this is the program we’ll unravel, so let’s go step by step. The program defines macros at the top that stand in for punctuation, operators, function names, and keywords such as “char”. The preprocessor replaces each macro name with the tokens given in its definition, so we can generate the preprocessed output of this code to see what it looks like and, hopefully, get a better understanding of it.
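To see the token replacement on a smaller scale, here is a minimal sketch that reuses four of the macros above. The function first_char is hypothetical (my own, not part of Hague’s program); it only shows how a line written with the macros turns back into ordinary C.

```c
/* Same definitions as in hague.c. */
#define DIT (
#define DAH )
#define DITDAH *
#define _DAHDIT char

/* Hypothetical helper written with the macros. After preprocessing,
   this line reads: char first_char ( char * s ) { return * s; } */
_DAHDIT first_char DIT _DAHDIT DITDAH s DAH { return DITDAH s; }
```

Calling first_char("morse") returns the character 'm', exactly as the plain C version would.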

gcc -E hague.c -o hague.i

That gives us the preprocessed code. Here it is after some reformatting to make it easier to read:

# 1 "hague.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "hague.c"
# 9 "hague.c"
char _DAH_[]="ETIANMSURWDKGOHVFaLaPJBXCYZQb54a3d2f16g7c8a90l?e'b.s;i,d:";
main()
{
    char *_DIT,*DAH_,*DIT_,*_DIT_,*malloc(),*gets();
    for (
        _DIT = malloc(81), DIT_ = _DIT++;
        _DIT == gets(_DIT);
        __DIT('\n')
    )
        for (
            DAH_ = _DIT;
            *DAH_;
            __DIT(*_DIT_? _DAH(*DIT_ ): '?'), __DIT(' '), DAH_++
        )
            for (
                *DIT_ = 2,_DIT_ = _DAH_;
                *_DIT_&&(*_DIT_!=(*DAH_>='a'? *DAH_&223: *DAH_ ));
                (* DIT_ ) ++,_DIT_++
            )
                *DIT_+= (*_DIT_ >= 'a' ? * _DIT_-'a': 0);
}
_DAH ( DIT_ )
{
    __DIT (DIT_>3? _DAH( DIT_>>1 ): '\0');
    return DIT_&1? '-': '.';
}
__DIT ( DIT_ ) char DIT_;{
    (void) write(1, &DIT_, 1 );
}

Examining the code, we find a global char array holding a string. What could it be? Looking at the whole program, we also find three functions: main, _DAH and __DIT. What do they do? Let’s examine them:

  • Starting with _DAH: it is recursive, and judging by the dashes and dots in the return statement, we can assume it turns its argument (the counter that main computes for each character) into its Morse representation.
  • The next function, __DIT, is easier to guess: it writes a single char to standard output.
  • Finally, main has three nested loops. The outer loop allocates an 81-byte buffer once, in its initializer, then reads one line of input per iteration and prints a newline after each line. The middle loop walks through the line character by character; for each one it prints its Morse code, or ‘?’ if the character is not recognized, followed by a space. The innermost loop searches the string in _DAH_ for the current character (converted to upper case), counting positions as it goes.
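As a hedged illustration of the recursion in _DAH, here is a sketch with readable names. The function name morse_from_code and the output buffer are my own choices, not part of Hague’s program, which prints each symbol with write instead of storing it. The sketch reads its argument as binary: after the leading 1, each 0 bit is a dot and each 1 bit is a dash. It assumes code is at least 2, which is where main starts counting.

```c
#include <assert.h>
#include <stddef.h>

/* Append the Morse symbols for `code` to `out`, starting at offset `len`,
   and return the new length. The caller must provide a buffer large
   enough for the symbols plus a terminating '\0'. */
size_t morse_from_code(unsigned code, char *out, size_t len)
{
    assert(out != NULL);
    if (code > 3)                           /* more than one symbol left */
        len = morse_from_code(code >> 1, out, len);
    out[len++] = (code & 1) ? '-' : '.';    /* lowest bit is the last symbol */
    out[len] = '\0';
    return len;
}
```

For example, the value 5 is 101 in binary, so after the leading 1 the bits 0 and 1 give “.-”.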

Now that we have skimmed the code, let’s test our guesses by compiling and running the program. To compile, we run:

gcc hague.c -o h

That produces an executable named “h”. Running it shows an empty screen waiting for input, so we type something to test it. Here we type “abc” and check whether the program gives back the corresponding Morse code:

$ ./h
abc
.- -... -.-.

Wow! It looks like it does what we assumed. To be sure, we can do a quick search and check that the output matches the Morse code for each character.

Morse code equivalences

It matches! So we were right about the code, but what was the string at the start about? Digging deeper into Morse code, we find something known as a dichotomic search, which is represented by a tree that looks like this:

Graphical representation of the dichotomic search table. The graph branches left for each dot and right for each dash until the character representation is exhausted. (image taken from Wikipedia)

Paying close attention to the nodes, if we read the values level by level, top to bottom and left to right, and put them together, we end up with:

ETIANMSURWDKGOHVFÜLÄPJBXCYZQÖCH … You get the point.

Comparing this with the contents of the array, we can confirm that it is the same sequence, except that the array uses lowercase letters where the tree has special characters. This confirms that the program turns a string into its Morse representation and uses this tree ordering as its lookup table.
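The lookup done by the innermost loop of main can be sketched as follows. The names TABLE and morse_code_for are mine, and returning 0 for an unknown character is my own convention (the original prints ‘?’ instead). As in the preprocessed code, the counter starts at 2, and a lowercase table entry adds its distance from 'a' to the counter, so 'a' takes up one position, 'b' takes up two, and so on.

```c
#include <stddef.h>

/* The same string that hague.c stores in _DAH_. */
static const char TABLE[] =
    "ETIANMSURWDKGOHVFaLaPJBXCYZQb54a3d2f16g7c8a90l?e'b.s;i,d:";

/* Return the tree position of `ch`, or 0 if the table has no entry. */
unsigned morse_code_for(char ch)
{
    unsigned code = 2;                                  /* counter starts at 2 */
    char wanted = (ch >= 'a') ? (char)(ch & 223) : ch;  /* fold to upper case */
    for (const char *p = TABLE; *p; ++p, ++code) {
        if (*p == wanted)
            return code;
        if (*p >= 'a')                                  /* lowercase = skipped slots */
            code += (unsigned)(*p - 'a');
    }
    return 0;                                           /* not in the table */
}
```

With this sketch, 'E' maps to 2, 'A' to 5, and 'B' to 24. Written in binary, those are 10, 101 and 11000, and the bits after the leading 1 spell out “.”, “.-” and “-...”.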

This was a simple way of de-obfuscating Jim Hague’s code. I hope it helped you in some way. Take care, and see you in the next post!
