DIKU - Hello world in Java bytecode
The next milestone for my language is compiling it to JVM bytecode.
To do this, I’ve been trawling through Java class file documentation, looking to get a better handle on the file layout.
The following is presented as a reference for anyone attempting to do the same.
Oracle’s official documentation of the class file format is Chapter 4 of the Java Virtual Machine Specification.
Below, I have annotated the class file generated by javac HelloWorld.java:
Magic Number cafe babe
Minor Version Number 0000
Major Version Number 0034 // Java SE 8
Constant Pool Count 001d // Number of entries + 1: 0x1d = 29, so entries 1-28 follow
1 Method Reference 0a // java/lang/Object."<init>":()V
# 6 00 06
# 15 00 0f
2 Field Reference 09 // java/lang/System.out:Ljava/io/PrintStream;
# 16 0010
# 17 0011
3 String Reference 08 // Hello, world!\n
# 18 00 12
4 Method Reference 0a // java/io/PrintStream.print:(Ljava/lang/String;)V
# 19 0013
# 20 0014
5 Class Reference 07 // HelloWorld
# 21 00 15
6 Class Reference 07 // java/lang/Object
# 22 0016
7 String Reference 01 // Tag 01 = UTF-8 data (CONSTANT_Utf8); tag 08 entries point at these
Length 00 06
String UTF8 3c 696e 6974 3e // <init>
8 String Reference 01
Length 0003
String UTF8 2829 56 // ()V
9 String Reference 01
Length 0004
String UTF8 436f 6465 // Code
10 String Reference 01
Length 00 0f
String UTF8 4c 696e 654e 756d 6265 7254 6162 6c65 // LineNumberTable
11 String Reference 01
Length 00 04
String UTF8 6d 6169 6e // main
12 String Reference 01
Length 0016
String UTF8 285b 4c6a 6176 612f 6c61 6e67 2f53 7472 696e 673b 2956 // ([Ljava/lang/String;)V
13 String Reference 01
Length 00 0a
String UTF8 53 6f75 7263 6546 696c 65 // SourceFile
14 String Reference 01
Length 000f
String UTF8 4865 6c6c 6f57 6f72 6c64 2e6a 6176 61 // HelloWorld.java
15 Name & Type 0c // "<init>":()V
# 7 0007
# 8 0008
16 Class Reference 07 // java/lang/System
# 23 00 17
17 Name & Type 0c // out:Ljava/io/PrintStream;
# 24 0018
# 25 0019
18 String Reference 01
Length 00 0e
String UTF8 48 656c 6c6f 2c20 776f 726c 6421 0a // Hello, world!\n <- This string is printed
19 Class Reference 07 // java/io/PrintStream
# 26 001a
20 Name & Type 0c // print:(Ljava/lang/String;)V
# 27 00 1b
# 28 00 1c
21 String Reference 01
Length 000a
String UTF8 4865 6c6c 6f57 6f72 6c64 // HelloWorld
22 String Reference 01
Length 00 10
String UTF8 6a 6176 612f 6c61 6e67 2f4f 626a 6563 74 // java/lang/Object
23 String Reference 01
Length 0010
String UTF8 6a61 7661 2f6c 616e 672f 5379 7374 656d // java/lang/System
24 String Reference 01
Length 00 03
String UTF8 6f 7574 // out
25 String Reference 01
Length 00 15
String UTF8 4c 6a61 7661 2f69 6f2f 5072 696e 7453 7472 6561 6d3b // Ljava/io/PrintStream;
26 String Reference 01
Length 00 13
String UTF8 6a 6176 612f 696f 2f50 7269 6e74 5374 7265 616d // java/io/PrintStream
27 String Reference 01
Length 00 05
String UTF8 70 7269 6e74 // print
28 String Reference 01
Length 00 15
String UTF8 28 4c6a 6176 612f 6c61 6e67 2f53 7472 696e 673b 2956 // (Ljava/lang/String;)V
Access Flags 0021 // Public + superclass
This Class 0005 // Reference to Class name in Constant pool
Super Class 0006 // Reference to the superclass in the constant pool (java/lang/Object); 0 only for Object itself
Interface Count 0000 // How many direct superinterfaces of this class?
Field Count 0000 // How many fields?
Method Count 0002 // How many method_info structures in the methods table
METHOD 1 // Constructor
Access Flags 0001 // Public
Name Index 0007 // <init>
Descriptor Index 0008 // ()V
Attributes Count 0001
ATTRIBUTE STRUCTURE
Attribute Name Index 0009 // Code
Attribute Length 0000 001d
Max Stack 0001
Max Locals 0001
Code Length 0000 0005
aload_0 2a
invokespecial b7
# 1 0001
return b1
Exception Table Length 00 00
Attributes Count 00 01 // This code attribute has one attribute
ATTRIBUTE STRUCTURE
Attribute Name Index 00 0a // LineNumberTable
Attribute Length 0000 0006
Line Number Table Len. 0001 // One entry
Start PC 00 00
Line Number 00 01
METHOD 2 // main
Access Flags 00 09 // Public Static
Name Index 00 0b // main
Descriptor Index 00 0c // ([Ljava/lang/String;)V
Attributes Count 00 01
ATTRIBUTE STRUCTURE
Attribute Name Index 00 09 // Code
Attribute Length 0000 0025
Max Stack 00 02
Max Locals 00 01
Code Length 0000 0009
getstatic b2
# 2 0002
ldc 12
# 3 03
invokevirtual b6
# 4 0004
return b1
Exception Table Length 0000
Attributes Count 0001 // This code attribute has one attribute
ATTRIBUTE STRUCTURE
Attribute Name Index 000a // LineNumberTable
Attribute Length 0000 000a
Line Number Table Len. 0002 // Two entries
Start PC 0000
Line Number 0003
Start PC 0008
Line Number 0004
Attributes Count 0001 // Attributes of the class itself
ATTRIBUTE STRUCTURE
Attribute Name Index 000d // SourceFile
Attribute Length 0000 0002
Sourcefile Index 000e // HelloWorld.java
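An annotation like the one above can be checked mechanically by walking the header and constant pool in code. The following is a minimal Python sketch, not part of my compiler: the function name parse_header_and_pool and the three-entry SAMPLE bytes are made up for illustration, and only the constant tags that appear in this dump are handled.

```python
import struct

# Tag byte -> (label, struct format of the fixed-size payload).
# Only the tags that occur in the HelloWorld dump are covered here.
FIXED_TAGS = {
    0x07: ("Class", ">H"),
    0x08: ("String", ">H"),
    0x09: ("Fieldref", ">HH"),
    0x0A: ("Methodref", ">HH"),
    0x0C: ("NameAndType", ">HH"),
}

def parse_header_and_pool(data):
    """Return (major, minor, pool, offset just past the constant pool)."""
    magic, minor, major, count = struct.unpack_from(">IHHH", data, 0)
    if magic != 0xCAFEBABE:
        raise ValueError("not a class file")
    offset = 10
    pool = {}
    # The stored count is one more than the number of entries,
    # so valid indices run from 1 to count - 1.
    for index in range(1, count):
        tag = data[offset]
        offset += 1
        if tag == 0x01:  # Utf8: 2-byte length followed by the bytes
            (length,) = struct.unpack_from(">H", data, offset)
            offset += 2
            # Class files use "modified UTF-8"; plain UTF-8 decoding
            # is only an approximation that works for ASCII text.
            pool[index] = ("Utf8", data[offset:offset + length].decode("utf-8"))
            offset += length
        elif tag in FIXED_TAGS:
            label, fmt = FIXED_TAGS[tag]
            pool[index] = (label,) + struct.unpack_from(fmt, data, offset)
            offset += struct.calcsize(fmt)
        else:
            raise ValueError(f"unhandled constant tag 0x{tag:02x}")
    return major, minor, pool, offset

# Hypothetical three-entry sample, assembled by hand in the same layout.
SAMPLE = bytes.fromhex(
    "cafebabe" "0000" "0034" "0004"
    "01" "0004" "6d61696e"                  # 1: Utf8 "main"
    "01" "000a" "48656c6c6f576f726c64"      # 2: Utf8 "HelloWorld"
    "07" "0002"                             # 3: Class -> #2
)

major, minor, pool, end = parse_header_and_pool(SAMPLE)
for index, entry in pool.items():
    print(index, entry)
```

Pointing the same function at the bytes of a real HelloWorld.class should list the 28 entries annotated above, provided the file uses no other constant tags.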
Running through the above, it becomes clear (if there was ever any doubt) that the Java compiler and the language grammar are much more complex than those of this project.
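Method names, strings and other constants are stored in the constant pool, where entries reference one another by index to build up full names. As a method runs, its instructions push these values onto an operand stack and pop them off again.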
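To make the stack behaviour concrete, here is a small Python sketch that steps through the nine bytes of main's code from the dump. It is a toy, not a JVM: the CONSTANTS table is a hand-resolved stand-in for constant pool entries 2, 3 and 4, the name run_main is invented, and invokevirtual is hard-wired to pop one argument plus a receiver because the descriptor here is (Ljava/lang/String;)V.

```python
# The Code bytes of main(), copied from the dump.
CODE = bytes.fromhex("b20002" "1203" "b60004" "b1")

# Hand-resolved view of the constant pool entries that main() touches.
CONSTANTS = {
    2: "java/lang/System.out",
    3: "Hello, world!\n",
    4: "java/io/PrintStream.print",
}

def run_main(code, constants):
    """Simulate the operand stack; return (calls, final stack, max depth)."""
    stack, calls = [], []
    max_depth = 0
    pc = 0
    while pc < len(code):
        opcode = code[pc]
        if opcode == 0xB2:    # getstatic: push the field named by a 2-byte index
            index = int.from_bytes(code[pc + 1:pc + 3], "big")
            stack.append(constants[index])
            pc += 3
        elif opcode == 0x12:  # ldc: push the constant named by a 1-byte index
            stack.append(constants[code[pc + 1]])
            pc += 2
        elif opcode == 0xB6:  # invokevirtual: pop the argument, then the receiver
            index = int.from_bytes(code[pc + 1:pc + 3], "big")
            argument = stack.pop()
            receiver = stack.pop()
            calls.append((constants[index], receiver, argument))
            pc += 3
        elif opcode == 0xB1:  # return: the method is finished
            return calls, stack, max_depth
        else:
            raise ValueError(f"unhandled opcode 0x{opcode:02x}")
        max_depth = max(max_depth, len(stack))
    raise ValueError("code ended without a return instruction")

calls, stack, max_depth = run_main(CODE, CONSTANTS)
print(calls, stack, max_depth)
```

The deepest the stack gets in this trace is two values (System.out and the string), which lines up with the Max Stack 0002 field of the method's Code attribute.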
Moving forward, we will look to have our compiler generate the file above programmatically.
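As a rough idea of what that generation step could look like, the Python sketch below assembles the same fields in the same order as the annotated dump. It is an illustration under assumptions rather than my compiler's code: the helper names (u1, u2, u4, utf8, attribute, code_attribute, method, build_hello_world) are invented, constant pool indices are hard-coded from the dump instead of being allocated, and the result has not been run through a JVM verifier here.

```python
import struct

def u1(value): return struct.pack(">B", value)
def u2(value): return struct.pack(">H", value)
def u4(value): return struct.pack(">I", value)

def utf8(text):
    """Constant pool entry with tag 01: length-prefixed string bytes."""
    raw = text.encode("utf-8")  # ASCII only here, so modified UTF-8 is identical
    return u1(1) + u2(len(raw)) + raw

def attribute(name_index, body):
    """Generic attribute: name index, 4-byte length, then the body."""
    return u2(name_index) + u4(len(body)) + body

def code_attribute(max_stack, max_locals, code, lines):
    table = u2(len(lines)) + b"".join(u2(pc) + u2(line) for pc, line in lines)
    body = (u2(max_stack) + u2(max_locals) + u4(len(code)) + code
            + u2(0)                          # exception table length
            + u2(1) + attribute(10, table))  # one nested LineNumberTable (#10)
    return attribute(9, body)                # #9 = "Code"

def method(flags, name_index, descriptor_index, code_attr):
    return u2(flags) + u2(name_index) + u2(descriptor_index) + u2(1) + code_attr

def build_hello_world():
    pool = [
        u1(10) + u2(6) + u2(15),     # 1  Methodref Object.<init>
        u1(9) + u2(16) + u2(17),     # 2  Fieldref System.out
        u1(8) + u2(18),              # 3  String -> #18
        u1(10) + u2(19) + u2(20),    # 4  Methodref PrintStream.print
        u1(7) + u2(21),              # 5  Class HelloWorld
        u1(7) + u2(22),              # 6  Class java/lang/Object
        utf8("<init>"), utf8("()V"), utf8("Code"), utf8("LineNumberTable"),  # 7-10
        utf8("main"), utf8("([Ljava/lang/String;)V"),                        # 11-12
        utf8("SourceFile"), utf8("HelloWorld.java"),                         # 13-14
        u1(12) + u2(7) + u2(8),      # 15 NameAndType <init>:()V
        u1(7) + u2(23),              # 16 Class java/lang/System
        u1(12) + u2(24) + u2(25),    # 17 NameAndType out
        utf8("Hello, world!\n"),     # 18
        u1(7) + u2(26),              # 19 Class java/io/PrintStream
        u1(12) + u2(27) + u2(28),    # 20 NameAndType print
        utf8("HelloWorld"), utf8("java/lang/Object"), utf8("java/lang/System"),  # 21-23
        utf8("out"), utf8("Ljava/io/PrintStream;"), utf8("java/io/PrintStream"), # 24-26
        utf8("print"), utf8("(Ljava/lang/String;)V"),                            # 27-28
    ]
    init = method(0x0001, 7, 8,
                  code_attribute(1, 1, bytes.fromhex("2ab70001b1"), [(0, 1)]))
    main = method(0x0009, 11, 12,
                  code_attribute(2, 1, bytes.fromhex("b200021203b60004b1"),
                                 [(0, 3), (8, 4)]))
    return (u4(0xCAFEBABE) + u2(0) + u2(52)        # magic, minor, major
            + u2(len(pool) + 1) + b"".join(pool)   # count is entries + 1
            + u2(0x0021) + u2(5) + u2(6)           # flags, this class, super class
            + u2(0) + u2(0)                        # no interfaces, no fields
            + u2(2) + init + main                  # two methods
            + u2(1) + attribute(13, u2(14)))       # SourceFile -> #14

if __name__ == "__main__":
    with open("HelloWorld.class", "wb") as handle:
        handle.write(build_hello_world())
```

Because attribute() computes each length from its body, the Attribute Length and Code Length fields never need to be counted by hand, which is the part of the layout that is easiest to get wrong.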
I particularly like the magic number at the start of these class files, CAFE BABE:
“We used to go to lunch at a place called St Michael’s Alley. According to local legend, in the deep dark past, the Grateful Dead used to perform there before they made it big. It was a pretty funky place that was definitely a Grateful Dead Kinda Place. When Jerry died, they even put up a little Buddhist-esque shrine. When we used to go there, we referred to the place as Cafe Dead. Somewhere along the line it was noticed that this was a HEX number. I was re-vamping some file format code and needed a couple of magic numbers: one for the persistent object file, and one for classes. I used CAFEDEAD for the object file format, and in grepping for 4 character hex words that fit after “CAFE” (it seemed to be a good theme) I hit on BABE and decided to use it. At that time, it didn’t seem terribly important or destined to go anywhere but the trash-can of history. So CAFEBABE became the class file format, and CAFEDEAD was the persistent object format. But the persistent object facility went away, and along with it went the use of CAFEDEAD - it was eventually replaced by RMI.”
James Gosling