The next milestone for my language is being able to compile it to target the JVM.

To do this, I’ve been trawling through Java class file documentation, looking to get a better handle on the file layout.

The following is presented as a reference for anyone attempting to do the same.

Oracle’s official documentation of the class file format is Chapter 4 of the Java Virtual Machine Specification, “The class File Format”.



public class HelloWorld {
    public static void main(String[] args) {
        System.out.print("Hello, world!\n");
    }
}

Below, I have annotated the class file that javac HelloWorld.java generates for this program:

  Magic Number            ca fe ba be
  Minor Version Number    00 00
  Major Version Number    00 34   // 52 = Java SE 8
  Constant Pool Count     00 1d   // 29 = the 28 entries below + 1 (indices start at 1)
1 Method Reference        0a      // java/lang/Object."<init>":()V
  # 6                     00 06
  # 15                    00 0f
2 Field Reference         09      // java/lang/System.out:Ljava/io/PrintStream;
  # 16                    00 10
  # 17                    00 11
3 String Constant         08      // Hello, world!\n
  # 18                    00 12
4 Method Reference        0a      // java/io/PrintStream.print:(Ljava/lang/String;)V
  # 19                    00 13
  # 20                    00 14
5 Class Reference         07      // HelloWorld
  # 21                    00 15
6 Class Reference         07      // java/lang/Object
  # 22                    00 16
7 UTF8                    01
  Length                  00 06
  Bytes                   3c 69 6e 69 74 3e   // <init>
8 UTF8                    01
  Length                  00 03
  Bytes                   28 29 56   // ()V
9 UTF8                    01
  Length                  00 04
  Bytes                   43 6f 64 65   // Code
10 UTF8                   01
  Length                  00 0f
  Bytes                   4c 69 6e 65 4e 75 6d 62 65 72 54 61 62 6c 65   // LineNumberTable
11 UTF8                   01
  Length                  00 04
  Bytes                   6d 61 69 6e   // main
12 UTF8                   01
  Length                  00 16
  Bytes                   28 5b 4c 6a 61 76 61 2f 6c 61 6e 67 2f 53 74 72 69 6e 67 3b 29 56   // ([Ljava/lang/String;)V
13 UTF8                   01
  Length                  00 0a
  Bytes                   53 6f 75 72 63 65 46 69 6c 65   // SourceFile
14 UTF8                   01
  Length                  00 0f
  Bytes                   48 65 6c 6c 6f 57 6f 72 6c 64 2e 6a 61 76 61   // HelloWorld.java
15 Name & Type            0c      // "<init>":()V
  # 7                     00 07
  # 8                     00 08
16 Class Reference        07      // java/lang/System
  # 23                    00 17
17 Name & Type            0c      // out:Ljava/io/PrintStream;
  # 24                    00 18
  # 25                    00 19
18 UTF8                   01
  Length                  00 0e
  Bytes                   48 65 6c 6c 6f 2c 20 77 6f 72 6c 64 21 0a   // Hello, world!\n  <- this string is printed
19 Class Reference        07      // java/io/PrintStream
  # 26                    00 1a
20 Name & Type            0c      // print:(Ljava/lang/String;)V
  # 27                    00 1b
  # 28                    00 1c
21 UTF8                   01
  Length                  00 0a
  Bytes                   48 65 6c 6c 6f 57 6f 72 6c 64   // HelloWorld
22 UTF8                   01
  Length                  00 10
  Bytes                   6a 61 76 61 2f 6c 61 6e 67 2f 4f 62 6a 65 63 74   // java/lang/Object
23 UTF8                   01
  Length                  00 10
  Bytes                   6a 61 76 61 2f 6c 61 6e 67 2f 53 79 73 74 65 6d   // java/lang/System
24 UTF8                   01
  Length                  00 03
  Bytes                   6f 75 74   // out
25 UTF8                   01
  Length                  00 15
  Bytes                   4c 6a 61 76 61 2f 69 6f 2f 50 72 69 6e 74 53 74 72 65 61 6d 3b   // Ljava/io/PrintStream;
26 UTF8                   01
  Length                  00 13
  Bytes                   6a 61 76 61 2f 69 6f 2f 50 72 69 6e 74 53 74 72 65 61 6d   // java/io/PrintStream
27 UTF8                   01
  Length                  00 05
  Bytes                   70 72 69 6e 74   // print
28 UTF8                   01
  Length                  00 15
  Bytes                   28 4c 6a 61 76 61 2f 6c 61 6e 67 2f 53 74 72 69 6e 67 3b 29 56   // (Ljava/lang/String;)V

Access Flags              00 21   // ACC_PUBLIC (0x0001) | ACC_SUPER (0x0020)
This Class                00 05   // #5: HelloWorld
Super Class               00 06   // #6: java/lang/Object (0 only in the class file for java/lang/Object itself)
Interface Count           00 00   // How many direct superinterfaces?
Field Count               00 00   // How many field_info structures in the fields table?
Method Count              00 02   // How many method_info structures in the methods table?

METHOD 1 // Constructor
Access Flags              00 01   // ACC_PUBLIC
Name Index                00 07   // <init>
Descriptor Index          00 08   // ()V
Attributes Count          00 01
  ATTRIBUTE STRUCTURE
  Attribute Name Index    00 09   // Code
  Attribute Length        00 00 00 1d   // 29 = 2+2+4 + 5 bytes of code + 2+2 + 12 for LineNumberTable
  Max Stack               00 01
  Max Locals              00 01   // slot 0 holds this
  Code Length             00 00 00 05
    aload_0               2a      // push this
    invokespecial         b7      // call the superclass constructor
    # 1                   00 01   // java/lang/Object."<init>":()V
    return                b1

  Exception Table Length  00 00
  Attributes Count        00 01   // This code attribute has one attribute
    ATTRIBUTE STRUCTURE
    Attribute Name Index  00 0a   // LineNumberTable
    Attribute Length      00 00 00 06
    Line Number Table Len 00 01   // One entry
      Start PC            00 00
      Line Number         00 01

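METHOD 2 // main
Access Flags              00 09   // ACC_PUBLIC | ACC_STATIC
Name Index                00 0b   // main
Descriptor Index          00 0c   // ([Ljava/lang/String;)V
Attributes Count          00 01
  ATTRIBUTE STRUCTURE
  Attribute Name Index    00 09   // Code
  Attribute Length        00 00 00 25   // 37 = 2+2+4 + 9 bytes of code + 2+2 + 16 for LineNumberTable
  Max Stack               00 02   // PrintStream and String sit on the stack together
  Max Locals              00 01   // slot 0 holds args
  Code Length             00 00 00 09
    getstatic             b2
    # 2                   00 02   // java/lang/System.out
    ldc                   12
    # 3                   03      // "Hello, world!\n" (ldc takes a one-byte index)
    invokevirtual         b6
    # 4                   00 04   // java/io/PrintStream.print
    return                b1

  Exception Table Length  00 00
  Attributes Count        00 01   // This code attribute has one attribute
    ATTRIBUTE STRUCTURE
    Attribute Name Index  00 0a   // LineNumberTable
    Attribute Length      00 00 00 0a
    Line Number Table Len 00 02   // Two entries
      Start PC            00 00   // getstatic maps to source line 3
      Line Number         00 03
      Start PC            00 08   // return maps to source line 4
      Line Number         00 04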

Attributes Count          00 01
  ATTRIBUTE STRUCTURE
  Attribute Name Index    00 0d   // SourceFile
  Attribute Length        00 00 00 02
  Sourcefile Index        00 0e   // HelloWorld.java
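To check an annotation like the one above without counting bytes by hand, the header and constant pool can be walked with a few lines of Java. This is a minimal sketch, assuming a recent JDK (17 or later, for the arrow-style switch) and a HelloWorld.class file in the working directory; the ClassFileWalker name and its output format are my own, and only the constant pool tags that show up in simple classes are handled. It leans on DataInputStream.readUTF, which reads a u2 length followed by modified UTF-8 bytes, the same layout as a UTF8 entry after its tag byte.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Sketch: reads the header, the constant pool, then the access flags and the
// this/super class indices. Only constant pool tags seen in simple classes are
// handled; newer tags (MethodHandle, MethodType, InvokeDynamic and so on) are rejected.
final class ClassFileWalker {
    static List<String> walk(byte[] classBytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes));
        if (in.readInt() != 0xCAFEBABE) {
            throw new IOException("bad magic number");
        }
        int minor = in.readUnsignedShort();
        int major = in.readUnsignedShort();
        int poolCount = in.readUnsignedShort(); // number of entries + 1
        List<String> lines = new ArrayList<>();
        lines.add("version " + major + "." + minor);
        for (int i = 1; i < poolCount; i++) { // index 0 is never used
            int tag = in.readUnsignedByte();
            switch (tag) {
                // readUTF reads a u2 length followed by modified UTF-8 bytes
                case 1 -> lines.add(i + " Utf8 " + in.readUTF());
                case 3 -> lines.add(i + " Integer " + in.readInt());
                case 4 -> lines.add(i + " Float " + in.readFloat());
                // Long and Double take up two constant pool indices
                case 5 -> { lines.add(i + " Long " + in.readLong()); i++; }
                case 6 -> { lines.add(i + " Double " + in.readDouble()); i++; }
                case 7 -> lines.add(i + " Class #" + in.readUnsignedShort());
                case 8 -> lines.add(i + " String #" + in.readUnsignedShort());
                case 9 -> lines.add(i + " Fieldref #" + in.readUnsignedShort() + " #" + in.readUnsignedShort());
                case 10 -> lines.add(i + " Methodref #" + in.readUnsignedShort() + " #" + in.readUnsignedShort());
                case 11 -> lines.add(i + " InterfaceMethodref #" + in.readUnsignedShort() + " #" + in.readUnsignedShort());
                case 12 -> lines.add(i + " NameAndType #" + in.readUnsignedShort() + " #" + in.readUnsignedShort());
                default -> throw new IOException("tag " + tag + " not handled in this sketch");
            }
        }
        lines.add(String.format("access 0x%04x", in.readUnsignedShort()));
        lines.add("this #" + in.readUnsignedShort());
        lines.add("super #" + in.readUnsignedShort());
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Assumes the HelloWorld.class produced by javac is in the working directory.
        walk(Files.readAllBytes(Path.of("HelloWorld.class"))).forEach(System.out::println);
    }
}
```

Run against the file annotated above, the first lines should read version 52.0 and 1 Methodref #6 #15, and the last three access 0x0021, this #5 and super #6. A full parser would carry on through the interfaces, fields, methods and attributes in the same way; javap -v, which ships with the JDK, already prints all of this.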


Running through the above, it becomes clear (if there was ever any doubt) that the Java compiler and language grammar are much more complex than those of this project.

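Method names, descriptors, string literals and references to classes, fields and methods are all stored in the constant pool. Instructions refer to these entries by index, and values are pushed onto and popped off the operand stack as each method runs.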
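The operand stack is easiest to see by stepping through the nine bytes of main's code. The following is a rough sketch (BytecodeTrace and Insn are illustrative names, and the record needs JDK 16 or later) that decodes only the six opcodes appearing in HelloWorld. To keep it short, the stack effects of invokevirtual and invokespecial are hard-coded for the two calls made here; a real decoder would work them out from the method descriptor in the constant pool.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: decodes a Code array and tracks operand stack depth. Only the opcodes
// used by HelloWorld are known, and the invoke deltas are hard-coded assumptions
// for PrintStream.print(String) and Object.<init>().
final class BytecodeTrace {
    private record Insn(String text, int size, int stackDelta) {}

    static List<String> trace(byte[] code) {
        List<String> lines = new ArrayList<>();
        int depth = 0, maxDepth = 0;
        for (int pc = 0; pc < code.length; ) {
            int op = code[pc] & 0xff;
            Insn insn = switch (op) {
                case 0x2a -> new Insn("aload_0", 1, +1);                            // push local 0
                case 0x12 -> new Insn("ldc #" + (code[pc + 1] & 0xff), 2, +1);      // push constant, u1 index
                case 0xb2 -> new Insn("getstatic #" + u2(code, pc + 1), 3, +1);     // push static field
                case 0xb6 -> new Insn("invokevirtual #" + u2(code, pc + 1), 3, -2); // pop receiver + one arg
                case 0xb7 -> new Insn("invokespecial #" + u2(code, pc + 1), 3, -1); // pop receiver
                case 0xb1 -> new Insn("return", 1, 0);
                default -> throw new IllegalArgumentException("unhandled opcode 0x" + Integer.toHexString(op));
            };
            depth += insn.stackDelta();
            maxDepth = Math.max(maxDepth, depth);
            lines.add(pc + ": " + insn.text() + " (depth " + depth + ")");
            pc += insn.size();
        }
        lines.add("max stack " + maxDepth);
        return lines;
    }

    // Reads a big-endian u2 operand, as used for constant pool indices.
    private static int u2(byte[] b, int i) {
        return ((b[i] & 0xff) << 8) | (b[i + 1] & 0xff);
    }

    public static void main(String[] args) {
        // The Code bytes of main, copied from the annotated dump.
        byte[] mainCode = {(byte) 0xb2, 0x00, 0x02, 0x12, 0x03, (byte) 0xb6, 0x00, 0x04, (byte) 0xb1};
        trace(mainCode).forEach(System.out::println);
    }
}
```

For main, the trace shows getstatic and ldc raising the depth to 2, invokevirtual popping both values, and return sitting at pc 8, which is where the second LineNumberTable entry starts. The maximum depth of 2 agrees with the Max Stack value javac recorded; feeding in the constructor's bytes (2a b7 00 01 b1) gives a maximum of 1 instead.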

Moving forward, we will look to have our compiler generate the above file programmatically.
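As a first step in that direction, here is a sketch that writes a HelloWorld class file byte by byte with DataOutputStream, then loads and runs it. It is a simplification rather than a copy of javac's output: there is no constructor, no LineNumberTable and no SourceFile attribute, and the constant pool order differs from the dump above. HelloWorldEmitter, Pool and Loader are hypothetical names of my own. Because main contains no branches, the class can declare version 52 without a StackMapTable attribute.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Sketch: hand-assembles a minimal HelloWorld class file. Unlike javac's output it has
// no constructor, LineNumberTable or SourceFile; none are needed to run a static main.
final class HelloWorldEmitter {
    // Appends constant pool entries and returns each new entry's 1-based index.
    private static final class Pool {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        final DataOutputStream out = new DataOutputStream(bytes);
        int next = 1;

        int utf8(String s) throws IOException {
            out.writeByte(1); out.writeUTF(s); // tag, then u2 length + modified UTF-8
            return next++;
        }
        int classRef(String name) throws IOException {
            int n = utf8(name);
            out.writeByte(7); out.writeShort(n);
            return next++;
        }
        int string(String s) throws IOException {
            int n = utf8(s);
            out.writeByte(8); out.writeShort(n);
            return next++;
        }
        int nameAndType(String name, String desc) throws IOException {
            int n = utf8(name), d = utf8(desc);
            out.writeByte(12); out.writeShort(n); out.writeShort(d);
            return next++;
        }
        // tag 9 = Fieldref, tag 10 = Methodref
        int memberRef(int tag, int owner, int nat) throws IOException {
            out.writeByte(tag); out.writeShort(owner); out.writeShort(nat);
            return next++;
        }
    }

    static byte[] emit(String message) throws IOException {
        Pool cp = new Pool();
        int thisClass = cp.classRef("HelloWorld");
        int superClass = cp.classRef("java/lang/Object");
        int systemOut = cp.memberRef(9, cp.classRef("java/lang/System"),
                cp.nameAndType("out", "Ljava/io/PrintStream;"));
        int text = cp.string(message); // must stay below 256: ldc has a one-byte operand
        int print = cp.memberRef(10, cp.classRef("java/io/PrintStream"),
                cp.nameAndType("print", "(Ljava/lang/String;)V"));
        int mainName = cp.utf8("main"), mainDesc = cp.utf8("([Ljava/lang/String;)V");
        int codeName = cp.utf8("Code");

        byte[] code = {
            (byte) 0xb2, (byte) (systemOut >> 8), (byte) systemOut, // getstatic System.out
            0x12, (byte) text,                                      // ldc message
            (byte) 0xb6, (byte) (print >> 8), (byte) print,         // invokevirtual print
            (byte) 0xb1                                             // return
        };

        ByteArrayOutputStream file = new ByteArrayOutputStream();
        DataOutputStream f = new DataOutputStream(file);
        f.writeInt(0xCAFEBABE);
        f.writeShort(0); f.writeShort(52);       // minor 0, major 52 = Java SE 8
        f.writeShort(cp.next);                   // constant_pool_count = entries + 1
        cp.out.flush(); cp.bytes.writeTo(f);
        f.writeShort(0x0021);                    // ACC_PUBLIC | ACC_SUPER
        f.writeShort(thisClass); f.writeShort(superClass);
        f.writeShort(0); f.writeShort(0);        // no interfaces, no fields
        f.writeShort(1);                         // methods_count: only main
        f.writeShort(0x0009);                    // ACC_PUBLIC | ACC_STATIC
        f.writeShort(mainName); f.writeShort(mainDesc);
        f.writeShort(1); f.writeShort(codeName); // one attribute: Code
        f.writeInt(12 + code.length);            // attribute_length: 2+2+4, the code, 2+2
        f.writeShort(2); f.writeShort(1);        // max_stack 2, max_locals 1 (args)
        f.writeInt(code.length); f.write(code);
        f.writeShort(0); f.writeShort(0);        // no exception table, no nested attributes
        f.writeShort(0);                         // no class-level attributes
        f.flush();
        return file.toByteArray();
    }

    // defineClass is protected, so a tiny ClassLoader subclass exposes it.
    private static final class Loader extends ClassLoader {
        Class<?> define(byte[] b) {
            return defineClass("HelloWorld", b, 0, b.length);
        }
    }

    static Class<?> load(byte[] classBytes) {
        return new Loader().define(classBytes); // fresh loader per call, so no duplicate definitions
    }

    public static void main(String[] args) throws Exception {
        Class<?> hello = load(emit("Hello, world!\n"));
        hello.getMethod("main", String[].class).invoke(null, (Object) new String[0]);
    }
}
```

Loading the bytes in memory keeps the experiment self-contained; writing them to a file named HelloWorld.class instead should let java HelloWorld run the result directly. The Pool helper mirrors the way the dump builds each Class, String and Name & Type entry on top of a UTF8 entry, which is likely the shape a compiler backend would want anyway.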

I particularly like the magic number at the start of these class files - CAFE BABE:

“We used to go to lunch at a place called St Michael’s Alley. According to local legend, in the deep dark past, the Grateful Dead used to perform there before they made it big. It was a pretty funky place that was definitely a Grateful Dead Kinda Place. When Jerry died, they even put up a little Buddhist-esque shrine. When we used to go there, we referred to the place as Cafe Dead. Somewhere along the line it was noticed that this was a HEX number. I was re-vamping some file format code and needed a couple of magic numbers: one for the persistent object file, and one for classes. I used CAFEDEAD for the object file format, and in grepping for 4 character hex words that fit after “CAFE” (it seemed to be a good theme) I hit on BABE and decided to use it. At that time, it didn’t seem terribly important or destined to go anywhere but the trash-can of history. So CAFEBABE became the class file format, and CAFEDEAD was the persistent object format. But the persistent object facility went away, and along with it went the use of CAFEDEAD - it was eventually replaced by RMI.”

James Gosling