August 22, 2015

Creating AndroidDeviceNames

A few days ago, I open sourced AndroidDeviceNames, a tiny Android library that transforms the device model into a user-friendly name.

For example, SM-N910W8 becomes Samsung Galaxy Note 4 with a simple method call:

DeviceNames.getDeviceName("SM-N910W8", "Unknown Device");

Currently, the library recognises about 400 devices. In case the device is not on the list, the above example returns Unknown Device.

The idea for the library came while working on a side project where I need to show the device name to the user. Obviously, showing SM-N910W8 is subpar. Searching for a solution, all I could find online was a somewhat obsolete project with a single .properties file containing model = name pairs. I guess the idea is to load the file into a Properties object.

I don’t particularly like this. My side project manipulates bitmaps regularly and I’d like to avoid any unnecessary memory overhead. Since Properties extends Hashtable, overhead is expected. Here’s the hprof dump:

It finds the model name in ~O(1) time but takes a whopping 59 kB of memory to do it. In grand scheme of things, this is no problem for Android, but it can be improved. For instance, we could read the file, line by line and:

discard the pair and continue, if it’s not what we’re looking for, or
save it to a database, or
remember the pair in a data structure with hopefully less memory overhead than Properties

Option 1. will only allocate a constant chunk of memory, i.e. the buffer size, but there’s a high penalty to pay if we plan to search often and with different models, as we have to read the file over and over.

Option 2. is good, but messy. I’d have to be smart about database opening and closing, there’s suddenly more code involved, I might clutter my API, etc. Hopefully I can avoid it.

Could Map, ArrayMap or String[] perform better than Properties? Investigating option 3. produces the following:

Data Structure	Retained Heap (kB)	Search time
`Map`	62	`~O(1)`
`ArrayMap`	53	`O(log)`
`String[]`	51	`O(n)`
`String[], String[]`	51	`O(n)`

Besides a nice insight in memory vs. time tradeoff, there’s still quite a bit of memory involved. It does seem an ArrayMap represents the middle ground between memory and performance, but watch this before jumping to conclusions.

Can things be improved if we don’t need to be as flexible? For example, if the model-name pairs are embedded in a class as constants and reading a file is not required? Well, it makes quite an impact:

Data Structure	Retained Heap (kB)	Search time	Factor
`Map`	14	`~O(1)`	4
`ArrayMap`	5	`O(log)`	10
`String[]`	3	`O(n)`	17
`String[]`,`String[]`	3	`O(n)`	17

If our goal is to save memory, we should definitely avoid reading files. The factor column shows that, for example, String[] implementation with constants consumes 17x less memory than its file-reading counterpart.

So how do I effectively move the model-name pairs from a file to Java constants, because I’m sure as hell not writing them by hand?

It took me a few minutes to write a simple Python script that reads the model-name file and creates a populated Java data structure for me to copy and paste. Sure, more involved than plain Java, but straightforward.

Sacrificing a bit of flexibility for a memory efficient implementation, I could stop here. But would it be possible to eliminate memory overhead completely? This was more of an exploration question, born not out of necessity, but rather curiosity.

I’d have to find a way to eliminate the data structure holding the model-name pairs. A naive but workable approach could be the following:

if ("ASUS_Transformer_Pad_TF700T".equals(model)) {
    return "ASUS Transformer Pad TF700T";
} else if ("ASUS_T00J".equals(model)) {
    return "Asus ZenFone 5";
} else if ("ADR6350".equals(model)) {
    return "HTC Droid Incredible 2";
} // etc.

Ugly, but with no memory overhead and an okay search performance of O(n).

I have to be careful though. When a .class file is loaded, it resides in the method area of a JVM. The method area is logically part of the heap, so it’s futile trying to avoid overhead if the bytecode occupies the same amount of heap (or more) than a data structure would.

A neat way to look at bytecode is javap. For example, on Java(TM) SE Runtime Environment (build 1.7.0_79-b15), compiling

public class Hello {
    public static void main(String[] args) {
        System.out.println("Hello World");
    }
}

then invoking javap -c Hello produces

Compiled from "Hello.java"
public class Hello {
  public Hello();
    Code:
       0: aload_0       
       1: invokespecial #1
       4: return        

  public static void main(java.lang.String[]);
    Code:
       0: getstatic     #2                 
       3: ldc           #3
       5: invokevirtual #4
       8: return        
}

Each bytecode instruction consists of 1 byte, plus a fixed number of operand bytes. -c flag ensures the output includes byte offsets for each instruction. Therefore, the method size is roughly the same as the last byte offset. Above, main method size is 8 bytes, as this is the offset at the last instruction: 8: return.

Extending the Python script, I now generate a simple Java method, getDeviceName, which includes the if-else comparisons. I need to ensure that:

the generated method size mG is smaller then the method size mD querying a data structure plus the total data structure memory, sD.
mG size is less than 65536 bytes

Benchmarking against the least consuming data structure, String[] gives

File	Bytecode	Bytecode Size	Memory	Total
`String[]`	bytecode	75 + 6233	3416	9724
`Generated`	bytecode	5691	0	5691

The generated file is a clear winner and passes the two requirements with flying colors. The only thing left to worry about is the search time. Being O(n), it might prove to be unsatisfactory.

A quick and simple optimization we can do right away is simple branching: taking the first model letter and compare it only against the models with the same starting letter. We’re still stuck in O(n), but for example, instead of 361 comparisons to get to SPH-D600, we now only do 61, which is about 6 times less.

Empirically, I’d like to see the worst case run below 16 milliseconds. And playing around with Traceview, the average running time is well below 5 milliseconds, which makes me at ease using the generated .java file in my side project.

← → Top