I've been putting off posting some of these thoughts for a long while (longer than this site has been up, really), wanting to make it sound just right. But better is the enemy of good, and the lack of traffic allows me to make mistakes. So in that spirit, I want to talk about Hungarian notation and variable naming in code.
Hungarian notation is pretty polarizing. The Hungarian notation that is known and loathed, and familiar to any Win32 coder, looks like this:
int APIENTRY WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int iCmdShow)
The letters at the front are the notation. H is for handle. L is long (32 bit), P is pointer, STR is string. i is an integer. It's a design pattern for keeping standardized variable information in the name itself. So far, it's okay. But the downside becomes very obvious with the other most common function prototype:
LRESULT CALLBACK WinProc (HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam);
L is long (32 bit) and W is word (16 bit), except that in reality both of these are 32 bits wide on 32-bit Windows and 64 bits wide on 64-bit Windows. This is because function prototypes can't evolve with the processors without breaking API compatibility, so the names stayed while the sizes changed. It's confusing, misleading, and an utter mess. So storing details like the variable size in the variable (and data type) name is a disaster waiting to happen.
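Here's a minimal sketch of how a size prefix goes stale, in portable C. WordParam and LongParam are hypothetical stand-ins I made up for WPARAM and LPARAM, not the real Windows headers; the sketch assumes both are pointer-sized integers, which is how they behave on current Windows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for WPARAM and LPARAM (not the real Windows
   definitions). Assumption: both are pointer-sized integers. */
typedef uintptr_t WordParam; /* the name promises a 16-bit word */
typedef intptr_t  LongParam; /* the name promises a 32-bit long */

/* Returns nonzero when the "word" and the "long" are the same width,
   which means the W and L prefixes no longer distinguish anything. */
int params_share_one_width(void)
{
    return sizeof(WordParam) == sizeof(LongParam);
}

/* Returns how many bytes wider the "word" is than the 2 bytes
   (16 bits) its prefix claims. */
size_t word_name_error_in_bytes(void)
{
    const size_t promisedWordSize = 2;
    assert(sizeof(WordParam) >= promisedWordSize);
    return sizeof(WordParam) - promisedWordSize;
}
```

On a typical 64-bit build the "word" comes out 6 bytes wider than its name claims, and nothing in the name warns you.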
This wasn't the original Hungarian notation. The little letters were to display the intent of the variable, not the data type. In that light, the concept is rather respectable. But there's still one problem. For example:
rwMax
That variable means maximum row. It doesn't reveal the integer nature of the variable, but the intent, which is good. Just as long as you know the secret code of rw means row, not read/write or a Real stuffed into a Word.
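As a rough sketch of intent prefixes in use, here is some plain C where rw marks a row quantity and col marks a column quantity. The function names, the col prefix, and the choice to treat rwMax and colMax as exclusive upper bounds are my assumptions for illustration.

```c
#include <assert.h>

/* Prefixes name the kind of quantity, not the C type:
   rw = row, col = column. Every variable here is a plain int. */

/* Flattens a (row, column) pair into one offset in a row-major grid. */
int cell_offset(int rwCur, int colCur, int colMax)
{
    assert(colCur >= 0 && colCur < colMax);
    return rwCur * colMax + colCur;
}

/* Counts the cells in a grid by visiting every row and column. */
int count_cells(int rwMax, int colMax)
{
    int cells = 0;
    for (int rwCur = 0; rwCur < rwMax; rwCur++) {
        for (int colCur = 0; colCur < colMax; colCur++) {
            cells++;
        }
    }
    return cells;
}
```

The compiler sees only ints, so it would happily accept rwCur < colMax. The prefixes are what make that comparison look wrong to a human reader.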
Now, let's enter the world of Cocoa. Cocoa's variable and function naming has its own flavor (pun not intended), and it's been documented a few times. The focus has always been:
- Full words.
- Verbose.
- A specific order of naming.
- Prefixes to avoid naming conflicts.
And Cocoa code is quite verbose. I actually like it that way. Win32 would never have a function prototype such as:
+ (void)setKeys:(NSArray *)keys triggerChangeNotificationsForDependentKey:(NSString *)dependentKey;
It would have been named something like:
CObject::SetKeyCascade(array<HKEY> src_arry, HKEY dst_key);
So what of Hungarian? Good variable names serve a purpose in answering some of the same questions a journalist asks: who, what, where, how and why.
- Who: the creator of the code. For frameworks, the two-to-three letter prefix convention works. NSApplication. CFArray.
- What: Not the variable type, but its purpose. For example, index, count, offset are all different purposes for an integer.
- How: Here is where I'd classify the variable type. Integer, dictionary, and window are obvious examples, although I'd also include the Cocoa conventions of using a plural to imply an array, and the verbs "is" and "has" as a prefix to indicate a BOOL.
- Why: Of course, these are all somewhat nebulous, as you could argue that variable types are really what, etc. But if how tells us it's an integer, and what tells us it's an index, why tells us the reason we have an integer index. It's an integer index of fields, for example. What, how, and why are all the most common elements in a variable name.
- Where: This is the intended scope or origin of the variable data. The most common example, in both C++ and Obj-C, is to use an underscore prefix to indicate instance/object variables. I suppose that Cocoa's habit of lowercasing member functions and variables, while capitalizing globally accessible functions and class names, could also fit in. And when setting an instance variable, the argument passed in is prefixed with the word "new".
But I'd like to add to this. I've noticed that words like the, our, this, and current also have a scope-defining purpose. If we talk about theFileManager, we know it's a global singleton. Inside a function, we talk about our variables, values exclusive to this function call. Inside a small loop, where we travel over an array or dictionary that are all ours, we point to one, and say it's this variable, or the current value.
I want to stress that "where" describes the intended scope, not the actual one. Despite our pointer's local scope, we still use "the" when talking about the singleton. And when looping through data, a variable that points to the current data might be allocated and deallocated outside the loop for speed reasons, yet "this" or "current" should still be used.
So here's how the variable names are composed: Who or Where, Why, What, How.
NSObjectEnumerator: Who is NeXTSTEP, Why we care is because we want objects, and what it does is enumerate.
ourMailboxes: Where is local, so it's ours. Why we care is to store mailboxes. How it does this is an array.
thisMailbox: A single mailbox, probably used in a loop going through ourMailboxes above.
ourMailboxesEnumerator: Our is local, and how is that it's an NSObjectEnumerator. Adding to the end turns our previous how into a what. Best of all, initialization would be straightforward:
ourMailboxesEnumerator = [ourMailboxes objectEnumerator];
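To see the scope words side by side, here's a small sketch in plain C rather than Cocoa. The Mailbox and MailSettings types, theMailSettings, and count_busy_mailboxes are all hypothetical names invented for this example, and treating function parameters as "our" variables is my own reading of the convention.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types, for illustration only. */
typedef struct {
    const char *name;
    int unreadCount;
} Mailbox;

typedef struct {
    int busyThreshold; /* unread messages needed to count as "busy" */
} MailSettings;

/* "the": one shared, program-wide instance. */
static const MailSettings theMailSettings = { 1 };

/* Counts the mailboxes whose unread count reaches the shared threshold. */
size_t count_busy_mailboxes(const Mailbox *ourMailboxes,
                            size_t ourMailboxesCount)
{
    assert(ourMailboxes != NULL || ourMailboxesCount == 0);

    size_t ourBusyCount = 0; /* "our": belongs to this call */
    for (size_t thisIndex = 0; thisIndex < ourMailboxesCount; thisIndex++) {
        /* "this": the single item the loop is looking at right now */
        const Mailbox *thisMailbox = &ourMailboxes[thisIndex];
        if (thisMailbox->unreadCount >= theMailSettings.busyThreshold) {
            ourBusyCount++;
        }
    }
    return ourBusyCount;
}
```

Reading the loop body, you can tell at a glance which value is shared (the), which belongs to the call (our), and which changes every iteration (this).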
I'm still trying to hammer out the details. I've used this naming style in my own code, and the style has evolved over time, resulting in inconsistencies. In some areas, I've got variables prefixed with "our" when "this" would be better. In other areas, I use both the plural and the data type, as in ourMailboxesArray and ourMailboxesDict. And it can lead to large variable names. The following examples come from my current version-tracking project. In it, I have to do a lot of tracking of each revision of a file in the repository.
fileRevisionsToEnter
There are two storage formats for file revisions. The first is a Core Data managed object class called BTHFileRevisionMO, which itself uses the naming technique (BTH is who, File Revision is why, and MO is how in the form of a Managed Object). But since Core Data is finicky about multithreading, the code builds NSDictionary objects in a background thread instead, only using BTHFileRevisionMO in the main thread. So to differentiate, fileRevisionObject implied the MO, while fileRevisionToEnter was the Dict.
This is the entire collection of dicts to enter. It's an instance variable, so no our, this, or the. However, the name doesn't tell you that the variable is a Dict with file names as keys, where each entry is an array of fileRevisions (dicts) for that file name.
fileRevisionsToEnterForThisRow
This was a loop variable, but I didn't prepend "this". It's the entry in fileRevisionsToEnter for the key stored in "thisRow". "thisRow" is a loop variable, naturally. As a result, fileRevisionsToEnterForThisRow already has "this". Furthermore, it's a dictionary, but has the plural fileRevisions, which should be reserved for arrays. A smaller loop goes through this dictionary, making the lack of "this" even more useful. In the inner loop, there is the code:
thisFileRevisionEntry = [fileRevisionsToEnterForThisRow objectForKey:thisVersionName];
thisFutureEntriesArrayCount
An integer that is a count, counting the entries in thisFutureEntriesArray. Of course, thisFutureEntriesArray is an array of future entries, each of which, as I mentioned before, is a dictionary. And all of it is done within a loop.
I still don't know what to call this naming convention. Verbose Hungarian is one possibility, or Englisharian, just to wreck perfectly good words. And like all good designs, it should only be used where appropriate, as overuse could make the code unreadable. To avoid a sea of this, our, and current flooding the code, it's best used in areas that can be potentially confusing, lest we forget where, why, how and what. But it is another tool in our vast collection of design patterns, and I offer it as such.

Comments (1)
Your brain pwns.
Also, I have Viagra for sale if interested. It will overclock your "processor."
Posted by Robin | July 18, 2007 3:42 PM
Posted on July 18, 2007 15:42