
Code Reading: The Open Source Perspective

nazarijo writes "You can usually tell someone who's been writing a lot of code by how they write code. That may sound like a tautology, but it's got a deeper meaning than that. What editor they use, what idioms they use to avoid common pitfalls, and what organization patterns they employ all tell you what kind of programmer you're meeting. When you first start writing code, so many things are inconsistent and just plain wrong that it's almost embarrassing. I know that when I look over older code that I've written I feel sheepish about it. But how do you grow as a programmer, and what really makes a good programmer beyond language familiarity?" Read on for Nazario's review of Code Reading: The Open Source Perspective, a book which attempts to instill deeper knowledge about programming than just "knowing how."
Code Reading: The Open Source Perspective
author Diomidis Spinellis
pages 499
publisher Addison-Wesley Longman
rating 7
reviewer Jose Nazario
ISBN 0201799405
summary A tour of large-scale development projects from code to organization

A few books are tackling this subject, including Coder to Developer and Programming Language Pragmatics. These books don't teach you much about a particular language in the way that an introductory text would. Instead, you grow as a skilled developer by studying them and learning from them. That's one of the key things that people are talking about lately, that to be a strong developer requires more than a working knowledge of a language. It requires a familiarity with the strengths, weaknesses, and core features of a language and the base libraries to be efficient.

Code Reading: The Open Source Perspective is one of the books in this small but growing library. In it, Diomidis Spinellis takes you through a large body of code and focuses on several languages, techniques, and facets of development that differentiate strong developers from weak ones. What I like about this book is how much it covers, how practical the information is, and how much Spinellis teaches you. You won't learn a language, which is the complaint of some people who read this book, but if you know one or two you'll be a better programmer.

Perhaps one of the most telling things about the book is that it draws heavily from NetBSD source code, and features over 600 examples to make the point. Examples are often annotated using NetBSD as a reference. This makes sense, because NetBSD is a large project that's relatively stable and mature. Everything from how to define a C structure consistently and sanely to UML diagrams and build systems is covered, making this truly a developer's book. However, even Windows and Mac OS X developers will benefit, despite the BSD focus.

Chapter 1 introduces some of the basic tenets of the book, namely that code is literature and should be read as such. All too often people only read code when they have a specific problem to solve or want to get an example of an API. Instead, if you read code frequently you'll always be learning things and improving your skills. Also, Spinellis discusses the lifecycle of code (including its genesis, maintenance, and reuse), which simply must be taken into account if code is to be good. Poorly skilled developers forget these things and just slap it together, never thinking ahead.

In Chapter 2, a number of concepts basic to any programming language are covered, including the basic flow-control units common to many languages. The book focuses on C, with additional coverage of C++, Java, and a few other languages thrown in for good measure. As such, these chapters -- in fact the whole book -- focus on concepts common to these languages but absent in some other languages, like Scheme or LISP. One neat section is called "refactoring in the small." It illustrates the real value of the book nicely, in showing you various ways to organize your code and your thoughts for various effects. Oftentimes a book will only teach you one way (which doesn't always suit your needs), and Spinellis' examples do a nice job of escaping that trap, not just here but throughout the book.

Chapter 3, "Advanced C Data Types," focuses on some language-specific matters. These are pointers, structures, unions and dynamic memory allocation, things that most people who code in C may use but only some truly understand well. Again, a somewhat basic chapter, but useful nonetheless. Make sure you read it; chances are you'll learn a thing or two.

In Chapter 4, some basic data structures (vectors, matrices, stacks, queues, maps and hash tables, sets, lists, trees and graphs) are covered. This is an important chapter since it helps you see these structures in real-world use and also helps you understand when to choose one structure over another. While Knuth, CLRS, or other algorithms and data structures texts cover these, they often do so in isolation and at a theoretical level. Spinellis's coverage is short, but it's to the point and usable by anyone with a modest understanding of C.

Chapter 5, "Advanced Control Flow," the last chapter that deals with actual programming information, is another useful one. Again, short but to the point, this chapter covers things like recursion, exceptions, parallelism, and signals, all topics that have warranted their own books (or major sections in other books) but which are covered in a single chapter here. Still, seeing them side-by-side and in the context of each other and in real-world use provides some justification for the compact presentation.

The remaining chapters of the book go well beyond a normal programming book and focus on projects. These chapters complement the first bunch nicely by focusing on the organization of your code and projects. Chapter 6 deals specifically with many of the commonly identified (but rarely taught) things like design techniques, project organization, build processes, revision control, and testing. A number of things aren't covered, including defining and managing requirements for a release and their specifications, and the basics of using autoconf and automake; instead, the chapter rips through a whole slew of topics quite quickly.

Chapter 7 is sure to be controversial for some people: it covers "Coding Standards and Conventions." Some people seem to be big fans of the "if it feels good, do it" style of programming, and instead of writing sane, usable code, what they produce is buggy and messy. This chapter teaches you tried and tested methods of naming files, indentation (and how to do so consistently using your editor to help), formatting, naming conventions (for variables, functions, and classes), as well as standards and processes. The style and standards are (as you would expect) based on NetBSD, which differ slightly from GNU and Linux standards, as well as commonly found Windows practices. However, I think you'll agree that the style is readable with minimal effort, and that goal, coupled to consistency, is paramount in any standard.

Chapter 8 introduces you to documentation, including the use of man pages, Doxygen, revision histories, and the like. Also included are hints at using diagrams for added value. One thing I don't like about this chapter is the opening quote, which sets a bad precedent. It blithely suggests that bad documentation is better than none, which is highly questionable. Misleading docs can be worse than no docs at all, since someone without docs will have to dig through the code in front of them to understand it. Someone with bad docs will rely on the docs and wonder what's broken when things go awry.

Chapter 9 focuses on code architecture, such as class hierarchies, module organization, and even core features like frameworks to choose. This chapter covers a lot of material, and is, despite its size, simply too terse on many of these subjects. It serves as a decent introduction, but doesn't go very far in some places, considering the importance of the material. However, like much of the book, it's a good introduction to the topics at hand.

Chapter 10 also features a lot of good things to know. Granted, you could pick them all up with a lot of hard work and scouring for information, but it's easier to have them presented to you in a cohesive format. The chapter discusses code reading tools, things that you use to help you dig around a large body of code. Once you get over a few source files, even if you have well-organized code and interfaces, many changes can require that you inspect the data path. You can do this manually, or you can be assisted with tools. Tools like regular expressions, grep, your editor -- Spinellis shows you how to make use of all of them when you write code. A lot of tools I've never used (but have heard about) are featured, and their use is demonstrated, but of course many tools are simply ignored, focusing on popular ones that will work for most people.

Finally, all of the above is brought together in Chapter 11, "A Complete Example." A small tour of a large, complex piece of code is taken (34,000 lines of Java) as the author makes changes. It's unfortunately in Java, when so much of the book focused on C (why couldn't they have been consistent examples?), but it works. The example itself could have covered a few more things, such as a proper JUnit example, but overall I'm pleased with it.

Overall, Code Reading: The Open Source Perspective is ambitious and worthwhile, both as a complement to a bookshelf of study that includes The Practice of Programming and Design Patterns, and to someone who is growing tired of books on learning a language. At times it feels like the author promised more than he wound up delivering, but it serves as an introduction to a large number of topics. You won't learn a language, and you won't be able to get as much out of the book if you don't engage it with practice, but it's a useful book to get started on the road from being someone who knows a language or two to someone who is a developer, ready to contribute to a team and work on large projects. Never underestimate the skills required to be a good developer, because they go well beyond knowing how to use a language.


  • Re:comments (Score:3, Interesting)

    by nkh ( 750837 ) on Tuesday March 08, 2005 @06:38PM (#11882286) Journal
    with full knowledge that it WILL be useful someday.

    That's why you need Literate Programming! [stanford.edu] A very good book for all the family ;)
  • Amen (Score:2, Interesting)

    by MHobbit ( 830388 ) <mhobbit09.gmail@com> on Tuesday March 08, 2005 @06:39PM (#11882288)
    Amen.

    I've been coding for a couple of years now, and always have been using comments. Mind you, I only started using a lot of them when I started with PHP code and other open-source, interpreted languages.

    Right now, I'm still a better C coder than PHP, and you can tell by what approach I take to solve various things.
  • by Anonymous Coward on Tuesday March 08, 2005 @06:40PM (#11882303)
    One big reason for putting in comments is to define what you're doing at that step in the code. Often having the comment followed by readable code helps catch logic errors and makes it easier for someone less familiar with the language, libraries, or the project involved. This also means the comments need to address the concept involved, not just say in text what the code does.

    For example, don't say /* increment i by 1 */ when /* advance to the next record */ is conceptually better.
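
    To make the distinction concrete, here is a minimal sketch in C. The record type and the function name are invented for illustration; the point is only that each comment names the intent of a step rather than restating the statement under it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record type, invented for this illustration. */
struct record {
    int id;
    int deleted;   /* nonzero if the record was logically removed */
};

/* Count the records that are still live. */
size_t count_live_records(const struct record *records, size_t n)
{
    size_t live = 0;
    size_t i = 0;

    /* an empty table may be passed as NULL with n == 0 */
    assert(records != NULL || n == 0);

    while (i < n) {
        /* skip records that were logically deleted */
        if (!records[i].deleted)
            live++;

        /* advance to the next record (not "increment i by 1") */
        i++;
    }
    return live;
}
```

    A reader who does not know the language well can still follow the loop from the comments alone, which is the effect the parent is after.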
  • MOD PARENT UP (Score:2, Interesting)

    by guitaristx ( 791223 ) on Tuesday March 08, 2005 @06:45PM (#11882357) Journal
    Believe me, this is ever so often the truth. I can't count the number of times where I thought "WTF was this programmer thinking?" while maintaining legacy code, only to find out that the offending code, after tracing through the authors and check-in logs, was driven solely by a manager cracking the whip for a quick-and-dirty solution.
  • by Arimatheus ( 779497 ) on Tuesday March 08, 2005 @06:57PM (#11882478)
    Pseudocode.

    The other day I was looking at some code a peer of mine wrote and noticed a few places where there were comments with no code associated with them. He (like me and I'm sure a few others) builds the frame of his project and then fills in the code. I think it really helps to think the project through beforehand, build your comments and then write.

    And oddly, you'll notice that a lot of the coders who do this know a large multitude of languages and probably developed this habit over the course of doing personal projects. It helps you say quantifiably how far along you are in the project, remember what direction you were taking with the project (because of course we NEVER let there go a ~3 week gap in personal projects :-P ) and look at the fundamentals of a project without forcing yourself into a language before you can address where your language of choice might fail.

    At least that's the way I see it.
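
    A small sketch of what that comment-first frame can look like once most of it is filled in. The function and its numbered steps are made up for illustration; step 4 is deliberately left as a comment with no code under it, the way an unfinished frame would look.

```c
#include <assert.h>
#include <stddef.h>

/* Mean of the positive entries in values[0..n-1]; 0.0 if there are none. */
double mean_of_positives(const double *values, size_t n)
{
    /* 1. validate the input */
    assert(values != NULL || n == 0);

    /* 2. accumulate the sum and the count of positive entries */
    double sum = 0.0;
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (values[i] > 0.0) {
            sum += values[i];
            count++;
        }
    }

    /* 3. guard against dividing by zero */
    if (count == 0)
        return 0.0;

    /* 4. TODO: decide whether NaN input should be reported (not written yet) */

    /* 5. return the mean */
    return sum / (double)count;
}
```

    Counting the numbered steps that still have nothing under them gives the rough "how far along am I" measure described above.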
  • by TheSpoom ( 715771 ) * <{ten.00mrebu} {ta} {todhsals}> on Tuesday March 08, 2005 @07:07PM (#11882580) Homepage Journal
    Damn right. My new job involves making modifications to a Russian-developed PHP shopping cart, and while it's decently programmed in some places, others are *hellish* to try to work through and/or edit. /me keeps telling himself that it's all for the experience... ;^)
  • Re:Comments (Score:4, Interesting)

    by xee ( 128376 ) on Tuesday March 08, 2005 @07:08PM (#11882598) Journal
    There's some nice calculus lurking in there. Something about approaching a number from either side. Oh wait! You might be saying there's an ideal amount of commenting and in large numbers programmers tend to converge on that amount. Interesting. This would make a good research topic.
  • by Anonymous Coward on Tuesday March 08, 2005 @07:17PM (#11882685)
    The title says it all.
  • by msgregory@earthlink. ( 98641 ) on Tuesday March 08, 2005 @08:28PM (#11883416)
    I go the opposite route. I write the code and then a week later go in and write the comments. Doing it this way gives you time to get some perspective on the code, so when you go back to do the comments, you can get a sense of what's clear about the code and what isn't. So it not only helps you write more pertinent comments, it also forces you to think through the code again with a little emotional distance from it, which helps in finding flaws. Actually, it's best to read code and rewrite comments on a continual basis. You can find a lot of bugs by doing this.
  • Re:Coding style (Score:3, Interesting)

    by waveclaw ( 43274 ) on Tuesday March 08, 2005 @08:46PM (#11883587) Homepage Journal
    The problem with hard fast rules like that is they're frequently not right.

    While I'm sure you've seen McCabe Metrics and the evil that is KLOC, hard and fast rules like
    If a function has grown beyond a hundred lines of (real) code, it is almost certainly too large. If it has more than 4 levels of nesting it is too large.

    are actually well-tested and well-known observations about the complexity of software code when read and written. Remember that unlike a computer, which knows only its current state, a human can only track so many multiple states at once.

    Take a state machine for example. A simple one with 6 or 7 states will go over 100 lines, and will go over 4 nestings.

    Ah, the switch-based state machine. You remind me of my college days learning formal automata for AI and compilers. If I see a huge switch statement in the middle of parser code, I can be sure of how little training and experience the coder had when he/she wrote that. There are alternatives and they are so much more elegant.

    Switch-based state matchers were a poor design then, and are a poor design now. By using maps, state tables, or even hashes your state machine can be 50-100 lines with thousands of states and easily readable code. Of course, the documentation of state transitions becomes very important. This information will be forced from the switch/case statement labels and into secondary documentation, such as *gasp* pictures of your state transition network. So, your state names will not be readily visible in the code (unless you are very good at reading sparse matrices and compressed linked-lists.) The compact and simple table-based code will be easier to maintain than a monstrous switch.

    Take the state machine example. If it needs to wait on a semaphore before each iteration, the wait should be its own function- waiting on a semaphore is a logical operation. The logic for each state should NOT be separate functions, they are part of the state machine and make no sense without the whole of the machine for context.

    From formal automata theory we know this to be incorrect. It sounds right, but leads to statements such as the intermingling of different functions as proposed here. Each state is independent of all other states. Each state does have incoming transitions and outgoing transitions. These transitions are completely isolated from the internal facts of the states from which they originate or at which they terminate.

    Mealy machines perform actions only on transition. Moore machines perform actions only while in a state. Hybrids exist that do some of one and some of another. Even simple automata such as those shown can have no well-defined state- or transition-based actions. None of these require implementations in code that break the 100-line and 4-depth rule. All of them can be implemented with a table and a transition function containing *gasp* no switch statements.
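
    For readers who have not seen one, a minimal sketch of the table-plus-transition-function idea in C follows. The machine is a toy acceptor for signed integer literals; the state names, input classes, and function names are all invented for this example and are not taken from the book or from any real parser.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum state { ST_START, ST_SIGN, ST_DIGITS, ST_ERROR, ST_COUNT };
enum input { IN_SIGN, IN_DIGIT, IN_OTHER, IN_COUNT };

/* transition[current state][input class] gives the next state */
static const enum state transition[ST_COUNT][IN_COUNT] = {
    /*               IN_SIGN    IN_DIGIT    IN_OTHER */
    [ST_START]  = { ST_SIGN,   ST_DIGITS,  ST_ERROR },
    [ST_SIGN]   = { ST_ERROR,  ST_DIGITS,  ST_ERROR },
    [ST_DIGITS] = { ST_ERROR,  ST_DIGITS,  ST_ERROR },
    [ST_ERROR]  = { ST_ERROR,  ST_ERROR,   ST_ERROR },
};

/* Map a character to its input class. */
static enum input classify(char c)
{
    if (c == '+' || c == '-')
        return IN_SIGN;
    if (c >= '0' && c <= '9')
        return IN_DIGIT;
    return IN_OTHER;
}

/* Run the machine over s; accept only if it ends in ST_DIGITS. */
bool is_integer_literal(const char *s)
{
    enum state st = ST_START;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        st = transition[st][classify(*s)];   /* one table lookup per character */
    return st == ST_DIGITS;
}
```

    Adding a state means adding a row to the table, not another case and another level of nesting; the driver loop stays three lines long however large the table grows.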

    There are a slew of Computer Science courses that delve deeper into a topic such as state machines and code readability. (I'm certain they must be terribly boring and useless to be overlooked by so many professional coders.) But, state transition is just as well known as other logic operations like array index math and -dare I say- waiting on a semaphore. These courses cover a lot of techniques, but all of them have implementations that benefit from the 100-line and 4-depth rule above.
  • by benja ( 623818 ) on Tuesday March 08, 2005 @09:18PM (#11883916)
    Umh -- your comment would make sense if the author had indeed written missing, rather than the neutral absent. I can't detect the bias in saying that C includes concepts LISP doesn't and vice versa.
  • Re:Code format (Score:1, Interesting)

    by Anonymous Coward on Tuesday March 08, 2005 @10:32PM (#11884460)
    Funny, the argument works better the other way:

    No way! I have been coding for 30 years...

    It makes more sense to code

    if (test)
    {
        statement1
    }
    else
    {
        statement2
    }

    because the eye can immediately see from the BRACES alone that a block of conditional code follows. Having the braces vertically aligned also marks the contained lines as special: flow control code. I find that

    if (test) {
        statement1
    } else {
        statement2
    }

    renders most control-flow logic completely unreadable since there's no white space to set off conditionals from other forms of code. White-space is cheap, if you're still using a 25-line green-phosphor terminal to edit code I highly suggest you look into retirement; let those crazy youngsters with 1600x1200 color bitmapped displays write the code (with all due respect).
  • by gidds ( 56397 ) <slashdot.gidds@me@uk> on Tuesday March 08, 2005 @11:28PM (#11884841) Homepage
    Agreed. Full Hungarian notation is an abomination. A variable's type has no business being in its name; not only does that preclude changing the type at some point, but the type is usually clear from the context anyway.

    Whether the scope should be there probably depends on the language and system. In C, like you I've found p_ (for function parameters), g_ (for system-wide globals) and m_ (for module-level variables) useful; also s_ (for 4GL screen fields). That way, you know where to look for the definition, so you can find anything else out there. (Local variables get no prefix.)

    In Java, though, I find scope is usually obvious. Methods tend to be short enough that parameters are usually obvious; class names should start in upper case, which should make them obvious (and you can track them down from import statements); fields in other classes are always obvious from context; and there are no globals to worry about. The only confusion is between local variables and class fields; for a while I used a single underscore prefix for fields, but I really don't find it necessary.

    I'm sure that other languages have their own needs.

    While we're here, might as well repeat my own principle for meaningful variable names: The length of a variable name should be directly proportional to its scope. A variable that's only used for a couple of lines after it's defined is best kept to a few characters; but one that's used throughout the system should have a name long enough to make its function clear.
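
    A short sketch of how the scope prefixes and the name-length principle look together in C. Every identifier here is made up for illustration; only the p_/g_/m_ convention itself comes from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* g_: system-wide global, so the name is long enough to explain itself */
int g_max_retry_count_per_request = 3;

/* m_: module-level (file-scope static) variable */
static size_t m_failed_request_total = 0;

/* p_: function parameters; returns the running failure total */
size_t record_failures(const int *p_status_codes, size_t p_count)
{
    assert(p_status_codes != NULL || p_count == 0);

    /* locals get no prefix; i lives for three lines, so one letter is plenty */
    for (size_t i = 0; i < p_count; i++)
        if (p_status_codes[i] != 0)
            m_failed_request_total++;

    return m_failed_request_total;
}

/* Nonzero if another attempt is still allowed. */
int may_retry(int p_attempts_so_far)
{
    return p_attempts_so_far < g_max_retry_count_per_request;
}
```

    Reading record_failures, the prefixes alone say where to look for each definition: the parameter list, the top of the file, or a shared header.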

  • Worse than pointless (Score:3, Interesting)

    by bluGill ( 862 ) on Tuesday March 08, 2005 @11:40PM (#11884891)

    For simple counters I find that using i or j or similar works quite well. After all, everybody understands a for loop with an i index -- calling it array_index_counter or some-such is (IMHO) pointless.

    No, array_index_counter is worse than pointless. Using i or j is recognizable to any good programmer, no thought required. Using array_index_counter implies that there is something other than a loop index going on with the variable and causes all good programmers to pause for a moment to figure out what that is. (Only to be annoyed when they realize it is just a loop index)

    There is one other problem with descriptive loop indexes: they take up too much space. The example you gave is 19 letters, and in a typical C for statement will be repeated three times. That works out to 57 letters. Add in a few for the other required parts, and indentation, and you are longer than the typical terminal line, and all hope of easy readability is lost. (Unless you expand the window, but that assumes you do all your editing in a graphical environment, which isn't always true)
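
    The arithmetic is easy to check. In the sketch below the two summing functions are identical apart from the index name, and index_name_cost is a made-up helper that counts the characters a for header spends on the name alone.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Conventional short index: the whole for header fits on a narrow line. */
long sum_with_short_index(const int *values, size_t n)
{
    long total = 0;

    assert(values != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += values[i];
    return total;
}

/* Same loop with the descriptive name; the header runs past 80 columns. */
long sum_with_long_index(const int *values, size_t n)
{
    long total = 0;

    assert(values != NULL || n == 0);
    for (size_t array_index_counter = 0; array_index_counter < n; array_index_counter++)
        total += values[array_index_counter];
    return total;
}

/* Characters spent on the index name in one for header. */
size_t index_name_cost(const char *name)
{
    return 3 * strlen(name);   /* initializer, test, and increment each repeat it */
}
```

    index_name_cost("array_index_counter") comes to 57, against 3 for a plain i.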

  • Re:some newbies (Score:2, Interesting)

    by corysama ( 776329 ) on Wednesday March 09, 2005 @12:19AM (#11885145)
    It is my understanding that it wasn't even K&R's choice to lay out the code that way. It was a demand from the publisher to save paper. It may be an urban legend, but assuming it's true it makes me laugh when I see programmers get zealous defending a standard designed by a bean counter who was just trying to trim his dead tree budget.
  • Re:Code format (Score:2, Interesting)

    by Kwil ( 53679 ) on Wednesday March 09, 2005 @12:33AM (#11885230)
    I've only seen this done a few places, but I find the format
    if(test)
    { statement1
    }
    else
    { statement2
    }
    works very well.

    Especially when you get into nested..
    if(test1)
    { statement1
      statement2
      statement3
      if (test2)
      { statement 4
        statement 5
      }
      statement6
    }
    Bringing the brace down, but not adding the extra line conserves the space lost with the latter method and keeps the connection between the test and the block, but at the same time makes visual "chunking" of the code blocks that much easier by having the open and closing braces in the same column.
  • Re:Code format (Score:1, Interesting)

    by Anonymous Coward on Wednesday March 09, 2005 @01:51AM (#11885686)
    The problem is that it's not consistent. Don't forget that a single statement (without brackets) is also a valid C conditional. It extends nicely to the bracketed form. For example:
    if (condition)
        function();

    if (condition)
    {
        function();
    }
    Your dangling use of { } brackets tends to hide them and any version control system will mark the conditional line as having changed simply by adding the brackets. Eg:
    if (condition)
        function();

    if (condition) {
        function();
    }
    The distinction between bracketed and unbracketed form is far less clear. Since brackets are important to code flow, I believe they should be treated with importance and given their own lines. And you shouldn't care if this pads your source file out a bit (given that the world has code folding editors, high resolution screens - and almost nobody uses hard copy anymore).

    Because your style of brackets merges the bracketed code and the conditional into one blob, you also can't quickly comment out the conditional (leaving the code). For example:
    //if (condition)
    {
        function();
    }
    ..vs having to mess with the end bracket as well:
    //if (condition) {
        function();
    //}
    Anyway, my 2 cents.
  • by God! Awful 2 ( 631283 ) on Wednesday March 09, 2005 @04:56AM (#11886437) Journal
    ALL code NEEDS commenting at ALL times.

    You, sir, are correct. The developers I work "with" who never write comments (and in fact argue against them), are the same ones who believe that all code not written by them is a big kludge and needs to be rewritten, and they are also the same ones who are always introducing subtle flaws because they modify code without understanding all the consequences.

    I also need to take issue with the submitter's comment that inaccurate documentation (or comments) is worse than none at all. At least a document/comment tells you what someone *thought* the code was supposed to do.

    An incorrect comment is verifiably false. Once you discover that the code doesn't match the comment, you can make an educated guess about which one is correct. On the other hand, when you are faced with suspicious code and no comments at all, you typically have no idea whether the code is broken or whether you just don't understand what it was intended to do (or in many cases *both*).

    -a
  • by andr0meda ( 167375 ) on Wednesday March 09, 2005 @06:01AM (#11886672) Journal

    There is a lot of emphasis on comments here, and while I agree that the commenting style and opportunity seized by a programmer gives away a lot about the insight of programming, I just felt it necessary to add that commenting is not the holy grail either, and certainly not the final and decisive way to judge a programmer. Comments have the annoying property to become outdated. Especially when code ownership is blurry, comments in code tend to become obsolete or inapplicable, despite all good intentions of the programmer.

    It is better to 'read the code' instead. And if you 'can't read the code', it means that the code sucks! Plain and simple. Then it's time to refactor, and make sure that you can read the code as if it tells you a story about what happens with data, and why. Real good programmers know how to abstract and describe this process using architectural elements and design patterns, and without the need for a lot of extra commenting.

    There is no algorithm that is so complex that it cannot be put in a more or less readable form in code. Use abstractions and proper long names for classes, variables and functions. Describe with code what you are doing, and use small comments in-between to explain the progression within the algorithm, but refrain from writing large texts and function header comment blocks or whatever. They will become incorrect and misleading, and they are a drag to keep updated.

    The only reason I can think of for using extensive comment sections in your code is to feed document system code parsers like Doxygen or Doc++ etc., but that is only useful if you're writing an API, and when you know that 90% of that API is stable and will not change much.

  • by Diomidis Spinellis ( 661697 ) on Wednesday March 09, 2005 @06:23AM (#11886786) Homepage
    I discuss revision control systems in Chapter 6 of the Code Reading [spinellis.gr] book, titled Tackling Large Projects. I also present revision control in the elective course Software Comprehension and Maintenance [spinellis.gr] I am teaching. In that course students have to contribute to an open source project, in order to pass the course.

    You are right, version control should be part of "programming" education, and should probably be taught in a software engineering course.

    Diomidis

  • by CoderBob ( 858156 ) on Thursday March 10, 2005 @04:09PM (#11903114)
    Not everything is coded in OOP. And what about when I'm using your code as a library? Maybe it wasn't intended to be a library, and there is something concerning system state between function calls, or little things like is it a blocking function or not, etc. that I might need to know, now that The Pointy Haired Boss wants me to use it in my code.

    A few lines at the top of each function/procedure/class that at least tell the other guy what the known effects of the function are would be nice. I don't want to have to look into your specific code unless something goes wrong. I want to be able to call foo() with baz and bar as parameters, and know that if baz happens to be of the format "xxx.xxx.xxx.xxx" instead of a long int, it will be translated to a long int for the socket library (This is especially useful in such languages as Python, PHP, etc, that are not strongly typed). And I'd rather not have to take the time to dig through your code to make sure you're doing it.

    I'm not saying comment the process, I'm saying comment the functionality. Preconditions, postconditions, undefined behavior, etc. The best part is, there are tools out there that can rip those out to generate paper docs later, if you want to have a nifty reference. "Oh! You need a function that gives pigs wings? I have one of those- it's in file whenPigsFly.py. Here's the doc for the function, it's got all you need to know."
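
    As an illustration of commenting the functionality rather than the process, here is a sketch of such a header on a small C function that does the "xxx.xxx.xxx.xxx" conversion mentioned above. The name parse_ipv4 and its contract are invented for this example; it leans on sscanf, which is lenient about leading whitespace and signs, so treat it as a sketch and not a hardened parser.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * parse_ipv4: convert a dotted-quad string to a 32-bit value.
 *
 * Precondition:  text is a NUL-terminated string; out is not NULL
 *                (both checked with assert in debug builds).
 * Postcondition: on success returns 0 and *out holds the address with
 *                the first octet in the most significant byte.
 *                On failure returns -1 and *out is left untouched.
 * Side effects:  none; never blocks and keeps no state between calls.
 * Undefined:     fields too large to fit in an unsigned int.
 */
int parse_ipv4(const char *text, uint32_t *out)
{
    unsigned a, b, c, d;
    char trailing;

    assert(text != NULL && out != NULL);

    /* exactly four numeric fields, with nothing after the last one */
    if (sscanf(text, "%u.%u.%u.%u%c", &a, &b, &c, &d, &trailing) != 4)
        return -1;
    if (a > 255 || b > 255 || c > 255 || d > 255)
        return -1;

    *out = ((uint32_t)a << 24) | ((uint32_t)b << 16)
         | ((uint32_t)c << 8)  | (uint32_t)d;
    return 0;
}
```

    A caller can decide from the header alone whether the function is safe to use in their code, which is the whole request: nobody has to read the body unless something goes wrong.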
