Over a decade ago, I remember printing out and reading a text by Aleph1 entitled Smashing the Stack for Fun and Profit. Back then, stack-based buffer overflows were a hot topic, and the tide was turning as programmers began to realize that relying on null termination of strings was not an adequate security measure and that bounds checking was becoming necessary for security-minded programs.
The issue was that many people were used to calling a function like strcpy() to copy a string from one memory location into a buffer allocated on the stack. The strcpy() function simply starts copying from the supplied address and stops when it reaches a null character, without knowing how much space was allocated for the string at the destination. As a result, parts of the stack that were not allocated for the “local” variable, like the return address of a function, could be overwritten with arbitrary values. With a properly formatted string, even executable code could be placed somewhere on the stack and the return address overwritten so that this code would be executed, for fun and profit as they say. Programmers became wiser and started using strncpy() instead, which copies at most a fixed number of bytes, so the allocated space is not exceeded as long as that limit matches the size of the destination. (Note that strncpy() does not add a terminating null character when the source is longer than the limit, so the caller still has to terminate the string.) Furthermore, most modern operating systems can now mark the areas of memory dedicated to the stack as non-executable, so the above technique would be foiled. Individuals have found some ways around these security features; however, the stack smashing exploit (as described by Aleph1) has mostly been considered a thing of the past.
I use the term mostly since Nintendo has preserved the knowledge and allowed practice of this exploit with their release of the latest Zelda game for the Wii. Through a cleverly crafted save file, the name of the main character's horse can contain a string like the one described above and lead to execution of arbitrary code. There are a few tricks needed to maintain the integrity of the save file; however, after a decade the above exploit still lives on, in almost the same form as described by Aleph1.
(Although the picture is not from the Twilight Princess game, it is a good game nonetheless.)
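To make the strcpy() versus strncpy() distinction from this post concrete, here is a minimal sketch in C. The helper name bounded_copy and the sample horse name in the usage comment are my own inventions for illustration; they are not taken from Aleph1's text or from the actual game exploit. The helper wraps strncpy() and always terminates the destination, which plain strncpy() does not do when the source is too long.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the original article): copy src into dst,
 * which has room for dst_size bytes, and always NUL-terminate the result.
 * Returns 1 if src had to be truncated, 0 if it fit completely.
 */
int bounded_copy(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);

    if (dst_size == 0)
        return 1;                     /* no room at all, nothing is written */

    /* Copy at most dst_size - 1 bytes, leaving room for the terminator. */
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';         /* strncpy() does not do this on truncation */

    return strlen(src) >= dst_size;
}

/*
 * Usage sketch (assumed values):
 *
 *     char name[8];
 *     // strcpy(name, "Epona-with-a-very-long-name");  would write past name[7]
 *     int cut = bounded_copy(name, sizeof name, "Epona-with-a-very-long-name");
 *     // name now holds "Epona-w" and cut is 1
 */
```

The unbounded strcpy() call is left commented out on purpose: with an 8-byte buffer it would keep writing past the end of the array, which is exactly the kind of overwrite that can reach a saved return address.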
The inaugural paper for the Journal Club is titled “Power-constrained high-frequency circuits for the IBM POWER6 microprocessor” by Brian Curran et al. and is published in the November 2007 issue of the IBM Journal of Research and Development. I have much respect for the whole POWER micro-architecture; consequently, I am interested in learning a little bit about the design methodology which led to a near-5GHz core logic clock rate. The IBM design team responsible for the POWER6 applied a three-pronged strategy to achieve this performance goal: cutting-edge technology, manual circuit optimization and thorough testing.
The processor was designed at a 65 nm manufacturing node, so various technologies needed to be employed to keep leakage current to a minimum and thereby maintain an acceptable power usage. The first method involved using silicon-on-insulator (SOI), which reduces back-gate current due to parasitic capacitances and can prevent CMOS latch-up. The processing steps to implement SOI are well understood; however, extra care must be given to design layout, as it is no longer possible to drive the back-gate by connecting the whole substrate to a fixed potential. Another technological advance employed was the use of dielectrics with low relative permittivity between traces to further reduce transmission line effects and the associated propagation delay of interconnects. Since less energy is stored in the dielectric material between interconnects, this also reduces power consumption.
From a design standpoint, the goal of the team was to distribute the clock properly and to keep the latency of the core logic circuits below “13 FO4”. Propagation delays, loading and transmission line effects play a very important role in the 5GHz regime. It was very interesting to see how multiple layers of buffers and clock delays were included to guarantee that clock pulses would be synchronized around various cells while maintaining an adequate slew rate. The 13 FO4 latency means that each processing cycle had to be accomplished in the time it would take for a signal to propagate along a chain of thirteen inverters, each loaded with four devices. This is the criterion which allowed for a 5GHz core logic clock rate. It was mentioned that threshold voltages were tuned, probably through ion implantation, to minimize leakage while maximizing speed.
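As a rough sanity check on the 13 FO4 figure, the arithmetic can be sketched in C. This is my own back-of-the-envelope calculation, not something taken from the paper: the function names are made up, and it assumes the entire clock period is divided evenly across the thirteen inverter stages, ignoring latch overhead, clock skew and anything else that eats into the cycle.

```c
#include <assert.h>

/* Clock period in picoseconds for a frequency given in GHz (1 GHz -> 1000 ps). */
double cycle_time_ps(double freq_ghz)
{
    assert(freq_ghz > 0.0);
    return 1000.0 / freq_ghz;
}

/*
 * Delay allowed per fan-out-of-four inverter stage, assuming (my
 * simplification) that the whole period is split evenly over the stages.
 */
double fo4_budget_ps(double freq_ghz, int fo4_stages)
{
    assert(fo4_stages > 0);
    return cycle_time_ps(freq_ghz) / fo4_stages;
}
```

Under these assumptions, a 5GHz clock gives a 200 ps cycle, so fo4_budget_ps(5.0, 13) leaves a little over 15 ps for each inverter stage.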
Simulations, being the last major piece of the paper, were less interesting as they relied mostly on proprietary tools. The part that may have been important for readers was the iterative cycle of debugging and performance tuning. The flow from schematic overview to transmission line calculations to back-annotation to placing and routing made some sense.
Please feel free to contribute your thoughts on this paper, my interpretation, or another paper that would be an interesting read in the comments section.
Let's look at Claude Shannon’s paper titled ‘A Mathematical Theory of Communication’ as suggested by Adam. As the full paper is quite long, we may want to look at only the first thirty pages in detail. Those who want to brush up on their mathematics before attempting the paper should start on page thirty-two.
While looking at some low-level system design documents, I came across this article from IBM by Lewin Edwards. His case is that the x86 architecture is not the most flexible or financially feasible path to developing embedded solutions. The argument is that x86 boards, even single board computers, are designed to be used as black boxes where the developer is supposed to make them work for his or her design through available components and external modules. This is to say that designing an x86 embedded system from scratch is not often done. On the other hand, PowerPC embedded systems offer plenty of flexibility, with a broad range of processors featuring a vast array of built-in features (JTAG, memory controllers, peripheral controllers, etc.). This article gives an overview of getting Linux to run on a Kuro Box, essentially a $150 PowerPC embedded system. For those less interested in the actual process, there are plenty of interesting resource links in the first section.
Part 1: Robots and networked appliances on a shoestring
Part 2: Anatomy of the Linux boot process
Part 3: Kuro Box Linux up close