The key point of this paper is to demonstrate the importance of statistical analysis and its application to determining how much information can be generated and transmitted. The measure H, or entropy, can be thought of as the amount of variability, or uncertainty, in a communication system. This lets us define the theoretical capacity of a communication system from the known statistical properties of its constituents, and apply the same analysis to practical systems.
The concept of information entropy deals with the uncertainty in the information a source produces. Although the idea is rooted in statistical mechanics, the intuition is simple: highly predictable information has little variability, and therefore lower entropy, than more random information. From this measure of information entropy, we can determine the number of bits needed to encode the information efficiently, or to put it another way, how many symbols we can transmit per bit (assuming a digital communication medium). Although the case of a uniform probability distribution over all symbols is the easiest to analyze and leads to the highest entropy, most practical sources generate symbols according to some particular, non-uniform distribution. Shannon goes to some length to demonstrate this with the English language, noting that the selection of letters, or even words, is highly structured and far from random. This structure amounts to redundancy in the information, so that if I typ like ths, you cn stil undersnd me. (Spammers have been rediscovering this fact for years.)
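To make the entropy idea concrete, here is a minimal sketch in Python. The four-symbol distributions are made up for illustration (they are not from the paper), and `entropy_bits` is just a hypothetical helper name. It computes H = -Σ p·log2(p) for a uniform and a skewed source, then takes redundancy as one minus the ratio of the actual entropy to the maximum possible entropy.

```python
import math

def entropy_bits(probs):
    """Shannon entropy H = -sum(p * log2(p)), in bits per symbol."""
    # Symbols with zero probability contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Illustrative (made-up) four-symbol sources.
uniform = [0.25, 0.25, 0.25, 0.25]   # every symbol equally likely
skewed = [0.70, 0.15, 0.10, 0.05]    # one symbol dominates

h_uniform = entropy_bits(uniform)    # 2.0 bits: the maximum for 4 symbols
h_skewed = entropy_bits(skewed)      # lower, because the source is more predictable

# Redundancy: the fraction of the maximum entropy the source does not use.
redundancy = 1 - h_skewed / h_uniform

print(f"uniform: {h_uniform:.3f} bits/symbol")
print(f"skewed:  {h_skewed:.3f} bits/symbol")
print(f"redundancy of skewed source: {redundancy:.1%}")
```

The skewed source needs fewer bits per symbol on average than the uniform one, which is the same effect that makes structured English text compressible.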
Once the information entropy of every component in the communication system has been determined, the channel capacity can be expressed in symbols per second, given a raw channel bit rate and a required level of certainty. Shannon gives a fine example of a digital channel operating at 1000 bits/s with a 1% error rate, leading to an effective bit rate of ~919 bits/s after accounting for the information lost to errors. Some communication system examples are given which I will not discuss in depth. I will, however, try to reiterate the important steps in efficient communication design. Although Shannon gives a mathematical formulation for the theoretical limit on channel throughput, it is up to the designer to create a system which comes close to that limit. To do this, it is imperative to know the statistical properties of all of the sub-systems involved and of the noise that may be present. Only then can efficiency be achieved.
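The ~919 bits/s figure can be reproduced with a short calculation. This is a sketch under the assumption that the channel is a binary symmetric one, where each bit is flipped independently with probability 1%. The function names are hypothetical. The uncertainty left about each transmitted bit is the binary entropy of the error rate, and the effective rate is the raw rate minus that loss.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a binary event that occurs with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def effective_rate(raw_bits_per_s, error_rate):
    """Effective rate of a binary symmetric channel (assumed model)."""
    # Each received bit leaves binary_entropy(error_rate) bits of uncertainty
    # about what was sent; that fraction of the raw rate is lost.
    return raw_bits_per_s * (1 - binary_entropy(error_rate))

loss_per_bit = binary_entropy(0.01)    # roughly 0.081 bits lost per bit sent
rate = effective_rate(1000, 0.01)      # roughly 919 bits/s

print(f"loss per bit:   {loss_per_bit:.3f} bits")
print(f"effective rate: {rate:.0f} bits/s")
```

Note that the loss is about 8%, not 1%: the receiver does not know which bits are wrong, so the cost of a 1% error rate is larger than the error rate itself.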
The paper is far more in-depth than this introduction, and the math is not too hard. If anything, it is worth a look for its commentary on the statistical nature of the English language. As always, feel free to post a comment to discuss something about the paper, add something, or correct a mistake I have made. As a small bonus, I am adding Shannon's patent for PCM-encoded voice/telephone service for those who like to read those types of things.