Computer Languages


From Programmer to the CPU



[Figure: languages translated to machine code by compilers and interpreters]

It is extremely rare to write machine code by hand. Instead, a programmer writes code in a more "high level" computer language with features that are more useful and powerful than the simple operations found in machine code. For CS101, we write code in Javascript which supports high level features such as strings, loops, and the print() function. None of those high level features are directly present in the low level machine code; they are added by the Javascript language. There are two major ways that a computer language can work.

Source Code, Language Features, Compilers

One common translation strategy is based on a "compiler". The computer languages C and its derivative C++ are popular low-level languages that use this strategy.

In C++, the programmer writes C++ code which includes useful facilities such as strings and loops (much as we have seen in Javascript) which do not exist in machine code. Here is some C++ code to append a "!" at the end of a string.

  // C++ code (needs #include <string>)
  std::string a = "hi";
  std::string b = a + "!";

This code appends the string "!" on to the end of "hi", resulting in the string "hi!" stored into the variable b. The machine code instructions in the CPU are too primitive to implement this append operation as one or two instructions. However, the operation can be accomplished by a longer sequence of machine code instructions strung together.
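To give a feel for what "a longer sequence of simple operations" means, here is a rough sketch in Javascript that performs the same append using only primitive steps: reserve some bytes, copy one byte at a time, and bump a counter. This is only an illustration, not real machine code. The function name appendBytes is made up for this example, and it assumes plain ASCII text where each character fits in one byte.

```javascript
// Hypothetical sketch: append two strings using only simple, machine-like steps.
// Each string is treated as a run of byte values (character codes).
// Assumes plain ASCII text, one byte per character.
function appendBytes(left, right) {
  // "Reserve" enough bytes to hold both strings.
  const result = new Uint8Array(left.length + right.length);
  let out = 0; // index of the next free byte in result

  // Copy the bytes of the left string, one at a time.
  for (let i = 0; i < left.length; i++) {
    result[out] = left.charCodeAt(i);
    out = out + 1;
  }
  // Copy the bytes of the right string, one at a time.
  for (let i = 0; i < right.length; i++) {
    result[out] = right.charCodeAt(i);
    out = out + 1;
  }
  return result;
}

const bytes = appendBytes("hi", "!");
// The three byte values are 104 ('h'), 105 ('i'), and 33 ('!').
console.log(Array.from(bytes));
// Turning the bytes back into text gives "hi!".
console.log(String.fromCharCode(...bytes));
```

Even this tiny append takes two loops and several stores, which is why a single line of high level code expands into many low level instructions.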

Compiler

[Figure: compiler takes in source code, produces machine code program]

The compiler for the C++ language reads that C++ code, then translates and expands it into a larger sequence of machine code instructions that implements the actions specified by the C++ code. The output of the compiler is, essentially, a program file (a .exe file on Windows, for example) made of many machine code instructions. Once the compiler has produced the .exe file from the C++ code, its job is finished. Running the .exe can happen later, and is a separate step.
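The translation step can be sketched with a toy "compiler" written in Javascript. It takes a few lines of source text and produces a list of simple instructions, without running anything. Everything here is an assumption made for illustration: the function name compile, the instruction names (LOAD_CONST, LOAD_VAR, ADD, STORE), and the tiny input language that only allows "name = term" or "name = term + term". A real compiler emits binary machine code and handles far more.

```javascript
// Hypothetical sketch of the compiler idea: translate source text into a
// list of simple instructions, without running any of them.
// The instruction names below are made up for this example.
function compile(source) {
  const program = [];
  for (const rawLine of source.split(";")) {
    const line = rawLine.trim();
    if (line === "") continue; // skip the empty piece after the last ";"

    // Supports only:  name = term   or   name = term + term
    const [target, expression] = line.split("=").map((s) => s.trim());
    const terms = expression.split("+").map((s) => s.trim());

    terms.forEach((term, index) => {
      if (/^\d+$/.test(term)) {
        program.push("LOAD_CONST " + term); // a number literal
      } else {
        program.push("LOAD_VAR " + term); // the value of a variable
      }
      if (index > 0) program.push("ADD"); // combine with the previous term
    });
    program.push("STORE " + target); // save the result into the variable
  }
  return program;
}

// Two short lines of source expand into six instructions.
const program = compile("a = 1; b = a + 2;");
console.log(program.join("\n"));
```

Note that compile only returns the instruction list. Nothing is executed, which mirrors the idea that compiling and running are separate steps.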

Source Code

The "source code" is the high level code authored by the programmer and fed into the compiler. Generally, just the program.exe file is distributed to users, and the programmer retains the source code. Changing the program in the future generally requires access to the source code. For example, to add a feature, the programmer would make changes in the source code, and then run the compiler again to produce a new version of the program.

Open Source

"Open Source" refers to software where the program comes with access to its source code, and a license under which the user can make their own modifications. Typically open source software is distributed for free. Critically, beyond the free price, open source software also provides freedom/independence, since the user is not dependent on the original vendor to make changes or fixes to the source code. Since the source code is available, if a user feels strongly enough about some feature, they can add the feature themselves, or pay someone to add it. Open source means you are not dependent on some other party, which is attractive since software is such a critical part of many organizations. Some open source licenses (so-called "copyleft" licenses such as the GPL) also require that improvements made to the source code be made available back to the community at large. We'll talk about open source more later on, but I wanted to touch on it here since it is a good example of the difference between a program and its source code.

High Level Languages and Interpreters

There is a broad category of more modern languages such as Java (used in Stanford CS106A), Javascript, and Python, which do not use the compiler/machine-code strategy. Instead, these languages can be implemented by an "interpreter", and I will lump them into an extremely broad "high-level" category.

Interpreter

An interpreter is a program which reads in source code as its input, and "runs" the input code. The interpreter proceeds through the code given to it, line by line. For each line, the interpreter deconstructs what the line says and performs those actions, piece by piece. For example, Javascript, which we have been using, is implemented by a Javascript interpreter which is built into Firefox.

So in Javascript, suppose we have code lines like:

  // Javascript code
  a = 1;
  b = a + 2;

The interpreter runs this code by taking the lines one at a time and, for each, carrying out its actions. For "a = 1;" the interpreter reserves a few bytes to store the value of a, then stores the value 1 into those bytes. Then for "b = a + 2;" the interpreter evaluates (a + 2) getting the value 3, reserves some bytes for the b variable, then stores the 3 into the b bytes.
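The line-by-line process described above can be sketched as a toy interpreter, itself written in Javascript. This is a simplified illustration, not how the interpreter in a real browser is built. The function name interpret is made up, and the sketch assumes the input only contains lines of the form "name = term" or "name = term + term".

```javascript
// Hypothetical sketch of the interpreter idea: read each line and carry out
// its action right away. Supports only:  name = term   or   name = term + term
function interpret(source) {
  const variables = new Map(); // storage for the value of each variable

  for (const rawLine of source.split(";")) {
    const line = rawLine.trim();
    if (line === "") continue; // skip the empty piece after the last ";"

    const [target, expression] = line.split("=").map((s) => s.trim());

    // Evaluate the right-hand side by adding up its terms.
    let value = 0;
    for (const term of expression.split("+").map((s) => s.trim())) {
      // A term is either a number literal or the name of a stored variable.
      value += /^\d+$/.test(term) ? Number(term) : variables.get(term);
    }

    // Store the result under the variable's name, then move to the next line.
    variables.set(target, value);
  }
  return variables;
}

const result = interpret("a = 1; b = a + 2;");
console.log(result.get("a")); // 1
console.log(result.get("b")); // 3
```

Unlike a compiler, this function produces no separate program to run later. The actions happen while the source is being read.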

A compiler translates all the source code into an equivalent machine code program.exe to be run later. It is a bulk translation. An interpreter looks at each line of code, translates and runs it in the moment, and then proceeds to the next line of source code. The interpreter does not produce a program.exe; instead, it performs the actions specified in the source code directly.

Computer Language Evaluation