Language Design Principles and Programming Processing 2 886340 Principles of Programming Languages
Programming Language: A notation for communicating to a computer what we want it to do. A notation system for describing computation in machine-readable and human-readable form. Key aspects: syntax, semantics, abstraction.
Levels of Language: Low-level languages: machine language, assembly language. High-level languages: Fortran, COBOL, Pascal, C, VB, Java, etc.
Levels of Language in Computing
Layered View of Computer: The operating system and language implementation are layered over the machine interface of a computer. Bare machine: internal memory and processor. Macroinstructions: primitive operations, or machine instructions such as those for arithmetic and logic operations. Operating system: higher-level primitives than machine code, providing system resource management, input and output operations, a file management system, text and/or program editors… Virtual computer: provides interfaces to the user at a higher level.
Requirements for PL Design: Readable: comments, names, (…) syntax. Simple to learn and orthogonal: a small number of concepts that combine regularly and systematically (without exceptions). Portable: language standardization. Abstraction: control and data structures that hide detail. Efficient.
Implementation Methods: Compilation: programs are translated into machine language. Use: large commercial applications. Pure interpretation: programs are interpreted by another program known as an interpreter. Use: small programs, or when efficiency is not an issue. Hybrid implementation systems: a compromise between compilers and pure interpreters. Use: small and medium systems when efficiency is not the first concern.
Compilation: Translates a high-level program (source language) into machine code (machine language). Slow translation, fast execution. The compilation process has several phases: lexical analysis converts characters in the source program into lexical units; syntax analysis transforms lexical units into parse trees, which represent the syntactic structure of the program; semantic analysis generates intermediate code; code generation produces machine code.
Pure Interpretation: No translation. Easier implementation of programs (run-time errors can easily and immediately be displayed). Slower execution (10 to 100 times slower than compiled programs). Often requires more space. Now rare for traditional high-level languages, but making a significant comeback with some Web scripting languages (e.g., JavaScript, PHP).
Hybrid Implementation Systems: A compromise between compilers and pure interpreters. A high-level language program is translated to an intermediate language that allows easy interpretation. Faster than pure interpretation. Example: Perl programs are partially compiled to detect errors before interpretation.
Hybrid Implementation Systems: Example: initial implementations of Java were hybrid; the intermediate form, byte code, provides portability to any machine that has a byte code interpreter and a run-time system (together, these are called the Java Virtual Machine).
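To make the hybrid idea concrete, here is a minimal sketch in C of the second half of a hybrid system: an interpreter for a made-up, stack-based intermediate code. The opcodes (OP_PUSH, OP_ADD, OP_MUL, OP_HALT) and the function run are hypothetical; real Java byte code and the JVM are far richer. The front end that would translate source text into this form is assumed and not shown.

```c
/* Toy stack-machine byte code (hypothetical instruction set, not real JVM
   byte code). Each instruction is an opcode; OP_PUSH is followed by its operand. */
typedef enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT } Opcode;

/* Interpret a byte code program and return the value left on top of the stack. */
int run(const int *code) {
    int stack[64];                 /* no overflow checks in this sketch */
    int sp = 0;                    /* number of values currently on the stack */
    int pc = 0;                    /* index of the next instruction */
    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH: stack[sp++] = code[pc++]; break;   /* operand follows opcode */
        case OP_ADD:  sp--; stack[sp - 1] += stack[sp]; break;
        case OP_MUL:  sp--; stack[sp - 1] *= stack[sp]; break;
        case OP_HALT: return stack[sp - 1];
        default:      return -1;   /* unknown opcode: no real error handling here */
        }
    }
}
```

For example, the expression 2 * (3 + 4) could be translated once into { OP_PUSH, 2, OP_PUSH, 3, OP_PUSH, 4, OP_ADD, OP_MUL, OP_HALT }, and run on that array returns 14. Because the intermediate form no longer depends on the source text, it can be executed on any machine that has this interpreter, which is the portability argument made for byte code.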
Figure: Pure Interpretation (source program → interpreter, with input data → results). Compilation (source program → lexical analyzer → lexical units → syntax analyzer → parse tree → intermediate code generator (and semantic analyzer) → intermediate code → optimization → code generator → machine language → computer, with input data → results; a symbol table is used across the phases). Hybrid Implementation (source program → lexical analyzer → lexical units → syntax analyzer → parse tree → intermediate code generator → intermediate code → interpreter, with input data → results).
1. Lexical Analyzer (step in compiler): Reads the source code one character at a time and outputs tokens, such as keywords, variable names, and constants. It removes comments and white space (blanks, tabs, and newlines). This step separates everything that should be separated so that it can be checked in the next step, and stores values needed in later steps in the symbol table, e.g. data types defined by the programmer.
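A minimal sketch in C of the core loop of a lexical analyzer, assuming a simplified token set: identifiers (which would also cover keywords), unsigned integer constants, and single-character operators or separators. The names TokenKind, Token, next_token and count_tokens are hypothetical. Comment removal and symbol-table entry are left out to keep it short; the sketch only shows white space being discarded and characters being grouped into tokens.

```c
#include <ctype.h>
#include <stddef.h>

/* Token categories for this sketch (hypothetical, chosen for illustration). */
typedef enum { TOK_IDENT, TOK_NUMBER, TOK_OP, TOK_END } TokenKind;

typedef struct {
    TokenKind kind;
    const char *start;   /* points into the source text */
    size_t length;       /* number of characters in the token */
} Token;

/* Return the next token starting at *pos, skipping blanks, tabs and newlines. */
Token next_token(const char **pos) {
    const char *p = *pos;
    while (*p == ' ' || *p == '\t' || *p == '\n') p++;   /* discard white space */
    Token t = { TOK_END, p, 0 };
    if (*p == '\0') {
        *pos = p;
        return t;
    }
    if (isalpha((unsigned char)*p) || *p == '_') {      /* identifier or keyword */
        t.kind = TOK_IDENT;
        while (isalnum((unsigned char)*p) || *p == '_') p++;
    } else if (isdigit((unsigned char)*p)) {            /* integer constant */
        t.kind = TOK_NUMBER;
        while (isdigit((unsigned char)*p)) p++;
    } else {                                            /* any other single character */
        t.kind = TOK_OP;
        p++;
    }
    t.length = (size_t)(p - t.start);
    *pos = p;
    return t;
}

/* Count the tokens in a source string. */
int count_tokens(const char *src) {
    int n = 0;
    while (next_token(&src).kind != TOK_END) n++;
    return n;
}
```

For instance, "sum = num1 + num2;" splits into six tokens: sum, =, num1, +, num2 and ;.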
2. Syntax Analyzer (step in compiler): Checks the relationships, arrangement, and correctness of each word and each statement. There are several ways to do this, for example checking whether the parse tree conforms to the defined grammar.
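One common way to perform this check is recursive descent, where each grammar rule becomes one function. A minimal sketch in C, assuming a small hypothetical expression grammar over single digits with no white space; a real syntax analyzer would also build the parse tree, whereas this recognizer only answers whether the input is grammatical.

```c
#include <ctype.h>
#include <stdbool.h>

/* Recursive-descent recognizer for a small, hypothetical expression grammar:
     <expr>   ::= <term>   { ("+" | "-") <term> }
     <term>   ::= <factor> { ("*" | "/") <factor> }
     <factor> ::= <digit> | "(" <expr> ")"
   Each nonterminal becomes one function; the input has no white space. */
static const char *cur;   /* current position in the input */

static bool expr(void);

static bool factor(void) {
    if (isdigit((unsigned char)*cur)) { cur++; return true; }
    if (*cur == '(') {
        cur++;
        if (!expr() || *cur != ')') return false;   /* need a matching ')' */
        cur++;
        return true;
    }
    return false;
}

static bool term(void) {
    if (!factor()) return false;
    while (*cur == '*' || *cur == '/') {
        cur++;
        if (!factor()) return false;
    }
    return true;
}

static bool expr(void) {
    if (!term()) return false;
    while (*cur == '+' || *cur == '-') {
        cur++;
        if (!term()) return false;
    }
    return true;
}

/* The input is syntactically valid only if an <expr> consumes all of it. */
bool is_valid_expr(const char *input) {
    cur = input;
    return expr() && *cur == '\0';
}
```

With this grammar, "1+2*(3-4)" is accepted, while "1+" and "(1+2" are rejected because the token sequence cannot be derived from <expr>.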
3. Semantic Analyzer (step in compiler): Checks the meaning of the language constructs and generates code in an intermediate language (intermediate code) that is close to machine language. This step may be combined with the code generator to produce machine language directly.
4. Code Optimizer (step in compiler): Removes unnecessary parts of the program or unnecessary variables. Some compilers do not have this step. 5. Code Generator: Converts the resulting code into assembly language or machine language. The generated code is machine-dependent, because different machines provide different memory and registers.
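One way an optimizer removes unnecessary parts is dead-code elimination: statements whose results are never used are dropped. A minimal sketch in C for straight-line code (no branches), assuming a hypothetical representation where each statement is "target = op(src1, src2)" over variables numbered 0 to 25; the names Stmt and eliminate_dead_code are illustrative, not from any real compiler.

```c
#include <stdbool.h>

/* Straight-line code where every statement has the form
     target = op(src1, src2)
   and variables are numbered 0..MAX_VARS-1 (hypothetical representation).
   A source of -1 means "no operand" (e.g. the operand is a constant). */
#define MAX_VARS 26

typedef struct {
    int target;
    int src1, src2;
} Stmt;

/* Walk backwards, keeping a statement only if its target is still needed
   later; a kept statement makes its operands needed. keep[i] records whether
   statement i survives. live_out lists the variables whose final values are
   used afterwards (e.g. printed). Returns the number of statements kept. */
int eliminate_dead_code(const Stmt *code, int n, const int *live_out, int n_out,
                        bool *keep) {
    bool live[MAX_VARS] = { false };
    int kept = 0;
    for (int i = 0; i < n_out; i++) live[live_out[i]] = true;
    for (int i = n - 1; i >= 0; i--) {
        keep[i] = live[code[i].target];
        if (!keep[i]) continue;           /* result never used: drop the statement */
        kept++;
        live[code[i].target] = false;     /* this definition satisfies later uses */
        if (code[i].src1 >= 0) live[code[i].src1] = true;
        if (code[i].src2 >= 0) live[code[i].src2] = true;
    }
    return kept;
}
```

For example, in the sequence a = 1; b = 2; c = a + b; d = a + a; where only d is used afterwards, the statements computing b and c are removed and two statements remain.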
Language Structures 1. Lexical (Token) 2. Syntactic 3. Contextual 4. Semantic
1. Lexical: Identifiers. Names chosen to represent data items: names of variables, methods, classes, functions, procedures, etc. Considerations: case sensitivity; number of characters. Example (C language): int sum; float tax_rate; float taxRate;
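A small C illustration of the two considerations: C identifiers are case-sensitive, so sum and Sum below are two different variables, and the C99 standard guarantees that at least the first 63 characters of an internal identifier are significant. The function name case_sensitivity_demo is hypothetical.

```c
/* In C, identifiers are case-sensitive: sum and Sum name two different
   variables. tax_rate (underscore style) and taxRate (camelCase style) are
   likewise distinct names. */
int case_sensitivity_demo(void) {
    int sum = 1;
    int Sum = 2;            /* a different variable from sum */
    float tax_rate = 0.07f;
    float taxRate = 0.10f;  /* a different variable from tax_rate */
    (void)tax_rate;         /* silence unused-variable warnings */
    (void)taxRate;
    return sum + Sum;       /* 1 + 2 */
}
```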
1. Lexical: Keywords. Names chosen by the language designer to represent facets of particular language constructs; they cannot be used as identifiers (and are sometimes referred to as reserved words). Examples: C++: for, while, do (cin and cout are standard library objects, not keywords). Prolog: repeat, print (strictly, built-in predicates). Java: class (System.out.println is a library method call, not a keyword).
1. Lexical: Operators. Special "keywords" used to identify operations to be performed on operands, e.g. mathematical operators. Examples: Prolog: +, -, *, /, /\, \/. Java: +, -, *, /, &&, ||, ==, !=.
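A lexer typically recognizes an identifier-shaped word first and then looks it up in a table of reserved words to decide whether it is a keyword. A minimal sketch in C, assuming a table that lists only a subset of C's keywords; is_keyword and KEYWORDS are hypothetical names. Note that printf is not in the table: like cin and cout in C++, it is a library name, not a keyword.

```c
#include <stdbool.h>
#include <string.h>

/* Reserved-word table (only a subset of C's keywords, for illustration). */
static const char *const KEYWORDS[] = {
    "int", "float", "char", "return", "if", "else", "for", "while", "do"
};

/* A word that matches an entry is a keyword and may not be used as an identifier. */
bool is_keyword(const char *word) {
    for (size_t i = 0; i < sizeof KEYWORDS / sizeof KEYWORDS[0]; i++)
        if (strcmp(word, KEYWORDS[i]) == 0) return true;
    return false;
}
```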
1. Lexical: Separators. Punctuation marks used to group together sequences of tokens that have a "unit" meaning. When outputting text it is often desirable to include punctuation; where such characters are also used (within the language) as separators or delimiters, we must precede the punctuation character with what is called an escape character (usually a backslash '\'). Example (C):
    #include <stdio.h>

    int main(void)
    {
        int a;
        for (a = 1; a < 10; a++) {        /* ';' separates the loop clauses */
            printf("\"Hello world\"");    /* \" prints a double quote */
            printf("\n");                 /* \n prints a newline */
        }
        return 0;
    }
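Escape sequences are resolved when the string literal is processed: each two-character sequence such as \" or \n in the source becomes a single character in the stored string. A small C check of this, using a hypothetical helper quoted_hello that returns the same text the loop above prints on each iteration.

```c
#include <string.h>

/* The literal is written with the escape sequences \" and \n; in the stored
   string each of them is ONE character: '"' and a newline. */
const char *quoted_hello(void) {
    return "\"Hello world\"\n";
}
```

Here strlen(quoted_hello()) is 14: two quote characters, the 11 characters of Hello world, and one newline.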
1. Lexical: Literals. Denote direct values, which can be: numeric, e.g. 1, -123, 3.14, 6.02e23; character, e.g. 'a'; string, e.g. "Some text".
1. Lexical: Comments. A good program is one that is understandable, and we can increase understandability by including meaningful comments in our code. Comments are omitted during processing. The start of a comment is typically indicated by a keyword (e.g. comment) or a separator, and the comment may also be ended by a separator. Examples: Python: # comment. VB: ' comment. Prolog: % comment or /* comment */.
1. Lexical: Layout. Generally speaking, layout (indentation etc.) is unimportant in the context of programming languages: white space (spaces, tabs and newlines) is usually ignored (Python, where indentation delimits blocks, is a notable exception). Good layout does, however, enhance readability and consequently understandability. Good layout can also reduce the risk of programming errors.
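One literal of each kind from the list above, written in C. One subtlety worth knowing: C has no negative integer literals, so -123 is strictly the unary minus operator applied to the literal 123. The variable names are hypothetical.

```c
#include <string.h>

int    int_lit  = -123;              /* unary minus applied to the integer literal 123 */
double real_lit = 3.14;              /* floating-point literal */
double sci_lit  = 6.02e23;           /* exponent notation: 6.02 x 10^23 */
char   char_lit = 'a';               /* character literal: single quotes */
const char *str_lit = "Some text";   /* string literal: double quotes */
```

The string literal occupies 10 bytes (sizeof "Some text" is 10) although strlen reports 9, because C stores a terminating '\0' after the characters.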
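"Omitted during processing" can be made concrete: in C, each comment is replaced by one space character before tokens are formed. A minimal sketch in C of that step, handling /* ... */ and // comments and leaving comment markers inside "..." string literals alone; strip_comments is a hypothetical name, and character literals such as '"' are not handled in this simplified version.

```c
#include <string.h>

/* Copy src to dst, replacing each comment with a single space.
   dst must have room for strlen(src) + 1 characters (the output is never longer). */
void strip_comments(const char *src, char *dst) {
    size_t i = 0, j = 0;
    size_t n = strlen(src);
    while (i < n) {
        if (src[i] == '"') {                         /* copy a string literal verbatim */
            dst[j++] = src[i++];
            while (i < n && src[i] != '"') {
                if (src[i] == '\\' && i + 1 < n) dst[j++] = src[i++];  /* keep escapes */
                dst[j++] = src[i++];
            }
            if (i < n) dst[j++] = src[i++];          /* closing quote */
        } else if (src[i] == '/' && i + 1 < n && src[i + 1] == '*') {
            i += 2;                                  /* skip a block comment */
            while (i + 1 < n && !(src[i] == '*' && src[i + 1] == '/')) i++;
            i = (i + 1 < n) ? i + 2 : n;             /* past the closing star-slash */
            dst[j++] = ' ';
        } else if (src[i] == '/' && i + 1 < n && src[i + 1] == '/') {
            while (i < n && src[i] != '\n') i++;     /* skip to end of line, keep '\n' */
            dst[j++] = ' ';
        } else {
            dst[j++] = src[i++];
        }
    }
    dst[j] = '\0';
}
```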
1. Lexical: Layout Example
    #include <stdio.h>

    int main(void)
    {
        int num1, num2, sum;

        printf("Enter two integers: ");
        scanf("%d %d", &num1, &num2);
        sum = num1 + num2;
        printf("Sum: %d\n", sum);
        return 0;
    }
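To show that the compiler ignores layout, here are two C functions that differ only in white space. Both behave identically; only the second is easy for a person to read. The names add_dense and add_readable are hypothetical.

```c
/* Everything on one line: legal, but hard to read. */
int add_dense(int a,int b){int s=a+b;return s;}

/* The same statements, laid out one per line with consistent indentation. */
int add_readable(int a, int b)
{
    int s = a + b;
    return s;
}
```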
2. Syntactic: The syntactic level describes the way that program statements are constructed from tokens. It is normally defined very precisely by a context-free grammar; the best-known notations are BNF (Backus-Naur Form) and EBNF (Extended Backus-Naur Form). Syntax may also be described using a syntax tree.
2. Syntactic <number> ::= 0|1|2| … |9
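The rule above defines a single digit. A common extension, assumed here for illustration, treats it as <digit> and adds a recursive rule for multi-digit numbers: <number> ::= <digit> | <digit> <number>. A minimal sketch in C of a recognizer for that hypothetical grammar, where each rule becomes one function and the recursion in the code mirrors the recursion in the grammar; is_digit and is_number are illustrative names.

```c
#include <stdbool.h>

/* <digit> ::= 0 | 1 | 2 | ... | 9 */
static bool is_digit(char c) {
    return c >= '0' && c <= '9';
}

/* <number> ::= <digit> | <digit> <number>
   Returns true if the whole string s is derivable from <number>. */
bool is_number(const char *s) {
    if (!is_digit(s[0])) return false;   /* both alternatives start with <digit> */
    if (s[1] == '\0') return true;       /* <number> ::= <digit> */
    return is_number(s + 1);             /* <number> ::= <digit> <number> */
}
```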
3. Contextual: The contextual level of analysis is concerned with the "context" in which program statements occur. Program statements usually contain identifiers whose values are determined by earlier statements (especially in the imperative and OO paradigms). Context also determines whether a statement is "legal" or not (context conditions: a data item must "exist" before it can be used).
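The context condition "a data item must exist before it can be used" can be checked with a symbol table. A minimal sketch in C, assuming a toy program representation in which each statement either declares a name or uses one; Stmt, CTX_DECLARE, CTX_USE and first_undeclared_use are hypothetical names.

```c
#include <stdbool.h>
#include <string.h>

/* Each toy statement either declares a name or uses one (hypothetical form). */
typedef enum { CTX_DECLARE, CTX_USE } StmtKind;

typedef struct {
    StmtKind kind;
    const char *name;
} Stmt;

#define MAX_SYMBOLS 32

/* Walk the program in order, recording declarations in a symbol table.
   Returns the index of the first use of an undeclared name, or -1 if every
   name is declared before it is used. */
int first_undeclared_use(const Stmt *prog, int n) {
    const char *table[MAX_SYMBOLS];   /* symbol table: names declared so far */
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (prog[i].kind == CTX_DECLARE) {
            if (count < MAX_SYMBOLS) table[count++] = prog[i].name;
            continue;
        }
        bool found = false;
        for (int j = 0; j < count && !found; j++)
            found = strcmp(table[j], prog[i].name) == 0;
        if (!found) return i;         /* context condition violated */
    }
    return -1;
}
```

A statement sequence that uses total before declaring it is reported at index 0, even though each statement on its own is syntactically fine: this is exactly the kind of error the syntactic level cannot see.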
4. Semantic The semantic level refers to the final overall meaning of a program.
Managing and Reducing Complexity 1. Problem Decomposition 2. Abstraction 3. Contextual Checking
1. Problem Decomposition: Divide and conquer (divide et impera). Problem decomposition hinges on procedures, recursion and parameter passing, and can be applied in most (high-level) programming languages.
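A small C example of divide and conquer using exactly the three ingredients named above: a procedure, recursion, and parameters that carry the bounds of each sub-problem. To sum an array, the sketch splits the range in half, sums each half recursively, and combines the results. The function name sum_range is hypothetical.

```c
/* Sum a[lo..hi) by splitting the range in half and solving each half
   with the same procedure (divide and conquer). */
long sum_range(const int *a, int lo, int hi) {
    if (hi - lo == 0) return 0;                  /* empty range */
    if (hi - lo == 1) return a[lo];              /* smallest sub-problem */
    int mid = lo + (hi - lo) / 2;                /* divide */
    return sum_range(a, lo, mid) + sum_range(a, mid, hi);   /* conquer and combine */
}
```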
2. Abstraction Ignoring irrelevant detail in a safe way (information hiding). Requires the use of an “interface” to abstract away from lower level detail. Abstraction is typically facilitated through the use of packages or modules.
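C has no package or module construct, but programmers approximate one by splitting an interface (a header file) from its implementation (a source file). A minimal sketch of an abstract stack in that style, shown in one block for brevity: callers see only an incomplete type and four functions, so the representation can change without affecting them. The names Stack, stack_new, stack_push, stack_pop, stack_free and the file name stack.h are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

/* ---- Interface (what would go in a header such as stack.h) ----
   Callers see only an incomplete type and four operations. */
typedef struct Stack Stack;
Stack *stack_new(void);
bool   stack_push(Stack *s, int value);
bool   stack_pop(Stack *s, int *out);
void   stack_free(Stack *s);

/* ---- Implementation (would live in stack.c, hidden from callers) ----
   A fixed-size array here; it could become a linked list without any
   change to code that uses the interface. */
#define STACK_CAPACITY 100

struct Stack {
    int items[STACK_CAPACITY];
    int top;                                      /* number of items stored */
};

Stack *stack_new(void) {
    Stack *s = malloc(sizeof *s);
    if (s) s->top = 0;
    return s;                                     /* NULL if allocation failed */
}

bool stack_push(Stack *s, int value) {
    if (s->top == STACK_CAPACITY) return false;   /* full */
    s->items[s->top++] = value;
    return true;
}

bool stack_pop(Stack *s, int *out) {
    if (s->top == 0) return false;                /* empty */
    *out = s->items[--s->top];
    return true;
}

void stack_free(Stack *s) {
    free(s);
}
```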
3. Contextual Checking: Contextual checking is concerned with the contextual correctness of program code. Ideally we would like to check for (and eradicate) all possible run-time errors; however, contextual checking is difficult, and in some cases (e.g. recursion) completely impractical. Contextual checking typically consists of parameter and identifier validation.
Programming Processing 1. Translation 2. Libraries 3. Macro processing 4. Debugging tools 5. Program Management Systems and Environments
1. Translation: A program written in a high-level language (source code) cannot be executed directly by the computer, whose hardware runs only machine code. There are two ways of achieving execution: compilation and interpretation.
1. Translation: Compilation. Compilation requires the use of a special program (called a compiler) that translates source code into object code. Object code usually cannot be executed directly: various library files must be "linked in" using another special program called a linker, which produces executable code. Various contextual checks are also made during compilation.
1. Translation Interpretation Interpretation requires the use of a special program that reads and reacts to source code. Such a program is called an interpreter. During interpretation run-time errors may be detected and “meaningful” error messages produced.
2. Libraries: Libraries (in computer programming terms) contain chunks of precompiled (object) code for various functions and procedures that come with a programming language that requires compilation, for example functions and procedures that facilitate I/O. In C: <stdio.h> declares the core input and output functions; <time.h> declares the date and time handling functions.
3. Macro processing: During macro preprocessing, all occurrences of a macro name within the program are replaced by its defined string before interpretation/compilation takes place. Use of macros offers the advantage of enhanced readability. However, it is also argued that a well-designed language should not require the use of macros! Example (C):
    #define PI 3.141
    #define NUM 500
Example (C++):
    const double pi = 3.141;
    const char newline = '\n';
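A small C example of using precompiled library code: the header <time.h> only supplies the declaration of strftime, while its code comes from the C standard library, which most C toolchains link in automatically. The helper format_date is hypothetical; it fills a struct tm by hand so the result does not depend on the current time.

```c
#include <string.h>
#include <time.h>

/* Format a calendar date as YYYY-MM-DD using the library function strftime. */
void format_date(int year, int month, int day, char *buf, size_t size) {
    struct tm t;
    memset(&t, 0, sizeof t);     /* zero every field we do not set */
    t.tm_year = year - 1900;     /* struct tm counts years from 1900 */
    t.tm_mon  = month - 1;       /* and months from 0 */
    t.tm_mday = day;
    strftime(buf, size, "%Y-%m-%d", &t);
}
```

Calling format_date(2024, 1, 15, buf, sizeof buf) stores "2024-01-15" in buf.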
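A short C sketch contrasting the two approaches from the examples: the preprocessor replaces PI and NUM with the text 3.141 and 500 before the compiler proper runs, whereas pi is an ordinary typed constant that the compiler itself checks. The function names circle_area_macro and circle_area_const and the array scores are hypothetical.

```c
/* Macro versions: pure text substitution by the preprocessor. */
#define PI 3.141
#define NUM 500

/* Typed constant: checked by the compiler like any other object. */
static const double pi = 3.141;

double circle_area_macro(double r) { return PI * r * r; }   /* becomes 3.141 * r * r */
double circle_area_const(double r) { return pi * r * r; }

int scores[NUM];                    /* after substitution: int scores[500]; */
```

Both area functions compute the same value; the difference is only in when the name is resolved and whether it has a type.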
4. Debugging tools To assist in error detection many debugging tools exist. Some of these allow the user to analyze the core dump that occurs in the event of a fatal error. (A core dump describes the “state” of a program when a fatal error occurs). Others allow programmers to step through and execute a program line by line to support analysis of its execution.
5. Program Management Systems and Environments: Some vendors have combined a text editor with a compiler/interpreter into a single dedicated programming environment for the production of code in a particular programming language. Such environments include a program management system and other "administrative" tools (e.g. version control). Examples of IDEs (Integrated Development Environments): Eclipse, Code::Blocks, NetBeans.
Languages in Computing, classified by group: Programming Language, Markup Language, Scripting Language
Programming Language: A language that can be used to control and define the behavior of a computer (flow control), with a grammatical structure (syntax) and rules for interpreting meaning (semantics). Each language has its own structure, form, grammar, and vocabulary, but the underlying principles of the languages are the same. Examples: Pascal, C, VB, Java.
Markup Language: A computer language that represents data together with how it is to be displayed or structured. Examples: HTML, XHTML, XML.
Scripting Language: Script code is interpreted and executed one statement at a time by software such as a script engine that supports that particular scripting language. A scripting language is an interpreted language and must run inside another program. Scripting languages popular for creating web pages: client-side scripts, e.g. JavaScript, VBScript, JScript, are processed on the user's computer; server-side scripts, e.g. PHP, ASP, JSP, CGI, are processed on the server.
Conclusion and Question