1 LANGUAGE
ENGLISH: UNITS OF THE LANGUAGE
letters, words, sentences, paragraphs, coherent stories.
COLLECTION & SEQUENTIAL. How do they do that?
COMPUTER LANGUAGE: UNITS OF THE LANGUAGE
letters, keywords, commands, programs, systems.
Commands can be recognized by certain sequences of words. Language structure is based on explicit rules. It is very hard to state all the rules for the language "spoken English".
Definition: A language is simply a set of strings of symbols from an alphabet.
THEORY OF FORMAL LANGUAGES
"Formal" refers to explicit rules: what sequences of symbols can occur? No liberties are tolerated. No reference to any "deeper understanding" is required. We study the form of the sequences of symbols, not their meaning.
DEFINITIONS AND SYMBOLS
A finite set of fundamental units is called the "alphabet", denoted Σ.
An element of the alphabet is called a "character".
A certain specified set of strings of characters is called a "language", denoted L.
The strings that are permissible in the language are called "words".
The string without letters is called the "empty string" or "null string", denoted Λ.
The language that has no words is denoted ∅.
SYMBOLS
Union operation        +
Difference operation   −
Alphabet               Σ
Empty string           Λ
Language               L
Empty language         ∅
IMPLICITLY DEFINING LANGUAGE
Given an alphabet Σ = { a b c … z ' - }, we can now specify a language L as { all words in a standard dictionary }, named "ENGLISH-WORDS".
DEFINING AN INFINITE LANGUAGE
The trick to defining the language L is to list all the rules of grammar. This allows us to give a finite description of an infinite language. Consider the sentence "I eat three Sundays". It is grammatically correct, yet ridiculous.
METHOD OF EXHAUSTION
Let Σ = {x} be an alphabet. A language L can be defined by
L = { x xx xxx xxxx … }
L = { x^n for n = 1 2 3 … }.
A language L2 can be defined by
L2 = { x xxx xxxxx xxxxxxx … }
L2 = { x^odd }
L2 = { x^(2n-1) for n = 1 2 3 … }.
We define the function length of a string to be the number of letters in the string. For example, if a word a = xxxx is in L, then length(a) = 4. In any language that includes Λ, we have length(Λ) = 0.
The function reverse is defined as follows: if a is a word in L, then reverse(a) is the same string of letters spelled backward, called the reverse of a. For example, reverse(123) = 321.
Remark: reverse(a) is not necessarily in the language that contains a.
We define the function n_a(w) of a string w to be the number of occurrences of the letter a in w. For example, if a word w = aabbac is in L, then n_a(w) = 3.
Concatenation of two strings means that the two strings are written down side by side. For example, x^n concatenated with x^m is x^(n+m).
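The four string functions defined so far (length, reverse, n_a and concatenation) map directly onto built-in string operations. The following is a minimal Python sketch; the function names are illustrative choices, not notation from the slides, and the empty string is written "".

```python
def length(word: str) -> int:
    """Number of letters in the string."""
    return len(word)


def reverse(word: str) -> str:
    """The same letters spelled backward."""
    return word[::-1]


def count_letter(letter: str, word: str) -> int:
    """n_letter(word): how many times `letter` occurs in `word`."""
    return word.count(letter)


def concatenate(u: str, v: str) -> str:
    """Write the two strings side by side."""
    return u + v


print(length("xxxx"))                 # 4
print(length(""))                     # 0
print(reverse("123"))                 # 321
print(count_letter("a", "aabbac"))    # 3
# x^2 concatenated with x^3 is x^(2+3)
print(concatenate("x" * 2, "x" * 3) == "x" * 5)  # True
```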
A language over the alphabet Σ is called PALINDROME if
PALINDROME = { Λ, and all strings x such that reverse(x) = x }.
For example, let Σ = { a, b }; then PALINDROME = { Λ a b aa bb aaa aba bab bbb … }.
Remark: Sometimes we obtain another word in PALINDROME when we concatenate two words in PALINDROME. We shall see the interesting properties of this language later.
Consider the language PALINDROME = { Λ a b aa bb aaa aba bab bbb … }. We usually list the words in order of size, and list all the words of the same length alphabetically. This order is called lexicographic order.
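The ordering described here (shorter words first, words of equal length alphabetically) can be reproduced mechanically. The sketch below enumerates all strings over an alphabet in that order and keeps the ones that equal their own reverse; the helper names and the length bound are assumptions made for the illustration.

```python
from itertools import product


def words_in_order(alphabet, max_len):
    """Yield every string over `alphabet`, by length first, then alphabetically."""
    letters = sorted(alphabet)
    for n in range(max_len + 1):
        for letters_chosen in product(letters, repeat=n):
            yield "".join(letters_chosen)


def palindromes(alphabet, max_len):
    """Words of PALINDROME up to `max_len`, in the order described above."""
    return [w for w in words_in_order(alphabet, max_len) if w == w[::-1]]


print(palindromes("ab", 3))
# ['', 'a', 'b', 'aa', 'bb', 'aaa', 'aba', 'bab', 'bbb']
```

The first entry is the empty string, which is its own reverse.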
KLEENE CLOSURE
Given an alphabet Σ, the language in which every string of characters from Σ is a word is called the closure of the alphabet. It is denoted by Σ*. This notation is sometimes known as the Kleene star. The Kleene star can be considered an operation that makes an infinite language. When we say "infinite language", we mean infinitely many words, each of finite length.
KLEENE CLOSURE
More generally, if S is a set of words, then by S* we mean the set of all finite strings formed by concatenating words from S and from S*.
Example: If S = { a ab } then
S* = { Λ and any word composed of factors of a and ab }
   = { Λ and all strings of a and b except those that start with b and those that contain a double b }
   = { Λ a aa ab aaa aab aba aaaa aaab aaba … }.
KLEENE CLOSURE
To prove that a certain word is in the closure language S*, we must show how it can be written as a concatenation of words from the base set S.
Example: With S = { a ab }, the word abaaba can be factored as (ab)(a)(ab)(a), and this factoring is unique.
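Showing that a word belongs to S* amounts to finding a factoring into words of S. The following Python sketch searches for all such factorings by trying each base word as a prefix; it is an illustrative brute-force search, and the function name is an assumption.

```python
def factorizations(word, base):
    """Return every way to write `word` as a concatenation of words from `base`."""
    if word == "":
        return [[]]  # the empty string is the concatenation of no words
    results = []
    for piece in base:
        # Skip empty pieces so the recursion always shortens the word.
        if piece and word.startswith(piece):
            for rest in factorizations(word[len(piece):], base):
                results.append([piece] + rest)
    return results


S = ["a", "ab"]
print(factorizations("abaaba", S))  # [['ab', 'a', 'ab', 'a']]
print(factorizations("abba", S))    # []
```

A single result means the factoring is unique; an empty list means the word is not in S*.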
KLEENE CLOSURE
Example: If S = { xx xxxxx } then
S* = { Λ xx xxxx xxxxx xxxxxx xxxxxxx xxxxxxxx … }
   = { Λ and xx and x^n for n = 4 5 6 … }.
How can we prove this statement?
Hint: proof by constructive algorithm (showing how to create each word).
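Because every word over {x} is determined by its length, the claim about S = { xx xxxxx } reduces to asking which lengths can be written as a sum of 2s and 5s. A small Python sketch of that constructive idea follows; the function name and the bound of 12 are assumptions for the illustration.

```python
def reachable_lengths(piece_lengths, limit):
    """Lengths up to `limit` obtainable by concatenating pieces of the given lengths."""
    reachable = [False] * (limit + 1)
    reachable[0] = True  # the empty string uses no pieces
    for n in range(1, limit + 1):
        # n is reachable if removing one piece leaves a reachable length
        reachable[n] = any(n >= p and reachable[n - p] for p in piece_lengths)
    return [n for n in range(limit + 1) if reachable[n]]


print(reachable_lengths([2, 5], 12))
# [0, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```

Only lengths 1 and 3 are missing. Once x^4 and x^5 are available, appending xx repeatedly produces every longer word, which is the constructive step of the proof.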
KLEENE CLOSURE
Example: If S = { a b ab } and T = { a b ba }, then S* = T* = { a b }*.
Proof: It is clear that { a b }* ⊆ S* and { a b }* ⊆ T*. We have to show that S* ⊆ { a b }* and T* ⊆ { a b }*. Let x ∈ S*. If the factoring of x uses the word ab, replace each factor ab by the two factors a and b, which are both in { a b }. Then x ∈ { a b }*, so S* ⊆ { a b }*. The proof of T* ⊆ { a b }* is similar. QED
POSITIVE CLOSURE
Given an alphabet Σ, the language consisting of all nonempty strings of characters from Σ is called the positive closure of the alphabet. It is denoted by Σ+.
Example: Let Σ = { ab }. Then Σ+ = { ab abab ababab … }.
TRIVIAL REMARK
Given an alphabet Σ = { aa bbb }, Σ* is the set of all strings where a's occur in even clumps and b's in groups of 3, 6, 9, ….
Some words in Σ* are bbb, aabbbaaaa and bbbaa.
If we concatenate these three elements of Σ*, we get one big word in Σ**, which is again in Σ*:
bbbaabbbaaaabbbaa = (bbb)(aa)(bbb)(aa)(aa)(bbb)(aa)
Note: Σ** means (Σ*)*.
THEOREM
For any set S of strings, we have S* = S**.
Proof: Every word in S** is made up of factors from S*. Every word in S* is made up of factors from S. Therefore every word in S** is made up of factors from S. We can write this as S** ⊆ S*. In general, it is true that S ⊆ S*. So S* ⊆ S**. Then S* = S**. QED
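The theorem can be spot-checked on a finite window of words. The sketch below computes all concatenations up to a length bound, once starting from S = { aa bbb } and once starting from the result itself, and compares the two sets. This is a bounded sanity check under an assumed bound of 10 letters, not a replacement for the proof, and the function name is illustrative.

```python
def closure(words, max_len):
    """All concatenations of words from `words` of length <= max_len (includes '')."""
    pieces = {w for w in words if w}  # empty pieces add nothing new
    result = {""}
    frontier = {""}
    while frontier:
        new = set()
        for u in frontier:
            for p in pieces:
                w = u + p
                if len(w) <= max_len and w not in result:
                    new.add(w)
        result |= new
        frontier = new
    return result


S = {"aa", "bbb"}
star = closure(S, 10)          # S* restricted to words of length <= 10
star_star = closure(star, 10)  # (S*)* restricted to the same window
print(star == star_star)       # True
print("bbbaabbbaa" in star)    # True: (bbb)(aa)(bbb)(aa)
```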
A PROBLEM TO PONDER
Let L be a language defined over Σ = {0,1}. Describe the relationship between (L*)+ and (L+)*.
2 RECURSIVELY DEFINING LANGUAGE
EVEN language
EVEN is the set of all positive whole numbers divisible by 2.
EVEN is the set of all 2n where n = 1 2 3 …
Another way we might try this: the set is defined by these three rules:
Rule 1: 2 is in EVEN.
Rule 2: If x is in EVEN, then so is x+2.
Rule 3: The only elements in the set EVEN are those that can be produced from the two rules above.
The last rule above is completely redundant.
EVEN language, defined by the three rules above.
PROBLEM: Show that 10 is in this language.
By Rule 1, 2 is in EVEN.
By Rule 2, 2+2=4 is in EVEN.
By Rule 2, 4+2=6 is in EVEN.
By Rule 2, 6+2=8 is in EVEN.
By Rule 2, 8+2=10 is in EVEN.
PRETTY HORRIBLE!
EVEN language
The set is defined by these three rules:
Rule 1: 2 is in EVEN.
Rule 2: If x and y are in EVEN, then so is x+y.
Rule 3: The only elements in the set EVEN are those that can be produced from the two rules above.
PROBLEM: Show that 10 is in this language.
By Rule 1, 2 is in EVEN.
By Rule 2, 2+2=4 is in EVEN.
By Rule 2, 4+4=8 is in EVEN.
By Rule 2, 8+2=10 is in EVEN.
DECIDEDLY HARD
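Both recursive definitions of EVEN can be run as small programs. In the Python sketch below, the first function replays the "add 2" rule and records each step of the derivation; the second repeatedly applies the "x+y" rule until no new numbers appear below a bound. The function names and the bound of 20 are assumptions for the illustration.

```python
def derivation_by_adding_two(target):
    """Rule 1: 2 is in EVEN. Rule 2: if x is in EVEN, then so is x+2."""
    if target < 2 or target % 2:
        return None  # these rules never produce such a number
    steps, x = ["Rule 1: 2 is in EVEN"], 2
    while x < target:
        steps.append(f"Rule 2: {x}+2={x + 2} is in EVEN")
        x += 2
    return steps


def generated_by_sums(limit):
    """Rule 1: 2 is in EVEN. Rule 2: if x and y are in EVEN, then so is x+y."""
    members = {2}
    while True:
        new = {x + y for x in members for y in members if x + y <= limit} - members
        if not new:
            return members  # nothing new below the bound: the set is closed
        members |= new


print(len(derivation_by_adding_two(10)))               # 5
print(generated_by_sums(20) == set(range(2, 21, 2)))   # True
```

Up to the bound, both rule sets generate the same numbers; they differ only in how long a derivation of a given element is.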
POSITIVE language
The set is defined by these three rules:
Rule 1: 1 is in POSITIVE.
Rule 2: If x and y are in POSITIVE, then so are x+y, x-y, x×y and x/y, where y is not zero.
Rule 3: The only elements in the set POSITIVE are those that can be produced from the two rules above.
PROBLEM: What is the POSITIVE language?
POLYNOMIAL language
The set is defined by these four rules:
Rule 1: Any number is in POLYNOMIAL.
Rule 2: Any variable x is in POLYNOMIAL.
Rule 3: If x and y are in POLYNOMIAL, then so are x+y, x-y, xy and (x).
Rule 4: The only elements in the set POLYNOMIAL are those that can be produced from the three rules above.
PROBLEM: Show that 3x^2+2x-5 is in POLYNOMIAL.
Proof:
Rule 1: 2, 3, 5 are in POLYNOMIAL.
Rule 2: x is in POLYNOMIAL.
Rule 3: 3x, 2x are in POLYNOMIAL.
Rule 3: 3xx is in POLYNOMIAL.
Rule 3: 3xx+2x, 3x^2+2x-5 are in POLYNOMIAL. QED
ARITHMETIC EXPRESSIONS language
Let Σ be an alphabet for the AE language: Σ = { 0 1 2 3 4 5 6 7 8 9 + - * / ( ) }.
Define rules for this language.
Problems:
Show that the language does not contain the substring //.
Show that ((3+4)-(2*6))/5 is in this language.
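One possible answer to the exercise can be tried out in code. The sketch below assumes a hypothetical rule set that is not given in the slides: (1) any unsigned whole number is in AE; (2) if x is in AE, then so is (x); (3) if x and y are in AE, then so are x+y, x-y, x*y and x/y. Under these assumed rules a word is a sequence of terms separated by single operators, which a short recursive-descent recognizer can check.

```python
DIGITS = "0123456789"
OPERATORS = "+-*/"


def in_ae(text):
    """Recognizer for the assumed rules: number | (x) | x op y."""
    pos = 0

    def term():
        nonlocal pos
        if pos < len(text) and text[pos] == "(":
            pos += 1
            if not expression():
                return False
            if pos < len(text) and text[pos] == ")":
                pos += 1
                return True
            return False
        start = pos
        while pos < len(text) and text[pos] in DIGITS:
            pos += 1
        return pos > start  # at least one digit is required

    def expression():
        nonlocal pos
        if not term():
            return False
        while pos < len(text) and text[pos] in OPERATORS:
            pos += 1  # consume one operator; a term must follow it
            if not term():
                return False
        return True

    return expression() and pos == len(text)


print(in_ae("((3+4)-(2*6))/5"))  # True
print(in_ae("3//4"))             # False
```

Every operator must be followed by a term, and a term starts with a digit or an opening parenthesis, so no word accepted under these assumed rules can contain //.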
OBSERVATION
Languages can be defined by
L1 = { x^n for n = … }
L2 = { x^n for n = … }
L3 = { x^n for n = … }
L4 = { x^n for n = … }.
More precision and less guesswork are required.
3 REGULAR EXPRESSION
Kleene star
Definition: The simple expression x* will be used to indicate some sequence of x's (maybe none at all). We also define x^0 = Λ. The star acts as an unknown or undetermined power.
This notation can be used to help us define languages by writing L = language(x*), where L = { Λ and x^n for n = 1 2 3 … }.
Examples
L1 = language(ab*) = { a ab abb abbb abbbb abbbbb abbbbbb … }
L2 = language(a*ba*) = { b ab ba aab aba baa aaab aaba abaa baaa … }
L3 = language(a*b*) = { Λ a b aa ab bb aaa aab abb bbb aaaa aaab aabb … }
L4 = language((ab)*) = { Λ ab abab ababab abababab ababababab … }
L5 = language(xx*) = { x xx xxx xxxx xxxxx xxxxxx … } = language(x^+)
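These example languages can be listed by machine. The sketch below uses Python's `re` module and tests every string up to a chosen length with `re.fullmatch`. The star is written the same way in Python patterns; the helper name `language` and the length bounds are assumptions for the illustration.

```python
import re
from itertools import product


def language(pattern, alphabet, max_len):
    """Words over `alphabet` of length <= max_len matched in full by `pattern`."""
    words = []
    for n in range(max_len + 1):
        for letters in product(sorted(alphabet), repeat=n):
            word = "".join(letters)
            if re.fullmatch(pattern, word):
                words.append(word)
    return words


print(language("ab*", "ab", 4))    # ['a', 'ab', 'abb', 'abbb']
print(language("a*ba*", "ab", 3))  # ['b', 'ab', 'ba', 'aab', 'aba', 'baa']
print(language("a*b*", "ab", 2))   # ['', 'a', 'b', 'aa', 'ab', 'bb']
print(language("(ab)*", "ab", 4))  # ['', 'ab', 'abab']
```

The empty string appears as '' in the output wherever the expression allows zero repetitions.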
Plus (union)
Definition: By the plus expression x + y, where x and y are strings of characters from an alphabet, we mean "either x or y, but not both".
Example: L = language((a+b)c*) = { a b ac bc acc bcc accc bccc … }.
Exercise: L = language(a*+(a+bb)*c*+d)
Is Λ in this language?
Find the words of length 1, 2, 3 and 4.
Compare L with language(a*c*+(bb)*c*+d).
Finite language
The language L = { aaa aab aba abb baa bab bba bbb } can be expressed by L = language((a+b)^3) or L = language((a+b)(a+b)(a+b)).
REGULAR EXPRESSIONS
Definition: The set of regular expressions is defined by the following rules:
Rule 1: Every element of the alphabet Σ is a regular expression.
Rule 2: Λ is a regular expression.
Rule 3: If r and s are regular expressions, then so are (r), rs, r+s and r*.
Rule 4: Nothing else is a regular expression.
REGULAR EXPRESSIONS
Example: The regular expression (a+b)*a(a+b)*+b(a+b)* can be written more simply as (a+b)(a+b)*.
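A claimed simplification like this one can be spot-checked by comparing the two expressions on every short string. In the Python sketch below the union written + on the slide becomes | in `re` syntax; the function name and the bound of 8 letters are assumptions, and agreement on short strings is evidence rather than a proof.

```python
import re
from itertools import product


def first_difference(pattern1, pattern2, alphabet, max_len):
    """Return the first short word on which the two patterns disagree, else None."""
    for n in range(max_len + 1):
        for letters in product(sorted(alphabet), repeat=n):
            word = "".join(letters)
            in_first = re.fullmatch(pattern1, word) is not None
            in_second = re.fullmatch(pattern2, word) is not None
            if in_first != in_second:
                return word
    return None


original = "(a|b)*a(a|b)*|b(a|b)*"  # (a+b)*a(a+b)* + b(a+b)*
simpler = "(a|b)(a|b)*"             # (a+b)(a+b)*
print(first_difference(original, simpler, "ab", 8))  # None
```

Both expressions describe every nonempty string of a's and b's: such a string either contains an a or consists only of b's and therefore starts with b.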
REGULAR EXPRESSIONS
Finite & positive closure:
Is (a+b)^4 a regular expression? It can be accepted as one, since it is shorthand for (a+b)(a+b)(a+b)(a+b), which is a regular expression.
Is (a+c)^+ a regular expression? It can also be accepted as one, since it stands for the regular expression (a+c)(a+c)*.
REGULAR EXPRESSIONS
How do we do it? Define the set A by a regular expression.
A = { Λ b ab bb aba abb bbb abaa abab abbb bbbb abaaa abaab ababb abbbb bbbbb … }.
The regular expression is (Λ+aba*)b*.
REGULAR EXPRESSIONS
Exercise: Given the regular expression (a+b)*ab(a+b)*+E, where E is an unknown expression, find E such that the whole expression equals (a+b)* and language(E) ∩ language((a+b)*ab(a+b)*) = ∅.
The regular expression E is b*a*.
REGULAR EXPRESSIONS
Definition: Let S and T be sets of strings of letters. The product set of S and T is the set of all combinations of a string from S concatenated with a string from T, in that order.
ST = { uv : u ∈ S and v ∈ T }
Example: If S = { a bb aba } and T = { a ab }, then ST = { aa aab bba bbab abaa abaab }.
REGULAR EXPRESSIONS
Example: If P = { Λ aa b } and Q = { Λ ba }, then PQ = { Λ aa b ba aaba bba }.
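The product set is a one-line set comprehension. The Python sketch below reproduces the example with S = { a bb aba } and T = { a ab }; the function name is an illustrative choice.

```python
def product_set(S, T):
    """All strings uv with u taken from S and v taken from T, in that order."""
    return {u + v for u in S for v in T}


S = {"a", "bb", "aba"}
T = {"a", "ab"}
ST = product_set(S, T)
print(ST == {"aa", "aab", "bba", "bbab", "abaa", "abaab"})  # True
print(product_set(S, T) == product_set(T, S))               # False: order matters
```

If either set contains the empty string, written "" in Python, every word of the other set appears unchanged in the product.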
REGULAR EXPRESSIONS
Definition: A language L is called the language associated with the regular expression r if L = language(r). If L1 = language(r1) and L2 = language(r2), we also have
L1L2 = language(r1r2)
L1+L2 = language(r1+r2)
L1* = language(r1*).
REGULAR EXPRESSIONS
Theorem: If L is a finite language (only finitely many words), then L can be defined by a regular expression. In other words, all finite languages are regular.
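The idea behind this theorem is constructive: join the finitely many words with the union operator. The Python sketch below builds such an expression in `re` syntax, where union is written |; the function name is an assumption, and the never-matching pattern `(?!)` used for the language with no words is a Python idiom, not slide notation.

```python
import re


def finite_language_to_regex(words):
    """Build a Python pattern matching exactly the given finite set of words."""
    if not words:
        return "(?!)"  # empty language: a pattern that matches nothing
    # re.escape keeps characters such as '-' or '*' literal inside the pattern.
    return "(?:" + "|".join(re.escape(w) for w in words) + ")"


L = ["aaa", "aab", "aba", "abb", "baa", "bab", "bba", "bbb"]
pattern = finite_language_to_regex(L)
print(all(re.fullmatch(pattern, w) for w in L))  # True
print(re.fullmatch(pattern, "ab") is None)       # True
```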
REGULAR EXPRESSIONS
What strings are in the language?
Given (a+b)*(aa+bb)(a+b)*+(Λ+b)(ab)*(Λ+a).
Consider the regular expression (a+b)*(aa+bb)(a+b)*: it defines the strings that contain a double letter.
{ Λ a b ab ba aba bab abab baba … } is the set of all strings that do not contain a double letter.
(Λ+b)(ab)*(Λ+a) defines all strings without a double letter.
This language is (a+b)*.
EVEN-EVEN LANGUAGE
Consider the regular expression (aa+bb+(ab+ba)(aa+bb)*(ab+ba))*.
This represents the collection of all words that are made up of factors of three types:
type 1: aa
type 2: bb
type 3: (ab+ba)(aa+bb)*(ab+ba)
Every word contains an even number of a's and an even number of b's.
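The EVEN-EVEN expression can be compared against a direct letter count on all short strings. The Python sketch below writes the union as | and checks every string of a's and b's up to an assumed bound of 8 letters; it is a finite check that supports the claim rather than proving it.

```python
import re
from itertools import product

# (aa+bb+(ab+ba)(aa+bb)*(ab+ba))* in Python syntax
EVEN_EVEN = "(aa|bb|(ab|ba)(aa|bb)*(ab|ba))*"


def has_even_counts(word):
    """True when the word has an even number of a's and an even number of b's."""
    return word.count("a") % 2 == 0 and word.count("b") % 2 == 0


mismatches = []
for n in range(9):
    for letters in product("ab", repeat=n):
        word = "".join(letters)
        matched = re.fullmatch(EVEN_EVEN, word) is not None
        if matched != has_even_counts(word):
            mismatches.append(word)

print(mismatches)  # []
```

An empty list means that, on this window, the expression accepts exactly the words with even counts of both letters, not merely a subset of them.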
A PROBLEM TO PONDER
Find a language L defined over Σ = {0,1} such that
L is not { Λ },
L is not Σ*,
and L = L*.
A PROBLEM TO PONDER
Find languages L and S defined over Σ = {0,1} such that
LS = SL,
L is not a subset of S,
S is not a subset of L,
and neither L nor S is { Λ }.
A PROBLEM TO PONDER
Find languages L and S defined over Σ = {0,1} such that
LS = SL,
L is a proper nonempty subset of S,
and L is not { Λ }.