Mordechai Ben-Ari

Mathematical Surprises

OPEN ACCESS A Springer

Mathematical Surprises

Mordechai Ben-Ari

Mathematical Surprises

A Springer

Mordechai Ben-Ari

Department of Science Teaching Weizmann Institute of Science Rehovot, Israel

ISBN 978-3-031-13565-1 ISBN 978-3-031-13566-8 (eBook) https://doi.org/10.1007/978-3-03 1-13566-8

Mathematics Subject Classification (2020): 00-01, 51Nxx, 05-01, 00A07

© The Editor(s) (if applicable) and The Author(s) 2022. This book is an open access publication.

Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 Inter- national License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Foreword

If everyone were exposed to mathematics in its natural state, with all the challenging fun and surprises that that entails, I think we would see a dramatic change both in the attitude of students toward mathematics, and in our conception of what it means to be “good at math.”

Paul Lockhart

I’m really hungry for surprises because each one makes us ever-so-slightly but substantially smarter. Tadashi Tokieda

Mathematics, when appropriately approached, can provide us with plentiful pleas- ant surprises. This is confirmed by a Google search of “mathematical surprises,” which, surprisingly, yields almost half a billion items. What is a surprise? The ori- gins of the word trace back to Old French with roots in Latin: “sur” (over) and “prendre” (to take, to grasp, to seize). Literally, to surprise is to overtake. As a noun, surprise is both an unanticipated or bewildering event or circumstance, as well as the emotion caused by it.

Consider, for example, an extract from a lecture by Maxim Bruckheimer! on the Feuerbach circle: “Two points lie on one and only one straight line, this is no surprise. However, three points are not necessarily on one straight line and if, during a geometrical exploration, three points ‘fall into’ a straight line, this is a surprise and frequently we need to refer to this fact as a theorem to be proven. Any three points not on a straight line lie on one circle. However, if four points lie on the same circle, this is a surprise that should be formulated as a theorem. ... Insofar as the number of points on a straight line is larger than 3, so is the theorem the more surprising.

1 Maxim Bruckheimer was a mathematician who was one of the founders of the Open University UK and Dean of its Faculty of Mathematics. He was Head of the Department of Science Teaching at the Weizmann Institute of Science.

vi Foreword

Likewise, insofar as the number of points lying on one circle is larger than 4, so is the theorem the more surprising. Thus, the statement that for any triangle there are nine related points on the same circle ... is very surprising. Moreover, in spite of the magnitude of the surprise, its proof is elegant and easy.”

In this book Mordechai Ben-Ari offers a rich collection of mathematical surprises, most of them less well known than the Feuerbach Circle and with sound reasons for including them. First, in spite of being absent from textbooks, the mathematical gems of this book are accessible with just a high school background (and patience, and paper and pencil, since fun does not come for free). Second, when a mathemat- ical result challenges what we take for granted, we are indeed surprised (Chaps. 1, 13). Similarly, we are surprised by: the cleverness of an argument (Chaps. 2, 3), the justification of the possibility of a geometric construction by algebraic means (Chap. 16), a proof relying on an apparently unrelated topic (Chaps. 4, 5), a strange proof by induction (Chap. 6), new ways of looking at a well-known result (Chap. 7), a seemingly minor theorem becoming the foundation of a whole field of mathe- matics (Chap. 8), unexpected sources of inspiration (Chap. 9), rich formalizations emerging from purely recreational activities such as origami (Chaps. 10-12). These are all different reasons for the inclusion of the pleasant, beautiful and memorable mathematical surprises in this lovely book.

So far I have addressed how the book relates to the first part of the definition of surprise, the cognitive rational reasons for the unexpected. As to the second aspect, the emotional aspect, this book is a vivid instantiation of what many mathematicians claim regarding the primary reason for doing mathematics: it is fascinating! More- over, they claim that mathematics stimulates both our intellectual curiosity and our esthetic sensibilities, and that solving a problem or understanding a concept provides a spiritual reward, which entices us to keep working on more problems and concepts.

It has been said that the function of a foreword tell readers why they should read the book. I have tried to accomplish this, but I believe that the fuller answer will come from you, the reader, after reading it and experiencing what the etymology of the word surprise suggests: to be overtaken by it!

Abraham Arcavi

Preface

Godfried Toussaint’s article on the “collapsing compass” [50] made a profound impression on me. It would never have occurred to me that the modern compass with a friction joint is not the one used in Euclid’s day. In this book I present a selection of mathematical results that are not only interesting, but that surprised me when I first encountered them.

The mathematics required to read the book is secondary-school mathematics, but that does not mean that the material is simple. Some of the proofs are quite long and require that the reader be willing to persevere in studying the material. The reward is understanding of some of the most beautiful results in mathematics. The book is not a textbook, because the wide range of topics covered doesn’t fit neatly into a syllabus. It is appropriate for enrichment activities for secondary-school students, for college-level seminars and for mathematics teachers.

The chapters can be read independently. (An exception is that Chap. 10 on the axioms of origami is a prerequisite for Chaps. 11, 12, the other chapters on origami.) Notes relevant to all chapters are given below in list labeled Style.

What Is a Surprise?

There were three criteria for including a topic in the book:

e The theorem surprised me. Particularly surprising were the theorems on con- structibility with a straightedge and compass. The extremely rich mathematics of origami was almost shocking: when a mathematics teacher proposed a project on origami, I initially turned her down because I doubted that there could be any serious mathematics associated with the art form. Other topics were included because, although I knew the results, their proofs were surprising in their ele-

Vii

viii Preface

gance and accessibility, in particular, Gauss’s purely algebraic proof that a regular heptadecagon can be constructed.

e The material does not appear in secondary-school and college textbooks, and I found these theorem and proofs only in advanced textbooks and in the research literature. There are Wikipedia articles on most of the topics, but you have to know where to look and the articles are often outlines.

e The theorems and proofs are accessible with a good knowledge of secondary- school mathematics.

Each chapter concludes with a paragraph What Is the Surprise? which explains my choice of the topic.

An Overview of the Contents

Chapter 1 presents Euclid’s proof that any construction that is possible with a fixed compass is possible with a collapsing compass. Many proofs have been given, but, as Toussaint shows, most are incorrect because they depend on diagrams which do not always correctly depict the geometry. To emphasize that one must not trust diagrams, I present the famous alleged proof that every triangle is isoceles.

Over the centuries mathematicians unsuccessfully sought to trisect an arbitrary angle (divide it into three equal parts) using only a straightedge and compass. Underwood Dudley made a comprehensive study of trisectors who find incorrect constructions; most constructions are approximations that are claimed to be accu- rate. Chapter 2 starts by presenting two of these constructions and developing the trigonometric formulas showing that they are only approximations. To show that trisection using just a straightedge and compass is of no practical importance, trisec- tions using more complex tools are presented: Archimedes’s neusis and Hippias’s quadratrix. The chapter ends with a proof that it is impossible to trisect an arbitrary angle with a straightedge and compass.

Squaring a circle (given a circle construct a square with the same area) cannot be performed using a straightedge and compass, because the value of z cannot be constructed. Chapter 3 presents three elegant constructions of close approximations to x, one by Kochański and two by Ramanujan. The chapter concludes by showing that a quadratrix can be used to square a circle.

The four-color theorem states that it is possible to color any planar map with four colors, such that no countries with a common boundary are colored with the same color. The proof of this theorem is extremely complicated, but the proof of the five-color theorem is elementary and elegant, as shown in Chapter 4. The chapter also presents Percy Heawood’s demonstration that Alfred Kempe’s “proof” of the four-color theorem is incorrect.

Preface ix

How many guards must be employed by an art museum so that all the walls are under constant observation by at least one guard? The proof in Chapter 5 is quite clever, using graph coloring to solve what at first sight appears to be a purely geometrical problem.

Chapter 6 presents some lesser-known results and their proofs by induction: theorems on Fibonacci numbers and Fermat numbers, McCarthy’s 91 function, and the Josephus problem.

Chapter 7 discusses Po-Shen Loh’s method of solving quadratic equations. The method is a critical element of Gauss’s algebraic proof that a heptadecagon can be constructed (Chapter 16). The chapter includes al-Khwarizmi’s geometric construc- tion for finding roots of quadratic equations and a geometric construction used by Cardano in the development of the formula for finding roots of cubic equations.

Ramsey theory is a topic in combinatorics that is an active area of research. It looks for patterns among subsets of large sets. Chapter 8 presents simple examples of Schur triples, Pythagorean triples, Ramsey numbers and van der Waerden’s problem. The proof of the theorem on Pythagorean triples was accomplished recently with the aid of a computer program based on mathematical logic. The chapter concludes with a digression on the ancient Babylonians’ knowledge of Pythagorean triples.

C. Dudley Langford observed his son playing with colored blocks and noticed that he had laid them out in an interesting sequence. Chapter 9 presents his theorem on the conditions for such a sequence to be possible.

Chapter 10 contains the seven axioms of origami, together with the detailed calculations of the analytic geometry of the axioms, and characterizations of the folds as geometric loci.

Chapter 11 presents Eduard Lill’s method and the origami fold proposed by Margharita P. Beloch. I introduce Lill’s method as a magic trick so I won’t spoil it by giving details here.

Chapter 12 shows that origami can perfom constructions not possible with a straightedge and compass: trisecting an angle, squaring a circle and constructing a nonagon (a regular polygon with nine sides).

Chapter 13 presents the theorem by Georg Mohr and Lorenzo Mascheroni that any construction with a straightedge and compass can be performed using only a compass.

The corresponding claim that a straightedge only is sufficient is incorrect, because a straightedge cannot compute lengths that are square roots. Jean-Victor Poncelet conjectured and Jakob Steiner proved that a straightedge is sufficient, provided that there exists a single fixed circle somewhere in the plane (Chap. 14).

If two triangles have the same perimeter and the same area must they be congru- ent? That seems reasonable but it turns out not to be true, although it takes quite a bit of algebra and geometry to find a non-congruent pair as shown in Chap. 15.

x Preface

Chapter 16 presents Gauss’s tour-de-force: a proof that a heptadecagon (a regular polygon with seventeen sides) can be constructed using a straightedge and compass. By a clever argument on the symmetry of the roots of polynomials, he obtained a formula that uses only the four arithmetic operators and square roots. Gauss did not give an explicit construction of a heptadecagon, so the elegant construction by James Callagy is presented. The chapter concludes with constructions of a regular pentagon based on Gauss’s method for the construction of a heptadecagon.

To keep the book as self-contained as possible, Appendix A collects proofs of theorems of geometry and trigonometry that may not be familiar to the reader.

Style

e The reader is assumed to have a good knowledge of secondary-school mathemat- ics, including:

Algebra: polynomials and division of polynomials, monic polynomials—those whose coefficient of the highest power is 1, quadratic equations, multiplication of expressions with exponents a” - a” = a’™"*",

Euclidean geometry: congruent triangles AABC = ADEF and the criteria for congruence, similar triangles AABC ~ ADEF and the ratios of their sides, circles and their inscribed and central angles.

Analytic geometry: the cartesian plane, computing lengths and slopes of line segments, the formula for a circle.

Trigonometry: the functions sin, cos, tan and the conversions between them, angles in the unit circle, the trigonometric functions of angles reflected around an axis such as cos(180° 6) = cos 8.

e Statements to be proved are called theorems with no attempt to distinguish between theorems, lemmas and corollaries.

e When a theorem follows a construction, the variables that appear in the the- orem refer to labeled points, lines and angles in the figure accompanying the construction.

e The full names of mathematicians have been given without biographical infor- mation that can be found easily in Wikipedia.

e The book is written so that it is as self-contained as possible, but occasionally the presentation depends on advanced mathematical concepts and theorems that are given without proofs. In such cases, a summary of the material is presented in boxes which may be skipped.

e There are no exercises but the ambitious reader is invited to prove each theorem before reading the proof.

Preface xi

e Geometric constructions can be studied using software such as Geogebra. e AB is used both for the name of a line segment and for the length of the segment. e AABC is used both for the name a triangle and for the area of the triangle.

Acknowledgments

This book would never have been written without the encouragement of Abraham Arcavi who welcomed me to trespass on his turf of mathematics education. He also graciously wrote the foreword. Avital Elbaum Cohen and Ronit Ben-Bassat Levy were always willing to help me (re-)learn secondary-school mathematics. Oriah Ben- Lulu introduced me to the mathematics of origami and collaborated on the proofs. I am grateful to Michael Woltermann for permission to use several sections of his reworking of Heinrich Dorrie’s book. Jason Cooper, Richard Kruel, Abraham Arcavi and the anonymous reviewers provided helpful comments.

I would like to thank the team at Springer for their support and professionalism, in particular the editor Richard Kruel.

The book is published under the Open Access program and I would like to thank the Weizmann Institute of Science for funding the publication.

The I4TRX source files for the book (which include the TikZ source for the diagrams) are available at:

https://github.com/motib/surprises

Mordechai (Moti) Ben-Ari

Contents

1 The Collapsing Compass os¢ 6s vepeesedes blew acs ridveuscescdiews 1 1.1 Construction with a Straightedge and Compass.................. 2 1.2 Fixed Compasses and Collapsing Compasses ................... 2 1.3 Euclid’s Construction for Copying a Line Segment............... k 1.4 A Flawed Construction for Copying a Line Segment ............. 5 L3. Doni Toeta Diagrami epres si terrre IIE RARER Seas ber Rete ES 7 2 ‘Trisectionof an Angle eose rrirerceviriresitorr tEn EREE DEE EERE 11 2:1 Approxiate TMSCHons ... nance cbereneedsseiekehedesneew ke 11 a2. “Weisechow, Usnea NOUS oc ackiereriakeecese iin ERE 16 2.4 Doubling the Cube with a Neusis: ....i.0c00ssc4 scorer eue ves 18 24 ‘Trisection Using a Quadratrix 2.24.5 050ccccecsssdsdenssosrvians 19 2 “Soiree nels: Number cierre hee eka beeeSecedao oeri 21 2.6 Constructible Numbers As Roots of Polynomials ................ 24 2.7 Impossibility of the Classical Constructions .................... 26 3 Sguarma Me CWE: cacoskinideeiebin us itoe aged teas sea EENE 29 SA Kochański s Construción, sesercer pb eereedine EENE CEEA 30 32 Ramanujan’s First Consttucton esr 2.2535)aseeee in eusseeduweds 32 3.3 Ramanujan’s Second Construction ................ 0000. e ee eee 35 34 Squaring a Circle Using a Quadratrix ....................00 000. 38 4 The Five-Color Theorem.............. 0.0... eee eee eee kropki 41 4.1 Planar Maps and Graphis .i2<scoscedvanspeteteinesesotaeidess 41 4.0 Eulers Pome .2ciccrtacedcapeeierdecrieeerebeeneeeaie bes 44 4.3°° Nop- planar Graphe sccccpcceanceeepeeea drone need dadeenreaiees 45 44 “The Degrees of the Verices ss .uccis se renret ke beXekederiucads 46 a> “She Sa Color TICO e ich auhieee ch euetee se iin a Aroer 48 4:6 The Five-Color Theorem ..22:cacin.bnsdeber se Tina erkk kewaa 48

xiii

xiv

Contents 4.7. Kempe’s Incorrect Proof of the Four-Color Theorem ............. 50

How to Guard a Museum o.an I3 5.1 Coloring T Pelyyore E EEE E 5.2 From Coloring of Polygons to Guarding a Museum 56 5.3 Any Polygon Can Be Triangulated ...2cccscssacciausiiaxseecaa SF

et aie Axiom of Mathematical ile. SNETT EEE NENN TATE: G2 Fibonacci Nomba s s0565i0cccsnseiviceerssdbiwiseessadencse BO 6.3 Fermat Numbers ........ kreida biiretti kehi PEN EA seata 66 6.4 a a ei ac eat aa EN CIR 67 6.5 The Josephus Problem . A EN EEEO E PEIE Oo

ji 3 cei ns 7i T2 TA 7S 7.6 11 7.8

Ramsey TONY sce. cecheureedisderacbiee deek api eeeeiiebionia B 8.1 Schur triple

8.2 8.3 8.5 apai Me od ccxacsess sacceraccnrvntoacccas 94 8.6 SAT Solving.. j LEETE PEET 8.7 Pythagorean Triples ı in Bagian ionene, ere rervoe iesta ON

ring Probie. ELTE LOETI TIET 106 Is Langford’s Problem Solvable? ......... 107

Contents XV

10

11

12

13

14

The Axioms of Origami .....i..2.0.cccesscecckesseeseaceveessne ccna LID POL Axiom l cispcsiseeser tinepi ee rashke oee A 10.2 Axiom2..... rarae EEATT EEE ee sramio WOES ARS 2h cht octane kei es isee erirken A DO Aiea) 24 65t cuuseatads Naa be TOS AKON S 65 i+122he iene breed ebeiebiarseteeerriceerioeescacad 1D E e ETE E T EDE E E E Pol WSF i AAEE POT LEE i ee ee

11.1 pene ‘ick . 112 Sperificadonof Li Aethod és T T

11.3 Peat of Lats Malad caibineit d? 114 The Baloch Fold occc0ccciicncovertineraedecressoatecnavsn LOS

ructions Using Origami.......................... 141 e's Trisection of an Angle « acioccccncsasecdviweesweseveness LE 12.2 Martin’s Trisection of an Angle ........600.20000000eeee eens 143

12.3 Messer’s Doub ... 144 TZA Beloch's Doubling of at bey suunedsieiesticeriasinsetdenany dt 12.5 Construction of a Regular Nonagon ........................... 148

A opine Is dna ce eer ee ee Se ente hehe ead ol

Sas US hie ARI a ee 13.5 Construction of a Line Segment as a Ratio of Se eg 13.6 Construction of the Interse tion of Two Lines . EE EEE 13.7 Construction of the Intersection of a Line and a Circle .. acoeq ll

14. 1 What fen

14.2 Construction of a Line Parallel to a C 14.3 Construction of a Perpendi

14.4 Copying a Line Segment in : it 14.5 Construction a a Line e Segment as a Ratio of Segments ..

xvi Contents 15 Are Triangles with Equal Areas and Perimeters Congruent? ........ 175 13.1 From a Trhanele toan Eliphe ture erressirieret eutsi reiri 175 15.2 Solving the Equation for the Elliptic Curve ..................... 178 15.3 Derivation of a Triangle From the Elliptic Curve ................ 180 16 Construction of a Regular Heptadecagon ......................... 183 16.1 Construction of Regular Polygons ................. 0.00. eee eee 184 16.2 The Fundamental Theorem of Algebra. .ucccsccsscnssceneseencens 185 16.5 Roots ONY 225320525 08ca¢0se48o E reeerensees 185 16.4 Gauss’s Proof That a Heptadecagon Is Constructible ............. 187 16.5 Derivation of Gases FORMULA vh.ckcs jadeseten ged eesieeen dads 191 16.6 Construction of a Heptadecagon .+.:sccrccrsrrsccrorierirecses 192 16.7 Construction of a Regular Pentagon ....4.¢2023 i552 ee0ebhe ss ones 196 A Theorems From Geometry and Trigonometry ..................... 199 A.1 Theorems About Triangles .............. 0.002. eee 199 A.2 Trigonometric Identities .......... 00... eee ee eee 202 A.3 The Angle Bisector Theorems ................ 0.0.0.0 cee eee eee 211 ALA Ptolemy's Theorem... cer ceeris eee gee see te tria eo be eee 212 AS Ceva s Theorems: ee i once tie bo there ae eee wee EO 215 A.6 Menelaus’s Theorem............. 0... eee eee eee 217 References 25:33 anene ened iila EE ia e ee eee) ewes AAE ae 219

Chapter 1

The Collapsing Compass

A

Check for updates |

A modern compass is a fixed compass: the distance between the two legs can be fixed so that it is possible to copy a line segment or a circle from one position to another (Fig. 1.1a). Euclid used a collapsing compass where a fixed distance cannot be maintained (Fig. 1.1b). Teachers often use a collapsing compass consisting of a marker tied to a string that is used to construct a circle on a whiteboard. Itis impossible to maintain a fixed length when the compass is removed from the whiteboard.

`

Fig. 1.la A fixed compass. One leg has a needle that is placed at the center of the circle. A pencil attached to the other leg is used to draw the circle. The legs are joined by a tight hinge so that the distance between the legs (the radius of the circle) is maintained even when the compass is lifted from the paper.

T

Fig. 1.1b A collapsing compass. The user holds a piece of string at the center of the circle. The other end of the string is tied to a pencil and is used to draw the circle. When the compass is lifted from the paper, the fingers (dashed) can easily slip to a new position.

This chapter begins with a discussion of the relevance of studying construction with a straightedge and compass (Sect. 1.1). Section 1.2 compares the two types of

© The Author(s) 2022

M. Ben-Ari, Mathematical Surprises, https://doi.org/10.1007/978-3-031-13566-8_1

2 1 The Collapsing Compass

compasses in the most elementary construction: a perpendicular bisector. Section 1.3 presents Euclid’s method of copying a line segment using a collapsing compass. This proves that any construction that can be done using a fixed compass can be performed using a collapsing compass. Section 1.4 shows a proof of this theorem which seems to be correct, but does not work for all configurations of lines and points. To emphasize that one must not trust diagrams, Sect. 1.5 presents a famous alleged proof that all triangles are isoceles; the proof appears to be correct but it is not because the proof is based on an incorrect diagram.

1.1 Construction with a Straightedge and Compass

Construction with a straightedge and compass used to be the fundamental concept taught in Euclidean geometry. Recently, it has fallen out of favor in school curricula. It is certainly true that the topic has little, if any, practical use. As we show in Sects. 2.2, 2.3, 2.4, 3.4, the Greeks knew how to perform constructions that are impossible with a straightedge and compass by using tools only slightly more advanced. Today, using numerical methods, computers can perform constructions to any desired precision. Nevertheless, I believe that there are advantages to studying constructions:

e Itis more fun and more challenging to learn geometry through constructions than simply to read theorems and proofs.

e Significant breakthroughs in mathematics have been achieved by attempts to find constructions. Chapter 16 presents a construction by Gauss that led to modern abstract algebra, in particular, the theory developed by Evariste Galois.

e Itis somewhat counterintuitive and therefore very interesting that it can be proved that it is impossible to construct some geometric objects.

e Sadly, there a many people who waste years of their lives trying to perform impossible constructions. Students should certainly be aware of the futility of such efforts.

1.2 Fixed Compasses and Collapsing Compasses

Some geometry textbooks present the construction of a perpendicular bisector of a line segment by constructing two circles centered at the ends of the line segment such that the radii are equal and greater than half the length of the segment (Fig. 1.2a). This can only be done with a fixed compass because after drawing the circle centered at A, the distance between the legs of the compass needs to remain fixed to draw the circle centered at B.

1.3 Euclid’s Construction for Copying a Line Segment 3

C C A B A B D D Fig. 1.2a Construction of a perpendicular bi- Fig. 1.2b Construction of a perpendicular bi- sector with a fixed compass sector with a fixed or a collapsing compass

Figure 1.2b shows the construction of a perpendicular bisector with either a fixed or a collapsing compass. Two circles are constructed: one centered at A with radius AB andone centered at B with radius BA. This can be done with a collapsing compass because (obviously) AB = BA, so the compass does not have to “remember” the length of AB to construct a circle centered at B with the same radius. The proof that the line constructed shown in Fig. 1.2a is a perpendicular bisector is not at all elementary because relatively advanced concepts like congruent triangles have to be used. However, the proof that the construction of a perpendicular bisector shown in Fig. 1.2b is correct is simple and based on the fact that AABC is an equilateral triangle. In fact, this is the first proposition in Euclid’s Elements. AC = AB since they are radii of the same circle and, similarly, BC = BA. We have: AC = AB = BA = BC.

Figure 1.3a shows that for the construction with a fixed compass, the triangle will

be an isosceles, not necessarily an equilateral, triangle (Fig. 1.3b).

1.3 Euclid’s Construction for Copying a Line Segment

The second proposition of Euclid’s Elements describes how to copy a given line segment AB to a segment of the same length, one of whose end points is a given point C. Therefore, a fixed compass adds no additional capabilities and a collapsing compass is sufficient, although constructions are easier with a fixed compass.

Theorem 1.1 Given a line segment AB and a point C, a line segment CC’, one of

whose endpoints is C, can be constructed using a collapsing compass, such that AB = CC’ (Fig. 1.4a).

4 1 The Collapsing Compass

C C A B A B D D Fig. 1.3a Construction of an isoceles triangle Fig. 1.3b Construction of an equilateral tri- with a fixed compass angle with a collapsing compass

Proof Construct the line segment AC. Construct the equilateral triangle AACD whose base is AC (Fig. 1.4b). By Euclid’s first proposition, the triangle can be constructed using a collapsing compass. Construct the ray that is an extension of the line segment from D to A, and construct the ray that is an extension of the line segment from D to C (Fig. 1.5a). Construct the circle centered at A with radius AB and denote the intersection of the circle and the ray extending DA by E (Fig. 1.5b). Construct the circle centered at D with radius DE and denote the intersection of the circle and the ray extending DC by F (Fig. 1.6).

DC = DA because AACD is equilateral. AE = AB are radii of the same circle, as are DF = DE. Therefore:

CF = DF - DC = DE- DC = DE- DA=AE=AB. go

The specification of the directions of the rays is essential. The proof here works for any line segment AB and any point C, regardless of its position relative to AB. By specifying directions the “cone” enclosed by the two rays will intersect the circles even if AC > AB (Fig. 1.7).

C A

Fig. 1.4a Copy the line segment AB. The Fig. 1.4b Copying a line segment with a col- orientation of CC” is not important. lapsing compass

1.4 A Flawed Construction for Copying a Line Segment 5

D B B

aS

E

Fig. 1.5a Constructing rays from D Fig. 1.5b Constructing a circle with radius AB

1.4 A Flawed Construction for Copying a Line Segment

Proof Construct three circles: one centered at A with radius AB, one centered at A with radius AC, and one centered at C with radius AC = CA. Denote the intersections of the circles centered at A and C by E and F, respectively, and denote an intersection of the circle centered at C and the circle centered at A with radius AB by D. If AC > AB, the construction is as shown in Fig. 1.8.

Construct a circle centered at E with radius ED. Denote the intersection of this circle with the circle centered at A with radius AC by G. There are two intersections, so choose the one closer to C (Fig. 1.9). CD = CE are radii of the same circle as are AE = AG. By construction the radii CE and AE are equal. Therefore,

CD =CE=AE=AG.

EG = ED are radii of the same circle, so AEAG =% ADCE by side-side-side and LGEA = ZDEC.

Fig. 1.6 Construction of CF = AB

6 1 The Collapsing Compass

Fig. 1.7 Construction for AC > AB

Since: ¿GEC = LGEA-ZCEA = ZDEC-/CEA = ZDEA,

it follows that AADE = ACGE by side-angle-side. AB = AD are radii of the smaller circle centered at A, so GC = AD = AB. o

The proof is correct only if AC > AB. Figure 1.10 shows a diagram where AC < AB and you can see that AB + GC.

F

Fig. 1.8 Construction for copying a line segment (1)

1.5 Don’t Trust a Diagram 7

Fig. 1.9 Construction for copying a line segment (2)

1.5 Don’t Trust a Diagram

Theorem 1.2 (Incorrect, of course) All triangles are isosceles.

Proof (Incorrect) Given an arbitrary triangle AABC, let P be the intersection of the angle bisector of ZBAC and the perpendicular bisector of BC. The intersections of the altitudes from P to the sides AB, AC are denoted by E, F, respectively (Fig. 1.11). AAPE = AAPF because they are right triangles with equal angles œ and common side AP. ADPB = ADPC since they are right triangles, PD is a common side and BD = CD =a. AEPB = AFPC since they are right triangles, EP = PF by the first congruence and PB = PC by the second congruence. By combining the equations we get that AABC is isoceles:

AB = AE + EB = AF + FC = AC. Gi

The logic of the proof is correct, but the diagram upon which the proof is based is not correct because point P is outside the triangle (Fig. 1.12).

Le

y |

Fig. 1.10 A diagram for which the proof doesn’t work

=

8 1 The Collapsing Compass

a

D a

Fig. 1.11 An incorrect proof that all triangles are isoceles

What Is the Surprise?

As a student I took it for granted that a compass has a friction joint that maintains the distance between the point and the pencil when it is lifted from the paper. When the teacher used a compass made from a piece of string and a piece of chalk, I never imagined that it differed from my compass. The article by Gotfried Toussaint was a real surprise, as was his demonstration that post-Euclid proofs were incorrect because they depended on diagrams that made unwarranted assumptions. I recommend the article to readers who wish to deepen their understanding of proofs in mathematics.

Fig. 1.12 Why the construction doesn’t work

1.5 Don’t Trust a Diagram 9

Sources

This chapter is based on [50]. The incorrect construction of the equivalence of the two compasses in Sect. 1.4 is from [37]. A comprehensive English translation of Euclid’s Elements together with an extensive commentary [22] was written by Thomas L. Heath, one of the foremost experts in Greek mathematics.

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

A Check for

Chapter 2 ordets Trisection of an Angle

It is impossible to trisect an arbitrary angle (divide the angle into three equal parts) using only a straightedge and compass. Trisection requires the construction of cube roots, but a straightedge and compass can only construct lengths that are expressions built from integers, the four arithmetic operators and square roots. This was proved by Pierre Wantzel in 1837. Nevertheless, innumerable amateurs continue to attempt to trisect an angle. Their constructions are approximations though they are convinced that the constructions are correct. Section 2.1 presents two such constructions, de- velops formulas for the angles and shows the errors in the approximations.

Greek mathematicians discovered that if other instruments are allowed, angles can be trisected. Section 2.2 explains a construction by Archimedes using a simple instrument called a neusis and Sect 2.3 shows how to double a cube using the neusis. Section 2.4 presents a construction for trisection by Hippias using an instrument called a quadratrix. The rest of the chapter contains a proof of the impossibility of trisecting an angle. Section 2.5 characterizes constructible numbers, Sect. 2.6 relates constructible numbers to roots of polynomials and Sect. 2.7 uses this theory to show that trisecting an angle and doubling a cube are impossible.

2.1 Approximate Trisections

2.1.1 First Approximate Trisection

Construction: Let 6 = ZAOB be an arbitrary angle and without loss of generality assume that A, B are on a unit circle whose center is O. Bisect ZAOB and let C be the intersection of the bisector with the unit circle. Let D be the midpoint of OA and let T be the midpoint of the DC. Denote the angle ZDOT by ¢ (Fig. 2.1).

© The Author(s) 2022 11 M. Ben-Ari, Mathematical Surprises, https://doi.org/10.1007/978-3-03 1-13566-8 2

12 2 Trisection of an Angle

Fig. 2.1 First approximate trisection (1)

Theorem 2.1

2 sin(6/2) 1+2cos(@/2) `

tan @ =

Proof Figure 2.2 is extracted from Fig. 2.1 and contains additional annotations. Let CF be the perpendicular to OA that intersects OA at F. Since OC = 1, CF = sin(6/2) and OF = cos(@/2). Let TE be the perpendicular to OA that intersects OA at E. T is the midpoint of DC so DT = TC = a. But FT is the median to the hypotenuse of a right triangle, so FT = a and therefore ADT F is isoceles. It follows that TE is the both the median and the altitude of DF. From the diagram it is easy to see that:

Compute the length 2a = CD using Pythagoras’s Theorem in ADCF:

@ 1\7 6 ae vt 2U (2a)~ = [cos 5 5) +sin

2.1 Approximate Trisections 13

sin(@/2)

< cos( 0/2) > < (cos(@/2)-(1/2)) >

<— (1/2) + (1/2) (cos(@/2) (1/2)) —>

Fig. 2.2 First approximate trisection (2)

The length h = TE can be computed from Pythagoras’s Theorem in ADTE:

h tan ġ = = ———<—— = ——. 0 OE ee eae 1+2cos = 2 2 2 2 2

This is an approximation to a trisection ¢ = 6/3. For 6 = 60°:

4 | 2 sin 30° tan

-1 o o ear ee Í = 20.1° x 20°. 1 +2 cos z) ee 0 j

Table 2.1 shows the errors for a range of acute angles. The error is relatively small for small angles, rising to 1% at 85°.

14 2 Trisection of an Angle

Table 2.1 Errors in the first approximate trisection

a(°) 0/3(°) tan™! (°) Error(°) Error(%)

5 1.667 1.667 0.000 0.004 10 3.333 3.334 0.000 0.014 15 5.000 5.002 0.002 0.032 20 6.667 6.670 0.004 0.057 25 8.333 8.341 0.007 0.088 30 10.000 10.013 0.013 0.128 35 11.667 11.687 0.020 0.174 40 13.333 13.364 0.030 0.228 45 15.000 15.043 0.043 0.289 50 16.667 16.726 0.060 0.358 55 18.333 18.413 0.080 0.435 60 20.000 20.104 0.104 0.520 65 21.667 21.799 0.133 0.612 70 23.333 23.500 0.166 0.713 75 25.000 25.206 0.206 0.823 80 26.667 26.918 0.251 0.941 85 28.333 28.636 0.303 1.068

2.1.2 Second Approximate Trisection

Construction: Let 6 = ZAOB be an arbitrary angle and without loss of generality assume that A, B are on a unit circle whose center is O. Construct a circle of radius 1/3 with center O and let D be its intersection with OA. Bisect ZAOB and let C be the intersection of the bisector with the circle of radius 1/3. Construct the chord CD and the chords AE = ET = CD. Since equal chords subtend equal central angles ZTOE = ZEOA = ¢ (Fig. 2.3).

Theorem 2.2 1 2.9 cos¢@=1- g0 —cos(@/2)) = 1 9 sin (0/4). Proof By the Law of Cosines in ADOC:

1 1 1

2 2 2 1 2 CD = (5] + (5) = (5] (5) cos(@/2) = rie cos(6/2)).

By the Law of Cosines in AEOA:

AE = 17 +17-2-1-1-cos¢ =2(1 —cos¢).

2.1 Approximate Trisections 15

Fig. 2.3 Second approximate trisection

Equating the two expressions for CD = AE and simplifying we get: 1 cosġ = 1- z0 cos(0/2)).

Since cos 2e = cos? a sin? a = 1 2 sin? a, and therefore 1 cos 2a = 2 sin? Q,

we have the alternate formula: 2.3 cos ġ = 1- 9 sin~(6/4) . This is an approximation to a trisection = 6/3. For 6 = 60°: -1 1 o o o 2 cos Pall -cos 30°) zx 19.8° = 20°.

Table 2.2 shows the errors for a range of acute angles. This construction is much less accurate than the one in Sect. 2.1.1.

16

Table 2.2 Errors in the second approximate trisection

2 Trisection of an Angle

oC) -A/3(°)

5 1.667 10 3.333 15 5.000 20 6.667 25 8.333 30 10.000 35 11.667 40 13.333 45 15.000 50 16.667 55 18.333 60 20.000 65 21.667 70 23.333 75 25.000 80 26.667 85 28.333

cos™! 2 (°)

1.667 3.332 4.997 6.659 8.319 9.975 11.626 13.273 14.914 16.549 18.177 19.797 21.408 23.011 24.603 26.185 27.756

Error (°)

0.000 0.001 0.003 0.008 0.015 0.025 0.040 0.060 0.086 0.118 0.156 0.203 0.258 0.322 0.397 0.481 0.577

Error (%)

0.007 0.028 0.063 0.113 0.176 0.254 0.346 0.451 0.571 0.705 0.853 1.015 1.192 1.382 1.586 1.805 2.038

2.2 Trisection Using a Neusis

The term straightedge is used instead of ruler because a straightedge has no marks on it. It can only be used to construct a straight line between two given points. Archimedes showed that a neusis, a straightedge with two marks that are a fixed distance apart, can be used to trisect an angle (Fig. 2.4). We define the distance

between the marks to be 1.

Construction: Let a = ZABE be an arbitrary angle in a unit circle with center B, where the radius of the circle equals the distance between the marks on the neusis. Extend the radius EB beyond the circle. Place an edge of the neusis on A and move it until it intersects the extension of EB at D and the circle at C, using the marks so that the length of the line segment CD is 1.! Construct the line AD. Denote CDB = B

(Fig. 2.5).

Fig. 2.4 A neusis

1 This operation is called verging.

2.2 Trisection Using a Neusis 17

Fig. 2.5 The neusis construction for trisecting an angle (1)

Theorem 2.3 8 = a/3.

Proof Construct BC and denote the angles and line segments as shown in Fig. 2.6. AABC and ABCD are isoceles triangles: AB = BC are radii of the same circle and BC = CD by construction using the neusis. Since the sum of the angles of a triangle is equal to 180° and the sum of supplementary angles is also equal to 180°, we have:

e = 180° - 28

y = 180° -€ =28

ô = 180° 2y = 180° 46

a =180°-6-ß=3ß. o

Fig. 2.6 The neusis construction for trisecting an angle (2)

18 2 Trisection of an Angle

2.3 Doubling the Cube with a Neusis

Given a cube C construct another cube with twice its volume. If the volume of C is V its sides are of length WV. The sides of a cube with twice the volume are W2V = V2- VV. , SO if we can construct V2 we can double the cube.

Construction: Construct the unit equilateral triangle AABC and extend CA with another unit line segment to D. Construct rays extending AB and DB. Place the neusis on point C and move it until one mark on the neusis is placed on the ray AB at P and the other mark is placed on the ray DB at Q. Denote CQ = x and BP = y (Fig. 2.7).

Theorem 2.4 x = V2.

Proof Since AABC is equilateral, cos ZCAP = cos 60° = 5 and by the Law of Cosines in AAPC:

CP = AC +AP —2-AC- APcos60° (2.1a)

1 GIs a ly 2 eGeles (2.1b) x 4+2x =y +y. (2.10)

By Menelaus’s theorem (Thm. A.20):

Fig. 2.7 Doubing the cube with a neusis

2.4 Trisection Using a Quadratrix 19

Therefore: 1 1 2 ANST =] (2.2a) yx 1 xy=2. (2.2b)

Substituting Eq. 2.2b into Eq. 2.1c gives:

x? 42x = ae x2 x

42x? =4+2x

x3(x +2) =2(x +2)

x= V2. o

2.4 Trisection Using a Quadratrix

Let ABCD be a square. Let l; be a line segment placed initially at DC and let l2 be a line segment placed initially at AD. Move l; move at a constant linear velocity until it reaches AB and rotate / at a constant angular velocity clockwise on A until it also reaches AB. Assume that they reach AB together. For example, if lz rotates at 1°/second and the side of the square is 9 centimeters, /; must move at 0.1 cm/second. The trace of their point of intersection P is called a quadratrix curve or simply a quadratrix (Fig. 2.8a). Its definition is attributed to the mathematician Hippias.

A B A B

Fig. 2.8a A quadratrix curve Fig. 2.8b A quadratrix compass

20 2 Trisection of an Angle

A quadratrix can be constructed using a quadratrix compass as shown in Fig. 2.8b. It consists of two (unmarked) straightedges that move as described above. A joint constrains them to move together and traces out the curve.

A quadratrix can be used to trisect an angle. Construction: Let ZCDP, = a be an arbitrary angle, where P4 is the intersection of the line defining the angle æ relative to DC and the quadratrix. Construct a line through P; parallel to DC and denote its intersection with AD by E. Denote the line segment DE by t and trisect it (Sect. 2.5) to obtain point F that is t/3 from DC. Let P be the intersection of a line from F parallel to DC and the quadratrix, and denote by @ the angle between DC and DP) (Fig. 2.9).

Theorem 2.5 6 = a/3.

Proof E has y-coordinate 1 t so by construction F has y-coordinate 1 (t/3). Since the constant linear velocity of the horizontal line is proportional to the constant angular velocity of the rotating line 6/a@ = (t/3)/t and 6 = a@/3. o

Fig. 2.9 Trisection of an angle using a quadratrix

2.5 Constructible Numbers 21

2.5 Constructible Numbers

Let / be a line segment defined to be of length 1.

Definition 2.1 A number a is constructible if and only if a line segment of length a can be constructed with a straightedge and compass starting from /.

Given line segment / = AB, construct a line containing AB and use the compass to find a point C on the line that is a distance of 1 from B. Then AC is of length 2 so the number 2 is constructible. A line segment BD of length 1 can be constructed perpendicular to AB at B. The hypotenuse of the triangle AABD is of length V2 so the number V2 is constructible.

Theorem 2.6 A number is constructible if and only if it is the value of an expression built from the integers, the four arithmetic operations {+, —, x, /} and the operation of taking a square root y.

Proof First we show that the values of these expressions are constructible.

Addition and subtraction: Given line segments PO = a and RS = b, construct a circle centered at Q with radius b (Fig. 2.10). Extend PO until it intersects the circle at U. Then PTQU is a line segment, where PT =a—band PU =a+b.

P e

| S < a > ig ee b < a+b

Fig. 2.10 Construction of addition and subtraction

22 2 Trisection of an Angle

Ze Zo

b C b'c e wo O O a D A a/b D A < ab > < a > Fig. 2.11a Construction of multiplication Fig. 2.11b Construction of division

Multiplication: By similar triangles in Fig. 2.11a, (1/b) = (a/OA), so OA = ab. Division: By similar triangles in Fig. 2.11b, (1/b) = (OD/a), so OD = (a/b).

Square roots: Given a line segment BC = a, construct AB = 1 + a and a semicircle with AB as its diameter. Construct a perpendicular at C and let D be the intersection of the perpendicular and the circle (Fig. 2.12). ZADB is a right angle because it is subtended by a diameter. By similar triangles (h/1) = (a/h), so h? = a and h = Ya.

To prove the converse of the theorem, we need to determine what expressions can be constructed by a straightedge and compass. There are three constructions:?

1. Two lines intersect in a point (Fig. 2.13a). The coordinates of the intersection can be derived from the equations of the two lines y = x and y = 4x 2. The point of intersection is P = (2/3,2/3).

2. A line intersects a circle in zero, one or two points (Fig. 2.13b). The coordinates of the intersections can be derived from the equations of the line y = x and the circle x? + y? = 4. The points of intersection are P = (V2, V2) and Q = (-V2, - V2).

Fig. 2.12 Construction of a square root

2 For clarity these are illustrated for specific values rather than the most general equations.

2.5 Constructible Numbers 23

4 J 4 3 3 2 2 P 1 1 -3 -J -1 1 2 3 4 -3 1 2 3 4 -1 |/ Q -1 -2 -2 7 -3 Fig. 2.13a The point of intersection of two Fig. 2.13b The points of intersection of a line lines and a circle

3. Two circles intersect in zero, one or two points (Fig. 2.14). The coordinates of the intersections can be derived from the equations of the two circles (x—1)?+y? = 4, (x + 1)? + y? =4. The points of intersection are P = (0, V2), Q = (0,-V2). o

Fig. 2.14 The points of intersection of two circles

24 2 Trisection of an Angle

2.6 Constructible Numbers As Roots of Polynomials

To show that a number is not constructible, we need to prove that it cannot be expressed using just integers and the operations {+, —, x, /, y}.

We will show that constructible numbers are the roots of a certain class of polynomials and then prove that trisecting an angle and doubling a cube require the construction of roots of polynomials that are not in this class. Today these results are proved using field theory from abstract algebra, but here I give a proof that uses elementary mathematics. The proof is based on the following definition.

Definition 2.2 The depth of an expression built from the integers and the operators

{+, —, x, /, y} is the maximum level of nesting of square roots.

Example 2.1 Consider the following expression:

174 3V17 - 434- 2V17 - 2434 + 2VI7.

The depth is 3 because at the right of the expression we have V17 which is nested

within V34 + 2V17, which in turn is nested within V17 +: +- 2434 + 2 V17.

Theorem 2.7 A expression of depth n can be expressed as a + byc where a, b, c are

expressions of depth at most n 1.

Proof Simple computations show that the expressions (a, + b Vc) op (a2 + bayc) for the operators op = {+, —, x} result in expressions a + byc of depth n 1. For division the computation is a bit more complicated:

ai + biye z (a, + bi yc) (az bay c2) ay+boVe (ax + byc) (az - bac)

aiaz bbc a2bı aıb2

= + ve,

2_ 72 2_ 72 as b5c as b5c

which is of the form a + byc and of depth n 1. Finally, the square root of an expression of depth n 1 is an expression of depth n. o

Theorem 2.8 Let p(x) be a monic cubic polynomial with rational coefficients: Su 2 p(x) =x" +ax" +a\x+a0,

and let r = a + byc be a root of p(x) of minimal depth n, where a, b, c are of depth (at most) n 1. Then r’ = a byc is a root of p(x) andr + r'.

2.6 Constructible Numbers As Roots of Polynomials 25 Proof Let us compute p(r) which is equal to 0 since r is a root: (a+ byc)? + ax(a + byc)? + a; (a + byc) + ao (a? + 3a? byc + 3ab?c + becyc) +ay(a* + 2abyc + b?c) + ai (a + byc) + ag

(a> + 3ab*c + ana” + a2b?c + aja + ao) + (3a*b + b?’c + 2anab + a,b) Ve = d+evc =0.

where d, e are expressions of depth n 1 formed from the rational coefficients and a,b,c. Then yc = —d/e, so a + byc can be expressed as an expression of depth n 1, contracting the assumption that a + byc is of minimal depth n. Since yc + 0 and is of depth n, for d + ec to be zero it must be that d = e = 0.

Consider now r’ = a byc. By examining the above computation we see that p(r’) =d-evc =0+0- ve =0, sor’ is also a root of p.

Ifr =r’ then 0 =r r’ = 2byc, which is true only if b = 0 so r,r’ would be of depth n 1, again contradicting the assumption. o

Theorem 2.9 [fa monic cubic polynomial with rational coefficients: ee) 2 p(x) =x" +a2x^ + a,x + ao has no rational roots then none of its roots is constructible.

Proof By the Fundamental Theorem of Algebra (Thm. 16.1) p(x) has three roots r1, r2,r3. Let = a + byc be a root of minimal depth n. By the assumption that there are no rational roots, n > 1, and therefore b + 0 and c + 0. By Thm. 2.8, r2 =a byc is also a root. Perform the following multiplication:

(x = ri) (x = r2)(x = r3) = x? (r1 + r2 + r3)x? (2.3a) + (r1r2 + r1r3 +r2r3)X + r1r2r3 (2.3b)

az = —(r1 +r2 + r3) (2.3c)

r3 = —-(a2 +r, +r2). (2.3d)

Since az is rational so is: r3 = —d2 (rı +2) = —a2 2a,

contradicting the assumption. o

26 2 Trisection of an Angle

2.7 Impossibility of the Classical Constructions

Theorem 2.10 V2 is irrational.

Proof Assume that V2 is rational and equal to p/q where p, q are integers with no common factors other than +1. Then:

(p/q)° = (V2)° p> = 293, so p must be divisible by 2, say p = 2r. Now: 87° = 2q° g@=4r, so q is divisible by 2, contradicting the assumption that p, q have no common factor.O

Theorem 2.11 2 has no rational roots so it is impossible to double a cube with a straightedge and compass.

Proof One of its roots is V2 which by Thm. 2.10 is irrational. The other roots are the roots of the quadratic equation x? + V2x + (V2)? obtained by dividing x? 2 by x V2. It is easy to check that its roots are not rational (in fact, not even real). O

Theorem 2.12 Jt is impossible to trisect an arbitrary angle with a straightedge and

compass.

Proof It is sufficient to show the impossibility for one angle. Let us try to trisect 60° to obtain 20°. By Thm. A.6:

cos 3a@ = 4 cos? a 3 cos œ cos 60° = 4 cos? 20° 3 cos 20° .

Denote x = cos 20° and 2x by y. Since cos 60° = 1/2 we have:

E EG ie

8x3 - 6x -1=0

y -3y-1=0.

To prove that the polynomial y3 —3y—1 has no rational roots suppose that y = a/b is a rational root with a, b having no common factor other than +1. Then:

2.7 Impossibility of the Classical Constructions 27

(a/b 3(a/b) -1 =0 (2.4a) a> —3ab* = b? (2.4b)

a(a 3b’) = b’? (2.4c)

a? = b(b* + 3ab) . (2.4d)

By Eq. 2.4c, b must be divisible by a, and by Eq. 2.4d, a must be divisible by b, which is possible only if a = b = +1 and a/b = +1. By computation, y = a/b = 1 and y = a/b = —1 are not roots of the polynomial. o

An alternate way of proving the impossibility of the constructions is to use the following theorem which we present without proof.

1

Theorem 2.13 [fa monic polynomial p(x) = x" + an-1x"7 +++++ ao with integer

coefficients has rational roots then it has integer roots. To show the impossibility of duplicating a cube we need to show that: 3 xX 2 = (x = r2)(x - rı)(x ro)

has no integer roots. Since rorır2 = —2, all roots must divide 2, so the only possible

integer roots are +1, +2. A quick computation shows that none of them are roots. To show the impossibility of trisecting an angle we need to show that y? 3y 1

has no integer roots. An integer root must divide —1 but neither 1 nor —1 are roots.

What Is the Surprise?

Underwood Dudley has made an extensive study of what he calls “cranks” who waste years of their lives trying to trisect angles with a straightedge and compass. Not only do they delude themselves into thinking that this is possible, but, even worse, they think that a solution would be important. Of course, a solution would have no practical use, since tools such as the neusis and quadratrix can solve the problem exactly. The sheer number of such constructions is surprising, especially since many of them are clever and achieve good approximations. Computing the formulas associated with the constructions is an excellent exercise in trigonometry.

It is also surprising that proofs of the impossibility of these geometric construc- tions are purely algebraic using properties of roots of polynomials.

28 2 Trisection of an Angle

Sources

Wikipedia [51, 58, 62] is a good source for the constructions in this chapter. The two approximate trisections are from [15, pp. 67—68, 95-96]. The second example is attributed to the famous philosopher Thomas Hobbes. Both [31, pp. 48—49] and [15, pp. 6-7] discuss trisection using the quadratrix. The doubling of the cube using a neusis is taken from [14].

A rigorous treatment of constructibility can be found in textbooks on abstract algebra such as [17], which contains a general proof of the converse of Thm 2.6 in Sect. 32. Theorem 2.13 is Thm. 23.11 of [17]. A relatively accessible presentation of Wantzel’s proof can be found in [48]. My presentation of constructibility is based upon the presentations in [11, Chap. III] and [27].

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

A Check for

Chapter 3 si Squaring the Circle

Squaring the circle, the construction of a square with the same area as a given circle, is one of the three construction problems that the Greeks posed but were unable to solve. Unlike trisecting the angle and doubling the cube, where the impossibility follows from properties of the roots of polynomials, the impossibility of squaring the circle follows from the transcendentality of z: it is not the root of any polynomial with rational coefficients. This is a difficult theorem that was proved in 1882 by Carl von Lindemann.

Approximations to m ~ 3.14159265359 have been known since ancient times. Some simple but reasonably accurate approximations are:

22 333 355 7“ 3.142857, Toe ~ 3.141509, iB“ 3.141593.

We present three constructions by a straightedge and compass of approximations to x. One is by by Adam Kochański (Sect. 3.1) and two are by Ramanujan (Sects. 3.2, 3.3). Section 3.4 how to square the circle using the quadratrix.

The following table shows the formulas for the lengths that are constructed, their approximate values, the difference between these values and the value of x and the error in meters that results if the approximation is used to compute the circumference of the earth given that its radius is 6378 km.

Construction Formula Value Difference Error (m) m 3.14159265359 - Pe 40 5 Kochański z7 2V3 3.14153333871 5.93 x 10 757 i 355 7 Ramanujan 1 TB 3.14159292035 2.67 x 10 3.4 192\ "4 Ramanujan 2 (> + = 3.14159265258 1.01 x 107° 0.013 © The Author(s) 2022 29

M. Ben-Ari, Mathematical Surprises, https://doi.org/10.1007/978-3-03 1-13566-8 3

30 3 Squaring the Circle

3.1 Kochanski’s Construction

Construction (Fig. 3.1):

1. Construct a unit circle centered at O, let AB be a diameter and construct a tangent to the circle at A.

2. Construct a unit circle centered at A and denote its intersection with the first circle by C. Construct a unit circle centered at C and denote its intersection with the second circle is D.

3. Construct OD and denote its intersection with the tangent by E.

4. From E construct F, G, H, each at distance 1 from the previous point.

5. Construct BH.

Fig. 3.1 Kochański’s approximation to 7

Theorem 3.1 BH = Vs -2V3 x x.

Proof Figure 3.2 is an enlarged extract from Fig. 3.1, where dashed line segments have been added. Since all the circles are unit circles the lengths of the dashed lines are 1. It follows that AOCD is a rhombus so its diagonals are perpendicular to and bisect each other at the point labeled K. AK = 1/2.

3.1 Kochaniski’s Construction 31

Fig. 3.2 Detail from Kochariski’s construction

The diagonal AC forms two equilateral triangles AOAC, ADAC so ZOAC = 60°. Since the tangent forms a right angle with the radius OA, ZKAE = 30°. Now:

1/2 1/2 _ oog39° = X2 EA i 2 Ae

v3

aH =3-EA=(3- d).

AABH is aright triangle and AH = 3 EA, so by Pythagoras’s Theorem:

BH = AB +AA 27 - 1 4 + 6V3 + = 0 ays

=4 = 3 3

BH = Vs 2V3 x 3.141533387 x 7.

32

3 Squaring the Circle

3.2 Ramanujan’s First Construction

Construction (Fig. 3.3):

OMAN NMN PWN Fe

m pi o

. Construct a unit circle centered at O and let PR be a diameter. . Construct the point H that bisects PO and the point T that trisects RO (Thm. 2.6). . Construct the perpendicular at T that intersects the circle at Q. . Construct the chords RS = QT and PS.

. Construct a line parallel to RS from T that intersects PS at N. . Construct a line parallel to RS from O that intersects PS at M. . Construct the chord PK = PM.

. Construct the tangent at P of length PL = MN.

. Connect the points K, L, R.

. Find point C such that RC is equal to RH.

. Construct CD parallel to KL that intersects LR at D.

Fig. 3.3 Ramanujan’s first construction

3.2 Ramanujan’s First Construction 33

Theorem 3.2 RD = aces xn

Proof RS = QT by construction and by Pythagoras’s Theorem for AQOT: = eed 2\7 V5 RS = QT =4|1?- |=] ==. Npe =

ZPSR is subtended by a diameter so APSR is a right triangle. By Pythagoras’s theorem:

By construction MO || RS so AMPO ~ ASPR and:

PM PS PO PR PM _ V31/3 1 2 aes I Pu =~

By construction NT || RS so ANPT ~ ASPR and:

PN PS

PT PR

PN _ V31/3

53 2

pyp =- N31 18

EPE a EER 5 1\ {3I

MN = PN - PM = V31(|— - -|= —_. y a| 5] 9

APKR is aright triangle because ZPKR is subtended by a diameter. By construction PK = PM and by Pythagoras’s Theorem:

ALPRis aright triangle because PL is a tangent so ZLPR is aright angle. PL = MN by construction and by Pythagoras’s Theorem:

34 3 Squaring the Circle

2 a-f] 2.

By construction RC = RH = 3/2 and CD || LK. By similar triangles:

RD RL RC RK RD _ V355/9 3/2 113/6 TEA 113 —2 355

RD = z 3.14159292 ZT. 113 3 59292035 x m

In Fig. 3.4 the line segments are labeled with their lengths. o

PS=V31/3 v31/9

RD=¥V355/113

v31/9 PR=V31/6

RK=VI113/6

Fig. 3.4 Ramanujan’s first construction with labeled line segments

3.3 Ramanujan’s Second Construction 35

3.3 Ramanujan’s Second Construction

Construction (Fig. 3.5):

1.

NNW ND

on

Construct a unit circle centered at O with diameter AB and let C be the intersection of the perpendicular to AB at O with the circle.

. Trisect the line segment AO such that AT = 1/3 and TO = 2/3 (Thm. 2.6).

. Construct BC and find points M, N such that CM = MN = AT = 1/3.

. Construct AM, AN and let P be the point on AN such that AP = AM.

. From P construct a line parallel to MN that intersects AM at Q.

. Construct OQ and then from T construct a line parallel to OQ that intersects AM

at R.

. Construct AS tangent to A such that AS = AR. . Construct SO.

Fig. 3.5 Ramanujan’s second construction

36 3 Squaring the Circle 192 1/4

Theorem 3.3 3VSO = [> + 3 mT,

Proof ACOB is a right triangle so by Pythagoras’s Theorem CB = V2 and:

NB = V2 - 2/3.

ACOB is isoceles so ZNBA = ZMBA = 45°. By the Law of Cosines:

AN =BA +BN —2-BA-BN-cosZNBA 2\7 OV. V 2 EEEE

2 9 |22 AN =4|—. 9 Again by the Law of Cosines:

AM =BA +BM —2-BA-BM-cos/MBA

-2+(va-3) -2-2.(v2-3)- 2-8

3 2 9 [19 AM =,/—. 9

By construction QP || MN so AMAN ~ AQAP, and by construction AP = AM:

AQ AP AM AM AN AN

AM 19/9 19 AN 22/9 3V2.

>

AQ =

By construction TR || OQ so ARAT ~ AQAO and:

IR AT AQ AO a a E

AO 3V22 1 9y22°

3.3 Ramanujan’s Second Construction 37

By construction AS = AR and AOAS is a right triangle because AS is a tangent. By Pythagoras’s Theorem:

(oa)

= 192 1/4 192 1/4 3VSO =3 (1 a z) = (> + =| x 3,14159265258 x 7.

In Fig. 3.6 the line segments are labeled with their lengths. o

AM=19/9

AQ=19/3V22

Fig. 3.6 Ramanujan’s second construction with labeled line segments

38 3 Squaring the Circle

3.4 Squaring a Circle Using a Quadratrix

The quadratrix is described in Sect. 2.4.

Let t = DE be the distance that the horizontal straightedge has moved down the y-axis, and let 0 be the corresponding angle between the rotating straightedge and the x-axis. Let P the position of the joint of the two straightedges. The locus of P is the quadratrix curve.

Let F be the projection of P onto the x-axis and let G be the position of the joint when both straightedges reach the x-axis, that is, G is the intersection of the quadratrix curve and the x-axis (Fig. 3.7).

D C t P E l-t y 0 A x F G B

Fig. 3.7 Squaring the circle with a quadratrix

Theorem 3.4 AG =2/z.

Proof Let y = PF = EA = 1-1. Since ona quadratrix @ decreases at the same rate that ¢ increases:

l1-t_ 90

1 x/2

T 0=-(1-t). se)

Let x = AF = EP. Then tan 6 = y/x so:

= = yeotd = yeot (1-1) = yeot Sy. (3.1)

3.4 Squaring a Circle Using a Quadratrix 39

We usually express a function as y = f(x) but it can also be expressed as x = f(y).

To obtain x = AG we can’t simply plug y = 0 into Eq. 3.1, because cot 0 is not defined, so let us compute the limit of x as y goes to 0. First perform the substitution z = (7 /2)y to obtain:

T

2

m 2 (n 2 x=ycot=y= ( ycot y) = —(zcotz), 2 m \2 T

and then take the limit:

] 2ta 2 . (zcosz 2... COS Z 2cos0 2 lim x = lim (z cotz) = lim = = =

é = lim | , 230 T z0 mz30\ sinz m z=0 \ (sin z)/z nx 1 T where we have used lim,_,9(sin z/z) = 1 (Thm. A.12). o

What Is the Surprise?

It is surprising that such accurate approximations to 7 can be constructed. Of course one can’t help but be astonished by Ramanujan’s clever constructions.

Sources

Kochanski’s construction appears in [7]. Ramanujan’s constructions are from [38, 39]. Squaring the circle using the quadratrix is from [31, pp. 48-49] and [62].

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

A Check for

Chapter 4 uodas) The Five-Color Theorem

Maps use colors to distinguish one region from another by ensuring that adjacent regions are colored with different colors. In 1852 Francis Guthrie noticed that a map of the counties of England could be colored using only four countries. The claim that four countries suffice to color any planar map is called the four-color theorem and was only proved in 1976 by Kenneth Appel and Wolfgang Haken. They used sophisticated mathematical arguments to show that if there is a counterexample (a map needing more than four colors), it had to be associated with one of 1834 configurations. They then used a computer to check these configurations.

While the four-color theorem is extremely difficult to prove, the proofs of the five- and six-color theorems are relatively simple (Sects. 4.5, 4.6). On the way to proving these theorems, we define planar maps and graphs (Sect. 4.1), prove Euler’s formula (Sect. 4.2) and show that a planar graph must have vertex whose degree is less than or equal to five. In Sect. 4.3 Euler’s formula is used to show that two graphs are not planar.

In 1879 Alfred B. Kempe published a proof of the four-color theorem, but in 1890 Percy J. Heawood showed that the proof is incorrect. In Sect. 4.7 we present Kempe’s flawed proof and Heawood’s demonstration that it is not correct.

4.1 Planar Maps and Graphs

Definition 4.1 A planar map is a set of regions in the plane separated by boundaries. A coloring of a map is an assignment of a color to each region such that regions sharing a boundary are assigned different colors.

Figure 4.1a shows a five-coloring of a planar map with ten regions. Figure 4.1b shows a four-coloring of the same map.

© The Author(s) 2022 41 M. Ben-Ari, Mathematical Surprises, https://doi.org/10.1007/978-3-031-13566-8_4

42 4 The Five-Color Theorem

Fig. 4.la Five-coloring of a planar map Fig. 4.1b Four-coloring of a planar map

Definition 4.2 A graph is a set of vertices V and a set of edges E, such that each edge is incident with exactly two vertices.

A planar graph is a graph such that no edges cross each other. In a planar graph, areas enclosed by a set of edges are called faces.

A coloring of a planar graph is an assignment of colors to vertices such that no two vertices of the same color are connected by an edge.

Planar maps and planar graphs are dual and it is convenient to investigate coloring problems in graphs rather than maps.

Theorem 4.1 Given a planar map, a planar graph can be constructed such that for each coloring of the regions of the map there is a coloring of the vertices of the graph, and conversely.

Proof Construct one vertex for each region and construct an edge between two vertices if and only if the corresponding regions share a boundary. oO

Example 4.1 Figure 4.2a shows the planar map from Fig. 4.1b and the vertices associated with the regions. Figure 4.2b shows the planar graph that corresponds to the map.

We can further limit our graphs to those whose faces are triangular.

Definition 4.3 A graph is triangular if all its faces are bounded by three edges. A graph can be triangulated if edges can be added so that the graph is triangular. We also say that there is a triangulation of the graph.

Example 4.2 The faces in the planar graph in Fig. 4.2b are triangular since each one is bounded by three edges. The edges are curved so the faces are not triangles, which are polygons whose three edges are straight line segments.

4.1 Planar Maps and Graphs 43

Da

Fig. 4.2a Associating vertices with the re- Fig. 4.2b The planar graph that corresponds gions of a planar map to the planar map

Fáry’s Theorem states that any triangular planar graph can be be trans- formed into an equivalent planar graph whose edges are straight line seg- ments. Therefore, with no loss of generality, proofs can be restricted to planar graphs whose faces are triangles.

Example 4.3 Fig. 4.3 (left) shows that a square can be two-colored, but if it is triangulated (center), four colors are necessary. Our goal is to prove that all graphs can be n-colored for some n. If the triangulated graph is n-colored, so is the original graph, because deleting the extra edges does not invalidate the coloring (right).

Fig. 4.3 Coloring a triangulated graph

44 4 The Five-Color Theorem

4.2 Euler’s Formula

Theorem 4.2 Let G be a connected planar graph with V vertices, E edges and F faces. Then V E+ F =2.

Proof By induction on the number of edges. If the number of edges in the graph is zero, there is only a single vertex and a single face, so 1 -0+ 1 = 2. Otherwise, there is at least one edge e and it connects two vertices v1, v2. Delete edge e.

Case 1: The graph becomes disconnected (Fig. 4.4a). Merge with v2 (Fig. 4.4b). The resulting graph G’ is a planar connected graph and has fewer edges than G, so by the induction hypothesis (V 1) (E 1) + F = 2 since the number of vertices is also reduced by one. Simplifying, we get V E + F = 2 for G.

vI v2 V1, V2

Fig. 4.4a Removing an edge disconnects the Fig. 4.4b Merging two vertices graph

Case 2: The graph remains connected (Fig. 4.5a). G’ has fewer edges than G (Fig. 4.5b), so by the induction hypothesis V (E 1) + (F 1) = 2 since removing

the edge joins two faces into one. Simplifying, we get V E + F = 2 for G. oO e

Fig. 4.5a Removing an edge does not discon- Fig. 4.5b The graph remains connected and

nect the graph has fewer edges

Theorem 4.3 Let G be a connected, triangulated planar graph with E edges and V vertices. Then E = 3V 6.

Proof Each face is bounded by three edges, so E = 3F'/2, where we divided by 2 because each edge has been counted twice, once for each face it bounds. By Euler’s formula:

4.3 Non-planar Graphs 45

Fig. 4.6a Fewer edges than the upper limit Fig. 4.6b In a triangulated graph the number of edges is maximal

E=V+F-2 =V+2E/3-2 =3V-6. o

Example 4.4 The planar graph in Fig. 4.2b has 10 vertices and 3 - 10 6 = 24 edges. Theorem 4.4 Let G be a connected planar graph. Then E < 3V 6.

Proof Triangulate G to obtain G’. E’ = 3V’ 6 by Thm. 4.4. Now remove edges from G” to obtain G. The number of vertices does not change so E < 3V 6. o

Example 4.5 The graph in Fig. 4.6a has 8 edges and 6 vertices and 8 < 3-6-6 = 12. Figure 4.6b shows a triangulated graph with 6 vertices and 3 - 6 6 = 12 edges.

4.3 Non-planar Graphs

Let us take a short detour to show how Thms. 4.2 and 4.4 can be used to prove that certain graphs are not planar.

Theorem 4.5 Ks, the complete graph on five vertices, is not planar (Fig. 4.7a). Proof For Ks, V = 5 and E = 10. By Thm. 4.4 the number of edges must be less than or equal to 3 - 5 6 = 9 so the graph is not planar. o Theorem 4.6 K3 3, the bipartite graph with three vertices on each side, is not planar (Fig. 4.8a).

Proof V = 6 and E = 9. By Thm 4.2 if K3 3 is planar, F = E- V +2 =9-6+2=5. But each face is bounded by four edges (Fig. 4.8b), so E = 4F/2 = 10 #9. o

In 1930 Kazimierz Kuratowski proved a converse to these theorems: if a graph is not planar it contains (in a certain sense) Ks or K3,3.

4 The Five-Color Theorem

ee

Fig. 4.7a Ks is not planar Fig. 4.7b A failed attempt to draw Ks as pla- nar

4.4 The Degrees of the Vertices

Definition 4.4 d(v), the degree of vertex v, is the number of edges incident with v.

Example 4.6 The graph in Fig. 4.2b contains 8 vertices corresponding to the two rings and each vertex is of degree 5. The vertex corresponding to the outer face is of degree 4 as is the vertex corresponding to the inner face. Therefore:

X dv) =5-84+4-2=48. veV

To get the total number of edges divide 48 by 2 because each edge was counted twice, once for each of the vertices it is incident to.

OS

Fig. 4.8a K3 3 is not planar Fig. 4.8b A failed attempt to draw K3,3 as planar

4.4 The Degrees of the Vertices 47

By generalizing the argument we get:

Theorem 4.7 Let d; for i in {1,2,3,...,k} be the number of vertices of degree i in a connected planar graph G with V vertices and E edges, where k is the highest degree of a vertex in V. Then:

k

X dQ) = lid = 28.

veV i=1

Theorem 4.8 Let G be a connected planar graph with E edges and V vertices, and let di for i in {1,2,3,...,k} be the number of vertices of degree i, where k is the highest degree of a vertex in V. Then there must be a vertex v in V such that d(v) < 5.

Proof (1) If there are d; vertices of degree 1, dz vertices of degree 2, . . . , dx vertices of degree k, then V = ya di. From Thms. 4.4 and 4.7:

k i- dj =2E < (3V - 6) =6V -12=6 Jdi - 12.

i=1 i=l

Therefore:

Si-d <6 514-12 i=1 i=l

k X (6-i)d; > 12.

i=l Since 12 > 0 and all d; are positive, for least one 7, 6 i > O and for that i, i < 6.0

Proof (2) Let us compute the average degree of the vertices which is the sum of the degrees divided by the number of vertices:

Dki di

dayg = V

But the sum of the degrees is twice the number of edges which by Thm. 4.4 gives:

2E 6V-12 6 davg = < =6-—<6. avg V V V If the average is less than six there must be a vertex of degree less than six. o

Example 4.7 In Fig. 4.2b the sum of the degrees is 8 - 5 + 2 -4 = 48. There are 10 vertices so the average degree is 48/10 = 4.8 and there must be a vertex of degree 4 or less.

48 4 The Five-Color Theorem

4.5 The Six-Color Theorem

Theorem 4.9 Any planar graph G can be six-colored.

Proof By induction on the number of vertices. If G has six vertices or fewer, six colors suffice. For the inductive step, by Thm. 4.8 G has a vertex v with degree 5 or fewer. Delete vertex v to obtain the graph G’. By the induction hypothesis G’ can be six-colored, but v has at most 5 neighbors and at most 5 colors are used to color them (Fig. 4.9a), so v can be colored using the sixth color (Fig. 4.9b). oO

Fig. 4.9a Five colors suffice for coloring the Fig. 4.9b Color v with the sixth color neighbors of v

4.6 The Five-Color Theorem

Definition 4.5 Let G be a colored planar graph. A (Kempe) chain G’ is a maximal, two-colored, connected subgraph of G.

Theorem 4.10 Any planar graph G can be five-colored.

Proof By induction on the number of vertices. If G five vertices or fewer, five colors suffice. For the inductive step, by Thm. 4.8 G has a vertex v with degree 5 or less. Delete v to obtain G’. By the induction hypothesis, G’ can be five-colored. In G, if the degree of v is less than 5, or if vj,..., v5, the neighbors of v, are colored with four colors or fewer, v can be colored with the fifth color. Otherwise, vj,...,v5 are colored with different colors in G’ (Fig. 4.10, top).

4.6 The Five-Color Theorem

Fig. 4.10 Proof of the five-color theorem

49

50 4 The Five-Color Theorem

Consider vertex which is colored blue and vertex v3 which is colored red. If vı, v3 are not connected by a blue-red path (say if the edge vev7 did not exist), we can exchange the colors along the path from to ve and color v blue. Otherwise, consider the blue-red chain which contains v1, v3. By adding v and the edges vv1, vv3 we obtain a closed path P (double line) that divides the plane into an “inside” region and an “outside” region (Fig. 4.10, middle)

Consider v2 which is colored green and v4 which is colored orange. These vertices cannot be contained in a single green-orange chain, because v2 is inside P and v4 is outside P, so any path connecting them must cross P, contradicting the assumption that the graph is planar. Therefore, they must be contained in two unconnected green- orange chains (double dashed line, in Fig. 4.10, middle). Exchange the colors on the chain containing v2 and then v can be colored green to obtain a five-coloring of G (Fig. 4.10, bottom). oO

The statement that a continuous path from the inside of of a closed con- tinuous curve P to the outside of P must intersect P is the Jordan Curve Theorem. The theorem is intuitively obvious but difficult to prove.

4.7 Kempe’s Incorrect Proof of the Four-Color Theorem

Theorem 4.11 Any planar graph G can be four-colored.

Proof (Incorrect) The base case of the induction and most of the proof is the same as that of the five-color theorem. The new case that must be considered is a vertex v with five neighbors which, by the inductive hypothesis, can be colored with four colors after removing v.

In Fig. 4.1 1a there are two vertices v2, v5 colored blue. Consider the blue-green chain containing v2 and the blue-yellow chain containing vs. The blue-green chain is contained within the closed path defined by the red-yellow chain containing v1, v3 (double line) and the blue-yellow chain in contained within the closed path defined by the red-green chain containing v;, v4 (double dashed line).

Exchange the colors of both the blue-green chain and the blue-yellow chain (Fig. 4.11b). The result is that the neighbors of v are colored with the three colors red, green and yellow, leaving blue free to color v. o

Heawood noted that the closed paths defined by the red-yellow chain and the red-green chain can share red vertices (vı, vg in Fig. 4.12a). When the colors are exchanged in the blue-green and blue-yellow chains, it is possible for blue vertices v6, v7 to be connected (Fig. 4.12b) and the coloring is no longer correct.

4.7 Kempe’s Incorrect Proof of the Four-Color Theorem 51

Fig. 4.11a Blue-green and blue-yellow Fig. 4.11b Exchange the colors of the two Kempe chains Kempe chains

Fig. 4.12a Red-yellow and red-green chains Fig. 4.12b Exchanging colors causes the blue share red vertices vertices to become connected

52 4 The Five-Color Theorem

What Is the Surprise?

The four-color theorem is notorious because it is so easy to state but extremely difficult to prove. Therefore, it is surprising that the proof of the five-color theorem is elementary. The clever part of the proof is Thm. 4.8 (a planar graph must have a vertex of at most degree 5), which is a theorem that has nothing to do with coloring. Instead, it results just from counting vertices and edges.

Sources

For the four-color theorem see [49, 54]. The proof of the five-color theorem is based on [1, 53]. [16] presents numerous proofs of Euler’s formula. Kempe’s incorrect proof of the four-color theorem is described in [46].

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

® | Chapter 5 ‘pedates |

How to Guard a Museum

In 1973 Victor Klee asked how many guards are need to observe all the walls of a museum? If the walls form a regular polygon or even a convex polygon, one guard is sufficient (Fig. 5.1).

Fig. 5.1 A museum whose walls form a convex polygon

Consider now a museum with saw-toothed walls (Fig. 5.2). Verify by counting that the museum has 15 walls. Each “tooth” defines a triangle that is shaded gray in Fig. 5.3. A guard placed anywhere within one of the triangles can observe all the walls bounding that triangle (red arrows).

Fig. 5.2 A museum whose walls do not form a convex polygon

© The Author(s) 2022 53 M. Ben-Ari, Mathematical Surprises, https://doi.org/10.1007/978-3-031-13566-8_5

54 5 How to Guard a Museum

Fig. 5.3 Visibility within each “tooth”

If at least one of the guards is placed near the top wall spanning the entire museum, she can observe all the horizontal walls (blue arrows in Fig. 5.4). Thus 5 = 15/3 guards are sufficient to observe all the walls of the museum. Since the triangles do not overlap a guard in one triangle will not be able to observe all the walls of another triangle (green arrow) so 5 guards are necessary.

Fig. 5.4 Visibility of the walls of the museum

The example in Fig. 5.2 can be generalized to n/3 teeth with n walls, so we conclude that at least n/3 guards are necessary. We wish to prove that n/3 guards are sufficient to guard any museum.

Section 5.1 proves that any triangulated polygon can be three-colored. This is used in Sect. 5.2 to prove the theorem that n/3 guards are sufficient. Section 5.3 completes the proof by showing that any polygon can be triangulated.

5.1 Coloring Triangulated Polygons

Definition 5.1 A diagonal a of polygon is an edge connecting two vertices that is not one of the (outside) edges of the polygon.

Definition 5.2 A polygon can be triangulated if non-intersecting diagonals can be constructed such that the interior of the polygon is covered by triangles.

Theorem 5.1 Any polygon can be triangulated.

We defer the proof of Thm. 5.1.

5.1 Coloring Triangulated Polygons 55

Definition 5.3 A vertex of a polygon is convex if its interior angle is less than 180°; a vertex is concave if its interior angle is greater than 180°.

In Fig. 5.5 vertex 1 is convex and vertex 2 is concave.

Fig. 5.5 A polygon with a convex vertex (1) and a concave vertex (2)

Definition 5.4 A polygon with vertices V can be three-colored if there is a map: c: V > {red, blue, green} , such that no edge has two vertices that are assigned the same color.

Theorem 5.2 A triangulated polygon can be three-colored.

Proof By induction on the number of vertices. A triangle can be three-colored. A triangulated polygon with n > 3 vertices must have a diagonal. Choose an arbitrary diagonal AB (Fig. 5.6a) and divide the polygon along this diagonal into two smaller polygons (Fig. 5.6b). By induction each of these smaller polygons can be three- colored (Fig. 5.7a).

Since the colors assigned are arbitrary, if different colors are assigned to A, B in the two polygons, we can rename the colors in one of them so that the colors of A, B are the same in both polygons. For example, in Fig. 5.7b exchange red and green in the lower polygon. Paste the two polygons together to recover the original polygon with n vertices. It will be three-colored (Fig. 5.8). o

56 5 How to Guard a Museum

A B ay Fig. 5.6a An arbitrary diagonal in a polygon Fig. 5.6b Divide the polygon A B A a Fig. 5.7a Three-color the two smaller poly- Fig. 5.7b Exchange the colors of one polygon

gons to match the other

Fig. 5.8 Paste the two smaller polygons back together

5.2 From Coloring of Polygons to Guarding a Museum Theorem 5.3 A museum with n walls can be guarded by n/3 guards.

Proof By Thm. 5.1 the polygon can be triangulated and by Thm. 5.2 the polygon can be three-colored. All three vertices of each triangle in the triangulation must be colored by different colors in order to satisfy the condition of being three-colored. Since the polygon is three-colored, at least one color, say red, can appear at most n/3

5.3 Any Polygon Can Be Triangulated 57

Fig. 5.9 The exterior angles of a convex polygon

times, and every triangle must have a vertex colored red. Station a guard at each red vertex; she can observe all the walls of the each triangle the vertex belongs to. Since the triangles of the triangulation include all the edges of the polygon, n/3 guards are sufficient to observe all the walls of the museum. o

If n is not divisible by 3 the number of guards needed is |n/3], the largest integer less than or equal to n/3. For example, 4 guards are sufficient for museums with 12, 13, 14 walls since | 12/3] = [13/3] = [14/3] = 4. For simplicity we ignore this complication.

5.3 Any Polygon Can Be Triangulated

Theorem 5.4 The sum of the interior angles of a polygon with n vertices is:

180° (n - 2).

Proof Consider a convex polygon and denote its exterior angles by 6; (Fig. 5.9). As you move from one dashed line in sequence to the next dashed line, you complete a

rotation around a circle so: n

Xo: = 360°.

i=]

For each exterior angle 9; denote its corresponding interior angle by ¢;. Then:

y 6; = Vaso ġi) = 360° i=1 i=]

2, $i = n- 180° 360° = 180° (n 2) . i=l

58 5 How to Guard a Museum

Fig. 5.10 A concave vertex

If there is a concave vertex (B in Fig. 5.10), there is a triangle formed by the two edges incident with the concave vertex and the line AC connecting the other two vertices. By summing the angles of the triangle we get:

(180° a) + (360° 8) + (180° y) = 180° a+B+y =3-180°.

The sum of the interior angles increases by œ + 8 +y while the number of vertices increases by three preserving the equation in the theorem:

Y gi + (a+ B+ 7) = 180°(n 2) +3 - 180°

i=l = 180°((n +3) - 2). o Theorem 5.5 There must be at least three convex vertices in a polygon.

Proof Let k be the number of concave vertices where the interior angle of each is 180° + €;, c; > 0. The sum of the interior angles of the concave vertices is certainly less than or equal to the sum of the interior angles of all the vertices:

k k - 180° + 3 ci < 180°(n—2) {=l

A

k (k +2) 180° + > é; < n- 180° {=l (k +2) - 180° < n- 180° k<n-2.

It follows that there must at least three vertices that are convex, not concave. o

Proof (Theorem 5.1) By induction on the number of vertices. For n = 3 there is nothing to prove. If n > 3, by Thm. 5.5 there must be a convex vertex C. Label its adjacent vertices by B, D. If BD is contained within the polygon (Fig. 5.1 1a), it is

5.3 Any Polygon Can Be Triangulated 59

E Cc E C D D I I A B A B Fig. 5.1la Triangulation where a diagonal is Fig. 5.11b Triangulation where a diagonal is contained within the polygon not contained within the polygon

a diagonal and the polygon can be split into ABCD and another polygon ABDE with BD as an edge and which is smaller than the original polygon (Fig. 5.11a). By the inductive hypothesis, the polygon can be triangulated and then pasted back to ABCD, triangulating the original polygon.

If BD is not contained in the polygon, there must be concave vertex F that is closest to C (Fig. 5.11b). CF is a diagonal and splits the polygon into two smaller polygons CFED and CFAB. By the inductive hypothesis these can be triangulated and pasted together. o

What Is the Surprise?

The museum theorem is suprising because what seems to be a theorem in geometry is proved rather elegantly by an appeal to coloring a graph.

Sources

This chapter is based on [1, Chap. 39].

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

A Check for

Chapter 6 isi Induction

The axiom of mathematical induction is used extensively as a method of proof in mathematics. This chapter presents inductive proofs of results that may not be known to the reader. We begin with a short review of mathematical induction (Sect. 6.1). Section 6.2 proves results about the familiar Fibonacci numbers while Sect. 6.3 proves results about Fermat numbers. Section 6.4 presents the 91-function discovered by John McCarthy; the proof is by induction on an unusual sequence: integers in an inverse ordering. The proof of the formula for the Josephus problem (Sect. 6.5) is also unusual because of the double induction on two different parts of an expression.

6.1 The Axiom of Mathematical Induction

Mathematical induction is the primary method of proving statements to be true for an unbounded set of numbers. Consider:

l=1, 142=3, 14+24+3=6, 14+24+3+4=10. We might notice that: 1=(1-2)/2, 3=(2-3)/2, 6=(3-4)/2, 10=(4-5)/2,

and then conjecture that for all integers n > 1:

n

yi B mnt 1)

i=l

If you have enough patience, checking this formula for any specific value of n is easy, but how can it be proved for all of the infinite number of positive integers? This is where mathematical induction comes in.

© The Author(s) 2022 61 M. Ben-Ari, Mathematical Surprises, https://doi.org/10.1007/978-3-03 1-13566-8_6

62

6 Induction

Axiom 6.1 Let P(n) be a property (such as an equation, a formula, or a theorem),

where n is a positive integer. Suppose that you can:

¢ Base case: Prove that P(1) is true.

e Inductive step: For arbitrary m, prove that P(m + 1) is true provided that you

assume that P(m) is true.

Then P(n) is true for all n > 1. The assumption that P(m) is true for arbitrary m is

called the inductive hypothesis.

Here is a simple example of a proof by mathematical induction.

Theorem 6.2 Forn > 1:

Proof The base case is trivial:

ye ue)

i=1

The inductive hypothesis is that the following equation is true for m:

re man D)

i=1

The inductive step is to prove the equation for m + 1:

m+1 m vi = ` i+(m+1) i=l i=l m(m +1) (m+1)(m+2)

Se aa +(m+1)=

2

By the principle of mathematical induction, for any n > 1:

si = mnt 1) l

i=1

o

The inductive hypothesis can be confusing because it seems that we are assuming what we are trying to prove. The reasoning is not circular because we assume the truth of a property for something small and then use the assumption to prove the

property for something larger.

Mathematical induction is an axiom so there can be no question of proving induc- tion. You just have to accept induction like you accept other axioms of mathematics such as x + 0 = x. Of course, you are free to reject mathematical induction, but then

you will have to reject much of modern mathematics.

6.2 Fibonacci Numbers 63

Mathematical induction is a rule of inference that is one of the Peano axioms for formalizing natural numbers. The well-ordering axiom can be used to prove the axiom of induction and, conversely, the axiom of induction can be used to prove the well-ordering axiom. However, the axiom of induction cannot be proved from the other, more elementary, Peano axioms.

6.2 Fibonacci Numbers

Fibonacci numbers are the classic example of a recursive definition:

fi=l h=1 Sn = fa-1 + fa-2 for n23.

The first twelve Fibonacci numbers are:

1,1,2,3, 5,8, 13,21, 34,55, 89, 144. Theorem 6.3 Every fourth Fibonacci number is divisible by 3. Example 6.1 f4=3=3-1, fg =21=3-7, fig =144=3. 48.

Proof Base case: f4 = 3 is divisible by 3. The inductive hypothesis is that f4n is divisible by 3. The inductive step is:

Sane) = Sansa = fans3 + fans2 = (fans2 + fansi) + fans2 = ((fanti + fan) + fanei) + fanse = ((fansi + fan) + fansi) + (fanei + fan) = 3 fansi +2 fan -

3 fant is divisible by 3 and, by the inductive hypothesis, f4n is divisible by 3. Therefore, f4(n+1) is divisible by 3. o

64 6 Induction

7 n Theorem 6.4 fn < (G) ;

7\' IN 49 a tt ok l Proof Base cases: fi = 1 < Z and h =1 < ri ETA The inductive step is:

Jn+1 = Sh + faa

since:

n_ an 1 M _ tn $ - ? where (o = n (o = ai Proof We first show that ¢? = ¢ + 1: 1 5 á 2 + 21,285 5. 1,35 +1 4 4 4HR 2 =ġ+l1.

Similarly, we can show that ¢* = ¢ + 1. The base case for Binet’s formula is:

z l+y5 _ 1-v5 TE eA

6.2 Fibonacci Numbers Assume the inductive hypothesis for all k < n. The inductive step is: grt! gm! = gg! _ gg! = (p+ 1g"! - (b+) ge"!

= (6" - 6") +(¢" |-¢"") = V5 fr + V5 fai

n+l _ 7nt+l aa = Sh + fn-1 = fnri .

v5

Theorem 6.6

(elt lera) era

n n E n! n! l i a ,) = ka-p! k+Din-(k+))!

R ni(k +1) n\(n k) = k+ Dn- k)! (einen)! 7 n\(n+1) ~ (k+1)'(n—k)! (n+ 1)!

~ (kt DN(n+1) —(k+))! _ n+1 = (ota)

k k! We will also use the equality (3

~ Ok 0)!

We can now prove the theorem. The base case is:

1 1! n=) =a o z

= ] for any k 2 1.

66 6 Induction

The inductive step is:

Jn = fn-1 + fn-2 = o)e 7

6.3 Fermat Numbers

Definition 6.1 The integers F, = 27" + 1 for n > 0 are called Fermat numbers. The first five Fermat numbers are prime: Fo =3, Fr =5, Fo =17, F3=257, Fy = 65537.

The seventeenth-century mathematician Pierre de Fermat claimed that all Fermat numbers are prime, but nearly a hundred years later Leonhard Euler showed that:

Fs = 27 +1 = 23241 = 4294967297 = 641 x 6700417.

Fermat numbers become extremely large as n increases. It is known that Fermat numbers are not prime for 5 < n < 32, but the factorization of some of those numbers is still not known.

Theorem 6.7 For n > 2, the last digit of Fn is 7. Proof The base case is F = 2?’ +1 = 17. The inductive hypothesis is F, = 10k, +7 for some kn > 1. The inductive step is: gn+l 27.21 Qn 2 Fai so? e123 +1=(2 ) +1

ES 2 Ao +1)-1) +1=(F,-1)}+1 = (10k +7 1) +1 = (10k, +6)? +1 = 100k? + 120k, +36 + 1 = 10(10k2 + 12k, +3)+6+1

= l0kn+1 +7, forsome kny 2 1. oO

6.4 McCarthy’s 91-function 67 n-1

Theorem 6.8 Forn > 1, Fy, = I] Fg +2. k=0

Proof The base case is:

0 Fi =| | Fk +2= Fo +2=3+2=5. k=0

The inductive step is:

z a +1 -2) Z + 1)

n\2 n+ = (2"") -1=(? '+1)-2

= nel 2 n Foi =| | Fe +2. o k=0

6.4 McCarthy’s 91-function

We usually associate induction with proofs of properties defined on the set of positive

integers. Here we bring an inductive proof based on a strange ordering where larger

numbers are less than smaller numbers. The induction works because the only

property required of the set is that it be ordered under some relational operator. Consider the following recursive function defined on the intergers:

f(x)=if x > 100 then x- 10 else f(f(x+11)). For numbers greater than 100 the result of applying the function is trivial: f(101)=91, f(102)=92, f(103)=93, f(104) =94,....

What about numbers less than or equal to 100? Let us compute f(x) for some numbers, where the computation in each line uses the results of previous lines:

68 6 Induction

(100) = f(f(100 + 11)) = f(F(111)) = f(101) = 91 (99) = f(f(99 +11) = f(F(110)) = F(100) = 91 (98) = f(f(98 +11) = f(F(109)) = f(99) = 91

F(91) = fF(FO1 +11) = F(F(102)) = f(92) = f (f (103)) = f (93) =-+- = f (98) =91 f(90) = f(f(90+11)) = f(f(101)) = f(91) =91

f(89) = fF(F(89+11)) = f(f(100)) = f(91) =91. Define the function g as: g(x) =if x > 100 then x- 10 else 91.

Theorem 6.9 For all x, f (x) = g(x).

Proof The proof is by induction over the set of integers S = {x |x < 101} using the relational operator < defined by:

y <x ifandonlyif x< y,

where on the right-hand side < is the usual relational operator on the integers. This definition results in the following ordering:

101 < 100 < 99 < 98 < 97 <.---.

There are three cases to the proof. We use the results of the above computations. Case 1: x > 100. This is trivial by the definitions of f and g. Case 2: 90 < x < 100. The base case of the induction is:

f(100) = 91 = g(100),

since we showed that f(100) = 91 and by definition g(100) = 91. The inductive assumption is f(y) = g(y) for y < x and the inductive step is:

f(x) = f(f@+11)) (6.1a) = f(x+11-10) = f(x +1) (6.1b) = g(x + 1) (6.1c) =91 (6.1d) = g(x). (6.1e)

Equation 6.la holds by definition of f since x < 100. The equality of Eq. 6.1a and Eq. 6.1b holds by the definition of f, because x > 90 sox +11 > 100. The equality

6.5 The Josephus Problem 69

of Eq. 6.1b and Eq. 6.1c follows by the inductive hypothesis x < 100, sox+1 < 101 which implies that x + 1 S and x + 1 < x. The equality of Eq. 6.1c, Eq. 6.1d and Eq. 6.le follows by definition of g and x + 1 < 101, sox < 100.

Case 3: x < 90. The base case is: f(89) = f(f(100)) = f(91) = 91 = g(89) by definition of g since 89 < 100.

The inductive assumption is f(y) = g(y) for y < x and the inductive step is:

f(x) = f(f(+ 1) (6.2a) = f(g(x +11)) (6.2b) = f(91) (6.2c) =91 (6.2d) = g(x). (6.2e)

Equation 6.2a holds by definition of f and x < 90 < 100. The equality of Eq. 6.2a and Eq 6.2b follows from the inductive hypothesis x < 90, sox + 11 < 101, which implies that x +11 Sandx+11 < x. The equality of Eq. 6.2b and Eq 6.2c follows by definition of g andx+11 < 101. Finally, we have already shown that f(91) = 91 and g(x) = 91 for x < 90 by definition. o

6.5 The Josephus Problem

Josephus was the commander of the city of Yodfat during the Jewish rebellion against the Romans. The overwhelming strength of the Roman army eventually crushed the city’s resistance and Josephus took refuge in a cave with some of his men. They preferred to commit suicide rather than being killed or captured by the Romans. According the account by Josephus, he arranged to save himself, became an observer with the Romans and later wrote a history of the rebellion. We present the problem as an abstract mathematical one.

Definition 6.2 (Josephus problem) Consider the numbers 1,...,n+1 arranged in a circle. Delete every g’th number going around the circle g,2q,3q,... (where the computation is performed modulo n+1) until only one number m remains. J(n+1,q) = m is the Josephus number for n + 1 and q.

Example 6.2 Let n + 1 = 41 and let q = 3. Arrange the numbers in a circle:

> 123 45 67 8 9101112131415 161718192021 | T 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 2423220 <—

70 6 Induction

The first round of deletions leads to:

> 1234567 8 9101112131475 16178192071 | T 41 40 399 38 37 36 35 34 33 32 31 30 29 28 97 26 25 %4 2322 e

After removing the deleted numbers this can be written as:

124578 10 11 13 14 16 17 19 20 22 23 25 26 28 29 31 32 34 35 37 38 40 41

The second round of deletions (starting at the last deletion of 39) leads to: Y24978 y0 11 13 y4 16 17 y9 20 22 73 25 26 98 29 31 y2 34 35 y7 38 40 41

We continue deleting every third number until only one remains:

2 4 7 811 ¥3 16 17 20 22 25 26 29 31 34 35 38 40 2 4 £11 16 V7 22 25 29 31 35 38

2 4 Yi 16 22 35 31 35

2 416 22 31 35

A 163135

y6 31

31

It follows that J(41,3) = 31.

The reader is invited to perform the computation for deleting every seventh number from a circle of 40 numbers in order to verify that the last number is 30.

Theorem 6.10 J(n + 1,q) = (J(n,q) +q) (mod n +1).

Proof The first number deleted in the first round is the g’th number and the numbers that remain after the deletion are the n numbers:

12... q-l1 q+l1...nn+1 (modn+l1).

Counting to find the next deletion starts with q + 1. Mapping 1,...,” into this sequence we get:

1 2 ...n-q n+l-q n+2-q...n-1 n (mod n+!) l l ) l l l l q+1 q+2... n n+1 1 ...g-2 q-1 (mod n+l).

Remember that the computations are modulo n + 1:

(n+2-q)+q=(n+1)+1 = 1 (mod n+ 1) (n)+q =(n+1)-1+q=q-1 (modn+1).

6.5 The Josephus Problem 71

This is the Josephus problem for n numbers, except that the numbers are offset by q. It follows that:

J(n+1,q) =(J(n,q)+q) (modn+1). go

Theorem 6.11 Forn > 1 there existnumbers a > 0,0 < t < 2%, such that n = +t.

Proof This can be proved from repeated application of the division algorithm with divisors 2°,2!,27, 24, ..., but it is easy to see from the binary representation of n. For some a and ba-1, bg_2,...,b1, bo, where for all i, bi = 0 or b; = 1, n can be expressed as:

n =2" + ba-12%! ee ED’ n = 2% + (ba-12%! +--+ + b02?)

n=2%+t, wheret < 27-1. o

We now prove that there is simple closed form for J (n, 2). Theorem 6.12 For n =2° +t, a > 0,0 < t < 2%, J(n,2) =2t +1.

Proof By Thm. 6.11, n can be expressed as stated in the theorem. The proof that J(n, 2) = 2t + 1 is by a double induction, first on a and then on t.

First induction:

Base case. Assume that t = 0 so that n = 2%. Let a = 1 so that there are two numbers in the circle 1,2. Since q = 2, the second number will be deleted, so the remaining number is 1 and J(2!,2) = 1.

The inductive hypothesis is that J(2“,2) = 1. What is J(2“*!,2)? In the first round all the even numbers are deleted:

1 2 3 4 ... 21-1 2%, There are now 2% numbers left: VS. ee (2SFLET.,

By the inductive hypothesis J(2¢*!,2) = J(2%,2) = 1 so by induction J(n,2) = 1 whenever n = +0. Second induction: We have proved J(2% + 0,2) = 2 - 0 + 1, the base case of the second induction. The inductive hypothesis is /(2% + t,2) = 2t + 1. By Thm. 6.10:

J(2% + (t+1),2) = J(2% +t,2)+2=2t+1+2=2(t+1)+1. o

T2 6 Induction

Theorems 6.11 and 6.12 give a simple algorithm for computing J (n, 2). From the proof of Thm. 6.11:

n=2% +t =2" + (ba-12%! +--+ bo2®),

so t = ba-12%7! +- - -+bo2?. We simply multiply by 2 (shift left by one digit) and add 1. For example, since n = 41 = 2°+23+2° = 101001, it follows that J (41,2) = 2t+1, and in binary notation:

41 = 101001 9 = 01001 2t +1 = 10011 = 16+2+1=19.

The reader can verify the result by deleting every second number in a circle 1,...,41. There is a closed form for J (n, 3) but it is quite complicated.

What Is the Surprise?

Induction is perhaps the most important proof technique in modern mathematics. While Fibonacci numbers are extremely well-known and Fermat numbers are also easy to understand, I was surprised to find so many formulas that I never knew (such as Thms. 6.3 and 6.4) that can be proved by induction. McCarthy’s 91-function was discovered in the context of computer science although it is a purely mathematical result. What is surprising is not the function itself, but the strange induction used to prove it where 98 < 97. The surprise of the Josephus problem is the bidirectional inductive proof.

Sources

For a comprehensive presentation of induction see [21]. The proof of McCarthy’s 91- function is from [30] where it is attributed to Rod M. Burstall. The presentation of the Josephus problem is based on [21, Chapter 17], which also discusses the historical background. That chapter contains other interesting problems with inductive proofs, such as the muddy children, the counterfeit coin and the pennies in a box. Additional material on the Josephus problem can be found in [44, 57].

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

A Check for

Chapter 7 wes | Solving Quadratic Equations

Poh-Shen Loh proposed a method for solving quadratic equations that is based on a relation between the coefficients of the quadratic polynomial and its roots. Section 7.1 reviews the traditional methods for solving quadratic equations. Section 7.2 tries to convince the reader that Loh’s method makes sense and then explains how to compute the roots. In Sect. 7.3 the computation is carried out for two quadratic polynomials and a similar computation for a quartic polynomial. Section 7.4 derives the traditional formula for the roots from Loh’s formulas.

The introduction of algebra and modern algebraic notation is relatively recent. Previously, mathematicians used geometry almost exclusively, so it is interesting to look at al-Khwarizmi’s geometric construction of the formula for the roots of quadratic equations (Sect. 7.5). Section 7.6 shows a clever geometric construction used by Cardano in developing the formula for the roots of cubic equations.

Section 7.8 presents other geometric methods for finding the roots of quadratic equations.! The chapter concludes with Sect. 7.9 which discusses numerical com- putation of the roots of quadratic equations.

7.1 Traditional Methods for Solving Quadratic Equations

Every student of mathematics memorizes the formula for obtaining the roots of a quadratic equation ax? + bx + c = 0:

—b + Vb? —4ac

X1, X2 = a

1 Chapter 11 is a prerequisite for a full understanding of these methods.

© The Author(s) 2022 73 M. Ben-Ari, Mathematical Surprises, https://doi.org/10.1007/978-3-03 1-13566-8_7

74 7 Solving Quadratic Equations

For now we will work with monic polynomials, x? + bx + c = 0, whose roots are:

-b + Vb? -4c sni = —— s. (1.1)

Another method of solving quadratic equations is by factoring the polynomials more-or-less by trial-and-error. Sometimes it is easy to obtain the roots by factoring:

x? -4x +3 = (x-1)(x- 3). (1.2)

It is much harder to factor x? 2x 24 because there are many possible pairs of roots that must be considered:

(+1, 24) , (+2, 12) , (+3, =8) , (+4, F6).

7.2 The Relation Between the Roots and the Coefficients

Theorem 7.1 If r1, 72 are the roots of x? + bx +c then: (x -ri)(x- r2) =x? - (rı +r2)x +rir2 =x +bx+c. Therefore, even if we do not know the values of the roots, we do know that: ry tr2 =-—b, rır =c. (7.3)

There is really nothing to prove because the result emerges from the computation. Consider some values of —b, r1, r2 and let 12 be the average of 71, r2:

For any quadratic equation the average of the two roots is constant:

ceti = (-b-r2)+r2 = b

™\,2

2 2 2

7.2 The Relation Between the Roots and the Coefficients 75

m2 rj mi —12 ><

fitri

‘e m2 r2

Fig. 7.1 Relation between the roots r1, r2 = 2, 6 and their average mj 2 = 4

Let s be any number. Then:

-b= -04s +09 =(Frs}4(P-s}=n4n.

If one root is at distance s from the average, the other root is at distance —s from the average. For r1, r2 = 2,6, where m2 = 4, s = 2, we have:

=b rı) m| mp|mp-ri|m-r2 33} 12) 21| 165) 44| -4% 33} 8| 25| 165) 84] -8% 33 1| 32| 165} 154| -154 -4| -16| 12| -2 14| -14 -4| -4| o -2 2|) -2 -4| -3| -1| -2 1} -1

Figure 7.1 visualizes this relation. If we use other values r1,r2 = 3,5 for which +12 = 8 then mız = 4 remains the same while s becomes | (Fig. 7.2). The offset s seems to be arbitrary in:

E aa _(-b ry = 2 S|, r2 = 7 S|,

but there is an additional constraint rjr2 = c, where c is the constant term in the polynomial. By multiplying the two expressions we have derived for r1, r2, we can determine s and then rj, r2:

a eae b bB o; c= 2 S 7 s=% S

76 7 Solving Quadratic Equations mn2-ri m2 —12 ee ene

ritn

- m]ı2 r2

Fig. 7.2 Relation between the roots r1, r2 = 3, 5 and their average mj 2 = 4

7.3 Examples of Loh’s Method

Example 7.1 Consider the polynomial x? 2x 24 where b = —2, c = —24:

f 2) (-2) e=- +s] (-S-5]

-24 = (1+ s)(1 - s)

s=5 ri=1+5=6 rm =1-5=-4.

Check: (x 6)(x (—4)) = x? 2x 24. Example 7.2 Let us find the roots of x? 83x 2310:

83 83 —2310 = (= + s] (F = |

2 9889, 5349 = 16129

4 4 127 ae 127 ry = a8 =-22 2 2 83 127 = = 1 r2 7 + 7 05

Check: (x +22) (x 105) = x? 83x 2310.

7.3 Examples of Loh’s Method 77

Compare this computation with the computation using the traditional formula:

-b + Vb? 4c _ =(-83) + y¥(-83)? - 4 - (2310)

2 2 _ 83=vVI6I29 83+127 E 2 2 Bob 83-127 a y 2 83 +127 BE =A = 105.

Example 7.3 Theorem 7.1 can be generalized to polynomials of higher degrees. Here is an interesting example for a quartic equation x* 10x? x + 20 = 0. As with quadratic equations there are formulas for solving cubic and quartic equations (though not equations of higher powers), but the formulas are quite complicated.

Does this polynomial of degree four factor into two quadratic polynomials with integer coefficients? If so, the coefficients of the x terms must be equal and of opposite signs since the coefficient of the x* term is zero. Therefore, the form of the quadratic factors is:

f (x) = (x? -nx + ky) (x? + nx + ko). Carrying out the multiplication results in:

f(x) = xt 4nx3 +kox? —nx3 —n?x? —nkox

+k x2 +nk x +kyk2 z Equating the coefficients gives three equations in the three unknowns n, k1, k2 gives:

(ki +k) -= n? = -10 n(kı = k2) =-1 kiko =20.

Since we are looking for factors with integer coefficients, from the last two equations it is clear that:

n=1,k, =4,k.=5 or n= l, = —5, k = —4.

Only n = 1, = —5, ko = —4 satisfy the first equation for the coefficient of x:

f(x) = (x? - x-5) (x7 +x- 4).

78 7 Solving Quadratic Equations

Solving these quadratic equations gives four solutions of the quartic equation:

MESAI -1 + V17

7.4 Derivation of the Traditional Formula

For an arbitrary monic polynomial x? + bx + c, Loh’s formulas are:

earn

the traditional formula for obtaining the roots of a monic quadratic polynomial. If the polynomial is not monic divide it by a, substitute in the equation and simplify:

b c xX +—x4+—=0

a a _ —(b/a) + y (b/a)? -4(c/a)

r1, r2

2 —b + Vb? 4ac

2a

7.5 Al-Khwarizmi’s Geometric Solution of Quadratic Equations

Let us write a monic quadratic polynomial as x? + bx c. The roots can be found by completing the square:

eall

b 2

2 b any)

2 2

(2

etiz

+ b”

etiz b b\? —b+Vb24+4c

E iaa 4 G+ = 5 2 2 2

7.6 Cardano’s Construction for Solving Cubic Equations 79

This is the familiar formula for finding the roots of a quadratic equation, except that 4c has the opposite sign since the coefficient of the constant term was —c.

Completing the square was developed in the 8th century by Muhammad ibn Musa al-Khwarizmi in a geometric context. Given the equation x? + bx = c, assume that there is a square whose side is x so that its area is x”. To the area x? add bx by adding four rectangles of area bx/4 whose sides are b/4 and x (Fig. 7.3a). Now complete the diagram to a square by adding the four little squares of area (b/4)? (Fig. 7.3b).

We can’t construct the diagram in Fig. 7.3a because we don’t know what x is, but the area of the larger square in Fig. 7.3b is:

RA Se. 4 4

which we do know since we are given the coefficients b,c. By constructing the diagram and erasing the small squares whose sides are (b/4)—another known quantity—we obtain the line segment of length x.

Example 7.4 Letx*+12x = 64. Then c+(b?/4) = 64+36 = 100. Itis easy to construct a square of area 100 since each side has length 10. Now subtract (b/4) + (b/4) = 6, the sides of the smaller squares, to get x = 10 6 = 4.

7.6 Cardano’s Construction for Solving Cubic Equations

The formula for the roots of cubic equations was first published in the 16th century by Gerolamo Cardano. We will not develop the formula here, but it is interesting that the central idea is based on a geometric construction similar to al-Khwarizmi’s.

b b 1 b b l 4 x 4 1 4 x 4 1 b b b b 4 4 4 4 x x x x b b b b 4 4 4 4 I b i b ' b x% b 1 4 4 1 4 4 1 | eae ee | Fig. 7.3a The areais x7+4(b/4)x = x?+bx Fig. 7.3b The area is x? + 4(b/4)x +

4(b/4)? = x? + bx + (b*/4)

80 7 Solving Quadratic Equations The construction can be obtained very simply using algebra. By multiplication: (a+b) =a? +3a7b + 3ab* + b? = (a? + b°?) +3ab(atb). (7.4)

Geometrically, we start with a cube whose side is a + b so that its volume is (a + b)’. The cube is decomposed into five pieces. The first two are cubes whose sides are a and b with volumes a? (blue) and b? (red), respectively (Fig. 7.4).

The other three parts are boxes (the technical term is cuboid) each with one side of length a + b coinciding with a side of the cube, one side of length a and one side of length b, so that the volume of each of the three boxes is ab(a + b). In Fig. 7.5, there is one box at the left side of the cube (blue), one at the back of the cube (red) and one at the top of the cube (green). By combining the five solids in Fig. 7.4 and Fig. 7.5 we obtain Eq. 7.4.

b,at+b,at+b atb,atb,atb 0,atb,atb O,atb,a fb abb.a Gales atb,d,atb i b,a,a+b 0,a,a atb,a,a b,a,a 0,at+bs0 atb,atb, b,at+b,0 b,a,0 ‘atb,a,0 at+b,0lat+b 0,0,d+b 0,0,a ‘atb,0,a b,0,a 0,0,0 b,0,0 at+b,0,0

Fig. 7.4 (a? +b?) = (a? +b?) phus

7.7 They Weren’t Intimidated by Imaginary Numbers 81

b,atb,atb a+b,a+b,a+b 0,a+b,a+b 0,a+b,a b,atb,a dalar a+b,d,a+b b,a,a+b 0,a,a a+p,a,a b,a,a O,at atb,at b,at+b,0 b,a,0 ‘atb,a,0 a+b,0ja+b 0,0,d¢b 0,0,a ‘atb,0,a b,0,a 0,0,0 b,0,0 at+b,0,0 Fig. 7.5 (a? +b?) =---+3ab(a+b)

7.7 They Weren’t Intimidated by Imaginary Numbers

The history of mathematics demonstrates a progression of concepts that were initially considered to be meaningless, but were eventually understood, accepted and proved to be useful. “Obviously,” since numbers count things, —1, a negative number, is meaningless. “Obviously,” since numbers are ratios of integers (rational numbers), V2, which can easily be proved to be irrational, is meaningless. “Obviously,” V—1, the square root of a negative number, is meaningless since there is no number—integer, rational or real—whose square is —1.

A full understanding of the square roots of negative numbers, to this day called imaginary numbers although they are no less real than real numbers, was not achieved until the nineteenth century. Therefore, it is surprising that already in the sixteenth century, Geralamo Cardano and Rafael Bombelli refused to be intimidated by the concept, and took the first small steps towards understanding these numbers.

82 7 Solving Quadratic Equations Consider the quadratic equation: x? -10x+40=0. (7.5) By the familiar formula (Eq. 7.1):

10+ V100- 1 pose Ve eens

1,72 = 2

Well, we don’t know anything about the square roots of negative numbers and we don’t know what these values are, but like Cardano we do know by Thm 7.1 that:

trp = (5 + V=15) + (5 - V-15) = 10 = -b rir = (5+ V-15)(5 V-15) = 25 5V-15 + 5V-15 - (-15) = 40 =c.

which correspond with the coefficients of the quadratic equation Eq. 7.5. It is rather intuitive that V-15 + (V-15) = 0 even if we know nothing about V-15, and, similarly, it is rather intuitive that V-15 - -(V-15) = —(—15) = 15 even if we don’t know what V-15 is.

Consider now the cubic equation:

x- 15x-4=0. (7.6)

It is not hard to observe that 4 is a root, but how can it be computed? Cardano’s

formula gives the root: = 2+ 11V-1+42-11V-1, (7.7)

a quite complicated formula that bears no obviously relation to 4. Bombelli courageously performed the following computation (see Eq. 7.4):

(2+ V-1)? = 8+3- 4V-1 +3- 2(-1) +(-1V-1) =24+11V-1 (2 -= V-1)? = 8-3 -4V-1+3-2(-1) - (-1V-1) =2-11V-1,

and by Eq. 7.7:

E

= Je +V + 2- vay? = (2+ V-1) + (2 - V-1) =4

7.8 Lill’s Method and Carlyle’s Circle 83

Fig. 7.6 Lill’s method on x? 4x + 3

7.8 Lill’s Method and Carlyle’s Circle

Lill’s method can be applied to solve quadratic equations.2 As an example we use Eq. 7.2 which gives the roots of a quadratic equation obtained by factorization:

x +bx+ce =x —4x43 =(x-1)(x-3).

Applying Lill’s method results in the paths shown in Fig. 7.6. Check that the angles are correct:

—tan(-—45°) = —1, -—tan(-—71.57°) x -3.

For quadratic equations we can find the points Pı, P2 as the intersections of the line representing the coefficient b and the circle whose diameter is the line connecting the starting point and the end point of the paths (Fig. 7.7). In order for a point on the line b to be a root, the reflection of the line must be 90° and therefore the inscribed angle is subtended by a diameter.

This can also be checked by computation. The center of the circle is the midpoint of the diameter (—1, —2). The length of the diameter is:

(—2)? + (-4)? = V20,

2 This section assumes that you have read about Lill’s method in Chap. 11.

84 7 Solving Quadratic Equations

Fig. 7.7 Constructing a circle to find the roots

2 so the square of the length of the radius is (v20 2) = 5. We need the intersection of this circle and the line x = 1:

(@- (“DY a er? (x? + 2x +1) + (y? +4y 44) =5 y? +4y+3=0

y=-l, -3.

A similar method for solving quadratic equations is the Carlyle circle which predates Lill’s method. Given a quadratic equation x? bx + c (note the minus sign on the linear term), construct points at (0,1) and (b,c). Construct a circle whose diameter is the line connecting the two points (Fig. 7.8). Its intersections (if any) with the x-axis are the roots of the equation.

In the general case, the center of the circle is (b/2, (c (—1))/2) and the length of the diameter is yb? + (c 1)?, so the equation of the circle is:

b-i) pE) = Pee

2 4

For the example, substituting b = 4, c = 3 and y = 0, we see that x = 1 and x = 3 are the roots of the quadratic equation.

7.9 Numerical Computation of the Roots 85

Fig. 7.8 Carlyle circle for x? 4x +3

7.9 Numerical Computation of the Roots

Students learn symbolic computation of roots, derivatives and so on. Today, most computation is performed by computers so symbolic computation is less important. Numerical analysis is the branch of mathematics and computer science that develops accurate and efficient computational methods. The main challenge is to deal with the finiteness of values stored in the computer’s memory. The computation:

0.12 x 0.14 = 0.0168

is easy to do, but: 0.123456789 x 0.123456789

needs eighteen digits to be accurately represented and this cannot be done in a memory word that stores sixteen digits. This error is called a round-off error. An even more serious problem is encountered when floating point arithmetic is performed. Clearly: (0.12 x 10710) x (0.14 x 1078)

would not be computed by writing out all the zero digits. Instead, we multiply the mantissas and add the exponents to obtain 0.0168 x 10718, which is normalized to 0.168 x 107! so that the most significant digit appears after the decimal point, ensuring maximum precision given the fixed size of the mantissa. If the maximum exponent that can be represented is 16 the result simply cannot even be stored. This error is called floating-point underflow.

86 7 Solving Quadratic Equations

The formula for finding the roots of the quadratic equation x? + bx + c is:

-b + Vb? 4c 5

1,72 =

(7.8)

Consider what happens if b = 1000 and c = 4. The roots are:

—1000 + ¥1000000 16 5 f

1,72 =

Depending on the precision of the arithmetic, it is possible that one of the roots is so close to zero that the value stored is zero. Evaluating the quadratic equation gives the surprising result 0? +b -0+4=4=0.

Can we do better? By Eq. 7.3:

ritr =b, rın =c.

If r2 is much less that r1, written r2 « r1, then ~ —b and r2 = c/b. Table 7.1, computed by a computer program, compares the values of the roots computed by these formulas with the values obtained from the traditional formula Eq. 7.8. The value of c is fixed at 4 and the roots for increasing values of b are shown.

Initially, the true values computed by the traditional formula for r2 are more accurate (r2 r2y is negative) but from b = 100000, the computation based upon Eq. 7.3 is more accurate. Such are the surprises of numerical analysis.

Table 7.1 Two computations of the roots of a quadratic equation. rı, r2 are the roots computed by Eq. 7.8. riv, 72, are the roots computed using Eq. 7.3. The errors are r; riy. The values are truncated to four decimal places. Floating-point numbers are written —4e 5 in place of 4 x 1075 because computer programs are normally written as linear sequences of characters.

b ri Fiv Error, r2 ry Error

100 —99.9599 —100 0.0400 —0.04001 —0.04 -1.6012e-05 1000 —999.9959 —1000 0.0040 —0.0040 0.004 —1.6000e -08 10000 —9999.9996 10000 0.0004 0.0004 -0.0004 -1.6270e-11

100000 -99999.9999 —100000 3.9999e-5 -3.9999e-5 4e-5 1.0104e-12 1000000 -—-999999.9999 —1000000 4.0000e-6 -3.9999e-6 4e-6 2.7749e-11 10000000 -—10000000.0 -10000000 3.9860e-—7 -3.9953e-7 4e-7 4.626le-10

7.9 Numerical Computation of the Roots 87

What Is the Surprise?

Poh-Shen Loh’s approach provides a new way of looking at the relation between the coefficients and the roots that one doesn’t see simply by memorizing the traditional formula. What is surprising is that this relation is fundamental in Gauss’s algebraic proof of the constructibility of a regular heptadecagon (Chap. 16).

With the modern dominance of algebraic methods in geometry it is important to be reminded that the reverse once held. As shown by the constructions of Al- Khwarizmi and Cardano, geometric methods were used to obtain results in algebra. Lill and Carlyle both developed geometric methods for solving quadratic equations. Considerations of numerical computation on computers will surprise students who have not experienced it before.

Sources

Poh-Shen Loh’s method is from [28, 29]. Al-Khwarizmi’s construction is from [6, Chapter 1] and [32]. Cardano’s construction can be found in [6, Chap. 1]. For the colorful history of the development of Cardano’s formula see [52]. The early attempts at computing with imaginary numbers are from [6, Chapter 2]. Lill’s method and Carlyle’s circle can be found in [61] together with a discussion of numerical computation of the roots.

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

A Check for

Chapter 8 ta Ramsey Theory

Ramsey theory is a branch of combinatorics that asks questions of the form: How large must a set be so that if it is divided into subsets, at least one subset has a certain property? Results in Ramsey theory are difficult to prove and there remain many open problems. In this chapter we present simple cases of four problems to give a taste of this fascinating subject: Schur triples (Sect. 8.1)—triples of integers such that a + b = c, Pythagorean triples (Sect. 8.2)—triples of integers such that a? + b? = c?, van der Waarden’s problem (Sect. 8.3) which concerns sequences of numbers, and Ramsey’s theorem (Sect. 8.4) on coloring graphs. Section 8.5 shows how the probabilistic method in combinatorics can be used to develop a lower bound for Ramsey numbers.

The Pythagorean triples problem was recently solved with the aid of computers, using a relatively new method call SAT solving. For readers familiar with proposi- tional logic Sect. 8.6 gives an overview of how this is done.

Section 8.7 describes Pythagorean triples as known to the Babylonians four thousand years ago.

Terminology: monochromatic means of the same color.

8.1 Schur triples

Definition 8.1 Given any decomposition of the set of positive integers: S(n) = {1,... n}

into two disjoint subsets S1, S2, do there exist {a,b,c} © S1 or {a,b,c} C S2 (or both) such that a < b <c and a +b = c? If so, the set {a, b, c} is called a Schur triple.

Example 8.1 For n = 8, in the decomposition:

S1 = {1,2,3,4}, So = {5,6,7,8}, (8.1)

© The Author(s) 2022 89 M. Ben-Ari, Mathematical Surprises, https://doi.org/10.1007/978-3-031-13566-8_8

90 8 Ramsey Theory

the set includes the Schur triple {1, 2, 3}. However, the decomposition:

Si = {1, 2,4, 8}, 85 = {3,5,6,7}, (8.2)

does not contain a Schur triple, as you can check by enumerating all the triples in each subset.

Theorem 8.1 Jn all decompositions of S(9) = {1,...,9} into two disjoint subsets, at least one subset contains a Schur triple.

Of course we could check the 2? = 512 decompositions of S(9) into two disjoint subsets, but let us try come up with a more succinct proof.

Proof We try to construct a decomposition that does not contain a Schur triple and show that the constraints of the problem make this impossible. Start by placing 1 and 3 into the subset S1. 2 must be placed in S2 because 1 + 2 = 3 and we are trying to construct a decomposition that does not contain a Schur triple. Similarly, 4 must be placed in S2 because 1 + 3 = 4. Continuing, 6 is placed in because 2+ 4 = 6 and 7 is placed in Sz because 1 + 6 = 7. However, 3+ 6 = 9 and 2 +7 = 9, so 9 must appear in both and S2, a contradiction. The sequence of inferences is shown in the following table:

S1 So

1,3

1,3 2

1,3 2,4 1,3,6 2,4 1,3,6 2,4,7 1,3,6,9 2,4,7 1,3,6,9 2,4,7,9

Backtracking, we search for a decomposition where 1,3 are in different subsets. If we place 5 into S2, a sequence of inferences again leads to a contradiction because 9 must appear in both subsets. The reader should justify each of the inferences shown

in the following table:

Sy So

1 3

1 3,5

1,2 3,5

1,2,8 3,5

1,2,8 3308 2 1,2,8 3,5,7,9 1,2,8 3,5,6,7,9 1,2,8,9 3,5,6,7,9

8.2 Pythagorean Triples 91

Backtracking again, we try to place 5 into S4, but that also leads to a contradiction, as shown in the following table:

S1 So

1 3

1,5 3

1,5 3,4 1,5 3,4, 6 1,2,5 3,4, 6 1,2,5 3,4,6,7

1,2,5,7 3,4, 6,7 If follows that there is no decomposition that does not include a Schur triple. o

Issai Schur proved the following theorem:

Theorem 8.2 (Schur) For every k > 2 there is a smallest n such that in any disjoint decomposition of S(n) into k subsets, at least one of the subsets must contain a Schur triple.

8.2 Pythagorean Triples

Definition 8.2 Given any decomposition of the set of positive integers: S(n) = {1,...,n}

into two disjoint subsets S1, S2, do there exist {a,b,c} © S1 or {a,b,c} © S2 (or both) such that a < b < c and a? + b? = c*? If so, {a, b, c} is called a Pythagorean triple.

Example 8.2 For n = 10, in the decomposition into even and odd numbers: = {1,3,5,7,9}, S2 = {2,4,6, 8, 10},

there are no Pythagorean triples in but {6, 8, 10} in Sz is a Pythagorean triple since + = 10°.

Marijn J.H. Heule and Oliver Kullmann proved the following theorems. Their method of proof is discussed in Sect. 8.6.

Theorem 8.3 Foralln < 7824, there is some decomposition of S(n) into two disjoint subsets such that both subsets do not contain a Pythagorean triple.

92 8 Ramsey Theory

Theorem 8.4 For all n > 7825, in all decompositions of S(n) into two disjoint subsets at least one subset contains a Pythagorean triple.

It is impossible to check all 278*° decompositions of 5(7825). If we could check one decomposition every microsecond, 2785 microseconds ~ 106° years, while the estimated age of the universe is only about 10!° years.

8.3 Van der Waerden’s problem

Consider the sequences of eight colored dots in Fig. 8.1. In the top sequence there are red dots at positions (1, 2,3) and blue dots at positions (4,5, 6). In each case, the positions form an arithmetic progression. Similarly, in the middle sequence the red dots at positions (1,3,5) form an arithmetic progression. However, in the bottom sequence there is no set of three monochromatic dots whose positions form an arithmetic progression. Triples of red dots are at positions (1,2,5), (1, 2,6), (2,5, 6), none of which form arithmetic progressions, and similarly for the blue dots.

Fig. 8.1 van der Waerden’s problem for eight colored dots

With nine dots any coloring must contain a sequence of three monochromatic dots that form an arithmetic progression. For example, let us add a red dot or a blue dot at the end of the bottom sequence in Fig. 8.1 to obtain the sequences in Fig. 8.2. In the top sequence there are red dots at positions (1,5, 9), an arithmetic progression, and in the bottom sequence there are blue dots at positions (7, 8,9), also an arithmetic progression.

Bartel L. van der Waerden posed the following problem: For any positive integer k, what is the smallest number n such that any sequence of n colored dots must contain a sequence of k monochromatic dots that form an arithmetic progression? For k = 3, n = 9, as demonstrated above for one decomposition. The next result is more difficult to show: for k = 4, n = 35.

8.4 Ramsey’s Theorem 93

1 2 3 4 5 6 7 8 9 oeoeoeoeeeoe#eoe ®@

Fig. 8.2 van der Waerden’s problem for nine colored dots

8.4 Ramsey’s Theorem

Color the edges of Ks, the complete graph on 5 vertices, with two colors as shown in Fig. 8.3a. There are no monochromatic subgraphs K3 (triangles) in the graph. Fig- ure 8.3b shows one coloring of K6 and it is easy to see that there are monochromatic triangles AACE and ABDF. In this section we prove a simple case of a theorem by Frank P. Ramsey on the existence of subsets with a certain property.

Fig. 8.3a A coloring of Ks with two colors Fig. 8.3b A coloring K6 with two colors

Definition 8.3. R(k), the Ramsey number for k, is the smallest number n such that in any coloring of Kn, the complete graph on n vertices, with two colors there is a monochromatic complete subgraph Kx.

Theorem 8.5 (Ramsey) R(3) = 6.

Proof Figure 8.3a shows that R(3) > 5. To show that R(3) < 6, consider any vertex v in K6. v is connected to five other vertices, and when the edges are colored with two colors there must be at least three monochromatic edges incident with v.

94 8 Ramsey Theory

Fig. 8.4a One vertex of K6 Fig. 8.4b Monochromatic triangles in K6

In Fig. 8.4a, AB, AC, AE are colored red. Since the graph is complete all the vertices are connected, so if any one of the edges BC, BE, CE is colored red, say BE, a red triangle AABE is formed. Otherwise, all three edges of these edges are

colored blue and they form a blue triangle (Fig. 8.4b). oO

The theorem can be generalized to any number of colors, as well as to colorings where the sizes of the subgraphs are not the same. R(r, b, g) is the smallest complete graph such that in any coloring with three colors there must be complete subgraphs with r red edges, b blue edges and g green edges.

8.5 The Probabilistic Method

The only known non-trivial Ramsey numbers are R(3) = 6 and R(4) = 18. In 1947 Paul Erdős developed the probabilistic method and used it to show lower and upper bounds on R(k). Subsequent research has improved both bounds, but this is still a significant research area since the bounds are not tight. For example, it has been proved that 43 < R(5) < 48 and 798 < R(10) < 23556. In this section elementary probability is used to obtain a lower bound on R(k).

To show that there exists an element of a set S that has property A, prove that the probability of a random element of S having property A is greater than zero. It is important to understand that the method is non-constructive: it just proves that such an element exists but does not construct one. Although from Thm. 8.5 we know that R(3) = 6, let us use the probabilistic method to obtain a lower bound for R(3).

8.5 The Probabilistic Method 95 Theorem 8.6 (Erdős) R(3) > 4.

Proof Given a random coloring of K, by the two colors red and blue, consider an arbitrary subgraph K3, that is, an arbitrary triangle with (5) = 3 sides. The probability that all sides are colored red is 27°, as is the probability that all sides are colored blue, so the probability that the triangle is monochromatic is 27? +27? = 27? = 1/4. The number of triangles in K, is (5) so P(n,3), the probability that some triangle contained in a random coloring of K, is monochromatic, is:

P(n, 3) = 4 i

If P(n,3) < 1 then its complement P(n,3) = 1 P > 0, that is, the probability that a random coloring of K, does not contain a monochromatic triangle is greater than zero, So at least one must exist.

The following table shows P(n, 3) for several values of n, and whether the value of P(n, 3) proves that there exists a coloring with no monochromatic triangle:

n P(n,3) Exists

3 3/4 yes

4 5/6 yes 5 =3/7

oO

At first glance the result is strange because Fig. 8.3a shows that there exists a coloring of Ks with no monochromatic coloring. However, the probabilistic criterion is sufficient but not necessary; it is a lower bound, meaning that R(n) > 4 which is true because Thm. 8.5 showed that R(n) = 6.

The same proof works for arbitrary k, so the probability of the existence of a coloring of K, with no monochromatic complete graph Kx, is:

P(n, k) = H 2-27-G),

peoa) -Cj

P(6,4) = (32 15)/32 = 17/32 P(7,4) = (32 35)/32 = —3/32.

For k = 4:

If follows that R(4) > 6 which is much less than the known value R(4) = 18.

96 8 Ramsey Theory

8.6 SAT Solving

SAT solving is a method for solving problems that works by encoding a problem as a formula in propositional logic and then using a computer program to check the truth value of the formula. Advances in algorithms and implementations have made SAT solving a viable approach for problem solving. We give an overview of SAT solving and explain how it can be used to solve the mathematical problems described in this chapter. The reader is assumed to have an elementary knowledge of propositional logic as summarized in Def. 8.4.

8.6.1 Propositional Logic and the SAT Problem

Definition 8.4

e A formula is composed of atomic formulas or atoms connected by the proposi- tional operators V (disjunction, “or’’), A (conjunction, “and’’), (negation, “not’).

e A formula is given an interpretation by an assignment of T or F to each atom. Evaluating a formula in an interpretation results in its truth value T or F.

e A formula is satisfiable if and only if there is an interpretation that makes its truth value T. Otherwise, the formula is unsatisfiable.

e A formula is in conjunctive normal form (CNF) if and only if it is composed of a conjunction of subformulas each of which is a disjunction of literals (atoms or negations of atoms).

The following formula is in CNF: (ap VqVrr) A pyr) A Gr) A (pV qgvar).

The SAT problem is to decide if a given formula in CNF is satisfiable or not. A SAT solver is a computer program that can solve the SAT problem. Most SAT solvers are based on the DPLL algorithm which goes back to the 1960’s, but recent developments have made very significant improvements to the algorithm. Highly optimized implementations of these algorithms have made SAT solvers an important tool for solving problems in many fields including mathematics.

8.6 SAT Solving 97

8.6.2 Schur triples

Let us encode the Schur triples problem S(8) as a formula in CNF. The formula will be satisfiable if and only if there is a decomposition of a set S into disjoint subsets S1, S2 such that neither S$; nor S2 contains a Schur triple. There is an atom p; for each of the numbers | < i < 8. The intended meaning of an interpretation for the formula is that it assigns T to p; if i is in the first subset S4 and it assigns F to p; if i is in the second subset S2. To show that in all decompositions neither subset contains a Schur triple, the interpretation must ensure that for each possible Schur triple at least one atom is assigned T and one atom is assigned F.

For example, {2,4, 6} is a Schur triple so at least one of the three integers must be in and at least one of them must be in S2. Therefore, p2 V p4 V pe must be true and also =p2 V a=p4 V ~pe must be true. There are 12 possible Schur triples so the CNF formula is:

(ap V ap2 V ap3) A (=p, V aps V mp4) A (=p, V ap4 V aps) A (ap, V aps V ape) A (pı V ape V mp7) A (pı V ap7 V aps) A (p2 V aps V aps) A (p2 V ap4 V ape) A (p2 V aps V ap7) A (p2 V ape V aps) A (p3 V ap4 V ap7) A (ps V aps V =ps).

(pı V p2 V ps) (p1 VY p3 V pa) (pı VY p4 V ps) (pı V Ps V Po) (p1 V Po V p7) (pı Y p7V ps) (p2 V p3 V ps) (p2 V Pa V po) (p2 V Ps V p7) (p2 V Po V Ps) (p3 V p4 V p7) (p3 V Ps V ps)

(8.3)

> > > T e a a a em

When a SAT solver is given this formula it answers that the formula is satisfiable under either of the interpretations:

P2 P3 P4 P5 Po P7 P8 F F T F T T T F De OE EOE Be dP OE A

One interpretation corresponds to the decomposition in Eq. 8.2: = {1,2,4, 8}, S2 = {3,5,6,7}, while the other corresponds to the symmetrical decomposition = {3,5,6,7}, So = {1,2,4, 8}.

98 8 Ramsey Theory

For S(9), four pairs of subformulas are added for the additional possible triples:

(p1 V ps V po) ^ (api V apg V apo) A (p2 V p7V po) ^ (=p2 V ap7 V apo) A (p3 V Po V po) ^A (=p3 V “p6 V apo) A (p4 V ps V po) ^A (apa V aps V apo).

When the SAT solveris given this formula, it answers that the formula is unsatisfiable, meaning that no decomposition has no Schur triple. Removing the double negative, this states that in every decomposition of S(9) there exists a Schur triple.

8.6.3 Pythagorean Triples

Heule and Kullmann solved the Pythagorean triple problem using a highly optimized SAT solver. There was a significant difference in efficiency between finding a decom- position that does not have Pythagorean triples (you just need one decomposition), and showing all that decompositions have a Pythagorean triple (you have to check all of them). To show that for all S(n), 1 < n < 7824, there is a decomposition with no triple took only one minute of computing time, whereas to show that every decomposition of $(7825) has a triple took about two days of computing time for a computer with 800 cores (processors) working in parallel, altogether 40, 000 hours of computing time.

The use of computers in mathematics naturally raises the question: Can we trust a proof generated by a computer? Of course, even “ordinary” mathematical proofs can be incorrect (Sect. 4.7), but our experience with frequent computer bugs, as well as the opaqueness of large computer programs, makes us more sensitive to potential errors in computer-generated proofs.

One approach to increasing confidence in the correctness of a computer-generated proof is to use two or more programs, written independently by two or more re- searchers. If the multiple programs are written in different programming languages and for different computers and operating systems, this lessens the possibility of a bug in the computer hardware and software.

Heule and Kullmann’s SAT solver wrote out a log of the steps in the proof so that it could be examined for correctness. The log was so massive, 200 terabytes, that it was impossible to examine directly. To put this into perspective, 200 terabytes is 200,000 gigabytes while your computer might have an internal memory of 16 gigabytes and a solid-state disk of 128 gigabytes. Instead, they wrote a small program to verify the correctness of the data in the log. To ensure that this program was correct, they wrote a formal proof using the Coq proof assistant that supports and checks the work of mathematicians without totally automating the proof process.

8.6 SAT Solving 99

8.6.4 An Overview of the DPLL Algorithm

The first algorithm that one learns for SAT solving is truth tables. Given a for- mula A in propositional logic with n different atoms, there are 2” interpretations since each atom can be independently assigned T or F. For each interpretation it is straightforward to compute the truth value of A using the definition of the propo- sitional operators. However, to check 2” interpretations is very inefficient for even moderately large n.

The DPLL algorithm works by incrementally assigning T or F to an atom and then attempting to evaluate the formula. For example, given A = p Aq A-r, if p is assigned F then A evaluates to F, regardless of the assignments to q and r, and there is no need to perform further evaluations. Similarly, A = p V q V ~ar evaluates to T if p is assigned T, regardless of the assignments to q and r.

The efficiency of DPLL comes from unit propagation. Consider part of the formula for Schur triples:

(p1 V p2 V p3) A (api V mp2 V aps) A (pi V p3 V pa) A (p1 V aps V mp4) ^ (p3 V p4 V p7) A (>p3 V apa V ap7) ^

(p3 V ps V ps) A (p3 V aps V aps) -

Suppose that we have assigned F to p1, p2. The first subformula reduces to the unit formula consisting of the single atom p3. If the formula is to be satisfied, we must assign T to p3 and all the subformulas:

(pı V p2 V p3), (pı V p3 V p4), (P3 V pa V p7), (p3 V ps V ps),

immediately evaluate to T.

Since —p3 evaluates to F, each subformula containing ~p3 can be satisfied only if some other literal in the subformula is assigned T. In =p3 V aps V aps, either p5 or pg must be assigned F so that either ~p5 or —~ps evaluates to T.

This analysis shows that once p1, p2 have been assigned F, the formula in Eq. 8.4 is satisfiable if and only if (~ p4 V a=p7) A (aps V aps) is satisfiable. By performing the propagation of p3 on all the subformulas of Eq. 8.3, the formula is reduced to:

(p4 V ps) A (p4V po) A (ps V po) A (ps V pI) A (p6 V p7) ^ (pe V ps) A (p7V ps) A (p4 V ap7) A (aps V aps).

One more assignment of F to p4 results in a satisfying interpretation which we have found after only three arbitrary assignments.

100 8 Ramsey Theory

8.7 Pythagorean Triples in Babylonian Mathematics

This section is a digression from Ramsey theory; it is included to give a taste of the rich theory of Pythagorean triples and to demonstrate the depth of mathematical knowledge in the ancient world. Pythagorean triples were known in Babylonian mathematics since at least 1800 BCE.

Definition 8.5 A primitive Pythagorean triple is a set of three positive integers {a, b, c} such that a? + b? = c? and a, b, c have no common factor greater than 1.

Example 8.3 {3,4,5} is a primitive Pythagorean triple but {6, 8, 10} is a Pythagorean triple that is not primitive since 2 is a common factor.

A cuneiform tablet called Plimpton 322 is one of the earliest examples of Babylonian mathematics. It lists fifteen primitive Pythagorean triples by giving a and c. Table 8.1 displays four of these triples, together with the computed values of b and other values that will be discussed below. Historians of mathematics have proposed several explanations for how these triples were found. One explanation is that Euclid’s formula was used to obtain the triples from a pair of generating numbers.

Theorem 8.7 (Euclid) {a, b, c} is primitive Pythagorean triple if and only if there exist two positive integers u,v, called generating numbers, such that:

lu>v

2. they are not both odd

3. they have no common factor greater than 1

4. the following relations hold between {a, b, c} and u, v:

a=u-v’, b = 2uv, c=u +v.

Proof By computation it follows immediately that if {a, b, c} can be expressed as required in item 4 they form a Pythagorean triple:

Table 8.1 Babylonian triples from the Plimpton 322 tablet

a factors b Dyactors c u Ufactors v Vyactors

119 7-17 120 23.3.5 169 12 22-3 5 5 4601 43 - 107 4800 26.3.5? 6649 75 3-5? 32 2 12709 71-179 13500 2.33.53 18541 125 53 54 2.33

65 5-13 72 23. 3? 97 9 3? 4 2?

8.7 Pythagorean Triples in Babylonian Mathematics 101

a’ +b? = (u? v?’ + (uv)? = uf 2(uv)? + vf + 4(uv)? = uf +2(uv)* + v4 =Rv e.

The proof of the other direction is more complicated and is omitted. m

If it is true that the Babylonians used Euclid’s formula, the question remains: How did they discover the generating numbers u, v?

Each row of Table 8.1 displays factors and Dyactors, the factorizations of a and b, respectively, showing that they have no common factors. The reader is invited to check that c has no common factor with a,b so the triples are primitive. The generating numbers u, v and Uyactors, Vfactors are also displayed. Not only do they not have any common factors as required by Thm. 8.7, but the only factors greater than 1 in u and v are powers of 2, 3,5.

Definition 8.6 A Babylonian triple is a primitive Pythagorean triple such that the only prime factors of u, v are 2,3, 5.

The reason that the Babylonians restricted themselves to these factors is that they used the sexagesimal or base 60 = 2 - 2-3-5 number system whose prime factors are 2,3 and 5.

For readers who are not familiar with non-decimal number systems, here is a brief overview of the concept. The “number” 12345 is a shorthand for the number:

(1 x 10*) + (2 x 10°) + (3 x 10°) + (4 x 10!) + (5 x 10°).

This number system is called the decimal or base 10 number system. There are

ten digits 0,1,2,...,8,9 for the coefficients of the powers, and the powers are

represented by the places of coefficients with powers increasing from right to left. The number could also be represented in the binary or base 2 number system by:

12345 = 8192+40964+324+164+841 = 2!342!742542442342° = 11000000111001 .

Binary notation uses two digits 0, 1 for the coefficients and the powers of two are indicated by the places of the coefficients.

Another popular number system is the hexadecimal or base 16 number system which is used in computing. For this number system we need 16 “digits” and the convention is to use 0,1,2,...,8,9, A, B,C, D, E, F.

The base 60 number system is not as unfamiliar as it may seem, because we rep- resent time, geographical coordinates and angles in that system. We are comfortable carrying out computations such as (1 hour 40 minutes) plus (1 hour 30 minutes) equals (3 hours 10 minutes).

102 8 Ramsey Theory

Table 8.2 Babylonian triples in base 60

a C

(1) (59) (2)(49) (1)(16)(41) (1) (50) (49) (3)(31)(49) (5) {09){01)

(1) (05) (1)(37)

Table 8.2 shows the values of a,c that appear in the tablet in base 60 notation where (d) represents the d’th “digit” for O < d < 60. The reader can check that these values are the same as the decimal values given in Table 8.1, for example:

(3 x 607) + (31x60!) + (49 x 60°) = 12709 (5x 602) + (9x60!) + (1x 60°) = 18541

The Babylonians did not have 60 distinct symbols for the digits. Instead, they used a hybrid system where the coefficients were represented with two symbols: one for the tens coefficient and the other for the ones coefficient, and the places of the coefficients indicated the powers of 60. Using 9 for the tens coefficient and ¢ for the ones coefficient, the decimal number (38 x 60) + (16 x 60°) = 2296 would be

represented as:

9000 OO FVV $9099 V OHO0N. .

What Is the Surprise?

Frank P. Ramsey’s theorem appeared to be a minor result in combinatorics. Sur- prisingly, the theorem was the foundation of an entirely new and challenging field of mathematics with many open problems. The nature of Ramsey theory is also surprising: if a set is large enough there exist regularities in its subsets.

I was introduced to Ramsey theory by the article by Marijn J. H. Heule and Oliver Kullmann on Pythagorean triples whose proof bears some similarity to the proof of the four-color theorem: the use of massive computing resources that is only successful after theoretical advances. Hence the title of their article: The Science of Brute Force.

Problems in combinatorics ask for specific numerical values, for example, R(n) must be a specific positive integer. It is surprising that probabilistic methods have proved so fruitful in obtaining results in this field.

8.7 Pythagorean Triples in Babylonian Mathematics 103

We tend to think that humans are smarter today then they used to be thousands of years ago. It can be a surprise to find out that four thousand years ago Babylonian mathematics was sufficiently advanced to discover that {12709, 13500, 18541} is a Pythagorean triple.

Sources

For an overview of Ramsey theory see [9], while an advanced presentation can be found in [20]. The section on the probabilistic method is based on [43, Example 40] and [9, Chapter 4]. A database of Ramsey numbers can be found in [34].

The method of proof of the theorem on Pythagorean triples is explained in detail in [23]. See [4] for an introduction to logic and to SAT solving. The archive of my SAT solver for education [5] contains formulas for Schur triples, Ramsey graphs and van der Waerden’s problem.

Section 8.7 is based upon [60], [42]. The sexagesimal number system is described in [63].

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

| D | Check for

Chapter 9 isa Langford’s Problem

C. Dudley Langford noticed that his son had arranged colored blocks as shown in Fig. 9.1. There is one block between the red blocks, two blocks between the blue blocks and three blocks between the green blocks.

Fig. 9.1 Layout of blocks for Langford’s problem

Definition 9.1 (Langford’s Problem L(7)) Given the multiset! of positive integers: {1,1,2,2,3,3,...,n,n},

can they be arranged in a sequence such that for 1 < i < n there are i numbers

between the two occurrences of i?

Figure 9.1 shows that 312132 is a solution for L(3).

Section 9.1 restates Langford’s problem using a mathematical formalism that facilitates solving the problem. Section 9.2 characterizes values of n for which L(n) is solvable and presents two proofs of the theorem. The first proof which is relatively simple uses the technique of double-counting: counting the same value in two different ways and equating the resulting formulas. The second proof is a clever induction but the “bookkeeping” involved requires careful attention to the details. Section 9.3 works out the solution for L(4).

1 A multiset or bag is like a set except that there may be more than one occurrence of an element.

© The Author(s) 2022 105 M. Ben-Ari, Mathematical Surprises, https://doi.org/10.1007/978-3-03 1-13566-8_9

106 9 Langford’s Problem

9.1 Langford’s Problem as a Covering Problem

Langford’s problem can be posed using an array. For L(3) there are six columns, one for each position at which the six numbers can be placed. There is one row for each possible placement of one of the numbers, that is, the two occurrences of k must have k numbers between them. There are four possible placements of 1’s, three of 2’s and two of 3’s:

1);2);3)4)5]6 1} 1 1 2 1 1 3 1 1 4 1 1 5 |} 2 2 6 2 2 7 2 2 8 |) 3 3 9 3 3

To solve the problem we need to select one row for the 1’s in the sequence, one row for the 2’s and one row for the 3’s, such that if we stack these rows on top of each other, no column contains more than one number.

Row 9 needed not be considered because of symmetry: starting with row 9 just gives the reversal of the sequence obtained by starting with row 8.

Row 8 is the only one containing 3’s so it must be chosen and the sequence is 3-3. Any row with numbers in columns 1 and 5 can no longer be used, because only one number can be placed at each position. Let us denote the permissible and forbidden rows by:

1,2, 2,4, 3, 6,7,8.

Row 7 is the only remaining row containing 2’s so it must be chosen and the sequence is 3.2.32. Deleting rows that can no longer be used gives:

A, 2, 3, A, Í, 6, 7, 8 $ Choosing the only remaining row, row 2, gives the solution 312132: Jı]2J]4]5]6 2 1 1 7 2 2 8 |) 3 3

The analysis has shown that this is the only solution, except for the symmetrical solution obtained by starting with row 9.

9.2 For Which Values of N Is Langford’s Problem Solvable? 107

9.2 For Which Values of N Is Langford’s Problem Solvable?

Theorem 9.1 L(n) has a solution if and only if n = 4k orn = 4k +3.

We prove the forward direction of the theorem. Proof 1 shows that if L(n) has a solution then n = 4k or n = 4k + 3. Proof 2 shows the contrapositive: if n = 4k + 1 or n = 4k +2 then L(n) has no solution.

Proof (1) If the first occurrence of the number k is at position ig, the second oc- currence is at position ig + k + 1. For example, in 312132, the solution for L(3), choosing k = 2 gives ix = 3 andik +k+1=34+2+1=6.

Sn, the sum of the positions of all the numbers, is:

ep S rere

k=l k=1 yee 1) k=1 k=l

Faery saa

=2 k=l 2 But Sn is simply 1+2+3+---+2n, so: m O 2n(2n +1) Sn = k= 7 oe b k=1

Equating the two formulas for S;, gives:

“1. n(n+3) 2n(2n+1) 2X it = = E

k=1 2

2 y . 1 (2n(2n+1) n(n+3) l = = aK a\ 2 2

o 3n? =n = p~

The left-hand side is an integer since it is the sum of integers (the positions), so the right-hand side must also be an integer. When is 3n? —n divisible by 4? Factoring 3n? n gives n(3n 1).

If n is a multiple of 4, the product is divisible by 4.

When is 3n 1 divisible by 4? Any integer n can be expressed as n = 4i + j for j =0,1,2,3. If 3n 1 is divisible by 4, then so is 3(4i + j) 1 = 12i + 37 1. 12i is divisible by 4. For j = {0, 1,2,3}, 37 1 = {-1, 2,5, 8} is divisible by 4 if and only if j = 3, that is, n = 4i +3. oO

108 9 Langford’s Problem

To introduce the idea of the second proof consider what a solution for n = 4 might look like. In the following tables the positions of the occurrences of 4 are 1 and 6, and the positions of the occurrences of 2 are 5 and 8. In both cases, one position is odd and the other is even.

1[2[3[4[5[6|]7]/8 17/2/3/4]/516/7]8 A) 4S (Tee lals Aa |S) Tiere sia

Let k = 2m be an even number. If i is the position of the first occurrence of k, then the position of the second occurrence is i+ k + 1. The sum of the positions is:

i+(i+k+1) =2i+2m+1=2(i+m) +1,

which is an odd number. For the sum of two numbers to be odd, one must be odd and the other even.

Let us now check the positions of the occurrences of odd numbers. The positions of the occurrences of | are 2 and 4, both even numbers, and the positions of the occurrences of 3 are 3 and 7, both odd numbers.

1[2][3]4]/5]6]7/8 Pie sas eas Aq SS ee al ANS la Sa a2 * * * *

Let k = 2m + 1 be an odd number. The sum of the positions is: i+(it+tk+1)=2i+2m4+14+1=2(i+m+1),

which is an even number. For the sum of two numbers to be even, both must be odd or both even.

The positions 1,2,...,2m 1,2n contain an equal number of even and odd positions. The two occurrences of a number in a row “cover” two positions. When the set of rows covers all the positions, they must cover an equal number of even positions and odd positions. Define the parity of a set of rows to be the difference between the number of even and odd positions covered. Initially, the parity is zero, and if the problem has a solution, the set of rows in the solution also has zero parity.

When two occurrences of an even number are placed, they cover one even position and one odd position, so the parity remains the same:

1 [2]3]4]5] 6 [7[8 1/2/3/4]5 16/7

4/1/;3/1/2] 4 )3)2 4;1]/3};1]2 141/3] 2 -1 +1 -1 +1

9.2 For Which Values of N Is Langford’s Problem Solvable? 109

When two occurrences of an odd number are placed, the parity becomes +2 or —2, so we must be able to associate this pair with a pair of occurrences of another odd number that are placed at positions that balance out the parity:

213] 4 |5]6|]7/8 1l2]3l4l5l6]7 aia [3]l1ı1l2ļ]41312 4|ıl3 lıl2ļ4]3 12 +1 +1 -1 -1

We have shown that there can be a solution to Langford’s problem if and only if there is an even number of odd numbers in {1,...,n}! The theorem claims that if this is true then either n = 4k or n = 4k 1, and if not then either n = 4k 2 or 4k 3.

Proof (2) The proof is by induction. There are four base cases:

e n= 4k —3 = 1. In {1} there is an odd number of odd numbers and there is no solution.

e n=4k—2=2. In {1,2} there is an odd number of odd numbers and there is no solution.

e n=4k-1=3. In {1,2,3} there is an even number of odd numbers and we have seen that there is a solution.

e n=4k —0. In {1,2,3,4} there is an even number of odd numbers and Sect. 9.3 gives a solution.

The inductive hypothesis is that the theorem is true for {1,...,4k j}, k = 1, 0 < j < 3, and we will prove that it is true for n = 4(k + 1) J.

e Add 4k +1 = 4(k + 1) —3 to {1,...,4k}. By the inductive hypothesis for 4k = 4k 0 there is an even number of odd numbers. 4(k + 1) 3 is odd so there is now an odd number of odd numbers and there is no solution.

e Add 4k +2 = 4(k +1) -2 to {1,...,4k + 1}. By the inductive hypothesis for 4k +1 =4(k+1)-3 there is an odd number of odd numbers. 4(k + 1) 2 is even so there is still an odd number of odd numbers and there is no solution.

e Add 4k +3 = 4(k +1) 1 to {1,...,4k +2}. By the inductive hypothesis for 4k +2 =4(k +1) —2 there is an odd number of odd numbers. 4(k + 1) 1 is odd so there is an even number of odd numbers and a solution likely exists.

e Add 4k +4 =4(k +1) -Oto {1,2,...,4k +3}. By the inductive hypothesis for 4k +3 = 4(k + 1) 1 there is an even number of odd numbers. 4(k + 1) 0 is even so there is an even number of odd numbers and a solution likely exists. oO

110 9 Langford’s Problem

9.3 Solution for L (4)

Here is the array for L(4). Try to find the solution yourself.

1/12/3]415]6/7]8 1 | 1 1 2 1 1 3 1 1 4 1 1 5 1 1 6 1 1 7 |2 2 8 2 2 9 2 2 10 2 2 11 2 12 || 3 3 13 3 3 14 3 3 15 3 3 16 || 4 4 17 4 4 18 4 4

By symmetry row 18 may be eliminated.

Choose row 16 and the sequence is 4-4 ... Any row with an element in position 1 or position 6 can no longer be part of the solution.

1,2,3, 4,5, 6, 7,8, 9, 10, 11, 72, V3, 14, 15, 16, y7 Choose row 14 and the sequence is 4-343..

1,2, 3, 4, B, 6, 7,8, 9, YO, 11, y2, V3, 14, YS, 16, y7 Choose row 8. The sequence is 423.243..

1, 2, 2, A, B, $, 7,8, 9, YO, V1, Y2, V3, 14, Y5, 16, y7

All of the choices for 1’s have been eliminated so we must backtrack.

Instead of row 8 choose row 11 and the sequence is 43.2432.

1,2, 3, 4, B, 6, 7, B, 9, YO, 11, ¥2, y3, 14, Y5, 16, y7

Choose row 2 and we have a solution 413124372. Continue backtracking to see if there is another solution.

Instead of row 14 choose row 15 and the sequence is 4.3.4.3.

A, 2,3,4,5, 6, 7,8, 9, VO, V1, V2, ¥3, ¥4, 15, 16, y7

9.3 Solution for L(4) 111

Row 8 must be chosen and the sequence is 42_324.3.

1, 2, B.A, B, $, 7,8, 9, YO, V1, Y2, V3, V4, 15, 16, y7

All of the choices for 1’s have been eliminated so again we backtrack.

Instead of row 16 choose row 17 and the sequence is 4.4... 1, 2,3,4, 3, 6,7, 8,9, YO, 11, 12, V3, V4, 15, Y6, 17

Choose row 15 and the sequence is 4.3.43. 1, 2,3, 4, B, 6, 7, 8,9, VO, V1, Y2, Y3, ¥4, 15, y6, 17

Row 9 must be chosen and the sequence is _423_243.

1, 2, B, 4, B, $, 7, B,9, YO, V1, Y2, V3, V4, 15, Y6, 17

All of the choices for 1’s have been eliminated. We can backtrack one last time.

Instead of row 15 choose row 12 and the sequence is 34__3.4.

1, 2, 2, A, B, $, 7, 8,9, YO, V1, 12, y3, V4, Y5, y6, 17

Again, all of the choices for 1’s have been eliminated.

Therefore the only solution is 41312432.

What Is the Surprise?

The source of the inspiration for a mathematical theorem can be surprising. Langford noticed a pattern in his son’s colored blocks which led to the interesting Thm. 9.1. Students should also be introduced to the fact that a theorem can have many com- pletely different proofs.

Sources

This chapter is based on [35]. [12] shows how to find a solution for n = 4k and n=4k +3.

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

A Check for

Chapter 10 wedi The Axioms of Origami

Origami, the art of paper folding, was developed several centuries ago in Japan and now has a worldwide following. In the late twentieth century the mathematical theory of origami was developed. Its foundation is a set of seven axioms, the Huzita—Hatori axioms, named after Humiaki Huzita who formalized the first six axioms and Koshiro Hatori who found the seventh. Jacques Justin published all seven axioms several years before Huzita and Hatori, and Margherita P. Beloch formulated the sixth axiom in 1936. Nevertheless, the axioms as known as the Huzita-Hatori axioms.

In a sequence of three chapters we will explore the mathematics of origami. This chapter presents the axioms, Chap. 11 connects origami with the roots of polynomials and Chap. 12 shows that constructions with origami can solve problems that are impossible using a straightedge and compass.

This chapter contains a section for each of the seven axioms. Following a statement of an axiom and a diagram of the fold it specifies, the equations of the fold and the points of intersection are developed using analytic geometry. A fold can also be defined as a geometric locus, the set of all points satisfying some property. The term fold comes from the origami operation of folding a piece of paper, but here it is used to refer the geometric line that would be created by folding the paper.

Folds result in reflections. Given a point p, its reflection around a fold / results in a point p’ such that / is the perpendicular bisector of the line segment pp’ (Fig. 10.1).

Fig. 10.1 The fold is the perpendicular bisector of the line connecting a point and its reflection

© The Author(s) 2022 113 M. Ben-Ari, Mathematical Surprises, https://doi.org/10.1007/978-3-03 1-13566-8_10

114 10 The Axioms of Origami

10.1 Axiom 1

Axiom 10.1 Given two distinct points = (x1, y1), p2 = (x2, y2), there is a unique fold / that passes through both of them (Fig. 10.2).

N ` ` ` ` `

`

` ` D %

Fig. 10.2 Axiom 1

Derivation of the equation of the fold: The equation of the fold / is derived from the coordinates of and p2. The slope is the quotient of the differences of the coordinates and the intercept is derived from p1:

277

y-a Agg. (10.1) X2 X1

Example 10.1 Let = (2,2), p2 = (6,4). The equation of / is:

4-2 -2= -2 y z302 1 y=>x+l1.

2

10.2 Axiom 2 115

10.2 Axiom 2

Axiom 10.2 Given two distinct points = (x1, y1), p2 = (x2, y2), there is a unique fold / that places onto po (Fig. 10.3).

The fold is the geometric locus of all points equidistant from and p2.

` p2

Fig. 10.3 Axiom 2

Derivation of the equation of the fold: The fold / is the perpendicular bisector of P1P2. Its slope is the negative reciprocal of the slope of the line connecting and p2. l passes through the midpoint between the points:

_yity2 x27%*1 k- =”)

= (10.2) 2 yo-y1 2

Example 10.2 Let = (2,2), p2 = (6,4). The equation of / is:

»- (4) --3 (AS)

y=-2x+11.

116 10 The Axioms of Origami

10.3 Axiom 3

Axiom 10.3 Given two lines l4, l2, there is a fold / that places /; onto / (Fig. 10.4).

The fold is the geometric locus of the points that are equidistant from /; and l2, where the distance from a point to a line is the length of the line segment through the point that is perpendicular to the line. Using congruent triangles it is easy to show that the fold is a bisector of the angle formed by l; and l2.

Fig. 10.4 Axiom 3

Derivation of the equation of the fold:

lı, l are parallel: Let l; be y = mx + and let l2 be y = mx + bo. The fold is the line parallel to /; and l2 that is halfway between them:

+ bi + bo = mx + ———. 7 2

L, l intersect: Let l be y = mıx + by and let l be y = m2x + bo. pi = (xi, yi), the point of intersection of the two lines, is:

10.3 Axiom 3 117

mx; + by = mx; + bo _ b-b;

i m -m

yi = Mıxi + b1.

Example 10.3 Let 1, be y = 2x 2 and let l2 be y = —x + 8. Then p; = (xi, yi) is:

Be 10. oze g A 10 14 1=2: = -2=— x 4.67

: 3 3

The fold is the bisector of the angle formed by /; and l} at their point of intersection. There are two possible folds since there are two pairs of vertical angles. We need to determine the slopes of the angle bisectors. If the angle of line /; relative to the x-axis is 6; and the angle of line lz relative to the x-axis is 62, then the fold is the line which makes an angle of 6, = (@; + 62)/2 relative to the x-axis.

Let = tan 6), m2 = tan 02. By Thm. A.9, ms, the slope of the line making an angle of 0; + @ relative to the x-axis, is:

tan 6; + tan 62 + m2

m, = tan(@; +02) = = . ( l 2) 1 tan 8; tan 62 1- mym

By Thm. A.10, mp, the slope of the angle bisector, is:

6,+6. -1+1 +tan?(01 +02) -1+41+m? mp = tan = = : 2 tan(@, + 62) Ms

Example 10.4 For y = 2x 2 and y = —x + 8, the slope of the angle bisector is:

ME ee Le A S -1 + VI+ 1/3} mp = ZENU = -3 + VTO x -6.16, 0.162.

Let us derive the equation of the fold /y, with the positive slope. From Ex- ample 10.3, the coordinates of the intersection of the two lines are (10/3, 14/3). Therefore:

14 10 3 = (-3+ VLD) ae ei p,- 44- 10V10

2a 3

44- 10V1 y = (-3 + V10)x + HENE ~ 0.162x + 4.13.

118 10 The Axioms of Origami

10.4 Axiom 4

Axiom 10.4 Given a point and a line l4, there is a unique fold / perpendicular to 1, that passes through point p, (Fig. 10.5).

The fold is the geometric locus of all points on the line perpendicular to that passes through pı.

Fig. 10.5 Axiom 4

Derivation of the equation of the fold: Let /; be y = m,x+, and let = (x1, y1). l is perpendicular to /; so its slope is —(1/m1). Since it passes through we can compute the intercept b and write down its equation:

1 y=-—%X1 +b m

= (my, +x1) m 1 + yae yp a.

Example 10.5 Let = (2,6) and let /; be y = 2x 4. The equation of the fold / is:

1 2-642 1 y=-—-x+ SE eds

10.5 Axiom 5 119

10.5 Axiom 5

Axiom 10.5 Given two points p1, p2 and a line 44, there is a fold / that places onto /; and passes through p2 (Fig. 10.6).

Since the fold passes through p2 and p2 is on the perpendicular bisector of pip}, the geometric locus of the reflection of p; is the circle centered at pọ with radius P1P2. The fold is constrained so that the reflection p{ is on the given line /).

Fig. 10.6 Axiom 5

Derivation of the equations of the folds: Let /,; be y = m,x+b, and let = (x1, y1), p2 = (x2, y2). The equation of the circle centered at p2 with radius pj pz is:

(x - x2)? +(y- yo)" =r’, where

r = (x2-41)? + (y2-y1).

120 10 The Axioms of Origami

Substituting the equation of /; into the equation for the circle gives: (x x2)? + ((mix + bi) - y2)? =r?

(x x2)? + (mix + (bi - y) =P’,

and we obtain a quadratic equation for the x-coordinates of the possible intersections:

x (1 + m?) + 2(-x2 + (b y2))x + Gs + (bt 2biy2 + y5) r’) =0. (10.3)

Since a quadratic equation has at most two solutions, for a given pair of points and a line there may be zero, one or two folds. From the solutions x}, x7 we can compute yi> y7 from y = mıx + bı. The reflected points are pi = (x), y1) PY = (x7, y1).

Example 10.6 Let = (2,8), p2 = (4,4) and let l; be y = -4x+ 3. The equation of the circle is (x—4)*+(y—4)? = (4-2)? + (4-8)? = 20. Substitute the equation of the line into the equation of the circle to obtain a quadratic equation for the x-coordinates of the intersections (or use Eq. 10.3):

1 Z (x-4)? + ((-3+3] -4] = 20

2 -4+0 (54) -20=0

5x? 28x -12 =0 (5x + 2)(x -6) =0.

The two points of intersection are:

pi = (-2/5, 16/5) = (-0.4,3.2), p{ = (6,0).

The folds will be the perpendicular bisectors of pip‘ and pip’. Example 10.7 For = (2,8) and p = (—2/5, 16/5) the equation of l j is: —8+6/5) __(-2/5)-2 | _ _ 2+ (-2/5) A 2 ` (16/5)-8 2

1 y= SiE.

Example 10.8 For = (2,8) and p? = (6,0) the equation of l, is:

ala meme rs a

8+0 6-2 2+6 2

10.6 Axiom 6 121

10.6 Axiom 6

Axiom 10.6 Given two points p1, p2 and two lines l1, l2, there is a fold / that places onto and places pz onto ly (Fig. 10.7).

A fold that places p; onto l; is a line p such that the distance from p; to lp is equal to the distance from /; to lp. The geometric locus of points equidistant from a point p; and a line l; is a parabola. p; is called the focus and is called the directrix. A fold is any line tangent to the parabola (Sect. 10.6.3).

Fig. 10.7 Axiom 6

For a fold to simultaneously place onto l; and p2 onto l}, it must be a tangent common to the two parabolas. There may be zero, one, two or three common tangents (Figs. 10.8a, 10.8b, 10.9a, 10.9b).

The formula for an arbitrary parabola is quite complex so we limit the presentation to parabolas whose axis of symmetry is the x- or y-axis.

122 10 The Axioms of Origami

+ Pai . . £ s R : : Pi s x : 7 ry ` : cf . > . F : sts ok a Pa st e $ . . fad Ne . A £ EEUTER IER . `~ 1 + Fed l aa t . * fe ¥ 1 A K 1 se a ee nE E . a . + `~’ . . P2 & sy. P2 > . l ta © at l Seu, ee 2 2 Reese ` ` ` h `~ ` `~

Fig. 10.9a Two common tangents Fig. 10.9b Three common tangents

10.6.1 Derivation of the Equation of a Fold

Let (0, f) be the focus of a parabola with directrix y = d. Define p = f d, the signed length of the line segment between the focus and the directrix.! If the vertex of the parabola is on the x-axis the equation of the parabola is y = x?/2p. To move the parabola up or down the y-axis so that its vertex is at (0, A), add h to the equation of the parabola (Fig. 10.10):

x2

= +h. y 2p

1 We have been using the notation p; for points; the use of p here might be confusing but it is the standard notation. The formal name for p is one-half the latus rectum.

10.6 Axiom 6 123

focus

(0, f) = (0,4) ¢ R

Wey at

vertex p=6 (0, 1) [a=

X-axis

directrix y = —2

y-axis

Fig. 10.10 The elements in the definition of a parabola

Define a = 2ph so that the equation of the parabola is:

2

X a = 10.4 y 2p * 2p (10.4a) x -2py+a=0. (10.4b)

The equation of the parabola in Fig. 10.10 is x? 12y + 12 = 0. Substitute the equation of an arbitrary line y = mx + b into Eq. 10.4b to obtain an equation for the points of intersection of the line and the parabola:

x —2p(mx +b) +a =0 x?” +(—2mp)x + (-2pb +a) =0.

The line will be tangent to the parabola if and only if this quadratic equation has exactly one solution if and only if its discriminant is zero:

(-2mp)? 4-1-(-2pb+a) =0 (10.5a) mp’ +2pb-a=0. (10.5b) This is an equation with variables m, b for the tangents to the parabola. To obtain

the common tangents to both parabolas we must simultaneously solve the equations for the two parabolas.

124 10 The Axioms of Origami

Example 10.9 Parabola 1: Focus (0, 4), directrix y = 2, vertex (0,3). p=2,a=2-2-3 = 12. The equation of the parabola is:

x*-—4y+12=0. Substituting p and a into Eq. 10.5b and simplifying gives: m’ +b-3=0.

Parabola 2: Focus (0, —4), directrix y = —2, vertex (0, —3). p = —2, a = 2 . —2 - —3 = 12. The equation of the parabola is:

x?’ +4y+12=0. Substituting p and a into Eq. 10.5b and simplifying gives: m’ -b-3=0. The solutions of the two equations:

m +b-3=0 m -b-3=0

are m = +V3 x +1.73 and b = 0. There are two common tangents:

Example 10.10

Parabola 1: Unchanged.

Parabola 2: Focus (0, —6), directrix y = —2, vertex (0, —4). p=-—4,a=2--4--4 = 32. The equation of the parabola is:

x? +8y+32=0. Substituting p and a into Eq. 10.5b and simplifying gives: 2m? -b-4=0. The solutions of the two equations:

m+b-3=0 Im -b-4=0

10.6 Axiom 6 125

7 2 are m = efi x +1.53 and b = 7 There are two common tangents:

Example 10.11

Let us now define a parabola whose axis of symmetry is the x-axis. Parabola 1: Unchanged.

Parabola 2: Focus (4, 0), directrix x = 2, vertex (3,0). p=2,a=2-2-3 = 12. The equation of the parabola is:

y —4x+12=0. (10.6)

This is an equation with x and y? instead of x? and y, so Eq. 10.5b can’t be used and we must perform the derivation again. Substitute the equation for a line into Eq. 10.6:

(mx + b)? —4x+12=0 mx? + (2mb —4)x + (b° + 12) =0.

Set the discriminant equal to zero and simplify:

(2mb 4)? 4m?(b* +12) =0

-3m ~mb+1=0. If we try to solve the two equations:

m+b-3=0

-3m —mb+1=0, we obtain a cubic equation with variable m: m? 3m? —3m+1=0. (10.7)

Since a cubic equation has at least one and at most three real solutions, there can be one, two or three common tangents.

The formula for solving general cubic equations is quite complicated, so I used a calculator on the internet and obtained the three solutions:

m = 3.73, m = —1, m=0.27.

126 10 The Axioms of Origami

From the form of Eq. 10.7 we might guess that m = 1 or m = —1 is a solution:

P-3-1°-3-14+1=-4 (-1)? -3-(-1)?-3-(-1)+1=0.

Divide Eq. 10.7 by m (—1) = m + 1 to obtain the quadratic equation m? 4m + 1 whose roots are the other two solutions of the cubic equation m = 2+ V3 ~ 3 .73, 0.27.

10.6.2 Derivation of the Equations of the Reflections

We derive the position of the reflection p| = (xj, y{) of = (xı, yı) around a tangent line l, whose equation is y = m;x + b,. First, find the line /,, with equation y = MpX + bp that is perpendicular to l, and passes through p1:

Next find the intersection p; = (xr, yr) of l; and lp:

Xt xX] m,X,; + by = + Vir EA my Mt

x b +— -»,| mt

ec, my + mt

Yt = m,X;, + by a

Xt =

p: is the midpoint between p; and p}:

Xp +X; ,

ae x, 52x =X], + 7

MS eas yi =2y-y1

10.6 Axiom 6 127

Example 10.12 Let l, be y = 3x + 0 and let = (0,4):

ee 4+ -0 nei 3 B+] ieee y, = V3V3 +0 =3 xi = Day = 2V3 x 3.46

yi =2yr -y1 =2.

10.6.3 Tangents to a Parabola

We wish to prove that the folds of Axiom 6 are tangents to the parabolas. Figure 10.11 shows five points p;, i = 1,...,5, each point p; at a distance a; from both the focus and the directrix. Drop perpendicular lines from p; to the directrix and denote the intersections of these lines with the directrix by p}. By Axiom 2 there are folds l; through p; that place p onto the directrix. The points p; are the reflections of p around the folds. The figure shows the fold /; through and the reflection p}.

Pi . directrix y=-f y-axis

Fig. 10.11 The tangent as a geometric locus

128 10 The Axioms of Origami

Fig. 10.12 The proof that the fold is a tangent

Theorem 10.7 The folds of Axiom 6 are the tangents to the parabolas that are the loci of the points equidistant to the points pı, p2 and lı, l2, respectively.

Proof In Fig. 10.12, the focus is p and the directrix is d. p’ is a point on the directrix and / is the fold that reflects p onto p’. Let s be the intersection of pp’ and J. Then ps = p's =a and l L pp’ since l is the perpendicular bisector of pp’.

Let r be the intersection of the line perpendicular to d through p’ and the fold L. Then Apsr = Ap’sr by side-angle-side. It follows that pF = p'r = b sor isa point on the parabola. Choose a point p” on the directrix that is distinct from p’ and assume that the fold / also reflects p onto p”. Let q be the intersection of the perpendicular to d through p” Denote gp” = e. If q is a point on the parabola then e = qp” = Gp = c, but c is the hypotenuse of the right triangle Agp’’ p’ and it is not possible that the hypotenuse is equal to one of the other sides of the right triangle. Therefore the fold / has only one intersection with the parabola and must be a tangent. o

and the fold /. Apsq = Ap’sq so pq = p'q = c.

10.7 Axiom 7

Axiom 10.8 Given a point and two lines /; and Jy, there is a fold / that places onto /; and is perpendicular to l2 (Fig. 10.13).

The fold is the geometric locus of all points on the line perpendicular to l2 and equidistant from and p{, the reflection of onto l4. Derivation of the equation of the fold: Let = (x1, y1), let l4 be y = mıx + and let l2 be y = m2x + bp. Let lp be the line containing pipi. Since / L h, lp L l, it follows that /,, || /2 and the equation of lp is y = m2x + bp.

10.7 Axiom 7

Fig. 10.13 Axiom 7

129

lp passes through p; so = m2x,+b, and its equation is y = max + (yı m2x1).

The reflection pi = (x1, y|) is the intersection of l; and lp:

, , mıx + by = mxi + (yı 2x1) , _ yı- Mx, by xy = m2 $ , yy = mx, +b).

The equation of the midpoint Pm = (Xm, Ym) Of Lp is:

( xi +x yı+yí Xm, Ym 2 > 2 l 1 l and it passes through pm so its equation is: 1 y =-—x + bm, m2 1 where bm can be computed from y = —-—x + bm: mz

Xm bm = Ym +t —. m2

130 10 The Axioms of Origami

The equation of the fold / is therefore:

Example 10.13 Let = (5,3), let l; be y = 3x 3 and let l2 be y = —x + 11. Then: p 3-(-1)-5-(-3) ll . E =

' 3=(-1) 4 ,, Ul 21 eer) ar ll, 21 Se ee eee Be ye eS

What Is the Surprise?

Origami, the art of paper folding, has been practiced for hundreds of years, so it is surprising that the mathematical formalization goes back only to the twentieth century. It is even more surprising that there is an axiomatization of paper folding. The mathematics of origami is an excellent way to learn analytic geometry, properties of parabolas and the concept of geometric locus.

Sources

The axioms of origami are presented in [56]. Lang [26] gives descriptions of origami constructions. [31, Chap. 10] contains the detailed theory of the mathematics of origami, including the proof that two parabolas can have zero, one, two or three common tangents. The proof of Thm. 10.7 was shown to me by Oriah Ben-Lulu. I found that geometric software like Geogebra is useful for understanding the relation between the geometry and the algebra of the axioms.

A clear presentation of cubic equations can be found in [6, Chapters 1, 2].

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Check for

Chapter 11 in Lill’s Method and the Beloch Fold

11.1 A Magic Trick

Construct a path consisting of four line segments {a3 = 1, az = 6, a; = 11, ap = 6}, starting from the origin along the positive direction of the x-axis and turning 90° counterclockwise between segments. Construct a second path as follows: construct a line from the origin at an angle of 63.4° and mark its intersection with az by P. Turn left 90°, construct a line and and mark its intersection with a; by Q. Turn left 90° once again, construct a line and note that it intersects the end of the first path at (-10, 0) (Fig. 11.1).

Fig. 11.1 A magic trick

© The Author(s) 2022 131 M. Ben-Ari, Mathematical Surprises, https://doi.org/10.1007/978-3-03 1-13566-8_11

132 11 Lill’s Method and the Beloch Fold

Compute the negation of the tangent of the angle at the start of the second path: tan 63.4° = —2. Substitute this value into the polynomial whose coefficients are the lengths of the segments of the first path:

p(x) = a3x? + aox? + ax + ao =x + 6x7 +11x+6 p(—tan 63.4°) = (-2)? + 6(-2)? + 11(-2) +6 =0.

We have found a root of the cubic polynomial x? + 6x? + 11x + 6! Let us continue the example. The polynomial p(x) = x? + 6x? + 11x +6 has three roots —1, —2, —3. Compute the arc tangent of the negation of the roots:

æ =—tan7!(-1) = 45°, B=-tan7!(-2) ~ 63.4°, y =—tan!(-3) ~ 71.6°.

For each angle the second path intersects the end of the first path (Fig. 11.2).

The value tan 56.3 ~ —1.5 is not a root of the equation. Fig. 11.3 shows the result of the application of the method for this angle. The second path does not intersect the line segment for the coefficient ap at (—10, 0).

This example demonstrates a method discovered by Eduard Lill in 1867 for graphically finding the real roots of any polynomial. We are not actually finding the roots but verifying that a given value is a root.

Section 11.2 presents a formal specification of Lill’s method (limited to cubic polynomials) and gives examples of how it works in special cases. A proof of the

Fig. 11.2 Lill’s method for the three roots of the polynomial

11.2 Specification of Lill’s Method 133

Fig. 11.3 A path that does not lead to a root

correctness of Lill’s method is given in Sect. 11.3. Section 11.4 shows how the method can be implemented using origami Axiom 6. This is called the Beloch fold and preceded the formalization of the axioms