Notes from tarpit reading: http://shaffner.us/cs/papers/tarpit.pdf
OOP: intentionally couple state to related behavior
No Silver Bullet, what makes software hard: Complexity, Conformity, Changeability, Invisibility
Dijkstra: testing is hopelessly inadequate… it can be used very effectively to show the presence of bugs but never to show their absence.
State hinders informal reasoning: reasoning about expected behavior from the inside, with knowledge of the internal code.
Control Complexity: complexity due to having to be concerned about the order of events. Languages with explicit flow control make you think about this.
a = b + 3
c = d + 2
e = f * 4
No reason for this flow, but programmer has to over-specify it (and compilers have to go to lengths to know that the order can be safely ignored). Accidental complexity:
- artificial, totally ignorable ordering imposed on programmer
- compiler work is done to optimize it away
Note that these two forms of complexity only apply given the assumption that the above code is for an imperative language with guarantees about order of execution; Oz is an example of a programming language that didn’t specify this.
“Running a test in the presence of concurrency with a known initial state and set of inputs tells you nothing at all about what will happen the next time you run that very same test with the very same inputs and the very same starting state… and things can’t really get any worse than that.”
CLOS: Common Lisp Object System, with multiple dispatch (methods can specialize on any/all required arguments, unlike classic OOP single dispatch).
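A minimal sketch of the multiple-dispatch idea (a toy registry keyed by the types of all arguments, not actual CLOS; the `Asteroid`/`Ship` names are just illustrative):

```python
# Toy multiple dispatch: implementations are chosen by the runtime
# types of ALL arguments, not just the first (as in single dispatch).
_registry = {}

def defmethod(*types):
    """Register an implementation for a specific tuple of argument types."""
    def wrap(fn):
        _registry[types] = fn
        return fn
    return wrap

def collide(a, b):
    """Dispatch on the runtime types of both arguments."""
    return _registry[(type(a), type(b))](a, b)

class Asteroid: pass
class Ship: pass

@defmethod(Asteroid, Ship)
def _(a, s):
    return "ship destroyed"

@defmethod(Asteroid, Asteroid)
def _(a, b):
    return "asteroids merge"
```

In single-dispatch OOP, `collide` would have to live on one class and switch on the type of the other argument; here neither argument is privileged.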
Problems with OOP encapsulation:
- access to state can still be spread all over the place, e.g. in presence of inheritance
- encapsulation strongly biased toward single-object constraints; not a lot of help in coordinating multiple object states
Identity and State
Object identity: in OOP, each object is considered uniquely identifiable regardless of attributes. This is intensional identity (as opposed to extensional where objects are the same if their attributes are). Intensional identity opposes relational algebra view of the world.
But OOP is complicated when mutability isn’t needed, and you add concepts like Value Objects where equality is based on values and not identity (in other words, it brings back extensional identity).
Immutable small objects whose equality is not based on identity. An int is an int forever, but objects can be mutated, so if you want an object that acts like, say, an int, you want a value object.
TL;DR C++ has copy-by-value because C does, hence both support value objects. Java has no native support for value objects, but you can get a functionally similar/equivalent thing by passing around references to immutable objects (VALJO, VALue Java Object: all attributes of the obj are final and it doesn’t contain other objects w mutable state).
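The value-object idea can be sketched with a frozen dataclass (the `Money` type is just an example): immutable, with equality by attributes (extensional), not by identity.

```python
from dataclasses import dataclass

# Value-object sketch: immutable, equality is extensional (by
# attributes), analogous to the VALJO described above.
@dataclass(frozen=True)
class Money:
    amount: int
    currency: str

a = Money(5, "USD")
b = Money(5, "USD")
assert a == b        # same attributes -> equal (extensional identity)
assert a is not b    # still two distinct object identities
```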
Object Identity exists because state exists, and is a source of error due to mental switching between the meaning of equality. (TODO: why isn’t this a problem in Clojure? You have to mentally switch between values and atoms… how is that any different?).
Hmm I feel like there’s something I’m not quite grasping about what people mean when they say state. Seems like everything has state? I dunno.
Summary: conventional OOP suffers from state-derived and control-derived complexity.
Pure: Haskell. Impure: ML, in that it advocates avoiding but still permits state.
Referential transparency: an expression (e.g. a function call) is replaceable by its value… nope, I guess it’s just that a function called with the same args will always return the same value.
Still an implicit left-to-right sequencing of operations, but fortunately control flow like loops is avoided in favor of fold/map.
Kinds of state
When most people talk about state they really mean mutable state.
Stateful methods can be replaced by functions where the state is passed in, and a new version of the state is returned, with the expectation that the new state must be passed in again to future calls (because referential transparency means the same function invocation always returns the same thing).
This paper seems to distinguish b/w procedures and functions based on a procedure having state… it runs and can manipulate inner state. But a function only returns values based on what was passed in and has no inner state.
BUT, this is just using FP to simulate state. In principle, you could build functional programs by just passing in the global god state of the app into every function and just chaining it along, which brings back the problem of single pool of global variables. Ref transparency++, ease of reasoning–. But this is an extreme example.
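The state-passing pattern above can be sketched as (a toy counter, not from the paper):

```python
# State-passing sketch: instead of a stateful counter object, a pure
# function takes the current state and returns (result, new_state).
def increment(count):
    new_count = count + 1
    return new_count, new_count

# The caller threads the state through explicitly:
c0 = 0
_, c1 = increment(c0)
_, c2 = increment(c1)
assert c2 == 2
# Referential transparency holds: increment(0) is always (1, 1).
assert increment(0) == (1, 1)
```

Scale this up to "the whole app state as the parameter" and you get the extreme god-state example the note describes.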
Argument: state increases modularity:
“Working within a stateful framework it is possible to add state to any component without adjusting the components which invoke it.”
With FP, you have ripple effects where adding “state” to a function means that state needs to be provided by a caller of a caller of a caller, etc, e.g. adding that extra state parameter.
It’s a tradeoff between hiding state and FP where you know exactly what will happen.
But in stateful PLs, you never know if there will be side-effects; you have to inspect all code to really know.
“As with the discipline of (static) typing, it is trading a one-off up-front cost for continuing future gains and safety (“one-off” because each piece of code is written once but is read, reasoned about and tested on a continuing basis).”
In other words, it’s convenient in the moment to have state (the one-off time you write that stateful code), but for all future readers/reasoners, they have to deal with the fallout of losing the guarantee of statelessness.
Monads are kind of a way to have your cake and eat it too: they make it possible to create a stateful sub-language within Haskell, even if technically it’s all properly typed as such. But still, monads have been insufficient in helping widespread FP adoption.
State your axioms and desired goals, let the system build the formal proof for each solution. Prolog is seminal logic PL.
“It is worth noting that a single Prolog program can be both correct when read in the first way, and incorrect (for example due to non-termination) when read in the second.”
This is because Prolog axioms aren’t read as purely logical axioms but are applied sequentially, which is why the order of axioms can affect the outcome.
LTR and top-to-bottom dependencies exist. Also, extra-logical features such as cuts (which prune backtracking, i.e. restrict control flow, presumably to prevent non-termination) add complexity.
Oz gives you flexibility of control rather than Prolog’s fixed depth-first search, but rather than sprinkling these choices into the code, it’s at a separate level; in other words, the way you execute code is configurable, not embedded within the code itself (which would contaminate the logic with control complexity).
Goal: determine origins of state, hope that most state turns out to be accidental.
All data is either directly provided to system (input) or derived.
Derived data is either immutable (used only for display) or mutable (because requirements specify user should update that data)
But just because all user input data is essential does not mean it must result in essential state. If it’s avoidable, it’s accidental state.
- there’s a possibility of referring to that data in the future: essential state
- there is no possibility (e.g. it’s used to cause some side effect but then can be forgotten): accidental state
Essential Derived Data - immutable
Always rederivable, accidental state if stored.
Essential Derived Data - mutable
I don’t understand this.
Accidental Derived Data
No caches, no stores of derived calculations of any kind. Result: all state is visible to the user (or tester) of the system, since (disallowed) caching is the main source of hidden state. If you’re not caching, then everything you calculate is presented to the user.
Control is accidental since it’s rare (if ever) that the requirements say anything about order of execution.
Concurrency is accidental; assuming zero time computation, user doesn’t care whether something happens in sequence or parallel.
- state is required because most systems have state as part of their true essence (wtf does this mean?)
- control is accidental, but practically (and for efficiency purposes) it is needed, same goes w state (caching and what not)
- property-based: what is required rather than how. Includes algebraic approaches such as Larch and OBJ
- model-based / state-based: construct a model, often stateful, and specify how it must behave. Implies a stateful approach for how to solve the problem.
Sometimes the ideal-world approach (no accidental state, derive everything) does not best model the program. Example: derived data depends on both a series of user inputs over time AND its own previous values. In such a case it can help to maintain accidental state (I don’t get this… why is this different than the conveniences of storing derived state that isn’t strictly historical?). Example: position of a computer-controlled opponent in an interactive game: technically position is derivable as f(initialPosition, allMovementsSince), but “this is not the way it is most naturally expressed”. Go fuck yourself and your loose definition of what is natural.
Required Accidental Complexity:
- avoid explicit management of accidental state; instead: simply declare what accidental state should be used, leave it to separate infrastructure to maintain.
- ease of expression: (e.g. position of computer controlled opponent)
- solution: pretend user is typing in this derived state, i.e. pretend that it is essential input
- structure of data
- manipulating data
- maintaining integrity and consistency of state
- insistence on separation b/w logical and physical layers of system
Data independence: the app / logical model is separate from how the data is actually stored
Structure: use of relations to represent all data
Manipulation: a means to specify derived data
Integrity: constraints
Data Independence: clear separation is enforced b/w logical data and physical representation
Base Relations: raw tables
Derived Relations (Views): defined in terms of other relations
Access path independence:
Relational structuring allows you to defer access paths (how the data will be queried, joined, etc). Before the relational model, you had to decide up front, e.g., whether employees would live inside top-level departments, or departments within top-level employees. This is the hierarchical approach. The network approach is a little better in that you can add cycles, but in the end you’re still defining the primary retrieval requirements up front at the expense of not knowing what secondary/future retrieval requirements you’ll have. Again, joining is the canonical example.
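The contrast can be sketched with toy data (the names are made up): a hierarchical nesting bakes in one access path, while a flat relation supports both query directions symmetrically.

```python
# Hierarchical: the access path is baked in -- easy to go
# dept -> employees, awkward to ask "which dept is ann in?"
hierarchy = {"eng": ["ann", "bob"], "sales": ["carol"]}

# Relational: one flat set of (employee, department) facts;
# either query direction is just a filter decided at query time.
works_in = {("ann", "eng"), ("bob", "eng"), ("carol", "sales")}

def employees_of(dept):
    return {e for (e, d) in works_in if d == dept}

def department_of(emp):
    return next(d for (e, d) in works_in if e == emp)
```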
OOP and XML suffer same hierarchical problems. Nesting. Who owns what. etc.
Manipulation: relational algebra
Restrict: unary operation for selecting a subset of records (a where clause)
Project: unary op which creates a new relation w various attributes removed (not added)
Product: cartesian product of tables (e.g. SELECT * FROM foo, wat;)
Union: binary operation, creates relation w all records in either arg relation
Intersection: binary operation, creates a relation consisting of all records in both
Difference: binary operation, records in the first relation but not the second (not xor; that would be symmetric difference)
Join: binary operation constructing all possible records that result from matching identical attrs
Divide: ternary operation returning all records of the first arg which occur in the second arg associated w each record of the 3rd arg (wat?)
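A few of these ops can be sketched over plain Python sets (records as frozensets of attribute/value pairs so set operations work; union, intersection, and difference then fall out of `|`, `&`, `-` for free):

```python
# Relational-algebra sketch: a relation is a set of records,
# each record a frozenset of (attribute, value) pairs.
def rec(**attrs):
    return frozenset(attrs.items())

def restrict(rel, pred):          # "WHERE": subset of records
    return {r for r in rel if pred(dict(r))}

def project(rel, *attrs):         # keep only the named attributes
    return {frozenset((a, v) for a, v in r if a in attrs) for r in rel}

def join(rel1, rel2):             # natural join on shared attributes
    out = set()
    for r1 in rel1:
        for r2 in rel2:
            d1, d2 = dict(r1), dict(r2)
            if all(d1[k] == d2[k] for k in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out

emp = {rec(name="ann", dept="eng"), rec(name="bob", dept="sales")}
dept = {rec(dept="eng", floor=3), rec(dept="sales", floor=1)}
```

E.g. `project(restrict(join(emp, dept), lambda r: r["floor"] == 3), "name")` yields the relation containing just ann.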
Functional Relational Programming
All essential state comes in the form of relations, and essential logic is expressed using relational algebra extended w pure user defined functions
Step 1, specify each of the following
Essential State: relational definition of stateful components
Essential Logic: derived-relation definitions, integrity constraints, and pure functions
Accidental State and Control: declarative specification of a set of perf optimizations for the system
Other: user and system interfaces for the outside world
Relations, tables, columns (the schema, not actual rows/records) Data should only be considered essential if directly input by user.
Pure functions about data transformation, set of db constraints.
Note that we ignore “denormalization for perf” for now because we’re just talking about formal specifications; the physical storage may or may not mirror what’s being specified here.
Accidental State and Control
Specify 1) what state should exist, 2) what physical storage mechanism used
- state-related hint: e.g. some derived-relvar should be stored rather than recalculated
- second kind of state-related hint: infrequently used subset of relvar should be stored elsewhere (e.g. partitioning a table)
- tweaking the evaluator
Declarative lets infrastructure optimize for you, e.g. avoid relational intersection if it can be determined that two groups are mutually exclusive; not possible/easy with imperative.
Normalized relational everything: avoids subjective bias about data access paths. OOP and XML generally force you to do the opposite, choosing nestings ahead of time and other things.
Control is avoided in the relational approach (think of SQL: no order of evaluation; this is intentional).
Explicit parallelism is avoided, but allows for possibility for separated accidental control if required; whether it’s parallel or not shouldn’t matter to anyone other than implementor, i.e. if it really improves things, it’ll be parallel, but functionally it’s the same interface for infrastructure consumers.
Focus on true essentials avoids accidental complexity.
Creation of compound data types; to be avoided. Why:
Subjective: like baking in data paths in OOP/XML-ish representation of data, is brittle to future use cases. Pre-existing bias will force future use cases into inappropriate reuse of pre-established biased structures. (I like this; this is a source of refactors, when you know what you’re doing is gross because of some new use case)
Data Hiding: constructing giant objects often causes unneeded, irrelevant data to be supplied to function, and which data actually gets used is hidden at call site, hurts informal reasoning and testability. Avoiding composite objects helps avoiding this problem.
FRP (func rel) opens door to
- perf (decided by infrastructure)
- different dev teams focusing on different components (by components we mean accidental vs essential vs interfacing)… this arg seems weak, or i don’t really understand it
Can create limited types for essential state/logic components:
- disjoint union / enumeration types
- NO product types (types w subsidiary components)
Algebraic data type is a kind of composite type; type formed by combining other types. Product types (tuples and records), and sum types (disjoint unions or variant types). So you can have things like
Action = UserClickEvent | UserDragEvent | Blah
This is a sum type; the total values an Action could be are the total possible values of its variants, summed.
A Product type is, say, (Int, String), where the total possible values are all the possible values of its components, multiplied (hence product).
Why sum and not product? because Sums don’t add new data types, really, they just categorize for pattern matching and other things. Whereas products create new compound datatypes.
Derived internal relations:
- RoomInfo, extend(Room, (roomSize = width*breadth))
- so it’s Room extended with a derived attribute
- Acceptance takes accepted Decisions and strips away accepted bool. So an Acceptance is a Decision without an accepted flag. I like this because we’re keeping the domain simple; if we’re dealing with Acceptances, we don’t have to worry that it’s an acceptance that’s not an acceptance; the domain is constrained properly.
- Rejection is the opposite, but has the same attrs
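The derived relations above can be sketched over plain dicts (the attribute names follow the notes' examples; the record values are made up):

```python
# Derived-relation sketch: RoomInfo extends Room with a computed
# roomSize; Acceptance is a Decision with the accepted flag stripped
# after filtering on it.
rooms = [{"id": 1, "width": 3, "breadth": 4}]
decisions = [{"offer": "A", "accepted": True},
             {"offer": "B", "accepted": False}]

def room_info(rooms):
    # extend(Room, roomSize = width * breadth)
    return [{**r, "roomSize": r["width"] * r["breadth"]} for r in rooms]

def acceptances(decisions):
    # Decisions where accepted, with the now-redundant flag removed.
    return [{k: v for k, v in d.items() if k != "accepted"}
            for d in decisions if d["accepted"]]
```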
Accidental state and control
This part is interesting because it suggests that defining relations (like you do when designing/committing a DB schema) is a premature accidental complexity. This paper suggests that you take your essential state types and hint that some of them should be cached. In relational databases, if you CREATE TABLE, you’re creating a cache. I guess this should be obvious. But the thing to note is that what this paper is suggesting is that there is a level above this at which we should be thinking. Everything below is accidental. Whether you CREATE TABLE or recompute on the fly is accidental. The user doesn’t care.
declare store PropertyInfo : create a cache / table for PropertyInfo rather than re-calc
declare store shared Room Floor : denormalize Room and Floor into shared storage structure (hmm, why? is this a join table?)
declare store separate Property (photo) : split out photo from other properties (perf hint).
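One way to picture the first kind of hint (a loose sketch, not the paper's actual mechanism; `declare_store` and `property_info` are invented names): the logic defines a derived relation purely, and a separate, swappable annotation decides whether it gets materialized or recomputed.

```python
import functools

# Sketch of a "declare store" hint: caching is declared separately
# from the pure derivation it applies to. Illustrative names only.
def declare_store(fn):
    """Performance hint: materialize this derived relation."""
    return functools.lru_cache(maxsize=None)(fn)

@declare_store
def property_info(price_band: str) -> tuple:
    # Pure derivation; whether it's stored is infrastructure's choice.
    return tuple(f"property-{price_band}-{i}" for i in range(3))
```

Removing the decorator changes performance, not meaning, which is the whole point: the user can't tell the difference.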
TODO: read Kow79
Simple Made Easy
Classes as namespaces = bad.
Syntax is inferior to data.
Switching/pattern matching allegedly complects multiple pairs of who’s going to do something and what happens… how is this different than multi-methods?
variables complect value and time (they are state i guess)
for loops complecting by explicitly specifying how to do something.
folds still complect because they go from left to right…
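The loop-vs-fold contrast can be sketched as (a toy sum, assuming nothing beyond the stdlib):

```python
from functools import reduce

# A for loop complects the "what" (summing) with the "how":
# explicit iteration order and a mutable accumulator.
def total_loop(xs):
    acc = 0
    for x in xs:
        acc += x
    return acc

# A fold abstracts the iteration machinery, though it still fixes a
# left-to-right order, which is the complaint in the note above.
def total_fold(xs):
    return reduce(lambda acc, x: acc + x, xs, 0)
```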
Polymorphism a la carte, via Clojure protocols. 1) define data structures, 2) definitions of sets of functions, and 3) connect them together.
Favor declarative Prolog-ish logic programming to littering conditionals all over the place.
Resource contention is inherent complexity, not your fault.