Introduction
============

FormEncode is intended to be a general-purpose framework.  It should
make the easy stuff easy, and the hard stuff possible.

Challenges
----------

The challenge of a high-level application library is to make something
that is easy to start with, and yet complete and customizable enough
to satisfy all your needs.  In this case, the question is how to make
a library that can be used for both these forms:

    *Show a simple form with one-field-per-row.*

    *Show a complicated form with more layout, repeating fields,
    advanced widgets*

The basic problems are common to other application libraries.  There
are two pitfalls: libraries that cannot be scaled to more complex use,
and libraries that are too complicated to get started with or too
complicated to integrate with your application (and your other
libraries).

People frequently put effort into these problems -- typically
reimplementing a complex library they've become dissatisfied with, or
adding features or wrappers to a library that doesn't fit all their
needs.  This is often a case of jumping from the frying pan into the
fire, though the person who has escaped one trap to fall into another
will often not realize it until the library gets wider use.

FormEncode's History
--------------------

FormEncode is preceded by FunFormKit, a Webware-specific form
generation and validation library.  FunFormKit possessed many of the
features that FormEncode now has, but was itself structurally naive,
and ultimately unmaintainable.  As such, FormEncode is informed by a
past failure, and avoids many mistakes as a result.

Hello World
-----------

You don't want to introduce complicated concepts before the user can
create a functional application and use your library.  Python's hello
world looks like::

    print "Hello World!"

This is an extremely valuable feature of Python!
A FormEncode hello world looks like::

    # Framework boilerplate :(
    import cgi
    form = cgi.FieldStorage()
    print 'Content-type: text/html\n'

    import formencode
    class NameForm(formencode.Schema):
        name = formencode.htmlview.StringField(notEmpty=True)
        submit = formencode.htmlview.SubmitButton()
    successful, data = NameForm.process(form)
    if successful:
        print 'Hello %s!' % data['name']
    else:
        print NameForm.html(action='', errors=data, httpRequest=form)

Sigh; not as easy as the first example, but still not too bad.
Compare to doing it without any library::

    import cgi
    import os
    form = cgi.FieldStorage()
    print 'Content-type: text/html\n'
    if form.getvalue('name'):
        print 'Hello %s!' % form['name'].value
    else:
        # Form markup assumed: the original HTML is missing from this copy.
        print '''
        <form action="%s" method="POST">
        What is your name?
        <input type="text" name="name">
        <input type="submit">
        </form>
        ''' % os.environ['SCRIPT_NAME']

Here FormEncode is just about even.  But the library-less way of
coding the form scales very poorly -- we can't easily add more fields
with more validation tests, we can't use more sophisticated validation
tests, and we are tied to a specific HTTP method (POST).  Fixing these
requires a bit more infrastructure::

    import cgi
    import sys
    form = cgi.FieldStorage()
    print 'Content-type: text/html\n'
    errors = {}
    if form.getvalue('save'):
        if not form.getvalue('name'):
            errors['name'] = "Please enter a value"
        if not errors:
            print 'Hello %s!' % form.getvalue('name')
            sys.exit()
    # Form markup assumed: the original HTML is missing from this copy.
    print '''
    <form action="" method="POST">
    What is your name? %s
    <input type="text" name="name">
    <input type="submit" name="save" value="Save">
    </form>
    ''' % errors.get('name', '')

Now the FormEncode example looks better, and perhaps a bit easier to
read as well (even if you aren't familiar with FormEncode).
FormEncode will continue to pull ahead as the form becomes more
complex.

Writing Your Own
----------------

Frequently people write unnecessary libraries, which encapsulate a
pattern that is more easily used as simply a pattern.  So one of the
first things we should ask of a library -- especially when considering
writing a new library -- is whether the result is truly more effective
than a documented pattern (keeping in mind that documenting a pattern
is almost always easier than documenting a library, and requires no
programming and little support).

In the second of the FormEncode-less examples shown above, I use a
basic pattern of form processing, one which I've personally used in
the past and found adaptable and acceptable in many situations.  Why
not just use this pattern?  If not everyone uses good patterns of form
processing (and they don't), perhaps this is an education issue, not
an issue for a library.

The advantages of a self-written library:

* You know it inside and out.

* You don't have to worry about crossing library boundaries -- you
  control both the library and the application, and if changes to one
  (including changes in requirements) require changes to the other,
  that's okay.
* By definition, it does just what you need and doesn't do what you
  don't need (at least if you practice self-restraint).

These advantages are part of why there are a large number of in-house
form libraries.  A replacement has to justify itself in three ways:

**It must solve difficult problems**:
    It can be dead-easy, but if it doesn't solve problems that are
    worth solving then the home-grown solution is always going to be
    easier to work with.  (Though this seems obvious, it is not an
    uncommon problem with libraries.)

**It must not require people to reorganize their code**:
    If you have something *really* great, maybe you can get someone
    to organize their code just the way you think it should be.  But
    if there are *two* libraries/frameworks that want you to
    reorganize your code, you'll be forced to choose.  And most of the
    time you won't quite like either option.

    Also, in this case, you usually grow into needing a form library;
    you get along without one quite well during early development.
    You don't want to recode your application every time you bring in
    a new technology.

**It must not require people to reorganize their thoughts**:
    Developers find their own style and ways of dealing with code
    organization.  It's all too easy to codify every personal tendency
    into the framework.  Many MVC web frameworks spend a lot of time
    doing this.  But the most successful frameworks are often quite
    neutral in this area: things like PHP, ASP, ColdFusion.  Worse is
    better.

In the effort to create smaller libraries of code, this is
particularly important, because a library has to adapt to many
environments since it is not an environment of its own.  This
discipline is perhaps advantageous, as it encourages loose coupling.

Taking Control (or not)
-----------------------

Frameworks frequently take control from the programmer.  The framework
becomes the top-level element, and then it is customized.  This is the
style::

    import framework

    class MyThing(framework.Framework):

        def postSubmit(self):
            ...

        def onFailure(self):
            ...

        data = {blah blah}
        ...

    thing = MyThing()
    thing.run()

Looking in on this sort of code, you don't know when (or if) the
methods are called and you don't know where the data is used.
Thorough documentation can resolve this, but in reality very few
frameworks are thoroughly documented (especially in the open source
world).  In the end you must read through the code to understand
what's going on (especially when things don't work like you think they
should).  This can happen in any library, but it happens earlier with
a framework, and tends to be more painful.

Frameworks have a tendency to **not do what you want** rather than
simply **breaking**.  This is because the framework calls on your
code, so *your* expectations aren't always met.  But *your*
expectations are the most important, because it's your application.

The initial generic form processor I wrote (FunFormKit) took over
control like this, but for no good reason.  (Advice: when you're
writing a library for yourself, it's all too easy to confuse the flow
of control in your application with the flow of control for the
library, because they are written by the same person.  So be
cautious.)  This control of flow is also common when the form
processor is built into the framework itself, often under the guise of
MVC development.

FormEncode very specifically does not do this at the top level, so
that you can use the form processing with your own conditionals and
flow control, and integrate it into an application without making it
into a FormEncode (tm) application.  The validation still takes
control from the programmer, as the control process in validation
delegation is one of FormEncode's most central features, and not
easily distilled into a pattern.

Delegate Everything Possible
----------------------------

Another way of thinking of this: *keep the interface as small as
possible*.
In this case I'm thinking about the interface of the objects that
publicly interact in a form processing system: validation, HTML
generators, the request, the enclosing form.

An anti-example might be something like::

    def processForm(form, validators):
        result = {}
        errors = {}
        for name, validatorList in validators.items():
            if not isinstance(validatorList, list):
                validatorList = [validatorList]
            for validator in validatorList:
                if validator.required and not form.has_key(name):
                    errors[name] = 'Required'
                    continue
                if not form.get(name):
                    result[name] = getattr(validator, 'default', None)
                    continue
                if validator.maxLength is not None and \
                   len(form[name]) > validator.maxLength:
                    errors[name] = "Too long"
                    continue
                msg = validator.validateErrorMessage(name)
                if msg:
                    errors[name] = msg
                    continue
                value = validator.convert(form[name])
        return result, errors

Specifically in this example, we're accessing different options of the
validator (and going on like we are here, this function would be three
times this length before we dealt with all the cases).  This is a
somewhat extreme example, but this can happen fairly frequently as an
interface is grown organically -- it isn't necessarily apparent when
you start what the minimal interface will really be.

This is bad for a framework, because someone who wants to do something
outside of what you are expecting will have to fit into this
complicated control structure.  Instead, you want to allow the
validator to do all the work of any validation and conversion in one
single well-defined method, with only the minimal logic in the
framework itself.

In FormEncode, we minimize the interface and allow for delegation by
using a fairly simple signature for validators:

* ``toPython(value, state)``: returns the converted value (coming from
  outside).

* ``fromPython(value, state)``: returns the internal value, converted
  to work in the outside world (e.g., an HTML form).

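As a rough illustration of how little the framework side needs to know
under such a signature, here is a minimal sketch.  It is not
FormEncode's implementation: the ``Invalid`` exception, the
``IntValidator`` class, the ``process`` helper and the ``state=None``
default are all assumptions made for this example.

```python
class Invalid(Exception):
    """Hypothetical error raised when a value cannot be converted."""


class IntValidator(object):
    """Hypothetical validator exposing only toPython and fromPython."""

    def toPython(self, value, state=None):
        # Convert the outside (string) value into a Python int.
        try:
            return int(value.strip())
        except (ValueError, AttributeError):
            raise Invalid("Please enter an integer")

    def fromPython(self, value, state=None):
        # Convert the Python value back into a form-ready string.
        return str(value)


def process(fields, validators, state=None):
    """Framework side: it only calls toPython and collects errors."""
    result, errors = {}, {}
    for name, validator in validators.items():
        try:
            result[name] = validator.toPython(fields.get(name, ''), state)
        except Invalid as e:
            errors[name] = str(e)
    return result, errors


good = process({'age': ' 42 '}, {'age': IntValidator()})
bad = process({'age': 'abc'}, {'age': IntValidator()})
```

Compared with the anti-example, ``process`` never looks at validator
options such as a required flag or a maximum length; each validator
decides for itself what counts as valid input.
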
Don't Be Too Minimal
--------------------

By delegating everything possible, we allow for considerable
customization through subclassing.  Nevertheless, this should not be
used as an excuse to piss off your users.  That "anything is possible"
is not enough -- remember that we are trying to make life easier.

This means that typical cases should be allowed for and made
accessible.  It shouldn't require complicated composition or
overriding of methods to do normal stuff, or to use a validator that
is foreign to the system.  FormEncode offers these through a standard
set of keywords for the validators to make an item required, or to
make it optional (even if the empty string would not normally
validate), etc.

This is also where composition of objects becomes important -- it's
relatively easy to combine and compose objects, but difficult to
subclass objects.  Subclassing is only encouraged when making new
kinds of validators, never to compose validators.  So if, for
instance, you want to make sure an email address is valid but also
shorter than a certain length, you can do so by composing two
validators (using ``All(Email(), MaxLength(50))``).

This composition also happens behind the scenes, when complex input
widgets have to transform and validate input; e.g.,
``SecureHiddenField`` checks a signature, and if the signature is
correct it throws it away and returns only the value; if not, an error
is signaled.

Programmatic Accessibility
--------------------------

One of the potential advantages of a formalized library with clear
boundaries (over an ad hoc library coupled with an application) is
that it can be a building block for even higher level abstractions.

In order to make this possible, we must provide a programmatic
interface for creating validation schemas and composing these objects.
Several techniques are in conflict with programmatic interfaces:

* Code generation -- this is unwieldy at first, and untenable when
  layering the abstractions.
* Static configuration -- while static configuration can be used to
  build objects, it is needlessly indirect.  Static configuration
  (e.g., with XML) can easily be achieved through a programmatic
  interface, but an interface that expects static configuration will
  not necessarily allow for runtime reconfiguration, or will use a
  pull-style configuration, where the framework queries the
  configuration and a complete mock configuration has to be
  constructed.

Some of the strategies that FormEncode uses to make it easier to use
on a higher level:

* Immutable ("Value") objects where possible, avoiding complicated
  setup procedures or hidden initialization.  Once you've created the
  code for a form by hand, you know (almost) everything you need to
  know to create it programmatically.  Everything except:

* Though nested structures can be declared with classes to make the
  definition more aesthetically pleasing, these map directly to
  methods.  Example::

      class MyForm(Schema):
          name = StringCol()

  Is equivalent to::

      MyForm = Schema()
      MyForm.addField('name', StringCol())

Allow Hacks
-----------

When you really want to get something done, every principle of loose
coupling feels like a fence deliberately put up to hinder your
progress.

FormEncode allows for a nearly free-form object with almost no defined
interface that can be used in an application-specific manner.  It's
called "state" because it's vaguely intended to represent the general
program state.  It takes the place of global variable hacks, which are
often hard to track and infeasible in a threaded environment.

Many form processing frameworks use a single-value model like::

    def validate(value):
        # return True or False

This works perfectly well in an abstract application (and FormEncode
almost never uses the state object in its included validators).  But
without any context, many application-specific validations won't be
possible.
For instance, a form of validation might be to ensure the user has permission to access something -- we don't allow for this specifically, but by putting the user object into the "state" we allow the end user to get at this information.
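
The permission check described here might look roughly like the
following sketch.  Every name in it (``Invalid``, ``State``, ``User``,
``CanAccessDocument``, ``allowed_ids``) is hypothetical and chosen for
illustration only; the one idea taken from the text is that the
application attaches whatever it needs to a free-form state object and
a validator reads it back.

```python
class Invalid(Exception):
    """Hypothetical error raised when validation fails."""


class State(object):
    """Free-form, application-defined context with no fixed interface."""


class User(object):
    """Hypothetical application user with a set of readable document ids."""

    def __init__(self, allowed_ids):
        self.allowed_ids = set(allowed_ids)


class CanAccessDocument(object):
    """Hypothetical application-specific validator that consults the state."""

    def toPython(self, value, state):
        doc_id = int(value)
        # The library knows nothing about users; the application put
        # this attribute on the state object itself.
        if doc_id not in state.user.allowed_ids:
            raise Invalid("You may not access document %d" % doc_id)
        return doc_id

    def fromPython(self, value, state):
        return str(value)


# The application builds the state once per request and passes it along.
state = State()
state.user = User(allowed_ids=[1, 2, 3])

validator = CanAccessDocument()
allowed = validator.toPython('2', state)

try:
    validator.toPython('9', state)
    denied_message = None
except Invalid as e:
    denied_message = str(e)
```

Because the state is passed explicitly through each call, two requests
handled in different threads can carry different users without sharing
a global variable.
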