++++++++++++++
FormEncode 0.1
++++++++++++++

28 September 2003

Ian Bicking

.. contents::

Introduction
============

FormEncode is a validation and form generation package.  The validation can be used separately from the form generation.  The validation works on compound data structures, with all parts being nestable.  It is separate from HTTP or any other input mechanism.

Prerequisites
-------------

FormEncode uses PyProtocols_, an interface/adapter implementation.  FormEncode requires Python 2.2+.

.. _PyProtocols: http://peak.telecommunity.com/PyProtocols.html

SQLObject_ uses some of the validation from FormEncode, but the tie between the two is intended to be loose.  Currently neither requires the other, but in the future SQLObject will require FormEncode (but not the other way around).

.. _SQLObject: http://sqlobject.org

History
-------

FormEncode is a rewrite of FunFormKit_.  FunFormKit was developed for the `Webware for Python`_ framework.

.. _FunFormKit: http://funformkit.sf.net
.. _`Webware for Python`: http://webware.sf.net

FunFormKit had some architectural flaws, many of which were caused by unnecessary complexity.  FormEncode is a simplification of FunFormKit, with reduced coupling and explicit boundaries (e.g., separating validation from form generation).  FormEncode is also written to be framework-neutral -- FunFormKit was close to being framework-neutral, but like some of the other unnecessary complexities of FFK, this potential neutrality was only recognized some way into the project.

FunFormKit's greatest flaws came from a rewrite that added compound and repeating fields (expressed now with Schemas_ and the ForEach_ validators).  By combining so many notions into the idea of a "field", FunFormKit became overwhelmed.  Adding repeating and compound fields was a hack job, and I now feel it has to be written into the system.  HTML generation became increasingly inaccessible except through the complexity of the underlying objects.
FormEncode is a rethinking of the whole thing.  Many pieces of code were taken from FunFormKit, especially a lot of the guts of FFK which were quite manageable -- but the metaphors and overall architecture have been changed.

Using Validation
================

Validation and conversion happen simultaneously.  Frequently you cannot convert a value without ensuring its validity, and validation problems can occur in the middle of conversion.  The "Encode" in "FormEncode" refers to this conversion.

The basic metaphor for validation is *toPython* and *fromPython*.  In this context "Python" is meant to refer to "here" -- the trusted application, your own Python objects.  The "other" may be a web form, an external database, an XML-RPC request, or any data source that is not completely trusted or does not map directly to Python's object model.  *toPython* is the process of taking external data and preparing it for internal use; *fromPython* generally reverses this process (*fromPython* is usually the less interesting of the pair).

The core of this validation process is two functions and an exception::

    >>> from FormEncode import validators
    >>> validator = validators.Int()
    >>> validators.toPython(validator, "10")
    10
    >>> validators.toPython(validator, "ten")
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "FormEncode/validators.py", line 102, in toPython
        return validator.toPython(value, state)
      File "FormEncode/validators.py", line 247, in toPython
        self.validatePython)
      File "FormEncode/validators.py", line 231, in attemptConvert
        converted = convert(value, state)
      File "FormEncode/validators.py", line 844, in _toPython
        {}, value, state)
    FormEncode.validators.Invalid: Please enter an integer value.  Value: 'ten'

Sorry about the traceback.  ``"ten"`` isn't a valid integer, so we get a ``validators.Invalid`` exception.  Typically we'd catch that exception, and use it for some sort of feedback.  Like::

    >>> def getInteger():
    ...     while 1:
    ...         try:
    ...             value = raw_input('Enter a number: ')
    ...             return validators.toPython(validator, value)
    ...         except validators.Invalid, e:
    ...             print e
    ...
    >>> getInteger()
    Enter a number: ten
    Please enter an integer value.  Value: 'ten'
    Enter a number: 10
    10

``Invalid`` exceptions generally give a good, user-readable error message about the invalidity of the input.  Using the exception gets more complicated when you use compound data structures (dictionaries and lists), which we'll talk about later__.

.. __: `Compound Validators`_

While validators have a ``toPython`` method, we typically use the ``toPython`` *function* because it handles adaptation_ and a few other small details.  Adaptation allows an object to turn into another object based on interface (the ``FormEncode.interfaces.IValidator`` interface, in this case).  This is a common situation with validation, as objects typically require validation, but aren't necessarily validators themselves -- you can add your own adapters to generate validators for your objects.  Read more about this later__.

.. __: `Writing Your Own Validator`_

We'll talk more about these individual validators later, but first we need something more motivating than integer conversion...

.. _Schemas:

Compound Validators
-------------------

While validating single values is useful, it's only a *little* useful.  Much more interesting is validating a set of values.  This is called a *Schema* in FormEncode.

For instance, imagine a registration form for a website.
It takes the following fields, with restrictions:

* firstName (not empty)
* lastName (not empty)
* email (not empty, valid email)
* username (not empty, unique)
* password (reasonably secure)
* passwordConfirm (matches password)

We express this as a *single* validator::

    from FormEncode.Schema import Schema
    from FormEncode import validators

    class Registration(Schema):
        firstName = validators.String(notEmpty=True)
        lastName = validators.String(notEmpty=True)
        email = validators.Email(resolveDomain=True)
        username = validators.All(validators.PlainText(),
                                  UniqueUsername())
        password = SecurePassword()
        passwordConfirm = validators.String()
        chainedValidators = [validators.FieldsMatch('password',
                                                    'passwordConfirm')]

Like any other validator, a ``Registration`` instance can be used with the ``toPython`` and ``fromPython`` functions.  The input should be a dictionary, with keys like ``"firstName"``, ``"password"``, etc.  The validators will be applied to each of the values of the dictionary.  *All* the values will be validated, so if there are multiple invalid fields you will get information about all of them.

.. _preValidators:
.. _chainedValidators:

``chainedValidators`` are validators that are run on the entire dictionary after other validation is done (``preValidators`` are applied before the schema validation).  In this case ``validators.FieldsMatch`` checks that the values of the two fields are the same (i.e., that the password matches the confirmation).

Since a Schema is just another kind of validator, you can nest these indefinitely, validating dictionaries of dictionaries.

.. _ForEach:

You can also validate lists of items using ``ForEach``.  For example, let's say we have a form where someone can edit a list of book titles.
Each title has an associated book ID, so we can match up the new title and the book it is for::

    class BookSchema(Schema):
        id = validators.Int()
        title = validators.String(notEmpty=True)

    validator = validators.ForEach(BookSchema())

The ``validator`` we've created will take a list of dictionaries as input (like ``[{"id": "1", "title": "War & Peace"}, {"id": "2", "title": "Brave New World"}, ...]``).  It applies the ``BookSchema`` to each entry, and collects any errors and reraises them.  Of course, when you are validating input from an HTML form you won't get well structured data like this (we'll talk about that later__).

.. __: `HTTP/HTML Form Input`_

Writing Your Own Validator
--------------------------

There were some validators in the ``Registration`` example which aren't part of FormEncode.  It's fairly easy to write your own validator.  For instance::

    import re
    from FormEncode import validators

    class SecurePassword(validators.FancyValidator):

        min = 3
        nonLetter = 1
        letterRegex = re.compile(r'[a-zA-Z]')

        messages = {
            'tooFew': 'Your password must be at least %(min)i '
            'characters long',
            'nonLetter': 'You must include at least %(nonLetter)i '
            'non-letter characters in your password',
            }

        def _toPython(self, value, state):
            # _toPython gets run before validatePython.  Here we
            # strip whitespace off the password, because leading and
            # trailing whitespace in a password is too elite.
            return value.strip()

        def validatePython(self, value, state):
            if len(value) < self.min:
                raise validators.Invalid(
                    self.message("tooFew", min=self.min),
                    value, state)
            # Remove every letter; what remains are the non-letters.
            nonLetters = self.letterRegex.sub('', value)
            if len(nonLetters) < self.nonLetter:
                raise validators.Invalid(
                    self.message("nonLetter", nonLetter=self.nonLetter),
                    value, state)

(Your validator can be simpler than this too, but this example shows what a fairly complete validator will look like.)

Now you can use ``SecurePassword()`` just like any validator.  With all validators, any arguments you pass to the constructor will be used to set instance variables.
So ``SecurePassword(min=5)`` will be a minimum-five-character validator.  This makes it easy to also subclass other validators, giving different default values.

There are two methods we use: ``validatePython`` and ``_toPython``.  ``validatePython`` doesn't have any return value; it simply raises an exception if it needs to.  It validates the value *after* it has been converted (by ``_toPython``).  ``validateOther`` validates before conversion, but that is seldom used.  ``FancyValidator`` offers some `special features`__ in ``toPython``, which is why we define ``_toPython`` (with the underscore), which ``FancyValidator`` calls during ``toPython``.

.. __: FancyValidator_

This is also our first introduction to ``state``, but that is better described later__.

.. __: State_

Implementation of ``UniqueUsername`` is left to the reader.  While ``SecurePassword`` is potentially reusable, ``UniqueUsername`` is more closely tied to your particular application, and the user API you are using.  It is assumed and intended that you will write some of your own validators.

The use of ``self.message(...)`` is meant to make the messages easy to format for different environments, and replaceable (with translations, or simply with different text).  Each string should have an identifier (``"tooFew"`` and ``"nonLetter"`` in this example) and a default string (the value stored under that identifier in the ``messages`` dictionary).  The keyword arguments to ``self.message()`` are the values to substitute into the string.  See Messages_ for more.

Other Validator Usage
---------------------

Validators use instance variables to store their customization information.  You can use either subclassing or normal instantiation to set these.  These are (effectively) equivalent::

    plain = Regex(regex='^[a-zA-Z]+$')

    # and...

    class Plain(Regex):
        regex = '^[a-zA-Z]+$'

    plain = Plain()

When dealing with nested validators this class syntax is often easier to work with, and better displays the structure.

.. _FancyValidator:

There are several options that most validators support (including your own validators, if you subclass from ``FancyValidator``):

ifMissing:
    If you set this attribute and the field is missing, this value will be substituted.  This only occurs when the validator is contained in a Schema, and the dictionary key is missing.

ifInvalid:
    If you set this attribute and the validator fails, this value will be returned.

notEmpty:
    If true, the input is checked for emptiness before anything else is tested (empty meaning anything that evaluates to false).

State
-----

All the validators receive a magic ``state`` argument.  Both the ``toPython()`` and ``fromPython()`` functions take this argument as well (though it defaults to None).  It's used for very little in the validation system.  It is primarily intended to be an object you can use to hook your validator into the context of the larger system.

For instance, imagine a validator that checks that a user is permitted access to some resource.  How will the validator know what user is logged in?  State!  Imagine you are localizing it, how will the validator know the locale?  State!  Whatever else you need to pass in, just put it in the state object as an attribute, then look for that attribute in your validator.

.. _`protocol attribute`:

As a special case, the state object can also have a ``protocol`` attribute.  If present and not None, this should be a string representing the protocol (like ``"http"``, ``"xmlrpc"``, ``"sql"``, etc).  If a validator has a ``protocol`` value (which should be a list of strings) and the state's protocol is not in that list, then the validator will not be used.  This way you can use the same validator with multiple protocols, selectively leaving out portions of the process depending on the needs of the protocol (e.g., HTTP needs strings to be converted to integers, but XML-RPC has native integers).

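
The protocol rule can be illustrated with a small self-contained sketch.  It deliberately does not import FormEncode: ``State``, ``StringToInt``, ``is_used`` and ``convert`` are hypothetical stand-ins invented for this example, and they only mirror the rule as it is described above, not the library's actual implementation.

```python
class State(object):
    """Hypothetical state object: just a bag of attributes."""

    def __init__(self, **attributes):
        self.__dict__.update(attributes)


class StringToInt(object):
    """Hypothetical validator that is only needed for string-based input."""

    protocol = ['http']

    def to_python(self, value, state):
        return int(value)


def is_used(validator, state):
    """Return False only when both sides name a protocol and they disagree."""
    state_protocol = getattr(state, 'protocol', None)
    validator_protocols = getattr(validator, 'protocol', None)
    if state_protocol is None or validator_protocols is None:
        return True
    return state_protocol in validator_protocols


def convert(validator, value, state=None):
    """Run the validator, or pass the value through if it is skipped."""
    if not is_used(validator, state):
        return value
    return validator.to_python(value, state)


from_form = convert(StringToInt(), "10", State(protocol="http"))
from_rpc = convert(StringToInt(), 10, State(protocol="xmlrpc"))
```

Here the HTTP input is converted from a string, while the XML-RPC value is passed through untouched because ``"xmlrpc"`` is not in the validator's protocol list.
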
Also, during compound validation (a ``Schema`` or ``ForEach``) the state (if not None) will have some instance variables added to it.  During a ``Schema`` (dictionary) validation the instance variables ``key`` and ``fullDict`` will be added -- ``key`` is the current key (i.e., validator name), and ``fullDict`` is the rest of the values being validated.  During a ``ForEach`` (list) validation, ``index`` and ``fullList`` will be set.

Invalid Exceptions
------------------

Besides the string error message, ``Invalid`` exceptions have a few other instance variables:

value:
    The input to the validator that failed.

state:
    The associated state_.

errorList:
    If the exception happened in a ``ForEach`` (list) validator, then this will contain a list of ``Invalid`` exceptions.  Each item from the list will have an entry, either None for no error, or an exception.

errorDict:
    If the exception happened in a ``Schema`` (dictionary) validator, then this will contain ``Invalid`` exceptions for each failing field.  Passing fields will not be included in this dictionary.

.. _Messages:

Messages, Language Customization
--------------------------------

All of the error messages can be customized.  Each error message has a key associated with it, like ``"tooFew"`` in the registration example.  You can overwrite these messages by using your own ``messages = {"key": "text"}`` in the class statement, or as an argument when you call a class.  Either way, you do not lose messages that you do not define; you only overwrite the ones that you specify.

Messages often take arguments, like the number of characters, the invalid portion of the field, etc.  These are always substituted as a dictionary, that is, by name.  So you will use placeholders like ``%(key)s`` for each substitution.  This way you can reorder or even ignore placeholders in your new message.

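
The effect of substituting by name can be seen with plain Python string formatting.  This is a minimal sketch that does not use FormEncode: ``DEFAULT_MESSAGES``, ``merged_messages`` and ``render`` are hypothetical names, and the ``badRange`` message is invented for the example.

```python
DEFAULT_MESSAGES = {
    'tooFew': 'Your password must be at least %(min)i characters long',
    'badRange': '%(value)s is not between %(low)i and %(high)i',
}


def merged_messages(defaults, overrides):
    """Overrides replace only the keys they name; other defaults survive."""
    messages = dict(defaults)
    messages.update(overrides)
    return messages


def render(messages, key, **substitutions):
    """Fill in the placeholders of one message by name."""
    return messages[key] % substitutions


messages = merged_messages(DEFAULT_MESSAGES, {
    # Reorders the placeholders used by the default text.
    'badRange': 'Allowed range is %(low)i to %(high)i; you entered %(value)s',
})

reordered = render(messages, 'badRange', value='12', low=1, high=10)
untouched = render(messages, 'tooFew', min=3)
# A replacement text may also ignore a placeholder entirely.
ignored = render({'tooFew': 'Password too short.'}, 'tooFew', min=3)
```

Because the right-hand side of ``%`` is a dictionary, unused names are simply skipped, which is what makes it safe for a translated message to drop a placeholder.
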
When you are creating a validator, you should use the ``message`` method, like::

    messages = {
        'key': 'my message (with a %(substitution)s)',
        }

    def validatePython(self, value, state):
        raise Invalid(self.message('key', substitution='apples'),
                      value, state)

HTTP/HTML Form Input
--------------------

The validation expects nested data structures; specifically Schema and ForEach deal with these structures well.  HTML forms, however, do not produce nested structures -- they produce flat structures with keys (input names) and associated values.

FormEncode includes the module ``VariableDecode``, which allows you to encode nested dictionary and list structures into a flat dictionary.  To do this it uses keys with ``"."`` for nested dictionaries, and ``"-int"`` for (ordered) lists.  So something like:

+--------------------+--------------------+
| key                | value              |
+====================+====================+
| names-1.fname      | John               |
+--------------------+--------------------+
| names-1.lname      | Doe                |
+--------------------+--------------------+
| names-2.fname      | Jane               |
+--------------------+--------------------+
| names-2.lname      | Brown              |
+--------------------+--------------------+
| names-3            | Tim Smith          |
+--------------------+--------------------+
| action             | save               |
+--------------------+--------------------+
| action.option      | overwrite          |
+--------------------+--------------------+
| action.confirm     | yes                |
+--------------------+--------------------+

Will be mapped to::

    {'names': [{'fname': "John", 'lname': "Doe"},
               {'fname': "Jane", 'lname': 'Brown'},
               "Tim Smith"],
     'action': {None: "save",
                'option': "overwrite",
                'confirm': "yes"},
    }

In other words, ``'a.b'`` creates a dictionary in ``'a'``, with ``'b'`` as a key (and if ``'a'`` already had a value, then that value is associated with the key ``None``).  Lists are created by appending ``'-int'``, where they are ordered by the integer (the integers are used only for sorting; missing numbers in a sequence are ignored).

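
The decoding rules above can be sketched in a few lines of standalone Python.  This is not the ``VariableDecode`` module itself: ``variable_decode`` and its helpers are hypothetical names written for this illustration, and keys that mix the list and dictionary forms for the same name (such as ``x-1`` together with ``x.y``) are not handled.

```python
_MISSING = object()


class _Indexed(dict):
    """Temporary {index: item} mapping, turned into a list at the end."""


def _split_index(part):
    """Split 'names-1' into ('names', 1); other parts get index None."""
    name, dash, index = part.rpartition('-')
    if dash and index.isdigit():
        return name, int(index)
    return part, None


def _slot(container, name, index):
    """Return the (mapping, key) pair under which this key part is stored."""
    if index is None:
        return container, name
    indexed = container.get(name)
    if not isinstance(indexed, _Indexed):
        indexed = container[name] = _Indexed()
    return indexed, index


def _finish(node):
    """Replace every _Indexed placeholder with a list sorted by its integers."""
    if isinstance(node, _Indexed):
        return [_finish(node[i]) for i in sorted(node)]
    if isinstance(node, dict):
        return dict((k, _finish(v)) for k, v in node.items())
    return node


def variable_decode(flat):
    """Decode a flat {key: value} dictionary into nested dicts and lists."""
    root = {}
    for full_key, value in flat.items():
        parts = [_split_index(part) for part in full_key.split('.')]
        container = root
        for name, index in parts[:-1]:
            mapping, key = _slot(container, name, index)
            child = mapping.get(key, _MISSING)
            if child is _MISSING:
                child = mapping[key] = {}
            elif not isinstance(child, dict):
                # A plain value was stored here first: keep it under None.
                child = mapping[key] = {None: child}
            container = child
        mapping, key = _slot(container, *parts[-1])
        if isinstance(mapping.get(key), dict):
            # A nested dictionary already exists: the plain value goes under None.
            mapping[key][None] = value
        else:
            mapping[key] = value
    return _finish(root)


flat = {
    'names-1.fname': 'John', 'names-1.lname': 'Doe',
    'names-2.fname': 'Jane', 'names-2.lname': 'Brown',
    'names-3': 'Tim Smith',
    'action': 'save', 'action.option': 'overwrite', 'action.confirm': 'yes',
}
nested = variable_decode(flat)
```

Applied to the table above, ``nested`` has the same shape as the mapped dictionary shown earlier, regardless of the order in which the flat keys are visited.
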
``VariableDecode.NestedVariables`` is a validator that decodes and encodes dictionaries using this algorithm.  It has the `protocol attribute`_ set to ``"http"`` so that it won't encode XML-RPC or other input (for which it is not required).  You can use it with a Schema's `preValidators`_ attribute.

Of course, the data in our example is rather eclectic -- for instance, Tim Smith doesn't have his name separated into first and last.  Validators work best when you keep lists homogeneous.  Also, it is hard to access the ``'action'`` key; storing the options (action.option and action.confirm) under another key would be preferable.

.. _Adaptation:

Adaptation and Interfaces
-------------------------

FormEncode keeps all its interfaces in a module named ``interfaces``, as is the convention.  Validators use an ``IValidator`` interface.  Objects that can be adapted to ``IValidator`` can be used with the ``toPython`` and ``fromPython`` functions.

Also, any object that has a ``validator`` attribute can be adapted to a validator.

You can add your own validator adapters like::

    import protocols
    from FormEncode.interfaces import IValidator

    class MyClass(object):
        ....

    def adaptMyClass(obj, protocol):
        # Create a validator from a MyClass instance...

    protocols.declareAdapter(adaptMyClass, [IValidator],
                             forTypes=[MyClass])

Alternate Methods of Defining a Validation Scheme
-------------------------------------------------

There aren't any right now, but I want there to be.  The module ``DictCall`` experiments with this, but was written before the validation module -- it takes a dictionary of inputs and calls a function given that.  It can use a docstring, function attributes, or the function signature to decode input.

I'd be interested in other ideas of ways to encode a validator into other structures, hopefully using adaptation_ to make it easier.

Using Form Generation
=====================

As you can see, FormEncode's validation is quite usable on its own, without generating HTML forms.  It can be used with non-HTML/HTTP data sources (like SQL databases, as it is in SQLObject_, or with XML-RPC or SOAP input).  And not every form is long enough or interesting enough to warrant form generation, though validation can still be useful there (especially if `validator generation is simplified`__).

.. __: `Alternate Methods of Defining a Validation Scheme`_

The form generation portion of this package should be considered less mature than the validation.

HTMLView
--------

"Fields" and "validators" become somewhat vague and intermixed, mostly due to the flexibility of adaptation_.  Validators can be annotated with a field (using the ``htmlView`` attribute), and fields can be annotated with validators (using the ``validator`` attribute).  They can be used fairly interchangeably.

HTMLView fields represent HTML fields, typically ``<input>`` fields.  These are typical things like ``TextField`` and ``CheckboxField``, and more complicated things like an ``OrderingField`` which combines Javascript and a ``SELECT`` box to allow users to order the items.