chores¶
Just about all programs process “items” of one sort or another. That’s what loops are for, right?
But with the exception of the current loop value or index, programming languages don’t help track how processing is going. How many items have been successfully processed? How many errors are there? How far along the total job are we right now? Which items had problems that need to be looked at later?
Even though these bookkeeping tasks are essential to just about every program, they’re “left to the reader.” “Here are some basic loops. Have fun!” So developers “reinvent the wheel,” tracking status with ad hoc containers, counters, and status flags for every new program. Not so high-level after all, huh?
chores fights this needless complexity, errors, and effort by providing a simple, repeatable pattern for processing items and tracking their status.
Usage¶
from chores import Chores

chores = Chores('Jones able baker charlie 8348 Smith Brown Davis'.split())
for c in chores:
    status = 'name' if c.istitle() else 'other'
    chores.mark(c, status)
print(chores.count('name'), "names,", chores.count('^name'), "others")
Yields:
4 names, 4 others
Or if you decide you actually want more information, change just the output statements:
print(chores.count('name'), "names:", chores.marked('name'))
print(chores.count('^name'), "others:", chores.marked('^name'))
Now you get:
4 names: ['Jones', 'Smith', 'Brown', 'Davis']
4 others: ['able', 'baker', 'charlie', '8348']
Discussion¶
Many programs track the status of items being processed with various lists, dictionaries, sets, counters, and status flags. chores might not seem a great advance at first, since it has the same kind of initialization and looping.
But it gets more interesting at the end of the processing loop, when it is time to produce the summary or report: what was processed, the disposition of each item worked on, which items yielded errors or other conditions, and what special cases were handled.
In the examples above, we never had to keep a counter of how many names were found, or how many non-names. When we decided we wanted to change the output from summary counts to a full listing, we didn’t have to go back and collect different information. We just displayed information already at hand in a different way. Also note that the order of the results is nicely maintained.
When we’re reviewing reports about “what transpired,” we don’t have to work very hard to correlate the results with the inputs; unlike when using set structures (or, before Python 3.7, plain dict structures), items are reported on in the same order they arrived.
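For comparison, here is a rough sketch of the hand-rolled bookkeeping that the Usage example avoids. It uses only plain Python lists; the variable names are illustrative and are not part of chores.

```python
# Hand-rolled equivalent of the Usage example, without chores.
# Every new reporting need means another container to create and maintain.
items = 'Jones able baker charlie 8348 Smith Brown Davis'.split()

names = []    # needed once we report *which* items were names
others = []   # likewise for the non-names

for item in items:
    if item.istitle():
        names.append(item)
    else:
        others.append(item)

print(len(names), "names:", names)
print(len(others), "others:", others)
```

Two outcomes already need two lists. A third outcome, such as "error," would mean a third list plus changes to both the loop and the report.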
Typically a developer will start with only a little thought about various dispositions for each item being processed. Over time, she’ll start to realize: “I need to count those cases, so I can report on them!” Or, “I kept an error counter, but I really should have been keeping a list of which items broke, because I now have to tell the user not just how many went wrong, but which ones in particular.” Or “I need to keep track of which ones failed the main processing so that I can do more intensive processing on just those special cases.” Then she’ll go back and add counters, collection lists, and so on–adding a fair amount of ad hoc code that must be built, tested, and debugged.
This is especially tricky for data that needs to move through multiple stages or phases of work. The developer then has to add structures to communicate from earlier processing steps to later ones.
With chores, there’s no need for such custom work. It takes over tracking which items led to which outcomes. It’s always ready to render quality information, either for reporting or for managing subsequent processing. Bookkeeping information is readily available in a tidy, logical format, with no additional development effort.
chores especially shows its virtues as processing code becomes more intricate and as program needs evolve over time.
Dropping Down¶
A Chores is a specialized form of OrderedDict used both to loop over your items and to remember things about them at the same time.
Each item must have a unique identifier. Ideally this would be a human-readable name, but it can be as simple as an integer index. The second most important attribute of each item is its “status.” Every item starts with a default status of "new" (though this is configurable with the status keyword argument when the Chores is created).
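To make the idea concrete, here is a minimal sketch of an ordered status tracker built on OrderedDict. It is not the chores implementation: MiniTracker and its internals are hypothetical stand-ins that only mirror the behavior described above (unique ids, insertion order, a configurable default status).

```python
from collections import OrderedDict


class MiniTracker(OrderedDict):
    """Hypothetical stand-in: maps each item id to its current status."""

    def __init__(self, ids, status='new'):
        super().__init__()
        for item_id in ids:
            self[item_id] = status   # every item starts with the default

    def mark(self, item_id, status):
        self[item_id] = status

    def marked(self, status):
        # Ids come back in the order they were first added.
        return [i for i, s in self.items() if s == status]


tracker = MiniTracker(['a.txt', 'b.txt', 'c.txt'])
tracker.mark('b.txt', 'done')
print(tracker.marked('new'))
print(tracker.marked('done'))
```

Because the mapping is keyed by id, marking an item twice simply overwrites its status rather than creating a duplicate entry.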
It is not strictly necessary to loop over the Chores. It will “just” keep track of items regardless of what collection is being looped over; it’s just often convenient to not have a separate collection. A typical use case might be:
from chores import Chores

records = get_records_from_database()  # external source
chores = Chores(rec.rowid for rec in records)
for c in chores:
    try:
        process_item(c)
        chores.mark(c, 'done')
    except Exception:
        chores.mark(c, 'error')

tally = chores.tally()
print(tally.done, "items completed,", tally.error, "errors")
if tally.error > 0:
    print("ERRORS:", chores.marked('error'))
Here the tally method returns a counter (similar to collections.Counter) that counts how many of each status were seen.
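The counting idea can be illustrated with collections.Counter itself. This sketch works on a plain dict of made-up ids and statuses, so it says nothing about how chores stores its data. A standard Counter also has no attribute access such as tally.done, so the sketch uses indexing instead.

```python
from collections import Counter

# Hypothetical id -> status mapping, standing in for a processed work list.
statuses = {101: 'done', 102: 'error', 103: 'done', 104: 'done'}

tally = Counter(statuses.values())
print(tally['done'], "items completed,", tally['error'], "errors")

# A Counter returns 0 for statuses that never occurred, so reports
# need no special-casing for missing keys.
print(tally['skipped'])
```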
Item Ids¶
Every item to be processed needs a unique identifier. Identifiers must be hashable, so that they can serve as the key of a Python dict. Strings such as file paths or file names work well, though item numbers or tuples of strings and numbers can also work. Good ids will be short and easily understood by those running the program and analyzing its output.
For situations where you want to use titles or file names/paths as your keys, slugifying modules such as slugger, slugify, unicode-slugify, and python-slugify can help turn messy strings into tidy item ids.
In theory, the ids can be anything that makes a good dictionary key, but in practice, using strings that include punctuation that chores treats as status-selection metacharacters (e.g. | and ^) is a bad idea; including commas (,) in your keys is also not recommended.
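One way to keep ids clean is to normalize them before use. The following is a rough sketch using only the standard library, not one of the slugifying packages named above. make_id is a hypothetical helper that collapses every run of characters other than ASCII letters and digits (which covers |, ^, and commas) into a single hyphen.

```python
import re


def make_id(text):
    """Hypothetical helper: reduce a messy string to a tidy, safe id."""
    slug = re.sub(r'[^A-Za-z0-9]+', '-', text)   # collapse unsafe runs
    return slug.strip('-').lower()


print(make_id('Quarterly Report, v2 (final).txt'))
print(make_id('a|b ^c'))
```

Normalizing can map different inputs to the same id, so it is worth checking that the resulting ids are still unique before using them as keys.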
Associated Data¶
If you need to carry data along with each item, todos[chore_id] returns an attribute-exposed dictionary that lets you fold information related to each item into the status tracking. For example:
for t in todos:
    todos[t].upcase = t.upper()
Or, recast in a more dictionary-like idiom, dealing with full Chore objects rather than just keys:
for todo in todos.values():
    todo.upcase = todo.id.upper()
In complex processing, there are always places you want
to associate “extra” information with each work item.
Here’s an easy way to do that. Tying supporting data
directly to each work item reduces the need to create or
manage auxiliary or “supporting” data structures.
(Just mind that you don’t overload the id or status attributes, which are used directly by chores; using data is also suspect.)
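If the phrase "attribute-exposed dictionary" is unfamiliar, the general technique looks roughly like the sketch below. AttrDict is a hypothetical class written for illustration; the class chores actually uses may differ in its details.

```python
class AttrDict(dict):
    """Hypothetical sketch: a dict whose keys can be read/set as attributes."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        self[name] = value


item = AttrDict(id='report.txt', status='new')
item.upcase = item.id.upper()      # attribute-style write and read
print(item['upcase'])              # same data, dictionary-style
print(sorted(item))
```

In a design like this, user data and bookkeeping fields share one namespace, which is why assigning to a name such as id or status would overwrite the tracker's own values.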
One example of where this might come in handy is error processing. For items that complete successfully, you might not need additional information. But for error conditions, you’ll want to know later why it failed. So:
for t in todos:
    try:
        process_item(t)
        todos.mark(t, 'done')
    except Exception as e:
        todos.mark(t, 'error')
        todos[t].errmsg = repr(e)
Now the error message is quickly appended for later inspection.
Selecting Items¶
chores provides multiple mechanisms to select items. The most important is the marked method:
todos.marked('done')
This returns all items marked 'done'. todos.count('done') returns a simple count of such items.
This “marked X” mechanism is the most common. It’s a bit more general than it might first appear:
todos.marked(['done', 'partial'])
for example, returns items marked with either of those statuses. This is an inclusive or. There is also an exclude keyword argument. So:
todos.marked(exclude='error')
This gives everything not marked as an error (which might be 'done', 'partial', and 'other'). The exclude kwarg can also take an iterable, to exclude multiple status tags. If no positive inclusion set is provided, the exclusion is against the set of all possible markings.
The combination of include and exclude sets gives a very powerful selection mechanism.
There is also a simplified, less verbose form that depends on a string specification. In this, the vertical bar (|) stands for alternation. For example, the following are identical:
todos.marked('done|partial')
todos.marked(['done', 'partial'])
Exclusions can also be defined with the caret (^), meaning not:
todos.marked('^done')
todos.marked(exclude='done')
These are identical, and return any items with statuses other than 'done'.
Alternation can be used in either the inclusion or exclusion spec, with two caveats:
1. The negation caret must be the first symbol, if present; if used, all the alternatives are excluded.
2. String specifications can be used in both the default selector and the exclude kwarg, but if the negation character is present in an exclude argument, it only adds up to a single negation; it is not a fancy double-negative.
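The selection rules described in this section can be sketched as a small function over a plain ordered mapping. This illustrates the described behavior only and is not chores' own code: parse_spec and select are hypothetical names, and edge cases may be handled differently by the library.

```python
from collections import OrderedDict


def parse_spec(spec):
    """Split a spec into (negated, set of statuses).

    Accepts a string such as 'done|partial' or '^done', or an iterable
    of status names. Hypothetical helper, not the chores API.
    """
    if spec is None:
        return False, set()
    if isinstance(spec, str):
        negated = spec.startswith('^')   # caret must come first
        return negated, set(spec.lstrip('^').split('|'))
    return False, set(spec)


def select(statuses, include=None, exclude=None):
    """Return ids, in order, whose status passes the include/exclude specs."""
    inc_negated, inc = parse_spec(include)
    _, exc = parse_spec(exclude)         # a caret here is still one negation
    if inc_negated:                      # '^done' behaves like exclude='done'
        inc, exc = set(), exc | inc
    return [i for i, s in statuses.items()
            if (not inc or s in inc) and s not in exc]


work = OrderedDict([('a', 'done'), ('b', 'partial'),
                    ('c', 'error'), ('d', 'done')])
print(select(work, 'done|partial'))
print(select(work, '^done'))
print(select(work, exclude='^error'))
```

With no inclusion set, every status is a candidate, so the exclusion is applied against all items, matching the rule stated earlier for the exclude keyword.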
Other Methods¶
The keys, values, and items methods work as though each Chores is a dictionary (which it more or less is). The “keys” are the individual chore ids. The values are Chore objects, each of which is also an attribute-accessible kind of dict that includes the status of the item and any user-defined values added to the tracker.
One can loop over a Chores collection, with each iteration getting an item id as the loop value. You can use enumerate with these loops as well:
for i, t in enumerate(todos, start=1):
    print(i, t)
One divergence from standard Python loops is that items can be added to a Chores while looping over it.
todos = Chores(range(4))
for i, t in enumerate(todos, start=1):
    if t % 2 == 1 and t < 10:
        todos.add(len(todos), status='dynamic')
    print(i, t)
Items are added at the end of the collection, and will be processed at the end of the loop. This is an important feature, for example in tasks like directory and web crawling where some work items discover further work items that need to be done later. You should never remove a work item from the collection. Mark it as 'dead', 'junk', 'invalid' or some other dustbin status, but work items should never be removed.
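For context, a plain dict raises RuntimeError if its size changes while it is being iterated. One general way to let a work list grow during processing is to walk it by position, as in the sketch below. This is an assumption about technique, not a description of chores internals, and the links table is a made-up stand-in for pages discovered during a crawl.

```python
# Hypothetical link graph standing in for pages discovered during a crawl.
links = {'index': ['about', 'docs'], 'docs': ['api', 'index'], 'about': []}

queue = ['index']      # work items, in discovery order
status = {}            # id -> status; nothing is ever removed

position = 0
while position < len(queue):          # len() is re-checked on every pass
    page = queue[position]
    for target in links.get(page, []):
        if target not in queue:       # newly discovered work goes at the end
            queue.append(target)
    status[page] = 'done'
    position += 1

print(queue)
```

Because items are only appended and never removed, positions stay stable, which is one reason a "dustbin" status is safer than deletion.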
todos.statuses() returns a list of all the current statuses (marks).
todos.tally() returns a counter indicating how many of each kind of status there are.
todos.bystatus() returns a mapping of status to a list of the keys/ids associated with that status. Or, todos.bystatus(justkeys=False) returns a similar mapping but with the values being full Chore objects, not just their keys/ids.
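The shape of a status-to-ids mapping can be pictured with a small grouping sketch. group_by_status is a hypothetical function over a plain ordered mapping; it is not the library's implementation, and the exact return type of bystatus is not assumed here.

```python
from collections import OrderedDict


def group_by_status(statuses):
    """Hypothetical sketch: map each status to the ids carrying it, in order."""
    groups = OrderedDict()
    for item_id, status in statuses.items():
        groups.setdefault(status, []).append(item_id)
    return groups


work = OrderedDict([('a', 'done'), ('b', 'error'), ('c', 'done')])
groups = group_by_status(work)
print(list(groups))        # distinct statuses, in the order first seen
print(groups['done'])
```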
Performance¶
chores clearly adds some performance overhead to loops, because it’s doing some additional work for every item processed. It’s therefore probably not a good choice for the inner loops of performance-critical code or numerical routines. But inner computational loops are not really what it’s designed for.
chores is intended first and foremost for the macro loops of utility programs and applications. Here, the small additional overhead is inconsequential. The real performance “cost” lies in the processing of each element, not in a tiny bit of extra housekeeping.
The other cost, and the one chores is most aimed at reducing, is programming and debugging time. There is a typical assumption that the housekeeping associated with application loops is “extra.” But that’s a false assumption; most programs have to do at least some housekeeping already.
So in many cases, any chores performance overhead is nominal, and well-compensated by the additional ease and correctness of high-function program construction.
Notes¶
- I’ve successfully used chores in my own projects, and it has a real test suite. But realistically it should be considered “early beta” and/or “still experimental” code. Its API and mode of use will evolve.
- In the future, it may be possible to assign multiple tags to each chore, rather than just a single status indicator. Currently, one status per item is it. A Stage class is also under development to create a reporting framework for multi-stage processing.
- Automated multi-version testing managed with the wonderful pytest and tox. Successfully packaged for, and tested against, all late-model versions of Python: 2.6, 2.7, 3.2, 3.3, and 3.4, as well as PyPy 2.6.0 (based on 2.7.9) and PyPy3 2.4.0 (based on 3.2.5). Should run fine on Python 3.5, though py.test is broken on its pre-release iterations.
- The author, Jonathan Eunice (@jeunice on Twitter), welcomes your comments and suggestions.
Installation¶
To install the latest version:
pip install -U chores
To easy_install under a specific Python version (3.3 in this example):
python3.3 -m easy_install --upgrade chores
(You may need to prefix these with “sudo ” to authorize installation.)