Credit: Jürgen Hermann, Nick Perkins, Peter Cogolo
Problem
Given a set of characters to keep, you need to build a filtering function that, applied to any string s, returns a copy of s that contains only characters in the set.
Solution
The translate method of string objects is fast and handy for all tasks of this ilk. However, to call translate effectively to solve this recipe’s task, we must do some advance preparation. The first argument to translate is a translation table: in this recipe, we do not want to do any translation, so we must prepare a first argument that specifies “no translation”. The second argument to translate specifies which characters we want to delete: since the task here says that we’re given, instead, a set of characters to keep (i.e., to not delete), we must prepare a second argument that gives the set complement—deleting all characters we must not keep. A closure is the best way to do this advance preparation just once, obtaining a fast filtering function tailored to our exact needs:
import string
# Make a reusable string of all characters, which does double duty
# as a translation table specifying "no translation whatsoever"
allchars = string.maketrans('', '')
def makefilter(keep):
    """ Return a function that takes a string and returns a partial copy
        of that string consisting of only the characters in 'keep'.
        Note that `keep' must be a plain string.
    """
    # Make a string of all characters that are not in 'keep': the "set
    # complement" of keep, meaning the string of characters we must delete
    delchars = allchars.translate(allchars, keep)
    # Make and return the desired filtering function (as a closure)
    def thefilter(s):
        return s.translate(allchars, delchars)
    return thefilter
if __name__ == '__main__':
    just_vowels = makefilter('aeiouy')
    print just_vowels('four score and seven years ago')
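The listing above relies on Python 2's string.maketrans and the two-argument form of translate on plain strings. As a rough sketch of the same closure idea under Python 3 (an assumption, not part of the recipe as given), the two-argument translate is available on bytes objects, and passing None as the table means "no translation":

```python
def makefilter(keep):
    """Return a function that keeps only the bytes found in 'keep'.

    Python 3 sketch: 'keep' and the filtered strings are bytes objects.
    """
    # All 256 possible byte values
    allbytes = bytes(range(256))
    # The "set complement" of keep: every byte value we must delete.
    # A table of None tells bytes.translate to delete without translating.
    delbytes = allbytes.translate(None, keep)
    # Make and return the filtering function (as a closure)
    def thefilter(s):
        return s.translate(None, delbytes)
    return thefilter

just_vowels = makefilter(b'aeiouy')
print(just_vowels(b'four score and seven years ago'))  # b'ouoeaeeyeaao'
```

As in the original, the complement is computed once, when makefilter is called, so each call to the returned filter costs a single translate.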
Problem
Python can use a plain string to hold either text or arbitrary bytes, and you need to determine (heuristically, of course: there can be no precise algorithm for this) which of the two cases holds for a certain string.
Solution
We can use the same heuristic criteria as Perl does, deeming a string binary if it contains any nulls or if more than 30% of its characters have the high bit set (i.e., codes greater than 127) or are strange control codes. We have to code this ourselves, but this also means we can easily tweak the heuristics for special application needs:
from __future__ import division  # ensure / does NOT truncate
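A minimal sketch of the heuristic described above could look like the following. It assumes Python 3, where arbitrary bytes live in bytes objects and / never truncates. The names TEXT_BYTES and looks_binary and the threshold parameter are illustrative choices, as is the exact set of bytes treated as text.

```python
# Bytes treated as ordinary text: printable ASCII plus a few common
# whitespace/control codes (an illustrative choice; tweak as needed).
TEXT_BYTES = bytes(range(32, 127)) + b'\n\r\t\b'

def looks_binary(data, threshold=0.30):
    """Heuristically guess whether the bytes object 'data' is binary."""
    if not data:
        return False              # an empty string is deemed text
    if b'\0' in data:
        return True               # any null byte means binary
    # Delete every text byte; what remains has the high bit set
    # or is an unusual control code.
    nontext = data.translate(None, TEXT_BYTES)
    # Binary if more than 'threshold' of the bytes are non-text
    return len(nontext) / len(data) > threshold
```

Both the 30% threshold and the contents of TEXT_BYTES are knobs an application can adjust.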
Problem
You need to convert a string from uppercase to lowercase, or vice versa.
Solution
That’s what the upper and lower methods of string objects are for. Each takes no arguments and returns a copy of the string in which each letter has been changed to upper- or lowercase, respectively.
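As a short illustration (the variable names are arbitrary), both methods leave the original string untouched and return a new one:

```python
little = 'one small step'
big = little.upper()      # new string, all letters uppercased
print(big)                # ONE SMALL STEP
print(big.lower())        # one small step
print(little)             # one small step (the original is unchanged)
```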
Problem
You have a string made up of multiple lines, and you need to build another string from it, adding or removing leading spaces on each line so that the indentation of each line is some absolute number of spaces.
Solution
The methods of string objects are quite handy, and let us write a simple function to perform this task:
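A minimal sketch of such a function follows. The name reindent and its parameter names are illustrative, and the sketch assumes that stripping trailing whitespace from each line along with the leading whitespace is acceptable:

```python
def reindent(s, num_spaces):
    """Return a copy of s in which every line starts with num_spaces spaces."""
    leading_space = num_spaces * ' '
    # strip() removes leading AND trailing whitespace from each line;
    # use lstrip() instead if trailing whitespace must be preserved.
    lines = [leading_space + line.strip() for line in s.splitlines()]
    return '\n'.join(lines)

text = '  first\n        second\nthird'
print(reindent(text, 4))
```

Note that the result has no trailing newline, and that an empty line comes out as a line containing only the leading spaces.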
Problem
You want to convert tabs in a string to the appropriate number of spaces, or vice versa.
Solution
Changing tabs to the appropriate number of spaces is a reasonably frequent task, easily accomplished with Python strings’ expandtabs method. Because strings are immutable, the method returns a new string object, a modified copy of the original one. However, it’s easy to rebind a string variable name from the original to the modified-copy value:
mystring = mystring.expandtabs()
This doesn’t change the string object to which mystring originally referred, but it does rebind the name mystring to a newly created string object, a modified copy of mystring in which tabs are expanded into runs of spaces. expandtabs, by default, uses a tab length of 8; you can pass expandtabs an integer argument to use as the tab length.
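For instance, with an arbitrary sample string, the default call and a call with an explicit tab length of 4 pad each tab out to the next multiple of 8 or 4 columns, respectively:

```python
line = 'a\tbc\tdef'
default_tabs = line.expandtabs()    # tab stops every 8 columns
narrow_tabs = line.expandtabs(4)    # tab stops every 4 columns
print(repr(default_tabs))           # 'a       bc      def'
print(repr(narrow_tabs))            # 'a   bc  def'
print(repr(line))                   # the original still contains tabs
```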
Changing spaces into tabs is a rare and peculiar need. Compression, if that’s what you’re after, is far better performed in other ways, so Python doesn’t offer a built-in way to “unexpand” spaces into tabs. We can, of course, write our own function for the purpose. String processing tends to be fastest in a split/process/rejoin approach, rather than with repeated overall string transformations:
def unexpand(astring, tablen=8):
    import re
    # split into alternating space and non-space sequences