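Reference: https://docs.python.org/3.6/library/textwrap.html
The textwrap module provides some convenience functions, as well as TextWrapper, the class that does all the work. If you are just wrapping or filling one or two text strings, the convenience functions should be good enough; otherwise, you should use an instance of TextWrapper for efficiency.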
1. textwrap.wrap
textwrap.wrap(text, width=70, **kwargs)
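Wraps the single paragraph in text (a string) so that every line is at most width characters long. Returns a list of output lines, without final newlines. If the wrapped output has no content, the returned list is empty.
Optional keyword arguments correspond to the instance attributes of TextWrapper, documented below. width defaults to 70. Example:
# -*- coding: utf-8 -*-
import textwrap

text = "Hello there, how are you this fine day? I'm glad to hear it!"
result = textwrap.wrap(text, 12)
print(result)
Output:
(deeplearning) userdeMBP:pytorch user$ python test.py
['Hello there,', 'how are you', 'this fine', "day? I'm", 'glad to hear', 'it!']
As the result shows, wrap splits text into lines of at most 12 characters. If adding the next word would push a line past width, that word starts the next line instead. No word is cut here because none is longer than width.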
If text is empty:
# -*- coding: utf-8 -*-
import textwrap

text = ""
result = textwrap.wrap(text, 12)
print(result)
Output:
(deeplearning) userdeMBP:pytorch user$ python test.py
[]
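Since the keyword arguments of wrap() map onto TextWrapper attributes, the function call and the explicit class call should give the same result. A minimal sketch of that equivalence; the "> " prefix is just an illustrative choice:

```python
import textwrap

text = "Hello there, how are you this fine day? I'm glad to hear it!"

# Keyword arguments are forwarded to a temporary TextWrapper instance.
via_function = textwrap.wrap(
    text, width=20, initial_indent="> ", subsequent_indent="> "
)

# The same options set directly on a TextWrapper.
via_class = textwrap.TextWrapper(
    width=20, initial_indent="> ", subsequent_indent="> "
).wrap(text)

print(via_function)
print(via_function == via_class)
```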
2. textwrap.fill
textwrap.fill(text, width=70, **kwargs)
Wraps the single paragraph in text and returns a single string containing the wrapped paragraph. fill() is shorthand for:
"\n".join(wrap(text, ...))
In particular, fill() accepts exactly the same keyword arguments as wrap().
Example:
# -*- coding: utf-8 -*-
import textwrap

text = """\
This is a paragraph that already has
line breaks.  But some of its lines are much longer than the others,
so it needs to be wrapped.
Some lines are \ttabbed too.
What a mess!"""

print('------------ after wrap------------')
wrap_result = textwrap.wrap(text, 45)
print(wrap_result)
fill_result = textwrap.fill(text, 45)
print('------------ after fill------------')
print(fill_result)
print('------------ compare --------------')
print('\n'.join(wrap_result) == fill_result)
Output:
(deeplearning) userdeMBP:pytorch user$ python test.py
------------ after wrap------------
['This is a paragraph that already has line', 'breaks.  But some of its lines are much', 'longer than the others, so it needs to be', 'wrapped. Some lines are  tabbed too. What a', 'mess!']
------------ after fill------------
This is a paragraph that already has line
breaks.  But some of its lines are much
longer than the others, so it needs to be
wrapped. Some lines are  tabbed too. What a
mess!
------------ compare --------------
True
So fill() does the same wrapping as wrap() and then joins the resulting lines with '\n' into a single string.
3. textwrap.shorten
textwrap.shorten(text, width, **kwargs)
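Collapses and truncates the given text to fit in the given width.
First the whitespace in text is collapsed (all whitespace is replaced by single spaces). If the result fits in width, it is returned. Otherwise, enough words are dropped from the end so that the remaining words plus the placeholder fit within width:
>>> import textwrap
>>> textwrap.shorten("hello world!", width=12)
'hello world!'
>>> textwrap.shorten("hello world!", width=11)
'hello [...]'
>>> textwrap.shorten("hello world!", width=10, placeholder="...")
'hello...'
Optional keyword arguments correspond to the instance attributes of TextWrapper, documented below. Note that the whitespace is collapsed before the text is passed to the TextWrapper fill() function, so changing the value of tabsize, expand_tabs, drop_whitespace, and replace_whitespace has no effect.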
New in version 3.4.
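The collapsing step is easier to see with messier input. A small sketch, using a made-up string that mixes tabs, newlines and runs of spaces:

```python
import textwrap

# Hypothetical input with tabs, newlines and runs of spaces.
messy = "  The quick\tbrown\n\nfox   jumps over the lazy dog.  "

# All whitespace is collapsed first, so the result is one clean line.
collapsed = textwrap.shorten(messy, width=80)
print(collapsed)

# When the collapsed text does not fit, whole words are dropped from the
# end and the placeholder is appended.
short = textwrap.shorten(messy, width=25)
print(short)
```

The second call keeps as many whole words as fit together with the default placeholder ' [...]'.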
4. textwrap.dedent
textwrap.dedent(text)
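Removes any common leading whitespace from every line in text.
This can be used to make triple-quoted strings line up with the left edge of the display, while still presenting them in the source code in indented form. Note that tabs and spaces are both treated as whitespace, but they are not equal: the lines "  hello" and "\thello" are considered to have no common leading whitespace. In short, this function removes indentation.
>>> import argparse
>>> import textwrap
>>> parser = argparse.ArgumentParser(
...     prog='PROG',
...     formatter_class=argparse.RawDescriptionHelpFormatter,
...     description=textwrap.dedent('''\
...         Please do not mess up this text!
...         --------------------------------
...             I have indented it
...             exactly the way
...             I want it
...         '''))
>>> parser.print_help()
usage: PROG [-h]

Please do not mess up this text!
--------------------------------
    I have indented it
    exactly the way
    I want it

optional arguments:
 -h, --help  show this help message and exit
Only the indentation shared with the "Please" line is removed, so the extra indentation of the last three lines is preserved.
Without textwrap.dedent(), the result is:
>>> parser = argparse.ArgumentParser(
...     prog='PROG',
...     formatter_class=argparse.RawDescriptionHelpFormatter,
...     description='''\
...         Please do not mess up this text!
...         --------------------------------
...             I have indented it
...             exactly the way
...             I want it
...         ''')
>>> parser.print_help()
usage: PROG [-h]

        Please do not mess up this text!
        --------------------------------
            I have indented it
            exactly the way
            I want it
        

optional arguments:
 -h, --help  show this help message and exit
Here none of the indentation is removed. The indentation before the closing ''' is kept as well; it becomes a whitespace-only line, which is why there are two blank lines before "optional arguments:".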
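Example:
# -*- coding: utf-8 -*-
import textwrap

def test():
    # end first line with \ to avoid the empty line!
    s = '''\
    hello
      world
    '''
    print(repr(s))
    print(repr(textwrap.dedent(s)))

if __name__ == '__main__':
    test()
Output:
(deeplearning) userdeMBP:pytorch user$ python test.py
'    hello\n      world\n    '
'hello\n  world\n'
The result shows that only the common leading whitespace is removed, that is, the margin that remains on the left once the lines of s are aligned on their left edge. The extra two spaces before "world" are kept.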
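The remark that tabs and spaces are not treated as equal can be checked directly. A minimal sketch with two made-up strings:

```python
import textwrap

# Both lines share four leading spaces, so those are removed.
spaces = "    hello\n      world\n"
print(repr(textwrap.dedent(spaces)))

# One line starts with spaces and the other with a tab: there is no
# common leading whitespace, so nothing is removed.
mixed = "  hello\n\tworld\n"
print(repr(textwrap.dedent(mixed)))
```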
5. textwrap.indent
textwrap.indent(text, prefix, predicate=None)
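Adds prefix to the beginning of selected lines in text.
Lines are separated by calling text.splitlines(True). By default, prefix is added to all lines that do not consist solely of whitespace (including any line endings).
>>> s = 'hello\n\n \nworld'
>>> textwrap.indent(s, '-----')
'-----hello\n\n \n-----world'
The optional predicate argument can be used to control which lines are indented.
For example, it is easy to add prefix even to empty and whitespace-only lines:
>>> print(textwrap.indent(s, '+ ', lambda line: True))
+ hello
+ 
+  
+ world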
New in version 3.3.
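A predicate does not have to accept every line. The sketch below contrasts the default behaviour with a custom predicate; the sample text and the "review" filter are only illustrative:

```python
import textwrap

reply = "Thanks for the patch.\n\nI will review it tomorrow.\n"

# Default predicate: whitespace-only lines are left without a prefix.
quoted = textwrap.indent(reply, "> ")
print(quoted)

# Custom predicate (an illustrative choice): prefix only the lines
# that mention "review".
marked = textwrap.indent(reply, "!! ", lambda line: "review" in line)
print(marked)
```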
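6. class textwrap.TextWrapper
The wrap(), fill() and shorten() functions work by creating a TextWrapper instance and calling a single method on it. That instance is not reused, so for applications that process many text strings using wrap() and/or fill(), it may be more efficient to create your own TextWrapper object.
Text is preferably wrapped on whitespace and right after the hyphens in hyphenated words; only then will long words be broken if necessary, unless TextWrapper.break_long_words is set to false.
class textwrap.TextWrapper(**kwargs)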
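The TextWrapper constructor accepts a number of optional keyword arguments. Each keyword argument corresponds to an instance attribute, so for example
wrapper = TextWrapper(initial_indent="* ")
is the same as:
wrapper = TextWrapper()
wrapper.initial_indent = "* "
You can reuse the same TextWrapper object many times, and you can change any of its options by assigning directly to instance attributes between uses.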
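As a rough illustration of reusing one instance and changing its options between uses, here is a short sketch; the sample sentences and widths are invented for the example:

```python
import textwrap

# Sample strings invented for this sketch.
paragraphs = [
    "The quick brown fox jumps over the lazy dog.",
    "Pack my box with five dozen liquor jugs.",
]

wrapper = textwrap.TextWrapper(width=20)

# The same instance wraps every paragraph.
first_pass = [wrapper.wrap(p) for p in paragraphs]

# Options can be changed between uses by assigning to the attributes.
wrapper.width = 30
wrapper.initial_indent = "* "
second_pass = [wrapper.fill(p) for p in paragraphs]

for block in second_pass:
    print(block)
```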
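TextWrapper instance attributes (and keyword arguments to the constructor) are as follows:
- width
(default: 70) The maximum length of wrapped lines. As long as there are no individual words in the input text longer than width, TextWrapper guarantees that no output line will be longer than width characters.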
- expand_tabs
(default: True) If true, then all tab characters in text will be expanded to spaces using the expandtabs() method of text.
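- tabsize
(default: 8) If expand_tabs is true, then all tab characters in text will be expanded to zero or more spaces, depending on the current column and the given tab size. In other words, tabsize sets the spacing of the tab stops: if you want tabs to advance to stops every 4, 16 or 32 columns instead of the default 8, set this attribute.
Example:
# -*- coding: utf-8 -*-
import textwrap

text = "\tTest\tdefault\t\ttabsize."
wrapper = textwrap.TextWrapper(80)
tabsize_without = wrapper.wrap(text)
print(tabsize_without)
Output:
(deeplearning) userdeMBP:pytorch user$ python test.py
['        Test    default         tabsize.']
The leading tab expands to 8 spaces and the others expand according to their position: the second becomes 4 spaces, and the third and fourth together become 9 spaces (1 + 8). This is the same as setting tabsize=8 explicitly.
With tabsize=16 the output is:
(deeplearning) userdeMBP:pytorch user$ python test.py
['                Test            default                         tabsize.']
Each tab pads the line up to the next column that is a multiple of tabsize, so the number of spaces it produces depends on the column where it occurs. Here the first tab pads to column 16 (16 spaces), the tab after "Test" starts at column 20 and pads to column 32 (12 spaces), and the two tabs after "default" start at column 39 and pad to columns 48 and 64 (9 + 16 = 25 spaces).
New in version 3.3.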
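The tab-stop behaviour can be checked by looking at the column where each word ends up. A small sketch, assuming the same sample text and a width large enough to keep everything on one line:

```python
import textwrap

text = "\tTest\tdefault\t\ttabsize."

for size in (4, 8, 16):
    wrapper = textwrap.TextWrapper(width=80, tabsize=size)
    line = wrapper.wrap(text)[0]
    # TextWrapper expands tabs the same way str.expandtabs() does,
    # so the two results agree for this single-line input.
    assert line == text.expandtabs(size)
    # Column at which each word starts: always a multiple of the tab size.
    columns = [line.index(word) for word in ("Test", "default", "tabsize.")]
    print(size, columns)
```

For tabsize=8 this prints the columns 8, 16 and 32, which matches the 8, 4 and 9 spaces counted above.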
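- replace_whitespace
(default: True) If true, after tab expansion but before wrapping, the wrap() method will replace each whitespace character with a single space. The whitespace characters replaced are as follows: tab, newline, vertical tab, formfeed, and carriage return ('\t\n\v\f\r').
⚠️ If expand_tabs is false and replace_whitespace is true, each tab character will be replaced by a single space, which is not the same as tab expansion.
Example:
# -*- coding: utf-8 -*-
import textwrap

text = "\tTest\tdefault\t\ttabsize."
wrapper = textwrap.TextWrapper(80, expand_tabs=False, replace_whitespace=True)
result = wrapper.wrap(text)
print(result)
Output:
(deeplearning) userdeMBP:pytorch user$ python test.py
[' Test default  tabsize.']
So each whitespace character (here, each tab) is replaced with a single space, rather than being expanded to the next tab stop as above.
⚠️ If replace_whitespace is false, newlines may appear in the middle of a line and cause strange output. For this reason, text should be split into paragraphs (using str.splitlines() or similar) which are wrapped separately.
Example:
# -*- coding: utf-8 -*-
import textwrap

text = "\tTest\tdefault\t\ttabsize."
wrapper = textwrap.TextWrapper(80, expand_tabs=False, replace_whitespace=False)
result = wrapper.wrap(text)
print(result)
Output:
(deeplearning) userdeMBP:pytorch user$ python test.py
['\tTest\tdefault\t\ttabsize.']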
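The advice to wrap paragraphs separately might look like the sketch below. It assumes paragraphs are separated by blank lines, which is a choice made for this example rather than something textwrap enforces:

```python
import textwrap

# Made-up text: two paragraphs separated by a blank line.
document = (
    "The first paragraph is long enough that it has to be wrapped.\n"
    "\n"
    "The second paragraph is short."
)

wrapper = textwrap.TextWrapper(width=30)

# Split on blank lines, wrap each paragraph on its own, then rejoin.
paragraphs = document.split("\n\n")
wrapped = "\n\n".join(wrapper.fill(p) for p in paragraphs)
print(wrapped)
```

Wrapping the whole document in one call would instead merge both paragraphs into one, because the newlines are replaced by spaces.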
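- drop_whitespace
(default: True) If true, whitespace at the beginning and ending of every line (after wrapping but before indenting) is dropped. Whitespace at the beginning of the paragraph, however, is not dropped if non-whitespace follows it. If whitespace being dropped takes up an entire line, the whole line is dropped.
Example (the text contains runs of several spaces):
# -*- coding: utf-8 -*-
import textwrap

text = " This is a    sentence with    much whitespace."
wrapper = textwrap.TextWrapper(10, drop_whitespace=False)
result = wrapper.wrap(text)
print(result)
Output:
(deeplearning) userdeMBP:pytorch user$ python test.py
[' This is a', '    ', 'sentence ', 'with    ', 'much white', 'space.']
With drop_whitespace=True the result is:
[' This is a', 'sentence', 'with', 'much white', 'space.']
So with drop_whitespace=False the whitespace at the line boundaries is kept, including a line made up only of spaces. The space at the start of the paragraph is kept in both cases.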
- initial_indent
(default: '') String that will be prepended to the first line of wrapped output. Counts towards the length of the first line. The empty string is not indented.
Example:
# -*- coding: utf-8 -*-
import textwrap

text = "This is a short line."
wrapper = textwrap.TextWrapper(10, initial_indent="(1) ")
initial_with = wrapper.wrap(text)
print(initial_with)
Output:
(deeplearning) userdeMBP:pytorch user$ python test.py
['(1) This', 'is a short', 'line.']
The prefix is added to the first line only.
- subsequent_indent
(default: '') String that will be prepended to all lines of wrapped output except the first. Counts towards the length of each line except the first.
Example:
# -*- coding: utf-8 -*-
import textwrap

text = "This is a short line."
wrapper = textwrap.TextWrapper(10, subsequent_indent="(1) ")
subsequent_with = wrapper.wrap(text)
print(subsequent_with)
Output:
(deeplearning) userdeMBP:pytorch user$ python test.py
['This is a', '(1) short', '(1) line.']
The prefix is added to every line except the first.
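The two indent attributes are often combined to get a hanging indent, for example for a bullet item. A minimal sketch; the sentence and the "* " bullet are arbitrary:

```python
import textwrap

item = "Wrap long list items so that continuation lines line up with the text."

wrapper = textwrap.TextWrapper(
    width=30,
    initial_indent="* ",      # prefix for the first line
    subsequent_indent="  ",   # same width, so the text lines up
)
lines = wrapper.wrap(item)
print("\n".join(lines))
```

Both prefixes count towards width, so every output line still fits in 30 columns.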
- fix_sentence_endings
(default: False) If true, TextWrapper attempts to detect sentence endings and ensure that sentences are always separated by exactly two spaces. This is generally desired for text in a monospaced font. However, the sentence detection algorithm is imperfect: it assumes that a sentence ending consists of a lowercase letter followed by one of '.', '!', or '?', possibly followed by one of '"' or "'", followed by a space. One problem with this algorithm is that it is unable to detect the difference between "Dr." in
[...] Dr. Frankenstein's monster [...]
and "Spot." in
[...] See Spot. See Spot run [...]
Since the sentence detection algorithm relies on string.lowercase for the definition of "lowercase letter", and on the convention of using two spaces after a period to separate sentences on the same line, it is specific to English-language texts.
Example:
# -*- coding: utf-8 -*-
import textwrap

text = """\
This is a paragraph that already has
line breaks.  But some of its lines are much longer than the others,
so it needs to be wrapped.
Some lines are \ttabbed too.
What a mess!"""

wrapper = textwrap.TextWrapper(45, fix_sentence_endings=True)
print(text)
print('------------ after ------------')
result = wrapper.wrap(text)
print(result)
Output:
(deeplearning) userdeMBP:pytorch user$ python test.py
This is a paragraph that already has
line breaks.  But some of its lines are much longer than the others,
so it needs to be wrapped.
Some lines are  tabbed too.
What a mess!
------------ after ------------
['This is a paragraph that already has line', 'breaks.  But some of its lines are much', 'longer than the others, so it needs to be', 'wrapped.  Some lines are  tabbed too.  What a', 'mess!']
Without fix_sentence_endings=True the result is:
['This is a paragraph that already has line', 'breaks.  But some of its lines are much', 'longer than the others, so it needs to be', 'wrapped. Some lines are  tabbed too. What a', 'mess!']
The two results differ: with fix_sentence_endings=True, the line breaks that follow the detected sentence endings 'wrapped.' and 'too.' are replaced by two spaces ('wrapped.  Some' and 'too.  What') instead of one.
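The limitation with abbreviations such as "Dr." is easy to reproduce. A short sketch with a made-up sentence:

```python
import textwrap

text = "It works. See Dr. Smith now."

plain = textwrap.fill(text, width=70)
fixed = textwrap.fill(text, width=70, fix_sentence_endings=True)

print(repr(plain))
print(repr(fixed))
```

With the option enabled, two spaces are inserted after "works." as intended, but also after "Dr.", because the detection only looks for a lowercase letter followed by a period.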
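- break_long_words
(default: True) If true, then words longer than width will be broken in order to ensure that no lines are longer than width. If it is false, long words will not be broken, and some lines may be longer than width. (Long words will be put on a line by themselves, in order to minimize the amount by which width is exceeded.)
Example:
When true (the default), the long hyphenated text is broken. The breaks fall right after hyphens because break_on_hyphens is also true by default:
# -*- coding: utf-8 -*-
import textwrap

text = ("this-is-a-useful-feature-for-"
        "reformatting-posts-from-tim-peters'ly")
print(text)
wrapper = textwrap.TextWrapper(30)
result = wrapper.wrap(text)
print(result)
Output:
(deeplearning) userdeMBP:pytorch user$ python test.py
this-is-a-useful-feature-for-reformatting-posts-from-tim-peters'ly
['this-is-a-useful-feature-for-', 'reformatting-posts-from-tim-', "peters'ly"]
When false, the result is as follows. With a very small width this can be used to split the text into its hyphen-separated parts while keeping the hyphens:
# -*- coding: utf-8 -*-
import textwrap

text = ("this-is-a-useful-feature-for-"
        "reformatting-posts-from-tim-peters'ly")
print(text)
wrapper = textwrap.TextWrapper(1, break_long_words=False)
result = wrapper.wrap(text)
print(result)
Output:
(deeplearning) userdeMBP:pytorch user$ python test.py
this-is-a-useful-feature-for-reformatting-posts-from-tim-peters'ly
['this-', 'is-', 'a-', 'useful-', 'feature-', 'for-', 'reformatting-', 'posts-', 'from-', 'tim-', "peters'ly"]
Every returned item is longer than width=1. With break_long_words=True, this call would instead return a list of single characters.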
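The effect of break_long_words is clearest with a token that has no hyphens or spaces at all. A sketch using a made-up URL as the long word:

```python
import textwrap

# A made-up URL standing in for any long unbreakable token.
url = "https://example.com/a/very/long/path/to/a/resource"
text = "See " + url + " for details."

broken = textwrap.wrap(text, width=20)
kept = textwrap.wrap(text, width=20, break_long_words=False)

print(broken)  # every line fits in 20 columns, so the URL is split
print(kept)    # the URL stays whole on its own line and exceeds 20
```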
- break_on_hyphens
(default: True) If true, wrapping will occur preferably on whitespace and right after hyphens in compound words, as is customary in English. If false, only whitespace will be considered as a potentially good place for a line break, but you need to set break_long_words to false if you want truly unbreakable words. The default behaviour in previous versions was to always allow breaking hyphenated words.
Example:
When set to True:
# -*- coding: utf-8 -*-
import textwrap

text = "yaba daba-doo"
wrapper = textwrap.TextWrapper(10)
result = wrapper.wrap(text)
print(result)
Output:
(deeplearning) userdeMBP:pytorch user$ python test.py
['yaba daba-', 'doo']
When set to False:
# -*- coding: utf-8 -*-
import textwrap

text = "yaba daba-doo"
wrapper = textwrap.TextWrapper(10, break_on_hyphens=False)
result = wrapper.wrap(text)
print(result)
Output:
(deeplearning) userdeMBP:pytorch user$ python test.py
['yaba', 'daba-doo']
- max_lines
(default: None) If not None, then the output will contain at most max_lines lines, with placeholder appearing at the end of the output.
Example:
# -*- coding: utf-8 -*-
import textwrap

text = "Hello there, how are you this fine day? I'm glad to hear it!"
wrapper = textwrap.TextWrapper(12, max_lines=0)
result = wrapper.wrap(text)
print(result)
Output:
(deeplearning) userdeMBP:pytorch user$ python test.py
['Hello [...]']
Here max_lines=0 gives the same result as max_lines=1:
# -*- coding: utf-8 -*-
import textwrap

text = "Hello there, how are you this fine day? I'm glad to hear it!"
wrapper = textwrap.TextWrapper(12, max_lines=1)
result = wrapper.wrap(text)
print(result)  # prints ['Hello [...]']

# -*- coding: utf-8 -*-
import textwrap

text = "Hello there, how are you this fine day? I'm glad to hear it!"
wrapper = textwrap.TextWrapper(12, max_lines=2)
result = wrapper.wrap(text)
print(result)  # prints ['Hello there,', 'how [...]']

# -*- coding: utf-8 -*-
import textwrap

text = "Hello there, how are you this fine day? I'm glad to hear it!"
wrapper = textwrap.TextWrapper(13, max_lines=2)
result = wrapper.wrap(text)
print(result)  # prints ['Hello there,', 'how are [...]']
The length of 'how are [...]' is exactly 13.
# -*- coding: utf-8 -*-
import textwrap

text = "Hello there, how are you this fine day? I'm glad to hear it!"
wrapper = textwrap.TextWrapper(12, max_lines=6)
result = wrapper.wrap(text)
print(result)  # prints ['Hello there,', 'how are you', 'this fine', "day? I'm", 'glad to hear', 'it!']
This result is the same as when max_lines is not set, because the fully wrapped text has exactly 6 lines.
New in version 3.4.
- placeholder
(default: ' [...]') String that will appear at the end of the output text if it has been truncated.
Example:
# -*- coding: utf-8 -*-
import textwrap

text = "Hello there, how are you this fine day? I'm glad to hear it!"
wrapper = textwrap.TextWrapper(12, max_lines=0, placeholder='***')
result = wrapper.wrap(text)
print(result)  # prints ['Hello***']
New in version 3.4.
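TextWrapper also provides some public methods, analogous to the module-level convenience functions:
- wrap(text)
Wraps the single paragraph in text (a string) so that every line is at most width characters long. All wrapping options are taken from the instance attributes of the TextWrapper instance. Returns a list of output lines, without final newlines. If the wrapped output has no content, the returned list is empty.
Example:
# -*- coding: utf-8 -*-
import textwrap

text = "Hello there, how are you this fine day? I'm glad to hear it!"
result = textwrap.wrap(text, 4)
print(result)
print('------------- compare --------------')
wrapper = textwrap.TextWrapper(4)
result1 = wrapper.wrap(text)
print(result1 == result)
Output:
(deeplearning) userdeMBP:pytorch user$ python test.py
['Hell', 'o th', 'ere,', 'how', 'are', 'you', 'this', 'fine', 'day?', "I'm", 'glad', 'to', 'hear', 'it!']
------------- compare --------------
True
- fill(text)
Wraps the single paragraph in text and returns a single string containing the wrapped paragraph.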
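As with the module-level functions, the fill() method should give the same text as joining the result of the wrap() method with newlines. A brief sketch using the sample sentence from earlier:

```python
import textwrap

text = "Hello there, how are you this fine day? I'm glad to hear it!"
wrapper = textwrap.TextWrapper(width=12)

lines = wrapper.wrap(text)   # list of lines
block = wrapper.fill(text)   # one string with embedded newlines

print(block)
print(block == "\n".join(lines))
```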