collections: High-Performance Container Datatypes

2019-11-14 17:06:13

Source: reposted

Contributed by: a reader

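I have recently become interested in machine learning algorithms, and I have long known that Python ships a collections module containing classes a step above dict and list. So I took some time to study it in preparation for upcoming work.

My main reference is the official documentation at https://docs.python.org/2/library/collections.html#defaultdict-objects, which I translated and summarized with my limited English. If you spot a mistake, please let me know. If you would like to repost this article, please credit @瓜棚.

The collections module provides five main data structures: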

Type          Description                                                                          Added in
------------  -----------------------------------------------------------------------------------  ----------
Counter       dict subclass for counting hashable objects                                          Python 2.7
deque         double-ended queue; items can be added or removed at either end                      Python 2.4
namedtuple    tuple subclass whose fields can be accessed by name                                  Python 2.6
OrderedDict   dict subclass that remembers the order in which keys were inserted                   Python 2.7
defaultdict   dict subclass that supplies a default value for a missing key, not a KeyError        Python 2.5

The Counter class

Example:

    from collections import Counter

    cnt = Counter()
    for word in ['1', '2', '3', '1', '2', '1']:
        cnt[word] += 1
    cnt
    # Counter({'1': 3, '2': 2, '3': 1})

    # Find the 10 most common words in a passage, together with their counts
    text = ['a', 'an', 'and', 'are', 'as', 'at', 'be', 'by', 'can',
            'for', 'from', 'have', 'if', 'in', 'is', 'it', 'may',
            'not', 'of', 'on', 'or', 'tbd', 'that', 'the', 'this',
            'to', 'us', 'we', 'when', 'will', 'with', 'yet',
            'you', 'your', '的', '了', '和',
            'or', 'tbd', 'that', 'the', 'this',
            'to', 'us', 'we', 'when', 'will', 'when']
    Counter(text).most_common(10)
    # [('when', 3), ('to', 2), ('we', 2), ('that', 2), ('tbd', 2), ('this', 2), ('us', 2), ('will', 2), ('the', 2), ('or', 2)]
    # (the words tied at 2 may come back in a different order)

Counter is a subclass of dict. Its constructor accepts an iterable or a mapping. A Counter is an unordered collection.

    c = Counter()                         # a new, empty Counter
    c = Counter('kwejrkhdskf')            # built from the iterable 'kwejrkhdskf'
    c = Counter({'red': 4, 'blue': 2})    # built from the mapping {'red': 4, 'blue': 2}
    c = Counter(cats=4, dogs=8)           # built from keyword arguments

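Looking up a key that does not exist does not raise a KeyError. Counter extends dict so that a missing key returns 0, while an existing key returns its count.

    >>> c = Counter(['eggs', 'ham', 'eggs'])
    >>> c['bacon']    # 'bacon' is not among c's keys, so 0 is returned
    0
    >>> c['eggs']     # 'eggs' is among c's keys, so its count is returned
    2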

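Setting a key's count to 0 does not remove the key from the Counter. Use the del statement to remove it entirely.

    >>> c['sausage'] = 0    # set the count of 'sausage' to 0; the key still exists
    >>> del c['sausage']    # remove 'sausage' from the Counter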

Some commonly used Counter methods

elements()
    Return an iterator over elements repeating each as many times as its count. Elements are returned in arbitrary order. If an element's count is less than one, elements() will ignore it.

most_common([n])
    Return a list of the n most common elements and their counts from the most common to the least. If n is omitted or None, most_common() returns all elements in the counter. Elements with equal counts are ordered arbitrarily.

subtract([iterable-or-mapping])
    Elements are subtracted from an iterable or from another mapping (or counter). Like dict.update() but subtracts counts instead of replacing them. Both inputs and outputs may be zero or negative.

Common patterns:

    sum(c.values())     # total of all counts
    c.clear()           # reset all counts
    list(c)             # convert to a list of the unique elements
    set(c)              # convert to a set
    dict(c)             # convert to a regular dict
    c.items()           # same as dict.items()
    c += Counter()      # remove entries whose count is zero or negative
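The three methods described above can be tried on a small counter. This is only an illustrative sketch written for Python 3; the counter contents and variable names are made up for the example.

```python
from collections import Counter

c = Counter(a=3, b=1, c=0, d=-2)

# elements() repeats each key by its count and skips counts below one.
expanded = sorted(c.elements())        # ['a', 'a', 'a', 'b']

# most_common(n) returns the n highest (element, count) pairs.
top_two = c.most_common(2)             # [('a', 3), ('b', 1)]

# subtract() works in place and may leave zero or negative counts behind.
c.subtract({'a': 1, 'b': 5})
after_subtract = dict(c)               # {'a': 2, 'b': -4, 'c': 0, 'd': -2}

# Adding an empty Counter drops every entry that is zero or negative.
c += Counter()
cleaned = dict(c)                      # {'a': 2}
```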

Counter also supports addition, subtraction, intersection (&) and union (|).

    >>> c = Counter(a=3, b=1)
    >>> d = Counter(a=1, b=2)
    >>> c + d                       # add counts:  c[x] + d[x]
    Counter({'a': 4, 'b': 3})
    >>> c - d                       # subtract counts (only positive results are kept)
    Counter({'a': 2})
    >>> c & d                       # intersection:  min(c[x], d[x])
    Counter({'a': 1, 'b': 1})
    >>> c | d                       # union:  max(c[x], d[x])
    Counter({'a': 3, 'b': 2})
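One detail that is easy to miss is that the - operator and the subtract() method treat negative results differently. The following sketch, assuming Python 3 and reusing the same two counters, puts them side by side; the names difference and in_place are only for illustration.

```python
from collections import Counter

c = Counter(a=3, b=1)
d = Counter(a=1, b=2)

# The - operator returns a new Counter and keeps only positive counts.
difference = c - d                 # Counter({'a': 2})

# subtract() modifies the counter in place and keeps negative counts.
in_place = Counter(c)              # copy, so that c itself stays unchanged
in_place.subtract(d)               # Counter({'a': 2, 'b': -1})
```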

deque

Example:

    >>> from collections import deque
    >>> d = deque('ghi')                 # create an instance
    >>> for elem in d:                   # iterate over d
    ...     print elem.upper()
    G
    H
    I

    >>> d.append('j')                    # add to the right end
    >>> d.appendleft('f')                # add to the left end
    >>> d                                # show the deque
    deque(['f', 'g', 'h', 'i', 'j'])

    >>> d.pop()                          # pop from the right
    'j'
    >>> d.popleft()                      # pop from the left
    'f'
    >>> list(d)                          # convert to a list
    ['g', 'h', 'i']
    >>> d[0]                             # access an element by index
    'g'
    >>> d[-1]                            # negative indexes work as with list
    'i'

    >>> list(reversed(d))
    ['i', 'h', 'g']
    >>> 'h' in d
    True
    >>> d.extend('jkl')                  # append every item of an iterable
    >>> d
    deque(['g', 'h', 'i', 'j', 'k', 'l'])
    >>> d.rotate(1)                      # rotate one step to the right
    >>> d
    deque(['l', 'g', 'h', 'i', 'j', 'k'])
    >>> d.rotate(-1)                     # rotate one step to the left
    >>> d
    deque(['g', 'h', 'i', 'j', 'k', 'l'])

    >>> deque(reversed(d))
    deque(['l', 'k', 'j', 'i', 'h', 'g'])
    >>> d.clear()                        # remove all elements
    >>> d.pop()                          # cannot pop from an empty deque
    Traceback (most recent call last):
      File "<pyshell#6>", line 1, in -toplevel-
        d.pop()
    IndexError: pop from an empty deque

    >>> d.extendleft('abc')              # extendleft() reverses the input order
    >>> d
    deque(['c', 'b', 'a'])

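del d[n] is equivalent to:

    def del_(d, n):
        d.rotate(-n)    # bring item n to the left end
        d.popleft()     # remove it
        d.rotate(n)     # rotate the remaining items back into place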
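To see that the rotate-based deletion really matches del d[n], it can be run next to the built-in statement. This is a small check written for Python 3; the function name delete_nth and the sample data are made up for the example.

```python
from collections import deque

def delete_nth(d, n):
    """Remove the item at index n using only rotate() and popleft()."""
    d.rotate(-n)      # bring item n to the left end
    d.popleft()       # drop it
    d.rotate(n)       # put the remaining items back in their original order

d = deque('abcdef')
delete_nth(d, 2)      # removes 'c'

expected = deque('abcdef')
del expected[2]       # the built-in way, for comparison
```

Both deques end up holding a, b, d, e, f.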

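An interesting example is computing a moving average (MA).

Algorithm:

    Example (n = 3): [40, 30, 50, 46, 39, 44] --> 40.0 42.0 45.0 43.0

    Formula: MA = (C1 + C2 + C3 + ... + Cn) / n
    where C is the closing price and n is the number of periods in the moving average.

    For instance, the 5-day moving average of the spot gold price is:
    MA5 = (close 4 days ago + close 3 days ago + close 2 days ago + yesterday's close + today's close) / 5

Code:

    import itertools
    from collections import deque

    def moving_average(iterable, n=3):
        # http://en.wikipedia.org/wiki/Moving_average
        it = iter(iterable)
        d = deque(itertools.islice(it, n-1))    # the first n-1 values
        d.appendleft(0)                         # placeholder, dropped on the first step
        s = sum(d)
        for elem in it:
            s += elem - d.popleft()             # slide the window: add the new value, drop the oldest
            d.append(elem)
            yield s / float(n)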
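As a cross-check on the sample numbers, here is a different and simpler sketch that relies on deque's maxlen argument and recomputes the sum for each window. It is not the generator from the documentation, only an alternative written for Python 3, and the name moving_average_maxlen is made up for this example.

```python
from collections import deque

def moving_average_maxlen(iterable, n=3):
    """Yield the simple moving average over a sliding window of n values."""
    window = deque(maxlen=n)          # the oldest value falls off automatically
    for elem in iterable:
        window.append(elem)
        if len(window) == n:          # wait until the first full window
            yield sum(window) / n

closes = [40, 30, 50, 46, 39, 44]     # the sample series from the text
averages = list(moving_average_maxlen(closes))
# averages == [40.0, 42.0, 45.0, 43.0]
```

Recomputing sum(window) costs O(n) per step, so the running-total approach is preferable for large windows; the maxlen version is just easier to read.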

namedtuple()

Example:

    >>> from collections import namedtuple
    >>> # verbose=True prints the generated class definition
    >>> Point = namedtuple('Point', ['x', 'y'], verbose=True)
    class Point(tuple):
        'Point(x, y)'

        __slots__ = ()

        _fields = ('x', 'y')

        def __new__(_cls, x, y):
            'Create a new instance of Point(x, y)'
            return _tuple.__new__(_cls, (x, y))

        @classmethod
        def _make(cls, iterable, new=tuple.__new__, len=len):
            'Make a new Point object from a sequence or iterable'
            result = new(cls, iterable)
            if len(result) != 2:
                raise TypeError('Expected 2 arguments, got %d' % len(result))
            return result

        def __repr__(self):
            'Return a nicely formatted representation string'
            return 'Point(x=%r, y=%r)' % self

        def _asdict(self):
            'Return a new OrderedDict which maps field names to their values'
            return OrderedDict(zip(self._fields, self))

        def _replace(_self, **kwds):
            'Return a new Point object replacing specified fields with new values'
            result = _self._make(map(kwds.pop, ('x', 'y'), _self))
            if kwds:
                raise ValueError('Got unexpected field names: %r' % kwds.keys())
            return result

        def __getnewargs__(self):
            'Return self as a plain tuple.   Used by copy and pickle.'
            return tuple(self)

        __dict__ = _property(_asdict)

        def __getstate__(self):
            'Exclude the OrderedDict from pickling'
            pass

        x = _property(_itemgetter(0), doc='Alias for field number 0')

        y = _property(_itemgetter(1), doc='Alias for field number 1')

    >>> p = Point(11, y=22)     # instantiate a point
    >>> p[0] + p[1]             # add the fields by index
    33
    >>> x, y = p                # unpack like a regular tuple
    >>> x, y
    (11, 22)
    >>> p.x + p.y               # add the fields by attribute name
    33
    >>> p                       # readable __repr__ of the instance
    Point(x=11, y=22)

namedtuple is especially handy for attaching field names to rows loaded from csv or sqlite3. Here is the demo from the official documentation:

    from collections import namedtuple

    EmployeeRecord = namedtuple('EmployeeRecord', 'name, age, title, department, paygrade')

    import csv
    for emp in map(EmployeeRecord._make, csv.reader(open("employees.csv", "rb"))):
        print emp.name, emp.title

    import sqlite3
    conn = sqlite3.connect('/companydata')
    cursor = conn.cursor()
    cursor.execute('SELECT name, age, title, department, paygrade FROM employees')
    for emp in map(EmployeeRecord._make, cursor.fetchall()):
        print emp.name, emp.title
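The csv half of that demo can be tried without a real file. The sketch below, written for Python 3, feeds csv.reader from an in-memory io.StringIO; the two sample rows are invented stand-ins for the contents of employees.csv.

```python
import csv
import io
from collections import namedtuple

EmployeeRecord = namedtuple('EmployeeRecord', 'name, age, title, department, paygrade')

# Hypothetical sample rows standing in for employees.csv.
sample = io.StringIO(
    "Alice,30,Engineer,R&D,5\n"
    "Bob,45,Manager,Sales,7\n"
)

# _make() builds one record per row; csv yields every field as a string.
employees = [EmployeeRecord._make(row) for row in csv.reader(sample)]
titles = [(emp.name, emp.title) for emp in employees]
# titles == [('Alice', 'Engineer'), ('Bob', 'Manager')]
```

Note that age and paygrade stay strings here, because csv.reader does no type conversion.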

Some of namedtuple's built-in methods and attributes

    >>> t = [11, 22]
    >>> Point._make(t)        # build an instance from an iterable
    Point(x=11, y=22)

    >>> p = Point(x=11, y=22)
    >>> p._asdict()           # return an OrderedDict (behavior updated in Python 2.7)
    OrderedDict([('x', 11), ('y', 22)])

    >>> p._replace(x=33)      # return a new instance with the given fields replaced
    Point(x=33, y=22)

    >>> for partnum, record in inventory.items():
    ...     inventory[partnum] = record._replace(price=newprices[partnum], timestamp=time.now())

    >>> p._fields             # view the field names
    ('x', 'y')

    >>> Color = namedtuple('Color', 'red green blue')
    >>> Pixel = namedtuple('Pixel', Point._fields + Color._fields)
    >>> Pixel(11, 22, 128, 255, 0)
    Pixel(x=11, y=22, red=128, green=255, blue=0)

    >>> getattr(p, 'x')       # get the value of field x of p
    11

    >>> d = {'x': 11, 'y': 22}
    >>> Point(**d)            # ** unpacks a dict into keyword arguments
    Point(x=11, y=22)

    >>> class Point(namedtuple('Point', 'x y')):
    ...     __slots__ = ()
    ...     @property
    ...     def hypot(self):
    ...         return (self.x ** 2 + self.y ** 2) ** 0.5
    ...     def __str__(self):
    ...         return 'Point: x=%6.3f  y=%6.3f  hypot=%6.3f' % (self.x, self.y, self.hypot)

    >>> Point3D = namedtuple('Point3D', Point._fields + ('z',))

    >>> Account = namedtuple('Account', 'owner balance transaction_count')
    >>> default_account = Account('<owner name>', 0.0, 0)
    >>> johns_account = default_account._replace(owner='John')

    >>> Status = namedtuple('Status', 'open pending closed')._make(range(3))    # create the type and an instance in one step
    >>> Status.open, Status.pending, Status.closed
    (0, 1, 2)
    >>> class Status:
    ...     open, pending, closed = range(3)

OrderedDict

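A regular dict does not keep track of the order in which its keys were inserted, which is inconvenient in some applications. OrderedDict is a dict subclass that remembers insertion order, so it can be used to give a dictionary a defined order.

Example:

    >>> from collections import OrderedDict
    >>> # a regular dictionary
    >>> d = {'banana': 3, 'apple': 4, 'pear': 1, 'orange': 2}

    >>> # sorted by key
    >>> OrderedDict(sorted(d.items(), key=lambda t: t[0]))
    OrderedDict([('apple', 4), ('banana', 3), ('orange', 2), ('pear', 1)])

    >>> # sorted by value
    >>> OrderedDict(sorted(d.items(), key=lambda t: t[1]))
    OrderedDict([('pear', 1), ('orange', 2), ('banana', 3), ('apple', 4)])

    >>> # sorted by length of the key
    >>> OrderedDict(sorted(d.items(), key=lambda t: len(t[0])))
    OrderedDict([('pear', 1), ('apple', 4), ('orange', 2), ('banana', 3)])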
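The sorted examples build an ordered dictionary once, but the order is also maintained as the dictionary changes. The following sketch, assuming Python 3 and made-up data, shows how updates, new keys, and popitem() interact with insertion order.

```python
from collections import OrderedDict

od = OrderedDict()
od['banana'] = 3
od['apple'] = 4
od['pear'] = 1
keys_in_order = list(od)              # ['banana', 'apple', 'pear']

od['banana'] = 10                     # updating a value keeps the key's position
od['orange'] = 2                      # a new key is appended at the end
keys_after = list(od)                 # ['banana', 'apple', 'pear', 'orange']

first = od.popitem(last=False)        # pops from the front: ('banana', 10)
last = od.popitem()                   # pops from the end:   ('orange', 2)
```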

defaultdict

A subclass of dict that supplies a default value when a key is missing, instead of raising KeyError.

Example:

    >>> from collections import defaultdict
    >>> s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
    >>> d = defaultdict(list)    # store the values of the dict as lists
    >>> for k, v in s:
    ...     d[k].append(v)
    ...
    >>> d.items()
    [('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]

    >>> d = {}
    >>> for k, v in s:
    ...     d.setdefault(k, []).append(v)    # the same result with a plain dict
    ...
    >>> d.items()
    [('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]

    >>> s = 'mississippi'
    >>> d = defaultdict(int)     # this style resembles Counter
    >>> for k in s:
    ...     d[k] += 1
    ...
    >>> d.items()
    [('i', 4), ('p', 2), ('s', 4), ('m', 1)]

    >>> import itertools
    >>> def constant_factory(value):
    ...     return itertools.repeat(value).next
    >>> d = defaultdict(constant_factory('<missing>'))
    >>> d.update(name='John', action='ran')
    >>> '%(name)s %(action)s to %(object)s' % d    # the key 'object' is missing, so the default is used
    'John ran to <missing>'
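The constant_factory helper relies on the Python 2 .next method of an iterator, which does not exist in Python 3. A possible Python 3 sketch of the same idea returns a small lambda instead; the variable name sentence is only for illustration.

```python
from collections import defaultdict

def constant_factory(value):
    """Return a zero-argument callable that always returns value."""
    return lambda: value

d = defaultdict(constant_factory('<missing>'))
d.update(name='John', action='ran')

# Looking up the missing key 'object' calls the factory and stores the result.
sentence = '%(name)s %(action)s to %(object)s' % d
# sentence == 'John ran to <missing>'
```

As a side effect of the lookup, the key 'object' is now present in d with the default value.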

    >>> s = [('red', 1), ('blue', 2), ('red', 3), ('blue', 4), ('red', 1), ('blue', 4)]
    >>> d = defaultdict(set)     # store the values of the dict as sets
    >>> for k, v in s:
    ...     d[k].add(v)
    ...
    >>> d.items()
    [('blue', set([2, 4])), ('red', set([1, 3]))]