Welcome to cereal!

Utility functions for pointers & data traversals

Why

cereal helps control dataflows that act on a store at compute time, and permits batched sampling without creating deep copies in memory. This allows insertions, traversals, and replacements over n-dimensional data while keeping memory allocation strict. Under the hood, cereal creates shallow copies of an object using pointers, so asynchronous traversal and access methods can be implemented without initializing a new object at compute time. This is particularly useful for distributed systems with shared memory spaces: computation can be distributed without having to manage the distribution and re-collection of data, because each system interacts with the items directly during its traversal.
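To make the shallow-versus-deep distinction concrete, here is a minimal sketch in plain Python. It does not use cereal or pointers.py; the names `store`, `views`, and `snapshot` are hypothetical and only illustrate the idea that several references to one object share its memory, while a deep copy does not.

```python
import copy

# A small nested "store" standing in for n-dimensional data.
store = [[1, 2], [3, 4]]

# Four references to the same object: no data is copied.
views = [store for _ in range(4)]

# A deep copy, by contrast, allocates an independent object.
snapshot = copy.deepcopy(store)

# A write through any one reference is visible through all the others,
# but leaves the deep copy untouched.
views[0][0][0] = 99

shared = all(v is store for v in views)
print(shared, views[3][0][0], snapshot[0][0])
```

Because every entry in `views` is the same object, traversals can be handed out to several workers without duplicating the underlying data.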

Pointers.py

This package builds on ZeroIntensity's pointers.py. Although its author insists it started as a joke, it is pretty handy for managing memory in Python as we would in C (with the bonus of inheriting PyObject properties).

Example

import numpy as np
from cereal import eat

x = [1, 2, 3, 4]

yum = eat(x, its=4, method=np.array, compute=False, copy=False)

>>> yum
array([<pointer to list object at 0x113b3edc0>,
       <pointer to list object at 0x113b3edc0>,
       <pointer to list object at 0x113b3edc0>,
       <pointer to list object at 0x113b3edc0>], dtype=object) # 4 pointers to the original object

>>> (~yum[0])[0]
1

Asynchronous Example

import random
import asyncio
import numpy as np
from cereal import eat
from time import process_time

x = np.arange(10000).reshape(16, 625)

>>> x
array([[   0,    1,    2, ...,  622,  623,  624],
       [ 625,  626,  627, ..., 1247, 1248, 1249],
       [1250, 1251, 1252, ..., 1872, 1873, 1874],
         ...,
       [8125, 8126, 8127, ..., 8747, 8748, 8749],
       [8750, 8751, 8752, ..., 9372, 9373, 9374],
       [9375, 9376, 9377, ..., 9997, 9998, 9999]])

yum = eat(x, 16, np.array, compute=False, copy=False) # no creation time as we delay computation

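async def addone(row, n):
    # sleep for a random interval, then increment one element in place
    await asyncio.sleep(random.random())
    row[n] += 1

async def traverse(x, pos):
    # schedule one task per column of row `pos`; the gather future is returned un-awaited
    mem = [asyncio.create_task(addone(x[pos], n)) for n in range(625)]
    return asyncio.gather(*mem)

async def traverse_gather(x, n):
    # submit the row's tasks and hand back the future so completion can be awaited later
    return await traverse(x, n)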

start = process_time()
fut = [await traverse_gather(x, n) for n, x in enumerate(yum)] # top-level await works in Jupyter; in plain Python, run this inside a coroutine via asyncio.run()
end = process_time()

>>> print(end - start)
0.11029499999995096 # submission time only: the tasks are scheduled but not awaited, so procedures are decoupled from the data they act on and values can be read from the store while coroutines are still completing

>>> x # may be complete or incomplete, depending on how many coroutines have finished
array([[    1,     2,     3, ...,   623,   624,   625],
       [  626,   627,   628, ...,  1248,  1249,  1250],
       [ 1251,  1252,  1253, ...,  1873,  1874,  1875],
          ...,
       [ 8126,  8127,  8128, ...,  8748,  8749,  8750],
       [ 8751,  8752,  8753, ...,  9373,  9374,  9375],
       [ 9376,  9377,  9378, ...,  9998,  9999, 10000]])
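For plain Python, where top-level await is not available, the submit-then-collect pattern from the asynchronous example can be sketched as below. This is an illustration only: it uses a nested list instead of cereal pointers, and the names `add_one`, `submit`, `main`, and `store` are hypothetical. Unlike the example above, it awaits every task before returning, so the store is guaranteed to be complete.

```python
import asyncio
import random

async def add_one(row, n):
    # Simulate work of unpredictable duration, then mutate the shared row in place.
    await asyncio.sleep(random.random() * 0.01)
    row[n] += 1

async def submit(store):
    # Schedule one task per element; the tasks start running but are not awaited here.
    return [
        asyncio.create_task(add_one(row, n))
        for row in store
        for n in range(len(row))
    ]

async def main(store):
    tasks = await submit(store)    # submission only
    await asyncio.gather(*tasks)   # wait for every in-place update to finish
    return store

store = [list(range(i * 4, i * 4 + 4)) for i in range(3)]
result = asyncio.run(main(store))
print(result)
```

Keeping the list of tasks around, rather than discarding it, is what makes it possible to decide later whether to wait for completion or to read partial results from the store.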