utils

Helper functions

GzipInputStream(fileobj[, block_size])

Simple class that allows streaming reads from GZip files (from https://gist.github.com/beaufour/4205533).

bytes2human(nbytes)

Return string representation of bytes.

clean_object_name(input_function)

Remove leading “/” from object_name

generate_ndarray_chunks(arr[, axis, buffersize])

A generator that splits an array into chunks of desired byte size

get_fileobject_size(file_object)

Return byte size of file-object

get_key_from_environ()

Get AWS keys from environment variables if available

get_key_from_s3fs()

Get AWS keys from default S3fs location if available.

get_keys()

Read AWS keys from S3fs configuration or environment variables.

get_object_size(boto_s3_object)

Return the size of the S3 object in MB

has_magic(s)

Check string to see if it has any glob magic

has_real_magic(s)

Check if string has non-trivial glob pattern

has_start_digit(s)

has_trivial_magic(s)

Check string to see if it has trivial glob magic (e.g. “path/*”).

mk_aws_path(path)

Make the path behave as expected when querying S3 with list_objects.

objects2names(objects)

Return the name of all objects in a list

pathjoin(a, *p)

Join two or more pathname components, inserting SEPARATOR as needed.

print_objects(object_list)

Print name, size, and creation date of objects in list.

read_buffered(frm, to[, buffersize])

Fill a numpy n-d array with file-like object contents

remove_root(string_)

Remove leading “/” from a string

remove_trivial_magic(s)

  • xxx/* -> xxx/

sanitize_metadata(metadict)

split_uri(uri[, pattern, separator])

Convert a URI to a bucket, object name tuple.

string2bool(mstring)

unquote_names(object_names)

Clean URL names from a list.

GzipInputStream

class cottoncandy.utils.GzipInputStream(fileobj, block_size=16384)

Bases: object

Simple class that allows streaming reads from GZip files (from https://gist.github.com/beaufour/4205533).

Python 2.x gzip.GzipFile relies on .seek() and .tell(), so it doesn’t support streaming reads (see http://bo4.me/YKWSsL).

Adapted from: http://effbot.org/librarybook/zlib-example-4.py

__init__(fileobj, block_size=16384)

Initialize with the given file-like object.

Parameters

fileobj (file-like object) –

next()
read(size=0)
readline()
readlines()
seek(offset, whence=0)
tell()
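
The idea behind streaming reads can be sketched with zlib directly. This is a minimal illustration, not the class's actual implementation: the helper name iter_gunzip is hypothetical, and only the block_size default is taken from the signature above.

```python
import gzip
import io
import zlib


def iter_gunzip(fileobj, block_size=16384):
    """Yield decompressed byte blocks from a gzip stream without seeking."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    while True:
        block = fileobj.read(block_size)
        if not block:
            break
        data = decompressor.decompress(block)
        if data:
            yield data
    # Emit whatever is still buffered inside the decompressor.
    tail = decompressor.flush()
    if tail:
        yield tail


# Round-trip check on an in-memory stream, read 64 bytes at a time.
payload = b"hello streaming gzip\n" * 1000
compressed = io.BytesIO(gzip.compress(payload))
restored = b"".join(iter_gunzip(compressed, block_size=64))
```

Because the input is only ever read forward, the same loop works on objects that cannot seek, such as an HTTP response body.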

bytes2human

cottoncandy.utils.bytes2human(nbytes)

Return string representation of bytes.

Parameters

nbytes (int) – Number of bytes

Returns

human_bytes – Human readable byte size (e.g. “10.00MB”, “1.24GB”, etc.).

Return type

str
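
A formatter producing strings like the examples above could look like the following sketch. It assumes 1024-based units and two decimal places; the function name bytes2human_sketch is hypothetical and the library's own rounding and unit choices may differ.

```python
def bytes2human_sketch(nbytes):
    """Format a byte count as a short string such as '10.00MB'."""
    units = ["B", "KB", "MB", "GB", "TB", "PB"]
    size = float(nbytes)
    # Divide by 1024 until the value fits the current unit (assumed base).
    for unit in units[:-1]:
        if size < 1024.0:
            return "%.2f%s" % (size, unit)
        size /= 1024.0
    return "%.2f%s" % (size, units[-1])


ten_megabytes = bytes2human_sketch(10 * 1024 ** 2)
```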

clean_object_name

cottoncandy.utils.clean_object_name(input_function)

Remove leading “/” from object_name

This is important for compatibility with S3fs. S3fs does not list objects with a “/” prefix.
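
The parameter name input_function suggests this is used as a decorator. Under that assumption, a minimal sketch might look as follows; the name clean_object_name_sketch, the position of object_name as the first argument, and the decorated fetch function are all hypothetical.

```python
import functools


def clean_object_name_sketch(input_function):
    """Strip leading '/' from the first argument before calling the function."""
    @functools.wraps(input_function)
    def wrapper(object_name, *args, **kwargs):
        # Assumption: the object name is the first positional argument.
        return input_function(object_name.lstrip("/"), *args, **kwargs)
    return wrapper


@clean_object_name_sketch
def fetch(object_name):
    # Stand-in for a real S3 call; just echoes the cleaned name.
    return object_name


cleaned = fetch("/path/to/array.npy")
```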

generate_ndarray_chunks

cottoncandy.utils.generate_ndarray_chunks(arr, axis=None, buffersize=104857600)

A generator that splits an array into chunks of desired byte size

Parameters
  • arr (np.ndarray) –

  • axis (int, None) – The axis along which to slice the array. If None is given, the array is chunked into ideal isotropic voxels.

  • buffersize (scalar) – Byte size of the desired array chunks

Returns

iterator – The object yields the tuple: (chunk_coordinates, chunk_data_slice)

  • chunk_coordinates: Indices of the current chunk along each dimension

  • chunk_data_slice: Data for this chunk

Return type

generator object

Notes

axis=None is WIP and only works well for near isotropic matrices.
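
Chunking along a single axis can be sketched as below. This is an illustration under stated assumptions rather than the library's code: chunk_along_axis is a hypothetical name, the array is assumed non-empty, and the coordinate tuple simply holds the chunk index on the sliced axis and zeros elsewhere. The axis=None case is not covered.

```python
import numpy as np


def chunk_along_axis(arr, axis=0, buffersize=100 * 2 ** 20):
    """Yield (chunk_coordinates, chunk_data_slice) of roughly buffersize bytes."""
    # Bytes occupied by one slice of thickness 1 along `axis`.
    slice_nbytes = arr.nbytes // arr.shape[axis]
    # Number of such slices per chunk; always at least one.
    step = max(1, buffersize // slice_nbytes)
    for index, start in enumerate(range(0, arr.shape[axis], step)):
        selector = [slice(None)] * arr.ndim
        selector[axis] = slice(start, start + step)
        coordinates = [0] * arr.ndim
        coordinates[axis] = index
        yield tuple(coordinates), arr[tuple(selector)]


# 6 rows of 4 float64 values: 32 bytes per row, so 64 bytes gives 2 rows per chunk.
data = np.arange(24, dtype=np.float64).reshape(6, 4)
chunks = list(chunk_along_axis(data, axis=0, buffersize=64))
```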

get_fileobject_size

cottoncandy.utils.get_fileobject_size(file_object)

Return byte size of file-object

Parameters

file_object (file object) –

Returns

nbytes

Return type

int
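
One common way to measure a seekable file object is to seek to its end and read the position. The sketch below does that and restores the original position afterwards; fileobject_size_sketch is a hypothetical name and the library may use a different approach.

```python
import io
import os


def fileobject_size_sketch(file_object):
    """Return the size in bytes of a seekable file object."""
    position = file_object.tell()
    file_object.seek(0, os.SEEK_END)
    nbytes = file_object.tell()
    # Put the read position back where the caller left it.
    file_object.seek(position)
    return nbytes


buffer = io.BytesIO(b"abcdef")
buffer.read(2)
size = fileobject_size_sketch(buffer)
```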

get_key_from_environ

cottoncandy.utils.get_key_from_environ()

Get AWS keys from environment variables if available

Returns

  • ACCESS_KEY (str)

  • SECRET_KEY (str)

Notes

Reads the AWS_ACCESS_KEY and AWS_SECRET_KEY environment variables.
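
A lookup of this kind can be sketched with os.environ. The function name keys_from_environ_sketch is hypothetical, and returning None for a missing variable is an assumption about the fallback behavior.

```python
import os


def keys_from_environ_sketch(environ=None):
    """Return (ACCESS_KEY, SECRET_KEY) from the environment, or None for each if unset."""
    if environ is None:
        environ = os.environ
    # Assumption: missing variables map to None rather than raising.
    return environ.get("AWS_ACCESS_KEY"), environ.get("AWS_SECRET_KEY")


# A plain dict stands in for the process environment here.
fake_environ = {"AWS_ACCESS_KEY": "AKIAEXAMPLE", "AWS_SECRET_KEY": "example-secret"}
access_key, secret_key = keys_from_environ_sketch(fake_environ)
```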

get_key_from_s3fs

cottoncandy.utils.get_key_from_s3fs()

Get AWS keys from default S3fs location if available.

Returns

  • ACCESS_KEY (str)

  • SECRET_KEY (str)

Notes

Reads ~/.passwd-s3fs to get ACCESS_KEY and SECRET_KEY
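
The s3fs password file conventionally holds a single ACCESS_KEY_ID:SECRET_ACCESS_KEY line. Assuming that layout, parsing could be sketched as follows; parse_passwd_s3fs is a hypothetical helper, it works on the file's text rather than opening the file, and skipping blank or "#" lines and bucket-specific entries is an assumption.

```python
def parse_passwd_s3fs(text):
    """Return (ACCESS_KEY, SECRET_KEY) from the contents of a passwd-s3fs file."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        # Only the plain two-field form is handled in this sketch.
        if len(fields) == 2:
            return fields[0], fields[1]
    return None, None


keys = parse_passwd_s3fs("# default credentials\nAKIAEXAMPLE:example-secret\n")
```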

get_keys

cottoncandy.utils.get_keys()

Read AWS keys from S3fs configuration or environment variables.

Returns

  • ACCESS_KEY (str)

  • SECRET_KEY (str)

get_object_size

cottoncandy.utils.get_object_size(boto_s3_object)

Return the size of the S3 object in MB

Parameters

boto_s3_object (boto object) –

Returns

object_size

Return type

float (in MB)

has_magic

cottoncandy.utils.has_magic(s)

Check string to see if it has any glob magic
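
A check like this is often written as a regular-expression search for the glob metacharacters *, ? and [. The sketch below takes that approach; has_magic_sketch is a hypothetical name, and the exact character set the library tests for is an assumption.

```python
import re

# Characters that start a glob pattern: '*', '?' and '['.
MAGIC_CHECK = re.compile(r"[*?\[]")


def has_magic_sketch(s):
    """Return True if the string contains any glob metacharacter."""
    return MAGIC_CHECK.search(s) is not None


wildcard = has_magic_sketch("path/*")
```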

has_real_magic

cottoncandy.utils.has_real_magic(s)

Check if string has non-trivial glob pattern

has_start_digit

cottoncandy.utils.has_start_digit(s)

has_trivial_magic

cottoncandy.utils.has_trivial_magic(s)

Check string to see if it has trivial glob magic (e.g. “path/*”)

mk_aws_path

cottoncandy.utils.mk_aws_path(path)

Make the path behave as expected when querying S3 with list_objects.

  • xxx/yyy -> xxx/yyy/

  • xxx/ -> xxx/

  • xxx -> xxx/

  • / -> ‘’

  • ‘’ -> ‘’
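
The mapping listed above can be reproduced with a short sketch. The name mk_aws_path_sketch is hypothetical and inputs outside the listed cases are not considered.

```python
def mk_aws_path_sketch(path):
    """Normalize a path into a prefix suitable for listing objects."""
    # The root and the empty string both map to the empty prefix.
    if path in ("", "/"):
        return ""
    # Every other path gets exactly one trailing separator.
    return path if path.endswith("/") else path + "/"


prefix = mk_aws_path_sketch("xxx/yyy")
```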

objects2names

cottoncandy.utils.objects2names(objects)

Return the name of all objects in a list

Parameters

objects (list (of boto3 objects)) –

Returns

object_names

Return type

list (of strings)

pathjoin

cottoncandy.utils.pathjoin(a, *p)

Join two or more pathname components, inserting SEPARATOR as needed. If any component is an absolute path, all previous path components will be discarded. An empty last part will result in a path that ends with a separator.
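
The three rules described above match the behavior of the standard library's posixpath.join, which is used here for illustration. Whether pathjoin delegates to it, and whether SEPARATOR is "/", are assumptions.

```python
import posixpath

# Components are joined with a single separator.
joined = posixpath.join("bucket_dir", "sub", "file.npy")

# An absolute component discards everything before it.
absolute = posixpath.join("bucket_dir", "/abs", "file.npy")

# An empty last part leaves a trailing separator.
trailing = posixpath.join("bucket_dir", "sub", "")
```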

read_buffered

cottoncandy.utils.read_buffered(frm, to, buffersize=64)

Fill a numpy n-d array with file-like object contents

Parameters
  • frm (buffer) – Object with a read method

  • to (np.ndarray) – Array to which the contents will be put
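
Filling a preallocated array from a stream in fixed-size reads can be sketched as below. This is not the library's implementation: fill_array_from_stream is a hypothetical name, the chunk size is given in bytes (the unit of buffersize above is not stated), and the destination is assumed to be C-contiguous.

```python
import io

import numpy as np


def fill_array_from_stream(frm, to, chunk_nbytes=1 << 20):
    """Copy bytes from `frm` into the memory of `to`, chunk by chunk."""
    if not to.flags.c_contiguous:
        raise ValueError("destination array must be C-contiguous")
    # Flat uint8 view over the destination; writes land in `to` itself.
    flat = to.reshape(-1).view(np.uint8)
    offset = 0
    while offset < flat.size:
        block = frm.read(min(chunk_nbytes, flat.size - offset))
        if not block:
            raise EOFError("stream ended before the array was filled")
        flat[offset:offset + len(block)] = np.frombuffer(block, dtype=np.uint8)
        offset += len(block)
    return to


source = np.arange(12, dtype=np.float32).reshape(3, 4)
stream = io.BytesIO(source.tobytes())
destination = np.empty_like(source)
fill_array_from_stream(stream, destination, chunk_nbytes=10)
```

Reading in bounded blocks keeps peak memory close to the size of the destination array plus one block, instead of holding a second full copy of the data.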

remove_root

cottoncandy.utils.remove_root(string_)

Remove leading “/” from a string

remove_trivial_magic

cottoncandy.utils.remove_trivial_magic(s)
  • xxx/* -> xxx/

  • xxx/ -> xxx/

  • xxx//yyy/ -> xxx//yyy/
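
The three cases above can be reproduced by dropping a trailing "*" only when it directly follows a "/". A minimal sketch, with the hypothetical name remove_trivial_magic_sketch:

```python
def remove_trivial_magic_sketch(s):
    """Turn 'xxx/*' into 'xxx/' and leave every other string unchanged."""
    return s[:-1] if s.endswith("/*") else s


stripped = remove_trivial_magic_sketch("xxx/*")
```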

sanitize_metadata

cottoncandy.utils.sanitize_metadata(metadict)

split_uri

cottoncandy.utils.split_uri(uri, pattern='s3://', separator='/')

Convert a URI to a bucket, object name tuple.

‘s3://bucket/path/to/thing’ -> (‘bucket’, ‘path/to/thing’)
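
The conversion shown above can be sketched by stripping the scheme prefix and splitting once on the separator. The name split_uri_sketch is hypothetical, and the handling of URIs without the prefix or without an object name is an assumption.

```python
def split_uri_sketch(uri, pattern="s3://", separator="/"):
    """Split 's3://bucket/path/to/thing' into ('bucket', 'path/to/thing')."""
    if uri.startswith(pattern):
        uri = uri[len(pattern):]
    # partition splits on the first separator only, keeping the rest intact.
    bucket, _, object_name = uri.partition(separator)
    return bucket, object_name


bucket, object_name = split_uri_sketch("s3://bucket/path/to/thing")
```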

string2bool

cottoncandy.utils.string2bool(mstring)

unquote_names

cottoncandy.utils.unquote_names(object_names)

Clean URL names from a list.

Parameters

object_names (list (of strings)) –

Returns

clean_object_names

Return type

list (of strings)
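
Percent-decoding a list of names can be sketched with the standard library's urllib.parse.unquote (Python 3). The name unquote_names_sketch is hypothetical, and whether the library uses this exact call is an assumption.

```python
from urllib.parse import unquote


def unquote_names_sketch(object_names):
    """Decode percent-escapes such as '%20' in each object name."""
    return [unquote(name) for name in object_names]


clean_names = unquote_names_sketch(["my%20file.npy", "dir%2Farray.npy"])
```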