data¶

dataset¶

class brightfield2fish.data.dataset.FishDataframeDatasetTIFF(df, csv=False, channel_content='DNA', resize_original=None, random_crop=None, math_dtype=<class 'numpy.float64'>, out_dtype=<class 'numpy.float32'>, output_torch=True, channel_dim=True, return_tuple=True)[source]¶

Dataset class for Brightfield -> FISH prediction that reads single channel tiffs.

Parameters:

df (pd.DataFrame) – input dataframe that specifies dataset
csv (bool) – if True, accept a csv file path rather than a DataFrame
channel_content (str) – what content to pair with brightfield, e.g. DNA
resize_original (float, tuple, or None) – if not None, how to resize the original 3D images
random_crop (tuple, or None) – if not None, tuple of z,y,x sizes (in pixels) to which the image will be randomly cropped
math_dtype (numpy.dtype) – data type in which internal computations will be done
out_dtype (numpy.dtype) – data type that will be output
output_torch (bool) – if True, output a torch.Tensor rather than a np.ndarray
channel_dim (bool) – if True, include a singleton channel dimension for output 3D images
return_tuple (bool) – if True, return images as (brightfield, target), else return as a dict

class brightfield2fish.data.dataset.FishSegDataframeDatasetTIFF(df, csv=False, channel_content='MYH7', resize_original=None, random_crop=None, math_dtype=<class 'numpy.float64'>, out_dtype=<class 'numpy.float32'>, output_torch=True, channel_dim=True, return_tuple=True, fish_3d=True, bf_clip_percentiles=[0.01, 99.99], normalize=True)[source]¶

Dataset class for Brightfield -> FISH prediction that reads 3D tiffs as inputs and 2D FISH segmentations as targets. The 2D targets are extruded along z for the image-to-image prediction task.

Parameters:

df (pd.DataFrame) – input dataframe that specifies dataset
csv (bool) – if True, accept a csv file path rather than a DataFrame
channel_content (str) – what content to pair with brightfield, e.g. DNA
resize_original (float, tuple, or None) – if not None, how to resize the original 3D images
random_crop (tuple, or None) – if not None, tuple of z,y,x sizes (in pixels) to which the image will be randomly cropped
math_dtype (numpy.dtype) – data type in which internal computations will be done
out_dtype (numpy.dtype) – data type that will be output
output_torch (bool) – if True, output a torch.Tensor rather than a np.ndarray
channel_dim (bool) – if True, include a singleton channel dimension for output 3D images
return_tuple (bool) – if True, return images as (brightfield, target), else return as a dict
fish_3d (bool) – if True, return fish image as 3D, extruded along z axis
bf_clip_percentiles (list) – lower and upper percentiles of pixel intensity at which to clip the brightfield image
normalize (bool) – if True, normalize the brightfield image to zero mean and unit variance, and normalize the fish image to min zero and max one
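
The extrusion, clipping, and normalization steps described above can be illustrated with plain NumPy. This is a minimal sketch on synthetic arrays, not the class's actual implementation: the helper names `extrude_along_z` and `clip_and_center_scale` are hypothetical and do not come from brightfield2fish.

```python
import numpy as np


def extrude_along_z(seg_2d, n_z):
    """Repeat a 2D (y, x) array n_z times along a new leading z axis."""
    return np.repeat(seg_2d[np.newaxis, :, :], n_z, axis=0)


def clip_and_center_scale(im, percentiles=(0.01, 99.99)):
    """Clip to the given intensity percentiles, then shift to zero mean, unit variance."""
    lo, hi = np.percentile(im, percentiles)
    clipped = np.clip(im.astype(np.float64), lo, hi)
    return (clipped - clipped.mean()) / clipped.std()


rng = np.random.default_rng(0)
brightfield = rng.normal(size=(4, 8, 8))             # synthetic (z, y, x) stack
seg = (rng.random((8, 8)) > 0.5).astype(np.float64)  # synthetic 2D target

# Give the 2D target the same z depth as the brightfield input.
target_3d = extrude_along_z(seg, brightfield.shape[0])
bf_norm = clip_and_center_scale(brightfield)
```

After extrusion every z slice of `target_3d` is identical to `seg`, so the input and target have matching (z, y, x) shapes.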

split_data¶

brightfield2fish.data.split_data.hashsplit(X, splits={'test': 0.2, 'train': 0.8}, salt=1, N=5)[source]¶

Splits a list of items pseudorandomly (but deterministically) based on the hashes of the items.

Parameters:

X (list) – list of items to be split into non-overlapping groups
splits (dict) – dict of {name:weight} pairs defining the desired split
salt (str) – str(salt) is appended to each list item before hashing
N (int) – number of significant figures to compute for binning each list item

Returns:	{name:indices} for all names in the input split dict
Return type:	(dict)

Example

>>> hashsplit(list("allen cell institute"), {'train':0.7,'test':0.3}, salt=3, N=8)
{'test': [4, 12, 17],
'train': [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 18, 19]}
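
One way such a hash-based split can work is sketched below using only the standard library. This is an assumption about the general technique, not the package's code: the function name `hashsplit_sketch`, the choice of SHA-256, and the binning scheme are all illustrative, so the resulting assignments will generally differ from those of `hashsplit`.

```python
import hashlib


def hashsplit_sketch(items, splits, salt=1, n_digits=5):
    """Assign each item's index to a split based on a hash of str(item) + str(salt)."""
    names = sorted(splits)
    total = sum(splits[name] for name in names)
    # Cumulative upper bounds in (0, 1], one per split name.
    bounds = []
    running = 0.0
    for name in names:
        running += splits[name] / total
        bounds.append(running)

    out = {name: [] for name in names}
    for i, item in enumerate(items):
        digest = hashlib.sha256((str(item) + str(salt)).encode("utf-8")).hexdigest()
        # Map the hash to a number in [0, 1) with n_digits decimal digits.
        u = (int(digest, 16) % 10**n_digits) / 10**n_digits
        for name, upper in zip(names, bounds):
            if u < upper:
                out[name].append(i)
                break
        else:
            # Guard against float round-off in the last cumulative bound.
            out[names[-1]].append(i)
    return out


items = ["img_%d.tiff" % i for i in range(20)]
result = hashsplit_sketch(items, {"train": 0.8, "test": 0.2}, salt=3)
```

Because the assignment depends only on the item and the salt, identical items always land in the same split, and rerunning the split on a grown dataset does not move existing items.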

brightfield2fish.data.split_data.split_and_save(csv_name='data_by_images_normalized.csv', csv_dir='/allen/aics/modeling/data/brightfield2fish/preprocessed', split_col='file', save_dir='data/splits', splits={'test': 0.15, 'train': 0.7, 'valid': 0.15}, seed=0)[source]¶

Split a csv dataset and save the splits and indices to disk.

Parameters:

csv_name (str) – csv to be split into non-overlapping groups
csv_dir (str) – path to directory in which csv resides
split_col (str) – column to use as id for splitting into groups
save_dir (str) – path to directory where split csvs and indices should be saved
splits (dict) – dict of {name:size} by which to split data
seed (int) – salt for the hash function that does the splitting

utils¶

class brightfield2fish.data.utils.RandomCrop(array, crop_size)[source]¶

Takes an input numpy array (e.g. a 3D image) and randomly sets a crop region of size crop_size. Can then apply that specific random crop to other images with the crop method. Useful for data augmentation on paired images.

Parameters:

array (numpy.ndarray) – numpy array whose size and shape will be used in selecting a random region to crop
crop_size (tuple) – tuple of ints of length array.ndim, e.g. (z,y,x) for 3D, specifying the size of the region to select for cropping within the bounds set by array.shape

Example

>>> A = np.random.randn(10,20,30)
>>> B = A + 1
>>> crop_size = (5,10,15)
>>> rc = RandomCrop(A, crop_size)
>>> B_cropped = rc.crop(B)

crop(X)[source]¶

Perform random crop on a new data array.

Parameters:	X (numpy.ndarray) – array to crop, same size as the array used to initialize the RandomCrop object
Returns:	cropped array
Return type:	(numpy.ndarray)
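
The idea of fixing one random window and reusing it on paired images can be sketched as follows. This is a minimal illustration under assumptions, not the source of `RandomCrop`: the class name `RandomCropSketch` and the optional `rng` argument are hypothetical.

```python
import numpy as np


class RandomCropSketch:
    """Pick one random crop window from a reference array and reuse it on others."""

    def __init__(self, array, crop_size, rng=None):
        if len(crop_size) != array.ndim:
            raise ValueError("crop_size must have one entry per array dimension")
        rng = np.random.default_rng() if rng is None else rng
        # For each axis, draw a start index so the window stays inside the array.
        starts = [
            int(rng.integers(0, dim - size + 1))
            for dim, size in zip(array.shape, crop_size)
        ]
        self.slices = tuple(slice(s, s + size) for s, size in zip(starts, crop_size))

    def crop(self, X):
        """Apply the stored window to X, which should match the reference shape."""
        return X[self.slices]


A = np.random.randn(10, 20, 30)
B = A + 1
rc = RandomCropSketch(A, (5, 10, 15))
A_cropped = rc.crop(A)
B_cropped = rc.crop(B)
```

Since both crops use the same stored slices, `B_cropped` stays voxel-aligned with `A_cropped`, which is what paired input and target images need.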

brightfield2fish.data.utils.float_to_uint(im, uint_dtype=<class 'numpy.uint8'>)[source]¶

Convert an array of floats to unsigned ints, contrast stretching to the full dynamic range of the output data type.

Parameters:

im (numpy.ndarray) – data matrix
uint_dtype (numpy.dtype) – numpy data type, e.g. np.uint8

Returns:	integer data matrix
Return type:	(numpy.ndarray)
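
Contrast stretching to an unsigned integer type can be sketched like this. The function name `float_to_uint_sketch`, the rounding step, and the handling of constant images are assumptions made for the example, not documented behavior of `float_to_uint`.

```python
import numpy as np


def float_to_uint_sketch(im, uint_dtype=np.uint8):
    """Linearly stretch im so its min maps to 0 and its max to the dtype maximum."""
    im = np.asarray(im, dtype=np.float64)
    lo, hi = im.min(), im.max()
    if hi == lo:
        # Constant image: nothing to stretch, so return zeros.
        return np.zeros(im.shape, dtype=uint_dtype)
    scaled = (im - lo) / (hi - lo) * np.iinfo(uint_dtype).max
    return np.round(scaled).astype(uint_dtype)


out8 = float_to_uint_sketch(np.array([-1.0, 0.0, 1.0]))
out16 = float_to_uint_sketch(np.array([-1.0, 0.0, 1.0]), uint_dtype=np.uint16)
```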

brightfield2fish.data.utils.normalize(im, content='Brightfield')[source]¶

Normalize a numpy array to either have min zero and max one, or mean zero and unit variance, depending on the content arg.

Parameters:

im (numpy.ndarray) – data matrix
content (str) – content of the image to normalize. If content="Brightfield", normalize to mean zero and unit variance; otherwise normalize to min zero and max one.

Returns:	normalized data matrix
Return type:	(numpy.ndarray)
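
The two normalization modes and the dispatch on content can be sketched as below. The names `center_scale`, `zero_one`, and `normalize_sketch` are hypothetical stand-ins, and the sketch assumes a non-constant input so that neither denominator is zero.

```python
import numpy as np


def center_scale(im):
    """Shift and scale to mean zero and unit variance."""
    return (im - im.mean()) / im.std()


def zero_one(im):
    """Shift and scale so the minimum is zero and the maximum is one."""
    return (im - im.min()) / (im.max() - im.min())


def normalize_sketch(im, content="Brightfield"):
    """Center-scale brightfield images; rescale everything else to [0, 1]."""
    im = np.asarray(im, dtype=np.float64)
    return center_scale(im) if content == "Brightfield" else zero_one(im)


im = np.random.default_rng(0).normal(loc=5.0, scale=2.0, size=(4, 8, 8))
bf = normalize_sketch(im, content="Brightfield")
dna = normalize_sketch(im, content="DNA")
```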

brightfield2fish.data.utils.normalize_image_center_scale(im)[source]¶

Normalize a NumPy array to have mean zero and variance one.

Parameters:	im (numpy.ndarray) – data matrix
Returns:	normalized data matrix
Return type:	(numpy.ndarray)

brightfield2fish.data.utils.normalize_image_zero_one(im)[source]¶

Normalize a NumPy array to have min zero and max one.

Parameters:	im (numpy.ndarray) – data matrix
Returns:	normalized data matrix
Return type:	(numpy.ndarray)

brightfield2fish.data.utils.normalize_image_zero_one_torch(im)[source]¶

Normalize a PyTorch tensor to have min zero and max one.

Parameters:	im (torch.Tensor) – data matrix
Returns:	normalized data matrix
Return type:	(torch.Tensor)

brightfield2fish.data.utils.plot_prepped(img3d, reduce_3D_to_2D=functools.partial(<function percentile>, q=100, axis=0))[source]¶

Plots a 2D projection of a 3D image by returning a PIL Image object.

Parameters:

img3d (numpy.ndarray) – data array, prepped and normalized
reduce_3D_to_2D (functools.partial) – function for converting a 3D image to a 2D image, e.g. functools.partial(np.percentile, q=99, axis=0)

Returns:	single channel PIL.Image
Return type:	(PIL.Image)
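
The reduce_3D_to_2D argument is an ordinary callable built with functools.partial. The sketch below shows only the projection step on a synthetic array; conversion to a PIL image is left out, and the variable names are illustrative.

```python
import functools

import numpy as np

# The default reducer: the 100th percentile along z, i.e. a maximum projection.
max_project = functools.partial(np.percentile, q=100, axis=0)
# A slightly more outlier-tolerant alternative, as in the parameter description.
robust_project = functools.partial(np.percentile, q=99, axis=0)

img3d = np.random.default_rng(0).random((6, 16, 16))  # synthetic (z, y, x) image
img2d_max = max_project(img3d)
img2d_robust = robust_project(img3d)
```

Both reducers collapse the z axis and return a (y, x) array; lowering q trades peak intensity for less sensitivity to isolated bright voxels.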

brightfield2fish.data.utils.prep_fish(image, channel=1, T=0, clip_percentiles=[0, 99.99], median_subtract=True, math_dtype=<class 'numpy.float64'>, out_dtype=<class 'numpy.uint16'>)[source]¶

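Prepare and normalize a single channel 3D image from an AICSImage object: select a channel and time point, clip the signal, and optionally median subtract.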

Parameters:

image (aicsimageio.AICSImage) – input image object
channel (int) – channel to select for prep
T (int) – time point to select for prep
clip_percentiles (list) – lower and upper percentiles at which to clip image signal
median_subtract (bool) – if True, set all pixels below the median value to zero
math_dtype (numpy.dtype) – numpy dtype in which internal computations are performed
out_dtype (numpy.dtype) – numpy dtype of the output array

Returns:	normalized data single channel 3D array
Return type:	(numpy.ndarray)