data

dataset

class brightfield2fish.data.dataset.FishDataframeDatasetTIFF(df, csv=False, channel_content='DNA', resize_original=None, random_crop=None, math_dtype=<class 'numpy.float64'>, out_dtype=<class 'numpy.float32'>, output_torch=True, channel_dim=True, return_tuple=True)

Dataset class for Brightfield -> FISH prediction that reads single-channel TIFFs.

Parameters:
  • df (pd.DataFrame) – input dataframe that specifies dataset
  • csv (bool) – if True, accept a csv file path rather than a DataFrame
  • channel_content (str) – what content to pair with brightfield, e.g. DNA
  • resize_original (float, tuple, or None) – if not None, how to resize the original 3D images
  • random_crop (tuple, or None) – if not None, tuple of z,y,x sizes (in pixels) to which the image will be randomly cropped
  • math_dtype (numpy.dtype) – data type in which internal computations will be done
  • out_dtype (numpy.dtype) – data type that will be output
  • output_torch (bool) – if True, output a torch.tensor rather than a np.array
  • channel_dim (bool) – if True, include a singleton channel dimension for output 3D images
  • return_tuple (bool) – if True, return images as (brightfield, target), else return as a dict

class brightfield2fish.data.dataset.FishSegDataframeDatasetTIFF(df, csv=False, channel_content='MYH7', resize_original=None, random_crop=None, math_dtype=<class 'numpy.float64'>, out_dtype=<class 'numpy.float32'>, output_torch=True, channel_dim=True, return_tuple=True, fish_3d=True, bf_clip_percentiles=[0.01, 99.99], normalize=True)

Dataset class for Brightfield -> FISH prediction that reads 3D TIFFs for inputs and 2D FISH segmentations for targets. Extrudes the 2D data along z for the image-to-image prediction task.

Parameters:
  • df (pd.DataFrame) – input dataframe that specifies dataset
  • csv (bool) – if True, accept a csv file path rather than a DataFrame
  • channel_content (str) – what content to pair with brightfield, e.g. DNA
  • resize_original (float, tuple, or None) – if not None, how to resize the original 3D images
  • random_crop (tuple, or None) – if not None, tuple of z,y,x sizes (in pixels) to which the image will be randomly cropped
  • math_dtype (numpy.dtype) – data type in which internal computations will be done
  • out_dtype (numpy.dtype) – data type that will be output
  • output_torch (bool) – if True, output a torch.tensor rather than a np.array
  • channel_dim (bool) – if True, include a singleton channel dimension for output 3D images
  • return_tuple (bool) – if True, return images as (brightfield, target), else return as a dict
  • fish_3d (bool) – if True, return fish image as 3D, extruded along z axis
  • bf_clip_percentiles (list) – lower and upper percentiles of pixel intensity at which to clip the brightfield image
  • normalize (bool) – if True, normalize the brightfield image to zero mean and unit variance, and normalize the fish image to min zero and max one
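The preprocessing described by bf_clip_percentiles, normalize, and fish_3d can be illustrated with plain NumPy. This is a minimal sketch under stated assumptions, not the class's actual implementation: the helper names clip_and_standardize and extrude_along_z are hypothetical, and the order of operations inside the dataset class may differ.

```python
import numpy as np

def clip_and_standardize(bf, percentiles=(0.01, 99.99)):
    """Clip a brightfield stack at the given percentiles, then standardize it.

    Hypothetical helper: mirrors the documented bf_clip_percentiles and
    normalize options, not the library's own code.
    """
    lo, hi = np.percentile(bf, percentiles)
    clipped = np.clip(bf, lo, hi)
    return (clipped - clipped.mean()) / clipped.std()

def extrude_along_z(seg2d, n_z):
    """Repeat a 2D (y, x) array n_z times along a new leading z axis."""
    return np.repeat(seg2d[np.newaxis, ...], n_z, axis=0)

rng = np.random.default_rng(0)
bf = rng.normal(loc=100.0, scale=10.0, size=(4, 8, 8))  # toy (z, y, x) brightfield stack
seg = (rng.random((8, 8)) > 0.5).astype(np.float64)     # toy 2D FISH segmentation

bf_norm = clip_and_standardize(bf)          # zero mean, unit variance
seg3d = extrude_along_z(seg, bf.shape[0])   # same (z, y, x) shape as the input
```

After extrusion the target has the same shape as the brightfield stack, which is what an image-to-image model expects.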

split_data

brightfield2fish.data.split_data.hashsplit(X, splits={'test': 0.2, 'train': 0.8}, salt=1, N=5)

Splits a list of items pseudorandomly (but deterministically) based on the hashes of the items.

Parameters:
  • X (list) – list of items to be split into non-overlapping groups
  • splits (dict) – dict of {name:weight} pairs defining the desired split
  • salt (str) – str(salt) is appended to each list item before hashing
  • N (int) – number of significant figures to compute for binning each list item
Returns:

{name:indices} for all names in the input split dict

Return type:

(dict)

Example

>>> hashsplit(list("allen cell institute"), {'train':0.7,'test':0.3}, salt=3, N=8)
{'test': [4, 12, 17],
'train': [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 18, 19]}
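The idea behind a hash-based split can be sketched with the standard library alone. The function below is an illustrative assumption, not the code behind hashsplit: the name hash_split_sketch, the choice of SHA-256, and the binning scheme are all hypothetical, and the N parameter is not modeled. It will not reproduce the indices shown in the example above.

```python
import hashlib

def hash_split_sketch(items, splits, salt=1):
    """Assign each item index to a named split based on a hash of the item.

    Hypothetical illustration: the hash function and binning are assumptions.
    """
    names = sorted(splits)
    total = sum(splits[name] for name in names)
    out = {name: [] for name in names}
    for i, item in enumerate(items):
        # str(salt) is appended to the item before hashing, as documented
        digest = hashlib.sha256((str(item) + str(salt)).encode("utf-8")).hexdigest()
        u = int(digest[:8], 16) / 16**8 * total  # pseudo-uniform value in [0, total)
        cumulative = 0.0
        for name in names:
            cumulative += splits[name]
            if u < cumulative:
                out[name].append(i)
                break
        else:
            out[names[-1]].append(i)  # guard against floating-point round-off
    return out

items = ["img_%03d.tiff" % i for i in range(100)]  # made-up file names
parts = hash_split_sketch(items, {"train": 0.8, "test": 0.2}, salt=1)
```

Because the assignment depends only on the item and the salt, rerunning the split gives the same groups, and identical items always land in the same group. The realized group sizes only approximate the requested weights.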

brightfield2fish.data.split_data.split_and_save(csv_name='data_by_images_normalized.csv', csv_dir='/allen/aics/modeling/data/brightfield2fish/preprocessed', split_col='file', save_dir='data/splits', splits={'test': 0.15, 'train': 0.7, 'valid': 0.15}, seed=0)

Split a csv dataset and save the splits and indices to disk.

Parameters:
  • csv_name (str) – csv to be split into non-overlapping groups
  • csv_dir (str) – path to directory in which csv resides
  • split_col (str) – column to use as id for splitting into groups
  • save_dir (str) – path to directory where split csvs and indices should be saved
  • splits (dict) – dict of {name:size} by which to split data
  • seed (int) – salt for the hash function that does the splitting

utils

class brightfield2fish.data.utils.RandomCrop(array, crop_size)

Takes an input numpy array (e.g. a 3D image) and randomly sets a crop region of size crop_size. Can then apply that specific random crop to other images with the crop method. Useful for data augmentation on paired images.

Parameters:
  • array (numpy.ndarray) – numpy array whose size and shape will be used in selecting a random region to crop
  • crop_size (tuple) – tuple of ints of length array.ndim, e.g. (z,y,x) for 3D, specifying the size of the region to select for cropping within the bounds set by array.shape

Example

>>> A = np.random.randn(10,20,30)
>>> B = A + 1
>>> crop_size = (5,10,15)
>>> rc = RandomCrop(A, crop_size)
>>> B_cropped = rc.crop(B)

crop(X)

Perform random crop on a new data array.

Parameters:
  • X (numpy.ndarray) – array to crop, same size as the array used to initialize the RandomCrop object
Returns:

cropped array

Return type:

(numpy.ndarray)
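The reason for fixing the crop region once and reusing it is that paired images stay aligned. A minimal NumPy sketch of that pattern follows; the class name PairedCropSketch and its rng argument are hypothetical and are not part of brightfield2fish.

```python
import numpy as np

class PairedCropSketch:
    """Pick one random crop window from a reference array and reuse it.

    Hypothetical stand-in that illustrates the documented behavior.
    """

    def __init__(self, array, crop_size, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        if len(crop_size) != array.ndim:
            raise ValueError("crop_size must have one entry per array dimension")
        # Choose a start index per axis so the window fits inside array.shape
        starts = [int(rng.integers(0, dim - size + 1))
                  for dim, size in zip(array.shape, crop_size)]
        self.slices = tuple(slice(s, s + size) for s, size in zip(starts, crop_size))

    def crop(self, X):
        """Apply the stored window to another array of the same shape."""
        return X[self.slices]

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 20, 30))
B = A + 1
pc = PairedCropSketch(A, (5, 10, 15), rng=rng)
A_cropped = pc.crop(A)
B_cropped = pc.crop(B)  # same region as A_cropped
```

Since both crops cover the same voxels, B_cropped - A_cropped is still 1 everywhere.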

brightfield2fish.data.utils.float_to_uint(im, uint_dtype=<class 'numpy.uint8'>)

Convert an array of floats to unsigned ints, contrast stretching to the dynamic range of the output data type.

Parameters:
  • im (numpy.ndarray) – data matrix
  • uint_dtype (numpy.dtype) – numpy data type e.g. np.uint8
Returns:

integer data matrix

Return type:

(numpy.ndarray)
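Contrast stretching to an integer type amounts to mapping the array's minimum to 0 and its maximum to the largest value the type can hold. The sketch below shows one way to do that; float_to_uint_sketch is a hypothetical name, and the rounding and constant-image handling are assumptions rather than the library's behavior.

```python
import numpy as np

def float_to_uint_sketch(im, uint_dtype=np.uint8):
    """Linearly stretch a float array onto the full range of uint_dtype.

    Hypothetical illustration; intended for uint8 / uint16 outputs.
    """
    im = np.asarray(im, dtype=np.float64)
    lo, hi = im.min(), im.max()
    if hi == lo:
        # Constant image: nothing to stretch, return all zeros (an assumption)
        return np.zeros(im.shape, dtype=uint_dtype)
    scaled = (im - lo) / (hi - lo) * np.iinfo(uint_dtype).max
    return np.round(scaled).astype(uint_dtype)

im = np.linspace(-1.0, 1.0, 11)
im_u8 = float_to_uint_sketch(im)               # spans 0..255
im_u16 = float_to_uint_sketch(im, np.uint16)   # spans 0..65535
```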

brightfield2fish.data.utils.normalize(im, content='Brightfield')

Normalize a numpy array to either have min zero and max one, or mean zero and unit variance, depending on the content arg.

Parameters:
  • im (numpy.ndarray) – data matrix
  • content (str) – content of the image to normalize. If content="Brightfield", normalize to mean zero and unit variance, else normalize to min zero and max one.
Returns:

normalized data matrix

Return type:

(numpy.ndarray)
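The two normalization modes can be written out in a few lines of NumPy. This is a sketch of the documented behavior only: the function names below are hypothetical, and details such as dtype handling or protection against constant images are not taken from the library.

```python
import numpy as np

def center_scale(im):
    """Shift to zero mean and scale to unit variance."""
    im = np.asarray(im, dtype=np.float64)
    return (im - im.mean()) / im.std()

def zero_one(im):
    """Shift and scale so the minimum is 0 and the maximum is 1."""
    im = np.asarray(im, dtype=np.float64)
    return (im - im.min()) / (im.max() - im.min())

def normalize_sketch(im, content="Brightfield"):
    """Dispatch on content, as described for the content argument."""
    return center_scale(im) if content == "Brightfield" else zero_one(im)

im = np.arange(24, dtype=np.float64).reshape(2, 3, 4)
bf_norm = normalize_sketch(im, content="Brightfield")
dna_norm = normalize_sketch(im, content="DNA")
```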

brightfield2fish.data.utils.normalize_image_center_scale(im)

Normalize a NumPy array to have mean zero and variance one.

Parameters:
  • im (numpy.ndarray) – data matrix
Returns:

normalized data matrix

Return type:

(numpy.ndarray)

brightfield2fish.data.utils.normalize_image_zero_one(im)

Normalize a NumPy array to have min zero and max one.

Parameters:
  • im (numpy.ndarray) – data matrix
Returns:

normalized data matrix

Return type:

(numpy.ndarray)

brightfield2fish.data.utils.normalize_image_zero_one_torch(im)

Normalize a PyTorch tensor to have min zero and max one.

Parameters:
  • im (torch.Tensor) – data matrix
Returns:

normalized data matrix

Return type:

(torch.Tensor)

brightfield2fish.data.utils.plot_prepped(img3d, reduce_3D_to_2D=functools.partial(<function percentile>, q=100, axis=0))

Renders a 2D projection of a 3D image and returns it as a PIL Image object.

Parameters:
  • img3d (numpy.ndarray) – data array, prepped and normalized
  • reduce_3D_to_2D (functools.partial) – function for converting a 3D image to a 2D image, e.g. functools.partial(np.percentile, q=99, axis=0)
Returns:

single channel PIL.Image

Return type:

(PIL.Image)
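The reduce_3D_to_2D argument is just a callable that collapses the z axis. The sketch below shows what the default-style and example reducers compute on a toy array; it covers only the projection step and assumes nothing about how plot_prepped builds the PIL image afterwards.

```python
from functools import partial

import numpy as np

# q=100 along axis 0 is a maximum-intensity projection over z
max_project = partial(np.percentile, q=100, axis=0)
# q=99 is a slightly softer projection that ignores the brightest 1% per pixel column
soft_project = partial(np.percentile, q=99, axis=0)

img3d = np.arange(24, dtype=np.float64).reshape(2, 3, 4)  # toy (z, y, x) stack
img2d_max = max_project(img3d)    # shape (3, 4)
img2d_soft = soft_project(img3d)  # shape (3, 4)
```

Any callable with the same contract, taking a (z, y, x) array and returning a (y, x) array, could be passed instead.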

brightfield2fish.data.utils.prep_fish(image, channel=1, T=0, clip_percentiles=[0, 99.99], median_subtract=True, math_dtype=<class 'numpy.float64'>, out_dtype=<class 'numpy.uint16'>)

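Prepare one channel of a FISH image: select the channel and time point, clip the signal, optionally median-subtract, and return a normalized single-channel 3D array.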

Parameters:
  • image (aicsimageio.AICSImage) – input image object
  • channel (int) – channel to select for prep
  • T (int) – time point to select for prep
  • clip_percentiles (list) – lower and upper percentiles of pixel intensity at which to clip the image signal
  • median_subtract (bool) – if True, set all pixels below the median value to zero
  • math_dtype (numpy.dtype) – numpy dtype in which internal computations are performed
  • out_dtype (numpy.dtype) – numpy dtype of the output array
Returns:

normalized data single channel 3D array

Return type:

(numpy.ndarray)
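The clipping and median steps named in the parameter list can be sketched on a plain array. This is an illustration under assumptions, not prep_fish itself: the function name is hypothetical, the input is a NumPy array rather than an AICSImage, the order of the two steps is a guess, and the final conversion to out_dtype is left out.

```python
import numpy as np

def clip_and_median_threshold(stack, clip_percentiles=(0, 99.99), median_subtract=True):
    """Clip a 3D stack at percentiles, then zero out pixels below the median.

    Hypothetical helper following the documented parameter descriptions.
    """
    stack = np.asarray(stack, dtype=np.float64)
    lo, hi = np.percentile(stack, clip_percentiles)
    out = np.clip(stack, lo, hi)
    if median_subtract:
        med = np.median(out)
        # As documented: pixels below the median are set to zero
        out = np.where(out < med, 0.0, out)
    return out

stack = np.arange(100, dtype=np.float64).reshape(4, 5, 5)  # toy (z, y, x) stack
prepped = clip_and_median_threshold(stack)
```

On this toy stack the lower half of the intensity range is zeroed and the single brightest value is pulled down slightly by the 99.99th-percentile clip.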