data¶
dataset¶
-
class
brightfield2fish.data.dataset.FishDataframeDatasetTIFF(df, csv=False, channel_content='DNA', resize_original=None, random_crop=None, math_dtype=<class 'numpy.float64'>, out_dtype=<class 'numpy.float32'>, output_torch=True, channel_dim=True, return_tuple=True)[source]¶ Dataset class for Brightfield -> FISH prediction that reads single channel tiffs.
Parameters: - df (pd.DataFrame) – input dataframe that specifies dataset
- csv (bool) – if True, accept a csv file path rather than a DataFrame
- channel_content (str) – what content to pair with brightfield, e.g. DNA
- resize_original (float, tuple, or None) – if not None, how to resize the original 3D images
- random_crop (tuple, or None) – if not None, tuple of z,y,x sizes (in pixels) to which the image will be randomly cropped
- math_dtype (numpy.dtype) – data type in which internal computations will be done
- out_dtype (numpy.dtype) – data type that will be output
- output_torch (bool) – if True, output a torch.tensor rather than a np.array
- channel_dim (bool) – if True, include a singleton channel dimension for output 3D images
- return_tuple (bool) – if True, return images as (brightfield, target), else return as a dict
-
class
brightfield2fish.data.dataset.FishSegDataframeDatasetTIFF(df, csv=False, channel_content='MYH7', resize_original=None, random_crop=None, math_dtype=<class 'numpy.float64'>, out_dtype=<class 'numpy.float32'>, output_torch=True, channel_dim=True, return_tuple=True, fish_3d=True, bf_clip_percentiles=[0.01, 99.99], normalize=True)[source]¶ Dataset class for Brightfield -> FISH prediction that reads 3D tiffs for inputs and 2D FISH segmentations for targets. Extrudes the 2D data along z for the image-to-image prediction task.
Parameters: - df (pd.DataFrame) – input dataframe that specifies dataset
- csv (bool) – if True, accept a csv file path rather than a DataFrame
- channel_content (str) – what content to pair with brightfield, e.g. DNA
- resize_original (float, tuple, or None) – if not None, how to resize the original 3D images
- random_crop (tuple, or None) – if not None, tuple of z,y,x sizes (in pixels) to which the image will be randomly cropped
- math_dtype (numpy.dtype) – data type in which internal computations will be done
- out_dtype (numpy.dtype) – data type that will be output
- output_torch (bool) – if True, output a torch.tensor rather than a np.array
- channel_dim (bool) – if True, include a singleton channel dimension for output 3D images
- return_tuple (bool) – if True, return images as (brightfield, target), else return as a dict
- fish_3d (bool) – if True, return fish image as 3D, extruded along z axis
- bf_clip_percentiles (list) – lower and upper percentiles of pixel intensity at which to clip the brightfield image
- normalize (bool) – if True, normalize the brightfield image to zero mean and unit variance, and normalize the fish image to min zero and max one
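The extrusion along z described for FishSegDataframeDatasetTIFF can be pictured with a small NumPy sketch. This is an illustration only, not the class's actual code: the helper name extrude_along_z and its n_z argument are hypothetical, and the (channel, z, y, x) axis order is an assumption based on the channel_dim and random_crop descriptions.

```python
import numpy as np


def extrude_along_z(seg_2d, n_z, channel_dim=True):
    """Repeat a 2D (y, x) array n_z times along a new leading z axis.

    Hypothetical helper for illustration; not part of brightfield2fish.
    """
    vol = np.repeat(seg_2d[np.newaxis, :, :], n_z, axis=0)  # shape (z, y, x)
    if channel_dim:
        vol = vol[np.newaxis, ...]  # assumed layout: (1, z, y, x)
    return vol


seg = np.arange(12, dtype=np.float32).reshape(3, 4)  # toy 2D segmentation
vol = extrude_along_z(seg, n_z=5)
print(vol.shape)  # (1, 5, 3, 4)
```

Every z slice of the result is a copy of the 2D input, which gives the target the same 3D shape as a brightfield stack with matching y and x sizes.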
split_data¶
-
brightfield2fish.data.split_data.hashsplit(X, splits={'test': 0.2, 'train': 0.8}, salt=1, N=5)[source]¶ Splits a list of items pseudorandomly (but deterministically) based on the hashes of the items.
Returns: {name:indices} for all names in the input split dict
Return type: (dict)
Example
>>> hashsplit(list("allen cell institute"), {'train':0.7,'test':0.3}, salt=3, N=8)
{'test': [4, 12, 17], 'train': [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 18, 19]}
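To make the idea of a pseudorandom but deterministic split concrete, here is a minimal sketch of one way to do hash-based splitting. It is not the implementation of hashsplit: the function name hash_split_sketch, the use of SHA-256, and the way the salt is mixed in are all assumptions, and the N argument is not modeled because its meaning is not documented here. Its output will therefore not match the example above.

```python
import hashlib


def hash_split_sketch(items, splits, salt=1):
    """Assign each item's index to a named split using a salted hash.

    Hypothetical illustration; not the brightfield2fish implementation.
    """
    total = float(sum(splits.values()))
    names = sorted(splits)
    out = {name: [] for name in names}
    for i, item in enumerate(items):
        digest = hashlib.sha256(f"{salt}:{item}".encode("utf-8")).hexdigest()
        u = int(digest[:8], 16) / 16**8  # pseudo-uniform value in [0, 1)
        cum = 0.0
        chosen = names[-1]  # fallback guards against float rounding
        for name in names:
            cum += splits[name] / total
            if u < cum:
                chosen = name
                break
        out[chosen].append(i)
    return out


items = list("allen cell institute")
result = hash_split_sketch(items, {"train": 0.7, "test": 0.3}, salt=3)
print(result)
```

Because the assignment depends only on the item and the salt, identical items always land in the same split, and rerunning the function reproduces the same result. The documented example shows the same property: both "n" characters (indices 4 and 12) fall in the test split.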
-
brightfield2fish.data.split_data.split_and_save(csv_name='data_by_images_normalized.csv', csv_dir='/allen/aics/modeling/data/brightfield2fish/preprocessed', split_col='file', save_dir='data/splits', splits={'test': 0.15, 'train': 0.7, 'valid': 0.15}, seed=0)[source]¶ Split a csv dataset and save the splits and indices to disk.
Parameters: - csv_name (str) – csv to be split into non-overlapping groups
- csv_dir (str) – path to directory in which csv resides
- split_col (str) – column to use as id for splitting into groups
- save_dir (str) – path to directory where split csvs and indices should be saved
- splits (dict) – dict of {name:size} by which to split data
- seed (int) – salt for the hash function that does the splitting
utils¶
-
class
brightfield2fish.data.utils.RandomCrop(array, crop_size)[source]¶ Takes an input numpy array (e.g. a 3D image) and randomly sets a crop region of size crop_size. Can then apply that specific random crop to other images with the crop method. Useful for data augmentation on paired images.
Parameters: - array (numpy.ndarray) – numpy array whose size and shape will be used in selecting a random region to crop
- crop_size (tuple) – tuple of ints of length array.ndim, e.g. (z,y,x) for 3D, specifying the size of the region to select for cropping within the bounds set by array.shape
Example
>>> A = np.random.randn(10,20,30)
>>> B = A + 1
>>> crop_size = (5,10,15)
>>> rc = RandomCrop(A, crop_size)
>>> B_cropped = rc.crop(B)
-
crop(X)[source]¶ Perform random crop on a new data array.
Parameters: X (numpy.ndarray) – array to crop, same size as the array used to initialize the RandomCrop object Returns: cropped array Return type: (numpy.ndarray)
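The paired-crop pattern described above (choose a random region once, then reuse it on other arrays) might look roughly like the following. This is a sketch under assumptions, not the source of RandomCrop: the class name PairedRandomCrop and the optional rng argument are hypothetical.

```python
import numpy as np


class PairedRandomCrop:
    """Pick one random crop region and apply it to several same-shaped arrays.

    Hypothetical illustration; not the brightfield2fish implementation.
    """

    def __init__(self, array, crop_size, rng=None):
        if len(crop_size) != array.ndim:
            raise ValueError("crop_size needs one entry per array dimension")
        if any(c > s for c, s in zip(crop_size, array.shape)):
            raise ValueError("crop_size must fit inside array.shape")
        rng = np.random.default_rng() if rng is None else rng
        # One random start index per axis, chosen so the crop stays in bounds.
        starts = [int(rng.integers(0, s - c + 1))
                  for s, c in zip(array.shape, crop_size)]
        self.slices = tuple(slice(st, st + c)
                            for st, c in zip(starts, crop_size))

    def crop(self, X):
        """Apply the stored crop region to X."""
        return X[self.slices]


A = np.random.randn(10, 20, 30)
B = A + 1
rc = PairedRandomCrop(A, (5, 10, 15))
A_cropped, B_cropped = rc.crop(A), rc.crop(B)
print(A_cropped.shape, np.allclose(B_cropped - A_cropped, 1.0))
```

Storing the slices on the object is what keeps an input image and its target aligned after augmentation.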
-
brightfield2fish.data.utils.float_to_uint(im, uint_dtype=<class 'numpy.uint8'>)[source]¶ Convert an array of floats to unsigned ints, contrast stretching to the dynamic range of the output data type.
Parameters: - im (numpy.ndarray) – data matrix
- uint_dtype (numpy.dtype) – numpy data type e.g. np.uint8
Returns: integer data matrix
Return type: (numpy.ndarray)
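Contrast stretching to an unsigned integer type can be sketched as below. This is one plausible reading of the description, not the code of float_to_uint: the name float_to_uint_sketch, the rounding step, and the handling of constant images are assumptions.

```python
import numpy as np


def float_to_uint_sketch(im, uint_dtype=np.uint8):
    """Linearly map [im.min(), im.max()] onto the full range of uint_dtype.

    Hypothetical illustration; not the brightfield2fish implementation.
    """
    im = np.asarray(im, dtype=np.float64)
    lo, hi = im.min(), im.max()
    if hi == lo:
        # Assumption: a constant image has no contrast to stretch.
        return np.zeros(im.shape, dtype=uint_dtype)
    max_val = np.iinfo(uint_dtype).max
    scaled = (im - lo) / (hi - lo) * max_val
    return np.round(scaled).astype(uint_dtype)


im = np.array([[-1.5, 0.0], [0.25, 2.5]])
out8 = float_to_uint_sketch(im)
out16 = float_to_uint_sketch(im, np.uint16)
print(out8.dtype, out8.min(), out8.max())
```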
-
brightfield2fish.data.utils.normalize(im, content='Brightfield')[source]¶ Normalize a numpy array to either have min zero and max one, or mean zero and unit variance, depending on the content arg.
Parameters: - im (numpy.ndarray) – data matrix
- content (str) – content of the image to normalize. If content="Brightfield", normalize to mean zero and unit variance, else normalize to min zero and max one.
Returns: normalized data matrix
Return type: (numpy.ndarray)
-
brightfield2fish.data.utils.normalize_image_center_scale(im)[source]¶ Normalize a Numpy array to have mean zero and variance one.
Parameters: im (numpy.ndarray) – data matrix Returns: normalized data matrix Return type: (numpy.ndarray)
-
brightfield2fish.data.utils.normalize_image_zero_one(im)[source]¶ Normalize a Numpy array to have min zero and max one.
Parameters: im (numpy.ndarray) – data matrix Returns: normalized data matrix Return type: (numpy.ndarray)
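The two normalizations described in the entries above follow standard formulas, sketched here in NumPy. The function names are hypothetical and the library's own handling of edge cases (for example constant images, where both formulas divide by zero) is not documented here, so treat this as an illustration rather than the actual code.

```python
import numpy as np


def center_scale_sketch(im):
    """Shift to mean zero and scale to unit variance (illustration only)."""
    im = np.asarray(im, dtype=np.float64)
    return (im - im.mean()) / im.std()


def zero_one_sketch(im):
    """Rescale linearly so the min is zero and the max is one (illustration only)."""
    im = np.asarray(im, dtype=np.float64)
    return (im - im.min()) / (im.max() - im.min())


rng = np.random.default_rng(0)
im = rng.normal(loc=3.0, scale=2.0, size=(4, 8, 8))
cs = center_scale_sketch(im)
zo = zero_one_sketch(im)
print(np.isclose(cs.mean(), 0.0), np.isclose(cs.std(), 1.0), zo.min(), zo.max())
```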
-
brightfield2fish.data.utils.normalize_image_zero_one_torch(im)[source]¶ Normalize a Pytorch tensor to have min zero and max one.
Parameters: im (torch.Tensor) – data matrix Returns: normalized data matrix Return type: (torch.Tensor)
-
brightfield2fish.data.utils.plot_prepped(img3d, reduce_3D_to_2D=functools.partial(<function percentile>, q=100, axis=0))[source]¶ Plots a 2D projection of a 3D image by returning a PIL Image object.
Parameters: - img3d (numpy.ndarray) – data array, prepped and normalized
- reduce_3D_to_2D (functools.partial) – function for converting a 3D image to a 2D image, e.g. functools.partial(np.percentile, q=99, axis=0)
Returns: single channel PIL.Image
Return type: (PIL.Image)
-
brightfield2fish.data.utils.prep_fish(image, channel=1, T=0, clip_percentiles=[0, 99.99], median_subtract=True, math_dtype=<class 'numpy.float64'>, out_dtype=<class 'numpy.uint16'>)[source]¶ Normalize a Numpy array to have min zero and max one.
Parameters: - image (aicsimageio.AICSImage) – input image object
- channel (int) – channel to select for prep
- T (int) – time point to select for prep
- clip_percentiles (list) – lower and upper percentiles of pixel values at which to clip image signal
- median_subtract (bool) – if True, set all pixels below the median value to zero
- math_dtype (numpy.dtype) – numpy dtype in which internal computations are performed
- out_dtype (numpy.dtype) – numpy dtype of the output array
Returns: normalized data single channel 3D array
Return type: (numpy.ndarray)
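The clip_percentiles and median_subtract options can be illustrated on a plain NumPy array. The sketch below is an assumption about how such steps could be combined, not the body of prep_fish: the function name, the order of operations, and the choice to subtract the median and floor at zero (so that pixels below the median become zero) are all hypothetical, and channel or time-point selection from an AICSImage is left out.

```python
import numpy as np


def clip_and_median_sketch(vol, clip_percentiles=(0, 99.99), median_subtract=True):
    """Clip to percentile bounds, then optionally zero out sub-median pixels.

    Hypothetical illustration; not the brightfield2fish implementation.
    """
    vol = np.asarray(vol, dtype=np.float64)
    lo, hi = np.percentile(vol, clip_percentiles)  # percentiles on a 0-100 scale
    vol = np.clip(vol, lo, hi)
    if median_subtract:
        # Assumption: subtract the median and floor at zero, so every pixel
        # at or below the median ends up as zero.
        vol = np.clip(vol - np.median(vol), 0, None)
    return vol


rng = np.random.default_rng(0)
vol = rng.exponential(scale=100.0, size=(10, 10, 10))
out = clip_and_median_sketch(vol)
print(out.min(), (out == 0).mean())
```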