Schema

Starspace defines three basic object types that describe the features extracted from a typical spatial experiment: spots, cells, and cell x gene expression matrices. It also defines standard terminology for the basic spatial information of each object, which allows data from different assays to interoperate. If such a standard form were adopted, it would make these data accessible to tool chains.

This repository intentionally describes a schema but not a format: these data could be serialized in any number of ways. This repository chooses to use zarr.

Spots

Spots is a table in which each record describes a spot. Its columns comprise a small set of required fields and several standardized but optional fields; the "mode" of each entry below indicates which is which:

[
  {
    "description": "Name of the gene, using standard symbols",
    "mode": "REQUIRED",
    "name": "gene_name",
    "type": "STRING"
  },
  {
    "description": "y-coordinate of the center of the spot in microns",
    "mode": "REQUIRED",
    "name": "y_spot_microns",
    "type": "FLOAT"
  },
  {
    "description": "x-coordinate of the center of the spot in microns",
    "mode": "REQUIRED",
    "name": "x_spot_microns",
    "type": "FLOAT"
  },
  {
    "description": "z-coordinate of the center of the spot in microns",
    "mode": "OPTIONAL",
    "name": "z_spot_microns",
    "type": "FLOAT"
  },
  {
    "description": "y-coordinate of the center of the spot in pixels",
    "mode": "REQUIRED",
    "name": "y_spot_pixels",
    "type": "FLOAT"
  },
  {
    "description": "x-coordinate of the center of the spot in pixels",
    "mode": "REQUIRED",
    "name": "x_spot_pixels",
    "type": "FLOAT"
  },
  {
    "description": "z-coordinate of the center of the spot in pixels",
    "mode": "OPTIONAL",
    "name": "z_spot_pixels",
    "type": "FLOAT"
  },
  {
    "description": "id of the region (e.g. cell) that this spot belongs to",
    "mode": "OPTIONAL",
    "name": "region_id",
    "type": "INT"
  },
  {
    "description": "y-coordinate of the region this spot falls inside in microns",
    "mode": "OPTIONAL",
    "name": "y_region_microns",
    "type": "FLOAT"
  },
  {
    "description": "x-coordinate of the region this spot falls inside in microns",
    "mode": "OPTIONAL",
    "name": "x_region_microns",
    "type": "FLOAT"
  },
  {
    "description": "z-coordinate of the region this spot falls inside in microns",
    "mode": "OPTIONAL",
    "name": "z_region_microns",
    "type": "FLOAT"
  },
  {
    "description": "y-coordinate of the region this spot falls inside in pixels",
    "mode": "OPTIONAL",
    "name": "y_region_pixels",
    "type": "FLOAT"
  },
  {
    "description": "x-coordinate of the region this spot falls inside in pixels",
    "mode": "OPTIONAL",
    "name": "x_region_pixels",
    "type": "FLOAT"
  },
  {
    "description": "z-coordinate of the region this spot falls inside in pixels",
    "mode": "OPTIONAL",
    "name": "z_region_pixels",
    "type": "FLOAT"
  },
  {
    "description": "quality of this spot",
    "mode": "OPTIONAL",
    "name": "quality",
    "type": "FLOAT"
  },
  {
    "description": "radius of the spot",
    "mode": "OPTIONAL",
    "name": "radius",
    "type": "FLOAT"
  },
  {
    "description": "field of view that this spot is associated with",
    "mode": "OPTIONAL",
    "name": "fov",
    "type": "INT"
  }
]
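
As an illustration of how a listing like the one above might be applied, the following is a minimal sketch of a record validator. The helper name `validate_record`, the abbreviated `SPOTS_SCHEMA` subset, and the mapping from schema type names to Python types are all assumptions made for this example; they are not part of starspace.

```python
# Abbreviated subset of the spots schema, in the same shape as the listing above.
SPOTS_SCHEMA = [
    {"name": "gene_name", "mode": "REQUIRED", "type": "STRING"},
    {"name": "y_spot_microns", "mode": "REQUIRED", "type": "FLOAT"},
    {"name": "x_spot_microns", "mode": "REQUIRED", "type": "FLOAT"},
    {"name": "z_spot_microns", "mode": "OPTIONAL", "type": "FLOAT"},
    {"name": "region_id", "mode": "OPTIONAL", "type": "INT"},
]

# Assumed mapping from schema type names to acceptable Python types.
PYTHON_TYPES = {"STRING": (str,), "FLOAT": (float, int), "INT": (int,)}


def validate_record(record, schema):
    """Return a list of problems found in one record; an empty list means valid."""
    problems = []
    for field in schema:
        name = field["name"]
        if name not in record:
            # Only REQUIRED fields are reported when absent.
            if field["mode"] == "REQUIRED":
                problems.append(f"missing required field: {name}")
            continue
        value = record[name]
        allowed = PYTHON_TYPES[field["type"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, allowed):
            problems.append(
                f"{name}: expected {field['type']}, got {type(value).__name__}"
            )
    return problems
```

Calling `validate_record(spot, SPOTS_SCHEMA)` on each record before writing it out would catch missing required columns early. Columns outside the schema are ignored here, which is one possible design choice rather than a rule of the schema.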

The axes of this object can optionally be named:

[
  {
    "description": "name of first spots axis, which contains an integer index",
    "mode": "OPTIONAL",
    "name": "spot_index",
    "type": "STRING"
  },
  {
    "description": "name of second spots axis, listing spot characteristics",
    "mode": "OPTIONAL",
    "name": "spot_characteristics",
    "type": "STRING"
  }
]

Regions

Regions stores a label image. Every pixel belonging to a given cell is encoded with the same integer value, and each successive object is labeled with the next integer in sequence. Intersecting spots with such an image yields a count matrix, and the image can also be overlaid on image data to verify that cells were properly segmented.
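
The intersection of spots with a label image can be sketched as follows. This is a plain-Python illustration, not the starspace implementation: the function names are hypothetical, the label image is a nested list indexed as `label_image[y][x]`, and it assumes that label 0 marks background and that spot pixel coordinates can be rounded to the nearest pixel index.

```python
from collections import Counter

BACKGROUND = 0  # assumption: label 0 marks pixels outside every region


def assign_regions(spots, label_image):
    """Return the region label under each spot, or None if it hits no region."""
    height = len(label_image)
    width = len(label_image[0]) if height else 0
    region_ids = []
    for spot in spots:
        # Round pixel coordinates to the nearest pixel index.
        y = int(round(spot["y_spot_pixels"]))
        x = int(round(spot["x_spot_pixels"]))
        inside = 0 <= y < height and 0 <= x < width
        label = label_image[y][x] if inside else BACKGROUND
        region_ids.append(label if label != BACKGROUND else None)
    return region_ids


def count_matrix(spots, label_image):
    """Count spots per (region_id, gene_name) pair, skipping unassigned spots."""
    counts = Counter()
    for spot, region_id in zip(spots, assign_regions(spots, label_image)):
        if region_id is not None:
            counts[(region_id, spot["gene_name"])] += 1
    return counts


# Tiny example: two labeled regions on a 3 x 4 image.
label_image = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [2, 2, 0, 0],
]
spots = [
    {"gene_name": "ACTB", "y_spot_pixels": 0.0, "x_spot_pixels": 1.0},
    {"gene_name": "ACTB", "y_spot_pixels": 1.0, "x_spot_pixels": 2.0},
    {"gene_name": "GAPDH", "y_spot_pixels": 2.0, "x_spot_pixels": 0.0},
    {"gene_name": "GAPDH", "y_spot_pixels": 0.0, "x_spot_pixels": 3.0},  # background
    {"gene_name": "ACTB", "y_spot_pixels": 5.0, "x_spot_pixels": 5.0},   # out of bounds
]
counts = count_matrix(spots, label_image)
```

The sparse `(region_id, gene_name)` counts can then be pivoted into a dense region x feature matrix. A real pipeline would more likely hold the label image in an array library; nested lists are used here only to keep the sketch dependency-free.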

The axes of this object can optionally be named:

[
  {
    "description": "name of regions y-axis",
    "mode": "OPTIONAL",
    "name": "y_region",
    "type": "STRING"
  },
  {
    "description": "name of regions x-axis",
    "mode": "OPTIONAL",
    "name": "x_region",
    "type": "STRING"
  },
  {
    "description": "name of regions z-axis",
    "mode": "OPTIONAL",
    "name": "z_region",
    "type": "STRING"
  },
  {
    "description": "regions pixel size y",
    "mode": "OPTIONAL",
    "name": "region_pixel_size_y",
    "type": "FLOAT"
  },
  {
    "description": "regions pixel size x",
    "mode": "OPTIONAL",
    "name": "region_pixel_size_x",
    "type": "FLOAT"
  },
  {
    "description": "regions pixel size z",
    "mode": "OPTIONAL",
    "name": "region_pixel_size_z",
    "type": "FLOAT"
  }
]
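
The pixel size fields relate the pixel and micron coordinates used elsewhere in the schema. A minimal sketch of that conversion is shown below, under two assumptions that the schema itself does not state: that pixel sizes are expressed in microns per pixel, and that both coordinate systems share the same origin. The function name is hypothetical.

```python
def pixels_to_microns(y_pixels, x_pixels, pixel_size_y, pixel_size_x):
    """Scale pixel coordinates by the per-axis pixel size.

    Assumes pixel sizes are given in microns per pixel and that the pixel
    and micron coordinate systems share an origin.
    """
    return y_pixels * pixel_size_y, x_pixels * pixel_size_x


# Example: a region centered at pixel (y=10, x=4) on a grid with
# anisotropic pixels of 0.5 x 0.25 microns.
y_microns, x_microns = pixels_to_microns(10, 4, 0.5, 0.25)
```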

Matrix

The matrix file is a traditional region x feature expression matrix. Its values can contain either count data (e.g. spots) or continuous data (e.g. protein intensities). Regions can represent cells, anatomical areas, or stereotyped super-cellular areas, like those measured by Slide-seq or Spatial Transcriptomics. Features can be protein or RNA abundances, or counts of other anatomical structures aggregated over regions.

The matrix stores metadata for each region that describe characteristics of the region:

[
  {
    "description": "unique identifier for the region",
    "mode": "REQUIRED",
    "name": "region_id",
    "type": "INT"
  },
  {
    "description": "y-coordinate of the center of the region in microns",
    "mode": "REQUIRED",
    "name": "y_region_microns",
    "type": "FLOAT"
  },
  {
    "description": "x-coordinate of the center of the region in microns",
    "mode": "REQUIRED",
    "name": "x_region_microns",
    "type": "FLOAT"
  },
  {
    "description": "z-coordinate of the center of the region in microns",
    "mode": "OPTIONAL",
    "name": "z_region_microns",
    "type": "FLOAT"
  },
  {
    "description": "y-coordinate of the center of the region in pixels",
    "mode": "OPTIONAL",
    "name": "y_region_pixels",
    "type": "FLOAT"
  },
  {
    "description": "x-coordinate of the center of the region in pixels",
    "mode": "OPTIONAL",
    "name": "x_region_pixels",
    "type": "FLOAT"
  },
  {
    "description": "z-coordinate of the center of the region in pixels",
    "mode": "OPTIONAL",
    "name": "z_region_pixels",
    "type": "FLOAT"
  },
  {
    "description": "physical annotation for the region, e.g. 'brain white matter'",
    "mode": "OPTIONAL",
    "name": "physical_annotation",
    "type": "STRING"
  },
  {
    "description": "cell type annotation for the region",
    "mode": "OPTIONAL",
    "name": "type_annotation",
    "type": "STRING"
  },
  {
    "description": "group id for this cell, e.g. cluster id",
    "mode": "OPTIONAL",
    "name": "group_id",
    "type": "INT"
  },
  {
    "description": "field of view that this region was identified in",
    "mode": "OPTIONAL",
    "name": "fov",
    "type": "INT"
  },
  {
    "description": "area of the region in pixels",
    "mode": "OPTIONAL",
    "name": "area_pixels",
    "type": "FLOAT"
  },
  {
    "description": "area of the region in square micrometers",
    "mode": "OPTIONAL",
    "name": "area_sq_microns",
    "type": "FLOAT"
  }
]

The matrix also stores metadata that describe the features:

[
  {
    "description": "name of the feature (e.g. gene or protein name)",
    "mode": "REQUIRED",
    "name": "gene_name",
    "type": "STRING"
  }
]

The axes of the matrix can optionally be named:

[
  {
    "description": "name of first matrix axis, describing regions",
    "mode": "OPTIONAL",
    "name": "regions",
    "type": "STRING"
  },
  {
    "description": "name of second matrix axis, describing features",
    "mode": "OPTIONAL",
    "name": "features",
    "type": "STRING"
  }
]
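
To show how the values, the two metadata tables, and the axis names fit together, here is a minimal in-memory sketch. The `Matrix` class and its `lookup` method are hypothetical and exist only for this example; the repository itself stores these data in zarr, which is not shown here.

```python
from dataclasses import dataclass


@dataclass
class Matrix:
    """Hypothetical in-memory container for a region x feature matrix."""

    values: list            # values[i][j]: measurement for region i, feature j
    region_metadata: list   # one dict per row, e.g. region_id, y/x_region_microns
    feature_metadata: list  # one dict per column, e.g. gene_name
    axis_names: tuple = ("regions", "features")

    def __post_init__(self):
        # Each row needs a region record and each column a feature record.
        if len(self.values) != len(self.region_metadata):
            raise ValueError("one metadata record is needed per region (row)")
        for row in self.values:
            if len(row) != len(self.feature_metadata):
                raise ValueError("one metadata record is needed per feature (column)")

    def lookup(self, region_id, gene_name):
        """Return the value for a given region_id and gene_name."""
        rows = [m["region_id"] for m in self.region_metadata]
        cols = [m["gene_name"] for m in self.feature_metadata]
        return self.values[rows.index(region_id)][cols.index(gene_name)]


matrix = Matrix(
    values=[[2, 0], [0, 1]],
    region_metadata=[
        {"region_id": 1, "y_region_microns": 0.5, "x_region_microns": 1.5},
        {"region_id": 2, "y_region_microns": 2.0, "x_region_microns": 0.5},
    ],
    feature_metadata=[{"gene_name": "ACTB"}, {"gene_name": "GAPDH"}],
)
```

Keeping the metadata as per-row and per-column records mirrors the schema: required fields such as `region_id` and `gene_name` double as the keys used to address a cell of the matrix.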