Starspace¶
Starspace defines a minimal, proof-of-concept schema for gene or protein expression data that contain spatially localized information. This project:
- Defines a standard schema to describe spots, spatial matrices, and cell regions.
- Implements a library that reads and writes files in the defined schema, using the zarr container format to ensure scalability.
- Converts data from a variety of published assay types, including Spatial Transcriptomics, CODEX, In-situ Sequencing, MERFISH, osmFISH, and starMAP, to demonstrate the flexibility of the schema.
- Demonstrates how to visualize and interact with these data using common analysis packages, and how to convert them into loom and AnnData objects for downstream analysis in R and Python.
Schema¶
Starspace defines three basic object types that describe the features extracted from a typical spatial experiment: spots, cells, and cell x gene expression matrices. It defines standard terminology for the basic spatial information of each object, which allows data from different assays to interoperate and, if adopted as a standard, would make these data accessible to shared tool chains.
This repository intentionally describes a schema, not a format: these data could be stored in any number of ways. This repository chooses to use zarr.
Spots¶
Spots is a table in which each record describes one spot. Its columns comprise a small set of required fields (marked REQUIRED below) and several standardized but optional fields:
[
{
"description": "Name of the gene, using standard symbols",
"mode": "REQUIRED",
"name": "gene_name",
"type": "STRING"
},
{
"description": "y-coordinate of the center of the spot in microns",
"mode": "REQUIRED",
"name": "y_spot_microns",
"type": "FLOAT"
},
{
"description": "x-coordinate of the center of the spot in microns",
"mode": "REQUIRED",
"name": "x_spot_microns",
"type": "FLOAT"
},
{
"description": "z-coordinate of the center of the spot in microns",
"mode": "OPTIONAL",
"name": "z_spot_microns",
"type": "FLOAT"
},
{
"description": "y-coordinate of the center of the spot in pixels",
"mode": "REQUIRED",
"name": "y_spot_pixels",
"type": "FLOAT"
},
{
"description": "x-coordinate of the center of the spot in pixels",
"mode": "REQUIRED",
"name": "x_spot_pixels",
"type": "FLOAT"
},
{
"description": "z-coordinate of the center of the spot in pixels",
"mode": "OPTIONAL",
"name": "z_spot_pixels",
"type": "FLOAT"
},
{
"description": "id of the region (e.g. cell) that this spot belongs to",
"mode": "OPTIONAL",
"name": "region_id",
"type": "INT"
},
{
"description": "y-coordinate of the region this spot falls inside in microns",
"mode": "OPTIONAL",
"name": "y_region_microns",
"type": "FLOAT"
},
{
"description": "x-coordinate of the region this spot falls inside in microns",
"mode": "OPTIONAL",
"name": "x_region_microns",
"type": "FLOAT"
},
{
"description": "z-coordinate of the region this spot falls inside in microns",
"mode": "OPTIONAL",
"name": "z_region_microns",
"type": "FLOAT"
},
{
"description": "y-coordinate of the region this spot falls inside in pixels",
"mode": "OPTIONAL",
"name": "y_region_pixels",
"type": "FLOAT"
},
{
"description": "x-coordinate of the region this spot falls inside in pixels",
"mode": "OPTIONAL",
"name": "x_region_pixels",
"type": "FLOAT"
},
{
"description": "z-coordinate of the region this spot falls inside in pixels",
"mode": "OPTIONAL",
"name": "z_region_pixels",
"type": "FLOAT"
},
{
"description": "quality of this spot",
"mode": "OPTIONAL",
"name": "quality",
"type": "FLOAT"
},
{
"description": "radius of the spot",
"mode": "OPTIONAL",
"name": "radius",
"type": "FLOAT"
},
{
"description": "field of view that this spot is associated with",
"mode": "OPTIONAL",
"name": "fov",
"type": "INT"
}
]
The axes of this object can optionally be named:
[
{
"description": "name of first spots axis, which contains an integer index",
"mode": "OPTIONAL",
"name": "spot_index",
"type": "STRING"
},
{
"description": "name of second spots axis, listing spot characteristics",
"mode": "OPTIONAL",
"name": "spot_characteristics",
"type": "STRING"
}
]
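Because the schema is expressed as a list of field records, a conversion script can check a table against it before writing. The sketch below is not part of the starspace API: `SPOTS_SCHEMA` is a hypothetical, trimmed copy of a few entries from the listing above, and `missing_required` is an illustrative helper that reports which REQUIRED columns a pandas.DataFrame lacks.

```python
import pandas as pd

# Hypothetical, trimmed copy of a few Spots schema entries listed above.
SPOTS_SCHEMA = [
    {"name": "gene_name", "mode": "REQUIRED", "type": "STRING"},
    {"name": "y_spot_microns", "mode": "REQUIRED", "type": "FLOAT"},
    {"name": "x_spot_microns", "mode": "REQUIRED", "type": "FLOAT"},
    {"name": "quality", "mode": "OPTIONAL", "type": "FLOAT"},
    {"name": "fov", "mode": "OPTIONAL", "type": "INT"},
]


def missing_required(df, schema):
    """Return the REQUIRED field names from `schema` that are not columns of `df`."""
    required = [field["name"] for field in schema if field["mode"] == "REQUIRED"]
    return [name for name in required if name not in df.columns]


# Toy spots table: two spots with the required columns plus one optional column.
spots = pd.DataFrame({
    "gene_name": ["ACTB", "GAPDH"],
    "x_spot_microns": [1.5, 20.0],
    "y_spot_microns": [3.0, 7.25],
    "quality": [0.9, 0.8],
})

print(missing_required(spots, SPOTS_SCHEMA))  # prints []
```

Dropping a required column, for example with `spots.drop(columns="gene_name")`, makes the helper return `["gene_name"]`.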
Regions¶
Regions stores a label image: every pixel belonging to a given cell is encoded with the same integer value, and each successive object is labeled with the next integer. Intersecting spots with such an image assigns each spot to a cell, which yields a count matrix; the same image can also be overlaid on image data to verify that cells were properly segmented.
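To make the count-matrix idea concrete, here is a minimal sketch with numpy and pandas, not starspace's own routine. It assumes that label 0 marks background (the schema does not say so), that `y_spot_pixels` indexes rows and `x_spot_pixels` indexes columns of the label image, and that each spot is assigned to the pixel nearest its center; `count_matrix` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Toy 4x4 label image: 0 is assumed to be background, 1 and 2 are cells.
labels = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 2],
    [0, 0, 2, 2],
    [0, 0, 2, 2],
])

# Toy spots table using the schema's pixel-coordinate column names.
spots = pd.DataFrame({
    "gene_name": ["ACTB", "ACTB", "GAPDH", "GAPDH", "ACTB"],
    "y_spot_pixels": [0.2, 1.4, 2.6, 3.1, 0.3],
    "x_spot_pixels": [0.7, 1.1, 2.9, 3.4, 3.2],
})


def count_matrix(labels, spots):
    """Count spots per (region, gene) by looking up the label under each spot."""
    # Round spot centers to the nearest pixel and keep them inside the image.
    rows = np.clip(np.rint(spots["y_spot_pixels"].to_numpy()).astype(int), 0, labels.shape[0] - 1)
    cols = np.clip(np.rint(spots["x_spot_pixels"].to_numpy()).astype(int), 0, labels.shape[1] - 1)
    assigned = spots.assign(region_id=labels[rows, cols])
    # Spots that land on background are not part of any region.
    assigned = assigned[assigned["region_id"] != 0]
    return pd.crosstab(assigned["region_id"], assigned["gene_name"])


matrix = count_matrix(labels, spots)
```

In this toy case the last spot falls on background and is dropped, leaving a 2 x 2 region x gene table.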
The axes of this object can optionally be named:
[
{
"description": "name of regions y-axis",
"mode": "OPTIONAL",
"name": "y_region",
"type": "STRING"
},
{
"description": "name of regions x-axis",
"mode": "OPTIONAL",
"name": "x_region",
"type": "STRING"
},
{
"description": "name of regions z-axis",
"mode": "OPTIONAL",
"name": "z_region",
"type": "STRING"
},
{
"description": "regions pixel size y",
"mode": "OPTIONAL",
"name": "region_pixel_size_y",
"type": "FLOAT"
},
{
"description": "regions pixel size x",
"mode": "OPTIONAL",
"name": "region_pixel_size_x",
"type": "FLOAT"
},
{
"description": "regions pixel size z",
"mode": "OPTIONAL",
"name": "region_pixel_size_z",
"type": "FLOAT"
}
]
Matrix¶
The matrix is a traditional region x feature expression matrix. Its values can be either count data (e.g. spot counts) or continuous data (e.g. protein intensities). Regions can represent cells, anatomical areas, or stereotyped super-cellular areas, such as those measured by Slide-seq or Spatial Transcriptomics. Features can be protein or RNA abundances, or counts of other anatomical structures aggregated over regions.
The matrix stores metadata for each region that describe characteristics of the region:
[
{
"description": "unique identifier for the region",
"mode": "REQUIRED",
"name": "region_id",
"type": "INT"
},
{
"description": "y-coordinate of the center of the region in microns",
"mode": "REQUIRED",
"name": "y_region_microns",
"type": "FLOAT"
},
{
"description": "x-coordinate of the center of the region, in microns",
"mode": "REQUIRED",
"name": "x_region_microns",
"type": "FLOAT"
},
{
"description": "z-coordinate of the center of the region in microns",
"mode": "OPTIONAL",
"name": "z_region_microns",
"type": "FLOAT"
},
{
"description": "y-coordinate of the center of the region in pixels",
"mode": "OPTIONAL",
"name": "y_region_pixels",
"type": "FLOAT"
},
{
"description": "x-coordinate of the center of the region, in pixels",
"mode": "OPTIONAL",
"name": "x_region_pixels",
"type": "FLOAT"
},
{
"description": "z-coordinate of the center of the region in pixels",
"mode": "OPTIONAL",
"name": "z_region_pixels",
"type": "FLOAT"
},
{
"description": "physical annotation for the region, e.g. 'brain white matter'",
"mode": "OPTIONAL",
"name": "physical_annotation",
"type": "STRING"
},
{
"description": "cell type annotation for the region",
"mode": "OPTIONAL",
"name": "type_annotation",
"type": "STRING"
},
{
"description": "group id for this cell, e.g. cluster id",
"mode": "OPTIONAL",
"name": "group_id",
"type": "INT"
},
{
"description": "field of view that this region was identified in",
"mode": "OPTIONAL",
"name": "fov",
"type": "INT"
},
{
"description": "area of the region in pixels",
"mode": "OPTIONAL",
"name": "area_pixels",
"type": "FLOAT"
},
{
"description": "area of the region in square micrometers",
"mode": "OPTIONAL",
"name": "area_sq_microns",
"type": "FLOAT"
}
]
The matrix also stores metadata that describe the features:
[
{
"description": "name of the feature (e.g. gene or protein name)",
"mode": "REQUIRED",
"name": "gene_name",
"type": "STRING"
}
]
The axes of the matrix can optionally be named:
[
{
"description": "name of first matrix axis, describing regions",
"mode": "OPTIONAL",
"name": "regions",
"type": "STRING"
},
{
"description": "name of second matrix axis, describing features",
"mode": "OPTIONAL",
"name": "features",
"type": "STRING"
}
]
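The region x feature layout maps directly onto an xarray.DataArray. The sketch below uses the optional axis names above (`regions`, `features`) as dimension names and attaches region and feature metadata as coordinates; it is an illustration with toy values, not the object that starspace constructs.

```python
import numpy as np
import xarray as xr

# Two regions x three features; values are toy spot counts.
matrix = xr.DataArray(
    np.array([[5, 0, 2], [1, 3, 0]]),
    dims=("regions", "features"),
    coords={
        "region_id": ("regions", [0, 1]),
        "x_region_microns": ("regions", [10.5, 42.0]),
        "y_region_microns": ("regions", [3.2, 18.7]),
        "gene_name": ("features", ["ACTB", "GAPDH", "MYC"]),
    },
)

# Total counts per region: reduce over the feature axis (region coordinates are kept).
totals = matrix.sum("features")

# Promote gene_name to the feature axis index so features can be selected by name.
gapdh = matrix.swap_dims({"features": "gene_name"}).sel(gene_name="GAPDH")
```

`swap_dims` is needed because `gene_name` is a non-index coordinate; making it the dimension coordinate lets `.sel` look features up by name.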
Working with Data¶
The starspace library defines three simple objects, Spots, Matrix, and Regions, for reading and manipulating these data. Each object subclasses an xarray.Dataset or xarray.DataArray, so it can be used the same way one would use any other xarray object. For those more familiar with numpy or pandas, there are simple ways to drop out of xarray, and for those who prefer to work in R, we show how to serialize each object into a format that can be loaded into R.
For ease of use, starspace packages some pre-formatted data in starspace.data. These data are used in the examples below.
Matrix¶
Serialization options¶
starspace defines two special serialization routines for the Matrix object to improve usability with downstream genomics packages:
import starspace
matrix = starspace.data.osmFISH.matrix()
# save to loom for reading in R
matrix.to_loom("osmFISH.loom")
# convert to anndata for use with scanpy
adata = matrix.to_anndata()
# optionally, save to disk in the h5ad format
adata.write_h5ad("osmFISH.h5ad")
Because Matrix subclasses xarray.DataArray, it can also take advantage of any of the xarray serialization routines, for example:
matrix.to_netcdf("osmFISH.nc")
Extracting column or row metadata¶
Turn row or column metadata into a tidy pandas.DataFrame:
import starspace
matrix = starspace.data.osmFISH.matrix()
# pandas dataframe
col_metadata = matrix.column_metadata()
# pandas dataframe
row_metadata = matrix.row_metadata()
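Conceptually, region (row) metadata are the coordinates that lie along the regions axis and feature (column) metadata are those along the features axis. The sketch below shows one way to pull them out of any xarray.DataArray into tidy pandas.DataFrame objects; `axis_metadata` is a hypothetical helper and is not necessarily how `row_metadata` and `column_metadata` are implemented.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy 3-region x 2-feature matrix with metadata attached as coordinates.
matrix = xr.DataArray(
    np.array([[3, 0], [1, 4], [0, 2]]),
    dims=("regions", "features"),
    coords={
        "region_id": ("regions", [10, 11, 12]),
        "x_region_microns": ("regions", [1.0, 2.5, 4.0]),
        "gene_name": ("features", ["ACTB", "GAPDH"]),
    },
)


def axis_metadata(array, dim):
    """Collect every coordinate that lies solely along `dim` into a DataFrame."""
    columns = {
        name: coord.values
        for name, coord in array.coords.items()
        if coord.dims == (dim,)
    }
    return pd.DataFrame(columns)


region_metadata = axis_metadata(matrix, "regions")
feature_metadata = axis_metadata(matrix, "features")
```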
To extract the region x feature expression values into a numpy.ndarray:
import starspace
matrix = starspace.data.osmFISH.matrix()
# numpy array
data = matrix.values
For more information on how to work with xarray objects, see the xarray documentation.
Spots¶
Spots is a simple, tidy columnar data file that records the position and identity of each spot. Because of this structure, it is simple to turn it into a pandas.DataFrame:
import starspace
spots = starspace.data.osmFISH.spots()
# pandas dataframe
df = spots.to_dataframe()
From pandas, one can serialize the pandas.DataFrame in a number of ways, including to .csv:
df.to_csv('osmFISH_spots.csv')
See the pandas documentation for more information.
Regions¶
Regions is a label image stored as a Dask array. Dask makes it possible to manipulate large images, often bigger than would fit in memory. Images that do fit in memory can easily be converted into numpy arrays for downstream processing:
import starspace
regions = starspace.data.osmFISH.regions()
# numpy array
data = regions.values
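Because the label image is a dask array, summaries such as the per-region `area_pixels` field of the Matrix schema can be computed block by block without loading the whole image. The sketch below builds a small dask array directly with `da.from_array` rather than loading a starspace Regions object, and assumes label 0 is background.

```python
import dask.array as da
import numpy as np

# Toy label image standing in for a large Regions object; 0 is assumed background.
label_np = np.zeros((6, 6), dtype=np.int64)
label_np[0:2, 0:3] = 1  # region 1 covers 6 pixels
label_np[3:6, 3:6] = 2  # region 2 covers 9 pixels

# 3x3 chunks, so the work is split into blocks as it would be for an image
# too large to hold in memory.
labels = da.from_array(label_np, chunks=(3, 3))

# Pixel count per label; only the small result array is materialized.
n_labels = int(labels.max().compute()) + 1
area_pixels = da.bincount(labels.ravel(), minlength=n_labels).compute()
```

Index `i` of `area_pixels` holds the pixel area of label `i`; index 0 is the background.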
Conversion Scripts¶
The examples below convert author-published results into the starspace schema. Most of the scripts are very simple. Each script is named as follows and contains, at minimum, the following:
<assay_name>_<first_author>_<year>_<journal>_<short_description>.py
- Link to the original manuscript or preprint, if available, else data attribution information
- Checklist of available data, including:
  - cell (or region) x gene count matrix
  - transcript locations (if appropriate)
  - cell locations in polygons or masks
- Instructions to load and convert the data into the required format, including any information acquired via direct communication with the authors.
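The naming pattern can be produced mechanically. `script_name` below is a hypothetical helper, not part of the repository; the lower-case, hyphen-for-space convention is an assumption inferred from dataset names used in the scripts, such as iss_ke_2013_nat-methods_breast-cancer.

```python
import re


def script_name(assay, first_author, year, journal, description):
    """Build <assay>_<first_author>_<year>_<journal>_<description>.py."""
    parts = [assay, first_author, str(year), journal, description]
    # Lower-case each field and turn internal whitespace into hyphens, so that
    # underscores are left to separate the five fields.
    cleaned = [re.sub(r"\s+", "-", part.strip().lower()) for part in parts]
    return "_".join(cleaned) + ".py"


name = script_name("ISS", "Ke", 2013, "Nat Methods", "breast cancer")
```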
In situ sequencing for RNA analysis in preserved tissue and cells¶
Rongqin Ke, Marco Mignardi, Alexandra Pacureanu, Jessica Svedlund, Johan Botling, Carolina Wählby, Mats Nilsson
This publication can be found at https://www.nature.com/articles/nmeth.2563 and the data referenced below can be downloaded from
Checklist:
- [x] point locations
- [ ] cell locations
- [ ] cell x gene expression matrix (derivable)
Load the data¶
import requests
from io import BytesIO
import pandas as pd
import starspace
from starspace.constants import *
response = requests.get(
"https://d24h2xsgaj29mf.cloudfront.net/raw/iss_ke_2013_nat-methods_breast-cancer/all_spots.csv"
)
data = pd.read_csv(BytesIO(response.content))
column_map = {
"gene": SPOTS_REQUIRED_VARIABLES.GENE_NAME.value,
"x": SPOTS_REQUIRED_VARIABLES.X_SPOT.value,
"y": SPOTS_REQUIRED_VARIABLES.Y_SPOT.value,
"qual": SPOTS_OPTIONAL_VARIABLES.QUALITY.value,
"fov": SPOTS_OPTIONAL_VARIABLES.FIELD_OF_VIEW.value,
"gene_code": "gene_code",
"barcode": "barcode",
}
authors = [
"Rongqin Ke", "Marco Mignardi", "Alexandra Pacureanu", "Jessica Svedlund", "Johan Botling",
"Carolina Wählby", "Mats Nilsson"
]
attributes = {
REQUIRED_ATTRIBUTES.ASSAY: ASSAYS.ISS,
REQUIRED_ATTRIBUTES.SAMPLE_TYPE: "Her2+ breast carcinoma",
REQUIRED_ATTRIBUTES.AUTHORS: authors,
REQUIRED_ATTRIBUTES.YEAR: 2013,
REQUIRED_ATTRIBUTES.ORGANISM: "human",
OPTIONAL_ATTRIBUTES.PUBLICATION_NAME: (
"In situ sequencing for RNA analysis in preserved tissue and cells"
),
OPTIONAL_ATTRIBUTES.PUBLICATION_URL: "https://www.nature.com/articles/nmeth.2563",
}
standard_columns = [column_map[c] for c in data.columns]
data.columns = standard_columns
spots = starspace.Spots.from_spot_data(data, attributes)
# s3_url = "s3://starfish.data.output-warehouse/iss_ke_2013_nat-methods_breast-cancer"
local_url = "iss_ke_2013_nat-methods_breast-cancer/"
spots.save_zarr(local_url)
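The conversion renames columns with `[column_map[c] for c in data.columns]`, which stops with a KeyError at the first author column that has no entry in `column_map`. The hypothetical helper below is a sketch of the same step that reports every unmapped column at once and uses pandas' `DataFrame.rename`; the target names in `mapping` are illustrative schema names, not necessarily the values of the starspace constants.

```python
import pandas as pd


def rename_to_schema(df, column_map):
    """Rename columns via column_map, failing with a list of all unmapped columns."""
    unmapped = [c for c in df.columns if c not in column_map]
    if unmapped:
        raise KeyError(f"no schema mapping for columns: {unmapped}")
    return df.rename(columns=column_map)


# Toy author table and an illustrative mapping onto schema column names.
raw = pd.DataFrame({"gene": ["ACTB"], "x": [1.0], "y": [2.0]})
mapping = {"gene": "gene_name", "x": "x_spot_microns", "y": "y_spot_microns"}
renamed = rename_to_schema(raw, mapping)
```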
High-throughput single-cell gene-expression profiling with multiplexed error-robust fluorescence in situ hybridization¶
Jeffrey R. Moffitt, Junjie Hao, Guiping Wang, Kok Hao Chen, Hazen P. Babcock, Xiaowei Zhuang
This publication can be found at https://www.pnas.org/content/113/39/11046 and the data referenced below can be downloaded from s3://starfish.data.published/MERFISH/20181005/starfish_results/published_MERFISH_decoded_results.csv
Checklist:
- [x] point locations
- [ ] cell locations
- [ ] cell x gene expression matrix (derivable)
This file converts point locations produced by a starfish pipeline whose results show 99.7% correspondence with Jeff Moffitt's original MATLAB processing of the same data. Minor deviations result from numerical differences in the deconvolution algorithms between MATLAB and Python.
Load the data¶
from io import BytesIO
import numpy as np
import pandas as pd
import requests
import starspace
from starspace.constants import *
response = requests.get(
"https://d24h2xsgaj29mf.cloudfront.net/raw/merfish_moffit_2016_pnas_u2-os/"
"published_MERFISH_decoded_results.csv"
)
data = pd.read_csv(BytesIO(response.content), index_col=0)
# convert distance to quality; we'll map the name to quality below
data['distance'] = 1 - data['distance']
# drop the passes_thresholds column, this data has been conditioned on that previously
assert np.all(data['passes_thresholds'])
data = data.drop('passes_thresholds', axis=1)
# drop z_spot, it's not informative
assert np.allclose(data['zc'], 0.0005)
data = data.drop('zc', axis=1)
column_map = {
'radius': SPOTS_OPTIONAL_VARIABLES.RADIUS.value,
'target': SPOTS_REQUIRED_VARIABLES.GENE_NAME.value,
'distance': SPOTS_OPTIONAL_VARIABLES.QUALITY.value,
'xc': SPOTS_REQUIRED_VARIABLES.X_SPOT.value,
'yc': SPOTS_REQUIRED_VARIABLES.Y_SPOT.value
}
columns = [column_map[c] for c in data.columns]
data.columns = columns
attributes = {
REQUIRED_ATTRIBUTES.ORGANISM: "human",
REQUIRED_ATTRIBUTES.ASSAY: ASSAYS.MERFISH.value,
REQUIRED_ATTRIBUTES.YEAR: 2016,
REQUIRED_ATTRIBUTES.AUTHORS: [
"Jeffrey R. Moffitt", "Junjie Hao", "Guiping Wang", "Kok Hao Chen", "Hazen P. Babcock",
"Xiaowei Zhuang"
],
REQUIRED_ATTRIBUTES.SAMPLE_TYPE: "osteosarcoma (bone, epithelial) cell line",
OPTIONAL_ATTRIBUTES.PUBLICATION_NAME: (
"High-throughput single-cell gene-expression profiling with multiplexed error-robust "
"fluorescence in situ hybridization"
),
OPTIONAL_ATTRIBUTES.PUBLICATION_URL: "https://www.pnas.org/content/113/39/11046"
}
spots = starspace.Spots.from_spot_data(data, attributes)
s3_url = "s3://starfish.data.output-warehouse/merfish-moffit-2016-pnas-u2os/"
url = "merfish-moffit-2016-pnas-u2os/"
spots.save_zarr(url)
Deep Profiling of Mouse Splenic Architecture with CODEX Multiplexed Imaging¶
Yury Goltsev, Nikolay Samusik, Julia Kennedy-Darling, Salil Bhate, Matthew Hale, Gustavo Vazquez, Sarah Black, Garry P. Nolan
The data can be downloaded here: http://welikesharingdata.blob.core.windows.net/forshare/index.html and the paper is available here: https://doi.org/10.1016/j.cell.2018.07.010
from collections import defaultdict
from io import BytesIO
import pandas as pd
import requests
import starspace
from starspace.constants import *
response = requests.get(
"https://d24h2xsgaj29mf.cloudfront.net/raw/codex_goltsev_2018_cell_spleen/"
"Suppl.Table2.CODEX_paper_MRLdatasetexpression.csv"
)
data = pd.read_csv(BytesIO(response.content))
attributes = {
REQUIRED_ATTRIBUTES.ASSAY: ASSAYS.CODEX,
REQUIRED_ATTRIBUTES.SAMPLE_TYPE: "spleen",
REQUIRED_ATTRIBUTES.AUTHORS: [
"Yury Goltsev", "Nikolay Samusik", "Julia Kennedy-Darling", "Salil Bhate", "Matthew Hale",
"Gustavo Vazquez", "Sarah Black", "Garry P. Nolan"
],
REQUIRED_ATTRIBUTES.YEAR: 2018,
REQUIRED_ATTRIBUTES.ORGANISM: "mouse",
OPTIONAL_ATTRIBUTES.PUBLICATION_URL: "https://doi.org/10.1016/j.cell.2018.07.010",
OPTIONAL_ATTRIBUTES.PUBLICATION_NAME:
"Deep Profiling of Mouse Splenic Architecture with CODEX Multiplexed Imaging",
}
dims = tuple(MATRIX_AXES)
x = data["X.X"]
y = data["Y.Y"]
z = data["Z.Z"]
group = data["niche cluster ID"]
metadata_col = data["sample_Xtile_Ytile"]
type_annotation = data["Imaging phenotype cluster ID"]
data = data.drop(
["X.X", "Y.Y", "Z.Z", "sample_Xtile_Ytile", "niche cluster ID", "Imaging phenotype cluster ID"],
axis=1
)
additional_metadata = defaultdict(list)
for v in metadata_col:
    sample_type, fov_x, fov_y = v.split('_')
    additional_metadata["sample_type"].append(sample_type)
    additional_metadata["fov_x"].append(int(fov_x.strip("X")))
    additional_metadata["fov_y"].append(int(fov_y.strip("Y")))
additional_metadata = pd.DataFrame(additional_metadata)
coords = {
MATRIX_REQUIRED_REGIONS.REGION_ID: (MATRIX_AXES.REGIONS, data.index),
MATRIX_REQUIRED_FEATURES.GENE_NAME: (MATRIX_AXES.FEATURES, data.columns),
MATRIX_REQUIRED_REGIONS.X_REGION: (MATRIX_AXES.REGIONS, x),
MATRIX_REQUIRED_REGIONS.Y_REGION: (MATRIX_AXES.REGIONS, y),
MATRIX_OPTIONAL_REGIONS.Z_REGION: (MATRIX_AXES.REGIONS, z),
MATRIX_OPTIONAL_REGIONS.GROUP_ID: (MATRIX_AXES.REGIONS, group),
MATRIX_OPTIONAL_REGIONS.TYPE_ANNOTATION: (MATRIX_AXES.REGIONS, type_annotation),
"fov_x": (MATRIX_AXES.REGIONS, additional_metadata["fov_x"]),
"fov_y": (MATRIX_AXES.REGIONS, additional_metadata["fov_y"]),
"sample_type": (MATRIX_AXES.REGIONS, additional_metadata["sample_type"])
}
matrix = starspace.Matrix.from_expression_data(
data=data.values, coords=coords, dims=dims, name="matrix", attrs=attributes
)
url = "codex_goltsev_2018_cell_spleen/"
matrix.save_zarr(url=url)
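The loop above splits each `sample_Xtile_Ytile` value on underscores. A vectorized alternative is pandas' `Series.str.extract` with named groups, sketched below; the tile labels are hypothetical values shaped like `<sample>_X<col>_Y<row>`, and labels that do not match come back as missing values instead of raising.

```python
import pandas as pd

# Hypothetical tile labels; the last one deliberately does not match the pattern.
tiles = pd.Series(["BALBc-1_X01_Y02", "MRL-3_X05_Y09", "malformed"])

pattern = r"^(?P<sample_type>[^_]+)_X(?P<fov_x>\d+)_Y(?P<fov_y>\d+)$"
parsed = tiles.str.extract(pattern)

# Non-matching rows are all-NaN, so they can be inspected rather than crashing the loop.
bad_rows = parsed[parsed.isna().any(axis=1)]
good = parsed.dropna().astype({"fov_x": int, "fov_y": int})
```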
Visualization and analysis of gene expression in tissue sections by spatial transcriptomics¶
Patrik L. Ståhl, Fredrik Salmén, Sanja Vickovic, Anna Lundmark, José Fernández Navarro, Jens Magnusson, Stefania Giacomello, Michaela Asp, Jakub O. Westholm, Mikael Huss, Annelie Mollbrink, Sten Linnarsson, Simone Codeluppi, Åke Borg, Fredrik Pontén, Paul Igor Costea, Pelin Sahlén, Jan Mulder, Olaf Bergmann, Joakim Lundeberg, Jonas Frisén
This publication can be found at https://science.sciencemag.org/content/353/6294/78.long and the data referenced below can be downloaded from https://www.spatialresearch.org/resources-published-datasets/doi-10-1126science-aaf2403/
Checklist:
- [x] point locations
- [x] cell locations (NA)
- [x] cell x gene expression matrix (NA)
Load the data¶
from io import BytesIO
import dask.array as da
import numpy as np
import pandas as pd
import requests
from skimage.transform import matrix_transform
import starspace
from starspace.constants import *
response = requests.get(
"https://d24h2xsgaj29mf.cloudfront.net/raw/spatial_transcriptomics_stahl_2016/"
"Rep1_MOB_count_matrix-1.tsv"
)
data = pd.read_csv(BytesIO(response.content), sep='\t', index_col=0)
attributes = {
REQUIRED_ATTRIBUTES.AUTHORS: (
"Patrik L. Ståhl", "Fredrik Salmén", "Sanja Vickovic", "Anna Lundmark",
"José Fernández Navarro", "Jens Magnusson", "Stefania Giacomello", "Michaela Asp",
"Jakub O. Westholm", "Mikael Huss", "Annelie Mollbrink", "Sten Linnarsson",
"Simone Codeluppi", "Åke Borg", "Fredrik Pontén", "Paul Igor Costea", "Pelin Sahlén",
"Jan Mulder", "Olaf Bergmann", "Joakim Lundeberg", "Jonas Frisén"
),
REQUIRED_ATTRIBUTES.YEAR: 2016,
REQUIRED_ATTRIBUTES.SAMPLE_TYPE: "Olfactory Bulb",
REQUIRED_ATTRIBUTES.ORGANISM: "mouse",
REQUIRED_ATTRIBUTES.ASSAY: ASSAYS.SPATIAL_TRANSCRIPTOMICS.value,
OPTIONAL_ATTRIBUTES.PUBLICATION_NAME: (
"Visualization and analysis of gene expression in tissue sections by spatial "
"transcriptomics"
),
OPTIONAL_ATTRIBUTES.PUBLICATION_URL: "https://science.sciencemag.org/content/353/6294/78.long"
}
# convert the spot coordinates
# TODO: regions (spots) may need a radius
# download the published 3x3 transformation matrix for the spot coordinates
response = requests.get(
"https://d24h2xsgaj29mf.cloudfront.net/raw/spatial_transcriptomics_stahl_2016/"
"Rep1_MOB_transformation.txt"
)
transform = np.array([float(v) for v in response.content.decode().strip().split()]).reshape(3, 3).T
x, y = zip(*[map(float, v.split('x')) for v in data.index])
xy = np.hstack([
np.array(x)[:, None],
np.array(y)[:, None],
])
transformed = matrix_transform(xy, transform)
dims = (MATRIX_AXES.REGIONS.value, MATRIX_AXES.FEATURES.value)
coords = {
MATRIX_REQUIRED_REGIONS.REGION_ID: (MATRIX_AXES.REGIONS, np.arange(data.shape[0])),
MATRIX_REQUIRED_REGIONS.X_REGION: (MATRIX_AXES.REGIONS, transformed[:, 0]),
MATRIX_REQUIRED_REGIONS.Y_REGION: (MATRIX_AXES.REGIONS, transformed[:, 1]),
MATRIX_REQUIRED_FEATURES.GENE_NAME: (MATRIX_AXES.FEATURES, data.columns)
}
data = da.from_array(data.values, chunks=MATRIX_CHUNK_SIZE)
matrix = starspace.Matrix.from_expression_data(
data=data, coords=coords, dims=dims, name="matrix", attrs=attributes
)
url = "spatial-transcriptomics-stahl-2016-science-olfactory-bulb"
matrix.save_zarr(url=url)
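The transformation step reads nine numbers, reshapes them with `reshape(3, 3).T`, and applies them with skimage's `matrix_transform`. The numpy-only sketch below spells out what that amounts to for a 3x3 homogeneous transform: append a 1 to each point, multiply by the matrix, and divide by the last component. The transform text and spot labels are made-up values chosen so the result is easy to check by hand.

```python
import numpy as np


def parse_spot_index(index):
    """Split spot labels such as '12.05x17.9' into an (n, 2) array of (x, y) floats."""
    return np.array([[float(p) for p in label.split("x")] for label in index])


def apply_affine(xy, matrix):
    """Apply a 3x3 homogeneous transform to (n, 2) points."""
    homogeneous = np.hstack([xy, np.ones((xy.shape[0], 1))])
    out = homogeneous @ matrix.T
    return out[:, :2] / out[:, 2:3]


# Made-up transform stored column-major, mirroring the reshape(3, 3).T in the script:
# scale by 2, then translate by (10, 5).
transform_text = "2 0 0 0 2 0 10 5 1"
matrix = np.array([float(v) for v in transform_text.split()]).reshape(3, 3).T

xy = parse_spot_index(["1x2", "3.5x4"])
result = apply_affine(xy, matrix)
```

Here (1, 2) maps to (12, 9) and (3.5, 4) maps to (17, 13).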
Visualization and analysis of gene expression in tissue sections by spatial transcriptomics¶
Patrik L. Ståhl, Fredrik Salmén, Sanja Vickovic, Anna Lundmark, José Fernández Navarro, Jens Magnusson, Stefania Giacomello, Michaela Asp, Jakub O. Westholm, Mikael Huss, Annelie Mollbrink, Sten Linnarsson, Simone Codeluppi, Åke Borg, Fredrik Pontén, Paul Igor Costea, Pelin Sahlén, Jan Mulder, Olaf Bergmann, Joakim Lundeberg, Jonas Frisén
This publication can be found at https://science.sciencemag.org/content/353/6294/78.long and the data referenced below can be downloaded from https://www.spatialresearch.org/resources-published-datasets/doi-10-1126science-aaf2403/
Checklist:
- [x] point locations
- [x] cell locations (NA)
- [x] cell x gene expression matrix (NA)
Load the data¶
from io import BytesIO
import dask.array as da
import numpy as np
import pandas as pd
import requests
from skimage.transform import matrix_transform
import starspace
from starspace.constants import *
response = requests.get(
"https://d24h2xsgaj29mf.cloudfront.net/raw/spatial_transcriptomics_stahl_2016/"
"Layer1_BC_count_matrix-1.tsv"
)
data = pd.read_csv(BytesIO(response.content), sep='\t', index_col=0)
attributes = {
REQUIRED_ATTRIBUTES.AUTHORS: (
"Patrik L. Ståhl", "Fredrik Salmén", "Sanja Vickovic", "Anna Lundmark",
"José Fernández Navarro", "Jens Magnusson", "Stefania Giacomello", "Michaela Asp",
"Jakub O. Westholm", "Mikael Huss", "Annelie Mollbrink", "Sten Linnarsson",
"Simone Codeluppi", "Åke Borg", "Fredrik Pontén", "Paul Igor Costea", "Pelin Sahlén",
"Jan Mulder", "Olaf Bergmann", "Joakim Lundeberg", "Jonas Frisén"
),
REQUIRED_ATTRIBUTES.YEAR: 2016,
REQUIRED_ATTRIBUTES.SAMPLE_TYPE: "breast cancer",
REQUIRED_ATTRIBUTES.ORGANISM: "human",
REQUIRED_ATTRIBUTES.ASSAY: ASSAYS.SPATIAL_TRANSCRIPTOMICS.value,
OPTIONAL_ATTRIBUTES.PUBLICATION_NAME: (
"Visualization and analysis of gene expression in tissue sections by spatial "
"transcriptomics"
),
OPTIONAL_ATTRIBUTES.PUBLICATION_URL: "https://science.sciencemag.org/content/353/6294/78.long"
}
# convert the spot coordinates
# TODO: regions (spots) may need a radius
# download the published 3x3 transformation matrix for the spot coordinates
response = requests.get(
"https://d24h2xsgaj29mf.cloudfront.net/raw/spatial_transcriptomics_stahl_2016/"
"Layer1_BC_transformation.txt"
)
transform = np.array([float(v) for v in response.content.decode().strip().split()]).reshape(3, 3).T
x, y = zip(*[map(float, v.split('x')) for v in data.index])
xy = np.hstack([
np.array(x)[:, None],
np.array(y)[:, None],
])
transformed = matrix_transform(xy, transform)
dims = (MATRIX_AXES.REGIONS.value, MATRIX_AXES.FEATURES.value)
coords = {
MATRIX_REQUIRED_REGIONS.REGION_ID: (MATRIX_AXES.REGIONS, np.arange(data.shape[0])),
MATRIX_REQUIRED_REGIONS.X_REGION: (MATRIX_AXES.REGIONS, transformed[:, 0]),
MATRIX_REQUIRED_REGIONS.Y_REGION: (MATRIX_AXES.REGIONS, transformed[:, 1]),
MATRIX_REQUIRED_FEATURES.GENE_NAME: (MATRIX_AXES.FEATURES, data.columns)
}
data = da.from_array(data.values, chunks=MATRIX_CHUNK_SIZE)
matrix = starspace.Matrix.from_expression_data(
data=data, coords=coords, dims=dims, name="matrix", attrs=attributes
)
url = "spatial-transcriptomics-stahl-2016-science-prostate-cancer"
matrix.save_zarr(url=url)
Molecular, spatial, and functional single-cell profiling of the hypothalamic preoptic region¶
Jeffrey R. Moffitt, Dhananjay Bambah-Mukku, Stephen W. Eichhorn, Eric Vaughn, Karthik Shekhar, Julio D. Perez, Nimrod D. Rubinstein, Junjie Hao, Aviv Regev, Catherine Dulac, Xiaowei Zhuang
This publication can be found at https://science.sciencemag.org/content/362/6416/eaau5324 and the data referenced below can be downloaded from https://datadryad.org/handle/10255/dryad.192644
Checklist:
- [ ] point locations
- [ ] cell locations
- [x] cell x gene expression matrix
Load the data¶
import os
import requests
from io import BytesIO
import dask.array as da
import numpy as np
import pandas as pd
import starspace
from starspace.constants import *
response = requests.get(
"https://d24h2xsgaj29mf.cloudfront.net/raw/merfish_moffit_2018_science_hypothalamic-preoptic/"
"Moffitt_and_Bambah-Mukku_et_al_merfish_all_cells.csv"
)
data = pd.read_csv(BytesIO(response.content), header=0)
name = "merfish moffit 2018 science hypothalamic preoptic"
This data file is a cell x gene expression matrix that contains additional metadata as columns of the matrix. Extract those extra columns and clean up the data file.
annotation = np.array(data["Cell_class"], dtype="U")
group_id = np.array(data["Neuron_cluster_ID"], dtype="U")
x = data["Centroid_X"]
y = data["Centroid_Y"]
region_id = np.array(data["Cell_ID"], dtype="U")
unstructured_field_names = ["Animal_ID", "Animal_sex", "Behavior", "Bregma"]
unstructured_metadata = data[unstructured_field_names]
non_expression_fields = (
unstructured_field_names
+ ["Cell_class", "Neuron_cluster_ID", "Centroid_X", "Centroid_Y", "Cell_ID"]
)
expression_data = data.drop(non_expression_fields, axis=1)
gene_name = [v.lower() for v in expression_data.columns]
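The same column split can be tried on a toy table. The sketch below uses a hypothetical two-cell, two-gene miniature of the author file and also shows why `dtype="U"` is applied: it turns object-dtype Python strings into a fixed-width unicode array, which zarr can store without the object codec that object arrays require in zarr v2.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the author table: metadata columns mixed with gene columns.
table = pd.DataFrame({
    "Cell_ID": ["a1", "b2"],
    "Cell_class": ["Astrocyte", "Inhibitory"],
    "Centroid_X": [10.0, 20.0],
    "Centroid_Y": [5.0, 6.0],
    "Ace2": [0.0, 1.5],
    "Adora2a": [2.0, 0.0],
})
metadata_fields = ["Cell_ID", "Cell_class", "Centroid_X", "Centroid_Y"]

# Object-dtype strings become a fixed-width unicode array (width = longest string).
cell_class = np.array(table["Cell_class"], dtype="U")

# Everything that is not metadata is expression data; gene names are lower-cased.
expression = table.drop(metadata_fields, axis=1)
gene_name = [g.lower() for g in expression.columns]
```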
Write down some important metadata from the publication.
attrs = {
REQUIRED_ATTRIBUTES.ASSAY: ASSAYS.MERFISH,
REQUIRED_ATTRIBUTES.SAMPLE_TYPE: "hypothalamic pre-optic nucleus",
REQUIRED_ATTRIBUTES.AUTHORS: [
"Jeffrey R. Moffitt", "Dhananjay Bambah-Mukku", "Stephen W. Eichhorn", "Eric Vaughn",
"Karthik Shekhar", "Julio D. Perez", "Nimrod D. Rubinstein", "Junjie Hao", "Aviv Regev",
"Catherine Dulac", "Xiaowei Zhuang"
],
REQUIRED_ATTRIBUTES.YEAR: 2018,
REQUIRED_ATTRIBUTES.ORGANISM: "mouse",
OPTIONAL_ATTRIBUTES.PUBLICATION_NAME: (
"Molecular, spatial, and functional single-cell profiling of the hypothalamic preoptic "
"region"
),
OPTIONAL_ATTRIBUTES.PUBLICATION_URL: "https://science.sciencemag.org/content/362/6416/eaau5324",
}
Create the chunked dataset.
chunk_data = da.from_array(expression_data.values, chunks=MATRIX_CHUNK_SIZE)
Wrap the dask array in an xarray DataArray, adding the metadata fields as “coordinates”.
# object-dtype columns were converted into fixed-length strings (dtype="U") above
coords = {
MATRIX_REQUIRED_FEATURES.GENE_NAME: (MATRIX_AXES.FEATURES.value, gene_name),
MATRIX_REQUIRED_REGIONS.X_REGION: (MATRIX_AXES.REGIONS.value, x),
MATRIX_REQUIRED_REGIONS.Y_REGION: (MATRIX_AXES.REGIONS.value, y),
MATRIX_REQUIRED_REGIONS.REGION_ID: (MATRIX_AXES.REGIONS.value, region_id),
MATRIX_OPTIONAL_REGIONS.GROUP_ID: (MATRIX_AXES.REGIONS.value, group_id),
MATRIX_OPTIONAL_REGIONS.TYPE_ANNOTATION: (MATRIX_AXES.REGIONS.value, annotation)
}
dims = (MATRIX_AXES.REGIONS.value, MATRIX_AXES.FEATURES.value)
matrix = starspace.Matrix.from_expression_data(
data=chunk_data, coords=coords, dims=dims, name=name, attrs=attrs
)
s3_url = "s3://starfish.data.output-warehouse/merfish-moffit-2018-science-hypothalamic-preoptic"
url = "merfish-moffit-2018-science-hypothalamic-preoptic"
matrix.save_zarr(url=url)
Spatially resolved, highly multiplexed RNA profiling in single cells¶
Kok Hao Chen, Alistair N. Boettiger, Jeffrey R. Moffitt, Siyuan Wang, Xiaowei Zhuang
This publication can be found at https://science.sciencemag.org/content/348/6233/aaa6090 and the data referenced below can be downloaded from
Checklist:
- [x] point locations
- [ ] cell locations
- [x] cell x gene expression matrix (derivable)
Load the data¶
import requests
from io import BytesIO
import pandas as pd
import starspace
from starspace.constants import *
response = requests.get(
"https://d24h2xsgaj29mf.cloudfront.net/raw/merfish_chen_2015_science_imr90/"
"140genesData.xlsx"
)
data = pd.read_excel(BytesIO(response.content))
name = "merfish chen 2015 science imr90"
This data file is a table of decoded RNA spots that carries cell and experiment metadata as additional columns. Map those columns onto the Spots schema and clean up the data file.
# map column names to schema
column_map = {
"RNACentroidX": SPOTS_REQUIRED_VARIABLES.X_SPOT,
"RNACentroidY": SPOTS_REQUIRED_VARIABLES.Y_SPOT,
"cellID": "per_slice_cell_id", # this is not unique experiment-wide
"CellPositionX": SPOTS_OPTIONAL_VARIABLES.X_REGION,
"CellPositionY": SPOTS_OPTIONAL_VARIABLES.Y_REGION,
"geneName": SPOTS_REQUIRED_VARIABLES.GENE_NAME,
"experiment": "experiment",
"library": "library",
"intCodeword": "int_codeword",
"isCorrectedMatch": "is_corrected_match",
"isExactMatch": "is_exact_match"
}
columns = [column_map[c] for c in data.columns]
data.columns = columns
# demonstrate that cellID is not unique:
group_columns = (
"per_slice_cell_id",
SPOTS_OPTIONAL_VARIABLES.Y_REGION,
SPOTS_OPTIONAL_VARIABLES.X_REGION,
)
# group by the columns, use size to run a no-op aggregation routine, then drop the size column
# (labeled zero)
not_unique = data.groupby(group_columns).size().reset_index().drop(0, axis=1)
assert_cols = ["per_slice_cell_id"]
assert not_unique[assert_cols].drop_duplicates().shape != not_unique[assert_cols].shape
# fix region ids so that they uniquely identify cells across the experiment.
group_columns = (
"experiment", "library", "per_slice_cell_id",
SPOTS_OPTIONAL_VARIABLES.Y_REGION, SPOTS_OPTIONAL_VARIABLES.X_REGION
)
region_ids_map = data.groupby(group_columns).size().reset_index().drop(0, axis=1)
assert_cols = ["per_slice_cell_id", "library", "experiment"]
assert region_ids_map[assert_cols].drop_duplicates().shape == region_ids_map[assert_cols].shape
# map each region to a unique identifier and add it to the data frame
region_ids_map = region_ids_map.drop(
[SPOTS_OPTIONAL_VARIABLES.Y_REGION, SPOTS_OPTIONAL_VARIABLES.X_REGION], axis=1
)
region_ids_map = region_ids_map.reset_index().set_index(assert_cols)
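region_ids = region_ids_map.loc[pd.MultiIndex.from_frame(data[assert_cols])]
# reset_index() stored each region's row number in a column named "index"; use it as the id
data[SPOTS_OPTIONAL_VARIABLES.REGION_ID] = region_ids["index"].to_numpy()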
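Assigning one integer per unique (experiment, library, per_slice_cell_id) combination can also be written with pandas' `groupby(...).ngroup()`, sketched below on a hypothetical miniature table. Since both this and the groupby-size-reset_index approach above number groups in sorted key order, they should yield the same ids when each key maps to a single cell position, as the assertion in the script checks.

```python
import pandas as pd

# Hypothetical spots: per_slice_cell_id 7 is reused across libraries and experiments.
spots = pd.DataFrame({
    "experiment": [1, 1, 1, 2, 2],
    "library": ["A", "A", "B", "A", "A"],
    "per_slice_cell_id": [7, 7, 7, 7, 8],
    "gene_name": ["ACTB", "MYC", "ACTB", "GAPDH", "MYC"],
})

keys = ["experiment", "library", "per_slice_cell_id"]
# ngroup() numbers each distinct key combination 0..n-1 in sorted key order.
spots["region_id"] = spots.groupby(keys).ngroup()
```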
Write down some important metadata from the publication.
attrs = {
REQUIRED_ATTRIBUTES.ASSAY: ASSAYS.MERFISH.value,
REQUIRED_ATTRIBUTES.SAMPLE_TYPE: "IMR90 lung fibroblast cell line",
REQUIRED_ATTRIBUTES.AUTHORS: (
"Kok Hao Chen", "Alistair N. Boettiger", "Jeffrey R. Moffitt", "Siyuan Wang",
"Xiaowei Zhuang"
),
REQUIRED_ATTRIBUTES.YEAR: 2015,
REQUIRED_ATTRIBUTES.ORGANISM: "human",
OPTIONAL_ATTRIBUTES.NOTES: (
"cellID field from author data renamed per_slice_cell_id to reflect stored data"
)
}
Convert the dataframe into an xarray dataset.
spots = starspace.Spots.from_spot_data(data, attrs)
Write the data to zarr. The s3_url variable records the S3 destination; as written, the script saves to the local path url.
s3_url = "s3://starfish.data.output-warehouse/merfish-chen-2015-science-imr90/"
url = "merfish-chen-2015-science-imr90/"
spots.save_zarr(url)
Convert the xarray dataset to a matrix.
matrix = spots.to_spatial_matrix()
matrix.save_zarr(url)
Modeling Spatial Correlation of Transcripts with Application to Developing Pancreas¶
Ruishan Liu, Marco Mignardi, Robert Jones, Martin Enge, Seung K. Kim, Stephen R. Quake & James Zou
This publication can be found at https://www.nature.com/articles/s41598-019-41951-2 and the data can be downloaded from https://cirm.ucsc.edu/projects
Checklist:
- [x] point locations
- [~] cell locations (centroids only)
- [x] cell x gene expression matrix (derivable)
import requests
from pathlib import Path
from io import BytesIO
import pandas as pd
import starspace
from starspace.constants import *
response = requests.get(
"https://d24h2xsgaj29mf.cloudfront.net/raw/iss_liu_2019_nat-sci-reports_pancreas-dev/"
"Nuc_TOT_2p2.txt"
)
region_data = pd.read_csv(BytesIO(response.content))
response = requests.get(
"https://d24h2xsgaj29mf.cloudfront.net/raw/iss_liu_2019_nat-sci-reports_pancreas-dev/"
"RNA_TOT_2p2.txt"
)
rna_data = pd.read_csv(BytesIO(response.content))
response = requests.get(
"https://d24h2xsgaj29mf.cloudfront.net/raw/iss_liu_2019_nat-sci-reports_pancreas-dev/"
"Conversion_Pool2.txt"
)
gene_map = pd.read_csv(BytesIO(response.content))
Build the spot table
# Some of these spots don't map to real genes, so keep the barcode information: "Barcode_Num"
# survives as "Seq_num" (the join key) and "Barcode_Letter" is carried over.
gene_map = gene_map.set_index("Barcode_Num")
gene_info = gene_map.loc[rna_data.Seq_num, :]
gene_info.index = rna_data.index
rna_data = pd.concat([rna_data, gene_info], axis=1)
# "ObjectNumber" is the join key for gene ids, but we've joined all the tables, so we can drop it.
rna_data = rna_data.drop("ObjectNumber", axis=1)
# merge in cell centroids
region_data = region_data.set_index("ObjectNumber")
region_data = region_data.drop("ImageNumber", axis=1) # duplicated in rna_data
region_info = region_data.loc[rna_data["Parent_Cells"], :]
region_info.index = rna_data.index
rna_data = pd.concat([rna_data, region_info], axis=1)
notes = list()
notes.append("'seq_num' contains channel information for the in-situ sequencing code of each gene")
notes.append("'barcode_letter' contains the nucleotides read out using ISS")
column_map = {
"ImageNumber": SPOTS_OPTIONAL_VARIABLES.FIELD_OF_VIEW,
"Blob_X": SPOTS_REQUIRED_VARIABLES.X_SPOT,
"Blob_Y": SPOTS_REQUIRED_VARIABLES.Y_SPOT,
"Parent_Cells": SPOTS_OPTIONAL_VARIABLES.REGION_ID,
"Location_Center_X": SPOTS_OPTIONAL_VARIABLES.X_REGION,
"Location_Center_Y": SPOTS_OPTIONAL_VARIABLES.Y_REGION,
"Gene_Name": SPOTS_REQUIRED_VARIABLES.GENE_NAME,
"Seq_qual": SPOTS_OPTIONAL_VARIABLES.QUALITY,
"Seq_num": "seq_num",
"Barcode_Letter": "barcode_letter",
}
columns = [column_map[c] for c in rna_data.columns]
rna_data.columns = columns
attributes = {
REQUIRED_ATTRIBUTES.ASSAY: ASSAYS.ISS.value,
REQUIRED_ATTRIBUTES.SAMPLE_TYPE: "fetal pancreas",
REQUIRED_ATTRIBUTES.AUTHORS: [
"Ruishan Liu", "Marco Mignardi", "Robert Jones", "Martin Enge", "Seung K. Kim", "Stephen R. Quake", "James Zou"
],
REQUIRED_ATTRIBUTES.YEAR: 2019,
REQUIRED_ATTRIBUTES.ORGANISM: "human",
OPTIONAL_ATTRIBUTES.PUBLICATION_NAME: (
"Modeling Spatial Correlation of Transcripts with Application to Developing Pancreas"
),
OPTIONAL_ATTRIBUTES.PUBLICATION_URL: "https://www.nature.com/articles/s41598-019-41951-2",
OPTIONAL_ATTRIBUTES.NOTES: "; ".join(notes),
}
spots = starspace.Spots.from_spot_data(rna_data, attributes)
# s3_url = "s3://starfish.data.output-warehouse/iss_liu_2019_nat-sci-reports_pancreas-dev/"
url = "iss_liu_2019_nat-sci-reports_pancreas-dev/"
spots.save_zarr(url=url)
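The `set_index` / `.loc` / `concat` pattern used above to attach gene names (and, the same way, cell centroids) is a many-to-one left join. The sketch below uses hypothetical toy frames as stand-ins for the downloaded tables and checks that `DataFrame.merge` gives the same result. One practical difference: `.loc` raises `KeyError` if a barcode is missing from the map, while a left merge fills in `NaN`.

```python
import pandas as pd

# Hypothetical stand-ins for RNA_TOT_2p2.txt and Conversion_Pool2.txt.
rna_data = pd.DataFrame({"Seq_num": [3, 1, 3], "Blob_X": [10.0, 20.0, 30.0]})
gene_map = pd.DataFrame({
    "Barcode_Num": [1, 3],
    "Barcode_Letter": ["AC", "GT"],
    "Gene_Name": ["INS", "GCG"],
})

# Lookup-style join, as in the conversion script.
indexed = gene_map.set_index("Barcode_Num")
gene_info = indexed.loc[rna_data.Seq_num, :]
gene_info.index = rna_data.index
via_loc = pd.concat([rna_data, gene_info], axis=1)

# Equivalent left join; validate= checks that each barcode appears once in the map,
# and the duplicate key column is dropped to match the lookup-style result.
via_merge = rna_data.merge(
    gene_map, how="left", left_on="Seq_num", right_on="Barcode_Num", validate="many_to_one"
).drop(columns="Barcode_Num")
print(via_merge)
```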
We have the information needed to pivot the spots into a matrix, too.
matrix = spots.to_spatial_matrix()
matrix.save_zarr(url=url)
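The implementation of `to_spatial_matrix` is not shown here. Conceptually, though, pivoting a spot table into a cell x gene matrix amounts to counting spots per (region, gene) pair, and because every spot row carries its cell's centroid, one coordinate pair per region can be recovered by grouping. The following sketch of that idea is not starspace's code; the column names `region_id`, `x_region` and `y_region` are illustrative assumptions:

```python
import pandas as pd

# Hypothetical spot table; region column names are illustrative, not the schema's constants.
spots = pd.DataFrame({
    "gene_name": ["INS", "GCG", "INS", "INS", "GCG"],
    "region_id": [1, 1, 2, 2, 2],
    "x_region": [5.0, 5.0, 9.0, 9.0, 9.0],
    "y_region": [2.0, 2.0, 7.0, 7.0, 7.0],
})

# Count spots per (region, gene); genes never observed in a region get 0.
counts = pd.crosstab(spots["region_id"], spots["gene_name"])

# Each spot repeats its region's centroid, so the first row per region is enough.
centroids = spots.groupby("region_id")[["x_region", "y_region"]].first()
print(counts)
print(centroids)
```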
Spatial organization of the somatosensory cortex revealed by cyclic smFISH¶
Simone Codeluppi, Lars E. Borm, Amit Zeisel, Gioele La Manno, Josina A. van Lunteren, Camilla I. Svensson, Sten Linnarsson
This publication can be found at https://www.nature.com/articles/s41592-018-0175-z and the data referenced below can be downloaded from http://linnarssonlab.org/osmfish/
Checklist:
- [x] point locations
- [x] cell locations
- [x] cell x gene expression matrix
Load the data¶
import os
import pickle
import re
import requests
from itertools import repeat
from io import BytesIO
import tempfile
import dask.array as da
import h5py
import loompy
import numpy as np
import pandas as pd
import starspace
from starspace.constants import *
response = requests.get(
"https://d24h2xsgaj29mf.cloudfront.net/raw/"
"osmfish_codeluppi_2018_nat-methods_somatosensory-cortex/"
"mRNA_coords_raw_counting.hdf5"
)
spots_data = h5py.File(BytesIO(response.content), "r")
response = requests.get(
"https://d24h2xsgaj29mf.cloudfront.net/raw/"
"osmfish_codeluppi_2018_nat-methods_somatosensory-cortex/"
"polyT_seg.pkl"
)
region_data = pickle.load(BytesIO(response.content))
# load spot info
gene = []
x = []
y = []
imaging_round = []
pattern = r"^(.*?)_Hybridization(\d*?)$"
for k in spots_data.keys():
    gene_, round_ = re.match(pattern, k).groups()
    gene_data = spots_data[k]
    x.extend(gene_data[:, 0])
    y.extend(gene_data[:, 1])
    gene.extend(repeat(gene_, gene_data.shape[0]))
    imaging_round.extend(repeat(int(round_), gene_data.shape[0]))
spots_data.close()
# build the spot information
spot_data = pd.DataFrame({
SPOTS_REQUIRED_VARIABLES.GENE_NAME: gene,
SPOTS_REQUIRED_VARIABLES.X_SPOT: x,
SPOTS_REQUIRED_VARIABLES.Y_SPOT: y,
SPOTS_OPTIONAL_VARIABLES.ROUND: imaging_round,
})
# construct the attributes
attributes = {
REQUIRED_ATTRIBUTES.AUTHORS: (
"Simone Codeluppi", "Lars E. Borm", "Amit Zeisel", "Gioele La Manno",
"Josina A. van Lunteren", "Camilla I. Svensson", "Sten Linnarsson"
),
REQUIRED_ATTRIBUTES.YEAR: 2018,
REQUIRED_ATTRIBUTES.SAMPLE_TYPE: "somatosensory cortex",
REQUIRED_ATTRIBUTES.ORGANISM: "mouse",
REQUIRED_ATTRIBUTES.ASSAY: ASSAYS.OSMFISH.value,
OPTIONAL_ATTRIBUTES.PUBLICATION_NAME: (
"Spatial organization of the somatosensory cortex revealed by cyclic smFISH"
),
OPTIONAL_ATTRIBUTES.PUBLICATION_URL: "https://www.nature.com/articles/s41592-018-0175-z"
}
spots = starspace.Spots.from_spot_data(spot_data, attrs=attributes)
s3_url = (
"s3://starfish.data.output-warehouse/osmfish-codeluppi-2018-nat-methods-somatosensory-cortex/"
)
url = "osmfish-codeluppi-2018-nat-methods-somatosensory-cortex/"
spots.save_zarr(url=url)
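The dataset names in the HDF5 file encode both the gene and the hybridization round. Below is a standalone sketch of that parsing step. The key names are hypothetical, and it adds an explicit check: `re.match` returns `None` for a key that does not fit the pattern, and the script's `.groups()` call would then fail with a less informative `AttributeError`.

```python
import re

# Same pattern as the conversion script: "<gene>_Hybridization<round>".
pattern = r"^(.*?)_Hybridization(\d*?)$"

# Hypothetical HDF5 dataset names; the real ones come from mRNA_coords_raw_counting.hdf5.
keys = ["Gad2_Hybridization4", "Foxj1_Hybridization11"]

parsed = []
for key in keys:
    match = re.match(pattern, key)
    if match is None:
        raise ValueError(f"unexpected dataset name: {key!r}")
    gene, round_ = match.groups()
    # \d*? also matches zero digits, in which case int() raises ValueError
    parsed.append((gene, int(round_)))
print(parsed)
```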
Load the region information. For simplicity, convert it into a label image in which each pixel holds the id of the region that covers it; this makes spot-to-region lookups simple. The image is only about 6 GB and could be wrapped in dask if needed. First, find the extent of the image from the spots and the region data.
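x_min, x_max = np.percentile(spot_data[SPOTS_REQUIRED_VARIABLES.X_SPOT], [0, 100])
y_min, y_max = np.percentile(spot_data[SPOTS_REQUIRED_VARIABLES.Y_SPOT], [0, 100])
# extend the extent so that every region pixel also fits in the image
region_pixels = np.concatenate(list(region_data.values()))
x_max = max(x_max, region_pixels[:, 0].max())
y_max = max(y_max, region_pixels[:, 1].max())
# np.zeros (not np.empty) so that pixels outside every region read as background (0)
label = np.zeros((int(x_max) + 1, int(y_max) + 1), dtype=np.int16)
for region_id, array in region_data.items():
    region_id = int(region_id)
    x = array[:, 0]
    y = array[:, 1]
    label[x, y] = region_id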
dims = tuple(REGIONS_AXES)
regions = starspace.Regions.from_label_image(label, dims=dims, attrs=attributes)
regions.save_zarr(url=url)
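The lookup that a label image enables can be sketched on a tiny hypothetical segmentation; the region ids and coordinates below are made up. The sketch also shows why the background should start as `np.zeros`: `np.empty` leaves whatever values happen to be in memory, so pixels outside every cell could carry arbitrary ids. It assumes that 0 is never used as a region id and that spot coordinates share the segmentation's pixel frame.

```python
import numpy as np

# Hypothetical regions: region id -> (n, 2) integer array of (x, y) pixel coordinates.
region_data = {
    "1": np.array([[0, 0], [0, 1], [1, 0]]),
    "2": np.array([[3, 3], [3, 4]]),
}
spot_x = np.array([0.4, 3.2, 2.0])
spot_y = np.array([1.1, 4.7, 2.0])

# Size the image to cover both the spots and the region pixels; 0 marks background.
all_px = np.concatenate(list(region_data.values()))
x_max = int(max(spot_x.max(), all_px[:, 0].max()))
y_max = int(max(spot_y.max(), all_px[:, 1].max()))
label = np.zeros((x_max + 1, y_max + 1), dtype=np.int16)
for region_id, pixels in region_data.items():
    label[pixels[:, 0], pixels[:, 1]] = int(region_id)

# Assigning spots to regions is one fancy-indexing call (coordinates truncated to pixels).
spot_region = label[spot_x.astype(int), spot_y.astype(int)]
print(spot_region)
```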
Load the count matrix.
response = requests.get(
"https://d24h2xsgaj29mf.cloudfront.net/raw/"
"osmfish_codeluppi_2018_nat-methods_somatosensory-cortex/"
"osmFISH_SScortex_mouse_all_cells.loom"
)
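with tempfile.TemporaryDirectory() as tmpdirname:
    loom_path = os.path.join(tmpdirname, "temp.loom")
    with open(loom_path, "wb") as f:
        f.write(response.content)
    # loompy needs a file on disk; read everything into memory before the directory is deleted
    with loompy.connect(loom_path, mode="r") as conn:
        row_attrs = dict(conn.row_attrs)
        col_attrs = dict(conn.col_attrs)
        # loom stores genes as rows and cells as columns; transpose to cells x genes
        data = da.from_array(conn[:, :].T, chunks=MATRIX_CHUNK_SIZE)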
# region id should be int dtype
col_attrs["CellID"] = col_attrs["CellID"].astype(int)
dims = (MATRIX_AXES.REGIONS.value, MATRIX_AXES.FEATURES.value)
coords = {
MATRIX_REQUIRED_REGIONS.REGION_ID: (MATRIX_AXES.REGIONS, col_attrs["CellID"]),
MATRIX_REQUIRED_REGIONS.X_REGION: (MATRIX_AXES.REGIONS, col_attrs["X"]),
MATRIX_REQUIRED_REGIONS.Y_REGION: (MATRIX_AXES.REGIONS, col_attrs["Y"]),
MATRIX_OPTIONAL_REGIONS.GROUP_ID: (MATRIX_AXES.REGIONS, col_attrs["ClusterID"]),
MATRIX_OPTIONAL_REGIONS.TYPE_ANNOTATION: (MATRIX_AXES.REGIONS, col_attrs["ClusterName"]),
MATRIX_OPTIONAL_REGIONS.PHYS_ANNOTATION: (MATRIX_AXES.REGIONS, col_attrs["Region"]),
MATRIX_OPTIONAL_REGIONS.AREA_PIXELS: (MATRIX_AXES.REGIONS, col_attrs["size_pix"]),
MATRIX_OPTIONAL_REGIONS.AREA_UM2: (MATRIX_AXES.REGIONS, col_attrs["size_um2"]),
"valid": (MATRIX_AXES.REGIONS, col_attrs["Valid"]),
"tsne_1": (MATRIX_AXES.REGIONS, col_attrs["_tSNE_1"]),
"tsne_2": (MATRIX_AXES.REGIONS, col_attrs["_tSNE_2"]),
MATRIX_OPTIONAL_FEATURES.CHANNEL: (MATRIX_AXES.FEATURES, row_attrs["Fluorophore"]),
MATRIX_REQUIRED_FEATURES.GENE_NAME: (MATRIX_AXES.FEATURES, row_attrs["Gene"]),
MATRIX_OPTIONAL_FEATURES.ROUND.value: (MATRIX_AXES.FEATURES, row_attrs["Hybridization"]),
}
matrix = starspace.Matrix.from_expression_data(
data=data, coords=coords, dims=dims, name="matrix", attrs=attributes
)
matrix.save_zarr(url=url)
Analysis Examples¶
The following notebooks demonstrate how data formatted in the starspace schema can be read into python, visualized, and analyzed.
Contributing¶
This package is only as useful as the data that exist in the format it specifies. We encourage contributions of datasets and are happy to work with contributors to evolve the schema.
To contribute to the schema or library, or to add formatted data, please begin by opening an issue to discuss the proposed contribution. For data additions, the contribution process is simple: add a conversion example to /conversion_examples that reads the data from a publicly accessible repository. We will run the script, upload the data to the starspace Amazon S3 bucket, and add the contributed data to starspace.data.
Optionally, you can add a notebook to /analysis_examples demonstrating how the data can be navigated and showing off the cool features of your spatial dataset!