The following illustrates a schema for an address book of people, the companies that they work for, and their addresses, and an example address book of the CEOs of several technology companies.
Address books are represented using three classes to represent people (Person), the companies that they work for (Company), and their addresses (Address).
The many-to-one relationship between companies and their employees (Person.company) and the one-to-one relationships between company and people and their addresses (Company.address, Person.address) are represented by ManyToOne and OneToOne attributes. The required argument related_name defines the name of the reverse relationship from the related class (Company.employees, Address.company, Address.person). This makes each relationship bidirectional.
Address books are encoded into two worksheets for companies and their employees, with groups of columns that represent the address of each company and employee. Encoding companies and their addresses and people and their addresses into the same worksheet, rather than encoding addresses into a separate worksheet, makes address books easier for human to read.
The ObjTables software provides an object-relational mapping system (ORM) that can automatically map between the object-oriented and spreadsheet representations. This makes it easy for authors to create datasets with high-level data structures and export them to human-readable spreadsheets, and this makes it easy for other investigators to parse human-readable spreadsheets into convenient data structures for comparison and re-analysis. Similar to other ORMs, ObjTables makes it easy to define constraints on the value of each attribute (e.g., an attribute that represents molecular weights must be positive), ObjTables makes it easy to implement custom validations of each object and entire datasets, and ObjTables automatically generates APIs for constructing objects and querying their relationships. See below for more information.
A more complete example is available in TSV format from GitHub .
!!ObjTables type='Schema' tableFormat='row' description='Table/model and column/attribute definitions' date='2020-03-10 21:34:50' objTablesVersion='0.0.8' | ||||||
---|---|---|---|---|---|---|
!Name | !Type | !Parent | !Format | !Verbose name | !Verbose name plural | !Description |
Company | Class | row | Company | Companies | ||
name | Attribute | Company | String(primary=True, unique=True) | Name | ||
url | Attribute | Company | Url | URL | ||
address | Attribute | Company | OneToOne('Address', related_name='company') | Address | ||
Person | Class | row | Person | People | ||
name | Attribute | Person | String(primary=True, unique=True) | Name | ||
type | Attribute | Person | Enum(['family', 'friend', 'business']) | Type | ||
company | Attribute | Person | ManyToOne('Company', related_name='employees') | Company | ||
email_address | Attribute | Person | Email address | |||
phone_number | Attribute | Person | String | Phone number | ||
address | Attribute | Person | OneToOne('Address', related_name='person') | Address | ||
Address | Class | multiple_cells | Address | Addresses | ||
street | Attribute | Address | String(primary=True, unique=True) | Street | ||
city | Attribute | Address | String | City | ||
state | Attribute | Address | String | State | ||
zip_code | Attribute | Address | String | Zip code | ||
country | Attribute | Address | String | Country |
A more complete example is available in Python format from GitHub .
import obj_tables class Address(obj_tables.Model): street = obj_tables.StringAttribute( primary=True, unique=True, verbose_name='Street') city = obj_tables.StringAttribute( verbose_name='City') state = obj_tables.StringAttribute( verbose_name='State') zip_code = obj_tables.StringAttribute( verbose_name='Zip code') country = obj_tables.StringAttribute( verbose_name='Country') class Meta(obj_tables.Model.Meta): table_format = \ obj_tables.TableFormat \ .multiple_cells attribute_order = ( 'street', 'city', 'state', 'zip_code', 'country', ) verbose_name = 'Address' verbose_name_plural = 'Addresses' class PersonType(enum.Enum): family = 'family' friend = 'friend' business = 'business' class Person(obj_tables.Model): name = obj_tables.StringAttribute( primary=True, unique=True, verbose_name='Name') type = obj_tables.EnumAttribute( PersonType, verbose_name='Type') company = obj_tables.ManyToOneAttribute( 'Company', related_name='employees', verbose_name='Company') email_address = obj_tables.EmailAttribute( verbose_name='Email address') phone_number = obj_tables.StringAttribute( verbose_name='Phone number') address = obj_tables.OneToOneAttribute( 'Address', related_name='person', verbose_name='Address') class Meta(obj_tables.Model.Meta): table_format = \ obj_tables.TableFormat.row attribute_order = ( 'name', 'type', 'company', 'email_address', 'phone_number', 'address', ) verbose_name = 'Person' verbose_name_plural = 'People' class Company(obj_tables.Model): name = obj_tables.StringAttribute( primary=True, unique=True, verbose_name='Name') url = obj_tables.UrlAttribute( verbose_name='URL') address = obj_tables.OneToOneAttribute( 'Address', related_name='company', verbose_name='Address') class Meta(obj_tables.Model.Meta): table_format = \ obj_tables.TableFormat.row attribute_order = ( 'name', 'url', 'address', ) verbose_name = 'Company' verbose_name_plural = 'Companies'
The following example illustrates an address book of the CEOs of several major technology companies, the companies that they lead, and their addresses. A more complete example is available in merged-TSV format from GitHub .
!!!ObjTables objTablesVersion='0.0.8' date='2020-03-14 13:19:04' | ||
---|---|---|
!!ObjTables type='TableOfContents' | ||
!Table | !Description | !Number of objects |
Schema | Table/model and column/attribute definitions | |
Companies | 4 | |
People | 4 |
!!ObjTables type='Data' tableFormat='row' class='Company' name='Companies' date='2020-03-14 13:19:04' objTablesVersion='0.0.8' | ||||||
---|---|---|---|---|---|---|
!Address | ||||||
!Name | !URL | !Street | !City | !State | !Zip code | !Country |
Apple | https://www.apple.com/ | 10600 N Tantau Ave | Cupertino | CA | 95014 | US |
https://www.facebook.com/ | 1 Hacker Way #15 | Menlo Park | CA | 94025 | US | |
https://www.google.com/ | 1600 Amphitheatre Pkwy | Mountain View | CA | 94043 | US | |
Netflix | https://www.netflix.com/ | 100 Winchester Cir | Los Gatos | CA | 95032 | US |
!!ObjTables type='Data' tableFormat='row' class='Person' name='People' date='2020-03-14 13:19:04' objTablesVersion='0.0.8' | |||||||||
---|---|---|---|---|---|---|---|---|---|
!Address | |||||||||
!Name | !Type | !Company | !Email address | !Phone number | !Street | !City | !State | !Zip code | !Country |
Mark Zuckerberg | family | zuck@fb.com | 650-543-4800 | 1 Hacker Way #15 | Menlo Park | CA | 94025 | US | |
Reed Hastings | business | Netflix | reed.hastings@netflix.com | 408-540-3700 | 100 Winchester Cir | Los Gatos | CA | 95032 | US |
Sundar Pichai | business | sundar@google.com | 650-253-0000 | 1600 Amphitheatre Pkwy | Mountain View | CA | 94043 | US | |
Tim Cook | business | Apple | tcook@apple.com | 408-996-1010 | 10600 N Tantau Ave | Cupertino | CA | 95014 | US |
Below are several examples of ObjTables schemas and datasets in all of the supported formats.
Description | Schema | UML diagram | Dataset | Schema & dataset |
---|---|---|---|---|
Address book (more complete example with a transposed table) | tsv, xlsx, py | svg | tsv, multi.tsv, xlsx, json, yml | multi.tsv, xlsx |
Finance: Financial transactions | tsv, xlsx, py | svg | tsv, multi.tsv, xlsx, json, yml | multi.tsv, xlsx |
Genomics: Genes and transcript variants | tsv, xlsx, py | svg | tsv, multi.tsv, xlsx, json, yml | multi.tsv, xlsx |
Models of E. coli metabolism discussed in the supplement to the ObjTables paper | ||||
Khodayari & Maranas kinetic model | tsv, xlsx, py | svg | tsv, multi.tsv, xlsx, json, yml | multi.tsv, xlsx |
Gerosa thermodynamic model | tsv, xlsx, py | svg | tsv, multi.tsv, xlsx, json, yml | multi.tsv, xlsx |
Code for merging the models | ||||
Systems biology: Network of metabolites and biochemical reactions | tsv, xlsx, py | svg | tsv, multi.tsv, xlsx, json, yml | multi.tsv, xlsx |
Systems biology: SBtab – systems biology data and kinetic models | tsv, xlsx, py | svg | ||
Hynne model of yeast glycolysis | tsv, multi.tsv, xlsx, json, yml | |||
Jiang et al. model of pancreatic beta-cell insulin secretion | tsv, multi.tsv, xlsx, json, yml | |||
Related data | tsv, multi.tsv, xlsx, json, yml | |||
Noor et al. model of Escherichia coli metabolism | tsv, multi.tsv, xlsx, json, yml | |||
Related data | tsv, multi.tsv, xlsx, json, yml | |||
Sigurdsson et al. model of mouse metabolism | tsv, multi.tsv, xlsx, json, yml | |||
Teusink model of yeast glycolysis | tsv, multi.tsv, xlsx, json, yml | |||
Related data | tsv, multi.tsv, xlsx, json, yml | |||
Wortel et al. model of Escherichia coli metabolism | tsv, multi.tsv, xlsx, json, yml | |||
Related data | tsv, multi.tsv, xlsx, json, yml | |||
Standard free energies of reactions calculated by eQuilibrator | tsv, multi.tsv, xlsx, json, yml | |||
Yeast transcriptional regulatory network inferred by Chang et al. | tsv, multi.tsv, xlsx, json, yml | |||
Systems biology: WC-KB: format for datasets for eukaryotic whole-cell models | py | svg | ||
Example knowledge base | xlsx | |||
Systems biology: WC-Lang: language for composite, hybrid whole-cell models | py | svg | ||
Example model | xlsx |
ObjTables schemas can either be defined using the tabular format described below or using the ObjTables Python package. The tabular format is easy to use, and requires no programming. The tabular format supports most of the features of ObjTables, including a wide range of data types, relationships, transposed and embedded tables, single inheritance, and basic validation.
The ObjTables Python package is more flexible. In addition to the features of the tabular schema format, the Python package supports abstract classes and methods, custom data types, custom validation, custom XSLX formatting, and more. The Python package can also be used to encapsulate methods for manipulating data inside schemas.
We recommend that developers begin with the tabular format, and transition to the Python format when more capabilities are needed. When more capabilities are required, we recommend that developers use the ObjTables software tools to convert their initial tabular-formatted schema into a Python module. This workflow provides a quick path to developing a custom schema.
Tabular-formatted schemas should begin with a single header row, which indicates that the schema is encoded in ObjTables format (!!ObjTables type="Schema" ...).
Optionally, the header can use the name attribute to define the name of the schema (e.g., !!ObjTables type="Schema" name="<schema-name>" ...). If worksheets also define the schema that they belong to, ObjTables will check that these values are equal.
After the header row, the schema should contain a table with the columns below that defines the classes/models/tables and their attributes/columns. Each row in the table should define a single class or attribute.
Tabular-formatted schemas can be saved in comma-separated (.csv), tab-separated (.tsv), or XSLX (.xlsx) format.
Schemas can also be implemented using the ObjTables Python package. The API enables developers to define schemas with a similar syntax to object-relational mapping tools such as Django , Ruby on Rails , and SQLAlchemy .
To encode complex datasets into a minimal number of tables, related classes can be encoded into a column or a series of columns of the table that represents their (row-formatted) parent classes. (For column-formatted parent classes, related classes can be encoded into a row or a series of rows.)
ObjTables supports numerous data types, and it is easy to implement additional types.
This includes powerful data types from several mathematics and science packages:
The following table lists the attribute types currently available in ObjTables. The first column indicates the names that should be used in conjunction with the tabular schema format. The second column indicates the Python class which implements each type, and which should be used to define schemas with the Python package. The third column indicates the data type used to encode each attribute into an spreadsheet cell. The fourth column indicates the Python data type used to represent each attribute in the ObjTables software tools.
A few of the attributes have required arguments. Please see the Python documentation for more information.
Most of the attributes also have optional arguments that can be used to control how values of the attribute are validated. For example, the Integer and Float attributes have optional min and max arguments which can be used to indicate the minimum and maximum valid values. The String and LongString attributes have optional min_length and max_length attributes which can be used to indicate the minimum and maximum valid lengths of each value.
The none argument controls whether an attribute is required (its value must be defined) or optional (its value can be undefined – null in XLSX, null in JSON and YAML, None in Python):
The unique and primary arguments can be used to configure attributes to function as primary keys for encoding relationships between instances of classes into tables:
ObjTables can encode and decode datasets into and from the file formats outlined below. We recommend using programs such as Microsoft Excel or LibreOffice Calc for viewing and editing datasets. We recommend using CSV or TSV to store datasets because these formats are conducive to revisioning with version control systems such as Git. We recommend using JSON for working with ObjTables in other programming languages beyond Python.
Each collection of tables must declare that the tables are encoded in ObjTables's tabular format by including the line !!!ObjTables ... before one of the tables.
Optionally, document declarations (!!!ObjTables ...) can capture key-value pairs of document-level metadata. Each key and value should be a string. At a minimum, we recommend using the keys below. The ObjTables software automatically generates these keys when datasets are exported to files.
Each table must declare the class that it represents by including a line that begins with !!ObjTables type='Data' class='<class_name>' ...). The value of class should be the name of a class in the schema.
Optionally, table declarations can capture key-value pairs of class/table-level metadata. Each key and value should be a string. At a minimum, we recommend using the keys below. The ObjTables software automatically generates these keys when datasets are exported to files.
Class instances can be encoded into tables as follows:
The value of each attribute of each instance should be encoded into cells as follows:
Comments about individual class instances can be encoded as follows:
Sets of tables that encode datasets can include additional tables, rows, and columns. This can be used to encapsulate additional metadata alongside datasets.
To make it easy to use ObjTables with other programming languages, ObjTables can encode datasets into JSON or YAML documents, which can easily be parsed by a wide range of languages. ObjTables encodes datasets into JSON and YAML as described and illustrated below.
{ # dictionary of metadata about # the document "_documentMetadata": { "objTablesVersion": "<ObjTables version>", "date": "<Date>", ... # additional document metadata }, # dictionary which maps the name of each class # to a dictionary with metadata about each class "_classMetadata": { "<ClassName>": { "id": "<ClassName>", "name": "<Class verbose name>", ... # additional metadata for the class }, ... # dictionaries with metadata # for more classes }, # for each class, lists of dictionaries # which represent the instances of the classes "<ClassName>": [ { "__type": "<ClassName>", "__id": <unique integer id>, "<attribute name>": <value of attribute>, "<*to-one attribute name>": <integer id of related object>, "<*to-many attribute name>": [<integer id of related object>, ...], ... # additional attributes of the # instance of the class }, ... # additional instances of the class ], ... # lists of dictionaries which represent the # instances of additional classes }
This Python code illustrates how to decode datasets encoded into this JSON/YAML format. We recommend following this example to create methods for other languages.
One of the major goals of ObjTables is to make it easy to rigorously validate the syntax and semantics of complex datasets. ObjTables achieves this by making it easy to utilize the six levels of validation summarized below. The validations marked with ○ can optionally be disabled.
ObjTables provides several tools for working with schemas and datasets. The tools marked with ● are available through all of the user interfaces (the web application, web service, command-line program, and Python package). The tools marked with ○ are only available through the Python package. Please see below and the Python documentation for more information about the tools .
ObjTables include four interfaces: a web application, a web service, a command-line program, and a Python package. The web application, web service, and command-line program provide the same features. In addition to the features of the web application, web service, and command-line program, the Python package can programmatically query, edit, merge, split, revision, and migrate datasets. The Python package is also more flexible. For example, the Python package can support additional datatypes and custom validation.
We recommend beginning with the web application, web service, or command-line program. We recommend using the Python package when more flexibility is required, such as a custom data type, or to analyze datasets with Python tools such as NumPy, Pandas, scikit-learn, or SciPy.
The web application is available at objtables.org/app.
The following example illustrates how to use the command-line program to generate a template spreadsheet for a dataset, validate a dataset, and compare two datasets.
obj-tables gen-template /path/to/schema.xlsx /path/to/template.xlsx obj-tables validate /path/to/schema.xlsx /path/to/dataset.xlsx obj-tables diff /path/to/schema.xlsx /path/to/dataset-1.xlsx /path/to/dataset-2.xlsx
Installation instructions for the command-line program are available at docs.karrlab.org. Documentation for the command-line program is available inline by running obj-tables --help.
Documentation for the web service is available at objtables.org/api.
As described above, the Python package has significantly more capabilities than the web application, web service, and command-line program.
The following example briefly introduces the API by illustrating how to use the API to programmatically create, manipulate, analyze, and export the same address book of tech CEOs described above .
Additional and more extensive tutorials are available as interactive Jupyter notebooks at sandbox.karrlab.org. Detailed documentation is available at docs.karrlab.org. Documentation for each module, class, and function can also be obtained by executing help(<module/class/function>) (e.g., help(obj_tables.io.validate)).
apple = Company( name='Apple', url='https://apple.com/', address=Address( street='10600 N Tantau Ave', city='Cupertino', state='CA', zip_code='95014', country='US')) facebook = Company( name='Facebook', url='https://facebook.com/', address=Address( street='1 Hacker Way #15', city='Menlo Park', state='CA', zip_code='94025', country='US')) google = Company( name='Google', url='https://google.com/', address=Address( street='1600 Amphitheatre Pkwy', city='Mountain View', state='CA', zip_code='94043', country='US')) netflix = Company( name='Netflix', url='https://netflix.com/', address=Address( street='100 Winchester Cir', city='Los Gatos', state='CA', zip_code='95032', country='US')) companies = [ apple, facebook, google, netflix, ]
cook = Person( name='Tim Cook', type=PersonType.business, company=apple, email_address='tcook@apple.com', phone_number='408-996-1010', address=apple.address) hastings = Person( name='Reed Hastings', type=PersonType.business, company=netflix, email_address='reed.hastings@netflix.com', phone_number='408-540-3700', address=netflix.address) pichai = Person( name='Sundar Pichai', type=PersonType.business, company=google, email_address='sundar@google.com', phone_number='650-253-0000', address=google.address) zuckerberg = Person( name='Mark Zuckerberg', type=PersonType.family, company=facebook, email_address='zuck@fb.com', phone_number='650-543-4800', address=facebook.address) ceos = [ cook, hastings, pichai, zuckerberg, ]
assert facebook.url == 'https://facebook.com/'
facebook.url = 'https://about.fb.com/'
import obj_tables errors = obj_tables.Validator().run( companies + ceos) assert errors is None
import obj_tables.io import os import tempfile dirname = tempfile.mkdtemp() filename_xlsx = os.path.join( dirname, 'address_book.xlsx' ) obj_tables.io.Writer().run( filename_xlsx, companies + ceos, models=[ Company, Person, ] )
objects = obj_tables.io.Reader().run( filename_xlsx, models=[ Company, Person, ], group_objects_by_model=False, ignore_sheet_order=True )
zuckerberg_copy = next( el for el in objects if isinstance(el, Person) \ and el.name == 'Mark Zuckerberg') assert zuckerberg_copy.is_equal( zuckerberg) assert zuckerberg_copy.difference( zuckerberg) == ''
We recommend that developers use the web service to work with ObjTables in other programming languages:
The documentation for the web service contains detailed information about the inputs and outputs of each endpoint.
Below are several resources that we recommend for working with ObjTables-encoded datasets:
While ObjTables has many similarities to other toolkits, ObjTables's unique combination of features provides some advantages for some use cases. Below, we outline the advantages and disadvantages of ObjTables of over several other types of tools.
We designed ObjTables as a complement to the Open Office Spreadsheet XML format (XLSX; ECMA-376 ; ISO/IEC 29500 ) for several reasons.
To make ObjTables an appealing option for supplementary datasets to journal articles, we wanted to base ObjTables on a file format that is easy for many researchers to read, and which is already a popular format for supplementary data. Furthermore, we wanted to base ObjTables on an open-source format. This narrowed our choices to the XLSX format or text-based formats such as CSV or TSV. While data serialization formats such as JSON, XML, and YAML provide some of the benefits of ObjTables, we did not want to base ObjTables on these formats because the GUI tools for these files do not make datasets as easy to read as spreadsheets programs such as Excel.
Although databases and ORMs also provide some of the benefits of ObjTables, we did not want to base ObjTables on a traditional relational database or a low-code database such as Airtable because we wanted journals to be able to archive datasets as a file, and we wanted readers to be able to use simple tools to read these files. Relational databases are too complex for many researchers, and commercial low-code tools are not designed to export and import datasets to and from files, which is critical for journal archiving. Another reason why we chose to base ObjTables on a file format rather than a database is that we wanted ObjTables to be compatible with asynchronous collaboration workflows with version control systems such as Git.
As a result, we chose to base ObjTables on top of the XLSX format, a popular and open format for spreadsheets.
Due to the popularity and significant capabilities of existing spreadsheet programs such as Excel, we wanted to enable users to continue to use their favorite programs on their favorite operating systems as much as possible. In particular, we wanted peer reviewers, editors, and readers of journal articles to be able to read ObjTables-encoded spreadsheets with their existing spreadsheet programs rather than have to install an additional program. We also wanted authors to be able to continue to do the majority of their work (e.g., combine data from multiple sources, munge each source, perform calculations, make charts) with their favorite spreadsheet program and only briefly have to use a new program for advanced features such as validation that are not available in programs such as Excel.
We did not want to develop an extension for a spreadsheet program such as Excel or Google Sheets because this would only benefit users of that one program as there is no universal format for extensions. In addition, an extension would only benefit users of specific versions of that program on specific operating systems because the interfaces for extensions change over time. We also did not want to build the ObjTables software as an extension of Excel or Google Sheets because we wanted researchers to be able to use ObjTables with free, open-source tools. We did not want to build the ObjTables software as an extension of an open-source spreadsheet tool such as LibreOffice Calc because these tools are much less popular than Excel.
As a result, we chose to develop the ObjTables software as a separate, OS-independent program from spreadsheet programs such as Excel. Furthermore, we tried to make the software as accessible as possible by providing it as a web application, command-line program, web service, and Python package.
ObjTables is under active development. Below are several open issues that we intend to address going forward. Please see the GitHub issue list for more information.
Going forward, we hope that the community will adopt ObjTables as a standard for supplementary spreadsheets. Toward a community standard, we invite feedback and contributions. Please contact the developers to share feedback or get involved.
Guided by community input, potential directions include:
ObjTables worksheets are protected to help users use programs such as Microsoft Excel and LibreOffice Calc as graphical user interfaces for editing datasets. Specifically, ObjTables protects the elements of spreadsheets which encode the schema (worksheets, column headings, etc.) to help researchers focus on editing their data without unintentionally changing the schema. This is designed to help researchers separate the development of schemas from the development of datasets, and adhere to schemas as they edit their datasets.
By using this site you agree to use cookies to collect limited personal information to help us improve ObjTables as outlined in our Privacy Policy.