CasaXPS data analysis results parsing #99

Merged
23 commits
1 change: 1 addition & 0 deletions .gitignore
@@ -203,6 +203,7 @@ cython_debug/
!examples/vms/vms_txt_export.txt
!tests/data/scienta_txt/Ag_*.txt
!tests/data/vms_txt_export/vms_txt_export.txt
!tests/data/vms_analysis/FeO_analyzed_lineshapes_cols.txt

build/
.python-version
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -1,7 +1,7 @@
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
# Ruff version.
-rev: v0.9.1
+rev: v0.9.4
hooks:
# Run the linter.
- id: ruff
Binary file added docs/assets/casa_data_analysis_converted.png
4 changes: 3 additions & 1 deletion docs/explanation/contextualization.md
@@ -3,7 +3,9 @@
Conceptually, mapping between representations of concepts and instance data is a key task in information science. The plugin pynxtools-xps implements this specifically for the file and serialization formats used within the research field of photoelectron spectroscopy (PES).

In pynxtools-xps, the mapping from the vendor format is a two-step process:

1) First, each piece of information is parsed from the experiment- and vendor-specific file format and assigned a name that describes what the reader developer thinks it semantically means. This naming can come from documentation of the original data, existing key-value infrastructure in the data file, or from domain knowledge of the reader developer. All data and metadata items are internally stored as a flat list of dictionaries, with each dictionary containing all information about a single XP spectrum.

2) This list of dicts is then mapped onto either the [NXmpes](https://fairmat-nfdi.github.io/nexus_definitions/classes/contributed_definitions/NXmpes.html) NeXus application definition or its specialization [NXxps](https://fairmat-nfdi.github.io/nexus_definitions/classes/contributed_definitions/NXxps.html). For this, a JSON config file is used that provides a concept map from the originally assigned keys towards the groups, fields, and attributes in the NeXus standard. Such transformations are configured via the respective files in the [*config*](https://github.com/FAIRmat-NFDI/pynxtools-xps/tree/main/pynxtools_xps/config) directory of pynxtools-xps.

-Upon parsing, the XPS reader uses the config file to map the (meta-)data to a *template* which follows the NeXus application definitions. It also takes metadata provided through additional means (i.e., an electronic lab notebook (ELN) file) to fill in missing required and recommended fields and attributes in the application definition that were not provided in the raw data fikes. It is this *template* variable from which core functions like *convert.py* of the pynxtools write the actual NeXus/HDF5 file. The latter tool is also referred to as the dataconverter of [pynxtools](https://github.com/FAIRmat-NFDI/pynxtools).
+Upon parsing, the XPS reader uses the config file to map the (meta-)data to a *template* which follows the NeXus application definitions. It also takes metadata provided through additional means (i.e., an electronic lab notebook (ELN) file) to fill in missing required and recommended fields and attributes in the application definition that were not provided in the raw data files. It is this *template* variable from which core functions like *convert.py* of the pynxtools write the actual NeXus/HDF5 file. The latter tool is also referred to as the dataconverter of [pynxtools](https://github.com/FAIRmat-NFDI/pynxtools).
1 change: 1 addition & 0 deletions docs/explanation/coordinate_system.md
@@ -1,4 +1,5 @@
# The XPS coordinate system

The application definition [NXxps](https://fairmat-nfdi.github.io/nexus_definitions/classes/contributed_definitions/NXxps.html) defines a coordinate system based on the sample stage, which is the same coordinate system as in the [ISO standard](https://www.iso.org/standard/24269.html) for surface chemical analysis.

![The XPS coordinate system](../assets/xps_cs.png)
46 changes: 45 additions & 1 deletion docs/explanation/data_processing.md
@@ -1 +1,45 @@
-# Data processing with CasaXPS
+# Data processing with CasaXPS

```pynxtools-xps``` supports extracting data and the description of the data analysis (i.e., peak fitting)
by the [CasaXPS data analysis software](http://www.casaxps.com/).

## Modeling of peak fitting in CasaXPS

This is a short description of how peak models are implemented in CasaXPS. The user is referred to the [CasaXPS web site](http://www.casaxps.com/) for a more accurate and detailed explanation.

CasaXPS models peak fitting using two concepts: **regions** and **components**. Regions define the energy range
that is used for peak fitting as well as the background shape to be used. Many different background shapes are
available in CasaXPS, including the most commonly used linear, Shirley, and Tougaard backgrounds. Each peak model is made up of several components, each of which models a single chemical species. Components can have many different line shapes. Constraints on the total area, the full width at half maximum, and the position on the energy axis can be defined as well, also relative to any of the other components.
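
As a rough illustration of the region/component concept, the following Python sketch models a region with two components and a doublet-style constraint. It makes several assumptions: the class and function names are hypothetical, a plain Gaussian stands in for the much richer set of CasaXPS line shapes, the background is only recorded by name, and the numbers are placeholders rather than fit results.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Component:
    """One chemical species in the peak model (Gaussian stand-in line shape)."""

    name: str
    position: float  # energy of the peak maximum, in eV
    fwhm: float  # full width at half maximum, in eV
    area: float  # integrated intensity

    def intensity(self, energy: float) -> float:
        sigma = self.fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
        norm = self.area / (sigma * math.sqrt(2.0 * math.pi))
        return norm * math.exp(-0.5 * ((energy - self.position) / sigma) ** 2)


@dataclass
class Region:
    """Energy range used for the fit, plus the name of its background shape."""

    name: str
    start: float
    end: float
    background: str = "Shirley"
    components: list[Component] = field(default_factory=list)

    def model(self, energy: float) -> float:
        # Sum of all components; the background itself is not evaluated here.
        return sum(c.intensity(energy) for c in self.components)


def constrain_doublet(main: Component, partner: Component,
                      splitting: float, area_ratio: float) -> None:
    """Tie position, FWHM, and area of `partner` to those of `main`."""
    partner.position = main.position + splitting
    partner.fwhm = main.fwhm
    partner.area = main.area * area_ratio


# Placeholder values for illustration only.
main = Component("Fe 2p3/2", position=710.0, fwhm=3.0, area=1000.0)
partner = Component("Fe 2p1/2", position=720.0, fwhm=1.0, area=1.0)
constrain_doublet(main, partner, splitting=13.0, area_ratio=0.5)
region = Region("Fe 2p", start=700.0, end=740.0, components=[main, partner])
```

Expressing constraints as relations between components, rather than as independent parameters, is what lets a fit keep physically linked peaks consistent with each other.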

## Modeling of data fitting in NeXus

NeXus contains a base class for modelling fit procedures called [`NXfit`](https://fairmat-nfdi.github.io/nexus_definitions/classes/contributed_definitions/NXfit.html). ``NXfit`` contains

- the data to be modelled
- one or more instances of [`NXpeak`](https://fairmat-nfdi.github.io/nexus_definitions/classes/contributed_definitions/NXpeak.html) to define individual peaks in the model. These map to the components in CasaXPS.
- one or more instances of [`NXfit_background`](https://fairmat-nfdi.github.io/nexus_definitions/classes/contributed_definitions/NXfit_background.html) to define the background to be subtracted during the fit. These map to the regions in CasaXPS.
- two instances of [`NXfit_function`](https://fairmat-nfdi.github.io/nexus_definitions/classes/contributed_definitions/NXfit_function.html) to describe the function used for the global fit (`global_fit_function`) and for the optimization (`error_function`).
- information about the fitting envelope and the residual of the fit

The application definition `NXxps` implements an [`NXfit` group](https://fairmat-nfdi.github.io/nexus_definitions/classes/contributed_definitions/NXxps.html#nxxps-entry-fit-group) to model peak fitting in XPS. Aside from the terms defined in the base class `NXfit`, it also contains some information more specific to XPS fits, like the atomic concentration of each species in the fit model.
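
To make the relationship between data, peaks, background, envelope, and residual concrete, here is a small Python sketch. It is an assumption-laden illustration: the function `build_fit_group`, the dictionary keys, and the sign convention of the residual (data minus envelope) are chosen for this example and are not taken from the `NXfit` definition.

```python
def build_fit_group(
    data: list[float],
    background: list[float],
    peaks: dict[str, list[float]],
) -> dict[str, object]:
    """Assemble a dict loosely mirroring the parts of a fit description."""
    n = len(data)
    if len(background) != n or any(len(p) != n for p in peaks.values()):
        raise ValueError("data, background, and all peaks need the same length")

    # Envelope: background plus the sum of all peak intensities at each point.
    envelope = [
        background[i] + sum(p[i] for p in peaks.values()) for i in range(n)
    ]
    # Residual: what the model does not explain (assumed data - envelope).
    residual = [d - e for d, e in zip(data, envelope)]

    group: dict[str, object] = {
        "data": list(data),
        "background": list(background),
        "envelope": envelope,
        "residual": residual,
    }
    for index, (label, intensity) in enumerate(peaks.items()):
        group[f"peak{index}"] = {"label": label, "intensity": list(intensity)}
    return group


fit = build_fit_group(
    data=[1.0, 5.0, 2.0],
    background=[1.0, 1.0, 1.0],
    peaks={"A": [0.0, 3.0, 0.5], "B": [0.0, 0.5, 0.5]},
)
```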

## How to convert peak fitting in CasaXPS into a NeXus file

```pynxtools-xps``` can extract the definition of the peak fitting parameters and store them in an HDF5 file compliant with the `NXxps` application definition.

Three files are needed for the example conversion:

1) The VAMAS (.vms) file containing the original (meta)data and the definition of the peak fitting in the VAMAS
comments
2) The lineshapes of the measurement data as well as the peak fitting, exported from CasaXPS as a TXT file.
This file can be obtained by using the `Save Tab ASCII` to TXT button in CasaXPS and choosing "Rows of Tables" as the export option.
3) The analysis results (incl. the atomic concentrations), exported from CasaXPS as a CSV file. This file can be obtained from the `Quantify` window in CasaXPS and exporting the "Comps" report from the "Report" tab. You can learn more about XPS quantification in CasaXPS [here](http://www.casaxps.com/casaxps-training/quantification/quant.htm).

You can have a look at the example conversion to see which exported files the data reader expects.
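
Before running a conversion, it can help to verify that all three exports are present. The following Python sketch does that under stated assumptions: the helper `collect_casa_exports` is hypothetical, it identifies the files purely by their extensions (`.vms`, `.txt`, `.csv`), and it expects all three to share a common file name prefix.

```python
from pathlib import Path

# Expected exports, keyed by file extension (descriptions follow the list above).
EXPECTED_SUFFIXES = {
    ".vms": "VAMAS file with (meta)data and peak fitting definition",
    ".txt": "lineshapes exported from CasaXPS",
    ".csv": "analysis results exported from CasaXPS",
}


def collect_casa_exports(folder: str, prefix: str) -> dict[str, Path]:
    """Return one file per expected extension, or raise if any is missing."""
    found: dict[str, Path] = {}
    for path in sorted(Path(folder).glob(f"{prefix}*")):
        suffix = path.suffix.lower()
        if suffix in EXPECTED_SUFFIXES and suffix not in found:
            found[suffix] = path
    missing = [s for s in EXPECTED_SUFFIXES if s not in found]
    if missing:
        raise FileNotFoundError(
            f"Missing CasaXPS export(s): {', '.join(missing)}"
        )
    return found
```

Calling `collect_casa_exports("path/to/data", "FeO")` would return a dict with one `Path` per extension, which can then be passed on to the converter.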

## Example conversion

See [here](../reference/vms.md#data-analysis-and-peak-fitting) for an example of converting VAMAS data containing
data analysis results from CasaXPS. The resulting file looks like this:

![Example NeXus file with an NXfit group](../assets/casa_data_analysis_converted.png)
6 changes: 4 additions & 2 deletions docs/explanation/implementation.md
@@ -1,4 +1,5 @@
# Purpose and aim of pynxtools-xps

pynxtools-xps aims to implement the [FAIR principles of data stewardship](https://doi.org/10.1162/dint_r_00024) in photoelectron spectroscopy (PES). In many experimental fields, there has been a push towards such standardization and interoperability in recent years; however, there has been a distinct lack of such efforts in PES.

While there exists a widely adopted [ISO standard](https://www.iso.org/standard/24269.html) for data transfer in surface chemical analysis, it does not fully cover all of the information that is obtained in modern photoemission experiments. Within the FAIRmat project of the German National Research Data Infrastructure (NFDI), we have spent considerable effort developing an extensive and elaborate standard ([NXmpes](https://fairmat-nfdi.github.io/nexus_definitions/classes/contributed_definitions/NXmpes.html) with its specialization [NXxps](https://fairmat-nfdi.github.io/nexus_definitions/classes/contributed_definitions/NXxps.html)) for harmonizing PES data using [NeXus](https://www.nexusformat.org/), a community-driven data-modeling framework for experiments.
@@ -7,11 +8,12 @@ The goal of pynxtools-xps is to provide a mapping from the diverse proprietary a

As part of [pynxtools and its plugin infrastructure](https://github.com/FAIRmat-NFDI/pynxtools), pynxtools-xps is fully integrated into the NOMAD research data management systems (RDMS), with the aim of facilitating harmonization of XPS data and enabling development of data-centric software tools and services.

-# Software landscape in photoelectron spectroscopy - a mixture of proprietary and open-source solutions
+## Software landscape in photoelectron spectroscopy - a mixture of proprietary and open-source solutions

As in many other experimental fields, the software landscape in photoelectron spectroscopy is extremely diverse, ranging from fully integrated software solutions from technology partners, which combine measurement, post-processing, and data analysis, to custom-written software for specific use cases.
While proprietary software is often easy to use for end users, it often writes to proprietary serialization formats (file or database entries). Not only are these formats not openly readable, but they are often poorly documented, and the content and meaning of their semantic concepts is very often not documented publicly. While open-source software typically writes to more open (and sometimes better documented) formats, it tends to lose much of the metadata that commercial vendors can provide with their data. pynxtools-xps aims at both endpoints of this spectrum and everything in between: it provides an easy-to-use framework for writing a standardization parser for small, encapsulated solutions, while also providing the possibility of mapping the full richness of data and metadata acquired in a high-end XPS laboratory onto NeXus.

-# Implementation design
+## Implementation design

pynxtools-xps is a community-based tool that provides a bottom-up approach for mapping XPS data onto the NXmpes and NXxps standards. Specifically, the software contains example parsers for data that was measured by PES researchers in a wide array of experimental setups. The goal is not necessarily to implement a fully comprehensive mapping of all possible existing file formats, but rather to help the individual researcher or technology partner to start reading their data into the NeXus standard.

4 changes: 2 additions & 2 deletions docs/index.md
@@ -42,10 +42,10 @@ How-to guides provide step-by-step instructions for a wide range of tasks, with
The explanation section provides background knowledge on the implementation design, how the data is structured, how data processing can be incorporated, how the integration works in NOMAD, and more.

- [Design principles and implementation](explanation/implementation.md)
-- [NXmpes and NXxps](explanation/appdefs.md) -->
+- [NXmpes and NXxps](explanation/appdefs.md)
- [The XPS coordinate system](explanation/coordinate_system.md)
- [How to map pieces of information to NeXus](explanation/contextualization.md)
-<!-- - [Data processing](explanation/data_processing.md) -->
+- [Mapping of data processing performed in CasaXPS](explanation/data_processing.md)
<!-- - - [NOMAD integration](explanation/nomad_integration.md) -->

</div>
36 changes: 29 additions & 7 deletions docs/reference/vms.md
@@ -3,18 +3,18 @@
The reader supports VAMAS (.vms, .npl) files, the ISO standard data transfer format ([ISO 14976](https://www.iso.org/standard/24269.html)) for X-ray photoelectron spectroscopy. The data can be stored in REGULAR mode (i.e., with an equally spaced energy axis) as well as in IRREGULAR mode. The reader also accepts .npl files, which are structured in the same way as .vms files.
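
The difference between the two modes comes down to whether the energy axis can be reconstructed from a start value and a constant step. The following Python sketch illustrates that idea only; the helper names and the tolerance are assumptions made for this example and do not reflect how the reader parses VAMAS files.

```python
import math


def regular_axis(start: float, step: float, n_points: int) -> list[float]:
    """Build an equally spaced energy axis from a start value and a step."""
    return [start + i * step for i in range(n_points)]


def is_regular(axis: list[float], rel_tol: float = 1e-6) -> bool:
    """Check whether all steps between neighbouring points are (nearly) equal."""
    if len(axis) < 3:
        return True  # zero or one step: trivially equally spaced
    steps = [b - a for a, b in zip(axis, axis[1:])]
    return all(math.isclose(s, steps[0], rel_tol=rel_tol) for s in steps)
```

An axis created with `regular_axis(700.0, 0.1, 50)` passes the check, whereas an axis such as `[700.0, 700.1, 700.3]` does not and would need every energy value stored explicitly.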

Note that most vendors and analysis software tend to write metadata from their instruments into the comment lines of the VAMAS format. Currently, the VAMAS reader supports parsing of metadata from VAMAS format for the following vendors and software solutions:

- [Kratos Analytical Ltd](https://www.kratos.com/)
- [Specs GmbH](https://www.specs-group.com/specs/)
- [Phi Electronics](https://www.phi.com/): same metadata as in the PHI reader
- [CasaXPS](http://www.casaxps.com/): calibrations and peak fitting

The reader for the VAMAS format can be found [here](https://github.com/FAIRmat-NFDI/pynxtools-xps/tree/main/src/pynxtools_xps/vms).

-Example data is available [here](https://github.com/FAIRmat-NFDI/pynxtools-xps/tree/main/examples/vms). The data was measured with and exported from [SpecsLabProdigy](https://www.specs-group.com/nc/specs/products/detail/prodigy/).


## Standard .vms data

Example data is available [here](https://github.com/FAIRmat-NFDI/pynxtools-xps/tree/main/examples/vms). The data was measured with and exported from [SpecsLabProdigy](https://www.specs-group.com/nc/specs/products/detail/prodigy/).

### REGULAR file format

<!-- How is this data structured -->
@@ -35,12 +35,34 @@ The example conversion for the IRREGULAR VAMAS file can be run with the followin
user@box:~$ dataconverter irregular.vms eln_data_vms.yaml --reader xps --nxdl NXxps --output irregular.vms.nxs
```

-### TXT export from CasaXPS
## Data analysis and peak fitting

```pynxtools-xps``` also supports extracting data and the description of the data analysis (i.e., peak fitting)
by the [CasaXPS data analysis software](http://www.casaxps.com/). Three files are needed for the example conversion:

1) The VAMAS (.vms) file containing the original (meta)data and the definition of the peak fitting in the VAMAS
comments
2) The lineshapes of the measurement data as well as the peak fitting, exported from CasaXPS as a TXT file.
3) The analysis results (incl. the atomic concentrations), exported from CasaXPS as a CSV file.

Example data is available [here](https://github.com/FAIRmat-NFDI/pynxtools-xps/tree/main/examples/vms/vms_analysis).

The example conversion for the data analysis files can be run with the following command:

```console
user@box:~$ dataconverter FeO* eln.yaml --reader xps --nxdl NXxps --output vms_analysis_ref.nxs
```

You can learn much more about how to prepare the data in CasaXPS for NeXus conversion [here](../explanation/data_processing.md).

-```pynxtools-xps``` also supports data exported from VAMAS by the [CasaXPS data analysis software](http://www.casaxps.com/) as TXT file. The example conversion for the .txt export file can be run with the following command:
## Standalone export from CasaXPS

```pynxtools-xps``` also supports data exported from CasaXPS as a standalone TXT file.

Example data is available [here](https://github.com/FAIRmat-NFDI/pynxtools-xps/tree/main/examples/vms/txt_export).

The example conversion for the .txt export file can be run with the following command:

```console
user@box:~$ dataconverter vms_txt_export.txt eln_data_vms_txt_export.yaml --reader xps --nxdl NXxps --output vms_txt_export.nxs
```

-<!-- ## Data analysis in CasaXPS -->
10 changes: 7 additions & 3 deletions docs/tutorial/standalone.md
@@ -17,10 +17,13 @@ You will have a basic understanding how to use pynxtools-xps for converting your
## Steps

### Installation

See here for how to install pynxtools together with the XPS reader plugin.

### Running the reader from the command line

An example script to run the XPS reader in `pynxtools`:

```console
user@box:~$ dataconverter $<xps-file path> $<xps-file path> $<eln-file path> --reader xps --nxdl NXxps --output <output-file path>.nxs
```
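
If the conversion should be started from a Python script rather than typed by hand, the command line can be assembled as a list of arguments. This is a minimal sketch: the helper `build_dataconverter_command` is hypothetical, and it only uses the `--reader`, `--nxdl`, and `--output` options shown in the commands in this documentation. Actually running the command requires `pynxtools` and the XPS reader plugin to be installed.

```python
import shlex


def build_dataconverter_command(
    input_files: list[str],
    output: str,
    reader: str = "xps",
    nxdl: str = "NXxps",
) -> list[str]:
    """Assemble the argument list for a dataconverter call."""
    if not input_files:
        raise ValueError("at least one input file is required")
    return [
        "dataconverter",
        *input_files,
        "--reader", reader,
        "--nxdl", nxdl,
        "--output", output,
    ]


command = build_dataconverter_command(
    ["irregular.vms", "eln_data_vms.yaml"], "irregular.vms.nxs"
)
# Show the command as it would be typed in a shell.
print(shlex.join(command))
# To execute it (needs pynxtools and pynxtools-xps installed):
# import subprocess; subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting issues with file paths that contain spaces.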
@@ -31,9 +34,10 @@ Note that none of the supported file format have data/values for all required an

You can find examples how to use `pynxtools-xps` for your XPS research data pipeline in [`src/pynxtools-xps/nomad/examples`](../../src/pynxtools_xps/nomad/examples/). These are designed for working with [`NOMAD`](https://nomad-lab.eu/) and its [`NOMAD Remote Tools Hub (NORTH)`](https://nomad-lab.eu/prod/v1/gui/analyze/north). Feel invited to try out the respective tutorial [here](tutorial/nomad.md).

-There are also small example files with raw and converted data for using the `pynxtools` dataconverter with the `mpes` reader and the `NXmpes` application definition in the [`examples`](../../examples/) folder.
+There are also small example files with raw and converted data for using the `pynxtools` dataconverter with the `mpes` reader and the `NXmpes` application definition in the [`examples`](https://github.com/FAIRmat-NFDI/pynxtools-xps/tree/main/examples/) folder.

-For this tutorial, we will work with the example data for the VAMAS reader (see here [](../../examples/vms/)). You can run the conversion as
+For this tutorial, we will work with the example data for the VAMAS reader (see [here](../reference/vms.md)). You can run the conversion as
+
```shell
dataconverter \\
--reader xps \\
@@ -46,4 +50,4 @@ dataconverter \\

TODO: add more steps! <!--[The Jupyter notebook is available here](https://github.com/FAIRmat-NFDI/pynxtools-em/blob/main/examples/HowToUseTutorial.ipynb) TODO!-->

-**Congrats! You now have a FAIR NeXus file!**
+**Congrats! You now have a FAIR NeXus file!**