Upgrading to v0.5

SEGY-SAK underwent major redevelopment from v0.4.x to v0.5. The changes were necessary to introduce improvements to the overall SEGY-SAK experience and to bring SEGY-SAK closer to the Xarray ecosystem. This section offers a guide for upgrading your existing use of SEGY-SAK to the new API and discusses some of the advantages of the new approach.

Tip

If you have feedback or encounter issues with the new functionality of SEGY-SAK, the old functionality remains in place (with small changes) for backwards compatibility, but it is deprecated. The old API is problematic to maintain and will be removed in a future release.

Please submit any problems to the GitHub Issue Tracker.

Replacing segy_loader

The segy_loader method in SEGY-SAK was designed before Xarray supported custom backends. The introduction of the backend API in Xarray offered an opportunity to significantly improve the way SEGY-SAK handles and interacts with SEG-Y files. The new approach introduces:

  • Lazy loading of SEG-Y trace data (headers are still loaded eagerly to determine the geometry).
  • Better integration with other Xarray methods, including streaming output to NetCDF, Zarr, ZGY or other Xarray supported formats.
  • Better large file support (reduced memory footprint and lazy loading of data).
  • Speed improvements for header scanning and loading.
  • Greater maintainability and usability through a geometry agnostic loading approach.
  • Support for open_mfdataset, such as in cases where gathers are split into inline files.
Python
# v0.5: load through the Xarray backend
import xarray as xr
segy = xr.open_dataset(
    segy_file,
    dim_byte_fields={"iline":189, "xline":193},
    extra_byte_fields={"cdp_x":181, "cdp_y":185}
)
Python
# v0.4.x: deprecated segy_loader
from segysak.segy import segy_loader
segy = segy_loader(
    segy_file,
    iline=189, xline=193,
    cdpx=181, cdpy=185
)

After loading SEG-Y data with the new loading tool you will notice some incompatibilities with the older SEISNC standard. These are covered in the sections on replacing the .seis accessor (Replacing ds.seis) and on the changes to SEISNC.

Coordinate scaling is no longer applied by default to loaded SEG-Y data. The user can scale the data (if the coordinate scalar was set in the SEG-Y headers) or provide the scalar manually.

Python
# scale using the coordinate scalar set in the SEG-Y headers
segy.segysak.scale_coords()
# or provide the scalar manually
segy.segysak.scale_coords(coord_scalar=-100)
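
To illustrate what a coordinate scalar such as -100 means, the following is a minimal sketch of the SEG-Y convention for the coordinate scalar header word: a positive value is a multiplier and a negative value is a divisor. The function name apply_coord_scalar and the sample value are hypothetical and are not part of SEGY-SAK; treating a scalar of zero as "no scaling" is an assumption made for this example.

```python
def apply_coord_scalar(value, coord_scalar):
    """Apply a SEG-Y style coordinate scalar to a raw header value.

    Hypothetical helper for illustration only (not a SEGY-SAK function).
    """
    if coord_scalar == 0:
        # Assumption: a scalar of zero means the value is left unscaled.
        return float(value)
    if coord_scalar < 0:
        # Negative scalars are divisors.
        return value / abs(coord_scalar)
    # Positive scalars are multipliers.
    return float(value * coord_scalar)


raw_cdp_x = 43640052  # made-up integer header value
scaled_cdp_x = apply_coord_scalar(raw_cdp_x, -100)
print(scaled_cdp_x)
```

With a scalar of -100, integer header values are interpreted as coordinates with two decimal places.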

Replacing segy_writer

The segy_writer functionality has similarly been updated to better align with Xarray. The new methodology introduces:

  • Native support for streamed writing via Xarray chunks. Specifically, chunking is supported for all dimensions, including trace samples.
  • Access via the Xarray .seisio accessor.
  • Dimensionality agnostic writing.
  • Dead trace skipping on write.

A vertical dimension is required and its key can be specified using the vert_dimension keyword argument. Arbitrary key names for the vertical dimension are possible. Each orthogonal dimension of the trace data to be exported is then specified as an additional keyword argument that maps the dimension name to a header byte location (e.g. iline and xline in this case), with one or more horizontal dimensions required. Additional variables can be exported to the headers of each trace using the trace_header_map dictionary, which maps variable names to byte locations.

Python
# v0.5: write via the .seisio accessor
dataset.seisio.to_segy(
    "out.segy",
    trace_header_map={"cdp_x":73, "cdp_y":77},
    iline=189, xline=193,
    vert_dimension='samples'
)
Python
# v0.4.x: deprecated segy_writer
from segysak.segy import segy_writer
segy_writer(
    dataset,
    "out.segy",
    trace_header_map=dict(iline=5, xline=21)
)

Replacing ds.seisio.to_netcdf and open_seisnc

Changes to the SEISNC format have removed the need for a special implementation of to_netcdf and improve SEGY-SAK's compatibility with other Xarray supported backends.

Python
# v0.5: standard Xarray method
dataset.to_netcdf('outfile.nc')
Python
# v0.4.x: deprecated accessor method
dataset.seisio.to_netcdf('outfile.nc')

Similarly, open_seisnc is now deprecated. Datasets saved with the standard ds.to_netcdf method can now be opened using Xarray directly.

Python
# v0.5: open with Xarray directly
import xarray as xr
dataset = xr.open_dataset('outfile_new.nc')
Python
# v0.4.x: deprecated open_seisnc
from segysak import open_seisnc
dataset = open_seisnc('outfile_old.nc')

Replacing ds.seis

SEGY-SAK implemented a custom accessor namespace in <=0.4.x called seis. This has been deprecated in favour of a segysak namespace. The segysak interface removes a lot of legacy implementation that tied SEGY-SAK users to specific dimension names and keywords. While SEGY-SAK still supports that approach, the user now has more flexibility: SEGY-SAK strives to be geometry/convention agnostic and relies on the user to understand their data. Doing this simplifies development and improves the generalisability of SEGY-SAK.

Tip

Data loaded using SEGY-SAK conventions still requires minimal input from the user on dimension and coordinate names.

Major changes:

  • The segysak accessor is available on DataArray and Dataset objects.
  • Set and get dimension and coordinate names using ds.segysak.set_dimensions, ds.segysak.get_dimensions, ds.segysak.set_coords and ds.segysak.get_coords.
  • Store seisnc attributes on DataArray and Dataset objects. This allows multiple DataArrays to be combined (e.g. time and depth data) into a single Dataset.
  • Affine transform updated to use LSQ solver.
  • Fixes/removes bugs with fill_cdpna and percentiles.
  • The arguments and returned values have changed for some accessor methods. Refer to the reference for more details.

Example with calc_corner_points

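Python
# v0.5: the corner points are returned directly
corner_points = dataset.segysak.calc_corner_points()
Python
# v0.4.x: the corner points are stored in the dataset attributes
dataset.seis.calc_corner_points()
corner_points = dataset.attrs['corner_points_xy']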

Changes to SEISNC

SEGY-SAK no longer assumes any details about the data units or vertical dimension. As such, the vertical dimension is now loaded as samples by default.

Previously, SEG-Y attributes would be stored in the Xarray Dataset.attrs dictionary. This caused ambiguity where attributes were more suited to a sub-array of the Dataset and created unnecessary complexity for storing objects that were not compatible with NetCDF (e.g. lists and custom attributes like the text header). To overcome these difficulties SEGY-SAK has introduced a custom attribute to Xarray called seisnc which is stored in Dataset.attrs['seisnc'] or DataArray.attrs['seisnc'] as a JSON string. The use of JSON allows for the storage of more complicated objects natively in NetCDF.

SEGY-SAK accessors take care of serialising and deserialising the data to and from the JSON string. Any JSON compatible object is supported (lists, dictionaries, ints, floats, strings). The SEISNC attributes are set and accessed using the accessor namespace.

Python
> ds.seisnc['an_attribute'] = 'my attr'
> ds.seisnc['an_attribute']
'my attr'
> ds.seisnc.attrs
{'an_attribute':'my attr'}
> ds.attrs
{'seisnc':'{"an_attribute":"my attr"}'}

Certain keys are still reserved for the SEISNC standard. However, some will be deprecated in future versions.
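
The JSON round trip described in this section can be sketched with the Python standard library alone. This is an illustration of the idea and not SEGY-SAK's implementation: the helper names set_seisnc_attr and get_seisnc_attr and the example_list key are hypothetical, and a plain dictionary stands in for Dataset.attrs.

```python
import json


def set_seisnc_attr(attrs, key, value):
    """Store value under key inside the JSON string held at attrs['seisnc']."""
    seisnc = json.loads(attrs.get("seisnc", "{}"))  # deserialise existing attributes
    seisnc[key] = value
    attrs["seisnc"] = json.dumps(seisnc)  # serialise back to a single string


def get_seisnc_attr(attrs, key):
    """Read key back out of the JSON string held at attrs['seisnc']."""
    return json.loads(attrs.get("seisnc", "{}"))[key]


attrs = {}  # stands in for Dataset.attrs (assumption for this sketch)
set_seisnc_attr(attrs, "an_attribute", "my attr")
set_seisnc_attr(attrs, "example_list", [[0, 0], [0, 10], [10, 10]])

print(attrs["seisnc"])
print(get_seisnc_attr(attrs, "example_list"))
```

Because everything is held in one string attribute, nested lists and dictionaries survive a save to NetCDF, which would otherwise not accept them as attribute values.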

Changes to the old API

The only breaking change to the old API in v0.5 is the removal of the silent argument from all functions. The old API made use of tqdm to post information about slow loading processes. This functionality is carried forward, but control of tqdm is handled by a new library class, Progress. This allows all tqdm arguments to be set at a session level.

For instance, to disable progress reporting for all subsequent SEGY-SAK commands:

Python
from segysak.progress import Progress
from segysak.segy import segy_header_scrape
Progress.set_defaults(disable=True)
# silent keyword argument replaced by Progress.set_defaults(disable=True) ^^^
scrape = segy_header_scrape("data/volve10r12-full-twt-sub3d.sgy")
Python
# v0.4.x: silent keyword argument (removed in v0.5)
from segysak.segy import segy_header_scrape
scrape = segy_header_scrape(
    "data/volve10r12-full-twt-sub3d.sgy",
    silent=True
)
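
The design choice behind session-level defaults can be sketched as follows. This is a hypothetical stand-in and not the SEGY-SAK Progress class: SessionProgress, its options method and slow_task are names invented for this example. Defaults live on the class, so setting them once affects every later call that asks for progress options.

```python
class SessionProgress:
    """Hypothetical stand-in illustrating session-level progress defaults."""

    # Class-level defaults shared by every caller in the session.
    _defaults = {"disable": False}

    @classmethod
    def set_defaults(cls, **kwargs):
        """Update the shared defaults (e.g. disable=True)."""
        cls._defaults.update(kwargs)

    @classmethod
    def options(cls, **overrides):
        """Return the session defaults merged with per-call overrides."""
        return {**cls._defaults, **overrides}


def slow_task():
    # A real implementation might pass these options on to tqdm;
    # here they are returned so the effect of the defaults is visible.
    return SessionProgress.options(desc="scanning")


SessionProgress.set_defaults(disable=True)
opts = slow_task()
print(opts)
```

Functions no longer need their own silent argument because they read the shared defaults when they build a progress bar.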

Removing ZGY

SEGY-SAK has removed ZGY tools which have been migrated to and improved in the PyZGY package. Please use the ZGY tools and new Xarray backend provided by PyZGY.

Until a new release is created, please install PyZGY from GitHub:

Bash
pip install git+https://github.com/equinor/pyzgy.git
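Python
# v0.5: use the Xarray backend and accessor provided by PyZGY
import xarray as xr

# open a zgy file
zgy = xr.open_dataset('zgyfile.zgy')

# write to a zgy file
zgy.pyzgy.to_zgy('out_zgyfile.zgy')
Python
# v0.4.x: ZGY tools removed from SEGY-SAK in v0.5
from segysak.openzgy import zgy_loader, zgy_writer
ds_zgy = zgy_loader("zgyfile.zgy")

zgy_writer(ds_zgy, "out_zgyfile.zgy")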