SEG-Y to Vector DataFrames and Back

Because segysak is built on xarray, vectorising 3D SEG-Y data and returning it to SEG-Y is straightforward. The process relies on the close relationship between pandas and xarray.

Loading Data

We start by loading the data with xarray's open_dataset, passing the SEG-Y byte locations of the dimension and coordinate headers. For this example we will use the Volve example sub-cube.

In [2]:

import pathlib
import xarray as xr
from IPython.display import display

volve_3d_path = pathlib.Path("data/volve10r12-full-twt-sub3d.sgy")
print("3D", volve_3d_path.exists())

volve_3d = xr.open_dataset(volve_3d_path, dim_byte_fields={'iline': 5, 'xline': 21}, extra_byte_fields={'cdp_x': 73, 'cdp_y': 77})

3D True

Vectorisation

Once the data is loaded, it can be converted to a pandas.DataFrame directly from the loaded Dataset. The DataFrame has a MultiIndex and contains a column for each variable in the originally loaded dataset: the seismic amplitude as data, plus the cdp_x and cdp_y locations. If you only need part of the input volume, you can use xarray selection methods before converting to a DataFrame.

In [3]:

volve_3d_df = volve_3d.to_dataframe()
display(volve_3d_df)

			cdp_x	cdp_y	data
iline	xline	samples
10090	2150	4.0	43640052	647744704	0.020575
		8.0	43640052	647744704	0.022041
		12.0	43640052	647744704	0.019659
		16.0	43640052	647744704	0.025421
		20.0	43640052	647744704	0.025436
...	...	...	...	...	...
10150	2351	3384.0	43414413	647878266	0.000000
		3388.0	43414413	647878266	0.000000
		3392.0	43414413	647878266	0.000000
		3396.0	43414413	647878266	0.000000
		3400.0	43414413	647878266	0.000000

10473700 rows × 3 columns
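The selection step mentioned above can be sketched as follows. This is a minimal illustration on a small synthetic cube rather than the Volve data; the variable names `cube` and `subset` are hypothetical, and the dimension names are assumed to match the DataFrame index shown above.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a loaded SEG-Y cube (assumed dims: iline, xline, samples).
rng = np.random.default_rng(0)
cube = xr.Dataset(
    {
        "data": (
            ("iline", "xline", "samples"),
            rng.normal(size=(5, 6, 10)).astype("float32"),
        )
    },
    coords={
        "iline": np.arange(10090, 10095),
        "xline": np.arange(2150, 2156),
        "samples": np.arange(4.0, 44.0, 4.0),
    },
)

# Label-based selection; slices in .sel include both end points.
subset = cube.sel(
    iline=slice(10091, 10093),
    xline=slice(2150, 2152),
    samples=slice(8.0, 24.0),
)

# Only the selected region is expanded into rows.
subset_df = subset.to_dataframe()
print(subset_df.shape)
```

Selecting first keeps the DataFrame small, since every sample of every trace becomes one row.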

We can remove the multi-index by resetting the index of the DataFrame. Vectorised workflows such as machine learning can then be applied to the DataFrame directly.

In [4]:

volve_3d_df_reindex = volve_3d_df.reset_index()
display(volve_3d_df_reindex)

	iline	xline	samples	cdp_x	cdp_y	data
0	10090	2150	4.0	43640052	647744704	0.020575
1	10090	2150	8.0	43640052	647744704	0.022041
2	10090	2150	12.0	43640052	647744704	0.019659
3	10090	2150	16.0	43640052	647744704	0.025421
4	10090	2150	20.0	43640052	647744704	0.025436
...	...	...	...	...	...	...
10473695	10150	2351	3384.0	43414413	647878266	0.000000
10473696	10150	2351	3388.0	43414413	647878266	0.000000
10473697	10150	2351	3392.0	43414413	647878266	0.000000
10473698	10150	2351	3396.0	43414413	647878266	0.000000
10473699	10150	2351	3400.0	43414413	647878266	0.000000

10473700 rows × 6 columns
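As an illustration of what a vectorised workflow on the flat table might look like, the sketch below derives two new columns with plain pandas. The table is synthetic, and the column names `abs_amp` and `trace_rms` are hypothetical; they are not produced by segysak.

```python
import numpy as np
import pandas as pd

# Small synthetic table in the same flat layout as the output above.
flat = pd.DataFrame(
    {
        "iline": [10090] * 4 + [10091] * 4,
        "xline": [2150, 2150, 2151, 2151] * 2,
        "samples": [4.0, 8.0] * 4,
        "data": [3.0, 4.0, 0.0, 0.0, 1.0, 1.0, 6.0, 8.0],
    }
)

# Column-wise arithmetic acts on every sample at once.
flat["abs_amp"] = flat["data"].abs()

# Per-trace RMS amplitude, broadcast back onto each sample row of that trace.
flat["trace_rms"] = flat.groupby(["iline", "xline"])["data"].transform(
    lambda s: np.sqrt((s**2).mean())
)

print(flat)
```

Any column added this way travels with the DataFrame and becomes an extra variable if the table is later converted back to a Dataset.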

Return to Xarray

It is possible to return the DataFrame to a Dataset for output to SEG-Y. To do this, the multi-index must first be restored with set_index; pandas then provides the to_xarray method to perform the conversion.

In [5]:

volve_3d_df_multi = volve_3d_df_reindex.set_index(["iline", "xline", "samples"])
display(volve_3d_df_multi)
volve_3d_ds = volve_3d_df_multi.to_xarray()
display(volve_3d_ds)

			cdp_x	cdp_y	data
iline	xline	samples
10090	2150	4.0	43640052	647744704	0.020575
		8.0	43640052	647744704	0.022041
		12.0	43640052	647744704	0.019659
		16.0	43640052	647744704	0.025421
		20.0	43640052	647744704	0.025436
...	...	...	...	...	...
10150	2351	3384.0	43414413	647878266	0.000000
		3388.0	43414413	647878266	0.000000
		3392.0	43414413	647878266	0.000000
		3396.0	43414413	647878266	0.000000
		3400.0	43414413	647878266	0.000000

10473700 rows × 3 columns

The resulting dataset requires some changes to make it compatible again for export to SEG-Y. First, the attributes need to be set. The simplest way is to copy these from the original SEG-Y input; otherwise they can be set manually. segysak specifically needs the sample_rate and the coord_scalar attributes.

In [6]:

volve_3d_ds.attrs = volve_3d.attrs
display(volve_3d_ds.attrs)

{'seisnc': '{"coord_scalar": -100.0, "coord_scaled": false}'}
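If the original Dataset is not available, the attributes could be set by hand. The sketch below only mirrors the structure of the output displayed above (a JSON string stored under the `seisnc` key); that layout is an assumption taken from this output rather than from the segysak documentation, and the values would need to match your own data.

```python
import json

import xarray as xr

# Empty placeholder standing in for the Dataset rebuilt from the DataFrame.
ds = xr.Dataset()

# Assumed layout: JSON-encoded metadata under the "seisnc" attribute,
# copied from the attrs output shown above.
ds.attrs["seisnc"] = json.dumps({"coord_scalar": -100.0, "coord_scaled": False})

print(ds.attrs)
```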

The cdp_x and cdp_y positions must be reduced to 2D by dropping the vertical axis (the "samples" dimension in this example) and then set as coordinates.
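No cell for this step is shown here, so the following is one possible sketch using standard xarray calls on a small synthetic round trip. It assumes the vertical dimension is named `samples`, as in the DataFrame index above, and that cdp_x and cdp_y are constant down each trace, so taking the first sample loses nothing.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Tiny synthetic table standing in for the vectorised cube.
index = pd.MultiIndex.from_product(
    [[10090, 10091], [2150, 2151, 2152], [4.0, 8.0, 12.0, 16.0]],
    names=["iline", "xline", "samples"],
)
df = index.to_frame(index=False)
df["cdp_x"] = df["iline"] * 10 + df["xline"]  # constant along each trace
df["cdp_y"] = df["iline"] * 20 - df["xline"]
df["data"] = np.arange(len(df), dtype="float32")

# Every column comes back as a 3D variable (iline, xline, samples).
ds = df.set_index(["iline", "xline", "samples"]).to_xarray()

# Keep one value per trace by taking the first sample and dropping that axis.
for name in ("cdp_x", "cdp_y"):
    ds[name] = ds[name].isel(samples=0, drop=True)

# Promote the 2D positions from data variables to coordinates.
ds = ds.set_coords(["cdp_x", "cdp_y"])
print(ds)
```

Taking the first sample is only one choice; a reduction such as the mean along the vertical dimension would give the same result when the positions really are constant down each trace.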

Afterwards, use the to_segy method as normal to return to SEG-Y.

In [9]:

volve_3d_ds.seisio.to_segy("data/test.segy", iline=189, xline=193, trace_header_map={'cdp_x':181, 'cdp_y':185})

Very large datasets

If you have a very large dataset (SEG-Y file), it may be possible to use ds.to_dask_dataframe(), which can perform operations, including writing data, lazily.