(devguideplotdata)=

```{eval-rst}
.. currentmodule:: masci_tools.vis.data
```

# Using the {py:class}`PlotData` class

## Description

The {py:class}`PlotData` class simplifies supporting data to plotting functions in
multiple ways, while keeping the plotting functions themselves simple and easy to
understand.

The basic idea of {py:class}`PlotData` is to mimic the behaviour of the `data`
argument in [matplotlib] or the `source` argument in [bokeh]. Suppose we have our
data for a plot in a dictionary `d`, which has the keys `x`, `y1` and `y2`. If
we now want to plot both `y` keys against `x` we can do this in the following way.

```python
from masci_tools.vis.data import PlotData

plot_data = PlotData(d, x='x', y=['y1', 'y2'])

for entry, source in plot_data.items():
   #entry has the keys needed to get the data from the source
   #and source is the mapping to use

   print(entry.x, entry.y) #Yields x, y1 in the first loop and x, y2 in the second

   #Now we can plot the data
   #for example plt.plot(entry.x, entry.y, data=source)
```

The keys are automatically expanded to be of the same length, if this is possible.
There are three iteration modes, with the same names as for dicts:

:`keys`: Yields `namedtuple` with the keys for each plot
:`values`: Yields `namedtuple` with the values corresponding to the keys for each plot
:`items`: Yields the `keys` and their corresponding mapping for each plot

All of these functions have an argument `first`, which will only return the first element
if it is given as `True`.

:::{note}
The names `x` and `y` in the example above are completely arbitrary.
The names for the columns and the fields on the `namedtuple` are determined by the
keyword arguments given to {py:class}`PlotData` at initialization
:::

:::{note}
At the moment the types of mappings accepted in the {py:class}`PlotData` class are
limited to {py:class}`dict`, `pd.DataFrame` and `ColumnDataSource` ([bokeh]) objects
:::

## Initializing {py:class}`PlotData` without a mapping

Users might want to provide data directly as arrays. If this should be allowed, there
is a function {py:func}`process_data_arguments()` to allow for this option. This
function can either take a `data` argument with a mapping and the same keyword
arguments as the {py:class}`PlotData`.

```python
from masci_tools.vis.data import process_data_arguments

plot_data = process_data_arguments(data=d, x='x', y=['y1','y2'])
```

Or you can provide the arrays directly without a `data` argument

```python
from masci_tools.vis.data import process_data_arguments

#x,y1,y2 are the actual arrays
plot_data = process_data_arguments(x=x, y=[y1,y2])
```

If no `data` argument is given the keyword arguments are assumed to contain the data
and they will be processed according to three rules:

1. If the data is a multidimensional array (list of lists, etc.) and it is not
   forbidden by the given argument the first dimension of the array is iterated over
   and interpreted as separate entries (if the data was previously split up into multiple
   sets a length check is performed)
2. If the data is a one-dimensional array and of a different length than the number
   of defined data sets it is added to all previously existing entries
3. If the data is a one-dimensional array and of the same length as the number of
   defined data sets each entry is added to the corresponding data set

:::{note}
List or array in this context refers to `list`, `np.array` and `pd.Series`
:::

## Available routines on {py:class}`PlotData`

There are a couple of routines for mutating/copyying or getting information about the data
in a {py:class}`PlotData` instance. These are not meant to be used heavily and should be
used for typical simple work done for plot data processing, i.e. scaling, shifting,
getting limits, ...

:::{note}
The term data key in the following section refers to the keys of the keyword arguments
given to {py:class}`PlotData` at initialization or the fields on the namedtuples
returned by iterating over an instance
:::

- {py:meth}`PlotData.get_keys()`: Get all the keys for a given data key in a list
- {py:meth}`PlotData.get_values()`: Get all the values for a given data key in a list
- {py:meth}`PlotData.min()`: Get the minimum value for a given data key. A mask can be
  passed to further select the data. If `separate=True` is passed a list of minimum
  values for each plot is returned
- {py:meth}`PlotData.max()`: Get the maximum value for a given data key. A mask can be
  passed to further select the data. If `separate=True` is passed a list of maximum
  values for each plot is returned
- {py:meth}`PlotData.apply()`: Apply a lambda function to transform the data of a given data key (in-place!!)
- {py:meth}`PlotData.get_function_result()`: Apply a function to a given data key and return the results (Does not change the data)
- {py:meth}`PlotData.sort_data()`: Sort the data by the given data keys
- {py:meth}`PlotData.group_data()`: Group the data by the given data keys
- {py:meth}`PlotData.shift_data()`: Shift the data of a given data key either globally or with different shifts for each plot
- {py:meth}`PlotData.copy_data()`: Copy data to a of one data key to a new data key
- {py:meth}`PlotData.distinct_datasets()`: Return how many different datasets exist for a given data key

:::{warning}
The methods {py:meth}`PlotData.sort_data()` and {py:meth}`PlotData.group_data()` will
always convert the data sources to `pd.DataFrame` objects if they are not already.
:::

[bokeh]: https://docs.bokeh.org/en/latest/index.html
[matplotlib]: https://matplotlib.org/stable/index.html