Using the PlotData class

Description

The PlotData class makes it easy to supply data to plotting functions in multiple ways, while keeping the plotting functions themselves simple and easy to understand.

The basic idea of PlotData is to mimic the behaviour of the data argument in matplotlib or the source argument in bokeh. Suppose we have our data for a plot in a dictionary d, which has the keys x, y1 and y2. If we now want to plot both y keys against x we can do this in the following way.

from masci_tools.vis.data import PlotData

plot_data = PlotData(d, x='x', y=['y1', 'y2'])

for entry, source in plot_data.items():
    # entry has the keys needed to get the data from the source
    # and source is the mapping to use

    print(entry.x, entry.y)  # Yields x, y1 in the first loop and x, y2 in the second

    # Now we can plot the data
    # for example plt.plot(entry.x, entry.y, data=source)

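The keys are automatically expanded to be of the same length, if this is possible. In the example above, the single key x is repeated so that it is paired with each of the two y keys. There are three iteration modes, with the same names as for dicts: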

keys:

Yields a namedtuple with the keys for each plot

values:

Yields a namedtuple with the values corresponding to the keys for each plot

items:

Yields the keys and their corresponding mapping for each plot

Each of these methods accepts an argument first. If first=True is given, only the first element is returned.
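To make the key expansion and the three iteration modes more concrete, here is a minimal sketch in plain Python. It does not use PlotData and is not its implementation. The helper expand_keys and the variables keys, values and items are hypothetical names that only mimic the behaviour described above, under the assumption that a single key is repeated for every plot.

```python
from collections import namedtuple


def expand_keys(**data_keys):
    """Hypothetical helper: expand the keys so every data key has one entry per plot."""
    as_lists = {name: key if isinstance(key, list) else [key] for name, key in data_keys.items()}
    num_plots = max(len(keys) for keys in as_lists.values())
    Entry = namedtuple('Entry', list(as_lists))

    entries = []
    for index in range(num_plots):
        fields = {}
        for name, keys in as_lists.items():
            if len(keys) == 1:
                fields[name] = keys[0]  # a single key is reused for every plot
            elif len(keys) == num_plots:
                fields[name] = keys[index]
            else:
                raise ValueError(f"Cannot expand '{name}' to {num_plots} entries")
        entries.append(Entry(**fields))
    return entries


d = {'x': [0, 1, 2], 'y1': [1, 2, 3], 'y2': [4, 5, 6]}
entries = expand_keys(x='x', y=['y1', 'y2'])

# Analogues of the three iteration modes
keys = entries                                                       # namedtuples of keys
values = [type(entry)(*(d[key] for key in entry)) for entry in entries]  # namedtuples of data
items = [(entry, d) for entry in entries]                            # (keys, mapping) pairs

for entry in keys:
    print(entry.x, entry.y)
```

The namedtuple fields take their names from the keyword arguments, which is why the same loop body works regardless of how many plots are defined.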

Note

The names x and y in the example above are completely arbitrary. The names for the columns and the fields on the namedtuple are determined by the keyword arguments given to PlotData at initialization.

Note

At the moment, the types of mappings accepted by the PlotData class are limited to dict, pd.DataFrame and ColumnDataSource (bokeh) objects.

Initializing PlotData without a mapping

Users might want to provide data directly as arrays. To allow this, use the function process_data_arguments(). It can take a data argument with a mapping, together with the same keyword arguments as PlotData:

from masci_tools.vis.data import process_data_arguments

plot_data = process_data_arguments(data=d, x='x', y=['y1', 'y2'])

Alternatively, you can provide the arrays directly without a data argument:

from masci_tools.vis.data import process_data_arguments

# x, y1 and y2 are the actual arrays
plot_data = process_data_arguments(x=x, y=[y1, y2])

If no data argument is given, the keyword arguments are assumed to contain the data, and they are processed according to three rules:

  1. If the data is a multidimensional array (list of lists, etc.) and splitting it up is not forbidden for the given argument, the first dimension of the array is iterated over and each element is interpreted as a separate entry. If the data was previously split up into multiple sets, a length check is performed.

  2. If the data is a one-dimensional array whose length differs from the number of defined data sets, it is added to all previously existing entries.

  3. If the data is a one-dimensional array whose length equals the number of defined data sets, each entry is added to the corresponding data set.

Note

List or array in this context refers to list, np.array and pd.Series.
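The three rules for arguments given without a mapping can be sketched in plain Python as follows. This is only an illustration and not the implementation of process_data_arguments(). The function distribute and the helper is_nested are hypothetical, only plain lists are handled, and it is an assumption of this sketch that the nested arguments are inspected first to fix the number of data sets.

```python
def is_nested(data):
    """True for a non-empty list whose elements are all lists ("multidimensional")."""
    return isinstance(data, list) and len(data) > 0 and all(isinstance(item, list) for item in data)


def distribute(**kwargs):
    """Hypothetical sketch: split keyword arguments into one dict per data set."""
    # Nested lists define the number of data sets, with a length check (rule 1)
    num_sets = None
    for name, data in kwargs.items():
        if is_nested(data):
            if num_sets is not None and num_sets != len(data):
                raise ValueError(f"'{name}' has {len(data)} entries, expected {num_sets}")
            num_sets = len(data)
    if num_sets is None:
        num_sets = 1

    datasets = [{} for _ in range(num_sets)]
    for name, data in kwargs.items():
        if is_nested(data):
            # Rule 1: iterate over the first dimension, one entry per data set
            for dataset, entry in zip(datasets, data):
                dataset[name] = entry
        elif len(data) == num_sets:
            # Rule 3: same length as the number of data sets, one element each
            for dataset, entry in zip(datasets, data):
                dataset[name] = entry
        else:
            # Rule 2: different length, the whole list is shared by all data sets
            for dataset in datasets:
                dataset[name] = data
    return datasets


result = distribute(x=[1, 2, 3], y=[[4, 5, 6], [7, 8, 9]], shift=[0.0, 1.0])
print(result)
```

Here y defines two data sets, x has a different length and is therefore shared by both, and shift has exactly two elements, so each data set receives one of them.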

Available routines on PlotData

There are a couple of routines for mutating, copying, or getting information about the data in a PlotData instance. These are not meant for heavy use; they cover the simple, typical tasks of plot data processing, e.g. scaling, shifting, getting limits, …

Note

The term data key in the following section refers to the keys of the keyword arguments given to PlotData at initialization, or equivalently the fields on the namedtuples returned by iterating over an instance.

  • PlotData.get_keys(): Get all the keys for a given data key in a list

  • PlotData.get_values(): Get all the values for a given data key in a list

  • PlotData.min(): Get the minimum value for a given data key. A mask can be passed to further select the data. If separate=True is passed a list of minimum values for each plot is returned

  • PlotData.max(): Get the maximum value for a given data key. A mask can be passed to further select the data. If separate=True is passed a list of maximum values for each plot is returned

  • PlotData.apply(): Apply a lambda function to transform the data of a given data key (this modifies the data in place)

  • PlotData.get_function_result(): Apply a function to a given data key and return the results (this does not change the data)

  • PlotData.sort_data(): Sort the data by the given data keys

  • PlotData.group_data(): Group the data by the given data keys

  • PlotData.shift_data(): Shift the data of a given data key either globally or with different shifts for each plot

  • PlotData.copy_data(): Copy the data of one data key to a new data key

  • PlotData.distinct_datasets(): Return how many different datasets exist for a given data key

Warning

The methods PlotData.sort_data() and PlotData.group_data() will always convert the data sources to pd.DataFrame objects if they are not already.
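Two distinctions recur in the routines listed above: a single global result versus one result per plot (separate=True), and operations that modify the data in place versus ones that only return results. The following plain-Python sketch illustrates both ideas on an ordinary dict. It does not call any PlotData method, and the variable names are chosen for illustration only.

```python
# Plain dict standing in for the data source; 'y1' and 'y2' are the keys of one data key
source = {'x': [0, 1, 2], 'y1': [3.0, 1.0, 2.0], 'y2': [5.0, 4.0, 6.0]}
y_keys = ['y1', 'y2']

# One minimum per plot (the idea behind separate=True) versus one global minimum
separate_min = [min(source[key]) for key in y_keys]
global_min = min(separate_min)

# Non-mutating: the results are returned and the source is left untouched
doubled = [[2 * value for value in source[key]] for key in y_keys]

# In place: shift the data with a different shift for each plot
shifts = [0.0, 10.0]
for key, shift in zip(y_keys, shifts):
    source[key] = [value + shift for value in source[key]]

print(separate_min, global_min)
print(source['y2'])
```

After the loop, the source itself holds the shifted values, whereas doubled was computed from the data as it was before the shift.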