retentioneering.preparing package
Submodules
retentioneering.preparing.preparing module
-
class retentioneering.preparing.preparing.SessionSplitter(n_components)
Bases: object
Class for splitting event data into sessions.
-
add_session_column(df, thr, sort)
Creates a column with the session rank.
Parameters: - df (pd.DataFrame) – DataFrame with columns corresponding to Event Name, Event Timestamp and User ID
- thr (float) – threshold on the time since the previous event that marks a session interruption
- sort (bool) – whether sorting by User ID and Event Timestamp is required
Returns: input data with a session column
Return type: pd.DataFrame
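The idea of a session rank can be sketched in plain pandas. This is a minimal illustration, not the library's implementation: the column names, the threshold value, and the choice to start a new session when the gap is strictly greater than thr are all assumptions.

```python
import pandas as pd

# Hypothetical column names and data; timestamps are plain seconds.
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "event_timestamp": [0, 30, 4000, 10, 25],
})
thr = 1800  # assumed threshold: 30 minutes of inactivity

events = events.sort_values(["user_id", "event_timestamp"])
gap = events.groupby("user_id")["event_timestamp"].diff()

# A new session starts at a user's first event or after a gap above thr.
new_session = gap.isna() | (gap > thr)

# Running count of session starts per user gives the session rank.
events["session"] = new_session.astype(int).groupby(events["user_id"]).cumsum()
```

Here user 1 gets two sessions because the third event arrives 3970 seconds after the second.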
-
add_time_from_prev_event(df, unit=None, delta_unit='s')
Adds a column with the time elapsed since the previous event.
Parameters: - df (pd.DataFrame) – DataFrame with columns corresponding to Event Name, Event Timestamp and User ID
- unit (str) – unit string for pandas datetime parsing
- delta_unit (str) – step in timestamp column (e.g. seconds from 01-01-1970)
Returns: input data with a from_prev_event column
Return type: pd.DataFrame
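A rough pandas equivalent of a time-from-previous-event column is shown below. It is a sketch under assumptions: the column names are hypothetical, timestamps are integer seconds since the epoch, and the result is expressed in seconds.

```python
import pandas as pd

# Hypothetical column names; the library takes the real ones from its config.
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "event_name": ["open", "view", "buy", "open", "view"],
    "event_timestamp": [0, 30, 4000, 10, 25],  # seconds since 01-01-1970
})

# unit="s" tells pandas the integers are seconds since the epoch.
events["event_timestamp"] = pd.to_datetime(events["event_timestamp"], unit="s")
events = events.sort_values(["user_id", "event_timestamp"])

# Time since the same user's previous event, in seconds.
# The first event of each user has no predecessor, so its value is NaN.
events["from_prev_event"] = (
    events.groupby("user_id")["event_timestamp"].diff().dt.total_seconds()
)
```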
-
fit(df, columns_config, unit=None, delta_unit='s')
Fits a Gaussian mixture model used to determine the session threshold.
Parameters: - df – DataFrame with columns corresponding to Event Name, Event Timestamp and User ID
- columns_config – Dictionary that maps to the required column names: {'event_name_col': Event Name Column, 'event_timestamp_col': Event Timestamp Column, 'user_id_col': User ID Column}
- unit – unit string for pandas datetime parsing
- delta_unit – step in timestamp column (e.g. seconds from 01-01-1970)
Returns: None
-
get_threshold(thr=0.95, thrs=array([1.00000e+00, 1.25000e+00, 1.50000e+00, ..., 2.99950e+03, 2.99975e+03, 3.00000e+03]))
Finds the best threshold.
Parameters: - thr (float in interval (0, 1)) – Probability threshold for session interruption
- thrs (List[float] or iterable) – Timedelta values for threshold checking
Returns: value of threshold
Return type: float
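The default thrs value is printed as a truncated NumPy array. Judging only from the visible elements, it looks like an evenly spaced grid from 1 to 3000 in steps of 0.25; that reading is an assumption, and the sketch below simply builds such a grid, plus a coarser hypothetical one that could be passed instead.

```python
import numpy as np

# Assumption: the default grid runs from 1 to 3000 with a step of 0.25,
# which gives (3000 - 1) / 0.25 + 1 = 11997 candidate values.
thrs = np.linspace(1.0, 3000.0, 11997)

# A coarser, hypothetical grid: one candidate per minute up to one hour,
# assuming the timedeltas are measured in seconds.
coarse_thrs = np.arange(60, 3600 + 60, 60)
```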
-
predict(df, thr_prob=0.95, thrs=array([1.00000e+00, 1.25000e+00, 1.50000e+00, ..., 2.99950e+03, 2.99975e+03, 3.00000e+03]), sort=True)
Predicts sessions for the passed DataFrame.
Parameters: - df (pd.DataFrame) – DataFrame with columns corresponding to Event Name, Event Timestamp and User ID
- thr_prob (float) – Probability threshold for session interruption
- thrs (List[float] or iterable) – Timedelta values for threshold checking
- sort (bool) – If sorting by User ID & Event Timestamp is required
Returns: DataFrame with predicted sessions
Return type: pd.DataFrame
-
-
retentioneering.preparing.preparing.add_first_and_last_events(df, first_event_name='fisrt_event', last_event_name='last_event')
For every user and session, adds a first event named first_event_name and a last event named last_event_name.
Parameters: - df (pd.DataFrame) – input DataFrame
- first_event_name (str) – name of the first event
- last_event_name (str) – name of the last event
Returns: input data with the first and last events added
Return type: pd.DataFrame
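The effect of adding boundary events can be illustrated with plain pandas. The sketch below is not the library's code: column names are hypothetical, event names are passed explicitly, and placing the synthetic events at the session's earliest and latest timestamps is an assumption.

```python
import pandas as pd

# Hypothetical column names and data.
events = pd.DataFrame({
    "user_id": [1, 1, 2],
    "session": [1, 1, 1],
    "event_name": ["open", "buy", "open"],
    "event_timestamp": [0, 30, 10],
})

keys = ["user_id", "session"]
grouped = events.groupby(keys)["event_timestamp"]

# One synthetic row per (user, session) at the earliest and latest timestamps.
first = grouped.min().reset_index().assign(event_name="first_event")
last = grouped.max().reset_index().assign(event_name="last_event")

# A helper rank keeps the markers before/after real events with equal timestamps.
parts = [first.assign(_rank=0), events.assign(_rank=1), last.assign(_rank=2)]
result = (
    pd.concat(parts, ignore_index=True)
    .sort_values(keys + ["event_timestamp", "_rank"])
    .drop(columns="_rank")
    .reset_index(drop=True)
)
```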
-
retentioneering.preparing.preparing.add_lost_events(df, positive_event_name='passed', negative_event_name='lost', settings=None)
Adds new events named negative_event_name to the input DataFrame.
Parameters: - df (pd.DataFrame) – input pd.DataFrame
- positive_event_name (str) – positive event name
- negative_event_name (str) – negative event name which should be added if there is no positive event in the session
- settings (dict) – config dict
Returns: input data with the negative events added
Return type: pd.DataFrame
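As an illustration of the rule "add the negative event when a session has no positive event", here is a minimal pandas sketch. The column names are hypothetical, and putting the added event at the end of the session is an assumption, not documented behaviour.

```python
import pandas as pd

# Hypothetical column names and data; user 2 never reaches 'passed'.
events = pd.DataFrame({
    "user_id": [1, 1, 2, 2],
    "session": [1, 1, 1, 1],
    "event_name": ["open", "passed", "open", "view"],
    "event_timestamp": [0, 30, 10, 25],
})
keys = ["user_id", "session"]

# Flag sessions that contain the positive event at least once.
has_positive = (
    events.assign(is_pos=events["event_name"].eq("passed"))
    .groupby(keys)["is_pos"]
    .any()
)

# For sessions without it, create a 'lost' row at the session's last timestamp.
last_ts = events.groupby(keys)["event_timestamp"].max()
lost = last_ts[~has_positive].reset_index().assign(event_name="lost")

# The helper rank places 'lost' after real events with the same timestamp.
result = (
    pd.concat([events.assign(_rank=0), lost.assign(_rank=1)], ignore_index=True)
    .sort_values(keys + ["event_timestamp", "_rank"])
    .drop(columns="_rank")
    .reset_index(drop=True)
)
```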
-
retentioneering.preparing.preparing.add_passed_event(df, positive_event_name='passed', filters=None, settings=None)
Adds new events named positive_event_name and deletes all events after them.
Parameters: - df (pd.DataFrame) – input pd.DataFrame
- positive_event_name (str) – name of the positive event which should be added when the filter conditions are met
- filters (dict) – dict with filter conditions
- settings (dict) – config dict
Returns: input data with the positive events added and later events removed
Return type: pd.DataFrame
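The "add a positive event, then drop everything after it" behaviour might look roughly like the following in pandas. This is only a sketch: the boolean condition stands in for the filters argument, whose dict format is not described here, and keeping the matched event itself before the added one is an assumption.

```python
import pandas as pd

# Hypothetical column names and data.
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "event_name": ["open", "purchase", "view", "open"],
    "event_timestamp": [0, 30, 60, 10],
})

# Hypothetical condition standing in for the `filters` argument.
matches = events["event_name"].eq("purchase").astype(int)

# Number of matching events strictly before each row, per user.
seen_before = matches.groupby(events["user_id"]).cumsum() - matches

# Keep rows up to and including each user's first match.
kept = events[seen_before == 0]

# Add a 'passed' row right after each user's first matching event.
passed = events[(matches == 1) & (seen_before == 0)].assign(event_name="passed")
result = (
    pd.concat([kept.assign(_rank=0), passed.assign(_rank=1)], ignore_index=True)
    .sort_values(["user_id", "event_timestamp", "_rank"])
    .drop(columns="_rank")
    .reset_index(drop=True)
)
```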
-
retentioneering.preparing.preparing.drop_duplicated_events(df, duplicate_thr_time=0, settings=None)
Deletes duplicated events (two events with the same event name if the time between them is less than duplicate_thr_time).
Parameters: - df (pd.DataFrame) – input pd.DataFrame
- duplicate_thr_time (int) – threshold for time between events
- settings (dict) – config dict
Returns: input data without duplicated events
Return type: pd.DataFrame
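The duplicate rule described above can be sketched with pandas as follows. Column names and the threshold are hypothetical, and comparing each event with the immediately preceding event of the same user (rather than with the last kept event) is an assumption.

```python
import pandas as pd

# Hypothetical column names and data; timestamps are plain seconds.
events = pd.DataFrame({
    "user_id": [1, 1, 1, 1],
    "event_name": ["view", "view", "view", "buy"],
    "event_timestamp": [0, 2, 60, 61],
})
duplicate_thr_time = 5  # assumed threshold, in seconds

events = events.sort_values(["user_id", "event_timestamp"])

# Previous event name and timestamp for the same user (NaN for the first row).
prev = events.groupby("user_id")[["event_name", "event_timestamp"]].shift()

# Duplicate: same name as the previous event and closer than the threshold.
is_dup = events["event_name"].eq(prev["event_name"]) & (
    (events["event_timestamp"] - prev["event_timestamp"]) < duplicate_thr_time
)
deduped = events[~is_dup].reset_index(drop=True)
```

Only the second "view" is dropped; the third one is 58 seconds after its predecessor and stays.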
-
retentioneering.preparing.preparing.filter_events(df, filters=[], settings=None)
Apply filters to the input table.
Parameters: - df (pd.DataFrame) – input pd.DataFrame
- filters (list) – list each element of which is a filter dict
- settings (dict) – config dict
Returns: filtered input data
Return type: pd.DataFrame
-
retentioneering.preparing.preparing.filter_users(df, filters=[], settings=None)
Apply filters to users from the input table and keep all events of the users that pass.
Parameters: - df (pd.DataFrame) – input pd.DataFrame
- filters (list) – list each element of which is a filter dict
- settings (dict) – config dict
Returns: pd.DataFrame
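The difference between filtering events and filtering users can be shown with a small pandas sketch. The filter dict format is not documented here, so a plain boolean condition is used instead; column names and data are hypothetical.

```python
import pandas as pd

# Hypothetical column names and data.
events = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "event_name": ["open", "buy", "open", "view", "open"],
})

# Event-level filtering: keep only the rows that match the condition.
buy_rows = events[events["event_name"] == "buy"]

# User-level filtering: find users with at least one matching row,
# then keep every event that belongs to those users.
buyers = buy_rows["user_id"].unique()
buyer_events = events[events["user_id"].isin(buyers)]
```

The event-level result contains a single "buy" row, while the user-level result keeps both events of user 1.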