Filters

padua.filters.filter_exclude(df, s)

Filter a DataFrame to exclude columns whose labels match a search for the string s.

Parameters:
  • df – Pandas DataFrame
  • s – string to search for; matching columns are excluded

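
A minimal pandas sketch of this behaviour, assuming that "matching" means the column label contains s as a substring. `filter_exclude_sketch` is a hypothetical name used for illustration; it is not padua's own implementation.

```python
import pandas as pd


def filter_exclude_sketch(df: pd.DataFrame, s: str) -> pd.DataFrame:
    """Drop every column whose label contains s (assumed substring semantics)."""
    # Boolean mask over the columns: True where the label does NOT contain s.
    keep = [s not in str(c) for c in df.columns]
    return df.loc[:, keep]


df = pd.DataFrame({
    "Intensity A": [1.0, 2.0],
    "Intensity B": [3.0, 4.0],
    "Proteins": ["P1", "P2"],
})
print(list(filter_exclude_sketch(df, "B").columns))  # ['Intensity A', 'Proteins']
```
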
padua.filters.filter_intensity(df, label='')

Filter to include only the Intensity columns with the optional specified label, excluding other Intensity measurements but retaining all other columns.

padua.filters.filter_intensity_lfq(df, label='')

Filter to include only the LFQ Intensity columns with the optional specified label, excluding other Intensity measurements but retaining all other columns.

padua.filters.filter_localization_probability(df, threshold=0.75)

Remove rows with a localization probability below threshold (default 0.75).

Return a DataFrame in which rows with a value < threshold (default 0.75) in the column ‘Localization prob’ are removed. This filters out poorly localized peptides (non-Class-I sites by default).

Parameters:
  • df – Pandas DataFrame
  • threshold – Cut-off below which rows are discarded (default 0.75)
Returns:

Pandas DataFrame
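
A minimal pandas sketch of the same filter, assuming the column is named exactly ‘Localization prob’ as stated above; `filter_localization_probability_sketch` is a hypothetical name, not padua's code. Rows are dropped only when the value is < threshold, so a value equal to the threshold is kept, and a row with a missing (NaN) probability is also kept because NaN < threshold evaluates to False.

```python
import pandas as pd


def filter_localization_probability_sketch(df, threshold=0.75):
    # Remove rows whose 'Localization prob' is strictly below the threshold.
    # Negating "< threshold" (rather than testing ">= threshold") keeps NaN rows.
    return df[~(df["Localization prob"] < threshold)]


sites = pd.DataFrame({
    "Proteins": ["P1", "P2", "P3"],
    "Localization prob": [0.99, 0.60, 0.75],
})
kept = filter_localization_probability_sketch(sites)
print(kept["Proteins"].tolist())  # ['P1', 'P3']
```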

padua.filters.filter_select_columns(df, columns)

Filter a DataFrame to include the specified columns, retaining any Intensity columns.
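
A minimal pandas sketch, assuming that "Intensity columns" are those whose label starts with "Intensity" and that the original column order is preserved. `filter_select_columns_sketch` is a hypothetical name, not padua's implementation.

```python
import pandas as pd


def filter_select_columns_sketch(df, columns):
    # Keep a column if it was requested or (assumption) its label starts
    # with "Intensity"; iterating over df.columns preserves the original order.
    keep = [c for c in df.columns if c in columns or str(c).startswith("Intensity")]
    return df[keep]


df = pd.DataFrame({
    "Proteins": ["P1"], "Gene names": ["G1"], "Score": [10.0],
    "Intensity A": [1.0], "Intensity B": [2.0],
})
print(list(filter_select_columns_sketch(df, ["Proteins"]).columns))
# ['Proteins', 'Intensity A', 'Intensity B']
```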

padua.filters.minimum_valid_values_in_any_group(df, levels=None, n=1, invalid=np.nan)

Filter a DataFrame to keep rows with at least n valid values in at least one group.

Taking a Pandas DataFrame with a MultiIndex column index, this removes rows in which no group has at least n valid values. Groups are defined by the levels parameter, which indexes into the column MultiIndex. For example, for a MultiIndex whose top and second levels are Group (A, B, C) and Replicate (1, 2, 3), levels=[0,1] would filter on n valid values per replicate. Alternatively, levels=[0] would filter on n valid values at the Group level only, e.g. A, B or C.

By default, invalid values are identified as np.nan; an alternative invalid value can be supplied via invalid.

Parameters:
  • df – Pandas DataFrame
  • levels – list of int specifying the levels of the column MultiIndex to group by
  • n – int, minimum number of valid values required
  • invalid – value treated as invalid (default np.nan)
Returns:

filtered Pandas DataFrame
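
A minimal pandas sketch of the rule described above: count valid cells per column group, then keep a row if its best group reaches n. It is an illustration, not padua's code; `minimum_valid_values_in_any_group_sketch` is a hypothetical name, and it requires levels explicitly rather than reproducing whatever default padua applies when levels=None.

```python
import numpy as np
import pandas as pd


def minimum_valid_values_in_any_group_sketch(df, levels, n=1, invalid=np.nan):
    """Keep rows where at least one column group has >= n valid values."""
    # Mark each cell as valid (True) or invalid (False).
    if pd.isna(invalid):
        valid = df.notna()
    else:
        valid = df != invalid
    # Count valid cells per group: transpose so the column levels become row
    # levels, group on the requested levels, sum, then transpose back.
    counts = valid.T.groupby(level=levels).sum().T
    # A row survives if its best group reaches the threshold n.
    keep = counts.max(axis=1) >= n
    return df[keep]


columns = pd.MultiIndex.from_product(
    [["A", "B"], [1, 2, 3]], names=["Group", "Replicate"]
)
df = pd.DataFrame(
    [
        [1.0, 2.0, 3.0, np.nan, np.nan, np.nan],     # A: 3 valid, B: 0
        [1.0, np.nan, np.nan, 4.0, np.nan, np.nan],  # A: 1 valid, B: 1
        [np.nan, np.nan, np.nan, 4.0, 5.0, np.nan],  # A: 0 valid, B: 2
    ],
    index=["r1", "r2", "r3"],
    columns=columns,
)
filtered = minimum_valid_values_in_any_group_sketch(df, levels=[0], n=2)
print(filtered.index.tolist())  # ['r1', 'r3']
```

Passing invalid=0 instead treats zeros, rather than NaN, as missing, mirroring the invalid parameter documented above.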

padua.filters.remove_columns_containing(df, column, match)

Return a DataFrame with the rows removed whose value in column contains match.

The values of the selected column of the supplied Pandas DataFrame are compared to match, and rows whose value contains it are removed from the DataFrame.

Parameters:
  • df – Pandas DataFrame
  • column – Column indexer
  • match – str match target
Returns:

Pandas DataFrame filtered
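
A minimal pandas sketch of the substring test, assuming "contains" means match occurs as a plain substring of the cell's string form. `remove_columns_containing_sketch` is a hypothetical name, not padua's implementation.

```python
import pandas as pd


def remove_columns_containing_sketch(df, column, match):
    # Keep rows whose value in `column` does NOT contain `match`.
    # astype(str) avoids errors on NaN or numeric cells; regex=False makes
    # the test a literal substring search.
    keep = ~df[column].astype(str).str.contains(match, regex=False)
    return df[keep]


df = pd.DataFrame({"Proteins": ["alpha;beta", "gamma", "beta"]})
print(remove_columns_containing_sketch(df, "Proteins", "beta")["Proteins"].tolist())
# ['gamma']
```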

padua.filters.remove_columns_matching(df, column, match)

Return a DataFrame with the rows removed whose value in column matches match.

The values of the selected column of the supplied Pandas DataFrame are compared to match, and rows whose value matches it are removed from the DataFrame.

Parameters:
  • df – Pandas DataFrame
  • column – Column indexer
  • match – str match target
Returns:

Pandas DataFrame filtered
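
A minimal pandas sketch, assuming "match" means exact equality (in contrast to the substring test of remove_columns_containing). `remove_columns_matching_sketch` is a hypothetical name, not padua's implementation.

```python
import pandas as pd


def remove_columns_matching_sketch(df, column, match):
    # Keep rows whose value in `column` is not equal to `match`.
    # Missing values (None/NaN) compare as "not equal", so they are kept.
    return df[df[column] != match]


df = pd.DataFrame({"Flag": ["+", None, "+"], "Proteins": ["P1", "P2", "P3"]})
print(remove_columns_matching_sketch(df, "Flag", "+")["Proteins"].tolist())
# ['P2']
```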

padua.filters.remove_contaminants(df)

Remove rows with a + in the ‘Contaminants’ column

Return a DataFrame in which rows with a “+” in the column ‘Contaminants’ are removed. Filters data to remove peptides matched as contaminants.

Parameters:
  • df – Pandas DataFrame
Returns:

filtered Pandas DataFrame

padua.filters.remove_only_identified_by_site(df)

Remove rows with a + in the ‘Only identified by site’ column

Return a DataFrame in which rows with a “+” in the column ‘Only identified by site’ are removed. Filters data to remove entries that were only identified by a modification site.

Parameters:
  • df – Pandas DataFrame
Returns:

filtered Pandas DataFrame

padua.filters.remove_potential_contaminants(df)

Remove rows with a + in the ‘Potential contaminant’ column

Return a DataFrame in which rows with a “+” in the column ‘Potential contaminant’ are removed. Filters data to remove peptides matched as potential contaminants.

Parameters:
  • df – Pandas DataFrame
Returns:

filtered Pandas DataFrame

padua.filters.remove_reverse(df)

Remove rows with a + in the ‘Reverse’ column.

Return a DataFrame in which rows with a “+” in the column ‘Reverse’ are removed. Filters data to remove peptides matched as reverse (decoy) sequences.

Parameters:
  • df – Pandas DataFrame
Returns:

filtered Pandas DataFrame

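
The “+”-flag filters described above (remove_contaminants, remove_only_identified_by_site, remove_potential_contaminants and remove_reverse) share one pattern: drop every row whose flag column holds a “+”. A minimal pandas sketch of that pattern, chained over several of the flag columns named above, follows; `remove_flagged_rows_sketch` is a hypothetical helper, not part of padua, and rows with an empty (NaN) flag are kept.

```python
import pandas as pd


def remove_flagged_rows_sketch(df, column):
    # Drop rows whose flag column holds "+"; empty (None/NaN) flags compare
    # as "not equal" and are therefore kept.
    return df[df[column] != "+"]


df = pd.DataFrame({
    "Proteins": ["P1", "P2", "P3", "P4"],
    "Reverse": [None, "+", None, None],
    "Potential contaminant": [None, None, "+", None],
    "Only identified by site": [None, None, None, "+"],
})
clean = df
for flag in ["Reverse", "Potential contaminant", "Only identified by site"]:
    clean = remove_flagged_rows_sketch(clean, flag)
print(clean["Proteins"].tolist())  # ['P1']
```
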
padua.filters.search(df, match, columns=['Proteins', 'Protein names', 'Gene names'])

Search for a given string in a set of columns in a processed DataFrame.

Returns a filtered DataFrame containing the rows where match is found in at least one of the columns.

Parameters:
  • df – Pandas DataFrame
  • match – str to search for in columns
  • columns – list of str, the columns in which to search for match
Returns:

filtered Pandas DataFrame
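
A minimal pandas sketch of the search, assuming a plain substring test and assuming that columns absent from the DataFrame are skipped rather than raising an error. `search_sketch` is a hypothetical name, not padua's implementation; a tuple default is used instead of a list to avoid a mutable default argument.

```python
import pandas as pd


def search_sketch(df, match, columns=("Proteins", "Protein names", "Gene names")):
    """Return rows where `match` occurs in at least one of `columns`."""
    # Start with no rows selected.
    mask = pd.Series(False, index=df.index)
    for col in columns:
        # Assumption: columns absent from df are skipped.
        if col in df.columns:
            # Literal substring test on each cell's string form, OR-ed into the mask.
            mask |= df[col].astype(str).str.contains(match, regex=False)
    return df[mask]


proteins = pd.DataFrame({
    "Proteins": ["P04637", "P38398", "Q00987"],
    "Gene names": ["TP53", "BRCA1", "MDM2"],
})
print(search_sketch(proteins, "BRCA")["Gene names"].tolist())  # ['BRCA1']
```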