Utils

padua.utils.build_combined_label(sl, idxs, sep=' ')

Generate a combined label from a list of indexes into sl, by joining them with sep (str).

Parameters:
    sl (dict of str) – Strings to combine
    idxs (list of sl keys) – Indexes into sl
    sep (str) – Separator used to join the strings
Returns:	str of combined label
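The behaviour described above can be illustrated with a minimal standalone sketch. This is not padua's implementation; the name `combined_label_sketch` and the sample labels are hypothetical, and the sketch assumes every key in idxs is present in sl.

```python
def combined_label_sketch(sl, idxs, sep=' '):
    """Join the strings in sl selected by idxs, in order, using sep."""
    return sep.join(sl[i] for i in idxs)


# Hypothetical labels, keyed by name.
labels = {'Group': 'Control', 'Replicate': 'R1', 'Timepoint': '24h'}
print(combined_label_sketch(labels, ['Group', 'Timepoint']))
print(combined_label_sketch(labels, ['Group', 'Replicate'], sep='_'))
```

The order of idxs determines the order of the parts in the label.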

padua.utils.calculate_s0_curve(s0, minpval, maxpval, minratio, maxratio, curve_interval=0.1)

Calculate s0 curve for volcano plot.

Taking a min and max p value, and a min and max ratio, calculate a smooth curve starting from parameter s0 in each direction.

The curve_interval parameter defines the smoothness of the resulting curve.

Parameters:
    s0 – float offset of curve from intersect
    minpval – float minimum p value
    maxpval – float maximum p value
    minratio – float minimum ratio
    maxratio – float maximum ratio
    curve_interval – float stepsize (smoothness) of curve generator
Returns:	x, y, fn: the x, y points of the curve, and the fn generator

padua.utils.chunks(seq, num)

Separate seq (np.array) into num series of as near as possible equal length.

Parameters:
    seq (np.array) – Sequence to split
    num (int) – Number of parts to split sequence into
Returns:	np.array of split parts
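As a rough illustration of near-equal splitting, here is a standalone sketch that works on any sliceable sequence. It is not padua's implementation: the name `chunks_sketch` is hypothetical, it returns a plain list of parts rather than an np.array, and giving the extra elements to the first parts is an assumption.

```python
def chunks_sketch(seq, num):
    """Split seq into num consecutive parts whose lengths differ by at most one."""
    base, extra = divmod(len(seq), num)
    parts = []
    start = 0
    for i in range(num):
        # The first `extra` parts each take one additional element.
        size = base + (1 if i < extra else 0)
        parts.append(seq[start:start + size])
        start += size
    return parts


print(chunks_sketch(list(range(10)), 3))
```

With 10 values and 3 parts, the part lengths come out as 4, 3 and 3.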

padua.utils.find_nearest_idx(array, value)

Parameters:	array – value –
Returns:

padua.utils.get_index_list(l, ms)

Parameters:	l – ms –
Returns:

padua.utils.get_protein_id(s)

Return a shortened string, split on spaces, underscores and semicolons.

Extract the first, highest-ranked protein ID from a string containing protein IDs in MaxQuant output format: e.g. P07830;P63267;Q54A44;P63268

Long names (containing species information) are eliminated (split on ‘ ‘) and isoforms are removed (split on ‘_’).

Parameters:	s (str or unicode) – protein IDs in MaxQuant format
Returns:	string
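The splitting described above can be sketched as follows. This is a standalone illustration rather than padua's code; the name `protein_id_sketch`, the order of the splits and the second sample string are assumptions.

```python
def protein_id_sketch(s):
    """Return the first ID in a semicolon-separated string, without long name or isoform suffix."""
    first = s.split(';')[0]        # keep the first (highest-ranked) entry
    first = first.split(' ')[0]    # drop any long name after a space
    return first.split('_')[0]     # drop any isoform suffix after an underscore


print(protein_id_sketch('P07830;P63267;Q54A44;P63268'))
# Hypothetical entry with an isoform suffix and a trailing name.
print(protein_id_sketch('P07830_2 Example name;P63267'))
```

Both calls print P07830.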

padua.utils.get_protein_id_list(df, level=0)

Return a complete list of shortform IDs from a DataFrame

Extract all protein IDs from a dataframe from multiple rows containing protein IDs in MaxQuant output format: e.g. P07830;P63267;Q54A44;P63268

Long names (containing species information) are eliminated (split on ‘ ‘) and isoforms are removed (split on ‘_’).

Parameters:
    df (pandas.DataFrame) – DataFrame
    level (int or str) – Level of DataFrame index to extract IDs from
Returns:	list of string ids

padua.utils.get_protein_ids(s)

Return a list of shortform protein IDs.

Extract all protein IDs from a string containing protein IDs in MaxQuant output format: e.g. P07830;P63267;Q54A44;P63268

Long names (containing species information) are eliminated (split on ‘ ‘) and isoforms are removed (split on ‘_’).

Parameters:	s (str or unicode) – protein IDs in MaxQuant format
Returns:	list of string ids
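For comparison with the single-ID case, a minimal sketch of extracting every ID from one string might look like this. The name `protein_ids_sketch` and the second sample string are hypothetical, and the sketch is an illustration of the description above, not padua's implementation.

```python
def protein_ids_sketch(s):
    """Return all IDs in a semicolon-separated string, each without long name or isoform suffix."""
    return [part.split(' ')[0].split('_')[0] for part in s.split(';')]


print(protein_ids_sketch('P07830;P63267;Q54A44;P63268'))
# Hypothetical entry mixing an isoform suffix and a trailing name.
print(protein_ids_sketch('P07830_2;P63267 Example name'))
```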

padua.utils.get_shortstr(s)

Return the first part of a string before a semicolon.

Extract the first, highest-ranked protein ID from a string containing protein IDs in MaxQuant output format: e.g. P07830;P63267;Q54A44;P63268

Parameters:	s (str or unicode) – protein IDs in MaxQuant format
Returns:	string

padua.utils.hierarchical_match(d, k, default=None)

Match a key against a dict, simplifying one element at a time.

Parameters:
    d (dict) – Dictionary to match against
    k – Key to match
    default – Value returned if no match is found
Returns:	hierarchically matched value or default
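One plausible reading of the matching rule is that a multi-part key is tried in full first, then with its last element dropped, and so on until a match is found. The sketch below implements that reading only; it is an assumption about the behaviour, and `hierarchical_match_sketch` and the sample dict are hypothetical.

```python
def hierarchical_match_sketch(d, k, default=None):
    """Look up k in d, dropping trailing key elements until a match is found."""
    if d is None:
        return default
    if not isinstance(k, (list, tuple)):
        k = [k]
    for n in range(len(k)):
        key = tuple(k[:len(k) - n])
        if len(key) == 1:
            key = key[0]  # a single remaining element is looked up as a plain key
        if key in d:
            return d[key]
    return default


# Hypothetical lookup table with keys of different depth.
colors = {('Control', 'R1'): 'red', 'Control': 'grey'}
print(hierarchical_match_sketch(colors, ('Control', 'R1', '24h')))
print(hierarchical_match_sketch(colors, ('Control', 'R2')))
print(hierarchical_match_sketch(colors, 'Treated', default='black'))
```

Under this reading, the three calls print red, grey and black respectively.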

padua.utils.qvalues(pv, m=None, verbose=False, lowmem=False, pi0=None)

Estimates q-values from p-values

Parameters:
    pv – p-values
    m – number of tests. If not specified, m = pv.size
    verbose – print verbose messages? (default False)
    lowmem – use memory-efficient in-place algorithm
    pi0 – if None, it is estimated as suggested in Storey and Tibshirani, 2003. For most GWAS this estimate is not necessary, since pi0 is extremely likely to be 1
Returns:
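To show the general idea of converting p-values to q-values, the sketch below uses the standard step-up rule q(i) = min(pi0 · m · p(i) / i, q(i+1)) on the sorted p-values. It is a simplified illustration, not padua's implementation: the name `qvalues_sketch` is hypothetical, pi0 is taken as given (defaulting to 1) rather than estimated, and the verbose and lowmem options are omitted.

```python
def qvalues_sketch(pv, m=None, pi0=1.0):
    """Estimate q-values for a sequence of p-values, with pi0 supplied by the caller."""
    pv = list(pv)
    if m is None:
        m = len(pv)
    # Indexes of the p-values in ascending order.
    order = sorted(range(len(pv)), key=lambda i: pv[i])
    qv = [0.0] * len(pv)
    prev = 1.0  # q-values are capped at 1
    # Walk from the largest p-value down so each q-value never exceeds the next one.
    for rank in range(len(pv), 0, -1):
        i = order[rank - 1]
        prev = min(pi0 * m * pv[i] / rank, prev)
        qv[i] = prev
    return qv


print(qvalues_sketch([0.5, 0.01]))
```

The result is returned in the same order as the input p-values.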