Prasad Muley bio photo

Prasad Muley

24, Python Programmer, Photographer, Portraitist and Pro-Biker.

Email Twitter Instagram Github

I love python’s generators generator expression. I will explain why I love it. You must use it whenever you’ve to return list of items. I learned some interesting stuff about generators.

What is generators in Python

  • Generators are a special class of functions that simplify the task of writing iterators.
  • Any function containing yield keyword or generator expression is a generator function.

Lol. It’s a so typical and theortical definition of generators.

Examples

def yeild_keyword(number):
    ''' using yield keyword'''
    for num in range(number):
        yield num


def generator_expression(number):
    '''using gen expression'''
    return (num for num in range(number))

Here, when function yield_keyword gets invoked returns a generator object. I know you’re thinking that there is not a single return statement and how it returns value?

The yield keyword or generator expression is detected by Python’s bytecode compiler which compiles the function specially as result.

Difference betweeen generators and regular function

  • Regular functions compute a value and return it, but generators return an iterator that returns a stream of values.
  • Generators working memory doesn’t include all the inputs and outputs. So generators expression is faster than list comprehension when you’re computing on large data.

I guess theory is enough now. We’re going to use cProfile to analyze time and memory_profiler.

Using cProfile

Example: Calculate square of numbers

from cProfile import run

def square_by_list_comprehensions(list_items):
    """return square of each number
       using list comprehensions"""
    return [num*num for num in list_items]


if __name__ == '__main__':
    run('square_by_list_comprehensions(range(1,100000))')

Output:

$ python squares_by_list_comprehension.py 

         4 function calls in 0.008 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001    0.008    0.008 <string>:1(<module>)
        1    0.005    0.005    0.005    0.005 list_comprehension_cprofile.py:3(square_by_list_comprehensions)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.002    0.002    0.002    0.002 {range}

Example: Calculate square of numbers using generator

from cProfile import run

def square_by_generator(list_items):
    """return square of each number
       using generator expression"""
    # you can use generator expression
	# return (num for num in list_items)

    for num in list_items:
        yield num * num

if __name__ == '__main__':
    run('square_by_generator(range(1,100000))')

Output:

$ python square_using_generators.py

         4 function calls in 0.002 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001    0.002    0.002 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 yield_cprofile.py:3(square_by_generator)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.002    0.002    0.002    0.002 {range}

Here, generator function takes 0.002 seconds and list comprehension function takes 0.008 seconds.

Using memory_profiler

from memory_profiler import profile

@profile
def square_by_list_comprehensions(list_items):
    """return square of each number
       using list comprehensions"""
    return [num*num for num in list_items]


if __name__ == '__main__':
    square_by_list_comprehensions(range(1,10000))

Output:

$ python square_by_list_comprehesion.py 
/usr/local/lib/python2.7/dist-packages/memory_profiler.py:88: UserWarning: psutil module not found. memory_profiler will be slow
  warnings.warn("psutil module not found. memory_profiler will be slow")
Filename: square_by_list_comprehension.py

Line #    Mem usage    Increment   Line Contents
================================================
     3     13.9 MiB      0.0 MiB   @profile
     4                             def square_by_list_comprehensions(list_items):
     5                                 """return square of each number
     6                                    using list comprehensions"""
     7     14.3 MiB      0.4 MiB       return [num*num for num in list_items]

Using generators

from memory_profiler import profile

@profile
def square_by_generator(list_items):
    """return square of each number
       using generator expression"""
    for num in list_items:
        yield num * num
    # return (num * num for num in list_items)

if __name__ == '__main__':
    square_by_generator(range(1,10000))

Output:

$ python square_of_numbers_using_generators.py

Here, generators function didn’t use memory but regular function takes around 14MB memory in order to calculate square of numbers.

Generators vs Mongo’s MapReduce

MapReduce is being used to improve performance. Recently, A response from top_users function got broken although it has map reduce function. This function didn’t fetch record for last six month (40k records). So I decided to play with generators and check performance.

Using MapReduce and cProfile

from db.models import *
import logging
from utils import enable_console_logging
from datetime import datetime
from dateutil.relativedelta import relativedelta
from cProfile import runctx


log = logging.getLogger()
enable_console_logging(log)

def get_top_users(store_id, from_date, to_date, fetch_all=False):

    data = {'more': False, 'top_users': [], 'total': 0}
    points_status_list = [APPROVED, AUTO_APPROVED, REDEEMED,
                          DEDUCTED, EXPIRED, EXHAUSTED]
    top_users_map_f = '''
    function(){
    val = {
           awarded_points : (this.transaction_type == 'award' ? this.points:0),
           redeemed_points : (this.transaction_type == 'redeem' ? this.points:0),
	       first_name: this.user_first_name,
	       last_name: this.user_last_name,
	       user_id: this.user_id
	};
	if(this.transaction_type == 'deduct'){
	    val.awarded_points = 0-val.awarded_points;
        }
        emit(this.user_id, val);
    }
    '''

    top_users_red_f = '''
    function(key, vals){
	red_val = {
	    user_id: key,
	    awarded_points: 0,
            redeemed_points: 0,
	    first_name: null,
	    last_name: null	
	}		
	vals.forEach(function(v){
	    red_val.awarded_points += v.awarded_points;
            red_val.redeemed_points += v.redeemed_points;
	    if(!red_val.first_name)
		red_val.first_name = v.first_name;
	    if(!red_val.last_name)
		red_val.last_name = v.last_name;
	});
	return red_val;
    }
    '''

    result_list = []
    
    for result in Transaction.objects(approved_time__gte = from_date, approved_time__lt = to_date + relativedelta(days = 1),
	                                  store_id = store_id, points_status__in = points_status_list).map_reduce(top_users_map_f,top_users_red_f, 'inline' ):
	result_list.append(result.value)
                
    result_list_sorted = sorted(result_list, key= lambda k: k.get('awarded_points',0), reverse = True) 
    data['total'] = len(result_list_sorted)
    print "Total records: %s" % data['total']
    data['more'] = True if len(result_list_sorted) > start_index + count else False
    if not fetch_all:
	result_list_sorted = result_list_sorted[start_index:start_index+count]

    data['top_users'] = result_list_sorted
    return data

def main():
    connect(db='db_name', host='localhost')
    store_id = 'c99ad71fcd6929dc9161b9b01781ce3a'
    start_date = datetime(1970,1,1)
    end_date = datetime(2016,9,4)
    start_time = datetime.utcnow()
    runctx('get_top_users(store_id, start_date, end_date)',
           globals(), locals())
    end_time = datetime.utcnow()

Output:

$ python top_users_using_mapreduce.py 
Total records: 38619
   
   236111 function calls (236074 primitive calls) in 35.868 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.015    0.015   35.868   35.868 <string>:1(<module>)
    38621    0.004    0.000    0.004    0.000 base.py:1157(_collection)
        1    0.000    0.000    0.000    0.000 base.py:1212(_query)
        2    0.000    0.000    0.001    0.001 base.py:1465(_sub_js_fields)
        3    0.000    0.000    0.000    0.000 base.py:45(__init__)
        2    0.000    0.000    0.000    0.000 base.py:525(clone)
        2    0.000    0.000    0.000    0.000 base.py:533(clone_into)
        1    0.000    0.000    0.000    0.000 base.py:78(__call__)
    38620    0.044    0.000   35.782    0.001 base.py:846(map_reduce)
        1    0.000    0.000    0.000    0.000 calendar.py:110(weekday)
        1    0.000    0.000    0.000    0.000 calendar.py:116(monthrange)
        2    0.000    0.000    0.000    0.000 code.py:44(__new__)
        2    0.000    0.000    0.000    0.000 code.py:65(scope)
        2    0.000    0.000    0.000    0.000 code.py:71(__repr__)
        9    0.000    0.000    0.005    0.001 collection.py:1084(ensure_index)
        1    0.000    0.000   35.721   35.721 collection.py:1598(inline_map_reduce)
       10    0.000    0.000    0.000    0.000 collection.py:162(full_name)
        9    0.000    0.000    0.000    0.000 collection.py:173(name)
       70    0.000    0.000    0.000    0.000 collection.py:182(database)
        9    0.000    0.000    0.000    0.000 collection.py:41(_gen_index_name)
       11    0.000    0.000    0.000    0.000 collection.py:51(__init__)
       10    0.000    0.000    0.000    0.000 collection.py:728(find)
        9    0.000    0.000    0.004    0.000 collection.py:951(create_index)
       12    0.000    0.000    0.000    0.000 common.py:178(validate_positive_float)
       12    0.000    0.000    0.000    0.000 common.py:205(validate_positive_float_or_zero)
       12    0.000    0.000    0.000    0.000 common.py:214(validate_read_preference)
       12    0.000    0.000    0.000    0.000 common.py:227(validate_tag_sets)
       12    0.000    0.000    0.000    0.000 common.py:271(validate_uuid_subtype)
       12    0.000    0.000    0.000    0.000 common.py:375(__init__)
       12    0.000    0.000    0.000    0.000 common.py:396(__init__)
        5    0.000    0.000    0.000    0.000 common.py:4(_import_class)
       12    0.000    0.000    0.000    0.000 common.py:438(__set_options)
       12    0.000    0.000    0.000    0.000 common.py:474(__get_write_concern)
       24    0.000    0.000    0.000    0.000 common.py:536(__get_slave_okay)
       25    0.000    0.000    0.000    0.000 common.py:554(__get_read_pref)
       23    0.000    0.000    0.000    0.000 common.py:570(__get_acceptable_latency)
       23    0.000    0.000    0.000    0.000 common.py:596(__get_tag_sets)
       13    0.000    0.000    0.000    0.000 common.py:618(__get_uuid_subtype)
       12    0.000    0.000    0.000    0.000 common.py:633(__get_safe)
       24    0.000    0.000    0.000    0.000 common.py:73(validate_boolean)
        1    0.000    0.000    0.000    0.000 connection.py:128(get_db)
        1    0.000    0.000    0.000    0.000 connection.py:82(get_connection)
       30    0.000    0.000    0.000    0.000 copy.py:101(_copy_immutable)
        6    0.000    0.000    0.000    0.000 copy.py:113(_copy_with_constructor)
        4    0.000    0.000    0.000    0.000 copy.py:306(_reconstruct)
       40    0.000    0.000    0.000    0.000 copy.py:66(copy)
        4    0.000    0.000    0.000    0.000 copy_reg.py:92(__newobj__)
        2    0.000    0.000    0.000    0.000 copy_reg.py:95(_slotnames)
       20    0.000    0.000   35.723    1.786 cursor.py:1000(_refresh)
       10    0.000    0.000    0.000    0.000 cursor.py:1082(__iter__)
       20    0.000    0.000   35.723    1.786 cursor.py:1085(next)
       10    0.000    0.000    0.000    0.000 cursor.py:200(conn_id)
       10    0.000    0.000    0.000    0.000 cursor.py:216(__del__)
       10    0.000    0.000    0.000    0.000 cursor.py:295(__query_spec)
       10    0.000    0.000    0.000    0.000 cursor.py:381(__query_options)
       10    0.000    0.000    0.000    0.000 cursor.py:391(__check_okay_to_chain)
       10    0.000    0.000    0.000    0.000 cursor.py:438(limit)
       10    0.000    0.000    0.000    0.000 cursor.py:76(__init__)
       10    0.000    0.000   35.722    3.572 cursor.py:912(__send_message)
       78    0.000    0.000    0.000    0.000 database.py:112(connection)
       39    0.000    0.000    0.000    0.000 database.py:121(name)
       11    0.000    0.000    0.000    0.000 database.py:183(__getattr__)
       11    0.000    0.000    0.000    0.000 database.py:193(__getitem__)
       10    0.000    0.000    0.000    0.000 database.py:264(_fix_outgoing)
       10    0.000    0.000   35.725    3.572 database.py:277(_command)
       10    0.000    0.000   35.725    3.572 database.py:349(command)
        1    0.000    0.000    0.000    0.000 database.py:39(__init__)
        1    0.000    0.000    0.000    0.000 document.py:139(_get_db)
      2/1    0.000    0.000    0.005    0.005 document.py:144(_get_collection)
        9    0.000    0.000    0.000    0.000 document.py:25(includes_cls)
        1    0.000    0.000    0.000    0.000 document.py:533(_get_collection_name)
        1    0.000    0.000    0.005    0.005 document.py:540(ensure_indexes)
    38619    0.012    0.000    0.012    0.000 document.py:738(__init__)
        4    0.000    0.000    0.000    0.000 document.py:762(_lookup_field)
        3    0.000    0.000    0.000    0.000 field_list.py:10(__init__)
        2    0.000    0.000    0.000    0.000 fields.py:377(to_mongo)
        2    0.000    0.000    0.000    0.000 fields.py:421(prepare_query_value)
        7    0.000    0.000    0.000    0.000 fields.py:87(prepare_query_value)
       10    0.000    0.000    0.000    0.000 helpers.py:126(_check_command_response)
        1    0.000    0.000    0.000    0.000 helpers.py:230(_check_database_name)
       18    0.000    0.000    0.000    0.000 helpers.py:37(_index_list)
        9    0.000    0.000    0.000    0.000 helpers.py:53(_index_document)
       10    0.004    0.000    0.118    0.012 helpers.py:80(_unpack_response)
        1    0.000    0.000    0.005    0.005 manager.py:27(__get__)
       10    0.000    0.000    0.000    0.000 member.py:148(get_socket)
       10    0.000    0.000    0.000    0.000 member.py:155(maybe_return_socket)
       10    0.000    0.000    0.000    0.000 mongo_client.py:1078(__check_bson_size)
       20    0.004    0.000   35.603    1.780 mongo_client.py:1146(__receive_data_on_socket)
       10    0.000    0.000   35.603    3.560 mongo_client.py:1161(__receive_message_on_socket)
       10    0.000    0.000   35.603    3.560 mongo_client.py:1177(__send_and_receive)
       10    0.000    0.000   35.604    3.560 mongo_client.py:1190(_send_message_with_response)
        1    0.000    0.000    0.000    0.000 mongo_client.py:1299(__getattr__)
        1    0.000    0.000    0.000    0.000 mongo_client.py:1310(__getitem__)
        9    0.000    0.000    0.000    0.000 mongo_client.py:397(_cached)
        9    0.000    0.000    0.000    0.000 mongo_client.py:407(_cache_index)
       10    0.000    0.000    0.000    0.000 mongo_client.py:497(__check_auth)
       30    0.000    0.000    0.000    0.000 mongo_client.py:515(__member_property)
       20    0.000    0.000    0.000    0.000 mongo_client.py:556(is_mongos)
       10    0.000    0.000    0.000    0.000 mongo_client.py:603(auto_start_request)
       10    0.000    0.000    0.000    0.000 mongo_client.py:608(get_document_class)
       10    0.000    0.000    0.000    0.000 mongo_client.py:621(tz_aware)
       10    0.000    0.000    0.000    0.000 mongo_client.py:629(max_bson_size)
       10    0.000    0.000    0.000    0.000 mongo_client.py:778(__ensure_member)
       10    0.000    0.000    0.000    0.000 mongo_client.py:909(__socket)
       10    0.000    0.000    0.000    0.000 pool.py:100(__ne__)
       10    0.000    0.000    0.000    0.000 pool.py:103(__hash__)
       10    0.000    0.000    0.000    0.000 pool.py:309(get_socket)
       10    0.000    0.000    0.000    0.000 pool.py:415(maybe_return_socket)
       10    0.000    0.000    0.000    0.000 pool.py:436(_return_socket)
       10    0.000    0.000    0.000    0.000 pool.py:456(_check)
       20    0.000    0.000    0.000    0.000 pool.py:539(_get_request_state)
       10    0.000    0.000    0.000    0.000 pool.py:81(set_wire_version_range)
       30    0.000    0.000    0.000    0.000 pool.py:95(__eq__)
        4    0.000    0.000    0.001    0.000 re.py:144(sub)
        4    0.000    0.000    0.001    0.000 re.py:226(_compile)
        1    0.000    0.000    0.000    0.000 relativedelta.py:109(__init__)
        1    0.000    0.000    0.000    0.000 relativedelta.py:202(_fix)
        1    0.000    0.000    0.000    0.000 relativedelta.py:245(__radd__)
       10    0.000    0.000    0.000    0.000 socket.py:223(meth)
       48    0.000    0.000    0.000    0.000 son.py:102(__setitem__)
       11    0.000    0.000    0.000    0.000 son.py:111(keys)
       67    0.000    0.000    0.000    0.000 son.py:122(__iter__)
       48    0.000    0.000    0.000    0.000 son.py:180(update)
        1    0.000    0.000    0.000    0.000 son.py:196(get)
        1    0.000    0.000    0.000    0.000 son.py:213(__len__)
       19    0.000    0.000    0.000    0.000 son.py:85(__init__)
       19    0.000    0.000    0.000    0.000 son.py:91(__new__)
    19/10    0.000    0.000    0.000    0.000 son.py:96(__repr__)
        8    0.000    0.000    0.000    0.000 sre_compile.py:178(_compile_charset)
        8    0.000    0.000    0.000    0.000 sre_compile.py:207(_optimize_charset)
       22    0.000    0.000    0.000    0.000 sre_compile.py:24(_identityfunction)
        2    0.000    0.000    0.000    0.000 sre_compile.py:258(_mk_bitmap)
     10/2    0.000    0.000    0.000    0.000 sre_compile.py:32(_compile)
        6    0.000    0.000    0.000    0.000 sre_compile.py:354(_simple)
        2    0.000    0.000    0.000    0.000 sre_compile.py:361(_compile_info)
        4    0.000    0.000    0.000    0.000 sre_compile.py:474(isstring)
        2    0.000    0.000    0.001    0.000 sre_compile.py:480(_code)
        2    0.000    0.000    0.001    0.001 sre_compile.py:495(compile)
       24    0.000    0.000    0.000    0.000 sre_parse.py:126(__len__)
       42    0.000    0.000    0.000    0.000 sre_parse.py:130(__getitem__)
        6    0.000    0.000    0.000    0.000 sre_parse.py:134(__setitem__)
       18    0.000    0.000    0.000    0.000 sre_parse.py:138(append)
     16/8    0.000    0.000    0.000    0.000 sre_parse.py:140(getwidth)
        2    0.000    0.000    0.000    0.000 sre_parse.py:178(__init__)
       62    0.000    0.000    0.000    0.000 sre_parse.py:182(__next)
       32    0.000    0.000    0.000    0.000 sre_parse.py:195(match)
       50    0.000    0.000    0.000    0.000 sre_parse.py:201(get)
       10    0.000    0.000    0.000    0.000 sre_parse.py:257(_escape)
      4/2    0.000    0.000    0.000    0.000 sre_parse.py:301(_parse_sub)
      4/2    0.000    0.000    0.000    0.000 sre_parse.py:379(_parse)
        2    0.000    0.000    0.000    0.000 sre_parse.py:663(parse)
        2    0.000    0.000    0.000    0.000 sre_parse.py:67(__init__)
        2    0.000    0.000    0.000    0.000 sre_parse.py:72(opengroup)
        2    0.000    0.000    0.000    0.000 sre_parse.py:83(closegroup)
       10    0.000    0.000    0.000    0.000 sre_parse.py:90(__init__)
    38619    0.008    0.000    0.020    0.000 top_users_using_mapreduce.py:102(<lambda>)
        1    0.025    0.025   35.853   35.853 top_users_using_mapreduce.py:53(get_top_users)
       20    0.000    0.000    0.000    0.000 thread_util.py:106(get)
       10    0.000    0.000    0.000    0.000 thread_util.py:230(acquire)
       10    0.000    0.000    0.000    0.000 thread_util.py:255(release)
       10    0.000    0.000    0.000    0.000 thread_util.py:275(release)
       20    0.000    0.000    0.000    0.000 thread_util.py:49(acquire)
       20    0.000    0.000    0.000    0.000 thread_util.py:52(release)
       20    0.000    0.000    0.000    0.000 thread_util.py:92(_make_vigil)
       10    0.000    0.000    0.000    0.000 threading.py:225(_is_owned)
       10    0.000    0.000    0.000    0.000 threading.py:276(notify)
       10    0.000    0.000    0.000    0.000 threading.py:62(_note)
        1    0.000    0.000    0.000    0.000 transform.py:31(query)
        1    0.000    0.000    0.000    0.000 visitor.py:116(__and__)
        4    0.000    0.000    0.000    0.000 visitor.py:153(__init__)
        2    0.000    0.000    0.000    0.000 visitor.py:156(accept)
        2    0.000    0.000    0.000    0.000 visitor.py:159(empty)
        1    0.000    0.000    0.000    0.000 visitor.py:20(visit_query)
        1    0.000    0.000    0.000    0.000 visitor.py:70(__init__)
        1    0.000    0.000    0.000    0.000 visitor.py:79(visit_query)
        1    0.000    0.000    0.000    0.000 visitor.py:90(to_query)
        1    0.000    0.000    0.000    0.000 visitor.py:98(_combine)
        2    0.000    0.000    0.000    0.000 {_sre.compile}
       70    0.000    0.000    0.000    0.000 {_struct.unpack}
        5    0.000    0.000    0.000    0.000 {abs}
       10    0.114    0.011    0.114    0.011 {bson._cbson.decode_all}
       25    0.000    0.000    0.000    0.000 {built-in method __new__ of type object at 0x890140}
       18    0.000    0.000    0.000    0.000 {built-in method utcnow}
      114    0.000    0.000    0.000    0.000 {getattr}
       88    0.000    0.000    0.000    0.000 {hasattr}
       10    0.000    0.000    0.000    0.000 {hash}
       24    0.000    0.000    0.000    0.000 {id}
      407    0.000    0.000    0.000    0.000 {isinstance}
  462/455    0.000    0.000    0.000    0.000 {len}
        4    0.000    0.000    0.000    0.000 {method '__reduce_ex__' of 'object' objects}
       60    0.000    0.000    0.000    0.000 {method 'acquire' of 'thread.lock' objects}
       10    0.000    0.000    0.000    0.000 {method 'add' of 'set' objects}
    38904    0.003    0.000    0.003    0.000 {method 'append' of 'list' objects}
       18    0.000    0.000    0.000    0.000 {method 'copy' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        6    0.000    0.000    0.000    0.000 {method 'extend' of 'list' objects}
    38789    0.012    0.000    0.012    0.000 {method 'get' of 'dict' objects}
        2    0.000    0.000    0.000    0.000 {method 'get' of 'dictproxy' objects}
       18    0.000    0.000    0.000    0.000 {method 'isdigit' of 'str' objects}
        4    0.000    0.000    0.000    0.000 {method 'items' of 'dict' objects}
       41    0.000    0.000    0.000    0.000 {method 'iteritems' of 'dict' objects}
       23    0.000    0.000    0.000    0.000 {method 'join' of 'str' objects}
        9    0.000    0.000    0.000    0.000 {method 'join' of 'unicode' objects}
       10    0.000    0.000    0.000    0.000 {method 'lower' of 'str' objects}
        6    0.000    0.000    0.000    0.000 {method 'lstrip' of 'str' objects}
       78    0.000    0.000    0.000    0.000 {method 'pop' of 'dict' objects}
        3    0.000    0.000    0.000    0.000 {method 'pop' of 'list' objects}
       10    0.000    0.000    0.000    0.000 {method 'pop' of 'set' objects}
       10    0.000    0.000    0.000    0.000 {method 'popleft' of 'collections.deque' objects}
       96   35.599    0.371   35.599    0.371 {method 'recv' of '_socket.socket' objects}
       50    0.000    0.000    0.000    0.000 {method 'release' of 'thread.lock' objects}
        2    0.000    0.000    0.000    0.000 {method 'remove' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'replace' of 'datetime.datetime' objects}
       10    0.000    0.000    0.000    0.000 {method 'replace' of 'str' objects}
       10    0.000    0.000    0.000    0.000 {method 'sendall' of '_socket.socket' objects}
        4    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
       20    0.000    0.000    0.000    0.000 {method 'startswith' of 'str' objects}
        4    0.000    0.000    0.000    0.000 {method 'sub' of '_sre.SRE_Pattern' objects}
       38    0.000    0.000    0.000    0.000 {method 'update' of 'dict' objects}
       12    0.000    0.000    0.000    0.000 {method 'values' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'weekday' of 'datetime.date' objects}
       21    0.000    0.000    0.000    0.000 {min}
       26    0.000    0.000    0.000    0.000 {ord}
       20    0.000    0.000    0.000    0.000 {posix.getpid}
       10    0.000    0.000    0.000    0.000 {pymongo._cmessage._query_message}
        6    0.000    0.000    0.000    0.000 {range}
       10    0.000    0.000    0.000    0.000 {repr}
       40    0.000    0.000    0.000    0.000 {setattr}
        2    0.017    0.008    0.037    0.019 {sorted}
       20    0.000    0.000    0.000    0.000 {time.time}


(env) 
  • So top_user function took 36 seconds. Let’s see generator example:

Using generators and cProfile

from db.models import *
import logging
from utils import enable_console_logging
from datetime import datetime
from dateutil.relativedelta import relativedelta
from cProfile import runctx


def get_transactions_using_generator(**kwargs):
    for lt in Transaction.objects(**kwargs):
        yield lt


def get_top_users_generators(merchant_id, from_date, to_date, fetch_all=False):

    data = {'more': False, 'top_users': [], 'total': 0}
    points_status_list = [APPROVED, AUTO_APPROVED, REDEEMED,
                          DEDUCTED, EXPIRED, EXHAUSTED]
    result_dict = dict()

    kwargs = dict(store_id = store_id, points_status__in = points_status_list,
                  approved_time__lt = to_date + relativedelta(days=1),
                  approved_time__gte = from_date)
    
    for result in get_transactions_using_generator(**kwargs):
        if result.transaction_type == 'award':
            if result.user_email in result_dict:
                result_dict[result.user_email] += result.points 
            else:
                result_dict[result.user_email] = result.points
        else:
            if result.user_email in result_dict:
                result_dict[result.user_email] -= result.points
            else:
                result_dict[result.user_email] = (-result.points)
                
    result_list_sorted = [{user: result_dict[user]} for user in
                          sorted(result_dict, key=result_dict.get, reverse = True)]
    data['total'] = len(result_list_sorted)
    print "Total records: %s" % data['total']
    data['more'] = True if len(result_list_sorted) > start_index + count else False
    if not fetch_all:
	result_dict_sorted = result_list_sorted[start_index:start_index+count]

    data['top_users'] = result_list_sorted
    return data


def main():
    connect(db='ss_db', host='localhost')
    # get merchant ids who have loyalty enabled
    store_id = 'c99ad71fcd6929dc9161b9b01781ce3a'
    start_date = datetime(1970,1,1)
    end_date = datetime(2016,9,4)
    start_time = datetime.utcnow()
    runctx('get_top_users_using_generators(merchant_id, start_date, end_date)',
           globals(), locals())
    end_time = datetime.utcnow()


if __name__ == '__main__':
    main()

Output using generators

$ python top_users_using_generator.py 
Total records: 38643
         32932865 function calls (32788303 primitive calls) in 28.247 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.014    0.014   28.247   28.247 <string>:1(<module>)
    84206    0.339    0.000   27.415    0.000 base.py:1131(next)
        1    0.000    0.000    0.000    0.000 base.py:1157(_collection)
        1    0.000    0.000    0.000    0.000 base.py:1164(_cursor_args)
    84206    0.027    0.000    0.027    0.000 base.py:1178(_cursor)
        1    0.000    0.000    0.000    0.000 base.py:1212(_query)
        2    0.000    0.000    0.000    0.000 base.py:45(__init__)
        1    0.000    0.000    0.000    0.000 base.py:525(clone)
        1    0.000    0.000    0.000    0.000 base.py:533(clone_into)
        1    0.000    0.000    0.000    0.000 base.py:78(__call__)
        1    0.000    0.000    0.000    0.000 calendar.py:110(weekday)
        1    0.000    0.000    0.000    0.000 calendar.py:116(monthrange)
        9    0.000    0.000    0.005    0.001 collection.py:1084(ensure_index)
       23    0.000    0.000    0.000    0.000 collection.py:162(full_name)
        9    0.000    0.000    0.000    0.000 collection.py:173(name)
    84287    0.019    0.000    0.019    0.000 collection.py:182(database)
        9    0.000    0.000    0.000    0.000 collection.py:41(_gen_index_name)
       10    0.000    0.000    0.000    0.000 collection.py:51(__init__)
       10    0.000    0.000    0.000    0.000 collection.py:728(find)
        9    0.000    0.000    0.004    0.000 collection.py:951(create_index)
       11    0.000    0.000    0.000    0.000 common.py:178(validate_positive_float)
       11    0.000    0.000    0.000    0.000 common.py:205(validate_positive_float_or_zero)
       11    0.000    0.000    0.000    0.000 common.py:214(validate_read_preference)
       11    0.000    0.000    0.000    0.000 common.py:227(validate_tag_sets)
       11    0.000    0.000    0.000    0.000 common.py:271(validate_uuid_subtype)
       11    0.000    0.000    0.000    0.000 common.py:375(__init__)
       11    0.000    0.000    0.000    0.000 common.py:396(__init__)
  2923321    0.947    0.000    1.204    0.000 common.py:4(_import_class)
       11    0.000    0.000    0.000    0.000 common.py:438(__set_options)
       11    0.000    0.000    0.000    0.000 common.py:474(__get_write_concern)
       20    0.000    0.000    0.000    0.000 common.py:536(__get_slave_okay)
       22    0.000    0.000    0.000    0.000 common.py:554(__get_read_pref)
       21    0.000    0.000    0.000    0.000 common.py:570(__get_acceptable_latency)
       21    0.000    0.000    0.000    0.000 common.py:596(__get_tag_sets)
       12    0.000    0.000    0.000    0.000 common.py:618(__get_uuid_subtype)
       11    0.000    0.000    0.000    0.000 common.py:633(__get_safe)
       22    0.000    0.000    0.000    0.000 common.py:73(validate_boolean)
        1    0.000    0.000    0.000    0.000 connection.py:128(get_db)
        1    0.000    0.000    0.000    0.000 connection.py:82(get_connection)
       15    0.000    0.000    0.000    0.000 copy.py:101(_copy_immutable)
        3    0.000    0.000    0.000    0.000 copy.py:113(_copy_with_constructor)
        2    0.000    0.000    0.000    0.000 copy.py:306(_reconstruct)
       20    0.000    0.000    0.000    0.000 copy.py:66(copy)
        2    0.000    0.000    0.000    0.000 copy_reg.py:92(__newobj__)
        2    0.000    0.000    0.000    0.000 copy_reg.py:95(_slotnames)
       33    0.001    0.000    2.606    0.079 cursor.py:1000(_refresh)
        9    0.000    0.000    0.000    0.000 cursor.py:1082(__iter__)
    84224    0.251    0.000    3.005    0.000 cursor.py:1085(next)
        9    0.000    0.000    0.000    0.000 cursor.py:200(conn_id)
       10    0.000    0.000    0.000    0.000 cursor.py:216(__del__)
       10    0.000    0.000    0.000    0.000 cursor.py:295(__query_spec)
       10    0.000    0.000    0.000    0.000 cursor.py:381(__query_options)
        9    0.000    0.000    0.000    0.000 cursor.py:391(__check_okay_to_chain)
        9    0.000    0.000    0.000    0.000 cursor.py:438(limit)
       10    0.000    0.000    0.000    0.000 cursor.py:76(__init__)
       23    0.003    0.000    2.605    0.113 cursor.py:912(__send_message)
       90    0.000    0.000    0.000    0.000 database.py:112(connection)
       37    0.000    0.000    0.000    0.000 database.py:121(name)
       10    0.000    0.000    0.000    0.000 database.py:183(__getattr__)
       10    0.000    0.000    0.000    0.000 database.py:193(__getitem__)
    84214    0.110    0.000    0.110    0.000 database.py:264(_fix_outgoing)
        9    0.000    0.000    0.004    0.000 database.py:277(_command)
        9    0.000    0.000    0.004    0.000 database.py:349(command)
        1    0.000    0.000    0.000    0.000 database.py:39(__init__)
  3031380    4.777    0.000    8.645    0.000 document.py:113(__setattr__)
        1    0.000    0.000    0.000    0.000 document.py:139(_get_db)
      2/1    0.000    0.000    0.005    0.005 document.py:144(_get_collection)
        9    0.000    0.000    0.000    0.000 document.py:25(includes_cls)
    84205    3.628    0.000   15.997    0.000 document.py:35(__init__)
        1    0.000    0.000    0.000    0.000 document.py:533(_get_collection_name)
    84205    4.207    0.000   24.046    0.000 document.py:539(_from_son)
        1    0.000    0.000    0.005    0.005 document.py:540(ensure_indexes)
  1804294    1.200    0.000    1.200    0.000 document.py:547(<genexpr>)
        4    0.000    0.000    0.000    0.000 document.py:762(_lookup_field)
    84205    0.329    0.000    0.435    0.000 document.py:824(__set_field_display)
        2    0.000    0.000    0.000    0.000 field_list.py:10(__init__)
        1    0.000    0.000    0.000    0.000 field_list.py:67(__nonzero__)
   252697    0.031    0.000    0.031    0.000 fields.py:126(to_python)
   336669    0.112    0.000    0.112    0.000 fields.py:173(to_python)
228757/84205    1.167    0.000    1.802    0.000 fields.py:233(to_python)
        4    0.000    0.000    0.000    0.000 fields.py:241(to_python)
   168272    0.057    0.000    0.057    0.000 fields.py:346(to_python)
        2    0.000    0.000    0.000    0.000 fields.py:377(to_mongo)
    84205    0.042    0.000    0.050    0.000 fields.py:390(to_python)
        2    0.000    0.000    0.000    0.000 fields.py:421(prepare_query_value)
   794037    0.261    0.000    0.342    0.000 fields.py:62(to_python)
  1227086    0.465    0.000    0.600    0.000 fields.py:84(__get__)
        7    0.000    0.000    0.000    0.000 fields.py:87(prepare_query_value)
  2610355    2.194    0.000    3.868    0.000 fields.py:94(__set__)
        9    0.000    0.000    0.000    0.000 helpers.py:126(_check_command_response)
        1    0.000    0.000    0.000    0.000 helpers.py:230(_check_database_name)
       18    0.000    0.000    0.000    0.000 helpers.py:37(_index_list)
        9    0.000    0.000    0.000    0.000 helpers.py:53(_index_document)
       23    0.013    0.001    1.120    0.049 helpers.py:80(_unpack_response)
        1    0.000    0.000    0.005    0.005 manager.py:27(__get__)
       23    0.000    0.000    0.001    0.000 member.py:148(get_socket)
       23    0.000    0.000    0.001    0.000 member.py:155(maybe_return_socket)
       23    0.000    0.000    0.000    0.000 mongo_client.py:1078(__check_bson_size)
       46    0.030    0.001    1.478    0.032 mongo_client.py:1146(__receive_data_on_socket)
       23    0.000    0.000    1.479    0.064 mongo_client.py:1161(__receive_message_on_socket)
       23    0.000    0.000    1.480    0.064 mongo_client.py:1177(__send_and_receive)
       23    0.000    0.000    1.482    0.064 mongo_client.py:1190(_send_message_with_response)
        1    0.000    0.000    0.000    0.000 mongo_client.py:1299(__getattr__)
        1    0.000    0.000    0.000    0.000 mongo_client.py:1310(__getitem__)
        9    0.000    0.000    0.000    0.000 mongo_client.py:397(_cached)
        9    0.000    0.000    0.000    0.000 mongo_client.py:407(_cache_index)
       23    0.000    0.000    0.000    0.000 mongo_client.py:497(__check_auth)
       30    0.000    0.000    0.000    0.000 mongo_client.py:515(__member_property)
       20    0.000    0.000    0.000    0.000 mongo_client.py:556(is_mongos)
       23    0.000    0.000    0.000    0.000 mongo_client.py:603(auto_start_request)
       10    0.000    0.000    0.000    0.000 mongo_client.py:608(get_document_class)
       10    0.000    0.000    0.000    0.000 mongo_client.py:621(tz_aware)
       10    0.000    0.000    0.000    0.000 mongo_client.py:629(max_bson_size)
       23    0.000    0.000    0.000    0.000 mongo_client.py:778(__ensure_member)
       23    0.000    0.000    0.001    0.000 mongo_client.py:909(__socket)
    84205    0.068    0.000    0.104    0.000 objectid.py:203(__validate)
    84205    0.025    0.000    0.129    0.000 objectid.py:77(__init__)
       23    0.000    0.000    0.000    0.000 pool.py:100(__ne__)
       23    0.000    0.000    0.000    0.000 pool.py:103(__hash__)
       23    0.000    0.000    0.001    0.000 pool.py:309(get_socket)
       23    0.000    0.000    0.001    0.000 pool.py:415(maybe_return_socket)
       23    0.000    0.000    0.001    0.000 pool.py:436(_return_socket)
       12    0.000    0.000    0.000    0.000 pool.py:44(_closed)
       23    0.000    0.000    0.000    0.000 pool.py:456(_check)
       46    0.000    0.000    0.000    0.000 pool.py:539(_get_request_state)
       23    0.000    0.000    0.000    0.000 pool.py:81(set_wire_version_range)
       69    0.000    0.000    0.000    0.000 pool.py:95(__eq__)
        1    0.000    0.000    0.000    0.000 queryset.py:25(__iter__)
    84206    0.023    0.000   27.661    0.000 queryset.py:65(_iter_results)
      843    0.208    0.000   27.638    0.033 queryset.py:83(_populate_cache)
        1    0.000    0.000    0.000    0.000 relativedelta.py:109(__init__)
        1    0.000    0.000    0.000    0.000 relativedelta.py:202(_fix)
        1    0.000    0.000    0.000    0.000 relativedelta.py:245(__radd__)
   168410    0.023    0.000    0.023    0.000 signals.py:30(<lambda>)
       35    0.000    0.000    0.000    0.000 socket.py:223(meth)
       43    0.000    0.000    0.000    0.000 son.py:102(__setitem__)
        9    0.000    0.000    0.000    0.000 son.py:111(keys)
       61    0.000    0.000    0.000    0.000 son.py:122(__iter__)
   168455    0.103    0.000    0.135    0.000 son.py:180(update)
    84223    0.137    0.000    0.272    0.000 son.py:85(__init__)
    84223    0.106    0.000    0.135    0.000 son.py:91(__new__)
     18/9    0.000    0.000    0.000    0.000 son.py:96(__repr__)
    84206    0.218    0.000   27.884    0.000 top_users_using_generator.py:11(get_transactions_using_generator)
        1    0.185    0.185   28.233   28.233 top_users_using_generator.py:16(get_top_users_using_generators)
       46    0.000    0.000    0.000    0.000 thread_util.py:106(get)
       23    0.000    0.000    0.000    0.000 thread_util.py:230(acquire)
       23    0.000    0.000    0.000    0.000 thread_util.py:255(release)
       23    0.000    0.000    0.000    0.000 thread_util.py:275(release)
       46    0.000    0.000    0.000    0.000 thread_util.py:49(acquire)
       46    0.000    0.000    0.000    0.000 thread_util.py:52(release)
       46    0.000    0.000    0.000    0.000 thread_util.py:92(_make_vigil)
       23    0.000    0.000    0.000    0.000 threading.py:225(_is_owned)
       23    0.000    0.000    0.000    0.000 threading.py:276(notify)
       23    0.000    0.000    0.000    0.000 threading.py:62(_note)
        1    0.000    0.000    0.000    0.000 transform.py:31(query)
        1    0.000    0.000    0.000    0.000 visitor.py:116(__and__)
        3    0.000    0.000    0.000    0.000 visitor.py:153(__init__)
        2    0.000    0.000    0.000    0.000 visitor.py:156(accept)
        2    0.000    0.000    0.000    0.000 visitor.py:159(empty)
        1    0.000    0.000    0.000    0.000 visitor.py:20(visit_query)
        1    0.000    0.000    0.000    0.000 visitor.py:70(__init__)
        1    0.000    0.000    0.000    0.000 visitor.py:79(visit_query)
        1    0.000    0.000    0.000    0.000 visitor.py:90(to_query)
        1    0.000    0.000    0.000    0.000 visitor.py:98(_combine)
      161    0.000    0.000    0.000    0.000 {_struct.unpack}
        5    0.000    0.000    0.000    0.000 {abs}
       23    0.977    0.042    1.107    0.048 {bson._cbson.decode_all}
    84225    0.028    0.000    0.028    0.000 {built-in method __new__ of type object at 0x890140}
       18    0.000    0.000    0.000    0.000 {built-in method utcnow}
      289    0.000    0.000    0.000    0.000 {callable}
   890406    0.396    0.000    0.853    0.000 {getattr}
   505701    0.247    0.000    0.247    0.000 {hasattr}
       23    0.000    0.000    0.000    0.000 {hash}
       48    0.000    0.000    0.000    0.000 {id}
  4078717    0.833    0.000    0.833    0.000 {isinstance}
   170389    0.015    0.000    0.015    0.000 {len}
        2    0.000    0.000    0.000    0.000 {method '__reduce_ex__' of 'object' objects}
      138    0.000    0.000    0.000    0.000 {method 'acquire' of 'thread.lock' objects}
       23    0.000    0.000    0.000    0.000 {method 'add' of 'set' objects}
    84303    0.015    0.000    0.015    0.000 {method 'append' of 'list' objects}
       18    0.000    0.000    0.000    0.000 {method 'copy' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
       12    0.000    0.000    0.000    0.000 {method 'fileno' of '_socket.socket' objects}
  8565209    1.190    0.000    1.190    0.000 {method 'get' of 'dict' objects}
        2    0.000    0.000    0.000    0.000 {method 'get' of 'dictproxy' objects}
       18    0.000    0.000    0.000    0.000 {method 'isdigit' of 'str' objects}
   288755    0.184    0.000    0.184    0.000 {method 'items' of 'dict' objects}
   421063    0.059    0.000    0.059    0.000 {method 'iteritems' of 'dict' objects}
       22    0.000    0.000    0.000    0.000 {method 'join' of 'str' objects}
        9    0.000    0.000    0.000    0.000 {method 'join' of 'unicode' objects}
        9    0.000    0.000    0.000    0.000 {method 'lower' of 'str' objects}
        6    0.000    0.000    0.000    0.000 {method 'lstrip' of 'str' objects}
    84277    0.016    0.000    0.016    0.000 {method 'pop' of 'dict' objects}
        3    0.000    0.000    0.000    0.000 {method 'pop' of 'list' objects}
       23    0.000    0.000    0.000    0.000 {method 'pop' of 'set' objects}
    84214    0.010    0.000    0.010    0.000 {method 'popleft' of 'collections.deque' objects}
       76    1.448    0.019    1.448    0.019 {method 'recv' of '_socket.socket' objects}
      115    0.000    0.000    0.000    0.000 {method 'release' of 'thread.lock' objects}
        1    0.000    0.000    0.000    0.000 {method 'replace' of 'datetime.datetime' objects}
        9    0.000    0.000    0.000    0.000 {method 'replace' of 'str' objects}
       23    0.000    0.000    0.000    0.000 {method 'sendall' of '_socket.socket' objects}
        4    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
       18    0.000    0.000    0.000    0.000 {method 'startswith' of 'str' objects}
       45    0.000    0.000    0.000    0.000 {method 'update' of 'dict' objects}
       11    0.000    0.000    0.000    0.000 {method 'values' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'weekday' of 'datetime.date' objects}
        1    0.000    0.000    0.000    0.000 {min}
       46    0.000    0.000    0.000    0.000 {posix.getpid}
       13    0.000    0.000    0.000    0.000 {pymongo._cmessage._get_more_message}
       10    0.000    0.000    0.000    0.000 {pymongo._cmessage._query_message}
        9    0.000    0.000    0.000    0.000 {repr}
       12    0.000    0.000    0.000    0.000 {select.select}
  2610375    1.350    0.000    8.808    0.000 {setattr}
    84207    0.151    0.000    0.151    0.000 {sorted}
       46    0.000    0.000    0.000    0.000 {time.time}


(env)

Here, MapReduce takes more time than generators. Response time will improve if I use generators instead of map reduce.

When to use generators

  • It gives you lazy evaluation. Actually instead of returning the all elements , it returns them one by one.
  • Useful for calculating large sets of results.
    • if we don’t know whether we need all results or not , in that case we don’t need to allocate the memory for all results at the same time.