JSON streaming

Sometimes you need to create JSON from a big pile of data that you can stream from some source, such as a file or a socket. Sadly, you can't encode a generator (or another lazy iterable such as range) as-is with Python's json library:

In [1]: import json
In [2]: json.dumps(range(10))
TypeError: Object of type 'range' is not JSON serializable

A simple solution is to derive from list and override the __iter__ method:

In [1]: class LazyList(list):
   ...:     def __init__(self, gen):
   ...:         self.__gen = gen
   ...:     def __iter__(self):
   ...:         return iter(self.__gen)
In [2]: json.dumps(LazyList(range(10)))
Out[2]: '[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]'

Mind that this solution is not problem-free. It might not work correctly with the indent parameter, because the encoder can check whether the list is empty before iterating over it, and some versions of json also require you to override __len__.
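To make the __len__ caveat concrete, here is a minimal sketch of a LazyList variant that reports a non-zero length. The __len__ override and the `_iterable` attribute name are assumptions for illustration, based on the encoder testing the list for emptiness; this is not documented json behavior and may differ between versions.

```python
import io
import json


class LazyList(list):
    """List subclass that lets json iterate over an arbitrary iterable.

    Hypothetical variant of the class above; the __len__ override is an
    assumption about what the encoder checks, not a documented contract.
    """

    def __init__(self, iterable):
        super().__init__()  # the underlying list itself stays empty
        self._iterable = iterable

    def __iter__(self):
        return iter(self._iterable)

    def __len__(self):
        # Report a non-zero length so that emptiness checks such as
        # `if not lst` do not short-circuit to "[]".
        return 1


# Compact and indented output should match what a real list produces.
compact = json.dumps(LazyList(range(3)))
indented = json.dumps(LazyList(range(3)), indent=2)

# json.dump writes the encoded chunks to a file-like object.
buffer = io.StringIO()
json.dump(LazyList(x * x for x in range(4)), buffer)
streamed = buffer.getvalue()
```

Since the class now always claims to be non-empty, an empty source is worth testing separately before relying on this.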

The solution above is more or less a hack; the cleaner option is to use simplejson instead, which explicitly supports this through the iterable_as_array flag:

In [1]: import simplejson as json
In [2]: json.dumps(range(10), iterable_as_array=True)
Out[2]: '[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]'

You can also collect all of the generator's data into a list and encode that afterward, but this takes extra time and memory.
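One way to sketch the list-based fallback with the standard library alone is to pass list as the default hook of json.dumps. The squares generator below is a hypothetical stand-in for a real data source.

```python
import json


def squares(n):
    """Hypothetical generator standing in for a streamed data source."""
    for i in range(n):
        yield i * i


# `default` is called for any object json cannot serialize by itself;
# passing `list` materializes the generator in memory before encoding.
encoded = json.dumps(squares(5), default=list)
```

Keep in mind that default=list is applied to every object json cannot serialize, not only to generators, so it fits best when the data contains nothing else that is unsupported.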