Python for Big Data
===================

.. sidebar:: Page Contents

   .. contents::
      :local:

Managing Data
-------------

SciPy
~~~~~

* https://www.scipy.org/

According to the SciPy Web page, "SciPy (pronounced “Sigh Pie”) is a
Python-based ecosystem of open-source software for mathematics,
science, and engineering." Its core packages include:

* NumPy
* IPython
* Pandas
* Matplotlib
* Sympy
* SciPy library

It is thus a collection of useful packages and will probably suffice
for your projects if you use Python.

Pandas
~~~~~~

* http://pandas.pydata.org/

According to the Pandas Web page, "Pandas is a library providing
high-performance, easy-to-use data structures and data analysis tools
for the Python programming language."

In addition to access to charts via matplotlib, it has elementary
functionality for conducting data analysis. Pandas may be very suitable
for your projects.

Tutorial: http://pandas.pydata.org/pandas-docs/stable/10min.html

NumPy
~~~~~

* http://www.numpy.org/

According to the NumPy Web page, "NumPy is a package for scientific
computing with Python. It contains a powerful N-dimensional array
object, sophisticated (broadcasting) functions, tools for integrating
C/C++ and Fortran code, useful linear algebra, Fourier transform, and
random number capabilities."

Tutorial: https://docs.scipy.org/doc/numpy-dev/user/quickstart.html

Graphics Libraries
----------------------------------------------------------------------

Matplotlib
~~~~~~~~~~

* http://matplotlib.org/

According to the Matplotlib Web page, "matplotlib is a python 2D
plotting library which produces publication quality figures in a
variety of hardcopy formats and interactive environments across
platforms. matplotlib can be used in python scripts, the python and
ipython shell (ala MATLAB® or Mathematica®), web application servers,
and six graphical user interface toolkits."
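
As a minimal sketch of using matplotlib from a Python script, the
following example plots a sine curve and writes it to a PNG file. The
data, the axis labels, and the output file name ``sine.png`` are
illustrative assumptions and not part of any project. The ``Agg``
backend is selected so that the script also runs on machines without a
display.

.. code-block:: python

   import matplotlib

   matplotlib.use("Agg")  # non-interactive backend; no display needed

   import matplotlib.pyplot as plt
   import numpy as np

   # Sample one period of the sine function at 200 evenly spaced points.
   x = np.linspace(0.0, 2.0 * np.pi, 200)
   y = np.sin(x)

   # Create a figure with a single set of axes and draw the curve.
   fig, ax = plt.subplots()
   ax.plot(x, y, label="sin(x)")
   ax.set_xlabel("x")
   ax.set_ylabel("sin(x)")
   ax.set_title("A sine curve")
   ax.legend()

   # Write the figure to a file (hypothetical file name).
   fig.savefig("sine.png", dpi=150)

In an interactive session you would typically omit the
``matplotlib.use("Agg")`` line and call ``plt.show()`` instead of
saving the figure.
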

ggplot
~~~~~~

* http://ggplot.yhathq.com/

According to the ggplot Python Web page, ggplot is a plotting system
for Python based on R's ggplot2. It lets you generate plots quickly
and with little effort. It is often easier to use than matplotlib
directly.

seaborn
~~~~~~~~

* http://www.data-analysis-in-python.org/t_seaborn.html

A good library for plotting is seaborn, which is built on top of
matplotlib. It provides high-level templates for common statistical
plots.

* Gallery: http://stanford.edu/~mwaskom/software/seaborn/examples/index.html
* Original Tutorial: http://stanford.edu/~mwaskom/software/seaborn/tutorial.html
* Additional Tutorial: https://stanford.edu/~mwaskom/software/seaborn/tutorial/distributions.html

Bokeh
~~~~~

Bokeh is an interactive visualization library with a focus on web
browsers for display. Its goal is to provide an experience similar to
D3.js.

* URL: http://bokeh.pydata.org/
* Gallery: http://bokeh.pydata.org/en/latest/docs/gallery.html

pygal
~~~~~

Pygal is a simple API for producing graphs that can be easily embedded
into your Web pages. It displays annotations when you hover over data
points. It can also present the data in a table.

* URL: http://pygal.org/

Network and Graphs
------------------

* igraph: http://www.pythonforsocialscientists.org/t_igraph.html
* networkx: https://networkx.github.io/

Examples
----------------------------------------------------------------------

- :doc:`Fingerprint Analysis `
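
As a further small example, the NumPy broadcasting and the pandas data
structures described under Managing Data can be sketched as follows.
The numbers and the column names ``a`` and ``b`` are made up for
illustration only.

.. code-block:: python

   import numpy as np
   import pandas as pd

   # A small 3x2 array of illustrative measurements.
   data = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

   # Broadcasting: the per-column mean (shape (2,)) is subtracted
   # from every row of the (3, 2) array.
   centered = data - data.mean(axis=0)

   # Wrap the same data in a DataFrame and compute summary statistics.
   frame = pd.DataFrame(data, columns=["a", "b"])
   summary = frame.describe()

   print(centered)
   print(summary.loc["mean"])

``describe()`` reports the count, mean, standard deviation, minimum,
quartiles, and maximum of each numeric column, which is often a useful
first look at a new data set.
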