Flowy 2.0

Fast execution of stream-based IP flow queries

Johannes Schauer


1. Introduction
===============

              .-.             .-.
           ,-(  _)-.       ,-(  _)-.      +----------+
          .-(_ (_  )-.    .-(_ (_  )-.    | Analyzer |
          (_   LAN    )   (_   LAN    )   +----------+
           `-(______)-'    `-(______)-'        |
                |               |              |
    .-.         |               |              |
 ,-(  _)-.      |               |              |
.-(_ (_  )-.  +------------------+   +-------------------+
(_   LAN    )-| NetFlow Exporter |---| NetFlow Collector |
 `-(______)-' +------------------+   +-------------------+
                       |
                       |
                      .-.
                   ,-(  _)-.
                  .-(_ (_  )-.
                  (_ Internet )
                   `-(______)-'

* NetFlow records are identified by:
  - srcaddr
  - dstaddr
  - srcport
  - dstport
  - prot (TCP, UDP, ICMP, ...)
  - input (input interface)
  - tos (IP type of service)
* and contain aggregated information such as:
  - dflows
  - dpkts
  - doctets
  - first
  - last
  - tcp_flags
  - ...
* current analysis tools such as flow-tools and nfdump allow only limited
  analysis of flow record traces, because they are restricted to absolute
  filtering (each record is compared against fixed values)
* Flowy allows relative filtering (records are compared with each other)
  using a flow-based query language


2. Query Language
=================

2.1 Filtering Pipeline
----------------------

       flow records             group records          tuples     flow records
|<---------------------->||<---------------------->||<--------->||<---------->|
             +--------+  +---------+  +-------------+
            _| Filter |->| Grouper |->| Groupfilter |_   .-->output
           / +--------+  +---------+  +-------------+ \ /
+----------+                                           +--------+  +-----------+
| Splitter |                                           | Merger |->| Ungrouper |->output
+----------+                                           +--------+  +-----------+
           \_+--------+  +---------+  +-------------+_/ \
             | Filter |->| Grouper |->| Groupfilter |    `-->output
             +--------+  +---------+  +-------------+

2.2 Example Flow Query
----------------------


3. Flowy Improvements
=====================

* HDF/PyTables
* Cython+C instead of Python


4. C Implementation of Flowy Core
=================================


5. Benchmark
============

 number     | runtime old       | runtime new
 of records | Python Flowy in s | C Flowy in s
------------+-------------------+-------------
 103k       | 1177              | 0.3
 337k       | 20875             | 3.4
 656k       | 70035             | 13
 868k       | 131578            | 23
 1161k      | 234714 (2.7 days) | 86

* bottlenecks of the old Python Flowy:
  - Python
  - O(n³)
  - deepcopy
  - passing records


6. Outlook
==========

* combining the C core implementation with Python
* tree search
* multithreading


7. Conclusion
=============

flow-tools, nfdump

unix_secs, unix_nsecs, sysuptime, exaddr, dflows, dpkts, doctets, first,
last, engine_type, engine_id, nexthop, output, tcp_flags, src_mask,
dst_mask, src_as, dst_as, in_encaps, out_encaps, peer_nexthop, router_sc,
marked_tos, extra_pkts, src_tag, dst_tag