PySpark Plaso
Release 2019
A tool for distributed extraction of timestamps from various files using extractors adapted from the Plaso engine to Apache Spark.
Public Member Functions | |
def | __init__ (self, hdfs_base_uri, spark_context) |
def | extract (self, hdfs_path="") |
Public Member Functions (inherited) | |
def | __init__ (self, hdfs_base_uri) |
def | make_hdfs_uri (self, hdfs_path) |
def | strip_hdfs_uri (self, hdfs_path) |
Public Attributes | |
spark_context | |
Public Attributes (inherited) | |
hdfs_base_uri | |
Controller for extraction of events by Plaso.
def plaso.tarzan.app.controllers.plasocontroller.PlasoController.__init__(self, hdfs_base_uri, spark_context)
Create a new controller that uses the given HDFS base URI and SparkContext.
:param hdfs_base_uri: the base HDFS URI to store
:param spark_context: the Spark context
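The constructor and the inherited URI helpers listed in the member summary can be illustrated with a minimal stand-in. This is a sketch only: `SketchController` and `SketchPlasoController` are hypothetical classes that mirror the documented signatures, the join and strip behavior of `make_hdfs_uri` and `strip_hdfs_uri` is an assumption (this page gives only their signatures), and a plain `object()` stands in for the SparkContext so the example runs without PySpark.

```python
class SketchController:
    """Hypothetical stand-in for the base controller (assumed behavior)."""

    def __init__(self, hdfs_base_uri):
        # Store the base URI without a trailing slash so joins stay predictable.
        self.hdfs_base_uri = hdfs_base_uri.rstrip("/")

    def make_hdfs_uri(self, hdfs_path):
        # Assumption: prefix a path with the base HDFS URI.
        return self.hdfs_base_uri + "/" + hdfs_path.lstrip("/")

    def strip_hdfs_uri(self, hdfs_path):
        # Assumption: remove the base HDFS URI prefix if it is present.
        if hdfs_path.startswith(self.hdfs_base_uri):
            return hdfs_path[len(self.hdfs_base_uri):] or "/"
        return hdfs_path


class SketchPlasoController(SketchController):
    """Hypothetical stand-in mirroring PlasoController.__init__."""

    def __init__(self, hdfs_base_uri, spark_context):
        super().__init__(hdfs_base_uri)
        self.spark_context = spark_context


# Placeholder object instead of a real SparkContext (assumption for the sketch).
controller = SketchPlasoController("hdfs://namenode:8020/", object())
full_uri = controller.make_hdfs_uri("/evidence/disk1")
relative = controller.strip_hdfs_uri(full_uri)
```

With the real class, the second argument would be an actual `SparkContext`, and the host and port in the base URI would match the cluster's NameNode.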
def plaso.tarzan.app.controllers.plasocontroller.PlasoController.extract(self, hdfs_path="")
Run Plaso extractors on a given HDFS path to generate events.
:param hdfs_path: the path to extract events from
:return: the Flask Response with a JSON document of extracted events
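The overall flow of `extract` (resolve the path, run the extractors, serialize the events to JSON) might look roughly like the sketch below. Everything here is illustrative: `extract_sketch` and `fake_extractors` are hypothetical names, the event fields are placeholders, the JSON layout (a plain list of event objects) is an assumption because the real schema is not described on this page, and the function returns a body and MIME type instead of a Flask `Response` so that it runs with the standard library alone.

```python
import json


def extract_sketch(hdfs_base_uri, run_extractors, hdfs_path=""):
    """Hypothetical outline of the documented extract() flow."""
    # Resolve the requested path against the base HDFS URI (assumed behavior).
    uri = hdfs_base_uri.rstrip("/") + "/" + hdfs_path.lstrip("/")
    # run_extractors is a hypothetical callable standing in for the Spark job.
    events = list(run_extractors(uri))
    # The real method wraps the JSON in a Flask Response; here we return
    # the serialized body and its MIME type.
    return json.dumps(events), "application/json"


def fake_extractors(uri):
    # Placeholder event; real events come from the Plaso extractors.
    yield {"timestamp": 0, "source": uri}


body, mimetype = extract_sketch("hdfs://namenode:8020", fake_extractors, "/evidence")
events = json.loads(body)
```

A caller of the real endpoint would parse the response body the same way, with `json.loads` or an HTTP client's JSON helper.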
plaso.tarzan.app.controllers.plasocontroller.PlasoController.spark_context