EQL statements are used to derive and aggregate information from one or more streams of events, and to join event streams. This section outlines EQL syntax. It also outlines the built-in views, which are the building blocks for deriving and aggregating information from event streams.
EQL is similar to SQL in its use of the select clause and the where clause. Where EQL differs most from SQL is in the use of tables. EQL replaces tables with the concept of event streams.
EQL statements contain definitions of one or more views. Similar to tables in an SQL statement, views define the data available for querying and filtering. Some views represent windows over a stream of events. Other views derive statistics from event properties, group events or handle unique event property values. Views can be staggered onto each other to build a chain of views. The Esper engine makes sure that views are reused among EQL statements for efficiency.
The built-in set of views is:
Views that represent moving event windows: win:length, win:time, win:time_batch, win:ext_time, ext:sort_window
Views for aggregation: std:unique, std:groupby, std:lastevent
Views that derive statistics: std:size, stat:uni, stat:linest, stat:correl, stat:weighted_avg, stat:multidim_stat
Esper can be extended by plugging-in custom developed views.
EQL queries are created and stored in the engine, and publish results as events are received by the engine or timer events occur that match the criteria specified in the query. Events can also be pulled from running EQL queries.
The select clause in an EQL query specifies the event properties or events to retrieve. The from clause in an EQL query specifies the event stream definitions and stream names to use. The where clause in an EQL query specifies search conditions that specify which event or event combination to search for. For example, the following statement returns the average price for IBM stock ticks in the last 30 seconds if the average hit 75 or more.
select average from StockTick(symbol='IBM').win:time(30).stat:uni('price') where average >= 75;
EQL queries follow the below syntax. EQL queries can be simple queries or more complex queries. A simple select contains only a select clause and a single stream definition. Complex EQL queries can be build that feature a more elaborate select list utilizing expressions, may join multiple streams or may contain a where clause that with search conditions.
select select_list from stream_def [as name] [, stream_def [as name]] [,...] [where search_conditions] [group by grouping_expression_list] [having grouping_search_conditions] [output output_specification]
The select clause is required in all EQL statements. The select clause can be used to select all properties via the wildcard *, or to specify a list of event properties and expressions. The select clause defines the event type (event property names and types) of the resulting events published by the statement, or pulled from the statement.
The syntax for selecting all event properties in a stream is:
select * from stream_def
The following statement selects all univariate statistics properties for the last 30 seconds of IBM stock ticks for price.
select * from StockTick(symbol='IBM').win:time(30).stat:uni('price')
In a join statement, using the select * syntax selects event properties that contain the events representing the joined streams themselves.
To chose the particular event properties to return:
select event_property [, event_property] [, ...] from stream_def
The following statement selects the count and standard deviation properties for the last 100 events of IBM stock ticks for volume.
select count, stdev from StockTick(symbol='IBM').win:length(100).stat:uni('volume')
The select clause can contain one or more expressions.
select expression [, expression] [, ...] from stream_def
The following statement selects the volume multiplied by price for a time batch of the last 30 seconds of stock tick events.
select volume * price from StockTick.win:time_batch(30)
Event properties and expressions can be renamed using below syntax.
select [event property | expression] as identifier [, ...]
The following statement selects volume multiplied by price and specifies the name volPrice for the event property.
select volume * price as volPrice from StockTick.win:length(100)
The from clause is required in all EQL statements. It specifies one or more event streams. Each event stream can optionally be given a name by means of the as syntax.
from stream_def [as name] [, stream_def [as name]] [, ...]
The event stream definition stream_def as shown in the syntax above consists of an event type, an optional filter property list and an optional list of views that derive data from a stream must be supplied. The syntax for an event stream definition is as below:
event_type ( [filter_criteria] ) [.view_spec] [.view_spec] [...]
The following EQL statement selects all event properties for the last 100 events of IBM stock ticks for volume. In the example, the event type is the fully qualified Java class name org.esper.example.StockTick. The expression filters for events where the property symbol has a value of "IBM". The optional view specifications for deriving data from the StockTick events are a length window and a view for computing statistics on volume. The name for the event stream is "volumeStats".
select * from org.esper.example.StockTick(symbol='IBM').win:length(100).stat:uni('volume') as volumeStats
Instead of the fully-qualified Java class name any other event name can be mapped via Configuration to a Java class, making the resulting statement more readable.
select * from StockTick(symbol='IBM').win:length(100).stat:uni('volume') as volumeStats
In the example above the event type was org.esper.example.StockTick. The event type is simply the fully qualified Java class name. Interfaces and abstract classes are also supported. Alternatively, via configuration an alias for an event type can be defined and used instead of the fully qualified class name. The below example shows one way to obtain the fully qualified class name of a given Java class StockTick.
String eventName = StockTick.class.getName(); String stmt = "from " + eventName + ".win:length(100)"
Filter criteria follow the same syntax as outlined in the event pattern section on filters; see Section 5.3, “Filter Expressions”. Filter criteria operators are: =, < , > , >=, <=. Ranges use the in keyword and round (...) or square brackets [].
Esper filters out events in an event stream as defined by filter criteria before it sends events to subsequent views. Thus, compared to search conditions in a where-clause, filter criteria remove unneeded events early.
The below example is a filter criteria list that removes events based on category, price and grade.
from mypackage.RfidEvent(category="Perishable", price<1.00, grade in [1, 2])
Views are used to derive or aggregate data. Views can be staggered onto each other. The section below outlines the views available and plug-in of custom views.
Views can optionally take one or parameters. These parameters can consist of primitive constants such as String, boolean or numeric types. String arrays are also supported as a view parameter type.
Views can optionally take one or parameters. These parameters can consist of primitive constants such as String, boolean or numeric types. String arrays are also supported as a view parameter type.
The below example uses the car location event. It specifies an empty list of filter criteria by adding a empty round brackets () after the event type. The first view "std:groupby('carId')" groups car location events by car id. The second view "win:length(4)" keeps a length window of the 4 last events, with one length window for each car id. The next view "std:groupby({'expressway', 'direction', 'segment'})" groups each event by it's expressway, direction and segment property values. Again, the grouping is done for each car id considering the last 4 events only. The last view "std:size()" is used to report the number of events. Thus the below example reports the number of events per car id and per expressway, direction and segment considering the last 4 events for each car id only. The "as accSegment" syntax assigns the name accSegment to the resulting event stream.
String carLocEvent = CarLocEvent.class.getName(); String joinStatement = "select * from " + carLocEvent + ".std:groupby('carId').win:length(4).std:groupby({'expressway', 'direction', 'segment'}).std:size() as accSeg," + carLocEvent + ".win:time(30).std:unique('carId') as curCarSeg" + " where accSeg.size >= 4" + " and accSeg.expressway = curCarSeg.expressway" + " and accSeg.direction = curCarSeg.direction" + " and (" + "(accSeg.direction=0 " + " and curCarSeg.segment < accSeg.segment" + " and curCarSeg.segment > accSeg.segment - 5)" + " or " + "(accSeg.direction=1 " + " and curCarSeg.segment > accSeg.segment" + " and curCarSeg.segment < accSeg.segment + 5)" + ")";
The where clause is an optional clause in EQL statements. Via the where clause event streams can be joined and events can be filtered.
Comparison operators =, < , > , >=, <=, !=, <>, is null, is not null and logical combinations via and and or are supported in the where clause. The where clause can also introduce join conditions as outlined in Section 6.10, “Joining Event Streams”. Where-clauses can also contain expressions. Some examples are listed below.
...where fraud.severity = 5 and amount > 500 ...where (orderItem.orderId is null) or (orderItem.class != 10) ...where (orderItem.orderId = null) or (orderItem.class <> 10) ...where itemCount / packageCount > 10
The aggregate functions are sum, avg, count, max, min, median, stddev, avedev. You can use aggregate functions to calculate and summarize data from event properties. For example, to find out the total price for all stock tick events in the last 30 seconds, type:
select sum(price) from StockTickEvent.win:time(30)
Here is the syntax for aggregate functions:
aggregate_function( [all | distinct] expression)
You can apply aggregate functions to all events in an event stream window or other view, or to one or more groups of events. From each set of events to which an aggregate function is applied, Esper generates a single value.
Expression is usually an event property name. However it can also be a constant, function, or any combination of event property names, constants, and functions connected by arithmetic operators.
For example, to find out the average price for all stock tick events in the last 30 seconds if the price was doubled:
select avg(price * 2) from StockTickEvent.win:time(30)
You can use the optional keyword distinct with all aggregate functions to eliminate duplicate values before the aggregate function is applied. The optional keyword all which performs the operation on all events is the default.
The syntax of the aggregation functions and the results they produce are shown in below table.
Table 6.1. Syntax and results of aggregate functions
Aggregate Function | Result |
---|---|
sum([all|distinct] expression) | Totals the (distinct) values in the expression, returning a value of long, double, float or integer type depending on the expression |
avg([all|distinct] expression) | Average of the (distinct) values in the expression, returning a value of double type |
count([all|distinct] expression) | Number of the (distinct) non-null values in the expression, returning a value of long type |
count(*) | Number of events, returning a value of long type |
max([all|distinct] expression) | Highest (distinct) value in the expression, returning a value of the same type as the expression itself returns |
min([all|distinct] expression) | Lowest (distinct) value in the expression, returning a value of the same type as the expression itself returns |
median([all|distinct] expression) | Median (distinct) value in the expression, returning a value of double type |
stddev([all|distinct] expression) | Standard deviation of the (distinct) values in the expression, returning a value of double type |
avedev([all|distinct] expression) | Mean deviation of the (distinct) values in the expression, returning a value of double type |
You can use aggregation functions in a select clause and in a having clause. You cannot use aggregate functions in a where clause, but you can use the where clause to restrict the events to which the aggregate is applied. The next query computes the average and sum of the price of stock tick events for the symbol IBM only, for the last 10 stock tick events regardless of their symbol.
select 'IBM stats' as title, avg(price) as avgPrice, sum(price) as sumPrice from StockTickEvent.win:length(10) where symbol='IBM'
In the above example the length window of 10 elements is not affected by the where-clause, i.e. all events enter and leave the length window regardless of their symbol. If we only care about the last 10 IBM events, we need to add filter criteria as below.
select 'IBM stats' as title, avg(price) as avgPrice, sum(price) as sumPrice from StockTickEvent(symbol='IBM').win:length(10) where symbol='IBM'
You can use aggregate functions with any type of event property or expression, with the following exceptions:
You can use sum, avg, median, stddev, avedev with numeric event properties only
Esper ignores any null values returned by the event property or expression on which the aggregate function is operating, except for the count(*) function, which counts null values as well. All aggregate functions return null if the data set contains no events, or if all events in the data set contain only null values for the aggregated expression.
The group by clause is optional in all EQL statements. The group by clause divides the output of an EQL statement into groups. You can group by one or more event property names, or by the result of computed expressions. When used with aggregate functions, group by retrieves the calculations in each subgroup. You can use group by without aggregate functions, but generally that can produce confusing results.
For example, the below statement returns the total price per symbol for all stock tick events in the last 30 seconds:
select symbol, sum(price) from StockTickEvent.win:time(30) group by symbol
The syntax of the group by clause is:
group by arregate_free_expression [, arregate_free_expression] [, ...]
Esper places the following restrictions on expressions in the group by clause:
Expressions in the group by cannot contain aggregate functions
Event properties that are used within aggregate functions in the select clause cannot also be used in a group by expression
You can list more then one expression in the group by clause to nest groups. Once the sets are established with group by the aggregation functions are applied. This statement posts the median volume for all stock tick events in the last 30 seconds per symbol and tick data feed. Esper posts one event for each group to statement listeners:
select symbol, tickDataFeed, median(volume) from StockTickEvent.win:time(30) group by symbol, tickDataFeed
In the statement above the event properties in the select list (symbol, tickDataFeed) are also listed in the group by clause. The statement thus follows the SQL standard which prescribes that non-aggregated event properties in the select list must match the group by columns.
Esper also supports statements in which one or more event properties in the select list are not listed in the group by clause. The statement below demonstrates this case. It calculates the standard deviation for the last 30 seconds of stock ticks aggregating by symbol and posting for each event the symbol, tickDataFeed and the standard deviation on price.
select symbol, tickDataFeed, stddev(price) from StockTickEvent.win:time(30) group by symbol
The above example still aggregates the price event property based on the symbol, but produces one event per incoming event, not one event per group.
Additionally, Esper supports statements in which one or more event properties in the group by clause are not listed in the select list. This is an example that calculates the mean deviation per symbol and tickDataFeed and posts one event per group with symbol and mean deviation of price in the generated events. Since tickDataFeed is not in the posted results, this can potentially be confusing.
select symbol, avedev(price) from StockTickEvent.win:time(30) group by symbol, tickDataFeed
Expressions are also allowed in the group by list:
select symbol * price, count(*) from StockTickEvent.win:time(30) group by symbol * price
If the group by expression resulted in a null value, the null value becomes its own group. All null values are aggregated into the same group. If you are using the count(expression) aggregate function which does not count null values, the count returns zero if only null values are encountered.
You can use a where clause in a statement with group by. Events that do not satisfy the conditions in the where clause are eliminated before any grouping is done. For example, the statement below posts the number of stock ticks in the last 30 seconds with a volume larger then 100, posting one event per group (symbol).
select symbol, count(*) from StockTickEvent.win:time(30) where volume > 100 group by symbol
Use the having clause to pass or reject events defined by the group-by clause. The having clause sets conditions for the group by clause in the same way where sets conditions for the select clause, except where cannot include aggregate functions, while having often does.
This statement is an example of a having clause with an aggregate function. It posts the total price per symbol for the last 30 seconds of stock tick events for only those symbols in which the total price exceeds 1000. The having clause eliminates all symbols where the total price is equal or less then 1000.
select symbol, sum(price) from StockTickEvent.win:time(30) group by symbol having sum(price) > 1000
To include more then one condition in the having clause combine the conditions with and, or or not. This is shown in the statement below which selects only groups with a total price greater then 1000 and an average volume less then 500.
select symbol, sum(price), avg(volume) from StockTickEvent.win:time(30) group by symbol having sum(price) > 1000 and avg(volume) < 500
Esper places the following restrictions on expressions in the having clause:
Any expressions that contain aggregate functions must also occur in the select clause
A statement with the having clause should also have a group by clause. If you omit group-by, all the events not excluded by the where clause return as a single group. In that case having acts like a where except that having can have aggregate functions.
The having clause can also be used without group by clause as the below example shows. The example below posts events where the price is less then the current running average price of all stock tick events in the last 30 seconds.
select symbol, price, avg(price) from StockTickEvent.win:time(30) having price < avg(price)
When you include filters, the where condition, the group by clause and the having condition in an EQL statement the sequence in which each clause affects events determines the final result:
The event stream's filter condition, if present, dictates which events enter a window (if one is used). The filter discards any events not meeting filter criteria.
The where clause excludes events that do not meet its search condition.
Aggregate functions in the select list calculate summary values for each group.
The having clause excludes events from the final results that do not meet its search condition.
The following query illustrates the use of filter, where, group by and having clauses in one statement with a select clause containing an aggregate function.
select tickDataFeed, stddev(price) from StockTickEvent(symbol='IBM').win:length(10) where volume > 1000 group by tickDataFeed having stddev(price) > 0.8
Esper filters events using the filter criteria for the event stream StockTickEvent. In the example above only events with symbol IBM enter the length window over the last 10 events, all other events are simply discarded. The where clause removes any events posted by the length window (events entering the window and event leaving the window) that do not match the condition of volume greater then 1000. Remaining events are applied to the stddev standard deviation aggregate function for each tick data feed as specified in the group by clause. Each tickDataFeed value generates one event. Esper applies the having clause and only lets events pass for tickDataFeed groups with a standard deviation of price greater then 0.8.
The output clause is optional in Esper and is used to stabilize the rate at which events are output. For example, the following statement batches old and new events and outputs them at the end of every 90 second interval.
select * from StockTickEvent.win:length(5) output every 90 seconds
Here is the syntax for output rate limiting:
output [all | last] every number [minutes | seconds | events]
The optional last keyword specifies to only output the very last event, while the all keyword is the default and specifies to output all events in a batch. The batch size can be specified in terms of time or number of events.
The time interval can also be specified in terms of minutes; the following statement is identical to the first one.
select * from StockTickEvent.win:length(5) output every 1.5 minutes
A second way that output can be stabilized is by batching events until a certain number of events have been collected. The next statement only outputs when either 5 (or more) new or 5 (or more) old events have been batched.
select * from StockTickEvent.win:time(30) output every 5 events
Additionally, event output can be further modified by the optional last keyword, which causes output of only the last event to arrive into an output batch.
select * from StockTickEvent.win:time(30) output last every 5 events
The output clause interacts in two ways with the group by and having clauses. First, in the output every n events case, the number n refers to the number of events arriving into the group by clause. That is, if the group by clause outputs only 1 event per group, or if the arriving events don't satisfy the having clause, then the actual number of events output by the statement could be fewer than n.
Second, the last and all keywords have special meanings when used in a statement with aggregate functions and the group by clause. The last keyword specifies that only groups whose aggregate values have been updated with the most recent batch of events should be output. The all keyword (the default) specifies that the most recent data for all groups seen so far should be output, whether or not these groups' aggregate values have just been updated.
Single-row functions return a single value for every single event result row generated by your statement. These functions can appear in the select clause, in the where clause and in the having clause.
The below table outlines the single-row functions available.
Table 6.2. Syntax and results of single-row functions
Single-row Function | Result |
---|---|
max(expression, expression, [, expression [,...]]) | Returns the highest numeric value among the 2 or more comma-separated expressions. |
min(expression, expression, [, expression [,...]]) | Returns the lowest numeric value among the 2 or more comma-separated expressions. |
An example showing the use of the min single-row function is below.
select symbol, min(ticks.timestamp, news.timestamp) as minT from StockTickEvent.win:time(30) as ticks, NewsEvent.win:time(30) as news where ticks.symbol = news.symbol
This chapter outlines the views that are built into Esper.
Creates a moving window extending the specified number of elements into the past.
The below example calculates basic univariate statistics for the last 5 stock ticks for symbol IBM.
StockTickEvent(symbol='IBM').win:length(5).stat:uni('price')
The next example keeps a length window of 10 events of stock trade events, with a separate window for each symbol. The statistics on price is calculated only for the last 10 events for each symbol.
StockTickEvent.std:groupby('symbol').win:length(10).stat:uni('price')
The time_window creates a moving time window extending from the specified time interval in seconds into the past based on the system time.
For the IBM stock tick events in the last 1000 milliseconds, calculate statistics on price.
StockTickEvent(symbol='IBM').win:time(1).stat:uni('price')
Similar to the time window this view moving time window extending from the specified time interval in seconds into the past, but based on the millisecond time value supplied by an event property.
This view holds stock tick events of the last 10 seconds based on the timestamp property in StockTickEvent.
StockTickEvent.win:ext_timed(10, 'timestamp')
This window view buffers events and releases them every specified time interval in one update.
The below example batches events into a 5 second window releasing new batches every 5 seconds. Listeners to updates posted by this view receive updated information only every 5 seconds.
StockTickEvent.win:time_batch(5)
The uniqueview is a view that includes only the most recent among events having the same value for the specified field.
The below example creates a view that retains only the last event per symbol.
StockTickEvent.std:unique('symbol')
This view groups events into sub-views by the value of the specified field.
This example calculates statistics on price separately for each symbol.
StockTickEvent.std:groupby('symbol').stat:uni('price')
The next example keeps a length window of 10 events of stock trade events, with a separate window for each symbol. Now the statistics on price is calculated only for the last 10 events for each symbol.
StockTickEvent.std:groupby('symbol').win:length(10).stat:uni('price')
This view returns the number of elements in view.
This example view reports the number of events within the last 1 minute.
StockTickEvent.win:time(60000).std:size()
This view calculated basic univariate statistics on an event property.
Table 6.3. Univariate statistics derived properties
Property Name | Description |
---|---|
count | Number of values |
sum | Sum of values |
average | Average of values |
variance | Variance |
stdev | Sample standard deviation (square root of variance) |
stdevpa | Population standard deviation |
The below example calculates price statistics on stock tick events for the last 10 events.
StockTickEvent.win:length(10).stat:uni('price')
This view calculates regression on two event properties.
Calculate slope and y-intercept on price and offer for all events in the last 10 seconds.
StockTickEvent.win:time(10000).stat:linest('price', 'offer')
This view calculates the correlation value on two event properties.
Table 6.5. Correlation derived properties
Property Name | Description |
---|---|
correl | Correlation between two event properties |
Calculate correlation on price and offer over all stock tick events for IBM.
StockTickEvent(symbol='IBM').stat:correl('price', 'offer')
This view returns the weighted average given a weight field and a field to compute the average for. Syntax: weighted_avg(field, weightField)
Views that derive the volume-weighted average price for the last 3 seconds.
StockTickEvent(symbol='IBM').win:time(3000).stat:weighted_avg('price', 'volume')
This view works similar to the std:groupby views in that it groups information by one or more event properties. The view accepts 3 or more parameters: The first parameter to the view defines the univariate statistics values to derive. The second parameter is the property name to derive data from. The remaining parameters supply the event property names to use to derive dimensions.
The example below derives the count, average and standard deviation latency of service measurement events per customer.
ServiceMeasurement.stat:multidim_stats({‘count’, ‘average’, ‘stdev’}, 'latency', 'customer')
This example derives the average latency of service measurement events per customer, service and error status for events in the last 30 seconds.
ServiceMeasurement.win:length(30000).stat:multidim_stats({‘average’}, 'latency', 'customer', 'service', 'status')
This view sorts by values in the specified event property and keeps only the top elements up to the given size.
The syntax for this view is : sort(String propertyName, boolean isDescending, int size) .
These view can be used to sort on price descending keeping the lowest 10 prices and reporting statistics on price.
StockTickEvent.ext:sort('price', true, 10).stat:uni('price'))
Two or more event streams can be part of the from clause and thus both streams determine the resulting events. The where-clause lists the join conditions that Esper uses to relate events in the two or more streams.
Each point in time that an event arrives to one of the event streams, the two event streams are joined and output events are produced according to the where-clause.
This example joins 2 event streams. The first event stream consists of fraud warning events for which we keep the last 30 minutes (1800 seconds). The second stream is withdrawal events for which we consider the last 30 seconds. The streams are joined on account number.
select fraud.accountNumber as accntNum, fraud.warning as warn, withdraw.amount as amount, max(fraud.timestamp, withdraw.timestamp) as timestamp, 'withdrawlFraud' as desc from net.esper.example.atm.FraudWarningEvent.win:time(1800) as fraud, net.esper.example.atm.WithdrawalEvent.win:time(30) as withdraw where fraud.accountNumber = withdraw.accountNumber
Esper supports left outer joins, right outer joins and full outer joins between an unlimited number of event streams.
If the outer join is a left outer join, there will be an output event for each event of the stream on the left-hand side of the clause. For example, in the left outer join shown below we will get output for each event in the stream RfidEvent, even if the event does not match any event in the event stream OrderList.
select * from net.esper.example.rfid.RfidEvent.win:time(30) as rfid left outer join net.esper.example.rfid.OrderList.win:length(10000) as orderlist on rfid.itemId = orderList.itemId
Similarly, if the join is a Right Outer Join, then there will be an output event for each event of the stream on the right-hand side of the clause. For example, in the right outer join shown below we will get output for each event in the stream OrderList, even if the event does not match any event in the event stream RfidEvent.
select * from net.esper.example.rfid.RfidEvent.win:time(30) as rfid right outer join net.esper.example.rfid.OrderList.win:length(10000) as orderlist on rfid.itemId = orderList.itemId
For all types of outer joins, if the join condition is not met, the select list is computed with the event properties of the arrived event while all other event properties are considered to be null.
select * from net.esper.example.rfid.RfidEvent.win:time(30) as rfid full outer join net.esper.example.rfid.OrderList.win:length(10000) as orderlist on rfid.itemId = orderList.itemId
The last type of outer join is a full outer join. In a full outer join, each point in time that an event arrives to one of the event streams, one or more output events are produced. In the example below, when either an RfidEvent or an OrderList event arrive, one or more output event is produced.