Developed and tested on macOS, OpenJDK build 15+36. Should work on other *nixes and from Java 11 onwards.
Execute the script connected-source-hosts
, or alternatively gradlew runConnectedSourceHosts --args="<arguments and options>"
.
Usage (connected-source-hosts --help
):
Usage: connected-source-hosts [OPTIONS] LOG_FILE_NAME TARGET
Options:
-f, --from INT timestamp from inclusive [ms] (default: 0)
-t, --to INT timestamp to inclusive [ms] (default: 9223372036854775807)
-h, --help Show this message and exit
Arguments:
LOG_FILE_NAME log file name
TARGET search target of incoming connections
Default value for the -t/--to
option is Long.MAX_VALUE
Example:
./connected-source-hosts input-file-10000.txt Aaliayh --from=1565656607767 --to=1565680778409
Execute the script periodic-reports
, or alternatively gradlew runPeriodicReports --args="<arguments and options>"
.
Usage (periodic-reports --help
):
Usage: periodic-reports [OPTIONS] LOG_FILE_NAME HOST
Options:
-f, --frequency INT reporting frequency [ms] (default: 3600000)
-l, --max-lag INT maximum tolerable lag of log entries [ms] (default: 300000)
-t, --timeout INT log inactivity timeout [ms] (default: 30000)
-h, --help Show this message and exit
Arguments:
LOG_FILE_NAME log file name
HOST host to report both incoming and outgoing connections
Default value for the -f/--frequency
option is 1h. Default value for the -l/--max-lag
option is 5m
Example:
./periodic-reports input-file-10000.txt Ferid -f 3600000 -t 0
- Blank log lines are allowed. They are skipped during the processing
- Malformed log throw an exception and stop the processing
periodic-reports
script ("Unlimited Input Parser" part),HOST
argument sets both the target host for searching connected source hosts and the source host for searching target hosts that received connections from that host- If there are multiple source hosts that generated the maximum number of connections,
periodic-reports
script prints all the source hosts and the number of connections - First line timestamp sets the boundary of reporting periods
The solution follows the Dependency Inversion principle (aka Hexagonal Architecture, Clean Architecture, you name it).
The main entry points are main
functions in ConnectedSourceHosts.kt
and PeriodicReports.kt
files.
Entry points interact with the domain through handlers (aka use case or application services): ConnectedSourceHostsHandler
and
PeriodicReportsHandler
. ConnectedSourceHostsHandler
is rather simple and self-contained. PeriodicReportsHandler
collaborates with
peers from the periodicreports
package.
log
package contains types used in both handlers - some basic type definitions and log reading abstraction (LogReader
).
Infrastructure implementation of the domain abstractions is implemented in file
and renderer
packages.
Top connection counter has n
space complexity and n log(n)
time complexity.
To run the programs with a low-memory JVM, add this option to the runConnectedSourceHosts
and runPeriodicReports
tasks:
jvmArgs = listOf("-Xmx128M")
I used Kotlin inline classes to wrap primitive types and avoid thus primitive obsession . They don't increase memory and execution overhead - Kotlin compiler replaces them with the wrapped type during the compilation. They are still in beta in Kotlin 1.4 and will become stable in Kotlin 1.5 (under development). As explained in Components stability, Kotlin beta features can be used in typical production scenarios, although there can be breaking changes between Kotlin versions.
Although the solution could be implemented using Java thread API, I decided to
use Kotlin coroutines and suspend
functions. Coroutine runtime and testing library
gives a higher level of abstraction over concurrency and decouples the execution policies (thread pools, single-threaded event loops) from
concurrency constructs.
As you can see in FileLogReaderTest
, production code is tested with TestCoroutineContext
which gives the precise control over timing. I
was able to verify the timeouts without executing FileLogReader
with a thread pool. As a result, the tests are very fast (no sleeps)
and deterministic.
I followed classical/Chicago TDD with micro- to small commits (check the history).
Unit tests are social - they are testing handlers and their dependencies - PeriodicReportGenerator
and ReportCollector
, since only
the PeriodicReportGenerator
has a non-trivial amount of branching logic. No mocking library was required.
Both use cases have their acceptance tests with the sample log file attached to the assignment.
Run LogGenerator.kt
to generate random logs with the given number of lines.