author David Timber <mieabby@gmail.com> 2021-07-07 21:23:21 +1000
committer David Timber <mieabby@gmail.com> 2021-07-07 21:47:00 +1000
commit 3fbf08ab6522c91e8209b21d66430a2db4ea71cb
tree 07145bcb17545d68b1a8248236b1f8e1fb5f5d15
parent 9d963486f66a864aa67e668742b6aa6a6e72fb1f
Documentation in progress.
-rw-r--r-- COPYING.md 5
-rw-r--r-- README.md 46
-rw-r--r-- doc/dev_notes.md 77
-rw-r--r-- doc/proto.md 6
-rw-r--r-- doc/sws.md 205
-rw-r--r-- doc/user_guide.md 2
6 files changed, 341 insertions, 0 deletions
diff --git a/COPYING.md b/COPYING.md
new file mode 100644
index 0000000..7c143a8
--- /dev/null
+++ b/COPYING.md
@@ -0,0 +1,5 @@
+# Proone Worm Project - On the License
+My wish is for this project to be published under a FOSS-compatible license like
+the GPL. But then it occurred to me: what's the point of seeking protection from
+the law if you're breaking the law? So, there you go. Do whatever you want with
+the source code, but I'd appreciate it if you kept this project FOSS.
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..01d8271
--- /dev/null
+++ b/README.md
@@ -0,0 +1,46 @@
+# Proone Worm Project
+**Proone** is a Linux worm designed to target unconfigured IoT embedded devices
+with an MMU. It features self-contained breaking and entering, replication, IPv6
+support and CNC using DNS over TLS.
+
+## Foreword
+In a nutshell, this project is a reengineered version of Mirai, but in a serious
+tone and with some extras. Inspired by the work of the original authors of
+Mirai, I started this project on New Year's Eve of 2020. I don't mean any harm
+to this world. This is merely one of my "art projects" and I hope it will stay
+that way.
+
+I named this project "**Proone**" because the first idea as to what to do with
+this worm was "pruning" bad devices off this big tree called the Internet. The
+bad devices I refer to here are neglected/obsolete devices running unpatched
+software and poorly made devices with built-in security vulnerabilities like
+predictable default logins and unlocked maintenance backdoors. In particular,
+vulnerable devices on networks without a firewall often end up being used as
+botnets for nefarious purposes. My original idea was a "search and destroy"
+operation against these devices for a good cause.
+
+During development, I came to realise that this was a bad idea and that I
+lack the balls to pull this off. Therefore I hereby abandon the idea by
+publishing my work online.
+
+Call this whatever you want: reinventing the wheel, copycat, waste of time...
+Whatever you want to call it, working on this project helped me a lot.
+
+## Message to General Public
+**This software is malware**. This software has been tested to work in an
+orchestrated virtual environment. In principle, it works by scanning the
+Internet and local network for computers with security vulnerabilities. This
+software is programmed to do something illegal! If you wish to use this
+software, please do so in a controlled environment safely isolated from the
+Internet.
+
+## Index of Documents
+Where to go from here
+
+* [User Guide](doc/user_guide.md)
+* [Software Design Spec](doc/sws.md)
+* [Protocol Spec](doc/proto.md)
+* [Dev Notes](doc/dev_notes.md)
+
+## Subprojects
+* proone-xcomp: Infrastructure for building and testing cross-compiled builds (TODO)
diff --git a/doc/dev_notes.md b/doc/dev_notes.md
new file mode 100644
index 0000000..07f190f
--- /dev/null
+++ b/doc/dev_notes.md
@@ -0,0 +1,77 @@
+# Proone Dev Notes
+
+## Potential Improvements
+### TODO switching to real threads?
+
+### Put Mbed TLS on Diet
+The build is not light because the Mbed TLS library is large. Proone is tested
+using the default Mbed TLS config included in Buildroot, but the size could be
+reduced by disabling unnecessary features like threading and DTLS support.
+
+### Don't Build Clean-up Code
+Excluding clean-up code from release builds is a widely accepted technique to
+reduce code size. Proone does not expect user intervention: it is programmed to
+exit on SIGINT for debugging purposes only, so removing the handling of that
+signal would reduce code size as well.
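+
+A minimal sketch of the idea, assuming the release build defines `NDEBUG`
+(Proone's actual build flags may differ). `install_debug_signal_handlers()` and
+`cleanup()` are hypothetical names used only for illustration. In a release
+build both bodies compile to nothing, so the SIGINT handler and the `free()`
+calls are left out of the binary.
+
+```c
+#define _POSIX_C_SOURCE 200112L
+#include <signal.h>
+#include <stdlib.h>
+#include <string.h>
+
+/* Set by the debug-only SIGINT handler; a main loop would poll this flag. */
+static volatile sig_atomic_t g_exit_requested = 0;
+
+#ifndef NDEBUG
+static void handle_sigint(int sig) {
+	(void)sig;
+	g_exit_requested = 1;
+}
+#endif
+
+/* Registers the SIGINT handler in debug builds only. With NDEBUG defined,
+ * the body is empty and no handler code ends up in the binary. */
+static void install_debug_signal_handlers(void) {
+#ifndef NDEBUG
+	struct sigaction sa;
+
+	memset(&sa, 0, sizeof(sa));
+	sa.sa_handler = handle_sigint;
+	sigemptyset(&sa.sa_mask);
+	sigaction(SIGINT, &sa, NULL);
+#endif
+}
+
+/* Frees resources in debug builds only. A release build relies on the
+ * kernel reclaiming the memory when the process exits. */
+static void cleanup(void *buf) {
+#ifndef NDEBUG
+	free(buf);
+#else
+	(void)buf;
+#endif
+}
+```
+
+The trade-off is that leak checkers such as Valgrind become noisy on release
+builds, which is why the clean-up path is kept for debug builds.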
+
+## Bugs Found
+### Musl SOCKETCALL
+In the early stage of development, Musl was considered for the libc
+implementation as it seemed to have more benefits than uClibc can
+[offer](http://www.etalabs.net/compare_libcs.html).
+However, it was later determined that, regardless of the benefits, I could not
+take the risk of encountering more bugs like
+[this one](https://www.openwall.com/lists/musl/2020/08/03/6).
+Musl was abandoned immediately after the discovery of the bug.
+
+### Mbed TLS `getrandom()` Blocks
+https://github.com/ARMmbed/mbedtls/issues/3551
+
+Mbed TLS uses `getrandom()` to initialise CTR_DRBG contexts. On systems where
+the function is not available, the library falls back to using `/dev/urandom`,
+which never blocks. This contradicts the behaviour of `getrandom()`, which blocks
+until the kernel's entropy pool has been initialised.
+
+`prne_mbedtls_entropy_init()` had to be implemented to modify the "factory"
+function for creating CTR_DRBG contexts so that the library always uses
+`/dev/urandom`. This would have been an unacceptable measure if Proone handled
+sensitive data, but the main purpose of Proone using TLS is to hide its
+characteristics so that it's hard for law enforcement or ISPs to filter
+Proone's traffic.
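+
+As a rough illustration of the approach, the hypothetical callback below reads
+`/dev/urandom` directly and matches the `f_entropy` parameter of
+`mbedtls_ctr_drbg_seed()`, so it could be passed in place of
+`mbedtls_entropy_func` to keep the DRBG from ever calling `getrandom()`. This is
+a sketch under that assumption, not the actual implementation of
+`prne_mbedtls_entropy_init()`.
+
+```c
+#define _POSIX_C_SOURCE 200112L
+#include <errno.h>
+#include <fcntl.h>
+#include <stddef.h>
+#include <unistd.h>
+
+/* Hypothetical entropy callback with the f_entropy signature expected by
+ * mbedtls_ctr_drbg_seed(). Reading /dev/urandom never blocks, unlike
+ * getrandom() before the kernel entropy pool is initialised.
+ * Returns 0 on success and -1 on failure. */
+static int urandom_entropy(void *ctx, unsigned char *buf, size_t len) {
+	size_t got = 0;
+	int fd;
+
+	(void)ctx; /* no per-call state is needed */
+	fd = open("/dev/urandom", O_RDONLY);
+	if (fd < 0) {
+		return -1;
+	}
+	while (got < len) {
+		ssize_t n = read(fd, buf + got, len - got);
+
+		if (n < 0) {
+			if (errno == EINTR) {
+				continue; /* interrupted by a signal: retry */
+			}
+			break;
+		}
+		if (n == 0) {
+			break; /* unexpected EOF */
+		}
+		got += (size_t)n;
+	}
+	close(fd);
+	return got == len ? 0 : -1;
+}
+```
+
+A call such as `mbedtls_ctr_drbg_seed(&ctr_drbg, urandom_entropy, NULL, NULL, 0)`
+(with `ctr_drbg` being a hypothetical, already initialised context) would then
+seed the DRBG without going through the platform entropy source.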
+
+### Pthsem's Improper Use of `FD_SET()`
+Calling `FD_SET()` with a negative fd value is undefined. Pthsem uses `select()`
+for internal scheduling and the fd value is not checked in `pth_poll()`.
+Therefore, calling `pth_poll()` with `pollfd` entries that have negative fds
+results in undefined behaviour because the fd values are propagated to
+`FD_SET()`. uClibc does not take this well and the program crashes with SIGBUS.
+Nothing serious happens if the program is linked with Glibc on x86 hosts.
+
+To get around this issue, `prne_pth_poll()` is used wherever `pth_poll()` would
+otherwise be called. In `prne_pth_poll()`, the `pollfd` elements with negative
+fd values are transparently filtered out before being passed to `pth_poll()`.
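+
+A minimal sketch of such a wrapper, with plain `poll()` standing in for
+`pth_poll()` so that it runs without Pthsem. `filtered_poll()` is a
+hypothetical name, not the actual `prne_pth_poll()` code. Filtered entries get
+`revents` set to 0, which is what POSIX specifies `poll()` itself does for
+negative fds.
+
+```c
+#define _POSIX_C_SOURCE 200112L
+#include <errno.h>
+#include <poll.h>
+#include <stdlib.h>
+
+/* Hypothetical wrapper in the spirit of prne_pth_poll(). Entries with
+ * negative fds are removed before the underlying poll function sees them,
+ * so they can never reach FD_SET() inside Pthsem. */
+static int filtered_poll(struct pollfd *fds, nfds_t nfds, int timeout) {
+	struct pollfd *tmp;
+	nfds_t i, j, n = 0;
+	int ret, saved_errno;
+
+	/* Allocate at least one element so malloc(0) is never called. */
+	tmp = malloc(sizeof(*tmp) * (nfds > 0 ? nfds : 1));
+	if (tmp == NULL) {
+		return -1;
+	}
+	for (i = 0; i < nfds; i++) {
+		fds[i].revents = 0; /* filtered entries report no events */
+		if (fds[i].fd >= 0) {
+			tmp[n++] = fds[i];
+		}
+	}
+
+	ret = poll(tmp, n, timeout); /* pth_poll() in a Pthsem program */
+	saved_errno = errno;
+
+	if (ret > 0) {
+		/* Copy results back in order, skipping the filtered entries. */
+		for (i = 0, j = 0; i < nfds; i++) {
+			if (fds[i].fd >= 0) {
+				fds[i].revents = tmp[j++].revents;
+			}
+		}
+	}
+	free(tmp);
+	errno = saved_errno;
+	return ret;
+}
+```
+
+Copying into a scratch array keeps the caller's array untouched apart from
+`revents`, so callers can keep using negative fds as "unused slot" markers.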
+
+## Problems
+### Evading Packet Sniffing
+Lawful interception is conducted in most countries. Law enforcement agencies
+often use the characteristics exhibited by malware to filter traffic and stop
+its spread. These are the "characteristics" of Proone.
+
+* SYN packets to remote port 64420[^2]
+* The ALPN string "prne-htbt" in TLS hello messages
+* Client and server certificates in TLS hello messages
+* Spewing of crafted SYN packets followed by RST packets if the remote end has
+ that port open[^1]
+
+Most of the characteristics can be changed by regenerating the PKI or using a
+different port for Heartbeat.
+
+The use of ALPN can be disabled by not setting the ALPN list for the SSL config
+(i.e. not calling `mbedtls_ssl_conf_alpn_protocols()`).
+
+
+[^1]: The crafted packets are not recognised by the kernel because no socket is
+associated with the port. The kernel is forced to send a RST back and this
+packet will reach the remote end if there's no firewall in the way that filters
+it.
+[^2]: Port 64420 is in the ephemeral port range, so blocking it may lead to mild
+consequences for ISPs.
diff --git a/doc/proto.md b/doc/proto.md
new file mode 100644
index 0000000..71cde47
--- /dev/null
+++ b/doc/proto.md
@@ -0,0 +1,6 @@
+# Proone Protocol Spec
+TODO
+
+## TODO File Formats
+### Data Vault
+### Binary Archive
diff --git a/doc/sws.md b/doc/sws.md
new file mode 100644
index 0000000..e1d7c66
--- /dev/null
+++ b/doc/sws.md
@@ -0,0 +1,205 @@
+# Proone Software Design Spec
+This document is part of **Proone Worm Project**. For an overview, refer to
+[README.md](/README.md).
+
+* TODO structure
+ * TODO workers and functions
+ * TODO CNC TXT REC
+ * TODO dvault
+* TODO IPv6
+* TODO classes
+
+## Subsystems
+### Heartbeat
+**Heartbeat** is a subsystem of Proone that consists of a backdoor and CNC
+mechanism on infected devices. **The Heartbeat protocol** is a point-to-point
+or broadcast framing protocol that works over a transport stream such as
+TCP/IP. The protocol is documented separately in **[Protocol
+Spec](proto.md)**. An overview of the protocol follows.
+
+The Heartbeat subsystem takes up a large portion of the Proone code base. The
+subsystem mainly works as a format in DNS TXT records and a TCP/IP framing
+protocol. A complete Heartbeat connection consists of an **authoritative end** and
+a **submissive end**.
+
+In the Heartbeat protocol, a request is usually initiated by one end sending a
+single initiating frame, and the other end responds with one or more response
+frames. The protocol also employs the concept of "protocol upgrade", like that
+of WebSocket, in which both request and response can be streams of frames.
+
+A request-response session is distinguished by a message id number. A message id
+number is generated by the end which initiated the session and is used for the
+duration of the session. The idea behind message id numbers is to make the
+protocol pipelinable so that simple request-response pairs can be processed in
+parallel. This is merely a future-proof design and does not play a significant
+role.
+
+Unlike conventional botnets, Proone instances (aka "bots") are controlled by DNS
+TXT records containing one or more request frames of an authoritative end. In
+this scheme, a request is initiated by Proone instances, acting as a submissive
+end, querying and reading the contents of the TXT records. Any response data
+resulting from the process is discarded. The Heartbeat protocol binary is
+represented in base64 encoding because most DNS management software does not
+accept binary data for the value of TXT records, although [the RFC
+spec](https://datatracker.ietf.org/doc/html/rfc1035#section-3.3) does not impose
+such a restriction.
+
+Only public DNS servers which support DNS over TLS are used to counter lawful
+interception, because plain DNS is not encrypted and ISPs or law enforcement
+can easily filter out TXT REC CNC traffic simply by doing a plain-text string
+search. A TLS library is already used to implement the SSH attack vector, so
+using the library for another purpose was an enticing choice. Proone queries
+public DNS servers directly rather than using system functions. This
+eliminates the chance of ISP DNS servers returning false results. Using public
+DNS servers is also beneficial since law enforcement would have to take down
+the domain itself, as it would be difficult to convince the operators of
+public DNS servers to block a recursive query to a particular name server.
+Another benefit is not having to run CNC servers for simple tasks like running
+shell scripts.
+
+There are 2 recommended applications. One typical application is having a
+`PRNE_HTBT_OP_HOVER`(Hand Over Operation) request frame in TXT records so that
+instances will connect to servers running authoritative htbt implementations for
+further instructions. The second application is having a
+`PRNE_HTBT_OP_RUN_CMD`(Run Command Operation) frame or a
+`PRNE_HTBT_OP_RUN_BIN`(Run Binary Operation) frame containing a simple minified
+shell script for instances to run.
+
+Using CNC TXT records to transfer a large amount of data is possible but not
+recommended. In theory, doing `PRNE_HTBT_OP_NY_BIN`(Binary Upgrade Operation)
+with CNC TXT REC is possible. However, for Proone instances, querying TXT
+records, decoding base64 data and running a slave Heartbeat client is a costly
+operation. It's not a simple task and is prone to failure.
+
+### Use Cases
+To stop all Proone instances, issue the command `kill -9 0` or `reboot -nf` with
+the detach flag unset. To disable all hosts, issue the command `halt -nf`.
+
+To perform more complex tasks, it's recommended to implement an authoritative
+server and command Proone instances to take orders from the servers running the
+implementation. Load balancing can be done at the DNS level using techniques
+like round-robin DNS or GeoDNS. Once a Proone instance connects to an
+authoritative server, the server can fully utilise the Heartbeat protocol to do
+the tasks described below.
+
+Shell scripts can be run on Proone hosts with `PRNE_HTBT_OP_RUN_BIN`(Run Binary
+Operation) as long as the script contains a shebang line at the very start of
+the script. Note that most embedded devices run lightweight shells like
+Ash (BusyBox) and Toysh (Toybox)[^1]. The best strategy is to target the Bourne
+shell, which has historically been the default shell on the majority of systems.
+
+To make hosts run an arbitrary binary executable, `PRNE_HTBT_OP_HOST_INFO`(Host
+Info Operation) can be used to query the architecture type of the host to select
+a suitable binary for upload.
+
+To replace the Proone binary, `PRNE_HTBT_OP_NY_BIN`(Binary Upgrade Operation)
+can be used. The binary format for the operation is specified in a [separate
+document](proto.md). Upon successful upload, the Proone instance will attempt to
+`exec()` to the new binary after **binary recombination**(explained in a
+separate section) is performed. All this is done in the parent process. In the
+event of failure, Proone continues to operate with the existing binary. The only
+way to check the result of the operation is through reestablishing the
+connection to the Proone instance and querying the version of the binary through
+`PRNE_HTBT_OP_HOST_INFO` request.
+
+The protocol leaves room for implementing M2M mechanisms. A Proone instance
+checks if the target host is already infected by attempting to connect to a
+**local back door**(or simply, **LBD**) on the target host. An LBD port is
+served by a submissive Heartbeat client. Future versions of Proone can utilise
+the LBD port to update the binary of the target instance if an older version is
+encountered. **proone-htbtclient** can be used to examine and maintain the
+Proone instance via this port.
+
+### Binary Archive and Data Vault
+Proone aims to be a decentralised botnet. To spread without binary distribution
+servers, Proone carries executables for all the architectures it supports. For this,
+a special file structure is designed.
+
+The **Data Vault**("**DVault**") is a binary block containing large and
+sensitive data necessary for the operation of Proone. DVault is a tidier version
+of the data table of Mirai. DVault also helps reduce the size of Proone. Each
+executable contains a *.data* section. If there's a long string in the program,
+the value of the string will end up in the *.data* section of each executable.
+Compression alleviates this issue, but only up to a point because the size of
+the compression dictionary is limited. Having a custom *.data* section for
+large data solves this issue at the cost of the size of the code for fetching
+and unmasking values from DVault. This implies that, in some cases, storing
+static values in the *.data* section of an ELF is more efficient[^2]. Another
+purpose of DVault is masking sensitive data like `PRNE_DATA_KEY_CNC_TXT_REC` and
+`PRNE_DATA_KEY_CRED_DICT` so that they're not revealed when the `strings`
+command is run on the executable or when the process is core dumped. DVault is
+loaded when Proone initialises. The loaded contents remain masked in memory and
+are unmasked only when needed.
+
+The contents of DVault are XORed with a 256-byte array of random numbers
+generated on each compilation. Because the masked data has high entropy, the
+DVault binary block cannot be compressed effectively. Therefore it's not
+recommended to use DVault to store exceptionally large values. This issue may
+be solved by compressing the value separately at the cost of CPU time.
+
+The **Binary Archive**("**BA**") is a binary block containing compressed
+executables and an index of the executables.
+
+## Requirements
+### Targeting a Wide Range of Devices and Kernel Configurations
+A number of methods have been employed in efforts to target a wide range of
+Linux devices. The assumption is that there are still devices running old
+images of Linux, and targeting these devices means coding to the standard of
+old POSIX specs and testing under old versions of Linux (namely 2.6.x). The
+`_POSIX_C_SOURCE=200112L` macro is defined to meet this requirement. Note that
+using this macro does not give you an error when you accidentally use APIs not
+in the 200112L standard. The compiler will only give you a warning and your code
+will compile just fine. If you happen to use a function that the kernel of the
+host does not support, the syscall will fail with `ENOSYS`. If the feature
+requiring the new API can be silently switched off at runtime, removal of the
+macro is recommended.
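+
+The runtime switch-off mentioned above can be sketched as follows. The example
+is an assumption for illustration only: it uses `clock_gettime()` with
+`CLOCK_MONOTONIC`, which is already available under the 200112L standard (as
+the Monotonic Clock option), so it demonstrates the fallback pattern rather
+than a post-2001 API. `select_clock()` and `now_ms()` are hypothetical helpers.
+
+```c
+#define _POSIX_C_SOURCE 200112L
+#include <errno.h>
+#include <time.h>
+
+/* Hypothetical helper: prefer the monotonic clock, but silently fall back
+ * to the realtime clock if the host rejects it (e.g. ENOSYS when the kernel
+ * lacks support, or EINVAL for an unknown clock id). The reason is stored
+ * in *err_out, which is 0 if the monotonic clock works. */
+static clockid_t select_clock(int *err_out) {
+	struct timespec ts;
+
+	*err_out = 0;
+	if (clock_gettime(CLOCK_MONOTONIC, &ts) == 0) {
+		return CLOCK_MONOTONIC;
+	}
+	*err_out = errno;
+	return CLOCK_REALTIME; /* feature switched off, keep running */
+}
+
+/* Current time of the given clock in milliseconds, or -1 on failure. */
+static long long now_ms(clockid_t clk) {
+	struct timespec ts;
+
+	if (clock_gettime(clk, &ts) != 0) {
+		return -1;
+	}
+	return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
+}
+```
+
+The choice is made once at start-up, so the rest of the program only deals
+with a `clockid_t` and never needs to know which clock the host supports.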
+
+The Linux kernel is highly configurable. Pseudo file systems and the device file
+system may not be present on a Linux host since they can be disabled. Disabling
+any of these file systems is unusual for PCs but practical on embedded devices.
+Proone does not assume that these file systems are available on the host and
+tries to run without them if they are not.
+
+### Running Lean
+Proone is designed under the assumption that honouring other processes on the
+system will decrease the chance of getting caught by system administrators.
+
+Proone is compartmentalised so that it's somewhat immune to syscall failures.
+This design is to counter `ENOMEM`, as Proone runs lean on lean embedded
+systems. This implies that Proone can be initialised "half-complete". For
+example, it can be initialised with all the workers running except the
+Heartbeat worker. In this case, Proone will be able to infect other devices on
+the network while being unable to respond to CNC TXT REC. Another notable case
+would be an instance running without the Recon worker. It will respond to CNC
+TXT REC and serve local backdoor connections while being unable to infect other
+devices on the network. Proone does not reattempt to start the workers it
+failed to run on start. The assumption is that the system is already running
+with its memory full to the brim and it's futile to wait for resources it
+failed to claim, as it's likely that other services on the system will claim
+them at some point.
+
+Proone does cooperative multitasking using the **Pthsem** library. This is one
+of many efforts to "run lean", by restricting CPU usage to one logical thread.
+This may seem like a huge missed opportunity if Proone manages to infect a beefy
+multi-core system. Keep in mind that Proone is designed to run on
+resource-scarce embedded devices. Most poorly-designed vulnerable devices will
+be single core anyway. The strategy is getting as many low-powered devices
+infected as possible rather than having a few infected high-performance systems.
+
+### Volatile Operation
+TODO
+
+## Dependencies
+The dependencies for Proone have been kept to absolute necessities. **libssh2**
+is used for the SSH brute force vector. **Mbed TLS** serves as libssh2's SSL
+backend and is also used for TLS connections to public name servers and the
+Heartbeat protocol. **zlib** is used to implement the binary archive. All the
+libraries are compiled with default configurations. **Pthsem** is used for
+threading.
+
+**libyaml** and **mariadb-connector-c-devel** are required for the **hostinfod**
+build. YAML has been chosen for the configuration file format and MariaDB for
+the DB backend.
+
+
+[^1]: Maybe in the future when Toybox gains market share?
+[^2]: i.e. representing values in code: `int value = 123;`
diff --git a/doc/user_guide.md b/doc/user_guide.md
new file mode 100644
index 0000000..7f82da5
--- /dev/null
+++ b/doc/user_guide.md
@@ -0,0 +1,2 @@
+# Proone User Guide
+TODO