App Stream is a node.js utility that allows you to curate comment operations from the Steem blockchain into a MYSQL DB. You can then access your content from the DB through the API, direct SQL or a GUI such as Workbench/PHPMYAdmin.
Consider this challenge:
- a user creates a query (a conventional blog post, or comment in Steem terminology) on PeerQuery.com
- someone comments on the post from Steemit.com
- the post author edits the same post using Busy.org
- eventually someone replies to the comment made on the post using the eSteem app
How should Peer Query track all these activities, made from several clients, on its posts?
App Stream attempts to address this challenge by storing all posts made from a target with a key, which is defined as /@author/permlink. Keys of new comment/reply operations on the Steem Blockchain are compared with the keys in the DB and, if a match is found, the new comment operation is added to the DB as a comment/reply along with its own key. The process is designed to be as efficient and modular as possible.
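The key matching described above can be sketched roughly as follows. This is a minimal in-memory illustration, not App Stream's actual code: `makeKey`, `indexedKeys` and `trackIfRelated` are hypothetical names, a `Set` stands in for the MYSQL tables, and it assumes the comparison is made against the parent's key.

```javascript
// Hypothetical helper: build the "/@author/permlink" key described above.
function makeKey(author, permlink) {
  return '/@' + author + '/' + permlink;
}

// In-memory stand-in for the keys stored in the DB (assumption: the real
// tool keeps them in MYSQL tables, not in a Set).
const indexedKeys = new Set([makeKey('alice', 'my-first-query')]);

// Decide whether an incoming comment operation belongs to a tracked thread.
// If its parent key is already indexed, record the operation's own key so
// that later replies to it can be matched in turn.
function trackIfRelated(op) {
  const parentKey = makeKey(op.parent_author, op.parent_permlink);
  if (!indexedKeys.has(parentKey)) return false;
  indexedKeys.add(makeKey(op.author, op.permlink));
  return true;
}
```

Because each matched comment adds its own key, a reply made later from a different client is picked up by the same lookup.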
Aside from solving the above problem, App Stream can be used for lots of other exciting uses, including:
- curate all posts made by a client (eg: steemit.com) along with all comments and replies made to these posts, no matter which other client was used to make the comments or replies
- curate all posts made by an author (eg: adsactly) along with all comments and replies made to these posts, no matter which other client was used to make the comments or replies
- curate all posts made in a category (eg: life) along with all comments and replies made to these posts, no matter which other client was used to make the comments or replies
- App Stream also catches all edits to these posts/comments/replies, no matter the client/app on which the edit was made. You can access the curated posts along with their comments/replies from the DB via API or direct SQL query.
- App Stream
- App Stream solution
- Table of Contents
- Quick local install
- Terminology definitions
- Info configurations(optional)
controller
- Features
- Access control
- Instance examples
- Curate all posts from any Steem app
- Curate all posts from the Steem user adsactly
- Curate all posts from the Steem category love
- Curate all posts and comments from any Steem app
- Curate all posts and comments from the Steem user adsactly
- Curate all posts and comments from the Steem category love
- Curate all posts, comments and replies from any Steem app
- Curate all posts, comments and replies from the Steem user adsactly
- Curate all posts, comments and replies from the Steem category love
- Fetch API examples
- Curate API examples
- Stats API examples
- Search API examples
- Filter API examples
- Cool addons
- Cool hacks
- Recommendations
- Limitations
- Contributing
- Code of Conduct
- Acknowledgment
- Happy App Streaming Steem!
- npm
- MYSQL 5.7+ for json support
- Clone repo: git clone http://github.com/peerquery/app-stream.git
- Switch to working directory: cd app-stream
- Create a new empty DB with no tables or anything
- Enter the details of your DB into the .env file - see a sample .env file in /docs/sample.env
- Install dependencies: npm install
- If you do not want the default setup, tweak: /config/config.js
- Run npm setup
- Your system is live; see the console for confirmation messages
- Access your DB content via MYSQL Workbench, PHPMYAdmin or the API on localhost
On the Steem Blockchain, what are identified in the conventional sense as post, comment and reply are all identified simply as comment operations.
In this document and in App Stream, we use the terms post, comment and reply in the conventional sense. The glossary below maps these terms to their Steem counterparts.
post as used in this document and in App Stream refers to blog posts in the conventional sense, which on Steem correspond to a root comment transaction with an empty parent_author field. When fetched from the Steem API, a post has a depth of 0.
comment as used in this document and in App Stream refers to comments in the conventional sense - a response to a post. On the Steem blockchain it corresponds to a comment transaction whose parent_author field is not empty. When fetched from the Steem API, comments have a depth of 1.
reply as used in this document and in App Stream refers to a response to a comment in the conventional sense. On the Steem blockchain it corresponds to a comment transaction whose parent_author field is not empty and, when fetched from the Steem API, replies have a depth greater than 1.
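The glossary above boils down to two checks, sketched here. `classifyOperation` is a hypothetical helper, and it assumes the caller already knows the `depth` value (for example from the Steem API), since the raw operation only carries `parent_author`.

```javascript
// Map a Steem comment operation to the conventional terms used in this
// document. `depth` is assumed to be supplied by the caller.
function classifyOperation(op, depth) {
  // A root comment transaction has an empty parent_author: a post.
  if (op.parent_author === '') return 'post';
  // Depth 1 is a direct response to a post; anything deeper is a reply.
  return depth === 1 ? 'comment' : 'reply';
}
```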
The target filter lets you determine the target by which to curate operations. The target could be an app, username or category, depending on the DB engine you set.
For post/comments/replies by app, the target should be the value of the app field in the JSON_metadata of the targeted app. Below are the current app values used by some Steem clients:
App | config.target when config.app_match = 'common' | config.target when config.app_match = 'strict' |
---|---|---|
Steemit | steemit | steemit/0.1 |
Busy | busy | busy/2.4.0 |
Dtube | dtube | dtube/0.7 |
eSteem | esteem | esteem/1.6.0, esteem/1.5.1 |
DSound | dsound | dsound/0.3 |
Steem Press | steempress | steempress/1.2, steempress/1.1 |
DLive | dlive | dlive/0.1 |
Peer Query | peerquery | peerquery/beta |
Steepshot | steepshot | steepshot/0.1.2.21 |
DMania | dmania | dmania/0.7 |
Zappl | zappl | zappl/0.1 |
Streemian | streemian | streemian/scheduledpost |
Beem | beem | beem/0.19.23 |
Post Promoter | postpromoter | postpromoter/1.8.9 |
DPorn | dporn.app | dporn.app/v0.0.3 |
Utopian uses busy/2.4.0, hence you might want to curate by the utopian-io category instead.
Default: config.app_match = 'common'; alt: config.app_match = 'strict'
common mode will index all posts coming from an app, regardless of its version. Most Steem clients are frequently updated, yet not all users switch to the latest version. For example, Steem Press posts come from versions steempress/1.2 and steempress/1.1, while SteemJS posts come in as steemjs/test, steemjs/example, ... The default mode is common, and it considers only the name of the app, not its version or whatever comes after the slash: app/whatever. You could use an SQL query to search the json_metadata fields and retrieve only content from the app version you want.
strict mode will ensure that the curated content comes only from app versions that exactly match the defined value, disregarding content from other versions of the app. Eg: in strict mode where the target is esteem/1.6.0, content from esteem, esteem/1.5.1 and esteem/whatever will be ignored.
This setting only applies when curating by an app using the following engines:
post-by-app
post-comment-by-app
post-comment-reply-by-app
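As a rough illustration of the two matching modes, the comparison could look like the sketch below. `matchesApp` is a hypothetical function written for this example; the actual comparison in App Stream may live in its stored procedures and differ in detail.

```javascript
// Compare the app field from a post's json_metadata against config.target.
// mode is 'common' (name only) or 'strict' (exact value, version included).
function matchesApp(appField, target, mode) {
  if (typeof appField !== 'string') return false;
  if (mode === 'strict') return appField === target;
  // 'common': ignore whatever comes after the first slash.
  return appField.split('/')[0] === target;
}

// Usage: in common mode any eSteem version matches the target 'esteem'.
console.log(matchesApp('esteem/1.5.1', 'esteem', 'common'));
console.log(matchesApp('esteem/1.5.1', 'esteem/1.6.0', 'strict'));
```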
The option to choose which streamer app to use can be set in the config file (config.streamer_app). The default app type is set to ops.
config.app_match is NOT to be confused with config.source_app.
config.source_app ensures that when curating by author or category, content is only curated when it was generated by the source app.
Example: I want to curate all posts by author dzivenu but only when the content originates from Steemit.com - ignoring all those by dzivenu which are from Busy.org, Utopian, eSteem or any other Steem client.
In that case I would set config.target = 'dzivenu' and config.source_app = 'steemit'.
This setting only applies when curating by author or category:
post-by-author
post-by-category
post-comment-by-author
post-comment-by-category
post-comment-reply-by-author
post-comment-reply-by-category
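For the scenario of curating one author's posts only when they originate from Steemit.com, the relevant settings might look like the fragment below. It is a sketch: the `config` object here is a stand-in for the one in /config/config.js, and it assumes config.source_app takes the app name in the same form as the app table above.

```javascript
// Stand-in for the config object defined in /config/config.js.
const config = {};

config.target = 'dzivenu';           // the author whose posts are curated
config.db_engine = 'post-by-author'; // curate by author
config.source_app = 'steemit';       // assumption: only keep content created via Steemit.com
```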
The info section of the config is only displayed on the URL endpoint of the app. Below are the default settings; edit them according to your brand.
//app info
config.app_name = 'app stream'; // the name of your system after your brand
config.app_version = '1.0.0'; // app version
config.api_version = '1.0.0'; // api version
config.app_owner = ''; // name of the project owner
config.app_admin = ''; // name of project admin
//filter settings
config.target = 'steemit'; // target whose posts you want to curate, could be user, app or category
config.streamer_app = 'ops'; // determine which streamer app(method) to use. second option is 'blocks'
config.db_engine = 'post-by-app'; // determines activities(post/comment/reply) to curate and type(user/app/category)
config.app_match = 'common'; // for curating by app only. eg: 'common' for 'steemit' and 'strict' for 'steemit/0.1'
config.source_app = ''; // determines if curation should be done for only the specified 'source_app' app
Remember that the default values of these settings are OK. You could run the program in production with them.
Default: config.app_state = 'on'. Turn off: config.app_state = 'off'
Turns the core indexing functionality on/off. If you have run the app with this value set to on for a while, your DB is populated, and you no longer want to index new content into the DB, turn off the indexing function by setting this option to off. When it is off, the DB will not receive new data but the API will still work, which means you can still use the API to serve already indexed content - provided the API function is not also turned off.
Default: config.api_state = 'on'. Turn off: config.api_state = 'off'
Turns API functionality on/off. If you do not want to enable API access to indexed content, turn off API functionality with this option. You can turn the API off if you want to access content using only direct SQL queries.
config.api_mode = 'open'
Does nothing for now. Meant to signal if API key is required for API use.
Default: config.subdomain = 'api'. Turn off using config.subdomain = ''
This determines whether your server should use a subdomain, such as the default api.whatevermy.host, or rpc, data, node, ...
Default: config.api_guide = 'on'. Turn off using config.api_guide = 'off'
Determines whether or not to enable API guides on API endpoint bases; further explanation of API guides can be found below.
Default: config.db_setup = 'true'. Turn off using config.db_setup = 'false'
The DB of App Stream is very complex and consists of:
- 2 tables with all kinds of fields: json, text, timestamp, varchar, auto-increment, ...
- 1 view for each of the two tables above
- 3 complex conditional db indexing stored procedures each with 11 IN variables and 1 OUT variable
- 10 db api stored procedures with IN and OUT variables
Manual setup is difficult and will generate lots of errors due to the different types of MYSQL clients and versions. Also, the main indexer app has to run only after the DB set up is successful.
db_setup automates the process through a custom db_manager module, which first performs the following actions and only activates the core system when they have succeeded without error:
- connect to DB
- create tables for posts and comment/replies - if they do not exist
- create views for posts and comment/replies - if they do not exist
- create all stored procedures for db engines - if they do not exist
- create all stored procedures for db apis - if they do not exist
- log all progress and terminate the process if an error occurs
- activate the main app if db set up is successful
These activities are necessary whenever you are connecting to a DB for the first time, as they will set up the db complete with tables, views and all stored procedures.
You may still leave it on even when you are reconnecting to a DB which has already been set up - it will not delete any existing content.
Despite the complexity, the db_manager is fully automated and near instant in speed! The raw sql files can be viewed in the sql folder.
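The run-in-order, stop-on-first-error behaviour described for db_manager can be pictured with a small sketch. `runSetup` and the step objects are hypothetical, and the steps are synchronous here purely to keep the example short; the real module talks to MYSQL and would be asynchronous.

```javascript
// Run setup steps in order, logging progress. Stop at the first failure and
// report it, so the main app is only activated when every step succeeded.
function runSetup(steps, log) {
  for (const step of steps) {
    try {
      step.run();
      log('ok: ' + step.name);
    } catch (err) {
      log('failed: ' + step.name + ' - ' + err.message);
      return false; // the real tool terminates the process at this point
    }
  }
  return true;
}

// Usage with placeholder steps (no real DB involved).
const ready = runSetup(
  [
    { name: 'connect to DB', run: () => {} },
    { name: 'create tables', run: () => {} },
  ],
  console.log
);
if (ready) console.log('main app can start');
```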
App Stream indexes all posts and their subsequent comments and replies from its target, along with all edits to them.
Due to the modular nature of most functions, App Stream only loads the modules it needs.
Example 1: fetching one of the two apps
- It fetches one of the two apps (ops/blocks) as defined by the user in config.js using: var app = require('./src/app/' + config.streamer_app);
- It turns app on/off as defined by the user in config.js using: if (config.app_state == 'on') app();
Example 2: fetching and configuring subdomain
- It configures the subdomain using: if (config.subdomain !== '') server.use(subdomain(config.subdomain, express.Router() ));
Example 3: fetching, configuring and using one of the nine db engines
- It determines the DB engine as set by the user in config.js using: var db_engine = require('./../../config/config').db_engine;
- It fetches the engine using: var engine = require('./../../src/db-engines/' + db_engine);
- It then uses the engine like this: engine(op, tx_id, block, timestamp)
App Stream activity stream processors. There are two types:
- ops - streams operations and filters for comment operations - courtesy of @almost-digital
- blocks - streams block numbers, then fetches blocks and then filters for comment operations - courtesy of @howo
Supported curation routes (for posts, since they are the primary source of activity):
- app - curate a post only when the app value in its JSON_metadata corresponds to the target app
- tag/category - curate a post only when its parent_permlink value corresponds to the target category
- author - curate a post only when its author value corresponds to the target author
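The three routes above can be summarised in one predicate. This is an illustrative sketch, not the engine code: `parseApp` and `matchesRoute` are hypothetical names, and the app route uses the name-only ('common') comparison.

```javascript
// Safely read the app value from a json_metadata string; null if missing
// or if the string is not valid JSON.
function parseApp(jsonMetadata) {
  try {
    const meta = JSON.parse(jsonMetadata);
    return meta && typeof meta.app === 'string' ? meta.app : null;
  } catch (err) {
    return null;
  }
}

// Decide whether a comment operation is a post that matches the target.
function matchesRoute(op, route, target) {
  if (op.parent_author !== '') return false; // only root posts qualify
  switch (route) {
    case 'app': {
      const app = parseApp(op.json_metadata);
      return app !== null && app.split('/')[0] === target;
    }
    case 'category':
      return op.parent_permlink === target; // a post's parent_permlink is its category
    case 'author':
      return op.author === target;
    default:
      return false;
  }
}
```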
Supported curated activities include:
- post
- comment
- replies
- votes - coming soon
Supported engines for data processing include:
post-by-app
post-by-author
post-by-category
post-comment-by-app
post-comment-by-author
post-comment-by-category
post-comment-reply-by-app
post-comment-reply-by-author
post-comment-reply-by-category
App Stream uses a custom CORS module for handling CORS access to your server/endpoint. Without the proper CORS settings, external websites would not be able to access data from the API.
The default CORS settings can be seen in /config/cors.js, which lets you define a list of allowed origins that may access the API.
Make sure you add the IP or host from which you hope to access the API to the list. By default only two origins are allowed: ['lvh.me', 'localhost'].
There is also a template in the same file that lets you open up the API to all origins. This is not enabled by default, to protect against potential unauthorized access of your APIs by web origins/bots - in high volumes it may overload your server.
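To make the allow-list idea concrete, here is a minimal sketch of an origin check. It is not the contents of /config/cors.js: `isAllowedOrigin` is a hypothetical function, and matching on the exact hostname of the request's Origin header is an assumption.

```javascript
// Default allow-list as quoted above.
const allowedOrigins = ['lvh.me', 'localhost'];

// Return true when the hostname of the Origin header is on the list.
// Malformed or missing headers are rejected.
function isAllowedOrigin(originHeader, allowed) {
  let hostname;
  try {
    hostname = new URL(originHeader).hostname;
  } catch (err) {
    return false;
  }
  return allowed.includes(hostname);
}
```

With exact matching, a subdomain such as api.lvh.me would need its own entry in the list.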
Web crawlers will crawl anything on the web. We do not want our APIs exposed, so the robots file in routes/robots/robots.js is as follows:
User-agent: *
Disallow: /
This can be accessed from the endpoint whatevermy.host:port/robots.txt. On localhost:80 it is at localhost/robots.txt
Remember that there are only two places to configure during a run:
- .env file - server and DB settings
- config.js - main config file, this is where the entire configuration happens
Below are the corresponding settings for the config.js file in the respective instances.
Set the value of config.target in the filter settings section of the config.js file to the app's name, except for Utopian, which uses busy - so you might want to curate by the utopian-io category instead for Utopian posts.
config.target = 'steemit'; // target app/client is Steemit.com
config.streamer_app = 'ops'; // default, you can set to 'blocks' if you prefer that app
config.db_engine = 'post-by-app'; // activates the 'post-by-app' engine
config.source_app = ''; // it is recommended that you leave this option as it is
config.db_setup = 'true'; //must be 'true' for the first run on a DB
Set the value of config.target in the filter settings section of the config.js file to the Steem username of your target.
config.target = 'adsactly'; // target account is 'adsactly'
config.streamer_app = 'ops'; // default, you can set to 'blocks' if you prefer that app
config.db_engine = 'post-by-author'; // activates the 'post-by-author' engine
config.source_app = ''; // it is recommended that you leave this option as it is
config.db_setup = 'true'; //must be 'true' for the first run on a DB
Set the value of config.target in the filter settings section of the config.js file to the category you want to target.
config.target = 'love'; // target category is 'love'
config.streamer_app = 'ops'; // default, you can set to 'blocks' if you prefer that app
config.db_engine = 'post-by-category'; // activates the 'post-by-category' engine
config.source_app = ''; // it is recommended that you leave this option as it is
config.db_setup = 'true'; //must be 'true' for the first run on a DB
Set the value of config.target in the filter settings section of the config.js file to the app id (as listed above).
config.target = 'steemit'; // target app/client is Steemit.com
config.streamer_app = 'ops'; // default, you can set to 'blocks' if you prefer that app
config.db_engine = 'post-comment-by-app'; // activates the 'post-comment-by-app' engine
config.source_app = ''; // it is recommended that you leave this option as it is
config.db_setup = 'true'; //must be 'true' for the first run on a DB
Set the value of config.target in the filter settings section of the config.js file to the Steem username of your target.
config.target = 'adsactly'; // target account is 'adsactly'
config.streamer_app = 'ops'; // default, you can set to 'blocks' if you prefer that app
config.db_engine = 'post-comment-by-author'; // activates the 'post-comment-by-author' engine
config.source_app = ''; // it is recommended that you leave this option as it is
config.db_setup = 'true'; //must be 'true' for the first run on a DB
Set the value of config.target in the filter settings section of the config.js file to the category you want to target.
config.target = 'love'; // target category is 'love'
config.streamer_app = 'ops'; // default, you can set to 'blocks' if you prefer that app
config.db_engine = 'post-comment-by-category'; // activates the 'post-comment-by-category' engine
config.source_app = ''; // it is recommended that you leave this option as it is
config.db_setup = 'true'; //must be 'true' for the first run on a DB
Set the value of config.target in the filter settings section of the config.js file to the app id (as listed above).
config.target = 'steemit'; // target app/client is Steemit.com
config.streamer_app = 'ops'; // default, you can set to 'blocks' if you prefer that app
config.db_engine = 'post-comment-reply-by-app'; // activates the 'post-comment-reply-by-app' engine
config.source_app = ''; // it is recommended that you leave this option as it is
config.db_setup = 'true'; //must be 'true' for the first run on a DB
Set the value of config.target in the filter settings section of the config.js file to the Steem username of your target.
config.target = 'adsactly'; // target account is 'adsactly'
config.streamer_app = 'ops'; // default, you can set to 'blocks' if you prefer that app
config.db_engine = 'post-comment-reply-by-author'; // activates the 'post-comment-reply-by-author' engine
config.source_app = ''; // it is recommended that you leave this option as it is
config.db_setup = 'true'; //must be 'true' for the first run on a DB
Set the value of config.target in the filter settings section of the config.js file to the category you want to target.
config.target = 'love'; // target category is 'love'
config.streamer_app = 'ops'; // default, you can set to 'blocks' if you prefer that app
config.db_engine = 'post-comment-reply-by-category'; // activates the 'post-comment-reply-by-category' engine
config.source_app = ''; // it is recommended that you leave this option as it is
config.db_setup = 'true'; //must be 'true' for the first run on a DB
After the server is set up and has populated content, you can then access the API routes, provided the API function is not turned off by setting config.api_state = 'off'.
Again, remember that you can only access content which has been indexed since the app began to run.
Scheme:
/api/fetch/type/@:author/:permlink
Example, provided that the link has already been indexed:
/api/fetch/type/@elear/utopian-already-rewarding-early-adopters
Returns:
- post if url is a post
- comment if url is a comment or reply
- none if not found as post, comment or reply
Scheme:
/api/fetch/@:author/:permlink
Example, provided that the link has already been indexed:
/api/fetch/@elear/utopian-already-rewarding-early-adopters
Returns a json with the following for post:
id
block
tx_id
author
permlink
category
title
body
json_metadata
timestamp
last_update
url
depth
Returns a json with the following for comment/reply:
id
block
tx_id
author
permlink
parent_author
parent_permlink
body
timestamp
json_metadata
url
depth
source
This function will search the posts table for the url and, if it does not find it there, will then search the comments/replies table.
This method of fetching is not recommended if you already know the type (post or comment/reply) of the url you are requesting, since it requires the DB to search two tables to find the url.
If you already know the type of a url, use the functions below to reduce load on the DB server.
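The cost difference between the generic fetch and the typed fetches can be illustrated with two in-memory maps standing in for the two tables. Everything here is hypothetical (`postsTable`, `commentsTable`, `fetchAny`, `fetchComment`, and the row contents); the real lookups are MYSQL stored procedures.

```javascript
// In-memory stand-ins for the posts and comments/replies tables.
const postsTable = new Map([
  ['/@elear/utopian-already-rewarding-early-adopters', { depth: 0 }],
]);
const commentsTable = new Map([
  ['/@blockrush/re-elear-utopian-already-rewarding-early-adopters-20171005t143700907z', { depth: 1 }],
]);

// Generic fetch: tries posts first, then comments/replies.
// Returns the row plus how many tables had to be searched.
function fetchAny(url) {
  if (postsTable.has(url)) return { row: postsTable.get(url), lookups: 1 };
  if (commentsTable.has(url)) return { row: commentsTable.get(url), lookups: 2 };
  return { row: null, lookups: 2 };
}

// Typed fetch: searches only the comments/replies table.
function fetchComment(url) {
  return { row: commentsTable.get(url) || null, lookups: 1 };
}
```

For a comment url, `fetchAny` searches both tables while `fetchComment` searches one, which is the saving the typed routes offer.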
Scheme:
/api/fetch/post/@:author/:permlink
Example, provided that the link has already been indexed:
/api/fetch/post/@elear/utopian-already-rewarding-early-adopters
Returns a json with the following for post:
id
block
tx_id
author
permlink
category
title
body
json_metadata
timestamp
last_update
url
depth
Scheme:
/api/fetch/comment/@:author/:permlink
Example, provided that the link has already been indexed:
/api/fetch/comment/@blockrush/re-elear-utopian-already-rewarding-early-adopters-20171005t143700907z
Returns a json with the following for comment/reply:
id
block
tx_id
author
permlink
parent_author
parent_permlink
body
timestamp
json_metadata
url
depth
source
Scheme: /api/curate/:num
Returns a json containing only the url, body, title and timestamp of the latest num posts
Example: /api/curate/10 - returns a json of the url, body, title and timestamp of the latest 10 posts
Scheme: /api/curate/comments/:num
Returns a json containing only the urls, body and timestamp of the latest num comments
Example: /api/curate/comments/10 - returns a json of the urls, body and timestamp of the latest 10 comments
Scheme: /api/curate/replies/:num
Returns a json containing only the urls, body and timestamp of the latest num comments and replies
Example: /api/curate/replies/10 - returns a json of the urls, body and timestamp of the latest 10 comments and replies
Scheme: /api/stats/count/posts/:days
Returns a json object with the post count for the given number of days
Maximum is restricted to 100 days to avoid overloading the DB; day counts over 100 will be reset to 100.
Scheme: /api/stats/count/comments/:days
Returns a json object with the comment count for the given number of days
Maximum is restricted to 100 days to avoid overloading the DB; day counts over 100 will be reset to 100.
Scheme: /api/stats/count/replies/:days
Returns a json object with the comment and reply count for the given number of days
Maximum is restricted to 100 days to avoid overloading the DB; day counts over 100 will be reset to 100.
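The 100-day cap can be expressed as a small clamp. `clampDays` is a hypothetical helper; falling back to 1 for missing or non-numeric input is an assumption, since the source only specifies the upper limit.

```javascript
// Normalise the :days route parameter before it reaches the DB.
function clampDays(input) {
  const days = parseInt(input, 10);
  // Assumption: invalid or non-positive values fall back to a single day.
  if (Number.isNaN(days) || days < 1) return 1;
  // Values over 100 are reset to 100, as described above.
  return Math.min(days, 100);
}
```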
There are four search APIs:
- /api/search/title/:text - find text in post titles
- /api/search/author/:author/:text - find posts by author with text in post titles
- /api/search/category/:category/:text - find posts from category with text in post titles
- /api/search/category/author/:category/:author/:text - find posts by an author from a category with text in post titles
Example: to search for posts by author @stoodkev in the #dev category whose titles contain 'SteemJS', do: /api/search/category/author/dev/stoodkev/SteemJS. The results are a JSON with the fields:
id
block
tx_id (steem transaction id)
author
permlink
category
title
body
json_metadata
timestamp
url
last_update
depth
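On the client side, choosing among the four search routes could be wrapped in a helper like the one below. `buildSearchPath` is a hypothetical function for illustration; only the route shapes come from the list above.

```javascript
// Build the search path for the given criteria. `text` is required;
// `author` and `category` are optional and select the matching route.
function buildSearchPath({ text, author, category }) {
  const enc = encodeURIComponent;
  if (category && author) {
    return `/api/search/category/author/${enc(category)}/${enc(author)}/${enc(text)}`;
  }
  if (category) return `/api/search/category/${enc(category)}/${enc(text)}`;
  if (author) return `/api/search/author/${enc(author)}/${enc(text)}`;
  return `/api/search/title/${enc(text)}`;
}

// Usage: the example from the text above.
console.log(buildSearchPath({ category: 'dev', author: 'stoodkev', text: 'SteemJS' }));
```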
Sometimes you might just want to return posts from a particular author or just a particular category. The new filter APIs allow just that. Endpoint base with minimalist docs: localhost/api/filter and, like the others, the default max returned rows is 20 for performance reasons.
localhost/api/filter/author/:author, eg: localhost/api/filter/author/utopian-io would return 20 recent posts from @utopian-io. The results are a JSON with the fields:
id
block
tx_id (steem transaction id)
author
permlink
category
title
body
json_metadata
timestamp
url
last_update
depth
This would not be necessary if your App Stream's target is post/comment/reply by author - if you are already curating by author, then everything in your DB is only from your target author.
localhost/api/filter/category/:category, eg: localhost/api/filter/category/life would return 20 recent posts from the life category. The results are a JSON with the fields:
id
block
tx_id (steem transaction id)
author
permlink
category
title
body
json_metadata
timestamp
url
last_update
depth
This would not be necessary if your App Stream's target is post/comment/reply by category - if you are already curating by category, then everything in your DB is only from your target category.
API guides provide API endpoint bases with basic documentation. There are 4 API endpoint bases:
/api
/api/fetch
/api/curate
/api/stats
Example: when api guides is on, the endpoint base localhost/api/curate serves a highlighted version of the following plain text:
API guide for curate route
Below are the supported calls for the /curate API route
------------------------------------------------------------
/api/curate/:num - fetch latest ? posts
/api/curate/comments/:num - fetch latest ? comments
/api/curate/replies/:num - fetch latest ? comments and replies
------------------------------------------------------------
num is number of posts to be returned, maximum is set to 100
example: to get latest 20 posts for curation do: /api/curate/20
return to main API home page at ./api
For each post, comment or reply indexed, its appropriate depth value is automatically generated.
App Stream supports subdomains. This can be configured in the control section of the config.js file.
Default is config.subdomain = 'api';. You can set the subdomain to whatever you want: config.subdomain = 'rpc';, config.subdomain = 'data';, ...
On localhost you can access the default api subdomain at api.localhost. Turn the subdomain functionality off by setting config.subdomain = '';
- Modular custom functions for portability
- We use raw SQL instead of ORM/Sequelize for speed
- MYSQL for popularity and ease of use
- Stored procedures instead of inline queries for complex queries
- Connection pools instead of single connections
- dsteem instead of steemjs for speed
Aside from using App Stream to index content from your client, there are some exciting potential uses too.
You can remove the target filters so the app will curate all posts, comments and replies from all of Steem regardless of the app/client, author or category.
You could host App Stream and serve data over the API.
You can pair App Stream with your hivemind node for a considerably richer data access interface/platform.
Now with App Stream you can index all posts, comments and replies of all posts made on any client, by any user or tag.
This is a great tool for experiments, monitoring and even analysis of posts from a Steem app, author or category.
Always remember that in order to index content, the app has to literally:
- stream all new Steem operations,
- curate comment operations
- apply your filters
- compare every new Steem post with already indexed posts in the MYSQL DB.
Imagine comparing every single new comment with over 1 million entries on two different tables(if not found in posts table then search comments/replies table) and performing an insert/update transaction based on the result of the comparison.
This process means the server load is already potentially at its max. Avoid adding more queries to interfere with the content indexing DB processes else it may fail to process some new contents.
Keep this potentially maxed out DB server state in mind when building a custom API or running a custom MYSQL query, and keep your queries as light as possible.
Also, it is advised to maintain a separate DB for other needs of your site instead of creating new tables for other use, and to run this app as a standalone system solely for the purposes of indexing and serving by API.
App Stream only curates content created from the moment it was set up; it does not curate content created before it was set up. If you want to curate content already created, we would recommend using Steemit Inc's SBDS solution.
App Stream does not yet curate votes. This feature will be added in subsequent updates.
App Stream will not curate any Steem post that lacks a valid json_metadata field, for the following reasons:
- MYSQL generates an error when trying to insert content where the json_metadata field is empty
- strict app mode only works if we are able to find the app from the json_metadata
- we consider it good practice for clients/apps to add json_metadata to their posts
Most mainstream Steem clients provide valid json_metadata field content, so this should not be a problem.
Kindly refer to the Contributor guide.
App Stream adheres to Peer Query's Community standards: /docs/code-of-conduct.
Thanks to @smooth and @howo for their recommendations on the db indexing structure.
- The Peer Query team