Nov 2022 pi_news tutorial not functioning #1

Open
paulwratt opened this issue Nov 27, 2022 · 14 comments

@paulwratt

paulwratt commented Nov 27, 2022

I am using the RPi Debian 10 download (32-bit Buster) from the latest release (rampart-0.1.1).

I use it in place, based on the output of ./install.js: I source a file (. rampart) that sets the explicit PATH it printed. I had read install.js manually first but didn't find any install options I liked for my use case.

https://rampart.dev/docs/tutorial-pi_news.html

I followed the above tut to test out rampart and give me something useful at the same time

There is no mention of how to start the server, even though the tut mentions that the config is pulled from web_server/web_server_conf.js (which is where I found the server start/stop scripts).

The server starts up OK-ish, but it's not running right:
There is an error trying to set the owner of the PID file at startup.
There is no path to "/images/*" in the 404 output page.
Both directoryFunc: true and directoryFunc: dirlist produce a 500 Internal error.
(BTW that is the only setting I changed, nothing else in the file, even though I read the whole thing a couple of times, comparing it to my console output of the server startup script.)

There are things in the tut that are not very clear, or not done (in the initial console output), like not setting chmod a+w pi_news_aggregator.js. Also, the #! path is wrong for any type of installation besides /usr/local/bin/rampart. I got it to work with . ../rampart ; rampart ./pi_news_aggregator.js --first-run

The /apps/pi_news/search.js does not work. In this repo I found and copied /html/index.html, which redirects to /apps/pi_nesw/search.html/, which also does not work, giving a 404 error.

Based on the docs tut, accessing /app/pi_news gives a 404 error and /app/pi_news/ gives a 500 error (mentioned above). On the website (not the docs tut) there is a view source link to an editor that shows the code and location for /apps/pi_news.js, which I downloaded (from the editor). That works (sort of): at least it brings up HTML in a browser when I access either of those URLs, but the search functionality does not work (there are no results. EDIT: that was for hackerspace - I just tried pico and that works). During the initial update I noticed there are some items in the titles that I wanted to search for (like hackerspace), but that is not how the search functions.

NOTES:

I figured out which "search" page was producing the HTML output by comparing the file sources. It appears that the source code in this repo is (mostly) in line with the docs tut, and much newer than that used in the demo (does not use Bootstrap).

I could post the 500 Internal error text, but it's really related to rampart and not any of the tutorials, so I did not post it here atm.

I forked the relevant repos but I didn't want to PR anything yet and I think there is a better way to #! (like perl does) but it needs testing in the other (and default) installation (some of which I can test), so I'd like some input on the above before I start doing anything else ..

@aflin
Owner

aflin commented Nov 27, 2022

Sorry for the trouble. I'll get on this as soon as I get back in town and can run through the whole thing with a fresh Pi.

@ramisdb
Collaborator

ramisdb commented Nov 28, 2022

Hi @paulwratt,
I created an AWS Lightsail Debian instance and performed the install as you did and ran into some odd permissions issues which I think are related to what you experienced. We're working on figuring out what's going on.

@aflin
Owner

aflin commented Nov 28, 2022

Thanks for your comments. Here are a few things we need to do:

  1. Have an INSTALL.txt file for each tutorial, so there is no guessing about how to go about making it work painlessly.
  2. Make the explanation of how to use rampart when not installed as root clearer.
  3. Web server should give you an indication of what to do if you run it before building the database.
  4. Make it clear that you need to edit web_server_conf.js if you want to bind to anything other than the local (loopback) addresses.
  5. Fix the dirlist function.

Here is the short version of the steps, in order, for getting it to work as a non-root user (should work regardless of platform):

rampart@raspberrypi:~ $ wget https://rampart.dev/downloads/rampart-0.1.1/rampart-0.1.1-raspberry_pi_os-buster-armv7l.tar.gz
--2022-11-27 19:57:47--  https://rampart.dev/downloads/rampart-0.1.1/rampart-0.1.1-raspberry_pi_os-buster-armv7l.tar.gz
Resolving rampart.dev (rampart.dev)... 184.105.177.37
Connecting to rampart.dev (rampart.dev)|184.105.177.37|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 39197559 (37M) [application/octet-stream]
Saving to: ‘rampart-0.1.1-raspberry_pi_os-buster-armv7l.tar.gz’

rampart-0.1.1-raspberry_pi_os 100%[=================================================>]  37.38M  19.8MB/s    in 1.9s

2022-11-27 19:57:49 (19.8 MB/s) - ‘rampart-0.1.1-raspberry_pi_os-buster-armv7l.tar.gz’ saved [39197559/39197559]

rampart@raspberrypi:~ $ tar -zxf rampart-0.1.1-raspberry_pi_os-buster-armv7l.tar.gz
rampart@raspberrypi:~ $ PATH=/home/rampart/rampart/bin:$PATH
rampart@raspberrypi:~ $ git clone https://github.com/aflin/rampart_tutorials.git
Cloning into 'rampart_tutorials'...
remote: Enumerating objects: 176, done.
remote: Counting objects: 100% (176/176), done.
remote: Compressing objects: 100% (128/128), done.
remote: Total 176 (delta 49), reused 152 (delta 31), pack-reused 0
Receiving objects: 100% (176/176), 27.31 MiB | 11.80 MiB/s, done.
Resolving deltas: 100% (49/49), done.
rampart@raspberrypi:~ $ cd rampart_tutorials/pi_news/
rampart@raspberrypi:~/rampart_tutorials/pi_news $ rampart pi_news_aggregator.js --first-run
Getting new article urls for https://hackaday.com/category/raspberry-pi-2/
200    - https://hackaday.com/robots.txt
200    - https://hackaday.com/category/raspberry-pi-2/
200    - https://hackaday.com/category/raspberry-pi-2/page/2/

...

200    - https://www.raspberrypi.com/news/code-an-homage-to-excitebike-wireframe-66/
updating https://www.raspberrypi.com/news/code-an-homage-to-excitebike-wireframe-66/
200    - https://www.makeuseof.com/install-use-ghost-blogging-platform-raspberry-pi/
updating https://www.makeuseof.com/install-use-ghost-blogging-platform-raspberry-pi/
raspberrypi is up to date
makeuseof is up to date
updating fulltext index
Indexing data:
0%----------------25%-----------------50%-----------------75%--------------100%
###############################################################################
Final merge to index:
0%----------------25%-----------------50%-----------------75%--------------100%
###############################################################################
rampart@raspberrypi:~/rampart_tutorials/pi_news $ cd web_server/
rampart@raspberrypi:~/rampart_tutorials/pi_news/web_server $ #change web server to listen on all ips
rampart@raspberrypi:~/rampart_tutorials/pi_news/web_server $ rex -R'0.0.0.0' 127.0.0.1 web_server_conf.js
rampart@raspberrypi:~/rampart_tutorials/pi_news/web_server $ rex -R'[::]' '\[::1\]' web_server_conf.js
rampart@raspberrypi:~/rampart_tutorials/pi_news/web_server $ ./start_web_server.sh
Starting Web Server
Starting https server
set connection timeout to 20 sec and 0 microseconds
set script timeout to 20 sec and 0 microseconds
HTTP server - initializing with 4 threads
Error: error changing  ownership: Operation not permitted
    at [anon] (/usr/local/src/rampart/src/duktape/globals/rampart-utils.c:2829) internal
    at [anon] () native strict preventsyield
    at global (./web_server_conf.js:301) preventsyield
rampart@raspberrypi:~/rampart_tutorials/pi_news/web_server $ mapping not found  to function   404                  ->    function notfound()
mapping dir  list  to function   Directory List       ->    function dirlist()
mapping dir   path to mod folder ws://wsapps/         ->    module path:/home/rampart/rampart_tutorials/pi_news/web_server/wsapps/
mapping dir   path to mod folder /apps/               ->    module path:/home/rampart/rampart_tutorials/pi_news/web_server/apps/
mapping filesystem folder        /                    ->    /home/rampart/rampart_tutorials/pi_news/web_server/html/
binding to 0.0.0.0 port 8088
binding to :: port 8088

rampart@raspberrypi:~/rampart_tutorials/pi_news/web_server $

After that, you should be able to navigate to http://(pi-ip-addr):8088/ and see the results.

For your other comments:

  1. The #! -- Agreed - When rampart is not installed into or linked to /usr/local/bin/rampart, that clearly doesn't work. I'll look for a way to make that work, or at least make the failure a bit more clear. However, we tend to run scripts as rampart pi_news_aggregator.js --first-run.
  2. The pid ownership error message, if running as non-root, shouldn't happen and is confusing, but is of no consequence. We will fix.
  3. Path to "/images/*" - not sure what you are seeing. It looks ok on this end.
  4. Dirlist function: We changed how stat works and failed to update. Will fix. For now, you can change the line if (st.isDirectory()) to if (st.isDirectory) and set directoryFunc: dirlist.
  5. Differences between what is on the website and what is in the tutorial: We simplified the tutorial in the hopes that it would be easier to understand. We are hoping it is a good starting point for modification and writing your own scripts.
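
The workaround in item 4 comes down to whether isDirectory is a method or a plain boolean on the stat result. Here is a minimal sketch of a check that tolerates both shapes, assuming only what is stated above. The helper isDir is hypothetical and is not part of Rampart or of web_server_conf.js:

```javascript
// Hypothetical helper: accepts a stat-like object whose isDirectory is
// either a boolean property (the newer behaviour described above) or a
// method (the older behaviour the dirlist function was written against).
function isDir(st) {
    if (!st) return false;
    return typeof st.isDirectory === "function"
        ? Boolean(st.isDirectory())
        : Boolean(st.isDirectory);
}

// Stand-in objects for illustration; a real stat result has more fields.
var oldStyle = { isDirectory: function () { return true; } };
var newStyle = { isDirectory: true };
var plainFile = { isDirectory: false };
```

Note that a bare if (st.isDirectory) against the older method form would always be truthy, because a function object is truthy, which is why the two forms cannot simply be swapped without knowing which one stat returns.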

Thanks for giving us a try. We'll get to work on the fixes and I'll close this when they are done.

Let me know if I missed anything.

@paulwratt
Author

paulwratt commented Nov 28, 2022

Hey guys, thanks for looking into things. By the lack of Issues I would say it's just a lack of exposure, so there are just situations that had not been tried yet (BTW I came here from a Google News article on my phone, if you wondered how I found out about the project).

  1. use Perl style execute - #!/usr/bin/env rampart - that finds it no matter where it is, as long as it's in a PATH somewhere
  2. where is the PID file going? BTW I have a ~/tmp folder too (on all my systems) which I know is not common, on top of the tmpfs at /tmp (99% of Debian based are tmpfs), but I think they end up in /run linked to /var/run, right?
  3. ok, I figured it out, it's the name of the default 404 image used: it ends with a t, not a d - I'll leave that to you to figure out which one to change (the script or the image name)
  4. OK, cool, I can test that
  5. yeah I figured as much, just lack of exposure (see above comment) meaning lack of corner case tests (which is where I usually end up) - NOTE I would not have got it working without the "older" install on the Demo server. Also, although I found the Tut repo, I did not download it, just copied that one root index.html file (which still gives me 404 on redirect).
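
To make item 1 concrete, here is a small sketch of the change it implies for a script's first line. The function withEnvShebang is a hypothetical helper written for illustration and is not part of Rampart or the tutorial repo:

```javascript
// Hypothetical helper: returns script text whose first line is the
// env-style interpreter line suggested in item 1.
function withEnvShebang(source) {
    var envLine = "#!/usr/bin/env rampart";
    var lines = source.split("\n");
    if (lines[0].slice(0, 2) === "#!") {
        lines[0] = envLine;      // replace an existing interpreter line
    } else {
        lines.unshift(envLine);  // or add one if the script had none
    }
    return lines.join("\n");
}

var before = "#!/usr/local/bin/rampart\nconsole.log('hi');\n";
var after = withEnvShebang(before);
```

env looks rampart up in PATH, so this only helps when PATH includes the directory holding the rampart binary, as with the sourced file described in the original report.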

I would say "add a piece to the end of tut about what is expected and from where" (ie the urls/paths/config) including "run aggregator, wait", "start server", and "what url to use"

I think you guys did a pretty good job overall, considering that you have already updated it 6-7 months ago, and the in-code documentation was a "god-send" which explained a lot both related and unrelated to the Tut, for someone who has not read any other part of the Docs yet.

BTW has anyone tested rampart on an Alpine install ..

EDIT:

I would clarify why you need those empty "chat" files at the opening of the Tut, and what happens if the Chat Tut had already been performed and, like me, you are only going to run single instances as needed (but with max functionality).

Maybe a note that the current "full text search" does not apply to the Title (seen while watching the aggregator doing its thing) - an exercise for the user - I got results with "pico" and "hackaday" but not with "hackerspace" (which is present in a lot of RPi url Titles - at the end - they have lots of "MagPi" and "WireFrame" too).

Maybe a comment on what's expected without --first-run - I presume it updates without the "create" part, and appends data (there is some comment about large data volume over time - in the src?)

I noticed the rex commands in the above output. Interesting, and it shows usage too. Those are not changes I made, but probably ones quite a few others might make.

I forgot to mention, I played with Nodejs when it first came out, it seems you guys have captured that simplicity, while adding the functionality, without incurring the Bloat that has caused Nodejs to become a Security risk.

@ramisdb
Collaborator

ramisdb commented Nov 28, 2022

I'm not sure why the demo does not index the title but you can fix that by changing the following:

in pi_news_aggregator.js
line 446 from:
"create fulltext index pipages_text_ftx on pipages(text) " +
line 445 to:
"create fulltext index pipages_text_ftx on pipages(title\\text) " +

and in apps/pi_news/search.js
line 143 from:
sqlStatement = "select url, img_url, title, stringformat('%mbH',?query,abstract(text, 0,'querymultiple',?query)) Ab from pipages where text likep ?query";
line 143 to:
sqlStatement = "select url, img_url, title, stringformat('%mbH',?query,abstract(text, 0,'querymultiple',?query)) Ab from pipages where title\\text likep ?query";

See the section on Compound Indexes in the docs here:
https://www.rampart.dev/docs/rampart-sql.html#fulltext-indexes
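
One detail worth spelling out: inside a JavaScript string literal, \\ is a single backslash, so the SQL engine receives title\text. Here is a small sketch that builds both statements from one field list. The helper compoundField is hypothetical, and the index statement stops where the excerpt above stops (the original line continues with a +, and whatever follows is not shown in this thread):

```javascript
// Hypothetical helper: joins field names with a single backslash, the
// compound-field form used in the edits above.
function compoundField(fields) {
    return fields.join("\\");
}

var fields = compoundField(["title", "text"]);

// Only the portion of the statement quoted above; the original continues.
var createIndex =
    "create fulltext index pipages_text_ftx on pipages(" + fields + ") ";

var sqlStatement =
    "select url, img_url, title, " +
    "stringformat('%mbH',?query,abstract(text, 0,'querymultiple',?query)) Ab " +
    "from pipages where " + fields + " likep ?query";
```

Building both strings from the same field list keeps the indexed expression and the queried expression identical, which is the point of making the two edits together.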

@paulwratt
Author

paulwratt commented Nov 29, 2022

Thanks for that, I had not attempted any code change yet.

My only question at this point is, having followed the Tut, do I need to rather use the Tut repo files to get the urls to work?

With the file copied from the demo website into web_server/apps/pi_news.js, I get functional access via /apps/pi_news and /apps/pi_news/. Even though I have (as per the Docs Tut) web_server/apps/pi_news/search.js, both /apps/pi_news/ and /apps/pi_news/search return a 404, as does the HTML redirect using the Tut repo's web_server/html/index.html, which tries to access /apps/pi_new/search.html/ (note the trailing /).

EDIT:

I presume the reason no one has mentioned anything regarding this is that a copy from the Tut repo does work without changes .. yes?

Is there any way to reparse the fulltext index on the current database without using --first-run again, or is that explained in the above Docs reference (like I said, I have not got to any other parts of the Docs yet)

@ramisdb
Collaborator

ramisdb commented Nov 29, 2022

I'm not specifically familiar with the pi_news app but I am familiar with Rampart overall. What I noticed is that the tutorial provides its own fairly replete directory tree, and this may be the source of some confusion. I can only provide generic answers here because I don't know the specifics of where you installed things on your machine.

I created a fresh AWS Lightsail instance and installed Rampart as the user "rampart". This is what your tree should look like:

rampart@ip-172-26-10-66:~/rampart$ pwd
/home/rampart/rampart
rampart@ip-172-26-10-66:~/rampart$ ls -l
total 52
-rw-r--r-- 1 rampart rampart   599 Jul 12 22:52 LICENSE
drwxr-xr-x 2 rampart rampart  4096 Jul 12 22:52 bin
drwxr-xr-x 3 rampart rampart  4096 Jul 12 22:52 examples
drwxr-xr-x 2 rampart rampart  4096 Jul 12 22:52 include
-rwxr-xr-x 1 rampart rampart 17957 Jul 12 22:52 install.js
drwxr-xr-x 2 rampart rampart  4096 Jul 12 22:52 modules
-rwxr-xr-x 1 rampart rampart   485 Jul 12 22:52 run_tests.sh
drwxr-xr-x 2 rampart rampart  4096 Jul 12 22:52 test
drwxr-xr-x 7 rampart rampart  4096 Nov 29 20:26 web_server
rampart@ip-172-26-10-66:~/rampart$ cd web_server
rampart@ip-172-26-10-66:~/rampart/web_server$ ls -l
total 44
drwxr-xr-x 3 rampart rampart  4096 Jul 12 22:52 apps
drwxr-xr-x 2 rampart rampart  4096 Jul 12 22:52 data
drwxr-xr-x 4 rampart rampart  4096 Jul 12 22:52 html
drwxr-xr-x 2 rampart rampart  4096 Jul 12 22:52 logs
-rwxr-xr-x 1 rampart rampart   339 Jul 12 22:52 start_server.sh
-rwxr-xr-x 1 rampart rampart    36 Jul 12 22:52 stop_server.sh
-rw-r--r-- 1 rampart rampart 15985 Jul 12 22:52  web_server_conf.js
drwxr-xr-x 3 rampart rampart  4096 Jul 12 22:52 wsapps
rampart@ip-172-26-10-66:~/rampart/web_server$ 

All pathing issues for the web server are controlled by the web_server_conf.js that you ran when you started the server. So, if I started the server with rampart web_server_conf.js while my CWD was the directory above, ./apps would contain all the apps runnable via a URL and ./html/* would contain the static web content reachable via a URL. This is because of the following lines in web_server_conf.js:

var serverConf = {
 ...
    htmlRoot:       process.scriptPath + "/html",
    appsRoot:       process.scriptPath + "/apps",
    wsappsRoot:     process.scriptPath + "/wsapps",
...
}
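
As a rough illustration of that mapping (not Rampart's actual routing code), the sketch below resolves a URL path against roots shaped like the ones in serverConf. The function resolveUrl, the scriptPath value, and the simplified rules are assumptions made for the example. The /apps/ and / prefixes mirror the "mapping" lines in the startup output earlier in this thread.

```javascript
// Stand-in for process.scriptPath, assumed here to be the directory
// containing web_server_conf.js; the value is only an example.
var scriptPath = "/home/rampart/rampart/web_server";

var roots = {
    htmlRoot: scriptPath + "/html",
    appsRoot: scriptPath + "/apps"
};

// Hypothetical, simplified resolver: /apps/... goes to the module
// folder, everything else is treated as static content under html/.
function resolveUrl(urlPath) {
    var appsPrefix = "/apps/";
    if (urlPath.indexOf(appsPrefix) === 0) {
        return {
            kind: "module",
            path: roots.appsRoot + "/" + urlPath.slice(appsPrefix.length)
        };
    }
    return { kind: "static", path: roots.htmlRoot + urlPath };
}
```

Because both roots hang off the directory of whichever web_server_conf.js was started, the same URL can land in the main install's tree or in a tutorial's own tree.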

Have a look at the structure of the Rampart website at https://www.rampart.dev/apps/editor/ and its web_server_conf.js file, and it'll give you more insight into how stuff functions. It's all pretty normal, and we tried to mimic Nginx's behavior and features in general.

I'm flummoxed by your question about "reparse the fulltext index on the current database." In the pi_news application, the --first-run option only affects whether it starts anew in gathering data from the sites or just gets new articles. This is stated in the script:

// running in two stages - first gets all articles from several index pages
//                       - second gets latest articles from main index page
// First stage is done by running with command line argument '--first-run'

Fulltext indexes are much like any other standard SQL index. They speed things up by avoiding a linear scan of the records in a table. An index is never "reparsed"; it is the new or modified data in a table that gets parsed and added to the index. In the pi_news_aggregator.js script there's a function called make_index() in which he invokes the SQL create fulltext index command on the table. Above that function is a comment which explains this. Presumably he calls make_index() after every execution of the script. Rampart does not have to entirely recreate a fulltext index after any modifications to a table; an index update will only process new or changed rows.

Hope this helps clarify.

@paulwratt
Author

paulwratt commented Nov 30, 2022

Based on the Tut, I unzipped rampart to ~/rampart. The Tut then copies the web_server folder to the current directory of where you are applying the Pi-News tutorial.

I did not install rampart. I ran install.js, which has #!./bin/rampart (or similar), and if you don't choose any install method (or can't), it tells you it was not run with sudo and prints out a PATH string that points to the (full pathname of the) current folder. I added that alone to a "source" file (. rampart) so that when I want to run rampart, I must source that file first. The start/stop server scripts look for rampart in the PATH (not /bin/rampart or /usr/local/bin/rampart). Rampart is intelligent enough that you can pass it a script that does have #! in it, and it will ignore that line.

The "reparse the fulltext index on the current database" question is based on the above amendments to the code. Without knowing how the rampart fulltext index functionality works, I just presumed that it is like other fulltext db functionality and builds a separate "index db". I would guess that's where the confusion is: the Rampart fulltext index does not function that way, but rather dynamically, based on the changes outlined above (maybe, maybe not, based on the comments about make_index()).

Anyway, at this point, it seems I need to try a few things, including redoing the Tut in a couple of different ways (with and without the Tut Repo), and see how the above mentioned Title search affects results.

On my setup --first-run takes a long time (and creates the required db structure), and I already have the db full of the required data to build a "fulltext index" that includes the Title (which the Tut does not do). I think I should just be able to carve a few pieces of code, or add a modifier to existing code, that can essentially --re-index .. like I said, because it's not in the Tut, maybe just better left as "an exercise for the user" ..

@ramisdb
Collaborator

ramisdb commented Nov 30, 2022

The reason first-run takes a long time is because it is getting pages from the sites. Everything else is pretty fast. :)

@paulwratt
Author

my internet is slow too, and although it gets the initial pages from just 3 sites, in total there are well over a couple of hundred articles to process .. (hence the "slow" reference ;)

@ramisdb
Collaborator

ramisdb commented Nov 30, 2022

It's also because there's a rate limiter built into the page scraper. This avoids making the site owners angry.

var crawlDelay = 10;  // go slow - delay in seconds between fetches from same site.
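
Here is a minimal sketch of how a per-site delay like this can be enforced. It is not the aggregator's actual code: waitBeforeFetch and lastFetch are hypothetical names, and times are passed in as milliseconds so the logic can be checked without a clock.

```javascript
var crawlDelay = 10;   // seconds between fetches from the same site
var lastFetch = {};    // hypothetical map: hostname -> time of last fetch (ms)

// Returns how many milliseconds to wait before fetching from `host`
// again, and records the time at which that fetch will happen.
function waitBeforeFetch(host, now) {
    var wait = 0;
    if (Object.prototype.hasOwnProperty.call(lastFetch, host)) {
        wait = Math.max(0, lastFetch[host] + crawlDelay * 1000 - now);
    }
    lastFetch[host] = now + wait;
    return wait;
}
```

With a 10 second delay, 200 fetches from a single site take about 33 minutes (200 × 10 s) regardless of connection speed, which fits the long --first-run described above.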

@paulwratt
Author

paulwratt commented Dec 2, 2022

hmm. seems those edits are only enough for a new --first-run without some complex (dirty?) hacks

the db knows that the indexing has changed :

sql prep error: 100 Metamorph Index pipages_text_ftx already exists on pipages(text)

as far as I can tell, the only way to "drop unique index" is to physically delete it, but there is probably other meta-data saying the pi_news table has a unique index in one of the other main db files (the CAPS files in web_server/data/pi_news/)

I think I'll just post a PR to add the above code to the Tut Repo, and have a look at the Docs Repo and make sure that tut matches the Tut Repo and post any matching PR for the Docs Repo

.. after I have tested everything, again, from scratch, again ..

EDIT:

actually, there does seem to be a possible way (using "alter"):

/*
    Regular indexes are updated on each insert/update.
    Text indexes, however, need to be manually updated.
      -  When a new row is inserted, it still is available for search,
         but that search is a linear scan of the document, so it is slower.
      -  A text index can be updated with either:
          1) sql.exec("alter index pipages_text_ftx OPTIMIZE");
              or
          2) issuing the same command that created the index as in make_index() below.
         Here we will issue the same command when creating and updating for simplicity.
*/

but I still think the "unique index" needs to be dropped first, so again, the simplest way is still to run a new --first-run

@ramisdb
Collaborator

ramisdb commented Dec 2, 2022

As an aside, Aaron (@aflin) told me that the title text was in fact included in the main text field as a part of the page parsing and that including it with a compound index (field\field) was not needed in this case.

DO NOT MANUALLY DELETE AN INDEX with a file-system command, it'll mess up the SYSTABLES in the database and effectively trash the whole DB. Use SQL's "drop index INDEXNAME". Some knowledge of SQL general precepts would be useful.

Again, fulltext index creation time for the amount of data you have isn't that long in this case. Dropping and recreating the index should be a relatively trivial matter.
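
Here is a sketch of that drop-and-recreate step, assuming a sql handle with the exec method used in the aggregator comment quoted earlier (sql.exec("alter index pipages_text_ftx OPTIMIZE")). The helper rebuildTextIndex is hypothetical, the create statement is only the portion quoted in this thread (the original continues with text not shown here), and a recording stub stands in for the database so the snippet runs on its own:

```javascript
// Hypothetical helper: drops the fulltext index through SQL (never by
// deleting files) and then issues the create statement again.
function rebuildTextIndex(sql, fieldExpr) {
    sql.exec("drop index pipages_text_ftx");
    // Only the part of the statement quoted in this thread; the real one
    // in make_index() continues with further text not shown here.
    sql.exec("create fulltext index pipages_text_ftx on pipages(" + fieldExpr + ")");
}

// Stub standing in for a real database handle: it just records statements.
var issued = [];
var stubSql = { exec: function (statement) { issued.push(statement); } };

rebuildTextIndex(stubSql, "text");
```

Only the fulltext index is touched here; the unique index on the URL field stays in place, so no re-crawl is needed.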

The unique index is present on the URL field to ensure that duplicate articles are not inserted. There should be no need to drop that index unless you intend to fully re-crawl all the data (--first-run).

If we start this whole thread anew, there is NOTHING wrong with the application as provided on the site. The main issue and almost all subsequent issues were, I believe, caused by some confusion because the tutorial's files were installed in a different file tree than the main install. Aaron and I had a discussion about this and why it was done this way. His reasoning was that he wanted the tutorials to be self-contained, and I see his point. Had we included all tutorials in the main tarball this problem would not have happened. However, I argued against putting superfluous files in the main tarball because I wanted the installs to be as lean as possible, and to err on the side of no-bloat.

You've used the term "PR" a couple times, I don't know its meaning.

@ramisdb
Collaborator

ramisdb commented Dec 20, 2022

Hey @paulwratt, did you ever resolve your issue?
