Add new integration tests against MC news-search-api and test ES data #17

Merged
merged 9 commits into from
Dec 8, 2023
71 changes: 71 additions & 0 deletions .github/workflows/mc-integration-test.yml
@@ -0,0 +1,71 @@
name: Integration test against news-search-api:main

on:
  push:
    branches: ["main"]
  pull_request:
    branches: ["main"]

jobs:
  fixture-integration-test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.10"]

    name: Integration test with dummy ES data
    steps:

      # setup ES index
      - name: Configure sysctl limits
        run: |
          sudo swapoff -a
          sudo sysctl -w vm.swappiness=1
          sudo sysctl -w fs.file-max=262144
          sudo sysctl -w vm.max_map_count=262144
      - name: Run Elasticsearch
        uses: elastic/elastic-github-actions/elasticsearch@master
        with:
          stack-version: 8.8.2
          security-enabled: false
      - name: Verify Elasticsearch is reachable
        run: |
          curl --verbose --show-error http://localhost:9200

      # setup news-search-api server and dummy data
      - name: Checkout news-search-api server
        uses: actions/checkout@v4
        with:
          repository: mediacloud/news-search-api
          path: news-search-api
      - name: Install news-search-api server python dependencies
        working-directory: news-search-api
        run: |
          pip install -r requirements.txt
      - name: Install fixtures
        working-directory: news-search-api
        run: |
          python -m test.create_fixtures
      - name: Run news-search-api server
        working-directory: news-search-api
        run: |
          python api.py &
          sleep 5
      - name: Verify news-search-api server is reachable
        working-directory: news-search-api
        run: |
          curl --verbose --show-error http://localhost:8000

      # set up api client code and run test
      - name: Main checkout
        uses: actions/checkout@v4
        with:
          path: main
      - name: Install python dependencies
        working-directory: main
        run: |
          pip install -e .[dev]
      - name: Run integration test
        working-directory: main
        run: |
          pytest waybacknews/tests/test_fixtures.py
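
Review note: the `sleep 5` after `python api.py &` in the workflow above assumes the server is listening within five seconds. A minimal sketch of a polling alternative follows. The helper names `http_probe` and `wait_until_ready` are hypothetical and are not part of this PR or of news-search-api; only Python standard-library calls are used. Like the `curl` check in the workflow, it treats any HTTP response, including an error status, as proof that the server is up.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str) -> bool:
    """Return True if the URL answers with any HTTP response at all."""
    try:
        with urllib.request.urlopen(url, timeout=2):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status (e.g. 404), so it is up.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or similar: not up yet.
        return False


def wait_until_ready(url: str,
                     probe: Callable[[str], bool] = http_probe,
                     attempts: int = 30,
                     delay: float = 1.0) -> bool:
    """Call `probe(url)` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

A workflow step could call `wait_until_ready("http://localhost:8000")` and fail the job when it returns `False`, so a slow start shows up as an explicit timeout rather than as a failed test.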
27 changes: 0 additions & 27 deletions .github/workflows/pylint.yml

This file was deleted.

@@ -1,33 +1,30 @@
 name: do-testing

-on:
+on:
   push:
     branches: ["main"]
   pull_request:
     branches: ["main"]

 permissions:
   contents: read

 jobs:

   build:
     runs-on: ubuntu-latest
-    strategy:
+    strategy:
       matrix:
-        python-version: ["3.8", "3.9", "3.10"]
+        python-version: ["3.10"]

     steps:
       - uses: actions/checkout@v3

       - name: Set up Python ${{ matrix.python-version }}
         uses: actions/setup-python@v3
         with:
           python-version: ${{ matrix.python-version }}

       - name: Install Deps
         run: |
-          pip install -e .[dev]
+          pip install -e .[dev]
       - name: Run Pytest
         run: |
-          pytest
+          pytest waybacknews/tests/test_waybacknews.py
59 changes: 59 additions & 0 deletions waybacknews/tests/test_fixtures.py
@@ -0,0 +1,59 @@
from unittest import TestCase
import datetime as dt

import waybacknews.searchapi as searchapi

INTEGRATION_TEST_COLLECTION = "mediacloud_test"
INTEGRATION_TEST_HOST = "http://127.0.0.1:8000"


class TestMediaCloudCollection(TestCase):

    def setUp(self) -> None:
        self._api = searchapi.SearchApiClient(INTEGRATION_TEST_COLLECTION)
        self._api.API_BASE_URL = f"{INTEGRATION_TEST_HOST}/{searchapi.VERSION}/"

    def test_count(self):
        results = self._api.count("*", dt.datetime(2023, 1, 1), dt.datetime(2024, 1, 1))
        assert results > 0
        assert results < 5000

    def test_count_over_time(self):
        results = self._api.count_over_time("*", dt.datetime(2020, 1, 1), dt.datetime(2025, 1, 1))
        assert len(results) > 30
        for day in results:
            assert 'date' in day
            assert 'count' in day
            assert 'timestamp' in day

    def test_count_no_results(self):
        results = self._api.count("*", dt.datetime(2010, 1, 1), dt.datetime(2010, 1, 1))
        assert results == 0

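    def test_count_date_filter(self):
        # avoid shadowing the builtin `all`
        all_count = self._api.count("*", dt.datetime(2023, 1, 1), dt.datetime(2024, 1, 1))
        assert all_count > 0
        # a one-week window inside the full range should match fewer stories
        w1 = self._api.count("*", dt.datetime(2023, 11, 1), dt.datetime(2023, 11, 8))
        assert all_count > w1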

    def test_paged_articles(self):
        query = "*"
        start_date = dt.datetime(2023, 10, 1)
        end_date = dt.datetime(2023, 12, 31)
        story_count = self._api.count(query, start_date, end_date)
        # make sure the test case is a reasonable size (i.e. more than one page, but not too many pages)
        assert story_count > 1000
        assert story_count < 10000
        # fetch first page
        page1, next_token1 = self._api.paged_articles(query, start_date, end_date)
        assert len(page1) > 0
        assert next_token1 is not None
        page1_url1 = page1[0]['url']
        # grab token, fetch next page
        page2, next_token2 = self._api.paged_articles(query, start_date, end_date, pagination_token=next_token1)
        assert len(page2) > 0
        assert next_token2 is not None
        assert next_token1 != next_token2  # verify paging token changed
        page2_urls = [s['url'] for s in page2]
        assert page1_url1 not in page2_urls  # verify pages don't overlap
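
Review note: `test_paged_articles` fetches two pages by hand. A minimal sketch of walking every page with the same call follows. `iter_articles` and `FakeClient` are hypothetical names introduced only for illustration. The sketch assumes that `paged_articles` returns a `(page, next_token)` pair, as the test does, and that a `None` token marks the last page; the second point is an assumption that this PR does not confirm.

```python
import datetime as dt
from typing import Any, Dict, Iterator


def iter_articles(api, query: str, start_date: dt.datetime, end_date: dt.datetime,
                  max_pages: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield stories page by page until the token runs out or max_pages is hit."""
    token = None
    for _ in range(max_pages):
        # Only pass the token once we have one, mirroring the first call in the test.
        kwargs = {} if token is None else {"pagination_token": token}
        page, token = api.paged_articles(query, start_date, end_date, **kwargs)
        yield from page
        if token is None:  # assumption: None means there are no further pages
            break


class FakeClient:
    """Hypothetical stand-in for the real client so the sketch runs offline."""

    def __init__(self, pages):
        self._pages = pages

    def paged_articles(self, query, start_date, end_date, pagination_token=None):
        index = 0 if pagination_token is None else int(pagination_token)
        has_more = index + 1 < len(self._pages)
        return self._pages[index], (str(index + 1) if has_more else None)


demo_client = FakeClient([[{"url": "a"}, {"url": "b"}], [{"url": "c"}]])
demo_urls = [s["url"] for s in iter_articles(demo_client, "*",
                                             dt.datetime(2023, 10, 1),
                                             dt.datetime(2023, 12, 31))]
print(demo_urls)
```

The `max_pages` cap is a guard against looping forever if the server keeps returning a token; the value 100 is arbitrary.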
