Mirror of https://github.com/blackboxprogramming/simulation-theory.git (synced 2026-03-17 05:57:19 -05:00)
Scrapers
Python web scrapers for collecting data relevant to the simulation-theory research repository.
| Script | Source | Topics |
|---|---|---|
| arxiv_scraper.py | arXiv | Simulation hypothesis, Gödel incompleteness, Riemann zeta, qutrit/ternary quantum, halting problem, IIT consciousness |
| wikipedia_scraper.py | Wikipedia | SHA-256, Riemann hypothesis, quantum computing, Euler's identity, fine-structure constant, Turing machine, DNA, Blockchain |
| oeis_scraper.py | OEIS | Prime numbers, Fibonacci, pi digits, Euler–Mascheroni constant, Catalan numbers, partition numbers |
Setup
pip install -r requirements.txt
Usage
arXiv scraper
# Use default topic list
python arxiv_scraper.py
# Custom query, limit to 3 results per query
python arxiv_scraper.py --query "Riemann hypothesis zeros" --max 3
# Save to file
python arxiv_scraper.py --output arxiv_results.json
Wikipedia scraper
# Use default topic list
python wikipedia_scraper.py
# Custom topics
python wikipedia_scraper.py --topics "Riemann hypothesis" "SHA-2" "Turing machine"
# Save to file
python wikipedia_scraper.py --output wikipedia_results.json
OEIS scraper
# Use default sequence list
python oeis_scraper.py
# Custom sequence IDs
python oeis_scraper.py --ids A000040 A000045 A000796
# Save to file
python oeis_scraper.py --output oeis_results.json
Output format
All scrapers output JSON to stdout by default, or to a file with --output.
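The stdout-or-file behaviour can be sketched as follows. This is a minimal illustration and not the scrapers' actual code: the helper name `emit_json` and its parameters are hypothetical.

```python
import json
import sys
from typing import Any, Optional, TextIO


def emit_json(data: Any, output_path: Optional[str] = None,
              stream: Optional[TextIO] = None) -> None:
    """Write data as JSON to output_path if given, otherwise to a stream.

    Hypothetical helper; the real scripts may structure this differently.
    """
    text = json.dumps(data, indent=2, ensure_ascii=False)
    if output_path:
        # Mirrors the documented --output flag: write to a file.
        with open(output_path, "w", encoding="utf-8") as fh:
            fh.write(text + "\n")
    else:
        # Default: print to stdout, resolved at call time.
        target = stream if stream is not None else sys.stdout
        target.write(text + "\n")
```

Because the default is stdout, the output can also be piped into other tools without creating an intermediate file.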
arXiv — dict keyed by query, each value is a list of:
{
  "title": "...",
  "authors": ["..."],
  "published": "2024-01-01T00:00:00Z",
  "abstract": "...",
  "url": "https://arxiv.org/abs/..."
}
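As a sketch of consuming this structure, the function below picks the most recently published paper under each query. The function name `latest_per_query` is hypothetical, and the code assumes every `published` value uses the same ISO 8601 UTC format shown above, so that plain string comparison orders them correctly.

```python
from typing import Dict, List, Tuple


def latest_per_query(results: Dict[str, List[dict]]) -> Dict[str, Tuple[str, str]]:
    """Return {query: (published, title)} for the newest paper under each query."""
    latest: Dict[str, Tuple[str, str]] = {}
    for query, papers in results.items():
        if not papers:
            continue  # a query may have returned no results
        # Timestamps in one fixed ISO 8601 format sort correctly as strings.
        newest = max(papers, key=lambda p: p["published"])
        latest[query] = (newest["published"], newest["title"])
    return latest
```

After running `python arxiv_scraper.py --output arxiv_results.json`, the file can be read with `json.load` and the resulting dict passed straight to this function.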
Wikipedia — list of:
{
  "topic": "SHA-2",
  "title": "SHA-2",
  "url": "https://en.wikipedia.org/wiki/SHA-2",
  "summary": "..."
}
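Since each entry carries both the requested `topic` and the resolved page `title`, one possible way to work with the list is to index it by topic and flag entries where the two differ. The helper names below are hypothetical, and whether the scraper ever returns a `title` different from the `topic` (for example after a redirect) is an assumption rather than documented behaviour.

```python
from typing import Dict, List


def index_by_topic(pages: List[dict]) -> Dict[str, dict]:
    """Map each requested topic to its result entry (later duplicates win)."""
    return {page["topic"]: page for page in pages}


def renamed_topics(pages: List[dict]) -> List[str]:
    """List topics whose resolved page title differs from the requested topic."""
    return [page["topic"] for page in pages if page["title"] != page["topic"]]
```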
OEIS — list of:
{
  "id": "A000040",
  "name": "The prime numbers.",
  "description": "...",
  "values": ["2", "3", "5", "7", "11", "..."],
  "url": "https://oeis.org/A000040"
}
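The `values` field holds sequence terms as strings, so numeric work needs a conversion step. A minimal sketch, assuming the terms are decimal integers; the function name `numeric_values` is hypothetical, and skipping non-numeric items such as a literal "..." is a defensive choice rather than something the scraper is known to require.

```python
from typing import List


def numeric_values(entry: dict) -> List[int]:
    """Convert an OEIS entry's string terms to ints, skipping non-numeric items."""
    values: List[int] = []
    for raw in entry["values"]:
        try:
            values.append(int(raw))
        except ValueError:
            # Not a decimal integer (e.g. an elision marker); ignore it.
            continue
    return values
```

Python integers are arbitrary precision, so large terms convert without overflow.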