Internet Archiving System: Installing and Using ArchiveBox

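Introduction

ArchiveBox is a powerful, self-hosted internet archiving solution written in Python. It is cross-platform and runs on Linux, macOS, and Windows.

It lets you collect, save, and view sites that you want to keep offline. ArchiveBox can currently be set up as a command-line tool, as a desktop application, or behind a web interface, and it can save a static snapshot of any website you choose, including text, images, PDFs, and even video.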

GitHub: https://github.com/ArchiveBox/ArchiveBox/

Official website: https://archivebox.io/

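Preparation

ArchiveBox should not be installed or run as root, and pip warns against running as root as well, so first add a regular account with sudo privileges: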

adduser archivebox && usermod -aG sudo archivebox && su archivebox
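If you later wrap the remaining steps in a script, a small guard can catch the case where it is accidentally started as root. This is only a sketch: the function name is_root_uid is made up for illustration, and the message text is an assumption.

```shell
# Return success (0) when the given numeric UID is root's UID.
# is_root_uid is a hypothetical helper, not part of ArchiveBox.
is_root_uid() {
    [ "$1" -eq 0 ]
}

# "id -u" prints the numeric UID of the current user.
if is_root_uid "$(id -u)"; then
    echo "Please switch to a regular user first, e.g.: su archivebox" >&2
fi
```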

Installation

One-line install

curl -sSL 'https://get.archivebox.io' | sh

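Manual install

Ubuntu is used as the example here; for other systems, see the official manual installation documentation. Docker is still the better option.

Install dependencies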

sudo apt install python3 python3-pip python3-distutils git wget curl youtube-dl
sudo apt install chromium-browser

Install archivebox

python3 -m pip install --upgrade archivebox

Warnings

  WARNING: The script sqlformat is installed in '/home/allen/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
  WARNING: The script pygmentize is installed in '/home/allen/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
  WARNING: The script normalizer is installed in '/home/allen/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
  WARNING: The script django-admin is installed in '/home/allen/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
  WARNING: The scripts ipython and ipython3 are installed in '/home/allen/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
  WARNING: The script dateparser-download is installed in '/home/allen/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
  WARNING: The script archivebox is installed in '/home/allen/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.

Solution:

Run:

echo 'export PATH=/home/allen/.local/bin:$PATH' >>~/.bashrc

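The path after export PATH= must be the one shown in the yellow warnings, so replace /home/allen/.local/bin with the path from your own output. Open a new shell or run source ~/.bashrc so that the change takes effect.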
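Running the echo command above more than once leaves duplicate PATH entries in ~/.bashrc. One way to avoid that is to put a small guard function in ~/.bashrc instead. The following is a minimal sketch; the function name add_to_path is hypothetical, and it assumes pip installed the scripts into $HOME/.local/bin as in the warnings shown here.

```shell
# Prepend a directory to PATH only if it is not already listed.
# add_to_path is a hypothetical helper name chosen for this sketch.
add_to_path() {
    case ":${PATH}:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$1:${PATH}" ;;      # prepend so user-installed scripts are found
    esac
}

# Assumption: pip placed the scripts in ~/.local/bin (use the path from your own warning).
add_to_path "${HOME}/.local/bin"
export PATH
```

Because the function checks before it modifies PATH, sourcing ~/.bashrc repeatedly leaves PATH unchanged after the first time.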

Then reinstall:

python3 -m pip install --upgrade archivebox

Running

Initialize:

mkdir /home/allen/data && cd /home/allen/data
archivebox init
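The two commands above can be wrapped so that they are safe to repeat: mkdir fails if the directory already exists, whereas mkdir -p does not. The sketch below is only an illustration. The function name init_data_dir is hypothetical, and it assumes that an initialized directory contains ArchiveBox.conf, the same file the start script later in this article checks for.

```shell
# Create the data directory if needed and initialize it only once.
# init_data_dir is a hypothetical helper; ArchiveBox.conf is assumed to
# mark a directory that "archivebox init" has already set up.
init_data_dir() {
    local data_dir=$1
    mkdir -p "$data_dir" || return 1
    if [ -f "$data_dir/ArchiveBox.conf" ]; then
        echo "already initialized: $data_dir"
    else
        # Run in a subshell so the caller's working directory is unchanged.
        (cd "$data_dir" && archivebox init)
    fi
}
```

A call such as init_data_dir /home/allen/data then replaces the two lines above.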

Create an admin account:

archivebox manage createsuperuser

My password was too simple, so a red warning appeared.

Start the server:

archivebox server 0.0.0.0:8000

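Open port 8000 of the server in a browser; the page loads normally.

Click ADD at the top and enter the URLs you want to archive.

Wait for the pages to be fetched.

After a while you can see that the snapshot succeeded.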
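URLs can also be added from the terminal with the archivebox add subcommand instead of the web form. The loop below is a small sketch of that; the function name archive_urls and the example URL are made up for illustration, and it has to be run from inside the initialized data directory.

```shell
# Archive every URL passed as an argument using the ArchiveBox CLI.
# archive_urls is a hypothetical wrapper; run it from the data directory.
archive_urls() {
    local url
    for url in "$@"; do
        # Keep going if one URL fails, but report it on stderr.
        archivebox add "$url" || echo "failed to add: $url" >&2
    done
}
```

For example: cd /home/allen/data && archive_urls 'https://example.com'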

Extras

Reverse proxy

A simple Nginx configuration:

server {
    listen 80;
    listen [::]:80;
    server_name archivebox.yydnas.cn;
    index index.php index.html index.htm;

    location / {
        proxy_pass  http://localhost:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header REMOTE-HOST $remote_addr;
    }
}
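After saving the configuration, it is worth testing it before reloading Nginx, so that a typo does not take the site down. The following is a sketch that assumes Nginx is managed by systemd and that your user may use sudo; the function name reload_nginx_if_valid is hypothetical.

```shell
# Check the Nginx configuration and reload only if the check passes.
# Assumptions: nginx runs under systemd and sudo is available.
reload_nginx_if_valid() {
    if sudo nginx -t; then
        sudo systemctl reload nginx
    else
        echo "nginx configuration test failed; not reloading" >&2
        return 1
    fi
}
```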

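Running in the background

By default the program runs in the foreground of the terminal. The simplest way to keep it running in the background is: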

nohup archivebox server 0.0.0.0:8000 &> /dev/null &
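The command above discards all server output and does not record the process ID, which makes it harder to debug or stop the server later. A possible variant is sketched below; the function name run_detached and the file names in the usage line are made up for illustration.

```shell
# Run a command detached from the terminal, appending its output to a
# log file and saving its PID so it can be stopped later.
# Usage: run_detached LOGFILE PIDFILE COMMAND [ARGS...]
# run_detached is a hypothetical helper name chosen for this sketch.
run_detached() {
    local log_file=$1
    local pid_file=$2
    shift 2
    # stdout and stderr both go to the log; nohup ignores the hangup signal.
    nohup "$@" >> "$log_file" 2>&1 &
    # $! holds the PID of the most recent background job.
    echo $! > "$pid_file"
}
```

For example, from the data directory: run_detached archivebox.log archivebox.pid archivebox server 0.0.0.0:8000. The server can then be stopped with kill "$(cat archivebox.pid)".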

You can also create a script named start-archivebox.sh in your archivebox directory with the following contents:

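#!/bin/bash
# Start the ArchiveBox server in the background unless it is already running.

# Exit with status 1 if a server process already exists.
if ps aux | grep "archivebox server" | grep -v grep > /dev/null; then
    exit 1
fi

ABPath=/home/allen/data         # replace with your ArchiveBox data directory
ABPort=8000

# Start only if the directory has been initialized (ArchiveBox.conf exists).
if [ -f "${ABPath}/ArchiveBox.conf" ]; then
    cd "${ABPath}" || exit 2
    nohup archivebox server "0.0.0.0:${ABPort}" &> /dev/null &
    exit 0
fi

# Exit with status 2 if no initialized data directory was found.
exit 2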

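Run it with: bash start-archivebox.sh

This script is based on a Zhihu article, 开源的私人档案馆ArchiveBox简介,及二段补强 (roughly: "An introduction to ArchiveBox, the open-source private archive, with a second round of improvements").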

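Closing notes

This covers only the most basic installation; for more ways to use it, see ArchiveBox Usage.

The program does not appear to offer a language setting, and the interface is in English by default. Since there are only a few interface elements, this should not get in the way of normal use.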
