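```
feat(readme): adjust the formatting of some text, including space-separating amounts and numbers, refining the API parameter descriptions, and aligning heading levels, to improve readability.
```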
33
.gitignore
vendored
Normal file
@@ -0,0 +1,33 @@
+# Dependency directories
+node_modules/
+
+# Log files
+npm-debug.log*
+yarn-debug.log*
+yarn-error.log*
+pnpm-debug.log*
+
+# Environment variable files
+.env
+.env.local
+.env.*.local
+
+# Editor directories and files
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+
+# Operating system files
+.DS_Store
+Thumbs.db
+
+# Build output
+dist/
+build/
+*.log
+
+# Temporary files
+*.tmp
+.cache/
108
README.md
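@@ -1,17 +1,17 @@
-# Nanjing Public Works Construction Center - Announcement Scraping Tool
+# Nanjing Public Works Construction Center - Announcement Collection Tool

-A web-based visual tool for scraping announcements from the Nanjing Public Works Construction Center.
+A web-based visual tool for collecting announcements from the Nanjing Public Works Construction Center.

 ## Features

-- ✅ Scrape the announcement list (with pagination)
-- ✅ Smart scraping by date range
-- ✅ Scrape announcement details
+- ✅ Collect the announcement list (with pagination)
+- ✅ Smart collection by date range
+- ✅ Collect announcement details
 - ✅ Smart budget amount extraction
 - ✅ Generate statistical reports
-- ✅ Web visual interface
-- ✅ Export Word/Markdown reports
-- ✅ RESTful API support
+- ✅ Web visual interface
+- ✅ Export Word/Markdown reports
+- ✅ RESTful API support

 ## Installation
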
@@ -34,21 +34,24 @@ npm start
 ### 3. Feature overview

 **Announcement List tab**
+
 - Quickly view all announcements
 - Paginated browsing
 - Fetch the latest announcement list with one click

-**Detail Scraping tab**
-- Scrape announcement details in batches
-- Scrape by date range
+**Detail Collection tab**
+
+- Collect announcement details in batches
+- Collect by date range
 - Extract budget amounts automatically
-- Customizable scrape count
+- Customizable collection count

 **Generate Report tab**
+
 - Generate reports by date range
 - Filter projects by an amount threshold
 - Real-time project statistics
-- Export Word/Markdown reports with one click
+- Export Word/Markdown reports with one click

 ## Sample report

@@ -60,8 +63,8 @@ npm start
 ## Statistics summary

 - Total projects: 10
-- Projects over 50万元: 3
-- Total amount: 5395.50万元
+- Projects over 50 万元: 3
+- Total amount: 5395.50 万元

 ## Project list

@@ -69,7 +72,7 @@ npm start

 - **Publication date**: 2025-12-12
 - **Publication time**: 2025-12-12 10:35:00
-- **Budget amount**: 5000万元
+- **Budget amount**: 5000 万元
 - **Link**: https://...
 ```

@@ -78,14 +81,18 @@ npm start
 The server exposes the following RESTful API endpoints once it has started:

 ### 1. Get the announcement list
+
 ```
 GET /api/list?url=<list page URL>&page=<page number>
 ```
+
 Parameters:
-- `url` (optional): list page URL; defaults to the official site home page
-- `page` (optional): page number; defaults to 1
+
+- `url` (optional): list page URL; defaults to the official site home page
+- `page` (optional): page number; defaults to 1

 ### 2. Get the list by date range
+
 ```
 POST /api/list-daterange
 Content-Type: application/json
@@ -98,6 +105,7 @@ Content-Type: application/json
 ```

 ### 3. Fetch details in batch
+
 ```
 POST /api/details
 Content-Type: application/json
@@ -109,6 +117,7 @@ Content-Type: application/json
 ```

 ### 4. Generate a report
+
 ```
 POST /api/report
 Content-Type: application/json
@@ -121,6 +130,7 @@ Content-Type: application/json
 ```

 ### 5. Generate a report by date range
+
 ```
 POST /api/report-daterange
 Content-Type: application/json
@@ -137,8 +147,8 @@ Content-Type: application/json

 - **Backend**: Node.js + Express
 - **Scraper**: Axios + Cheerio
-- **Frontend**: vanilla HTML/CSS/JavaScript
-- **Encoding handling**: iconv-lite (supports GBK/UTF-8)
+- **Frontend**: vanilla HTML/CSS/JavaScript
+- **Encoding handling**: iconv-lite (supports GBK/UTF-8)
 - **Document export**: docx.js

 ## Project structure
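@@ -156,61 +166,69 @@ Content-Type: application/json

 ## Notes

-1. Scraping is throttled to a 500ms-1s delay per item to avoid sending requests too quickly
+1. Collection is throttled to a 500ms-1s delay per item to avoid sending requests too quickly
 2. Detail-page parsing is supported only for the gjzx.nanjing.gov.cn domain
 3. Amount extraction relies on regular-expression matching and supports several formats (budget amount, maximum price limit, etc.)
-4. The web server listens on port 3000 by default; this can be changed in server.js
-5. Date-range scraping stops automatically once every announcement found is earlier than the start date
-6. The encoding is detected automatically; GBK and UTF-8 pages are supported
+4. The web server listens on port 3000 by default; this can be changed in server.js
+5. Date-range collection stops automatically once every announcement found is earlier than the start date
+6. The encoding is detected automatically; GBK and UTF-8 pages are supported
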
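The per-item delay mentioned in note 1 could be implemented roughly as follows. This is only a sketch: `randomDelayMs`, `sleep`, and `forEachThrottled` are hypothetical names rather than functions taken from server.js, and the 500-1000 ms bounds simply mirror the figures quoted in the note.

```javascript
// Hypothetical helpers; none of these names come from the project itself.

// Pick a random whole-millisecond delay between minMs and maxMs (inclusive).
function randomDelayMs(minMs = 500, maxMs = 1000) {
  return minMs + Math.floor(Math.random() * (maxMs - minMs + 1));
}

// Resolve after the given number of milliseconds.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Run `handler` on each item in order, pausing between consecutive items.
// `delayFn` is injectable so the pause can be shortened in tests.
async function forEachThrottled(items, handler, delayFn = randomDelayMs) {
  const results = [];
  for (let i = 0; i < items.length; i++) {
    if (i > 0) await sleep(delayFn());
    results.push(await handler(items[i]));
  }
  return results;
}
```

Processing items strictly one after another, instead of with `Promise.all`, is what keeps the request rate bounded; the random jitter only avoids a perfectly regular request pattern.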
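 ## Core feature notes

-### Date-range scraping logic
+### Date-range collection logic

-When scraping by date range, the program:
-1. Scrapes pages sequentially, starting from the first page
+When collecting by date range, the program:
+
+1. Collects pages sequentially, starting from the first page
 2. Checks whether the date of each announcement on the page falls within the specified range
-3. Stops scraping automatically if every announcement on a page is earlier than the start date
-4. Supports a maximum page limit to avoid over-scraping
+3. Stops collecting automatically if every announcement on a page is earlier than the start date
+4. Supports a maximum page limit to avoid over-collection
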
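The four steps above can be sketched as a single loop. This is an illustration under stated assumptions, not the code in server.js: `collectByDateRange` and the injected `fetchPage(page)` callback are hypothetical, and each item is assumed to carry a `date` string in `YYYY-MM-DD` form so that plain string comparison orders dates correctly.

```javascript
// Hypothetical sketch of the date-range loop; names are illustrative only.
// `fetchPage(page)` is assumed to resolve to an array of { title, date } items.
async function collectByDateRange(fetchPage, startDate, endDate, maxPages = 10) {
  const collected = [];
  for (let page = 1; page <= maxPages; page++) {
    const items = await fetchPage(page);
    if (items.length === 0) break; // no more pages

    // Keep only the items inside the requested range (ISO dates sort as strings).
    for (const item of items) {
      if (item.date >= startDate && item.date <= endDate) collected.push(item);
    }

    // Stop once every announcement on this page predates the start date.
    if (items.every((item) => item.date < startDate)) break;
  }
  return collected;
}
```

The early stop relies on the list being ordered newest first; if the site ever mixes pinned or out-of-order entries into a page, the `maxPages` cap is the only remaining safeguard.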
 ### Amount extraction rules

 The following formats are recognized:
-- Budget amount (预算金额): XX万元
-- Maximum price limit (最高限价): XX万元
-- Budget (预算): XX万元
-- Amount (金额): XX万元
-- Bare number: XX万元
+
+- Budget amount (预算金额): XX 万元
+- Maximum price limit (最高限价): XX 万元
+- Budget (预算): XX 万元
+- Amount (金额): XX 万元
+- Bare number: XX 万元

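A regular expression covering the listed formats might look like the sketch below. The pattern and the `extractBudget` name are assumptions made for illustration; the project's actual expression is not shown in this diff. The returned shape loosely follows the `amount` / `unit` / `text` fields that appear elsewhere in this commit.

```javascript
// Hypothetical pattern: an optional Chinese label, an optional colon
// (ASCII or full-width), a number, then the unit 万元.
const BUDGET_PATTERN = /(?:(预算金额|最高限价|预算|金额)\s*[::]?\s*)?(\d+(?:\.\d+)?)\s*万元/;

// Return the first amount found in `text`, or null when nothing matches.
function extractBudget(text) {
  const match = BUDGET_PATTERN.exec(text);
  if (!match) return null;
  return {
    amount: parseFloat(match[2]), // numeric value, expressed in 万元
    unit: '万元',
    text: match[0],               // the matched fragment, label included
  };
}
```

Because the label is optional, any number directly followed by 万元 is accepted, which corresponds to the "bare number" case and can produce false positives on pages that mention several amounts.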
 ### Encoding handling

 The page encoding is detected automatically:
+
 - The charset in Content-Type is read first
 - GBK and GB2312 encodings are handled automatically
 - UTF-8 is used by default

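The detection order described above could be expressed as a small helper. This is a sketch under assumptions: `detectCharset` is a hypothetical name, it inspects only the Content-Type header value, and it maps GB2312 to GBK on the grounds that GBK is a superset of GB2312.

```javascript
// Hypothetical helper: derive the charset to decode with from a Content-Type value.
function detectCharset(contentType) {
  const match = /charset\s*=\s*["']?([\w-]+)/i.exec(contentType || '');
  if (!match) return 'utf-8'; // nothing declared: fall back to UTF-8

  const charset = match[1].toLowerCase();
  // GB2312 pages can be decoded as GBK, so both map to the same decoder.
  return charset === 'gb2312' || charset === 'gbk' ? 'gbk' : charset;
}
```

The result can then be handed to iconv-lite, for example `iconv.decode(buffer, detectCharset(headers['content-type']))`, where `buffer` is the raw response body and `headers` is assumed to be the response header object.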
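 ## FAQ

-### Q: Why is scraping rather slow?
-A: To avoid putting too much load on the server, the program limits the request rate (a 500ms-1s delay per item). This is responsible crawler design.
+### Q: Why is collection rather slow?

-### Q: How do I scrape announcements within a specific date range?
-A: In the web interface, tick "Scrape by date range" on the "Detail Scraping" and "Generate Report" tabs, then enter the start and end dates.
+A: To avoid putting too much load on the server, the program limits the request rate (a 500ms-1s delay per item). This is responsible crawler design.

+### Q: How do I collect announcements within a specific date range?
+
+A: In the web interface, tick "Collect by date range" on the "Detail Collection" and "Generate Report" tabs, then enter the start and end dates.
+
 ### Q: Where do exported reports go?
-A: After you click "Export Word" or "Export Markdown", the file is downloaded automatically to the browser's default download directory.

-### Q: Can it scrape other websites?
-A: You would need to modify BASE_URL in server.js and the corresponding parsing functions, because the HTML structure differs between sites.
+A: After you click "Export Word" or "Export Markdown", the file is downloaded automatically to the browser's default download directory.

+### Q: Can it collect from other websites?
+
+A: You would need to modify BASE_URL in server.js and the corresponding parsing functions, because the HTML structure differs between sites.
+
 ## Changelog

 ### v1.0.0 (2025-12-12)
-- Web visual interface
-- Date-range scraping support
+
+- Web visual interface
+- Date-range collection support
 - Paginated browsing support
-- Word/Markdown report export support
-- RESTful API endpoints
+- Word/Markdown report export support
+- RESTful API endpoints
 - Automatic encoding detection
 - Smart amount extraction
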
81
node_modules/.package-lock.json
generated
vendored
@@ -4,6 +4,46 @@
|
||||
"lockfileVersion": 3,
|
||||
"requires": true,
|
||||
"packages": {
|
||||
"node_modules/@napi-rs/canvas": {
|
||||
"version": "0.1.80",
|
||||
"resolved": "https://registry.npmmirror.com/@napi-rs/canvas/-/canvas-0.1.80.tgz",
|
||||
"integrity": "sha512-DxuT1ClnIPts1kQx8FBmkk4BQDTfI5kIzywAaMjQSXfNnra5UFU9PwurXrl+Je3bJ6BGsp/zmshVVFbCmyI+ww==",
|
||||
"license": "MIT",
|
||||
"workspaces": [
|
||||
"e2e/*"
|
||||
],
|
||||
"engines": {
|
||||
"node": ">= 10"
|
||||
},
|
||||
"optionalDependencies": {
|
||||
"@napi-rs/canvas-android-arm64": "0.1.80",
|
||||
"@napi-rs/canvas-darwin-arm64": "0.1.80",
|
||||
"@napi-rs/canvas-darwin-x64": "0.1.80",
|
||||
"@napi-rs/canvas-linux-arm-gnueabihf": "0.1.80",
|
||||
"@napi-rs/canvas-linux-arm64-gnu": "0.1.80",
|
||||
"@napi-rs/canvas-linux-arm64-musl": "0.1.80",
|
||||
"@napi-rs/canvas-linux-riscv64-gnu": "0.1.80",
|
||||
"@napi-rs/canvas-linux-x64-gnu": "0.1.80",
|
||||
"@napi-rs/canvas-linux-x64-musl": "0.1.80",
|
||||
"@napi-rs/canvas-win32-x64-msvc": "0.1.80"
|
||||
}
|
||||
},
|
||||
"node_modules/@napi-rs/canvas-win32-x64-msvc": {
|
||||
"version": "0.1.80",
|
||||
"resolved": "https://registry.npmmirror.com/@napi-rs/canvas-win32-x64-msvc/-/canvas-win32-x64-msvc-0.1.80.tgz",
|
||||
"integrity": "sha512-Z8jPsM6df5V8B1HrCHB05+bDiCxjE9QA//3YrkKIdVDEwn5RKaqOxCJDRJkl48cJbylcrJbW4HxZbTte8juuPg==",
|
||||
"cpu": [
|
||||
"x64"
|
||||
],
|
||||
"license": "MIT",
|
||||
"optional": true,
|
||||
"os": [
|
||||
"win32"
|
||||
],
|
||||
"engines": {
|
||||
"node": ">= 10"
|
||||
}
|
||||
},
|
||||
"node_modules/@types/node": {
|
||||
"version": "24.10.3",
|
||||
"resolved": "https://registry.npmmirror.com/@types/node/-/node-24.10.3.tgz",
|
||||
@@ -971,6 +1011,15 @@
|
||||
"node": ">= 0.6"
|
||||
}
|
||||
},
|
||||
"node_modules/nodemailer": {
|
||||
"version": "7.0.11",
|
||||
"resolved": "https://registry.npmmirror.com/nodemailer/-/nodemailer-7.0.11.tgz",
|
||||
"integrity": "sha512-gnXhNRE0FNhD7wPSCGhdNh46Hs6nm+uTyg+Kq0cZukNQiYdnCsoQjodNP9BQVG9XrcK/v6/MgpAPBUFyzh9pvw==",
|
||||
"license": "MIT-0",
|
||||
"engines": {
|
||||
"node": ">=6.0.0"
|
||||
}
|
||||
},
|
||||
"node_modules/nth-check": {
|
||||
"version": "2.1.1",
|
||||
"resolved": "https://registry.npmmirror.com/nth-check/-/nth-check-2.1.1.tgz",
|
||||
@@ -1099,6 +1148,38 @@
|
||||
"url": "https://opencollective.com/express"
|
||||
}
|
||||
},
|
||||
"node_modules/pdf-parse": {
|
||||
"version": "2.4.5",
|
||||
"resolved": "https://registry.npmmirror.com/pdf-parse/-/pdf-parse-2.4.5.tgz",
|
||||
"integrity": "sha512-mHU89HGh7v+4u2ubfnevJ03lmPgQ5WU4CxAVmTSh/sxVTEDYd1er/dKS/A6vg77NX47KTEoihq8jZBLr8Cxuwg==",
|
||||
"license": "Apache-2.0",
|
||||
"dependencies": {
|
||||
"@napi-rs/canvas": "0.1.80",
|
||||
"pdfjs-dist": "5.4.296"
|
||||
},
|
||||
"bin": {
|
||||
"pdf-parse": "bin/cli.mjs"
|
||||
},
|
||||
"engines": {
|
||||
"node": ">=20.16.0 <21 || >=22.3.0"
|
||||
},
|
||||
"funding": {
|
||||
"type": "github",
|
||||
"url": "https://github.com/sponsors/mehmet-kozan"
|
||||
}
|
||||
},
|
||||
"node_modules/pdfjs-dist": {
|
||||
"version": "5.4.296",
|
||||
"resolved": "https://registry.npmmirror.com/pdfjs-dist/-/pdfjs-dist-5.4.296.tgz",
|
||||
"integrity": "sha512-DlOzet0HO7OEnmUmB6wWGJrrdvbyJKftI1bhMitK7O2N8W2gc757yyYBbINy9IDafXAV9wmKr9t7xsTaNKRG5Q==",
|
||||
"license": "Apache-2.0",
|
||||
"engines": {
|
||||
"node": ">=20.16.0 || >=22.3.0"
|
||||
},
|
||||
"optionalDependencies": {
|
||||
"@napi-rs/canvas": "^0.1.80"
|
||||
}
|
||||
},
|
||||
"node_modules/process-nextick-args": {
|
||||
"version": "2.0.1",
|
||||
"resolved": "https://registry.npmmirror.com/process-nextick-args/-/process-nextick-args-2.0.1.tgz",
|
||||
|
||||
229
package-lock.json
generated
@@ -13,7 +13,193 @@
|
||||
"cors": "^2.8.5",
|
||||
"docx": "^9.5.1",
|
||||
"express": "^5.2.1",
|
||||
"iconv-lite": "^0.6.3"
|
||||
"iconv-lite": "^0.6.3",
|
||||
"nodemailer": "^7.0.11",
|
||||
"pdf-parse": "^2.4.5"
|
||||
}
|
||||
},
|
||||
"node_modules/@napi-rs/canvas": {
|
||||
"version": "0.1.80",
|
||||
"resolved": "https://registry.npmmirror.com/@napi-rs/canvas/-/canvas-0.1.80.tgz",
|
||||
"integrity": "sha512-DxuT1ClnIPts1kQx8FBmkk4BQDTfI5kIzywAaMjQSXfNnra5UFU9PwurXrl+Je3bJ6BGsp/zmshVVFbCmyI+ww==",
|
||||
"license": "MIT",
|
||||
"workspaces": [
|
||||
"e2e/*"
|
||||
],
|
||||
"engines": {
|
||||
"node": ">= 10"
|
||||
},
|
||||
"optionalDependencies": {
|
||||
"@napi-rs/canvas-android-arm64": "0.1.80",
|
||||
"@napi-rs/canvas-darwin-arm64": "0.1.80",
|
||||
"@napi-rs/canvas-darwin-x64": "0.1.80",
|
||||
"@napi-rs/canvas-linux-arm-gnueabihf": "0.1.80",
|
||||
"@napi-rs/canvas-linux-arm64-gnu": "0.1.80",
|
||||
"@napi-rs/canvas-linux-arm64-musl": "0.1.80",
|
||||
"@napi-rs/canvas-linux-riscv64-gnu": "0.1.80",
|
||||
"@napi-rs/canvas-linux-x64-gnu": "0.1.80",
|
||||
"@napi-rs/canvas-linux-x64-musl": "0.1.80",
|
||||
"@napi-rs/canvas-win32-x64-msvc": "0.1.80"
|
||||
}
|
||||
},
|
||||
"node_modules/@napi-rs/canvas-android-arm64": {
|
||||
"version": "0.1.80",
|
||||
"resolved": "https://registry.npmmirror.com/@napi-rs/canvas-android-arm64/-/canvas-android-arm64-0.1.80.tgz",
|
||||
"integrity": "sha512-sk7xhN/MoXeuExlggf91pNziBxLPVUqF2CAVnB57KLG/pz7+U5TKG8eXdc3pm0d7Od0WreB6ZKLj37sX9muGOQ==",
|
||||
"cpu": [
|
||||
"arm64"
|
||||
],
|
||||
"license": "MIT",
|
||||
"optional": true,
|
||||
"os": [
|
||||
"android"
|
||||
],
|
||||
"engines": {
|
||||
"node": ">= 10"
|
||||
}
|
||||
},
|
||||
"node_modules/@napi-rs/canvas-darwin-arm64": {
|
||||
"version": "0.1.80",
|
||||
"resolved": "https://registry.npmmirror.com/@napi-rs/canvas-darwin-arm64/-/canvas-darwin-arm64-0.1.80.tgz",
|
||||
"integrity": "sha512-O64APRTXRUiAz0P8gErkfEr3lipLJgM6pjATwavZ22ebhjYl/SUbpgM0xcWPQBNMP1n29afAC/Us5PX1vg+JNQ==",
|
||||
"cpu": [
|
||||
"arm64"
|
||||
],
|
||||
"license": "MIT",
|
||||
"optional": true,
|
||||
"os": [
|
||||
"darwin"
|
||||
],
|
||||
"engines": {
|
||||
"node": ">= 10"
|
||||
}
|
||||
},
|
||||
"node_modules/@napi-rs/canvas-darwin-x64": {
|
||||
"version": "0.1.80",
|
||||
"resolved": "https://registry.npmmirror.com/@napi-rs/canvas-darwin-x64/-/canvas-darwin-x64-0.1.80.tgz",
|
||||
"integrity": "sha512-FqqSU7qFce0Cp3pwnTjVkKjjOtxMqRe6lmINxpIZYaZNnVI0H5FtsaraZJ36SiTHNjZlUB69/HhxNDT1Aaa9vA==",
|
||||
"cpu": [
|
||||
"x64"
|
||||
],
|
||||
"license": "MIT",
|
||||
"optional": true,
|
||||
"os": [
|
||||
"darwin"
|
||||
],
|
||||
"engines": {
|
||||
"node": ">= 10"
|
||||
}
|
||||
},
|
||||
"node_modules/@napi-rs/canvas-linux-arm-gnueabihf": {
|
||||
"version": "0.1.80",
|
||||
"resolved": "https://registry.npmmirror.com/@napi-rs/canvas-linux-arm-gnueabihf/-/canvas-linux-arm-gnueabihf-0.1.80.tgz",
|
||||
"integrity": "sha512-eyWz0ddBDQc7/JbAtY4OtZ5SpK8tR4JsCYEZjCE3dI8pqoWUC8oMwYSBGCYfsx2w47cQgQCgMVRVTFiiO38hHQ==",
|
||||
"cpu": [
|
||||
"arm"
|
||||
],
|
||||
"license": "MIT",
|
||||
"optional": true,
|
||||
"os": [
|
||||
"linux"
|
||||
],
|
||||
"engines": {
|
||||
"node": ">= 10"
|
||||
}
|
||||
},
|
||||
"node_modules/@napi-rs/canvas-linux-arm64-gnu": {
|
||||
"version": "0.1.80",
|
||||
"resolved": "https://registry.npmmirror.com/@napi-rs/canvas-linux-arm64-gnu/-/canvas-linux-arm64-gnu-0.1.80.tgz",
|
||||
"integrity": "sha512-qwA63t8A86bnxhuA/GwOkK3jvb+XTQaTiVML0vAWoHyoZYTjNs7BzoOONDgTnNtr8/yHrq64XXzUoLqDzU+Uuw==",
|
||||
"cpu": [
|
||||
"arm64"
|
||||
],
|
||||
"license": "MIT",
|
||||
"optional": true,
|
||||
"os": [
|
||||
"linux"
|
||||
],
|
||||
"engines": {
|
||||
"node": ">= 10"
|
||||
}
|
||||
},
|
||||
"node_modules/@napi-rs/canvas-linux-arm64-musl": {
|
||||
"version": "0.1.80",
|
||||
"resolved": "https://registry.npmmirror.com/@napi-rs/canvas-linux-arm64-musl/-/canvas-linux-arm64-musl-0.1.80.tgz",
|
||||
"integrity": "sha512-1XbCOz/ymhj24lFaIXtWnwv/6eFHXDrjP0jYkc6iHQ9q8oXKzUX1Lc6bu+wuGiLhGh2GS/2JlfORC5ZcXimRcg==",
|
||||
"cpu": [
|
||||
"arm64"
|
||||
],
|
||||
"license": "MIT",
|
||||
"optional": true,
|
||||
"os": [
|
||||
"linux"
|
||||
],
|
||||
"engines": {
|
||||
"node": ">= 10"
|
||||
}
|
||||
},
|
||||
"node_modules/@napi-rs/canvas-linux-riscv64-gnu": {
|
||||
"version": "0.1.80",
|
||||
"resolved": "https://registry.npmmirror.com/@napi-rs/canvas-linux-riscv64-gnu/-/canvas-linux-riscv64-gnu-0.1.80.tgz",
|
||||
"integrity": "sha512-XTzR125w5ZMs0lJcxRlS1K3P5RaZ9RmUsPtd1uGt+EfDyYMu4c6SEROYsxyatbbu/2+lPe7MPHOO/0a0x7L/gw==",
|
||||
"cpu": [
|
||||
"riscv64"
|
||||
],
|
||||
"license": "MIT",
|
||||
"optional": true,
|
||||
"os": [
|
||||
"linux"
|
||||
],
|
||||
"engines": {
|
||||
"node": ">= 10"
|
||||
}
|
||||
},
|
||||
"node_modules/@napi-rs/canvas-linux-x64-gnu": {
|
||||
"version": "0.1.80",
|
||||
"resolved": "https://registry.npmmirror.com/@napi-rs/canvas-linux-x64-gnu/-/canvas-linux-x64-gnu-0.1.80.tgz",
|
||||
"integrity": "sha512-BeXAmhKg1kX3UCrJsYbdQd3hIMDH/K6HnP/pG2LuITaXhXBiNdh//TVVVVCBbJzVQaV5gK/4ZOCMrQW9mvuTqA==",
|
||||
"cpu": [
|
||||
"x64"
|
||||
],
|
||||
"license": "MIT",
|
||||
"optional": true,
|
||||
"os": [
|
||||
"linux"
|
||||
],
|
||||
"engines": {
|
||||
"node": ">= 10"
|
||||
}
|
||||
},
|
||||
"node_modules/@napi-rs/canvas-linux-x64-musl": {
|
||||
"version": "0.1.80",
|
||||
"resolved": "https://registry.npmmirror.com/@napi-rs/canvas-linux-x64-musl/-/canvas-linux-x64-musl-0.1.80.tgz",
|
||||
"integrity": "sha512-x0XvZWdHbkgdgucJsRxprX/4o4sEed7qo9rCQA9ugiS9qE2QvP0RIiEugtZhfLH3cyI+jIRFJHV4Fuz+1BHHMg==",
|
||||
"cpu": [
|
||||
"x64"
|
||||
],
|
||||
"license": "MIT",
|
||||
"optional": true,
|
||||
"os": [
|
||||
"linux"
|
||||
],
|
||||
"engines": {
|
||||
"node": ">= 10"
|
||||
}
|
||||
},
|
||||
"node_modules/@napi-rs/canvas-win32-x64-msvc": {
|
||||
"version": "0.1.80",
|
||||
"resolved": "https://registry.npmmirror.com/@napi-rs/canvas-win32-x64-msvc/-/canvas-win32-x64-msvc-0.1.80.tgz",
|
||||
"integrity": "sha512-Z8jPsM6df5V8B1HrCHB05+bDiCxjE9QA//3YrkKIdVDEwn5RKaqOxCJDRJkl48cJbylcrJbW4HxZbTte8juuPg==",
|
||||
"cpu": [
|
||||
"x64"
|
||||
],
|
||||
"license": "MIT",
|
||||
"optional": true,
|
||||
"os": [
|
||||
"win32"
|
||||
],
|
||||
"engines": {
|
||||
"node": ">= 10"
|
||||
}
|
||||
},
|
||||
"node_modules/@types/node": {
|
||||
@@ -983,6 +1169,15 @@
|
||||
"node": ">= 0.6"
|
||||
}
|
||||
},
|
||||
"node_modules/nodemailer": {
|
||||
"version": "7.0.11",
|
||||
"resolved": "https://registry.npmmirror.com/nodemailer/-/nodemailer-7.0.11.tgz",
|
||||
"integrity": "sha512-gnXhNRE0FNhD7wPSCGhdNh46Hs6nm+uTyg+Kq0cZukNQiYdnCsoQjodNP9BQVG9XrcK/v6/MgpAPBUFyzh9pvw==",
|
||||
"license": "MIT-0",
|
||||
"engines": {
|
||||
"node": ">=6.0.0"
|
||||
}
|
||||
},
|
||||
"node_modules/nth-check": {
|
||||
"version": "2.1.1",
|
||||
"resolved": "https://registry.npmmirror.com/nth-check/-/nth-check-2.1.1.tgz",
|
||||
@@ -1111,6 +1306,38 @@
|
||||
"url": "https://opencollective.com/express"
|
||||
}
|
||||
},
|
||||
"node_modules/pdf-parse": {
|
||||
"version": "2.4.5",
|
||||
"resolved": "https://registry.npmmirror.com/pdf-parse/-/pdf-parse-2.4.5.tgz",
|
||||
"integrity": "sha512-mHU89HGh7v+4u2ubfnevJ03lmPgQ5WU4CxAVmTSh/sxVTEDYd1er/dKS/A6vg77NX47KTEoihq8jZBLr8Cxuwg==",
|
||||
"license": "Apache-2.0",
|
||||
"dependencies": {
|
||||
"@napi-rs/canvas": "0.1.80",
|
||||
"pdfjs-dist": "5.4.296"
|
||||
},
|
||||
"bin": {
|
||||
"pdf-parse": "bin/cli.mjs"
|
||||
},
|
||||
"engines": {
|
||||
"node": ">=20.16.0 <21 || >=22.3.0"
|
||||
},
|
||||
"funding": {
|
||||
"type": "github",
|
||||
"url": "https://github.com/sponsors/mehmet-kozan"
|
||||
}
|
||||
},
|
||||
"node_modules/pdfjs-dist": {
|
||||
"version": "5.4.296",
|
||||
"resolved": "https://registry.npmmirror.com/pdfjs-dist/-/pdfjs-dist-5.4.296.tgz",
|
||||
"integrity": "sha512-DlOzet0HO7OEnmUmB6wWGJrrdvbyJKftI1bhMitK7O2N8W2gc757yyYBbINy9IDafXAV9wmKr9t7xsTaNKRG5Q==",
|
||||
"license": "Apache-2.0",
|
||||
"engines": {
|
||||
"node": ">=20.16.0 || >=22.3.0"
|
||||
},
|
||||
"optionalDependencies": {
|
||||
"@napi-rs/canvas": "^0.1.80"
|
||||
}
|
||||
},
|
||||
"node_modules/process-nextick-args": {
|
||||
"version": "2.0.1",
|
||||
"resolved": "https://registry.npmmirror.com/process-nextick-args/-/process-nextick-args-2.0.1.tgz",
|
||||
|
||||
@@ -2,7 +2,7 @@
   "name": "gjzx-scraper",
   "version": "1.0.0",
   "type": "module",
-  "description": "Tool: scrapes the public notice list and details from https://gjzx.nanjing.gov.cn/gggs/",
+  "description": "Tool: collects the public notice list and details from https://gjzx.nanjing.gov.cn/gggs/",
   "main": "src/server.js",
   "scripts": {
     "start": "node src/server.js"
@@ -13,6 +13,7 @@
     "cors": "^2.8.5",
     "docx": "^9.5.1",
     "express": "^5.2.1",
-    "iconv-lite": "^0.6.3"
+    "iconv-lite": "^0.6.3",
+    "nodemailer": "^7.0.11"
   }
 }
205
public/app.js
@@ -134,11 +134,11 @@ async function fetchDetails() {

             listData = await dateRangeResponse.json();
         } else {
-            // Normal mode - scrape multiple pages by count
+            // Normal mode - collect multiple pages by count
             const url = document.getElementById('detailUrl').value;
             const limit = parseInt(document.getElementById('detailLimit').value);

-            // Scrape multiple pages until enough items have been gathered
+            // Collect multiple pages until enough items have been gathered
             const allItems = [];
             let page = 1;
             const maxPagesToFetch = Math.ceil(limit / 10) + 1; // assume about 10 items per page
@@ -177,7 +177,7 @@ async function fetchDetails() {
             return;
         }

-        // Scrape the details
+        // Collect the details
         const limit = useDetailDateRange ? listData.data.length : parseInt(document.getElementById('detailLimit').value);
         const detailResponse = await fetch(`${API_BASE}/details`, {
             method: 'POST',
@@ -202,7 +202,7 @@ async function fetchDetails() {
 function displayDetails(items, container) {
     const html = `
         <div style="max-height: 380px; overflow-y: auto;">
-            <h3 style="margin-bottom: 15px;">Scraped ${items.length} detail records</h3>
+            <h3 style="margin-bottom: 15px;">Collected ${items.length} detail records</h3>
             ${items.map((item, index) => `
                 <div class="list-item">
                     <h3>${index + 1}. ${item.title}</h3>
@@ -212,7 +212,7 @@ function displayDetails(items, container) {
                         ${item.detail.budget ? `
                             <span class="budget">${item.detail.budget.amount}${item.detail.budget.unit}</span>
                         ` : '<div class="meta">No budget information found</div>'}
-                    ` : '<div class="error">Scraping failed</div>'}
+                    ` : '<div class="error">Collection failed</div>'}
                     <br><a href="${item.href}" target="_blank">View original →</a>
                 </div>
             `).join('')}
@@ -271,6 +271,7 @@ async function generateReport() {
             currentReport = data.data;
             displayReport(data.data, results);
             exportBtn.style.display = 'inline-block';
+            document.getElementById('sendEmailBtn').style.display = 'inline-block';
         } else {
             results.innerHTML = `<div class="error">Error: ${data.error}</div>`;
         }
@@ -475,3 +476,197 @@ async function exportReport() {
     document.body.removeChild(a);
     URL.revokeObjectURL(url);
 }
+
+// ========== Email features ==========
+
+// Load the email settings when the page loads
+document.addEventListener('DOMContentLoaded', function() {
+    loadEmailConfig();
+});
+
+// Save the email settings to localStorage
+function saveEmailConfig() {
+    const config = {
+        smtpHost: document.getElementById('smtpHost').value,
+        smtpPort: parseInt(document.getElementById('smtpPort').value) || 587,
+        smtpUser: document.getElementById('smtpUser').value,
+        smtpPass: document.getElementById('smtpPass').value,
+        recipients: document.getElementById('recipients').value
+    };
+
+    // Validate the settings
+    if (!config.smtpHost || !config.smtpUser || !config.smtpPass || !config.recipients) {
+        showEmailStatus('Please fill in all required fields', 'error');
+        return;
+    }
+
+    // Save to localStorage
+    localStorage.setItem('emailConfig', JSON.stringify(config));
+    showEmailStatus('Email settings saved', 'success');
+}
+
+// Load the email settings from localStorage
+function loadEmailConfig() {
+    const configStr = localStorage.getItem('emailConfig');
+    if (configStr) {
+        try {
+            const config = JSON.parse(configStr);
+            document.getElementById('smtpHost').value = config.smtpHost || '';
+            document.getElementById('smtpPort').value = config.smtpPort || 587;
+            document.getElementById('smtpUser').value = config.smtpUser || '';
+            document.getElementById('smtpPass').value = config.smtpPass || '';
+            document.getElementById('recipients').value = config.recipients || '';
+        } catch (e) {
+            console.error('Failed to load email settings:', e);
+        }
+    }
+}
+
+// Test the email settings
+async function testEmailConfig() {
+    const config = {
+        smtpHost: document.getElementById('smtpHost').value,
+        smtpPort: parseInt(document.getElementById('smtpPort').value) || 587,
+        smtpUser: document.getElementById('smtpUser').value,
+        smtpPass: document.getElementById('smtpPass').value,
+        recipients: document.getElementById('recipients').value
+    };
+
+    // Validate the settings
+    if (!config.smtpHost || !config.smtpUser || !config.smtpPass || !config.recipients) {
+        showEmailStatus('Please fill in all required fields', 'error');
+        return;
+    }
+
+    // Build a test report
+    const testReport = {
+        summary: {
+            total_count: 1,
+            filtered_count: 1,
+            threshold: '50万元',
+            total_amount: '100.00万元',
+            generated_at: new Date().toISOString()
+        },
+        projects: [{
+            title: 'This is a test email',
+            date: new Date().toLocaleDateString('zh-CN'),
+            publish_time: new Date().toLocaleString('zh-CN'),
+            budget: {
+                amount: 100,
+                unit: '万元',
+                text: 'Test amount',
+                originalUnit: '万元'
+            },
+            url: 'https://gjzx.nanjing.gov.cn'
+        }]
+    };
+
+    showEmailStatus('Sending test email...', 'info');
+
+    try {
+        const response = await fetch(`${API_BASE}/send-email`, {
+            method: 'POST',
+            headers: { 'Content-Type': 'application/json' },
+            body: JSON.stringify({
+                emailConfig: config,
+                report: testReport
+            })
+        });
+
+        const data = await response.json();
+
+        if (data.success) {
+            showEmailStatus('Test email sent successfully! Please check your inbox', 'success');
+        } else {
+            showEmailStatus(`Failed to send: ${data.error}`, 'error');
+        }
+    } catch (error) {
+        showEmailStatus(`Request failed: ${error.message}`, 'error');
+    }
+}
+
+// Send the report by email
+async function sendReportByEmail() {
+    if (!currentReport) {
+        alert('Please generate a report first');
+        return;
+    }
+
+    // Load the email settings from localStorage
+    const configStr = localStorage.getItem('emailConfig');
+    if (!configStr) {
+        alert('Please configure the mail server on the "Email Settings" tab first');
+        return;
+    }
+
+    let emailConfig;
+    try {
+        emailConfig = JSON.parse(configStr);
+    } catch (e) {
+        alert('The email settings are malformed; please configure them again');
+        return;
+    }
+
+    // Validate the settings
+    if (!emailConfig.smtpHost || !emailConfig.smtpUser || !emailConfig.smtpPass || !emailConfig.recipients) {
+        alert('The email settings are incomplete; please check them on the "Email Settings" tab');
+        return;
+    }
+
+    const sendBtn = document.getElementById('sendEmailBtn');
+    const originalText = sendBtn.textContent;
+    sendBtn.disabled = true;
+    sendBtn.textContent = 'Sending...';
+
+    try {
+        const response = await fetch(`${API_BASE}/send-email`, {
+            method: 'POST',
+            headers: { 'Content-Type': 'application/json' },
+            body: JSON.stringify({
+                emailConfig: emailConfig,
+                report: currentReport
+            })
+        });
+
+        const data = await response.json();
+
+        if (data.success) {
+            alert('The report was sent by email successfully!');
+        } else {
+            alert(`Failed to send: ${data.error}`);
+        }
+    } catch (error) {
+        alert(`Request failed: ${error.message}`);
+    } finally {
+        sendBtn.disabled = false;
+        sendBtn.textContent = originalText;
+    }
+}
+
+// Show the status of the email settings
+function showEmailStatus(message, type) {
+    const statusDiv = document.getElementById('emailConfigStatus');
+    const bgColors = {
+        success: '#d4edda',
+        error: '#f8d7da',
+        info: '#d1ecf1'
+    };
+    const textColors = {
+        success: '#155724',
+        error: '#721c24',
+        info: '#0c5460'
+    };
+
+    statusDiv.innerHTML = `
+        <div style="background: ${bgColors[type]}; color: ${textColors[type]}; padding: 15px; border-radius: 8px;">
+            ${message}
+        </div>
+    `;
+
+    // Hide success messages automatically after 3 seconds
+    if (type === 'success') {
+        setTimeout(() => {
+            statusDiv.innerHTML = '';
+        }, 3000);
+    }
+}
@@ -3,7 +3,7 @@
 <head>
     <meta charset="UTF-8">
     <meta name="viewport" content="width=device-width, initial-scale=1.0">
-    <title>Nanjing Public Works Construction Center - Announcement Scraping Tool</title>
+    <title>Nanjing Public Works Construction Center - Announcement Collection Tool</title>
     <style>
         * {
             margin: 0;
@@ -335,13 +335,14 @@
     <div class="container">
         <div class="header">
             <h1>Nanjing Public Works Construction Center</h1>
-            <p>Announcement Scraping and Analysis Tool</p>
+            <p>Announcement Collection and Analysis Tool</p>
         </div>

         <div class="tabs">
             <button class="tab active" onclick="switchTab('list')">Announcement List</button>
-            <button class="tab" onclick="switchTab('detail')">Detail Scraping</button>
+            <button class="tab" onclick="switchTab('detail')">Detail Collection</button>
             <button class="tab" onclick="switchTab('report')">Generate Report</button>
+            <button class="tab" onclick="switchTab('email')">Email Settings</button>
         </div>

         <div class="content">
@@ -359,7 +360,7 @@

                 <div id="listLoading" class="loading">
                     <div class="spinner"></div>
-                    <p>Scraping...</p>
+                    <p>Collecting...</p>
                 </div>

                 <div id="listResults" class="results"></div>
@@ -372,12 +373,12 @@
                 </div>
             </div>

-            <!-- Detail scraping -->
+            <!-- Detail collection -->
             <div id="detail" class="tab-content">
                 <div class="form-group">
                     <div class="checkbox-wrapper" onclick="document.getElementById('useDetailDateRange').click();">
                         <input type="checkbox" id="useDetailDateRange" onchange="toggleDetailDateRange()" onclick="event.stopPropagation();">
-                        <label for="useDetailDateRange">Scrape by date range</label>
+                        <label for="useDetailDateRange">Collect by date range</label>
                     </div>
                 </div>

@@ -391,7 +392,7 @@
                         <input type="date" id="detailEndDate">
                     </div>
                     <div class="form-group">
-                        <label>Maximum pages to scrape</label>
+                        <label>Maximum pages to collect</label>
                         <input type="number" id="detailMaxPages" value="1" min="1">
                     </div>
                 </div>
@@ -402,16 +403,16 @@
                         <input type="text" id="detailUrl" placeholder="Default: https://gjzx.nanjing.gov.cn/gggs/">
                     </div>
                     <div class="form-group">
-                        <label>Scrape count</label>
+                        <label>Collection count</label>
                         <input type="number" id="detailLimit" value="5" min="1" max="50">
                     </div>
                 </div>

-                <button class="btn" onclick="fetchDetails()">Start scraping</button>
+                <button class="btn" onclick="fetchDetails()">Start collecting</button>

                 <div id="detailLoading" class="loading">
                     <div class="spinner"></div>
-                    <p>Scraping details...</p>
+                    <p>Collecting details...</p>
                 </div>

                 <div id="detailResults" class="results"></div>
@@ -422,7 +423,7 @@
                 <div class="form-group">
                     <div class="checkbox-wrapper" onclick="document.getElementById('useDateRange').click();">
                         <input type="checkbox" id="useDateRange" onchange="toggleDateRange()" onclick="event.stopPropagation();">
-                        <label for="useDateRange">Scrape by date range</label>
+                        <label for="useDateRange">Collect by date range</label>
                     </div>
                 </div>

@@ -436,7 +437,7 @@
                         <input type="date" id="endDate">
                     </div>
                     <div class="form-group">
-                        <label>Maximum pages to scrape</label>
+                        <label>Maximum pages to collect</label>
                         <input type="number" id="maxPages" value="1" min="1" >
                     </div>
                 </div>
@@ -447,7 +448,7 @@
                         <input type="text" id="reportUrl" placeholder="Default: https://gjzx.nanjing.gov.cn/gggs/">
                     </div>
                     <div class="form-group">
-                        <label>Scrape count</label>
+                        <label>Collection count</label>
                         <input type="number" id="reportLimit" value="15" min="1" max="50">
                     </div>
                 </div>
@@ -459,6 +460,7 @@

                 <button class="btn" onclick="generateReport()">Generate Report</button>
                 <button class="btn export-btn" onclick="exportReport()" id="exportBtn" style="display:none;">Export Word</button>
+                <button class="btn" onclick="sendReportByEmail()" id="sendEmailBtn" style="display:none; background: linear-gradient(135deg, #f093fb 0%, #f5576c 100%);">Send Email</button>

                 <div id="reportLoading" class="loading">
                     <div class="spinner"></div>
@@ -467,6 +469,56 @@

     <div id="reportResults" class="results"></div>
   </div>
+
+  <!-- Email configuration -->
+  <div id="email" class="tab-content">
+    <h2 style="margin-bottom: 20px; color: #667eea;">Email Configuration</h2>
+    <p style="color: #666; margin-bottom: 20px;">Configure the SMTP server used to send reports to the specified mailboxes</p>
+
+    <div class="form-group">
+      <label>SMTP server address *</label>
+      <input type="text" id="smtpHost" placeholder="e.g. smtp.qq.com, smtp.163.com, smtp.gmail.com">
+    </div>
+
+    <div class="form-group">
+      <label>SMTP port *</label>
+      <input type="number" id="smtpPort" value="587" placeholder="Usually 587 (TLS) or 465 (SSL)">
+    </div>
+
+    <div class="form-group">
+      <label>Sender email (SMTP username) *</label>
+      <input type="email" id="smtpUser" placeholder="your-email@example.com">
+    </div>
+
+    <div class="form-group">
+      <label>SMTP password / authorization code *</label>
+      <input type="password" id="smtpPass" placeholder="Mailbox password or authorization code">
+    </div>
+
+    <div class="form-group">
+      <label>Recipient emails (comma-separated) *</label>
+      <input type="text" id="recipients" placeholder="email1@example.com, email2@example.com">
+    </div>
+
+    <button class="btn" onclick="saveEmailConfig()">Save Configuration</button>
+    <button class="btn" onclick="testEmailConfig()" style="background: linear-gradient(135deg, #f093fb 0%, #f5576c 100%);">Test Connection</button>
+
+    <div id="emailConfigStatus" style="margin-top: 20px;"></div>
+
+    <div style="margin-top: 30px; padding: 20px; background: #f0f8ff; border-radius: 8px; border-left: 4px solid #667eea;">
+      <h3 style="margin-top: 0; color: #667eea;">Common mailbox settings</h3>
+      <ul style="line-height: 1.8; color: #666;">
+        <li><strong>QQ Mail:</strong> smtp.qq.com, port 587 or 465, requires an authorization code</li>
+        <li><strong>163 Mail:</strong> smtp.163.com, port 465 or 25, requires an authorization code</li>
+        <li><strong>Gmail:</strong> smtp.gmail.com, port 587 or 465, requires an app password (2-Step Verification must be enabled)</li>
+        <li><strong>Outlook:</strong> smtp-mail.outlook.com, port 587</li>
+        <li><strong>Corporate mail:</strong> ask your IT administrator for the SMTP settings</li>
+      </ul>
+      <p style="margin: 10px 0 0 0; color: #999; font-size: 13px;">
+        Tip: QQ Mail and 163 Mail require you to enable the SMTP service in the mailbox settings and generate an authorization code; the authorization code is not the mailbox password.
+      </p>
+    </div>
+  </div>
   </div>
 </div>

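The recipients field is a single comma-separated string. The sketch below shows one way the value could be normalized before it is stored or sent; `parseRecipients` is a hypothetical helper and is not part of this commit. It also accepts the full-width comma (U+FF0C), on the assumption that users may type it with a Chinese input method.

```javascript
// Hypothetical helper (not in the commit): split the comma-separated
// recipients field into a clean list of addresses.
function parseRecipients(raw) {
  if (typeof raw !== 'string') return [];
  return raw
    .split(/[,\uFF0C]/) // ASCII comma or full-width comma
    .map((addr) => addr.trim())
    .filter((addr) => addr.length > 0);
}

const recipients = parseRecipients('email1@example.com, email2@example.com,');
console.log(recipients);
```

nodemailer's `to` field accepts either a comma-separated string or an array, so the result can be passed through directly or re-joined with `recipients.join(', ')`.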
213
src/emailService.js
Normal file
@@ -0,0 +1,213 @@
+import nodemailer from 'nodemailer';
+
+// Send the report by email over SMTP
+export async function sendReportEmail(emailConfig, report) {
+  try {
+    // Create the SMTP transporter (the port may arrive as a string, so coerce it)
+    const transporter = nodemailer.createTransport({
+      host: emailConfig.smtpHost,
+      port: Number(emailConfig.smtpPort) || 587,
+      secure: Number(emailConfig.smtpPort) === 465, // true for 465, false for other ports
+      auth: {
+        user: emailConfig.smtpUser,
+        pass: emailConfig.smtpPass,
+      },
+    });
+
+    // Render the report as HTML
+    const htmlContent = generateReportHtml(report);
+
+    // Send the email
+    const info = await transporter.sendMail({
+      from: `"Announcement Collection System" <${emailConfig.smtpUser}>`,
+      to: emailConfig.recipients,
+      subject: `Procurement Announcement Analysis Report - ${new Date().toLocaleDateString('zh-CN')}`,
+      html: htmlContent,
+    });
+
+    return {
+      success: true,
+      messageId: info.messageId,
+    };
+  } catch (error) {
+    console.error('Failed to send email:', error);
+    throw new Error(`Failed to send email: ${error.message}`);
+  }
+}
+
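The `secure` flag only switches to implicit TLS when the port compares equal to the number `465`. If the port reaches the server as a string (for example, read straight from a form field), a strict comparison against `465` is false. The sketch below isolates that normalization in a pure function; `buildTransportOptions` is a hypothetical name used for illustration, not something defined in this commit.

```javascript
// Hypothetical helper: derive nodemailer transport options from the
// emailConfig object, tolerating a port supplied as a string.
function buildTransportOptions(emailConfig) {
  const port = Number(emailConfig.smtpPort) || 587; // fall back to 587
  return {
    host: emailConfig.smtpHost,
    port,
    secure: port === 465, // implicit TLS on 465; other ports upgrade via STARTTLS
    auth: {
      user: emailConfig.smtpUser,
      pass: emailConfig.smtpPass,
    },
  };
}

const options = buildTransportOptions({
  smtpHost: 'smtp.example.com',
  smtpPort: '465', // string, as a form field would produce
  smtpUser: 'user@example.com',
  smtpPass: 'secret',
});
console.log(options.port, options.secure);
```

The returned object has the shape `nodemailer.createTransport` expects, so it could be passed to it unchanged.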
+// Render the report as HTML
+function generateReportHtml(report) {
+  const { summary, projects } = report;
+
+  return `
+<!DOCTYPE html>
+<html lang="zh-CN">
+<head>
+  <meta charset="UTF-8">
+  <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  <title>Procurement Announcement Analysis Report</title>
+  <style>
+    body {
+      font-family: 'PingFang SC', 'Microsoft YaHei', Arial, sans-serif;
+      line-height: 1.6;
+      color: #333;
+      max-width: 900px;
+      margin: 0 auto;
+      padding: 20px;
+      background-color: #f5f5f5;
+    }
+    .container {
+      background: white;
+      border-radius: 8px;
+      padding: 30px;
+      box-shadow: 0 2px 10px rgba(0,0,0,0.1);
+    }
+    h1 {
+      color: #667eea;
+      border-bottom: 3px solid #667eea;
+      padding-bottom: 10px;
+      margin-bottom: 20px;
+    }
+    .summary {
+      background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
+      color: white;
+      padding: 20px;
+      border-radius: 8px;
+      margin-bottom: 30px;
+    }
+    .summary h2 {
+      margin-top: 0;
+      margin-bottom: 15px;
+      font-size: 18px;
+    }
+    .stat-grid {
+      display: grid;
+      grid-template-columns: repeat(auto-fit, minmax(200px, 1fr));
+      gap: 15px;
+    }
+    .stat {
+      background: rgba(255,255,255,0.15);
+      padding: 12px;
+      border-radius: 6px;
+    }
+    .stat-label {
+      font-size: 13px;
+      opacity: 0.9;
+      margin-bottom: 5px;
+    }
+    .stat-value {
+      font-size: 22px;
+      font-weight: bold;
+    }
+    .project-list {
+      margin-top: 20px;
+    }
+    .project-item {
+      background: #f9f9f9;
+      border-left: 4px solid #667eea;
+      padding: 15px;
+      margin-bottom: 15px;
+      border-radius: 4px;
+    }
+    .project-item h3 {
+      color: #333;
+      margin: 0 0 10px 0;
+      font-size: 16px;
+    }
+    .project-meta {
+      color: #666;
+      font-size: 14px;
+      margin: 5px 0;
+    }
+    .budget {
+      display: inline-block;
+      background: #667eea;
+      color: white;
+      padding: 4px 12px;
+      border-radius: 4px;
+      font-weight: bold;
+      margin-top: 8px;
+      font-size: 14px;
+    }
+    .project-link {
+      color: #667eea;
+      text-decoration: none;
+      font-size: 13px;
+      word-break: break-all;
+    }
+    .footer {
+      margin-top: 30px;
+      padding-top: 20px;
+      border-top: 1px solid #e0e0e0;
+      color: #999;
+      font-size: 12px;
+      text-align: center;
+    }
+  </style>
+</head>
+<body>
+  <div class="container">
+    <h1>南京公共工程建设中心 - Procurement Announcement Analysis Report</h1>
+
+    <div class="summary">
+      <h2>Report Summary</h2>
+      <div class="stat-grid">
+        <div class="stat">
+          <div class="stat-label">Total announcements</div>
+          <div class="stat-value">${summary.total_count}</div>
+        </div>
+        <div class="stat">
+          <div class="stat-label">Matching the criteria</div>
+          <div class="stat-value">${summary.filtered_count}</div>
+        </div>
+        <div class="stat">
+          <div class="stat-label">Amount threshold</div>
+          <div class="stat-value">${summary.threshold}</div>
+        </div>
+        <div class="stat">
+          <div class="stat-label">Total amount</div>
+          <div class="stat-value">${summary.total_amount}</div>
+        </div>
+      </div>
+      ${summary.date_range ? `
+      <div style="margin-top: 15px; padding-top: 15px; border-top: 1px solid rgba(255,255,255,0.2);">
+        <div class="stat-label">Date range</div>
+        <div style="font-size: 14px; margin-top: 5px;">
+          ${summary.date_range.startDate || 'any'} to ${summary.date_range.endDate || 'any'}
+        </div>
+      </div>
+      ` : ''}
+    </div>
+
+    <h2>Project Details</h2>
+    <div class="project-list">
+      ${projects.length === 0 ? '<p style="color: #999; text-align: center; padding: 20px;">No projects match the criteria</p>' : ''}
+      ${projects.map((project, index) => `
+        <div class="project-item">
+          <h3>${index + 1}. ${project.title}</h3>
+          <div class="project-meta">
+            <strong>Publish date:</strong> ${project.date}
+            ${project.publish_time ? ` | <strong>Publish time:</strong> ${project.publish_time}` : ''}
+          </div>
+          ${project.budget ? `
+            <div class="budget">
+              Budget: ${project.budget.amount.toFixed(2)} ${project.budget.unit}
+              ${project.budget.originalUnit !== project.budget.unit ? ` (original: ${project.budget.originalUnit})` : ''}
+            </div>
+          ` : ''}
+          <div style="margin-top: 10px;">
+            <a href="${project.url}" class="project-link" target="_blank">${project.url}</a>
+          </div>
+        </div>
+      `).join('')}
+    </div>
+
+    <div class="footer">
+      <p>Report generated at: ${new Date(summary.generated_at).toLocaleString('zh-CN')}</p>
+      <p>This report was generated automatically by the Announcement Collection System</p>
+    </div>
+  </div>
+</body>
+</html>
+  `;
+}
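`generateReportHtml` interpolates `project.title`, `project.url`, and the other fields into the markup as-is. Those values come from pages fetched over the network, so a title containing `<` or `"` could break the email layout or inject markup. A minimal sketch of an escaping helper follows; `escapeHtml` and `renderProjectTitle` are hypothetical names introduced here, not part of the commit.

```javascript
// Hypothetical helper: escape the five characters that are significant
// in HTML text and attribute values.
function escapeHtml(value) {
  return String(value ?? '')
    .replace(/&/g, '&amp;') // must run first so later entities are not re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Example of applying it to one fragment of the report template.
function renderProjectTitle(project, index) {
  return `<h3>${index + 1}. ${escapeHtml(project.title)}</h3>`;
}

console.log(renderProjectTitle({ title: 'Road <A&B> "Phase 2"' }, 0));
```

The same wrapper could be applied to each `${...}` in the template that carries collected text. URLs placed in `href` would additionally need a scheme check, which this sketch does not attempt.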
@@ -3,6 +3,7 @@ import cors from 'cors';
 import axios from 'axios';
 import * as cheerio from 'cheerio';
 import iconv from 'iconv-lite';
+import { sendReportEmail } from './emailService.js';

 const app = express();
 const PORT = 3000;
@@ -33,24 +34,24 @@ function isDateInRange(dateStr, startDate, endDate) {
   return true;
 }

-// Scrape the multi-page list by date range
+// Collect the multi-page list by date range
 async function fetchListByDateRange(startDate, endDate, maxPages = 23) {
   const allItems = [];
   let shouldContinue = true;
   let pageIndex = 0;

-  console.log(`Start scraping by date range: ${startDate || 'any'} to ${endDate || 'any'}`);
+  console.log(`Start collecting by date range: ${startDate || 'any'} to ${endDate || 'any'}`);

   while (shouldContinue && pageIndex < maxPages) {
     const pageUrl = getPageUrl(pageIndex);
-    console.log(`Scraping page ${pageIndex + 1}: ${pageUrl}`);
+    console.log(`Collecting page ${pageIndex + 1}: ${pageUrl}`);

     try {
       const html = await fetchHtml(pageUrl);
       const items = parseList(html);

       if (items.length === 0) {
-        console.log(`Page ${pageIndex + 1} has no data; stopping the scrape`);
+        console.log(`Page ${pageIndex + 1} has no data; stopping the collection`);
         break;
       }

@@ -70,7 +71,7 @@ async function fetchListByDateRange(startDate, endDate, maxPages = 23) {
       }

       if (allItemsBeforeRange && startDate) {
-        console.log(`All items on page ${pageIndex + 1} are earlier than the start date; stopping the scrape`);
+        console.log(`All items on page ${pageIndex + 1} are earlier than the start date; stopping the collection`);
         shouldContinue = false;
       }

@@ -82,12 +83,12 @@ async function fetchListByDateRange(startDate, endDate, maxPages = 23) {
         await new Promise(resolve => setTimeout(resolve, 500));
       }
     } catch (err) {
-      console.error(`Failed to scrape page ${pageIndex + 1}: ${err.message}`);
+      console.error(`Failed to collect page ${pageIndex + 1}: ${err.message}`);
       break;
     }
   }

-  console.log(`Scraped ${pageIndex} pages in total and found ${allItems.length} matching announcements`);
+  console.log(`Collected ${pageIndex} pages in total and found ${allItems.length} matching announcements`);
   return allItems;
 }

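`fetchListByDateRange` relies on `isDateInRange(dateStr, startDate, endDate)` and on an "all items are before the start date" check, but neither body appears in this diff. The sketch below is one plausible shape for both. It assumes dates are `YYYY-MM-DD` strings, so plain string comparison orders them correctly, and that the list is sorted newest first. The function names carry a `Sketch` suffix or are otherwise hypothetical to make clear they are not the project's code.

```javascript
// Assumption: dates are ISO-style 'YYYY-MM-DD' strings, which sort
// lexicographically in chronological order.
function isDateInRangeSketch(dateStr, startDate, endDate) {
  if (!dateStr) return false;                        // no date: treat as out of range
  if (startDate && dateStr < startDate) return false;
  if (endDate && dateStr > endDate) return false;
  return true;                                       // an empty bound means "unbounded"
}

// Early-stop condition: with a newest-first list, once a whole page is older
// than startDate, later pages cannot contain matches.
function pageIsEntirelyBeforeStart(dates, startDate) {
  return Boolean(startDate) && dates.length > 0 && dates.every((d) => d < startDate);
}

console.log(isDateInRangeSketch('2025-12-12', '2025-12-01', '2025-12-31'));
console.log(pageIsEntirelyBeforeStart(['2025-11-30', '2025-11-29'], '2025-12-01'));
```

If the site lists dates in another format, they would need to be normalized before comparison.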
@@ -207,6 +208,10 @@ function parseDetail(html) {
 }

 function extractBudget(content) {
+  // Preprocess the content: remove line breaks and whitespace between digits
+  // so that numbers split across lines can be matched, e.g. "1\n1\n0\n9\n0\n0" -> "110900"
+  let cleanedContent = content.replace(/(\d)\s*[\n\r]\s*(?=\d)/g, '$1');
+
   // Define the amount-matching patterns directly (highest priority first)
   const patterns = [
     // Priority 1: amounts in 万元 (units of 10,000 yuan) with a currency symbol
@@ -230,7 +235,7 @@ function extractBudget(content) {

   // Walk through all patterns and keep the match with the highest priority
   for (const pattern of patterns) {
-    const match = content.match(pattern.regex);
+    const match = cleanedContent.match(pattern.regex);
     if (match && pattern.priority < bestPriority) {
       // Strip ASCII and full-width commas from the number before converting it
       const numberStr = match[1].replace(/[,\uFF0C]/g, '');
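The preprocessing step added to `extractBudget` can be tried in isolation. The sketch below wraps the same regular expression in a standalone function (`collapseSplitDigits` is a hypothetical name) and shows both the intended effect and a side effect to be aware of: any two digit runs separated only by a line break are merged, even if they were unrelated numbers.

```javascript
// Same regular expression as in extractBudget: a digit, optional whitespace,
// one line break, optional whitespace, and then (lookahead) another digit.
function collapseSplitDigits(content) {
  return content.replace(/(\d)\s*[\n\r]\s*(?=\d)/g, '$1');
}

// Intended case: a number whose digits were split one per line.
console.log(collapseSplitDigits('Budget: 1\n1\n0\n9\n0\n0 yuan'));

// Side effect: two separate numbers on adjacent lines are joined as well.
console.log(collapseSplitDigits('50\n2025'));
```

Because the lookahead does not consume the following digit, consecutive splits are all collapsed in a single pass.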
@@ -329,21 +334,21 @@ app.post('/api/report', async (req, res) => {
     const { limit = 15, threshold = 50, url } = req.body;
     const targetUrl = url && url.trim() !== '' ? url : BASE_URL;

-    // Scrape as many pages as needed to get enough data
+    // Collect as many pages as needed to get enough data
     const items = [];
     let pageIndex = 0;
     const maxPagesToFetch = Math.ceil(limit / 10) + 1; // assume about 10 items per page; fetch one extra page to be safe

     while (items.length < limit && pageIndex < maxPagesToFetch) {
       const pageUrl = getPageUrl(pageIndex, targetUrl);
-      console.log(`Scraping page ${pageIndex + 1}: ${pageUrl}`);
+      console.log(`Collecting page ${pageIndex + 1}: ${pageUrl}`);

       try {
         const html = await fetchHtml(pageUrl);
         const pageItems = parseList(html);

         if (pageItems.length === 0) {
-          console.log(`Page ${pageIndex + 1} has no data; stopping the scrape`);
+          console.log(`Page ${pageIndex + 1} has no data; stopping the collection`);
           break;
         }

@@ -354,7 +359,7 @@ app.post('/api/report', async (req, res) => {
           await new Promise(resolve => setTimeout(resolve, 500));
         }
       } catch (err) {
-        console.error(`Failed to scrape page ${pageIndex + 1}: ${err.message}`);
+        console.error(`Failed to collect page ${pageIndex + 1}: ${err.message}`);
         break;
       }
     }
@@ -417,7 +422,7 @@ app.post('/api/report-daterange', async (req, res) => {
   try {
     const { startDate, endDate, threshold = 50, maxPages = 23 } = req.body;

-    // Scrape the list by date range
+    // Collect the list by date range
     const items = await fetchListByDateRange(startDate, endDate, maxPages);

     if (items.length === 0) {
@@ -437,7 +442,7 @@ app.post('/api/report-daterange', async (req, res) => {
       });
     }

-    // Scrape the details
+    // Collect the details
     const results = [];
     for (const item of items) {
       try {
@@ -491,6 +496,50 @@ app.post('/api/report-daterange', async (req, res) => {
   }
 });

+// Send the report by email
+app.post('/api/send-email', async (req, res) => {
+  try {
+    const { emailConfig, report } = req.body;
+
+    // Validate the required configuration fields
+    if (!emailConfig || !emailConfig.smtpHost || !emailConfig.smtpUser || !emailConfig.smtpPass) {
+      return res.status(400).json({
+        success: false,
+        error: 'Incomplete email configuration: SMTP server, username, and password are required',
+      });
+    }
+
+    if (!emailConfig.recipients || emailConfig.recipients.trim() === '') {
+      return res.status(400).json({
+        success: false,
+        error: 'Please specify at least one recipient',
+      });
+    }
+
+    if (!report) {
+      return res.status(400).json({
+        success: false,
+        error: 'No report data to send',
+      });
+    }
+
+    // Send the email
+    const result = await sendReportEmail(emailConfig, report);
+
+    res.json({
+      success: true,
+      message: 'Email sent successfully',
+      messageId: result.messageId,
+    });
+  } catch (error) {
+    console.error('Send-email API error:', error);
+    res.status(500).json({
+      success: false,
+      error: error.message,
+    });
+  }
+});
+
 app.listen(PORT, () => {
   console.log(`Server running at http://localhost:${PORT}`);
 });
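The three checks in the `/api/send-email` handler can also be expressed as a pure function, which makes them easy to unit-test without starting Express. This is a sketch only: `validateSendEmailRequest` is a hypothetical name, the error strings are illustrative, and accepting `recipients` as an array is an assumption about how clients might call the endpoint.

```javascript
// Hypothetical helper: return an error message for an invalid request body,
// or null when the body has everything the handler needs.
function validateSendEmailRequest(body) {
  const { emailConfig, report } = body || {};

  if (!emailConfig || !emailConfig.smtpHost || !emailConfig.smtpUser || !emailConfig.smtpPass) {
    return 'Incomplete email configuration';
  }

  // Accept either a non-empty string or a non-empty array of addresses.
  const recipients = emailConfig.recipients;
  const hasRecipients = Array.isArray(recipients)
    ? recipients.length > 0
    : typeof recipients === 'string' && recipients.trim() !== '';
  if (!hasRecipients) {
    return 'At least one recipient is required';
  }

  if (!report) {
    return 'No report data to send';
  }
  return null;
}

const error = validateSendEmailRequest({
  emailConfig: { smtpHost: 'smtp.example.com', smtpUser: 'u@example.com', smtpPass: 'secret', recipients: ' ' },
  report: { summary: {}, projects: [] },
});
console.log(error);
```

In a handler, a non-null result would map to a `400` response and `null` would fall through to `sendReportEmail`.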