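feat(readme): Reformat parts of the text, including space-separated amounts and numbers, clearer API parameter descriptions, and aligned heading levels, to improve readability.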
```
2025-12-15 10:36:18 +08:00
parent 745faa0ecc
commit b044e918aa
9 changed files with 949 additions and 80 deletions

33
.gitignore vendored Normal file

@@ -0,0 +1,33 @@
# Dependency directories
node_modules/
# Log files
npm-debug.log*
yarn-debug.log*
yarn-error.log*
pnpm-debug.log*
# Environment variable files
.env
.env.local
.env.*.local
# Editor directories and files
.vscode/
.idea/
*.swp
*.swo
*~
# Operating system files
.DS_Store
Thumbs.db
# Build output
dist/
build/
*.log
# Temporary files
*.tmp
.cache/

108
README.md

@@ -1,17 +1,17 @@
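# Nanjing Public Works Construction Center - Announcement Scraping Tool
# Nanjing Public Works Construction Center - Announcement Collection Tool
A web-based visual tool for scraping announcements from the Nanjing Public Works Construction Center.
A web-based visual tool for collecting announcements from the Nanjing Public Works Construction Center.
## Features
- ✅ Scrape the announcement list (with pagination)
- ✅ Smart scraping by date range
- ✅ Scrape announcement details
- ✅ Collect the announcement list (with pagination)
- ✅ Smart collection by date range
- ✅ Collect announcement details
- ✅ Smart budget amount extraction
- ✅ Statistical report generation
- ✅ Web visual interface
- ✅ Word/Markdown report export
- ✅ RESTful API support
- ✅ Web visual interface
- ✅ Word/Markdown report export
- ✅ RESTful API support
## Installation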
@@ -34,21 +34,24 @@ npm start
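### 3. Feature Overview
**Announcement List tab**
- Quickly view all announcements
- Paginated browsing
- Fetch the latest announcement list in one click
**Detail Scraping tab**
- Batch-scrape announcement details
- Scrape by date range
**Detail Collection tab**
- Batch-collect announcement details
- Collect by date range
- Automatic budget amount extraction
- Configurable number of items to scrape
- Configurable number of items to collect
**Generate Report tab**
- Generate reports by date range
- Filter projects by an amount threshold
- Real-time project statistics
- One-click Word/Markdown report export
- One-click Word/Markdown report export
## Sample Report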
@@ -60,8 +63,8 @@ npm start
## Summary Statistics
- Total projects: 10
- Projects over 50万元: 3
- Total amount: 5395.50万元
- Projects over 50 万元: 3
- Total amount: 5395.50 万元
## Project List
@@ -69,7 +72,7 @@ npm start
- **Publication date**: 2025-12-12
- **Publication time**: 2025-12-12 10:35:00
- **Budget amount**: 5000万元
- **Budget amount**: 5000 万元
- **Link**: https://...
```
@@ -78,14 +81,18 @@ npm start
Once the server is running, it exposes the following RESTful API endpoints:
### 1. Get the announcement list
```
GET /api/list?url=<list page URL>&page=<page number>
```
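Parameters:
- `url` (optional): list page URL; defaults to the official site's home page
- `page` (optional): page number; defaults to 1
- `url` (optional): list page URL; defaults to the official site's home page
- `page` (optional): page number; defaults to 1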
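A minimal sketch of how a client might build the request URL for this endpoint. Only the `url` and `page` query parameters come from the description above; `buildListUrl` and `API_ORIGIN` are hypothetical names used for illustration, and the local address assumes the server's default port of 3000.

```javascript
// Hypothetical helper: builds the GET /api/list request URL.
// Assumption: the server runs locally on its default port, 3000.
const API_ORIGIN = 'http://localhost:3000';

function buildListUrl({ url, page } = {}) {
  const params = new URLSearchParams();
  // Both parameters are optional, so only send the ones that were provided.
  if (url) params.set('url', url);
  if (page !== undefined) params.set('page', String(page));
  const query = params.toString();
  return `${API_ORIGIN}/api/list${query ? `?${query}` : ''}`;
}

// Usage sketch (requires the server to be running):
// const response = await fetch(buildListUrl({ page: 2 }));
// const data = await response.json();
```

Leaving out both parameters yields a bare `/api/list` request, which relies on the server-side defaults listed above.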
### 2. Get the list by date range
```
POST /api/list-daterange
Content-Type: application/json
@@ -98,6 +105,7 @@ Content-Type: application/json
```
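As a rough illustration of how date-range collection can decide when to stop paging, the sketch below walks in-memory pages and halts once every item on a page predates the start date. `collectByDateRange` and its input shape are hypothetical; the real server fetches and parses each page over HTTP. The sketch assumes dates are `YYYY-MM-DD` strings (which compare correctly as plain strings) and that pages are ordered newest first. The default of 23 pages mirrors `fetchListByDateRange` in `src/server.js`.

```javascript
// Hypothetical sketch of the stop condition for date-range collection.
// pages: array of pages, each an array of { title, date } items (assumed newest first).
function collectByDateRange(pages, startDate, endDate, maxPages = 23) {
  const matched = [];
  let pagesVisited = 0;
  for (const items of pages) {
    // Respect the page cap and stop on an empty page.
    if (pagesVisited >= maxPages || items.length === 0) break;
    pagesVisited += 1;
    for (const item of items) {
      const afterStart = !startDate || item.date >= startDate;
      const beforeEnd = !endDate || item.date <= endDate;
      if (afterStart && beforeEnd) matched.push(item);
    }
    // Stop once every item on this page is older than the start date.
    if (startDate && items.every(item => item.date < startDate)) break;
  }
  return { matched, pagesVisited };
}
```

Checking the whole page rather than a single item keeps one out-of-order announcement from ending the run early.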
### 3. Batch-fetch details
```
POST /api/details
Content-Type: application/json
@@ -109,6 +117,7 @@ Content-Type: application/json
```
### 4. Generate a report
```
POST /api/report
Content-Type: application/json
@@ -121,6 +130,7 @@ Content-Type: application/json
```
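A minimal sketch of a request to the report endpoint. The body fields `limit`, `threshold`, and `url`, with defaults of 15 and 50, follow the `/api/report` handler in `src/server.js` as shown in this commit; `buildReportRequest` is a hypothetical helper, and the local server address is an assumption.

```javascript
// Hypothetical helper: assembles fetch() arguments for POST /api/report.
function buildReportRequest({ limit = 15, threshold = 50, url } = {}) {
  const body = { limit, threshold };
  // When url is omitted, the server falls back to its own BASE_URL.
  if (url) body.url = url;
  return {
    endpoint: 'http://localhost:3000/api/report', // assumption: default port 3000
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
    },
  };
}

// Usage sketch (requires the server to be running):
// const { endpoint, options } = buildReportRequest({ threshold: 100 });
// const data = await (await fetch(endpoint, options)).json();
```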
### 5. Generate a report by date range
```
POST /api/report-daterange
Content-Type: application/json
@@ -137,8 +147,8 @@ Content-Type: application/json
- **Backend**: Node.js + Express
- **Crawler**: Axios + Cheerio
- **Frontend**: vanilla HTML/CSS/JavaScript
- **Encoding handling**: iconv-lite (supports GBK/UTF-8)
- **Frontend**: vanilla HTML/CSS/JavaScript
- **Encoding handling**: iconv-lite (supports GBK/UTF-8)
- **Document export**: docx.js
## Project Structure
@@ -156,61 +166,69 @@ Content-Type: application/json
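## Notes
1. Scraping is throttled to a 500ms-1s delay per item so that requests are not sent too quickly
1. Collection is throttled to a 500ms-1s delay per item so that requests are not sent too quickly
2. Detail-page parsing is supported only for the gjzx.nanjing.gov.cn domain
3. Amount extraction is based on regular-expression matching and supports several formats (预算金额 "budget amount", 最高限价 "maximum price limit", and so on)
4. The web server uses port 3000 by default; this can be changed in server.js
5. Date-range scraping stops automatically once all announcements on a page are older than the start date
6. Encoding is detected automatically; GBK and UTF-8 pages are supported
4. The web server uses port 3000 by default; this can be changed in server.js
5. Date-range collection stops automatically once all announcements on a page are older than the start date
6. Encoding is detected automatically; GBK and UTF-8 pages are supported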
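## Core Features
### Date-range scraping logic
### Date-range collection logic
When scraping by date range, the program:
1. Scrapes pages sequentially, starting from the first page
When collecting by date range, the program:
1. Collects pages sequentially, starting from the first page
2. Checks whether the date of each announcement on the page falls within the specified range
3. Stops scraping automatically if every announcement on a page is older than the start date
4. Honors a configurable maximum page count to avoid over-scraping
3. Stops collecting automatically if every announcement on a page is older than the start date
4. Honors a configurable maximum page count to avoid over-collecting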
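### Amount extraction rules
The following formats are recognized (万元 is a unit of 10,000 CNY):
- 预算金额 (budget amount): XX万元
- 最高限价 (maximum price limit): XX万元
- 预算 (budget): XX万元
- 金额 (amount): XX万元
- Bare number: XX万元
- 预算金额 (budget amount): XX 万元
- 最高限价 (maximum price limit): XX 万元
- 预算 (budget): XX 万元
- 金额 (amount): XX 万元
- Bare number: XX 万元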
### Encoding handling
The page encoding is detected automatically:
- The charset in Content-Type takes precedence
- GBK and GB2312 are handled automatically
- UTF-8 is the default
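## FAQ
### Q: Why is scraping slow?
A: To avoid putting too much load on the server, the program limits the request rate (a 500ms-1s delay per item). This is responsible crawler design.
### Q: Why is collection slow?
### Q: How do I scrape announcements in a specific date range?
A: In the "Detail Scraping" and "Generate Report" tabs of the web interface, check "Scrape by date range", then enter the start and end dates.
A: To avoid putting too much load on the server, the program limits the request rate (a 500ms-1s delay per item). This is responsible crawler design.
### Q: How do I collect announcements in a specific date range?
A: In the "Detail Collection" and "Generate Report" tabs of the web interface, check "Collect by date range", then enter the start and end dates.
### Q: Where do exported reports go?
A: After you click "Export Word" or "Export Markdown", the file is downloaded automatically to the browser's default download directory.
### Q: Can it scrape other websites?
A: You would need to modify BASE_URL in server.js and the corresponding parsing functions, because different sites have different HTML structures.
A: After you click "Export Word" or "Export Markdown", the file is downloaded automatically to the browser's default download directory.
### Q: Can it collect from other websites?
A: You would need to modify BASE_URL in server.js and the corresponding parsing functions, because different sites have different HTML structures.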
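## Changelog
### v1.0.0 (2025-12-12)
- Web visual interface
- Date-range scraping
- Web visual interface
- Date-range collection
- Paginated browsing
- Word/Markdown report export
- RESTful API endpoints
- Word/Markdown report export
- RESTful API endpoints
- Automatic encoding detection
- Smart amount extraction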

81
node_modules/.package-lock.json generated vendored

@@ -4,6 +4,46 @@
"lockfileVersion": 3,
"requires": true,
"packages": {
"node_modules/@napi-rs/canvas": {
"version": "0.1.80",
"resolved": "https://registry.npmmirror.com/@napi-rs/canvas/-/canvas-0.1.80.tgz",
"integrity": "sha512-DxuT1ClnIPts1kQx8FBmkk4BQDTfI5kIzywAaMjQSXfNnra5UFU9PwurXrl+Je3bJ6BGsp/zmshVVFbCmyI+ww==",
"license": "MIT",
"workspaces": [
"e2e/*"
],
"engines": {
"node": ">= 10"
},
"optionalDependencies": {
"@napi-rs/canvas-android-arm64": "0.1.80",
"@napi-rs/canvas-darwin-arm64": "0.1.80",
"@napi-rs/canvas-darwin-x64": "0.1.80",
"@napi-rs/canvas-linux-arm-gnueabihf": "0.1.80",
"@napi-rs/canvas-linux-arm64-gnu": "0.1.80",
"@napi-rs/canvas-linux-arm64-musl": "0.1.80",
"@napi-rs/canvas-linux-riscv64-gnu": "0.1.80",
"@napi-rs/canvas-linux-x64-gnu": "0.1.80",
"@napi-rs/canvas-linux-x64-musl": "0.1.80",
"@napi-rs/canvas-win32-x64-msvc": "0.1.80"
}
},
"node_modules/@napi-rs/canvas-win32-x64-msvc": {
"version": "0.1.80",
"resolved": "https://registry.npmmirror.com/@napi-rs/canvas-win32-x64-msvc/-/canvas-win32-x64-msvc-0.1.80.tgz",
"integrity": "sha512-Z8jPsM6df5V8B1HrCHB05+bDiCxjE9QA//3YrkKIdVDEwn5RKaqOxCJDRJkl48cJbylcrJbW4HxZbTte8juuPg==",
"cpu": [
"x64"
],
"license": "MIT",
"optional": true,
"os": [
"win32"
],
"engines": {
"node": ">= 10"
}
},
"node_modules/@types/node": {
"version": "24.10.3",
"resolved": "https://registry.npmmirror.com/@types/node/-/node-24.10.3.tgz",
@@ -971,6 +1011,15 @@
"node": ">= 0.6"
}
},
"node_modules/nodemailer": {
"version": "7.0.11",
"resolved": "https://registry.npmmirror.com/nodemailer/-/nodemailer-7.0.11.tgz",
"integrity": "sha512-gnXhNRE0FNhD7wPSCGhdNh46Hs6nm+uTyg+Kq0cZukNQiYdnCsoQjodNP9BQVG9XrcK/v6/MgpAPBUFyzh9pvw==",
"license": "MIT-0",
"engines": {
"node": ">=6.0.0"
}
},
"node_modules/nth-check": {
"version": "2.1.1",
"resolved": "https://registry.npmmirror.com/nth-check/-/nth-check-2.1.1.tgz",
@@ -1099,6 +1148,38 @@
"url": "https://opencollective.com/express"
}
},
"node_modules/pdf-parse": {
"version": "2.4.5",
"resolved": "https://registry.npmmirror.com/pdf-parse/-/pdf-parse-2.4.5.tgz",
"integrity": "sha512-mHU89HGh7v+4u2ubfnevJ03lmPgQ5WU4CxAVmTSh/sxVTEDYd1er/dKS/A6vg77NX47KTEoihq8jZBLr8Cxuwg==",
"license": "Apache-2.0",
"dependencies": {
"@napi-rs/canvas": "0.1.80",
"pdfjs-dist": "5.4.296"
},
"bin": {
"pdf-parse": "bin/cli.mjs"
},
"engines": {
"node": ">=20.16.0 <21 || >=22.3.0"
},
"funding": {
"type": "github",
"url": "https://github.com/sponsors/mehmet-kozan"
}
},
"node_modules/pdfjs-dist": {
"version": "5.4.296",
"resolved": "https://registry.npmmirror.com/pdfjs-dist/-/pdfjs-dist-5.4.296.tgz",
"integrity": "sha512-DlOzet0HO7OEnmUmB6wWGJrrdvbyJKftI1bhMitK7O2N8W2gc757yyYBbINy9IDafXAV9wmKr9t7xsTaNKRG5Q==",
"license": "Apache-2.0",
"engines": {
"node": ">=20.16.0 || >=22.3.0"
},
"optionalDependencies": {
"@napi-rs/canvas": "^0.1.80"
}
},
"node_modules/process-nextick-args": {
"version": "2.0.1",
"resolved": "https://registry.npmmirror.com/process-nextick-args/-/process-nextick-args-2.0.1.tgz",

229
package-lock.json generated

@@ -13,7 +13,193 @@
"cors": "^2.8.5",
"docx": "^9.5.1",
"express": "^5.2.1",
"iconv-lite": "^0.6.3"
"iconv-lite": "^0.6.3",
"nodemailer": "^7.0.11",
"pdf-parse": "^2.4.5"
}
},
"node_modules/@napi-rs/canvas": {
"version": "0.1.80",
"resolved": "https://registry.npmmirror.com/@napi-rs/canvas/-/canvas-0.1.80.tgz",
"integrity": "sha512-DxuT1ClnIPts1kQx8FBmkk4BQDTfI5kIzywAaMjQSXfNnra5UFU9PwurXrl+Je3bJ6BGsp/zmshVVFbCmyI+ww==",
"license": "MIT",
"workspaces": [
"e2e/*"
],
"engines": {
"node": ">= 10"
},
"optionalDependencies": {
"@napi-rs/canvas-android-arm64": "0.1.80",
"@napi-rs/canvas-darwin-arm64": "0.1.80",
"@napi-rs/canvas-darwin-x64": "0.1.80",
"@napi-rs/canvas-linux-arm-gnueabihf": "0.1.80",
"@napi-rs/canvas-linux-arm64-gnu": "0.1.80",
"@napi-rs/canvas-linux-arm64-musl": "0.1.80",
"@napi-rs/canvas-linux-riscv64-gnu": "0.1.80",
"@napi-rs/canvas-linux-x64-gnu": "0.1.80",
"@napi-rs/canvas-linux-x64-musl": "0.1.80",
"@napi-rs/canvas-win32-x64-msvc": "0.1.80"
}
},
"node_modules/@napi-rs/canvas-android-arm64": {
"version": "0.1.80",
"resolved": "https://registry.npmmirror.com/@napi-rs/canvas-android-arm64/-/canvas-android-arm64-0.1.80.tgz",
"integrity": "sha512-sk7xhN/MoXeuExlggf91pNziBxLPVUqF2CAVnB57KLG/pz7+U5TKG8eXdc3pm0d7Od0WreB6ZKLj37sX9muGOQ==",
"cpu": [
"arm64"
],
"license": "MIT",
"optional": true,
"os": [
"android"
],
"engines": {
"node": ">= 10"
}
},
"node_modules/@napi-rs/canvas-darwin-arm64": {
"version": "0.1.80",
"resolved": "https://registry.npmmirror.com/@napi-rs/canvas-darwin-arm64/-/canvas-darwin-arm64-0.1.80.tgz",
"integrity": "sha512-O64APRTXRUiAz0P8gErkfEr3lipLJgM6pjATwavZ22ebhjYl/SUbpgM0xcWPQBNMP1n29afAC/Us5PX1vg+JNQ==",
"cpu": [
"arm64"
],
"license": "MIT",
"optional": true,
"os": [
"darwin"
],
"engines": {
"node": ">= 10"
}
},
"node_modules/@napi-rs/canvas-darwin-x64": {
"version": "0.1.80",
"resolved": "https://registry.npmmirror.com/@napi-rs/canvas-darwin-x64/-/canvas-darwin-x64-0.1.80.tgz",
"integrity": "sha512-FqqSU7qFce0Cp3pwnTjVkKjjOtxMqRe6lmINxpIZYaZNnVI0H5FtsaraZJ36SiTHNjZlUB69/HhxNDT1Aaa9vA==",
"cpu": [
"x64"
],
"license": "MIT",
"optional": true,
"os": [
"darwin"
],
"engines": {
"node": ">= 10"
}
},
"node_modules/@napi-rs/canvas-linux-arm-gnueabihf": {
"version": "0.1.80",
"resolved": "https://registry.npmmirror.com/@napi-rs/canvas-linux-arm-gnueabihf/-/canvas-linux-arm-gnueabihf-0.1.80.tgz",
"integrity": "sha512-eyWz0ddBDQc7/JbAtY4OtZ5SpK8tR4JsCYEZjCE3dI8pqoWUC8oMwYSBGCYfsx2w47cQgQCgMVRVTFiiO38hHQ==",
"cpu": [
"arm"
],
"license": "MIT",
"optional": true,
"os": [
"linux"
],
"engines": {
"node": ">= 10"
}
},
"node_modules/@napi-rs/canvas-linux-arm64-gnu": {
"version": "0.1.80",
"resolved": "https://registry.npmmirror.com/@napi-rs/canvas-linux-arm64-gnu/-/canvas-linux-arm64-gnu-0.1.80.tgz",
"integrity": "sha512-qwA63t8A86bnxhuA/GwOkK3jvb+XTQaTiVML0vAWoHyoZYTjNs7BzoOONDgTnNtr8/yHrq64XXzUoLqDzU+Uuw==",
"cpu": [
"arm64"
],
"license": "MIT",
"optional": true,
"os": [
"linux"
],
"engines": {
"node": ">= 10"
}
},
"node_modules/@napi-rs/canvas-linux-arm64-musl": {
"version": "0.1.80",
"resolved": "https://registry.npmmirror.com/@napi-rs/canvas-linux-arm64-musl/-/canvas-linux-arm64-musl-0.1.80.tgz",
"integrity": "sha512-1XbCOz/ymhj24lFaIXtWnwv/6eFHXDrjP0jYkc6iHQ9q8oXKzUX1Lc6bu+wuGiLhGh2GS/2JlfORC5ZcXimRcg==",
"cpu": [
"arm64"
],
"license": "MIT",
"optional": true,
"os": [
"linux"
],
"engines": {
"node": ">= 10"
}
},
"node_modules/@napi-rs/canvas-linux-riscv64-gnu": {
"version": "0.1.80",
"resolved": "https://registry.npmmirror.com/@napi-rs/canvas-linux-riscv64-gnu/-/canvas-linux-riscv64-gnu-0.1.80.tgz",
"integrity": "sha512-XTzR125w5ZMs0lJcxRlS1K3P5RaZ9RmUsPtd1uGt+EfDyYMu4c6SEROYsxyatbbu/2+lPe7MPHOO/0a0x7L/gw==",
"cpu": [
"riscv64"
],
"license": "MIT",
"optional": true,
"os": [
"linux"
],
"engines": {
"node": ">= 10"
}
},
"node_modules/@napi-rs/canvas-linux-x64-gnu": {
"version": "0.1.80",
"resolved": "https://registry.npmmirror.com/@napi-rs/canvas-linux-x64-gnu/-/canvas-linux-x64-gnu-0.1.80.tgz",
"integrity": "sha512-BeXAmhKg1kX3UCrJsYbdQd3hIMDH/K6HnP/pG2LuITaXhXBiNdh//TVVVVCBbJzVQaV5gK/4ZOCMrQW9mvuTqA==",
"cpu": [
"x64"
],
"license": "MIT",
"optional": true,
"os": [
"linux"
],
"engines": {
"node": ">= 10"
}
},
"node_modules/@napi-rs/canvas-linux-x64-musl": {
"version": "0.1.80",
"resolved": "https://registry.npmmirror.com/@napi-rs/canvas-linux-x64-musl/-/canvas-linux-x64-musl-0.1.80.tgz",
"integrity": "sha512-x0XvZWdHbkgdgucJsRxprX/4o4sEed7qo9rCQA9ugiS9qE2QvP0RIiEugtZhfLH3cyI+jIRFJHV4Fuz+1BHHMg==",
"cpu": [
"x64"
],
"license": "MIT",
"optional": true,
"os": [
"linux"
],
"engines": {
"node": ">= 10"
}
},
"node_modules/@napi-rs/canvas-win32-x64-msvc": {
"version": "0.1.80",
"resolved": "https://registry.npmmirror.com/@napi-rs/canvas-win32-x64-msvc/-/canvas-win32-x64-msvc-0.1.80.tgz",
"integrity": "sha512-Z8jPsM6df5V8B1HrCHB05+bDiCxjE9QA//3YrkKIdVDEwn5RKaqOxCJDRJkl48cJbylcrJbW4HxZbTte8juuPg==",
"cpu": [
"x64"
],
"license": "MIT",
"optional": true,
"os": [
"win32"
],
"engines": {
"node": ">= 10"
}
},
"node_modules/@types/node": {
@@ -983,6 +1169,15 @@
"node": ">= 0.6"
}
},
"node_modules/nodemailer": {
"version": "7.0.11",
"resolved": "https://registry.npmmirror.com/nodemailer/-/nodemailer-7.0.11.tgz",
"integrity": "sha512-gnXhNRE0FNhD7wPSCGhdNh46Hs6nm+uTyg+Kq0cZukNQiYdnCsoQjodNP9BQVG9XrcK/v6/MgpAPBUFyzh9pvw==",
"license": "MIT-0",
"engines": {
"node": ">=6.0.0"
}
},
"node_modules/nth-check": {
"version": "2.1.1",
"resolved": "https://registry.npmmirror.com/nth-check/-/nth-check-2.1.1.tgz",
@@ -1111,6 +1306,38 @@
"url": "https://opencollective.com/express"
}
},
"node_modules/pdf-parse": {
"version": "2.4.5",
"resolved": "https://registry.npmmirror.com/pdf-parse/-/pdf-parse-2.4.5.tgz",
"integrity": "sha512-mHU89HGh7v+4u2ubfnevJ03lmPgQ5WU4CxAVmTSh/sxVTEDYd1er/dKS/A6vg77NX47KTEoihq8jZBLr8Cxuwg==",
"license": "Apache-2.0",
"dependencies": {
"@napi-rs/canvas": "0.1.80",
"pdfjs-dist": "5.4.296"
},
"bin": {
"pdf-parse": "bin/cli.mjs"
},
"engines": {
"node": ">=20.16.0 <21 || >=22.3.0"
},
"funding": {
"type": "github",
"url": "https://github.com/sponsors/mehmet-kozan"
}
},
"node_modules/pdfjs-dist": {
"version": "5.4.296",
"resolved": "https://registry.npmmirror.com/pdfjs-dist/-/pdfjs-dist-5.4.296.tgz",
"integrity": "sha512-DlOzet0HO7OEnmUmB6wWGJrrdvbyJKftI1bhMitK7O2N8W2gc757yyYBbINy9IDafXAV9wmKr9t7xsTaNKRG5Q==",
"license": "Apache-2.0",
"engines": {
"node": ">=20.16.0 || >=22.3.0"
},
"optionalDependencies": {
"@napi-rs/canvas": "^0.1.80"
}
},
"node_modules/process-nextick-args": {
"version": "2.0.1",
"resolved": "https://registry.npmmirror.com/process-nextick-args/-/process-nextick-args-2.0.1.tgz",

View File

@@ -2,7 +2,7 @@
"name": "gjzx-scraper",
"version": "1.0.0",
"type": "module",
"description": "Tool: scrapes announcement list entries and details from https://gjzx.nanjing.gov.cn/gggs/",
"description": "Tool: collects announcement list entries and details from https://gjzx.nanjing.gov.cn/gggs/",
"main": "src/server.js",
"scripts": {
"start": "node src/server.js"
@@ -13,6 +13,7 @@
"cors": "^2.8.5",
"docx": "^9.5.1",
"express": "^5.2.1",
"iconv-lite": "^0.6.3"
"iconv-lite": "^0.6.3",
"nodemailer": "^7.0.11"
}
}

View File

@@ -134,11 +134,11 @@ async function fetchDetails() {
listData = await dateRangeResponse.json();
} else {
// Normal mode - scrape multiple pages by item count
// Normal mode - collect multiple pages by item count
const url = document.getElementById('detailUrl').value;
const limit = parseInt(document.getElementById('detailLimit').value);
// Scrape pages until enough items have been gathered
// Collect pages until enough items have been gathered
const allItems = [];
let page = 1;
const maxPagesToFetch = Math.ceil(limit / 10) + 1; // assume roughly 10 items per page
@@ -177,7 +177,7 @@ async function fetchDetails() {
return;
}
// Scrape details
// Collect details
const limit = useDetailDateRange ? listData.data.length : parseInt(document.getElementById('detailLimit').value);
const detailResponse = await fetch(`${API_BASE}/details`, {
method: 'POST',
@@ -202,7 +202,7 @@ async function fetchDetails() {
function displayDetails(items, container) {
const html = `
<div style="max-height: 380px; overflow-y: auto;">
<h3 style="margin-bottom: 15px;">Scraped ${items.length} details</h3>
<h3 style="margin-bottom: 15px;">Collected ${items.length} details</h3>
${items.map((item, index) => `
<div class="list-item">
<h3>${index + 1}. ${item.title}</h3>
@@ -212,7 +212,7 @@ function displayDetails(items, container) {
${item.detail.budget ? `
<span class="budget">${item.detail.budget.amount}${item.detail.budget.unit}</span>
` : '<div class="meta">No budget information found</div>'}
` : '<div class="error">Scraping failed</div>'}
` : '<div class="error">Collection failed</div>'}
<br><a href="${item.href}" target="_blank">查看原文 →</a>
</div>
`).join('')}
@@ -271,6 +271,7 @@ async function generateReport() {
currentReport = data.data;
displayReport(data.data, results);
exportBtn.style.display = 'inline-block';
document.getElementById('sendEmailBtn').style.display = 'inline-block';
} else {
results.innerHTML = `<div class="error">Error: ${data.error}</div>`;
}
@@ -475,3 +476,197 @@ async function exportReport() {
document.body.removeChild(a);
URL.revokeObjectURL(url);
}
// ========== Email features ==========
// Load the email configuration when the page loads
document.addEventListener('DOMContentLoaded', function() {
loadEmailConfig();
});
// Save the email configuration to localStorage
function saveEmailConfig() {
const config = {
smtpHost: document.getElementById('smtpHost').value,
smtpPort: parseInt(document.getElementById('smtpPort').value) || 587,
smtpUser: document.getElementById('smtpUser').value,
smtpPass: document.getElementById('smtpPass').value,
recipients: document.getElementById('recipients').value
};
// Validate the configuration
if (!config.smtpHost || !config.smtpUser || !config.smtpPass || !config.recipients) {
showEmailStatus('Please fill in all required fields', 'error');
return;
}
// Persist to localStorage
localStorage.setItem('emailConfig', JSON.stringify(config));
showEmailStatus('Email configuration saved', 'success');
}
// Load the email configuration from localStorage
function loadEmailConfig() {
const configStr = localStorage.getItem('emailConfig');
if (configStr) {
try {
const config = JSON.parse(configStr);
document.getElementById('smtpHost').value = config.smtpHost || '';
document.getElementById('smtpPort').value = config.smtpPort || 587;
document.getElementById('smtpUser').value = config.smtpUser || '';
document.getElementById('smtpPass').value = config.smtpPass || '';
document.getElementById('recipients').value = config.recipients || '';
} catch (e) {
console.error('Failed to load email configuration:', e);
}
}
}
// Test the email configuration
async function testEmailConfig() {
const config = {
smtpHost: document.getElementById('smtpHost').value,
smtpPort: parseInt(document.getElementById('smtpPort').value) || 587,
smtpUser: document.getElementById('smtpUser').value,
smtpPass: document.getElementById('smtpPass').value,
recipients: document.getElementById('recipients').value
};
// Validate the configuration
if (!config.smtpHost || !config.smtpUser || !config.smtpPass || !config.recipients) {
showEmailStatus('Please fill in all required fields', 'error');
return;
}
// Build a test report
const testReport = {
summary: {
total_count: 1,
filtered_count: 1,
threshold: '50万元',
total_amount: '100.00万元',
generated_at: new Date().toISOString()
},
projects: [{
title: 'This is a test email',
date: new Date().toLocaleDateString('zh-CN'),
publish_time: new Date().toLocaleString('zh-CN'),
budget: {
amount: 100,
unit: '万元',
text: 'Test amount',
originalUnit: '万元'
},
url: 'https://gjzx.nanjing.gov.cn'
}]
};
showEmailStatus('Sending test email...', 'info');
try {
const response = await fetch(`${API_BASE}/send-email`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
emailConfig: config,
report: testReport
})
});
const data = await response.json();
if (data.success) {
showEmailStatus('Test email sent successfully. Please check your inbox', 'success');
} else {
showEmailStatus(`Sending failed: ${data.error}`, 'error');
}
} catch (error) {
showEmailStatus(`Request failed: ${error.message}`, 'error');
}
}
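// Send the report by email
async function sendReportByEmail() {
if (!currentReport) {
alert('Please generate a report first');
return;
}
// Load the email configuration from localStorage
const configStr = localStorage.getItem('emailConfig');
if (!configStr) {
alert('Please configure the mail server in the "Email Settings" tab first');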
return;
}
let emailConfig;
try {
emailConfig = JSON.parse(configStr);
} catch (e) {
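alert('The email configuration is malformed; please reconfigure it');
return;
}
// Validate the configuration
if (!emailConfig.smtpHost || !emailConfig.smtpUser || !emailConfig.smtpPass || !emailConfig.recipients) {
alert('The email configuration is incomplete; please check it in the "Email Settings" tab');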
return;
}
const sendBtn = document.getElementById('sendEmailBtn');
const originalText = sendBtn.textContent;
sendBtn.disabled = true;
sendBtn.textContent = 'Sending...';
try {
const response = await fetch(`${API_BASE}/send-email`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
emailConfig: emailConfig,
report: currentReport
})
});
const data = await response.json();
if (data.success) {
alert('The report was sent successfully!');
} else {
alert(`Sending failed: ${data.error}`);
}
} catch (error) {
alert(`Request failed: ${error.message}`);
} finally {
sendBtn.disabled = false;
sendBtn.textContent = originalText;
}
}
// Show the email configuration status
function showEmailStatus(message, type) {
const statusDiv = document.getElementById('emailConfigStatus');
const bgColors = {
success: '#d4edda',
error: '#f8d7da',
info: '#d1ecf1'
};
const textColors = {
success: '#155724',
error: '#721c24',
info: '#0c5460'
};
statusDiv.innerHTML = `
<div style="background: ${bgColors[type]}; color: ${textColors[type]}; padding: 15px; border-radius: 8px;">
${message}
</div>
`;
// Hide success messages automatically after 3 seconds
if (type === 'success') {
setTimeout(() => {
statusDiv.innerHTML = '';
}, 3000);
}
}

View File

@@ -3,7 +3,7 @@
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
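<title>Nanjing Public Works Construction Center - Announcement Scraping Tool</title>
<title>Nanjing Public Works Construction Center - Announcement Collection Tool</title>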
<style>
* {
margin: 0;
@@ -335,13 +335,14 @@
<div class="container">
<div class="header">
<h1>Nanjing Public Works Construction Center</h1>
<p>Announcement scraping and analysis tool</p>
<p>Announcement collection and analysis tool</p>
</div>
<div class="tabs">
<button class="tab active" onclick="switchTab('list')">Announcement List</button>
<button class="tab" onclick="switchTab('detail')">Detail Scraping</button>
<button class="tab" onclick="switchTab('detail')">Detail Collection</button>
<button class="tab" onclick="switchTab('report')">Generate Report</button>
<button class="tab" onclick="switchTab('email')">Email Settings</button>
</div>
<div class="content">
@@ -359,7 +360,7 @@
<div id="listLoading" class="loading">
<div class="spinner"></div>
<p>Scraping...</p>
<p>Collecting...</p>
</div>
<div id="listResults" class="results"></div>
@@ -372,12 +373,12 @@
</div>
</div>
<!-- Detail scraping -->
<!-- Detail collection -->
<div id="detail" class="tab-content">
<div class="form-group">
<div class="checkbox-wrapper" onclick="document.getElementById('useDetailDateRange').click();">
<input type="checkbox" id="useDetailDateRange" onchange="toggleDetailDateRange()" onclick="event.stopPropagation();">
<label for="useDetailDateRange">Scrape by date range</label>
<label for="useDetailDateRange">Collect by date range</label>
</div>
</div>
@@ -391,7 +392,7 @@
<input type="date" id="detailEndDate">
</div>
<div class="form-group">
<label>Maximum pages to scrape</label>
<label>Maximum pages to collect</label>
<input type="number" id="detailMaxPages" value="1" min="1">
</div>
</div>
@@ -402,16 +403,16 @@
<input type="text" id="detailUrl" placeholder="Default: https://gjzx.nanjing.gov.cn/gggs/">
</div>
<div class="form-group">
<label>Number of items to scrape</label>
<label>Number of items to collect</label>
<input type="number" id="detailLimit" value="5" min="1" max="50">
</div>
</div>
<button class="btn" onclick="fetchDetails()">Start scraping</button>
<button class="btn" onclick="fetchDetails()">Start collecting</button>
<div id="detailLoading" class="loading">
<div class="spinner"></div>
<p>Scraping details...</p>
<p>Collecting details...</p>
</div>
<div id="detailResults" class="results"></div>
@@ -422,7 +423,7 @@
<div class="form-group">
<div class="checkbox-wrapper" onclick="document.getElementById('useDateRange').click();">
<input type="checkbox" id="useDateRange" onchange="toggleDateRange()" onclick="event.stopPropagation();">
<label for="useDateRange">Scrape by date range</label>
<label for="useDateRange">Collect by date range</label>
</div>
</div>
@@ -436,7 +437,7 @@
<input type="date" id="endDate">
</div>
<div class="form-group">
<label>Maximum pages to scrape</label>
<label>Maximum pages to collect</label>
<input type="number" id="maxPages" value="1" min="1" >
</div>
</div>
@@ -447,7 +448,7 @@
<input type="text" id="reportUrl" placeholder="Default: https://gjzx.nanjing.gov.cn/gggs/">
</div>
<div class="form-group">
<label>Number of items to scrape</label>
<label>Number of items to collect</label>
<input type="number" id="reportLimit" value="15" min="1" max="50">
</div>
</div>
@@ -459,6 +460,7 @@
<button class="btn" onclick="generateReport()">Generate Report</button>
<button class="btn export-btn" onclick="exportReport()" id="exportBtn" style="display:none;">Export Word</button>
<button class="btn" onclick="sendReportByEmail()" id="sendEmailBtn" style="display:none; background: linear-gradient(135deg, #f093fb 0%, #f5576c 100%);">Send Email</button>
<div id="reportLoading" class="loading">
<div class="spinner"></div>
@@ -467,6 +469,56 @@
<div id="reportResults" class="results"></div>
</div>
<!-- Email settings -->
<div id="email" class="tab-content">
<h2 style="margin-bottom: 20px; color: #667eea;">Email Settings</h2>
<p style="color: #666; margin-bottom: 20px;">Configure the SMTP mail server used to send reports to the specified mailboxes</p>
<div class="form-group">
<label>SMTP server address *</label>
<input type="text" id="smtpHost" placeholder="e.g. smtp.qq.com, smtp.163.com, smtp.gmail.com">
</div>
<div class="form-group">
<label>SMTP port *</label>
<input type="number" id="smtpPort" value="587" placeholder="Usually 587 (TLS) or 465 (SSL)">
</div>
<div class="form-group">
<label>Sender email (SMTP username) *</label>
<input type="email" id="smtpUser" placeholder="your-email@example.com">
</div>
<div class="form-group">
<label>SMTP password / authorization code *</label>
<input type="password" id="smtpPass" placeholder="Email password or authorization code">
</div>
<div class="form-group">
<label>Recipient emails (comma-separated) *</label>
<input type="text" id="recipients" placeholder="email1@example.com, email2@example.com">
</div>
<button class="btn" onclick="saveEmailConfig()">Save Settings</button>
<button class="btn" onclick="testEmailConfig()" style="background: linear-gradient(135deg, #f093fb 0%, #f5576c 100%);">Test Connection</button>
<div id="emailConfigStatus" style="margin-top: 20px;"></div>
<div style="margin-top: 30px; padding: 20px; background: #f0f8ff; border-radius: 8px; border-left: 4px solid #667eea;">
<h3 style="margin-top: 0; color: #667eea;">Common mailbox settings</h3>
<ul style="line-height: 1.8; color: #666;">
<li><strong>QQ Mail:</strong> smtp.qq.com, port 587 or 465, requires an authorization code</li>
<li><strong>163 Mail:</strong> smtp.163.com, port 465 or 25, requires an authorization code</li>
<li><strong>Gmail:</strong> smtp.gmail.com, port 587 or 465, requires enabling "Allow less secure apps"</li>
<li><strong>Outlook:</strong> smtp-mail.outlook.com, 端口 587</li>
<li><strong>Corporate email:</strong> ask your IT administrator for the SMTP settings</li>
</ul>
<p style="margin: 10px 0 0 0; color: #999; font-size: 13px;">
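Tip: QQ Mail and 163 Mail require you to enable the SMTP service in the mailbox settings and generate an authorization code; the authorization code is not the mailbox password.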
</p>
</div>
</div>
</div>
</div>

213
src/emailService.js Normal file

@@ -0,0 +1,213 @@
import nodemailer from 'nodemailer';
// Email sending service
export async function sendReportEmail(emailConfig, report) {
try {
// Create the SMTP transporter
const transporter = nodemailer.createTransport({
host: emailConfig.smtpHost,
port: emailConfig.smtpPort || 587,
secure: emailConfig.smtpPort === 465, // true for 465, false for other ports
auth: {
user: emailConfig.smtpUser,
pass: emailConfig.smtpPass,
},
});
// Render the report content as HTML
const htmlContent = generateReportHtml(report);
// Send the email
const info = await transporter.sendMail({
from: `"Announcement Collection System" <${emailConfig.smtpUser}>`,
to: emailConfig.recipients,
subject: `Procurement Announcement Analysis Report - ${new Date().toLocaleDateString('zh-CN')}`,
html: htmlContent,
});
return {
success: true,
messageId: info.messageId,
};
} catch (error) {
console.error('Failed to send email:', error);
throw new Error(`Failed to send email: ${error.message}`);
}
}
// Generate the report in HTML format
function generateReportHtml(report) {
const { summary, projects } = report;
return `
<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Procurement Announcement Analysis Report</title>
<style>
body {
font-family: 'PingFang SC', 'Microsoft YaHei', Arial, sans-serif;
line-height: 1.6;
color: #333;
max-width: 900px;
margin: 0 auto;
padding: 20px;
background-color: #f5f5f5;
}
.container {
background: white;
border-radius: 8px;
padding: 30px;
box-shadow: 0 2px 10px rgba(0,0,0,0.1);
}
h1 {
color: #667eea;
border-bottom: 3px solid #667eea;
padding-bottom: 10px;
margin-bottom: 20px;
}
.summary {
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
color: white;
padding: 20px;
border-radius: 8px;
margin-bottom: 30px;
}
.summary h2 {
margin-top: 0;
margin-bottom: 15px;
font-size: 18px;
}
.stat-grid {
display: grid;
grid-template-columns: repeat(auto-fit, minmax(200px, 1fr));
gap: 15px;
}
.stat {
background: rgba(255,255,255,0.15);
padding: 12px;
border-radius: 6px;
}
.stat-label {
font-size: 13px;
opacity: 0.9;
margin-bottom: 5px;
}
.stat-value {
font-size: 22px;
font-weight: bold;
}
.project-list {
margin-top: 20px;
}
.project-item {
background: #f9f9f9;
border-left: 4px solid #667eea;
padding: 15px;
margin-bottom: 15px;
border-radius: 4px;
}
.project-item h3 {
color: #333;
margin: 0 0 10px 0;
font-size: 16px;
}
.project-meta {
color: #666;
font-size: 14px;
margin: 5px 0;
}
.budget {
display: inline-block;
background: #667eea;
color: white;
padding: 4px 12px;
border-radius: 4px;
font-weight: bold;
margin-top: 8px;
font-size: 14px;
}
.project-link {
color: #667eea;
text-decoration: none;
font-size: 13px;
word-break: break-all;
}
.footer {
margin-top: 30px;
padding-top: 20px;
border-top: 1px solid #e0e0e0;
color: #999;
font-size: 12px;
text-align: center;
}
</style>
</head>
<body>
<div class="container">
<h1>Nanjing Public Works Construction Center - Procurement Announcement Analysis Report</h1>
<div class="summary">
<h2>Report Summary</h2>
<div class="stat-grid">
<div class="stat">
<div class="stat-label">Total announcements</div>
<div class="stat-value">${summary.total_count} items</div>
</div>
<div class="stat">
<div class="stat-label">Matching</div>
<div class="stat-value">${summary.filtered_count} items</div>
</div>
<div class="stat">
<div class="stat-label">Amount threshold</div>
<div class="stat-value">${summary.threshold}</div>
</div>
<div class="stat">
<div class="stat-label">Total amount</div>
<div class="stat-value">${summary.total_amount}</div>
</div>
</div>
${summary.date_range ? `
<div style="margin-top: 15px; padding-top: 15px; border-top: 1px solid rgba(255,255,255,0.2);">
<div class="stat-label">Date range</div>
<div style="font-size: 14px; margin-top: 5px;">
${summary.date_range.startDate || 'any'} to ${summary.date_range.endDate || 'any'}
</div>
</div>
` : ''}
</div>
<h2>Project Details</h2>
<div class="project-list">
${projects.length === 0 ? '<p style="color: #999; text-align: center; padding: 20px;">No matching projects</p>' : ''}
${projects.map((project, index) => `
<div class="project-item">
<h3>${index + 1}. ${project.title}</h3>
<div class="project-meta">
<strong>Publication date:</strong> ${project.date}
${project.publish_time ? ` | <strong>Publication time:</strong> ${project.publish_time}` : ''}
</div>
${project.budget ? `
<div class="budget">
Budget amount: ${project.budget.amount.toFixed(2)} ${project.budget.unit}
${project.budget.originalUnit !== project.budget.unit ? ` (original: ${project.budget.originalUnit})` : ''}
</div>
` : ''}
<div style="margin-top: 10px;">
<a href="${project.url}" class="project-link" target="_blank">${project.url}</a>
</div>
</div>
`).join('')}
</div>
<div class="footer">
<p>Report generated at: ${new Date(summary.generated_at).toLocaleString('zh-CN')}</p>
<p>This report was generated automatically by the Announcement Collection System</p>
</div>
</div>
</body>
</html>
`;
}

View File

@@ -3,6 +3,7 @@ import cors from 'cors';
import axios from 'axios';
import * as cheerio from 'cheerio';
import iconv from 'iconv-lite';
import { sendReportEmail } from './emailService.js';
const app = express();
const PORT = 3000;
@@ -33,24 +34,24 @@ function isDateInRange(dateStr, startDate, endDate) {
return true;
}
// Scrape multiple list pages by date range
// Collect multiple list pages by date range
async function fetchListByDateRange(startDate, endDate, maxPages = 23) {
const allItems = [];
let shouldContinue = true;
let pageIndex = 0;
console.log(`Starting date-range scraping: ${startDate || 'any'} to ${endDate || 'any'}`);
console.log(`Starting date-range collection: ${startDate || 'any'} to ${endDate || 'any'}`);
while (shouldContinue && pageIndex < maxPages) {
const pageUrl = getPageUrl(pageIndex);
console.log(`Scraping page ${pageIndex + 1}: ${pageUrl}`);
console.log(`Collecting page ${pageIndex + 1}: ${pageUrl}`);
try {
const html = await fetchHtml(pageUrl);
const items = parseList(html);
if (items.length === 0) {
console.log(`Page ${pageIndex + 1} has no data; stopping scraping`);
console.log(`Page ${pageIndex + 1} has no data; stopping collection`);
break;
}
@@ -70,7 +71,7 @@ async function fetchListByDateRange(startDate, endDate, maxPages = 23) {
}
if (allItemsBeforeRange && startDate) {
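console.log(`All items on page ${pageIndex + 1} are older than the start date; stopping scraping`);
console.log(`All items on page ${pageIndex + 1} are older than the start date; stopping collection`);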
shouldContinue = false;
}
@@ -82,12 +83,12 @@ async function fetchListByDateRange(startDate, endDate, maxPages = 23) {
await new Promise(resolve => setTimeout(resolve, 500));
}
} catch (err) {
console.error(`Failed to scrape page ${pageIndex + 1}: ${err.message}`);
console.error(`Failed to collect page ${pageIndex + 1}: ${err.message}`);
break;
}
}
console.log(`Scraped ${pageIndex} pages in total; found ${allItems.length} matching announcements`);
console.log(`Collected ${pageIndex} pages in total; found ${allItems.length} matching announcements`);
return allItems;
}
@@ -207,6 +208,10 @@ function parseDetail(html) {
}
function extractBudget(content) {
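// Preprocess the content: remove line breaks and surrounding whitespace between digits
// so that numbers split across lines can be matched, e.g. "1\n1\n0\n9\n0\n0" -> "110900"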
let cleanedContent = content.replace(/(\d)\s*[\n\r]\s*(?=\d)/g, '$1');
// Amount-matching patterns, ordered from highest to lowest priority
const patterns = [
// Priority 1: 万元 amounts with a currency symbol
@@ -230,7 +235,7 @@ function extractBudget(content) {
// Iterate over all patterns and keep the highest-priority match
for (const pattern of patterns) {
const match = content.match(pattern.regex);
const match = cleanedContent.match(pattern.regex);
if (match && pattern.priority < bestPriority) {
// Strip commas from the number and convert it
const numberStr = match[1].replace(/[,]/g, '');
@@ -329,21 +334,21 @@ app.post('/api/report', async (req, res) => {
const { limit = 15, threshold = 50, url } = req.body;
const targetUrl = url && url.trim() !== '' ? url : BASE_URL;
// Scrape as many pages as needed to gather enough data
// Collect as many pages as needed to gather enough data
const items = [];
let pageIndex = 0;
const maxPagesToFetch = Math.ceil(limit / 10) + 1; // assume roughly 10 items per page; fetch one extra page to be safe
while (items.length < limit && pageIndex < maxPagesToFetch) {
const pageUrl = getPageUrl(pageIndex, targetUrl);
console.log(`Scraping page ${pageIndex + 1}: ${pageUrl}`);
console.log(`Collecting page ${pageIndex + 1}: ${pageUrl}`);
try {
const html = await fetchHtml(pageUrl);
const pageItems = parseList(html);
if (pageItems.length === 0) {
console.log(`Page ${pageIndex + 1} has no data; stopping scraping`);
console.log(`Page ${pageIndex + 1} has no data; stopping collection`);
break;
}
@@ -354,7 +359,7 @@ app.post('/api/report', async (req, res) => {
await new Promise(resolve => setTimeout(resolve, 500));
}
} catch (err) {
console.error(`Failed to scrape page ${pageIndex + 1}: ${err.message}`);
console.error(`Failed to collect page ${pageIndex + 1}: ${err.message}`);
break;
}
}
@@ -417,7 +422,7 @@ app.post('/api/report-daterange', async (req, res) => {
try {
const { startDate, endDate, threshold = 50, maxPages = 23 } = req.body;
// Scrape the list by date range
// Collect the list by date range
const items = await fetchListByDateRange(startDate, endDate, maxPages);
if (items.length === 0) {
@@ -437,7 +442,7 @@ app.post('/api/report-daterange', async (req, res) => {
});
}
// Scrape details
// Collect details
const results = [];
for (const item of items) {
try {
@@ -491,6 +496,50 @@ app.post('/api/report-daterange', async (req, res) => {
}
});
// Send the report email
app.post('/api/send-email', async (req, res) => {
try {
const { emailConfig, report } = req.body;
// Validate the required configuration parameters
if (!emailConfig || !emailConfig.smtpHost || !emailConfig.smtpUser || !emailConfig.smtpPass) {
return res.status(400).json({
success: false,
error: 'The email configuration is incomplete; please provide the SMTP server, username, and password',
});
}
if (!emailConfig.recipients || emailConfig.recipients.trim() === '') {
return res.status(400).json({
success: false,
error: 'Please specify at least one recipient',
});
}
if (!report) {
return res.status(400).json({
success: false,
error: 'There is no report data to send',
});
}
// Send the email
const result = await sendReportEmail(emailConfig, report);
res.json({
success: true,
message: 'Email sent successfully',
messageId: result.messageId,
});
} catch (error) {
console.error('Send-email API error:', error);
res.status(500).json({
success: false,
error: error.message,
});
}
});
app.listen(PORT, () => {
console.log(`Server running at http://localhost:${PORT}`);
});