zrguo committed
Commit 60e1dab · Parent(s): 8352b84

Add example of directly using modal processors
docs/mineru_integration_en.md CHANGED
@@ -243,4 +243,118 @@ The MinerU configuration file `magic-pdf.json` supports various customization op
  - GPU acceleration settings
  - Cache settings
For complete configuration options, refer to the [MinerU official documentation](https://mineru.readthedocs.io/).

### Using Modal Processors Directly

You can also use LightRAG's modal processors directly, without going through MinerU. This is useful when you want to process specific types of content or need more control over the processing pipeline.

Each modal processor returns a tuple containing:

1. A description of the processed content
2. Entity information that can be used for further processing or storage

The processors support different types of content:

- `ImageModalProcessor`: processes images with captions and footnotes
- `TableModalProcessor`: processes tables with captions and footnotes
- `EquationModalProcessor`: processes mathematical equations in LaTeX format
- `GenericModalProcessor`: a base processor that can be extended for custom content types

> **Note**: A complete working example can be found in `examples/modalprocessors_example.py`. You can run it with:
> ```bash
> python examples/modalprocessors_example.py --api-key YOUR_API_KEY
> ```

<details>
<summary>Here's an example of how to use different modal processors:</summary>

```python
from lightrag import LightRAG
from lightrag.llm.openai import openai_complete_if_cache, openai_embed
from lightrag.modalprocessors import (
    ImageModalProcessor,
    TableModalProcessor,
    EquationModalProcessor,
    GenericModalProcessor,
)

# Initialize LightRAG
lightrag = LightRAG(
    working_dir="./rag_storage",
    embedding_func=lambda texts: openai_embed(
        texts,
        model="text-embedding-3-large",
        api_key="your-api-key",
        base_url="your-base-url",
    ),
    llm_model_func=lambda prompt, system_prompt=None, history_messages=[], **kwargs: openai_complete_if_cache(
        "gpt-4o-mini",
        prompt,
        system_prompt=system_prompt,
        history_messages=history_messages,
        api_key="your-api-key",
        base_url="your-base-url",
        **kwargs,
    ),
)

# Process an image
# (vision_model_func is a caption function that accepts image input;
# see examples/modalprocessors_example.py for a full definition)
image_processor = ImageModalProcessor(
    lightrag=lightrag,
    modal_caption_func=vision_model_func,
)

image_content = {
    "img_path": "image.jpg",
    "img_caption": ["Example image caption"],
    "img_footnote": ["Example image footnote"],
}

description, entity_info = await image_processor.process_multimodal_content(
    modal_content=image_content,
    content_type="image",
    file_path="image_example.jpg",
    entity_name="Example Image",
)

# Process a table
# (llm_model_func is a plain text-completion function, such as the
# lambda passed to LightRAG above)
table_processor = TableModalProcessor(
    lightrag=lightrag,
    modal_caption_func=llm_model_func,
)

table_content = {
    "table_body": """
    | Name | Age | Occupation |
    |------|-----|------------|
    | John | 25  | Engineer   |
    | Mary | 30  | Designer   |
    """,
    "table_caption": ["Employee Information Table"],
    "table_footnote": ["Data updated as of 2024"],
}

description, entity_info = await table_processor.process_multimodal_content(
    modal_content=table_content,
    content_type="table",
    file_path="table_example.md",
    entity_name="Employee Table",
)

# Process an equation
equation_processor = EquationModalProcessor(
    lightrag=lightrag,
    modal_caption_func=llm_model_func,
)

equation_content = {
    "text": "E = mc^2",
    "text_format": "LaTeX",
}

description, entity_info = await equation_processor.process_multimodal_content(
    modal_content=equation_content,
    content_type="equation",
    file_path="equation_example.txt",
    entity_name="Mass-Energy Equivalence",
)
```

</details>
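The `table_body` value in the example above is a plain Markdown table. If you want to inspect or validate that content before handing it to `TableModalProcessor`, a minimal stdlib-only sketch (independent of LightRAG; `parse_markdown_table` is our own helper, not part of the library):

```python
def parse_markdown_table(table_body: str) -> list:
    """Parse a simple pipe-delimited Markdown table into a list of row dicts."""
    lines = [ln.strip() for ln in table_body.strip().splitlines() if ln.strip()]
    # First line is the header row; second is the |---|---| separator
    header = [cell.strip() for cell in lines[0].strip("|").split("|")]
    rows = []
    for line in lines[2:]:
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


table_body = """
| Name | Age | Occupation |
|------|-----|------------|
| John | 25  | Engineer   |
| Mary | 30  | Designer   |
"""

rows = parse_markdown_table(table_body)
print(rows[0])  # {'Name': 'John', 'Age': '25', 'Occupation': 'Engineer'}
```

This only handles well-formed single-line tables like the one in the example; it is a convenience for sanity-checking inputs, not something the processors require.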
docs/mineru_integration_zh.md CHANGED
@@ -242,4 +242,117 @@ The MinerU configuration file `magic-pdf.json` supports various customization options, including:
  - GPU acceleration settings
  - Cache settings
For complete configuration options, refer to the [MinerU official documentation](https://mineru.readthedocs.io/).

### Using Modal Processors Directly

You can also use LightRAG's modal processors directly, without going through MinerU. This is especially useful when you want to process specific types of content or need more control over the processing pipeline.

Each modal processor returns a tuple containing:

1. A description of the processed content
2. Entity information that can be used for further processing or storage

The processors support different types of content:

- `ImageModalProcessor`: processes images with captions and footnotes
- `TableModalProcessor`: processes tables with captions and footnotes
- `EquationModalProcessor`: processes mathematical equations in LaTeX format
- `GenericModalProcessor`: a base processor that can be extended for custom content types

> **Note**: A complete working example can be found in `examples/modalprocessors_example.py`. You can run it with:
> ```bash
> python examples/modalprocessors_example.py --api-key YOUR_API_KEY
> ```

<details>
<summary>An example of using the different modal processors:</summary>

```python
from lightrag import LightRAG
from lightrag.llm.openai import openai_complete_if_cache, openai_embed
from lightrag.modalprocessors import (
    ImageModalProcessor,
    TableModalProcessor,
    EquationModalProcessor,
    GenericModalProcessor,
)

# Initialize LightRAG
lightrag = LightRAG(
    working_dir="./rag_storage",
    embedding_func=lambda texts: openai_embed(
        texts,
        model="text-embedding-3-large",
        api_key="your-api-key",
        base_url="your-base-url",
    ),
    llm_model_func=lambda prompt, system_prompt=None, history_messages=[], **kwargs: openai_complete_if_cache(
        "gpt-4o-mini",
        prompt,
        system_prompt=system_prompt,
        history_messages=history_messages,
        api_key="your-api-key",
        base_url="your-base-url",
        **kwargs,
    ),
)

# Process an image
image_processor = ImageModalProcessor(
    lightrag=lightrag,
    modal_caption_func=vision_model_func,
)

image_content = {
    "img_path": "image.jpg",
    "img_caption": ["Example image caption"],
    "img_footnote": ["Example image footnote"],
}

description, entity_info = await image_processor.process_multimodal_content(
    modal_content=image_content,
    content_type="image",
    file_path="image_example.jpg",
    entity_name="Example Image",
)

# Process a table
table_processor = TableModalProcessor(
    lightrag=lightrag,
    modal_caption_func=llm_model_func,
)

table_content = {
    "table_body": """
    | Name     | Age | Occupation |
    |----------|-----|------------|
    | Zhang San | 25 | Engineer   |
    | Li Si     | 30 | Designer   |
    """,
    "table_caption": ["Employee Information Table"],
    "table_footnote": ["Data updated as of 2024"],
}

description, entity_info = await table_processor.process_multimodal_content(
    modal_content=table_content,
    content_type="table",
    file_path="table_example.md",
    entity_name="Employee Table",
)

# Process an equation
equation_processor = EquationModalProcessor(
    lightrag=lightrag,
    modal_caption_func=llm_model_func,
)

equation_content = {
    "text": "E = mc^2",
    "text_format": "LaTeX",
}

description, entity_info = await equation_processor.process_multimodal_content(
    modal_content=equation_content,
    content_type="equation",
    file_path="equation_example.txt",
    entity_name="Mass-Energy Equivalence",
)
```

</details>
examples/modalprocessors_example.py ADDED
@@ -0,0 +1,200 @@
"""
Example of directly using modal processors

This example demonstrates how to use LightRAG's modal processors directly without going through MinerU.
"""

import asyncio
import argparse

from lightrag import LightRAG
from lightrag.llm.openai import openai_complete_if_cache, openai_embed
from lightrag.kg.shared_storage import initialize_pipeline_status
from lightrag.modalprocessors import (
    ImageModalProcessor,
    TableModalProcessor,
    EquationModalProcessor,
    GenericModalProcessor,
)

WORKING_DIR = "./rag_storage"


def get_llm_model_func(api_key: str, base_url: str = None):
    return lambda prompt, system_prompt=None, history_messages=[], **kwargs: openai_complete_if_cache(
        "gpt-4o-mini",
        prompt,
        system_prompt=system_prompt,
        history_messages=history_messages,
        api_key=api_key,
        base_url=base_url,
        **kwargs,
    )


def get_vision_model_func(api_key: str, base_url: str = None):
    def vision_model_func(
        prompt, system_prompt=None, history_messages=[], image_data=None, **kwargs
    ):
        if image_data:
            # Build a multimodal message; omit the system message entirely
            # when no system prompt is given
            messages = []
            if system_prompt:
                messages.append({"role": "system", "content": system_prompt})
            messages.append(
                {
                    "role": "user",
                    "content": [
                        {"type": "text", "text": prompt},
                        {
                            "type": "image_url",
                            "image_url": {
                                "url": f"data:image/jpeg;base64,{image_data}"
                            },
                        },
                    ],
                }
            )
            return openai_complete_if_cache(
                "gpt-4o",
                "",
                system_prompt=None,
                history_messages=[],
                messages=messages,
                api_key=api_key,
                base_url=base_url,
                **kwargs,
            )
        # Fall back to a plain text completion when there is no image
        return openai_complete_if_cache(
            "gpt-4o-mini",
            prompt,
            system_prompt=system_prompt,
            history_messages=history_messages,
            api_key=api_key,
            base_url=base_url,
            **kwargs,
        )

    return vision_model_func


async def process_image_example(lightrag: LightRAG, vision_model_func):
    """Example of processing an image"""
    # Create image processor
    image_processor = ImageModalProcessor(
        lightrag=lightrag, modal_caption_func=vision_model_func
    )

    # Prepare image content
    image_content = {
        "img_path": "image.jpg",
        "img_caption": ["Example image caption"],
        "img_footnote": ["Example image footnote"],
    }

    # Process image
    description, entity_info = await image_processor.process_multimodal_content(
        modal_content=image_content,
        content_type="image",
        file_path="image_example.jpg",
        entity_name="Example Image",
    )

    print("Image Processing Results:")
    print(f"Description: {description}")
    print(f"Entity Info: {entity_info}")


async def process_table_example(lightrag: LightRAG, llm_model_func):
    """Example of processing a table"""
    # Create table processor
    table_processor = TableModalProcessor(
        lightrag=lightrag, modal_caption_func=llm_model_func
    )

    # Prepare table content
    table_content = {
        "table_body": """
        | Name | Age | Occupation |
        |------|-----|------------|
        | John | 25  | Engineer   |
        | Mary | 30  | Designer   |
        """,
        "table_caption": ["Employee Information Table"],
        "table_footnote": ["Data updated as of 2024"],
    }

    # Process table
    description, entity_info = await table_processor.process_multimodal_content(
        modal_content=table_content,
        content_type="table",
        file_path="table_example.md",
        entity_name="Employee Table",
    )

    print("\nTable Processing Results:")
    print(f"Description: {description}")
    print(f"Entity Info: {entity_info}")


async def process_equation_example(lightrag: LightRAG, llm_model_func):
    """Example of processing a mathematical equation"""
    # Create equation processor
    equation_processor = EquationModalProcessor(
        lightrag=lightrag, modal_caption_func=llm_model_func
    )

    # Prepare equation content
    equation_content = {
        "text": "E = mc^2",
        "text_format": "LaTeX",
    }

    # Process equation
    description, entity_info = await equation_processor.process_multimodal_content(
        modal_content=equation_content,
        content_type="equation",
        file_path="equation_example.txt",
        entity_name="Mass-Energy Equivalence",
    )

    print("\nEquation Processing Results:")
    print(f"Description: {description}")
    print(f"Entity Info: {entity_info}")


async def initialize_rag(api_key: str, base_url: str = None, working_dir: str = WORKING_DIR):
    rag = LightRAG(
        working_dir=working_dir,
        embedding_func=lambda texts: openai_embed(
            texts,
            model="text-embedding-3-large",
            api_key=api_key,
            base_url=base_url,
        ),
        llm_model_func=lambda prompt, system_prompt=None, history_messages=[], **kwargs: openai_complete_if_cache(
            "gpt-4o-mini",
            prompt,
            system_prompt=system_prompt,
            history_messages=history_messages,
            api_key=api_key,
            base_url=base_url,
            **kwargs,
        ),
    )

    await rag.initialize_storages()
    await initialize_pipeline_status()

    return rag


async def main_async(api_key: str, base_url: str = None, working_dir: str = WORKING_DIR):
    # Initialize LightRAG
    lightrag = await initialize_rag(api_key, base_url, working_dir)

    # Get model functions
    llm_model_func = get_llm_model_func(api_key, base_url)
    vision_model_func = get_vision_model_func(api_key, base_url)

    # Run examples
    await process_image_example(lightrag, vision_model_func)
    await process_table_example(lightrag, llm_model_func)
    await process_equation_example(lightrag, llm_model_func)


def main():
    """Main function to run the example"""
    parser = argparse.ArgumentParser(description="Modal Processors Example")
    parser.add_argument("--api-key", required=True, help="OpenAI API key")
    parser.add_argument("--base-url", help="Optional base URL for API")
    parser.add_argument(
        "--working-dir", "-w", default=WORKING_DIR, help="Working directory path"
    )

    args = parser.parse_args()

    # Run examples (pass the parsed working directory through)
    asyncio.run(main_async(args.api_key, args.base_url, args.working_dir))


if __name__ == "__main__":
    main()
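The vision model function in this example expects `image_data` to be a base64-encoded image that is spliced into a `data:image/jpeg;base64,...` URL. A minimal stdlib-only sketch of producing that string from raw bytes (`encode_image_bytes` is our own helper name, not part of LightRAG):

```python
import base64


def encode_image_bytes(raw: bytes) -> str:
    """Return the base64 string expected for image_data in the data: URL."""
    return base64.b64encode(raw).decode("utf-8")


# Typically the bytes come from a file on disk, e.g.:
# with open("image.jpg", "rb") as f:
#     image_data = encode_image_bytes(f.read())

# Here we use the JPEG magic bytes as a tiny stand-in payload
image_data = encode_image_bytes(b"\xff\xd8\xff")
print(image_data)  # /9j/
```

Decoding with `base64.b64decode` round-trips the original bytes, which is a quick way to verify the encoding before sending it to the vision model.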