Commit d22e40a by milwright · 1 Parent(s): 69533d8

Update main README with metadata

Files changed (1)
  1. README.md +54 -52
README.md CHANGED
@@ -1,73 +1,75 @@
  # Reddit Scraper
 
- A comprehensive tool for scraping Reddit data with both command-line and graphical user interfaces for data collection, analysis, and visualization in a local development environment.
 
  ## Features
 
- - Simple and advanced UI options
- - Search multiple subreddits simultaneously
- - Filter posts by keywords and various criteria
- - Visualize data with interactive charts
- - Export results to CSV or JSON
- - Track search history
-
- ## Installation
-
- 1. Clone this repository
- 2. Make sure you have Python 3.7+ installed
- 3. Install dependencies:
-
- ```bash
- pip install -r requirements.txt
- ```
-
- ## Usage
-
- ### Quick Start
-
- Run the script to launch the UI:
 
- ```bash
- ./run_scraper.sh
- ```
 
- For the basic UI mode:
 
- ```bash
- ./run_scraper.sh basic
- ```
 
- ### Manual Launch
 
- Alternatively, you can run either UI directly:
 
- ```bash
- # Basic UI
- streamlit run scraper_ui.py
 
- # Advanced UI
- streamlit run advanced_scraper_ui.py
- ```
 
- ## Requirements
 
- - Python 3.7+
- - Reddit API credentials (provided by default for testing)
- - Dependencies listed in requirements.txt
 
- ## Development
 
- This project includes:
 
- - `google_adk.py` - Core file with Reddit scraper functionality
- - `enhanced_scraper.py` - Extended scraper with advanced features
- - `scraper_ui.py` - Basic Streamlit UI
- - `advanced_scraper_ui.py` - Advanced UI with visualizations and filtering
 
- ## License
 
- This project is licensed under the GNU General Public License v3.0 - see the [LICENSE](LICENSE) file for details.
 
- ## Note
 
- The included Reddit API credentials are for demonstration purposes only. For production use, please obtain your own credentials from the [Reddit Developer Portal](https://www.reddit.com/prefs/apps).
+ ---
+ title: Reddit Scraper
+ emoji: 🔍
+ colorFrom: red
+ colorTo: orange
+ sdk: streamlit
+ sdk_version: 1.32.0
+ app_file: app.py
+ pinned: false
+ license: gpl-3.0
+ ---
+
  # Reddit Scraper
 
+ A tool for scraping Reddit data with a user-friendly interface for data collection, analysis, and visualization.
 
  ## Features
 
+ - 🔍 **Search multiple subreddits** simultaneously
+ - 🔑 **Filter posts by keywords** and various criteria
+ - 📊 **Visualize data** with interactive charts
+ - 💾 **Export results** to CSV or JSON
+ - 📜 **Track search history**
+ - 🔐 **Secure credentials** management
 
+ ## How to Use
 
+ ### 1. Set up Reddit API Credentials
 
+ To use this app, you will need Reddit API credentials. You can get these from the [Reddit Developer Portal](https://www.reddit.com/prefs/apps).
 
+ - Click "Create App" or "Create Another App"
+ - Fill in the details (name, description)
+ - Select "script" as the application type
+ - Use "http://localhost:8000" as the redirect URI (this doesn't need to be a real endpoint)
+ - Click "Create app"
+ - Take note of the client ID (the string under "personal use script") and client secret
 
+ Enter these credentials in the app's sidebar or set them up as secrets in the Hugging Face Space settings (if you've duplicated this Space).
 
+ ### 2. Searching Reddit
 
+ 1. Enter one or more subreddits to search (one per line)
+ 2. Specify keywords to search for (one per line)
+ 3. Adjust parameters like post limit, sorting method, etc.
+ 4. Click "Run Search" to start scraping
 
+ ### 3. Working with Results
 
+ - Use the tabs to navigate between different views
+ - Apply additional filters to the search results
+ - Visualize the data with built-in charts
+ - Export results to CSV or JSON for further analysis
 
+ ## Privacy & API Usage
 
+ This tool uses the official Reddit API and follows Reddit's API terms of service. Your API credentials are never stored on our servers unless you explicitly save them to your own copy of this Space.
 
+ ## Setup Your Own Copy
 
+ If you want to run this app with your own credentials always available:
 
+ 1. Duplicate this Space to your account
+ 2. Go to Settings → Repository Secrets
+ 3. Add the following secrets:
+ - `REDDIT_CLIENT_ID`: Your Reddit API client ID
+ - `REDDIT_CLIENT_SECRET`: Your Reddit API client secret
+ - `REDDIT_USER_AGENT`: (Optional) A custom user agent string
 
+ ## Tech Stack
 
+ - [Streamlit](https://streamlit.io/): UI framework
+ - [PRAW](https://praw.readthedocs.io/): Reddit API wrapper
+ - [Pandas](https://pandas.pydata.org/): Data processing
+ - [Plotly](https://plotly.com/): Data visualization
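
The diff does not show how `app.py` consumes the three secrets listed under "Setup Your Own Copy". Here is a minimal sketch, assuming the secrets reach the app as environment variables with the same names. The helper `load_reddit_credentials` and the fallback `DEFAULT_USER_AGENT` are hypothetical names introduced for illustration, not taken from the repository.

```python
import os
from typing import Mapping, Optional

# Hypothetical fallback; the README only says the user agent is optional.
DEFAULT_USER_AGENT = "reddit-scraper-space/0.1"

REQUIRED_SECRETS = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET")


def load_reddit_credentials(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the secrets named in the README from an environment mapping.

    Assumption: the secrets are exposed to the app as environment variables.
    """
    env = os.environ if env is None else env

    # Fail early with a clear message if a required secret is absent or empty.
    missing = [name for name in REQUIRED_SECRETS if not env.get(name)]
    if missing:
        raise KeyError("missing required secrets: " + ", ".join(missing))

    return {
        "client_id": env["REDDIT_CLIENT_ID"],
        "client_secret": env["REDDIT_CLIENT_SECRET"],
        # The user agent is optional, so fall back to a placeholder value.
        "user_agent": env.get("REDDIT_USER_AGENT") or DEFAULT_USER_AGENT,
    }
```

The returned keys match the `client_id`, `client_secret`, and `user_agent` keyword arguments of `praw.Reddit`, so the dictionary could be unpacked into that constructor for read-only access. Whether the app actually does this is not visible in this commit.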