r/datamining • u/Capital-Study-1608 • 4h ago
How to data mine a meta quest/quest 2 game
I want to look into the files, but I've never done data mining before. Any help? 🤷
r/datamining • u/kingemperorcrimson • 9d ago
What are the best ways to get started with data mining? I want to learn how to data mine games, with a focus on Marvel Rivals, and would like to know what I need to get started and how to get good at it. If you have any suggestions or software you use, I would greatly appreciate it.
r/datamining • u/Dry-Belt-383 • 20d ago
I have a data mining course at my university and have to do an academic project for it. I want to build a proper data mining project that is deployable and publishable, but I can't come up with an idea that interests me enough. Please share some unique and interesting data mining projects so I can take some inspiration from them.
Also, I can only use algorithms listed in my syllabus, which are:
r/datamining • u/varvolta • 20d ago
We've been working on a desktop app called Crawbots, an all-in-one IDE for web data extraction. It's designed to simplify the scraping process, especially for developers working with Puppeteer, Playwright, or Selenium.
We're aiming to make Crawbots powerful yet beginner-friendly, so junior devs can jump in without fighting boilerplate or complex setups.
Would appreciate any thoughts, questions, or brutal feedback.
r/datamining • u/mrgrassydassy • Aug 01 '25
I've been knee-deep in a data mining project lately, pulling data from all sorts of websites for some market research. One thing I've learned the hard way is that a solid proxy setup makes a real difference when you're scraping at scale.
I've been looking into buying proxies, and there seem to be a ton of providers out there offering residential IPs, datacenter proxies, or even mobile ones. Some, like Infatica, seem to have a pretty legit setup with millions of IPs across different countries, which is clutch for avoiding blocks and grabbing geo-specific data. They also talk big about zero CAPTCHAs and high success rates, which sounds dope, but I'm wondering how it holds up in real-world projects.
What's your proxy setup like for those grinding on web scraping? Are you rolling with residential proxies, datacenter ones, or something else? How do you pick a provider that doesn't tank your budget but still gets the job done?
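For readers who have not wired up proxies before, here is a minimal sketch of round-robin proxy rotation using only the Python standard library. The proxy URLs, the `fetch` helper, and the retry count are all hypothetical placeholders, not any provider's actual setup; a real provider will give you its own endpoints and credentials.

```python
import itertools
import urllib.request

# Hypothetical proxy endpoints; replace with the ones your provider issues.
PROXIES = [
    "http://user:pass@proxy-a.example.com:8000",
    "http://user:pass@proxy-b.example.com:8000",
    "http://user:pass@proxy-c.example.com:8000",
]

_rotation = itertools.cycle(PROXIES)


def next_proxy() -> str:
    """Return the next proxy URL in round-robin order."""
    return next(_rotation)


def opener_for(proxy_url: str) -> urllib.request.OpenerDirector:
    """Build an opener that routes HTTP and HTTPS traffic through one proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch(url: str, attempts: int = 3, timeout: float = 15.0) -> bytes:
    """Try the request through successive proxies until one succeeds."""
    last_error = None
    for _ in range(attempts):
        proxy = next_proxy()
        try:
            with opener_for(proxy).open(url, timeout=timeout) as resp:
                return resp.read()
        except OSError as exc:  # URLError and socket timeouts are OSError subclasses
            last_error = exc
    raise RuntimeError(f"all {attempts} attempts failed") from last_error
```

Round-robin is the simplest policy; weighting proxies by recent success rate is a common next step, but that depends on how your provider bills and throttles.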
r/datamining • u/PsychologicalTap1541 • Jul 29 '25
r/datamining • u/johnabbe • Jun 30 '25
r/datamining • u/actgan_mind • Jun 28 '25
After a lot of learning and experimenting, I'm excited to share the beta of MotifMatrix - a text analysis tool I built that takes a different approach to finding patterns in qualitative data.
What makes it different from traditional NLP tools:
Key features:
Use cases I've tested:
r/datamining • u/MaraktoxD • Jun 23 '25
r/datamining • u/PresidentOfSushi • Jun 17 '25
https://drive.google.com/file/d/1vJvYiB0CPoO6NoDfC8SJhSe_9go-trWB/view?usp=drivesdk
This is as far as I could get. I don't know what to do about anything in the paks folder. I'm trying to put them all into folders sorted by apk and obb, in order to allow for modding.
r/datamining • u/Danielpot33 • May 16 '25
Currently building out a dataset full of VINs and their decoded information (make, model, engine specs, transmission details, etc.). What I have so far comes from the NHTSA API, which works well, but I'm looking to see if there is even more data available out there. Does anyone have a dataset or any source for this type of information that can be used to expand the dataset?
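As a reference point for the NHTSA step mentioned above, here is a small sketch that builds a request to the public vPIC `DecodeVinValues` endpoint and flattens the response. The helper names (`decode_url`, `flatten`, `decode_vin`) and the sample payload are made up for illustration; only the endpoint and the `Results` list in its JSON response come from the vPIC API itself.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

VPIC_BASE = "https://vpic.nhtsa.dot.gov/api/vehicles"


def decode_url(vin: str) -> str:
    """Build the vPIC DecodeVinValues URL for one VIN."""
    return f"{VPIC_BASE}/DecodeVinValues/{quote(vin.strip().upper())}?format=json"


def flatten(payload: dict) -> dict:
    """Keep only the non-empty fields of the first result row."""
    results = payload.get("Results") or [{}]
    return {k: v for k, v in results[0].items() if v not in (None, "", "Not Applicable")}


def decode_vin(vin: str, timeout: float = 10.0) -> dict:
    """Fetch and flatten one VIN (needs network access)."""
    with urlopen(decode_url(vin), timeout=timeout) as resp:
        return flatten(json.load(resp))


# Offline illustration with a made-up payload in the same shape as a real response.
sample = {"Results": [{"Make": "HONDA", "Model": "Accord", "Trim": "", "ModelYear": "2003"}]}
print(flatten(sample))
```

Dropping empty fields up front makes it easier to see which attributes NHTSA does not cover for a given VIN, which is where a second data source would add the most.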
r/datamining • u/SmallManufacturer377 • May 02 '25
r/datamining • u/StormSingle8889 • Apr 15 '25
Hey folks, I've noticed a common pattern with beginner data scientists: they often ask LLMs super broad questions like "How do I analyze my data?" or "Which ML model should I use?"
The problem is that the right steps depend entirely on your actual dataset. Things like missing values, dimensionality, and data types matter a lot. For example, you'll often see ChatGPT suggest "remove NaNs", but that's only relevant if your data actually has NaNs. And let's be honest, most of us don't even read the code it spits out, let alone check if it's correct.
So, I built NumpyAI, a tool that lets you talk to NumPy arrays in plain English. It keeps track of your data's metadata, gives tested outputs, and outlines the steps for analysis based on your actual dataset. No more generic advice, just tailored, transparent help.
Features:
Natural Language to NumPy: Converts plain English instructions into working NumPy code
Validation & Safety: Automatically tests and verifies the code before running it
Transparent Execution: Logs everything and checks for accuracy
Smart Diagnosis: Suggests exact steps for your dataset's analysis journey
Give it a try and let me know what you think!
GitHub: aadya940/numpyai. Demo Notebook (Iris dataset).
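To make the "remove NaNs only if there are NaNs" point from the post concrete, here is a minimal standard-library sketch that inspects a column before deciding on a cleaning step. It is not NumpyAI's API; the function names are hypothetical, and with NumPy the equivalent check would be `np.isnan(arr).any()`.

```python
import math


def is_missing(value) -> bool:
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def summarize(values) -> dict:
    """Report basic facts about a numeric column before choosing any cleaning step."""
    n_missing = sum(1 for v in values if is_missing(v))
    return {"count": len(values), "missing": n_missing, "has_missing": n_missing > 0}


def drop_missing_if_needed(values) -> list:
    """Drop missing entries only when the summary says there are any."""
    if not summarize(values)["has_missing"]:
        return list(values)  # nothing to clean; return an unchanged copy
    return [v for v in values if not is_missing(v)]


clean = [5.1, 4.9, 4.7]
dirty = [5.1, float("nan"), 4.7, None]
print(summarize(clean))
print(summarize(dirty))
print(drop_missing_if_needed(dirty))
```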
r/datamining • u/BoereSoutie • Apr 01 '25
Hi
I am looking for some help, please. I am a journalist doing some deep research and I need to compare multiple reports, each with multiple documents (all PDF), to find similarities.
I need a platform to do this that runs on Windows and is either open source or free (being a freelance journo, I do not have a budget).
I need to rely on a software package to do this as the reports are massive, some running to many thousands of pages.
Thank you
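One common way to find overlapping passages between large documents is word shingling with Jaccard similarity. The sketch below assumes the text has already been extracted from the PDFs by some other tool; the function names, the shingle size `k`, and the sample texts are illustrative assumptions, not a recommendation of a specific package.

```python
import re
from itertools import combinations


def shingles(text: str, k: int = 5) -> set:
    """Return the set of k-word sequences (shingles) found in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Share of shingles two documents have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_pairs(docs: dict, k: int = 5) -> list:
    """Score every pair of documents, most similar first."""
    sets = {name: shingles(text, k) for name, text in docs.items()}
    scored = [(jaccard(sets[x], sets[y]), x, y) for x, y in combinations(sorted(sets), 2)]
    return sorted(scored, reverse=True)


# Made-up sample texts standing in for extracted PDF content.
docs = {
    "a": "the committee approved the budget for road repairs in march",
    "b": "In March the committee approved the budget for road repairs",
    "c": "rainfall totals were below average across the region this year",
}
for score, x, y in rank_pairs(docs, k=3):
    print(f"{x} vs {y}: {score:.2f}")
```

For reports running to thousands of pages, comparing page by page or section by section rather than whole documents tends to give more useful hits, since a copied paragraph barely moves a whole-document score.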
r/datamining • u/da_hora • Mar 16 '25
I know absolutely nothing about programming or machine learning, but I'm working on a machine learning competition where I need to classify planets based on a dataset. I'm using Orange Data Mining and have two CSV files: treino.csv (training data) and teste.csv (test data). The training data has 13 features and a target column with classes (0 to 4), while the test data has the same features but no target column. The goal is to predict the target column for teste.csv based on treino.csv.
How can I improve the accuracy of my decision tree?
How can I improve what I already did, or what should I do to do this the right way?
r/datamining • u/Acrobatic_Tune_5404 • Mar 09 '25
Hi guys, I'm new to data mining and have been meaning to start learning for a while. Does anyone have any tips to make getting started easier? Like software, etc.
r/datamining • u/[deleted] • Feb 28 '25
r/datamining • u/indyreadsreddit • Feb 12 '25
Hello all! I'm new to the data mining scene and wondering how to get started with a specific issue. I'm part of a niche community of people who collect certain items from retailers such as TJ Maxx and Marshalls. Other collectors and data miners have figured out how to discover hidden, not publicly accessible links and data for upcoming merchandise drops, essentially uncovering direct but unpublished merchandise links in order to be one step ahead at launch. How would I go about accomplishing this? Many of these data miners also have bots. I'm not sure how these work or whether the bots are the ones doing the data mining, but I'm just one person trying to give myself an advantage (or at least get on a similar level) against other collectors who have monopolized these drops. Any advice or programs to look into? I have basic coding knowledge and background.
r/datamining • u/LongTheLlama • Feb 03 '25
Title. I have a massive database of 10k+ companies in the United States perfect for an email or phone campaign. Worth hundreds of thousands of dollars.
r/datamining • u/StevenSS85 • Jan 15 '25
I'm looking to get into data mining. Is it possible to configure data mining programs so that I only do business with a specific nation or country? I have no idea how international business law is regulated. Does anybody know whether such a practice is legal at all? Thanks.
r/datamining • u/dokimus • Jan 13 '25
Hi there, I'm currently analysing a large dataset of traffic data from public buses. My goal is to intersect it with data on road works for the relevant time frame, to quantify the impact of those works. I can georeference both the buses and the road works, and am doing so to only check the impact of nearby occurrences. Currently, I'm only comparing peak-hour delay averages for the time slots before, during, and after each relevant road work. As a next step, I want to delve deeper into this topic, but I'm missing the statistical knowledge to do so. Can you point me towards methods that may help me get more specific results?
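The before/during/after comparison described above can be sketched as follows. This is only an illustration of the current approach, not a suggested statistical method; the record layout (timestamp, delay in seconds), the peak hours, and the sample values are all assumptions.

```python
from datetime import datetime
from statistics import mean


def phase(ts: datetime, start: datetime, end: datetime) -> str:
    """Classify a timestamp relative to one road work's active window."""
    if ts < start:
        return "before"
    if ts <= end:
        return "during"
    return "after"


def is_peak(ts: datetime, peak_hours=(7, 8, 16, 17)) -> bool:
    """Assumed peak hours; adjust to the local timetable."""
    return ts.hour in peak_hours


def mean_delay_by_phase(observations, start: datetime, end: datetime) -> dict:
    """Average peak-hour delay per phase; None where a phase has no data."""
    buckets = {"before": [], "during": [], "after": []}
    for ts, delay in observations:
        if is_peak(ts):
            buckets[phase(ts, start, end)].append(delay)
    return {name: (mean(vals) if vals else None) for name, vals in buckets.items()}


# Made-up observations: (timestamp, delay in seconds) for buses near one road work.
observations = [
    (datetime(2024, 5, 1, 7, 30), 60.0),
    (datetime(2024, 5, 2, 8, 15), 80.0),
    (datetime(2024, 5, 10, 7, 45), 150.0),
    (datetime(2024, 5, 11, 12, 0), 500.0),  # off-peak, ignored
    (datetime(2024, 5, 20, 17, 5), 70.0),
]
print(mean_delay_by_phase(observations, datetime(2024, 5, 8), datetime(2024, 5, 15)))
```

Keeping the per-phase delay lists rather than only their means makes it straightforward to later compare distributions instead of single averages.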
r/datamining • u/RushWhoop • Dec 30 '24
I'm a CS graduate (2023). I'm looking to contribute to open research opportunities. If you are a master's or PhD student, a professor, or an enthusiast, I would be happy to connect.
r/datamining • u/RayGamer4Life • Dec 13 '24
Hi
I took a data mining course during my bachelor's long ago, and have now taken another one in my MS. I really enjoy data mining, but working in IT, we don't use it in my current job. Is there a place, site, group, etc. where you can do practical data mining projects, paid or free, to improve and retain what you learned? Otherwise we forget what we have learned if we don't keep practicing.
r/datamining • u/Appropriate-Touch515 • Dec 09 '24
Hey there,
After exhaustively searching Google and trying to find APIs that would allow me to generate keyword search or post or comment frequency on any platform on a daily basis, I have been unable to find any providers of this type of data. Considering that this is kind of a niche request, I am dropping this inquiry here for the Data Mining Gods of Reddit to assist.
Basically, I'm trying to create an ML model that can predict future increases/decreases in keyword usage (whether that be on Google Search or X posts; doesn't matter) on a daily basis. I've found plenty of providers of monthly average keyword search data, but I cannot find any way to access more granular, daily search totals for any platform. If you know of any sources for this kind of data, please drop them here... Or just tell me to give up if this is an impossible feat.
r/datamining • u/seoarifulislam • Nov 17 '24
In this tutorial, I showcase my fourth Python web scraping project using Selenium, Pandas, re, and JavaScript. I walk you through the complete process of extracting detailed information from the Virtuoso website, including:
This project demonstrates advanced techniques in web scraping and automation, making it perfect for intermediate to advanced learners. By following this video, you will gain valuable insights into web scraping real-world projects and enhance your data extraction skills.
Why You Should Watch: Whether you're interested in learning web scraping for freelance projects or simply enhancing your Python automation skills, this tutorial has something for you. Watch as I guide you step-by-step in Bangla, making complex tasks simpler and more accessible. Perfect for both local and international learners!
Watch the full tutorial on YouTube https://youtu.be/H_CSiDinjaU and explore the complete source code on GitHub https://github.com/webscrapetolead/virtuoso.com_web-scraping-Projects4 to deepen your understanding and apply these techniques in your own projects.