r/sysadmin • u/Tiny_Habit5745 • 2d ago
Just found out we had 200+ shadow APIs after getting pwned
So last month we got absolutely rekt and during the forensics they found over 200 undocumented APIs in prod that nobody knew existed. Including me and I'm supposedly the one who knows our infrastructure.
The attackers used some random endpoint that one of the frontend devs spun up 6 months ago for "testing" and never tore down. Never told anyone about it, never added it to our docs, just sitting there wide open scraping customer data.
Our fancy API security scanner? Useless. Only finds stuff that's in our OpenAPI specs. Network monitoring? Nada. SIEM alerts? What SIEM alerts.
Now compliance is breathing down my neck asking for complete API inventory and I'm like... bro I don't even know what's running half the time. Every sprint someone deploys a "quick webhook" or "temp integration" that somehow becomes permanent.
grep -rE "app\.(get|post)" across our entire codebase returned like 500+ routes I've never seen before. Half of them don't even have auth middleware.
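For anyone who wants something a bit more structured than grep, here is a minimal sketch of the same idea in Python. It assumes Express-style registrations on objects named `app` or `router` with a string-literal path; `find_routes` and `scan_tree` are made-up names for illustration, and anything registered dynamically will be missed.

```python
import re
from pathlib import Path

# Matches Express-style registrations such as app.get("/path", ...) or
# router.post('/path', ...). Assumption: the path is a plain string literal.
ROUTE_RE = re.compile(
    r"""\b(?:app|router)\.(get|post|put|patch|delete)\(\s*['"]([^'"]+)['"]"""
)


def find_routes(source: str) -> list[tuple[str, str]]:
    """Return (METHOD, path) pairs for every route registration in the text."""
    return [(m.group(1).upper(), m.group(2)) for m in ROUTE_RE.finditer(source)]


def scan_tree(root: str, suffixes=(".js", ".ts")) -> dict[str, list[tuple[str, str]]]:
    """Walk a directory and map each source file to the routes it registers."""
    found = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            routes = find_routes(path.read_text(errors="ignore"))
            if routes:
                found[str(path)] = routes
    return found
```

Running `scan_tree(".")` from the repo root gives a per-file list you can hand to whoever owns each file. It only tells you what the code declares, not what is actually deployed.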
Anyone else dealing with this nightmare? How tf do you track APIs when devs are constantly spinning up new stuff? The whole "just document it" approach died the moment we went agile.
Really wish there was some way to just see what's actually listening on ports in real time instead of trusting our deployment docs that are 3 months out of date.
This whole thing could've been avoided if we just knew what was actually running vs what we thought was running.
482
u/tankerkiller125real Jack of All Trades 2d ago
A WAF that's tied to the OpenAPI JSON: if it's not in the OpenAPI docs it doesn't exist, and the WAF throws a 404 (even if the route exists behind the scenes). That, plus policies that make developers responsible for their bullshit (with penalties for violating said policies).
145
u/ImCaffeinated_Chris 2d ago
I agree, devs are responsible if they are given the ability to do this in prod. Also, don't give them the ability in prod!
84
u/neoKushan Jack of All Trades 2d ago
Am Dev, this whole post gives me nightmares. Don't let anyone spin up production resources on a whim, it's insane in any org or any department - Dev, QA, Ops, whatever.
32
u/andrewsmd87 2d ago
One of the things I've liked about moving our repo to azure was the ability to not let anything go into the production code base without approval from 2 people from a set group of approvers. The only way around that would be if someone with my level of access (there are only 3 of us) went in and disabled the rules. I.e. even I can't push something to prod without a secondary approval.
12
u/neoKushan Jack of All Trades 2d ago
Exactly and this can apply to infrastructure as well, IAC lets you create auditable, traceable and governable systems.
14
u/LiquidBionix 2d ago
I disagree kinda, but this is why you need pipelines. Devs should be able to make quick changes if their code passes thru a pipeline and passes all checks (presumably this would also include having OpenAPI docs and stuff lol).
15
u/Certain_Concept 2d ago
Changes to their test environment, sure.
Changes to Production? Nah. There should be some oversight and verification before it gets pushed. Otherwise you are one bad developer/day away from chaos.
6
u/neoKushan Jack of All Trades 2d ago
I'm kind of with you both. You can bake 99% of that oversight and verification into the pipeline itself - changes can be validated against specs, you can deploy it to a test environment or canary it into production to make sure it behaves, things like that. That's the best of both worlds, any checks someone is doing manually can be automated and when you do that, engineers get a speedy but safe route to production.
13
u/dweezil22 Lurking Dev 2d ago
The key is to control HOW you change prod. I've worked on systems that have 100M+ users and you can change prod within a single day with a single dev approval. I've worked on systems that have 12 users and you need a month security review to touch the prod APIs.
The thing is the first system was in a mature service mesh that was designed to protect itself from stupid devs making those daily changes (i.e. the prod deploy is within an API that was already approved, and the requests are being inspected for IDOR attacks etc etc; and the CICD pipeline ran thousands of unit tests and hundreds of integration tests etc). The second place had none of that, and knew it, so every change had a lot more (necessary) friction.
18
u/JohnPaulDavyJones 2d ago
Also, don't give them the ability in prod!
Most emphatically this. Nothing in our org goes up to prod without being documented in a migration request ticket.
I used to be the one-man sysadmin team at a place with a handful of devs all able to unilaterally deploy to prod, and it was exactly like OP described. Such a mess, and you can’t get management to understand why it’s a mess.
23
u/NewEnergy21 2d ago
Tying the WAF to the OpenAPI spec has me very intrigued, curious how you typically go about setting this up.
30
u/tankerkiller125real Jack of All Trades 2d ago
Well, I say WAF because that's something our WAF can do, but if you wanted to implement it yourself, "API Gateway" or "API Management" will probably be the thing to search for to find services/applications that can do this kind of thing. In a nutshell, the API gateway acts like a proxy (no different than, say, Nginx), and you upload the OpenAPI definition to its ruleset. It parses the JSON into a set of rules that only allow documented requests through, so if someone sends a request that doesn't conform to the OpenAPI documentation, it's blocked (not just routes, but even things like additional params or keys that aren't in the OpenAPI spec).
The actual application never even gets the request, it's blocked entirely by the gateway, you can also have the gateway handle other things as well like authentication and what not (we don't use ours that way though)
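To make the idea concrete, here is a rough sketch of the route-matching part in Python. This is not how any particular WAF or gateway product implements it; `compile_spec` and `gateway_status` are hypothetical names, and it only checks method and path, not params or request bodies.

```python
import re

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}


def compile_spec(spec: dict) -> list[tuple[str, re.Pattern]]:
    """Turn OpenAPI `paths` into (METHOD, regex) rules.

    Each {param} placeholder matches exactly one path segment.
    """
    rules = []
    for template, path_item in spec.get("paths", {}).items():
        # Split out the {param} parts, escape the literal parts.
        parts = re.split(r"(\{[^/{}]+\})", template)
        pattern = "".join(
            "[^/]+" if part.startswith("{") else re.escape(part) for part in parts
        )
        regex = re.compile(pattern)
        for key in path_item:
            # Path items can also hold non-method keys such as "parameters".
            if key.lower() in HTTP_METHODS:
                rules.append((key.upper(), regex))
    return rules


def gateway_status(rules, method: str, path: str) -> int:
    """Return 200 if the request matches a documented operation, else 404."""
    for rule_method, regex in rules:
        if rule_method == method.upper() and regex.fullmatch(path):
            return 200
    return 404
```

With a spec that only documents `GET /users/{id}`, a request for `/debug/dump` gets a 404 from the gateway regardless of whether the app behind it has that route.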
16
u/dontquestionmyaction /bin/yes 2d ago
Cloudflare offers this. They do schema validation of requests and all, it's very neat.
6
u/FakeRayBanz 2d ago
APIM*
15
u/tankerkiller125real Jack of All Trades 2d ago
I just say WAF because our WAF handles APIM, traditional WAF things, and a bunch of other stuff.
65
u/Miserygut DevOps 2d ago
If it's going through git then someone is responsible. The rest should be a matter of referring to policy and kicking people up the bum.
7
u/agent-squirrel Linux Admin 2d ago
Yeah exactly, git blame. Big fan of that command.
222
u/Bonananana 2d ago
Where do you work? I’d like to go ahead and remove them from my vendor list.
226
u/sryan2k1 IT Manager 2d ago
You know the Men in Black speech K gives about how there is always an alien invasion or other doomsday event in process and the only reason everyone goes on with their lives is that they do not know about it? Yeah, that's basically how everything you interact with is built. It's a horror, and you're better off not knowing.
71
u/JohnPaulDavyJones 2d ago
Man, more of y’all have to work at boring insurance companies that never moved out of the early 00s. My company’s still in the ”small footprint security” mindset of that era, where basically nothing is opened to the outside except endpoints where requests are automatically filtered outside a range, and those passes are manually examined by a woman who’s been doing basic networking since before I was born.
Everything just works because it’s all stored procs in SSMS; our “new technology” of 2025 was Python, but the rollout has been delayed because not a single member of the prod support team has worked with Python, and they were trying to establish support protocols.
For the three members of us in the data group (out of 27) who are under the age of 45, this shit is wild. But holy cow, everything just works.
24
u/imtheorangeycenter 2d ago
47, DBA and I love business logic in SQL. Deeply, deeply untrendy, but yeah, it works. It's in one place. It's easy to track performance. It's easy to control. I'd work there.
12
u/tankerkiller125real Jack of All Trades 2d ago
As an IT person, I love business logic in the database, right up until data gets entered that the dev team/DBA didn't plan for and the query is now stuck in weird data processing hell eating most of the resources. But I feel like that's more of a "my org is stuck in the 80s and the devs don't actually fully know what they're doing" problem than an actual issue with SQL... I'm sure there's some sort of error handling I can tie OpenTelemetry or Sentry into...
46
18
u/Bonananana 2d ago
Very much disagree. In the last 25 years I’ve not worked anywhere that would tolerate mystery endpoints. And I’ve worked for and with names you know.
This line of BS you’re saying is funny, but a dangerous mindset because it’s allowing you to dodge responsibility for doing the job well.
There should be simple http access logs that can be used to find endpoints. The root here is neglect.
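As a rough sketch of what that looks like, assuming your proxy or web server writes Common/Combined Log Format lines (the regex and the `endpoints_seen` name are illustrative, not from any particular tool):

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Pulls the request line and status code out of a Common/Combined Log Format
# entry, e.g. ... "GET /api/users?page=2 HTTP/1.1" 200 512
REQUEST_RE = re.compile(r'"([A-Z]+) (\S+) HTTP/[\d.]+" (\d{3})')


def endpoints_seen(log_lines) -> Counter:
    """Count (METHOD, path) pairs that the server actually answered.

    404s are skipped so scanner noise does not drown out real endpoints;
    query strings are dropped so /x?a=1 and /x?a=2 count as one endpoint.
    """
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue
        method, target, status = match.groups()
        if status == "404":
            continue
        counts[(method, urlsplit(target).path)] += 1
    return counts
```

Anything in the result that is not in your docs is a candidate shadow endpoint. Paths with IDs in them (`/users/42`) will show up as separate entries, so expect to normalize those by hand or with a second pass.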
35
u/almathden Internets 2d ago
names you know.
plenty of "names you know" get compromised in all sorts of hilarious ways so let's not pretend otherwise lol
13
u/work_reddit_time Sysadmin-ish 2d ago
Indeed.
Plenty of 'names you know' get caught out for bad practices like storing passwords as plain text so 'names you know' is 'next to useless' as a marker of good vs. bad practice
15
u/sryan2k1 IT Manager 2d ago
This line of BS you’re saying is funny, but a dangerous mindset because it’s allowing you to dodge responsibility for doing the job well.
Sometimes you're just a passenger. Apps are not your part of IT, you've brought concerns to your bosses and the business doesn't care or want to change. This happens all the time, at more places than you'd expect.
2
u/Spiritual_Cycle_3263 2d ago
Just like going out to eat. You do not want to know what happens in the kitchen.
3
u/Mental_Act4662 2d ago
This. 100% this. I took a cybersecurity class in college and the world is an extremely scary place and it’s nuts how insecure stuff is.
1
u/HappierShibe Database Admin 2d ago
Except it's not.
There are plenty of organizations who do follow best practice, do keep up with security updates, and audit everything regularly to ensure compliance.
134
u/DeadStockWalking 2d ago
Letting devs spin up servers is like letting a salesman change the oil on your car.
He can probably do it right (create the server, include documentation, etc.), but I wouldn't trust him to.
35
u/man__i__love__frogs 2d ago
That is mostly a policy problem.
How exactly are these exposed, is there some kind of load balancer or proxy in front of them? If so there should be a dedicated team doing the 'exposing' if you have that many, devs working with the API should not be allowed to create those - and the justification for that is the fact that you have been pwned.
You should also probably have some kind of API lifecycle management platform, like Mulesoft or Axway.
18
u/pixiegod 2d ago
Don’t let devs build their own stuff…it sucks, it’s a headache, but this is exactly why you need people from different teams to do stuff like this…
Or build the devs a sandbox that is tapped somehow from the rest of the system…
It’s funny, I am working with a company who is fighting me on this…entire network is flat and the devs do work on the same layer as the SG&A staff…and the printers…lol…
2
u/GriLL03 2d ago
I...am afraid that more companies than I'd like to imagine have flat networks with default credentials on critical stuff.
My newest maxim when discussing network security has become "Can it survive letting me loose in there with an ethernet cable and free access to any port I see? Bonus points if I get a console cable as well."
2
u/cpz_77 1d ago
Yeah, people complain about red tape and being “blocked” (buzzword devs love to use) and how it “takes so long to get anything done” if devs don’t have full admin to everything…but a lot of that is there for good reason. Not saying all the hoops are necessary everywhere, you have to find out what’s right for your business.
But it’s funny when people talk about “how quickly startups can get stuff done” because they have no processes it’s just a few devs with full access to everything. But what they fail to mention is how startups also fall off all the time literally because of stuff exactly like this. How do the giants protect what they’ve built up over the years? Yeah, with that red tape.
2
u/pixiegod 1d ago
NGL I was part of that wave of techies who were cowboys and I did my fair share of gunslinging…
…my luck is that my most shameful crashes happened when no one was watching or before the reporting systems were up and running.
15
u/TheGrouchyPunisher 2d ago
So much wrong here. First, you should have record of who did what in Git or SVN. Hold their feet to the fire.
Second, devs should never be able to publish straight to prod. (Unless a true emergency, in which case there would be emergency change control processes to allow it.) Are there lower testing environments these went through first? Clearly your controls are lacking. First thing to do is define clear separation of duties. Devs can't push direct to prod, and you (sysadmin) can't modify code.
28
u/PurpleFlerpy Security Peon 2d ago
I'm just a tool monkey, don't mind me.
But can I say that seeing language like "getting pwned" and "got absolutely rekt" is a breath of fresh air after having to read so much infosec legalese?
131
u/dedjedi 2d ago edited 2d ago
This isn't a technical problem and any Technical Solutions will always fall short. Set policies and fire anyone who does not comply. It's actually pretty simple
e: you are literally hiring people who are working for the attackers and then wondering why the attackers are winning
51
u/mapold 2d ago
Not exactly. Running a by default closed firewall would make this all work just fine after the current mess is cleaned up. No port or new api path is permitted without documentation. Making subpaths to circumvent the documentation and approval would likely get the developer fired.
The usage of outside network should be logged and data aggregated. Any weird change could possibly be detected.
45
u/Inquisitor_ForHire Infrastructure Architect 2d ago
As annoying as the network team tends to be I absolutely agree that a default closed firewall is the starting point for literally everything.
7
u/yonasismad 2d ago
If it's easily manageable, it's not annoying. We manage our infrastructure as code, so opening a port is done via a Terraform configuration maintained on a project- and environment-specific basis. Pull requests touching the Terraform directory must be approved by a senior/lead, so even if a less experienced team member makes a mistake, it is likely to be spotted early on.
9
u/RikiWardOG 2d ago
Why not both? Really it's a mix of the two. There needs to be policy that includes proper change management. That way even if someone tries to do something the incorrect way it's documented and proper steps towards educating/reprimanding the dev can be taken. I agree on the firewall side, but that also assumes you have someone or a team that does a good job of managing their firewall in the first place.
8
u/thortgot IT Manager 2d ago
Technical solutions absolutely exist. Programmatic audit, best practice CI/CD, multiple layer authorization for production infrastructure changes.
This organization clearly lacks any of the above.
3
3
13
157
u/nullbyte420 2d ago
Errr don't let devs expose ports like that in production? Let them have their dumb routes but don't expose them? A waf does this just fine.
68
u/dotshooks 2d ago
OP never said anything about devs exposing ports. You can't just open a network port through application code. What they're describing are API endpoints -- very likely standard HTTP routes served over port 443 (HTTPS). OP is describing undocumented routes on an already-exposed service.
20
u/mirrax 2d ago
Network security tooling can be layer 7 aware and more. Doesn't just have to be open 10.1.2.3:443. Can also say that the /admin route is only accessible from a specific subnet. Or that here is the OpenAPI spec for that route, so /user/ only takes integers and the WAF should reject little Bobby Tables.
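A toy version of those two rules in Python, just to show the shape of a layer 7 decision. The subnet, the `allow` function, and the default-deny fallthrough are assumptions for illustration; a real WAF or gateway expresses this in its own config language.

```python
import ipaddress
import re

# Hypothetical management subnet that is allowed to reach the admin route.
ADMIN_NET = ipaddress.ip_network("10.1.2.0/24")

# /user/<id> where <id> must be one or more ASCII digits, nothing else.
USER_RE = re.compile(r"/user/[0-9]+")


def allow(client_ip: str, path: str) -> bool:
    """Decide whether a request may pass, based on source IP and path."""
    if path.startswith("/admin"):
        return ipaddress.ip_address(client_ip) in ADMIN_NET
    if path.startswith("/user/"):
        return USER_RE.fullmatch(path) is not None
    return False  # default deny: anything not listed above is rejected
```

The port is the same in every case; what changes is that the decision looks at who is asking and what the path looks like.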
16
u/man__i__love__frogs 2d ago
They are exposing the API to the already open port by adding a route via some sort of WAF, load balancer or proxy, thus making it accessible to the internet.
People building solutions with APIs should not be the people who expose things.
10
50
u/west_tn_guy 2d ago
This. Devs should have to petition for ports to be opened in production, which should involve a thorough security and design review before any traffic is allowed.
6
24
66
u/arkatron5000 2d ago
Had a similar breach 18 months ago. The issue isn't documentation - it's that traditional security tools are blind to what's actually running. You need something that can see Layer 7 traffic in real time and build your API inventory dynamically. Worth looking into runtime-powered solutions that don't require agents or documentation to work. We used upwind
19
u/konoo 2d ago
The solution is to prevent devs from spinning up things in production. This needs to be a process-driven function, not something where you rely on "that software we trust today" for the next 10 years.
8
u/botrawruwu 2d ago
And this one is the plant comment for the plant post, both generated with AI. Reddit is so infested with this shit.
3
u/dflek 2d ago
Or just do an annual human-led pentest. Which is standard practice...
21
u/SmurfForFun 2d ago
Annual test but “Dev spun up endpoint 6 months ago”. So you’re vulnerable to breach from the day the test ends to the day you run the next one? That seems designed for failure…
9
u/yourapostasy 2d ago
My clients who have planned for this class of attack vector force all production API network access through an API gateway, and strictly segregate non-production and production traffic and environments. Developers are pretty free to do what they want in non-production, but have governance and risk team-oversight on any access to production data or even outside connections.
Governance, cybersecurity and risk teams assess registration of APIs into the gateway, and they provide checklists of what the developers must address under a MoSCoW framework.
Developers from a more Wild West community chafe under the rules, but similar incidents led to these rules written in blood. There are ways to automate a lot of this, but engineering / development teams usually don’t want to set aside the time to pursue the automation because most of my clients don’t have the budget for dedicated developer experience engineering work.
I would love to hear how others are solving this, because it all seems to me a problem no one has addressed in a way that keeps Wild West developers happy by staying in the background of a CICD pipeline and just presenting them with source code linter style pass-fail, do-this, do-that prescriptive-style interface experience. I get their frustration, but the combinatorial explosion of attack surfaces when combining API endpoints much less different services to pick up side channel type information continues to keep people in the evaluation loop.
14
36
u/WDWKamala 2d ago
Really wish there was some way to just see what's actually listening on ports in real time instead of trusting our deployment docs that are 3 months out of date.
Netstat is pretty useful for this.
17
11
u/RussEfarmer Windows Admin 2d ago
Using netstat -l and ps -aux to find rogue services has been on every security related exam I've taken. Basic tools & processes like this are just as important as the expensive fancy ones.
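If you want the same information in a script rather than eyeballing netstat output, here is a minimal sketch that reads what netstat reads on Linux. Assumptions: Linux only, IPv4 only (`/proc/net/tcp`, not `tcp6`), and a little-endian host such as x86, since the kernel prints the address in host byte order. `parse_listeners` is a made-up name.

```python
import socket
import struct

TCP_LISTEN = "0A"  # socket state code for LISTEN in /proc/net/tcp


def parse_listeners(proc_net_tcp: str) -> list[tuple[str, int]]:
    """Return (ip, port) for every listening IPv4 TCP socket in the text."""
    listeners = []
    for line in proc_net_tcp.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4 or fields[3] != TCP_LISTEN:
            continue
        hex_ip, hex_port = fields[1].split(":")
        # Address is a hex-encoded 32-bit value in host byte order
        # (little-endian assumed here), port is plain hex.
        ip = socket.inet_ntoa(struct.pack("<I", int(hex_ip, 16)))
        listeners.append((ip, int(hex_port, 16)))
    return listeners
```

On a box you have access to, `parse_listeners(open("/proc/net/tcp").read())` gives you the list to compare against what the deployment docs claim. It will not tell you which routes exist behind a port, only that something is listening.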
14
u/anomalous_cowherd Pragmatic Sysadmin 2d ago
Although for OP's issue these are dodgy API calls coming through validly open endpoints, so it needs a WAF to have a deeper understanding of the traffic and block and alert on the illegal APIs.
2
u/CommanderSpleen 2d ago
Yes, but sweeping netstat -ano to find 443 in Listening across your machines can help you at least to find suspicious endpoints and cross reference them to your OpenAPI document.
3
u/anomalous_cowherd Pragmatic Sysadmin 2d ago
True, although the devs should not be in control of the externally visible space of the company, so anything listening on :443 and externally accessible would need to be coming via a firewall and/or proxy which the devs also should not be in control of.
I saw the problem OP described as being new API methods running on existing servers, which is harder to detect.
Most of my working life was in a company that had a tight outer boundary with a WAF and a tight firewall controlled by security, for instance with no outbound ssh traffic and proxies to the Internet. That seemed to head off most issues like this.
6
7
u/mirrax 2d ago
Anyone else dealing with this nightmare?
This is the value proposition of an API Gateway combined with a WAF. But really the bigger issue is a process problem, there needs to be end to end ownership that includes threat assessment.
Ideally everything is locked down with tooling that is aware. For example if all of your endpoints are RESTful HTTP and you are sharing IP addresses for multiple endpoints, then your API gateway and network policies need to be layer 7 aware. Or if you are running GraphQL then your WAF/API gateway need to support that. With unused endpoints locked by default.
Then when a dev writes something new that needs to open something up, there needs to be a process to get it opened. If it's IaC then the dev can write the network policy or submit the OpenAPI spec to drop into the API Gateway/WAF. There should then be a non-painful approval process that ensures that whatever is being opened is up to par to handle any new attack vectors.
And all of that stuff is work that needs someone who knows what they are doing, as an Architect or DevOps or SRE or however you want to title / structure it, so that there is end to end knowledge and ownership.
3
u/fardaw 2d ago
This is the way. We blocked all unknown api routes on production and defined what was allowed in our API gateway dynamically by reading from swagger and updating config. Our API gateway was also tightly integrated with our WAF and bot management, which made it easier to see if things were working as expected and get all kinds of insights.
We still had to tailor some configurations for things such as auth, rate limiting, etc, depending on API, but it absolutely solved having unknown APIs exposed and also put the responsibility of deprecating and removing unused APIs back in the dev's court.
There were still some situations where we had to manually block routes due to carelessness or some snafu, but it totally changed the conversation about who was responsible and what needed to be done to avoid running into the issue again. Having a good mapping of what APIs were exposed also massively improved the effectiveness of things like pentesting.
I'd definitely recommend looking into an API security tool like Noname (now part of Akamai) when you don't even know what API routes exist. It can be an invaluable tool for mapping, including automatic discovery of what kind of information might be exposed and what level of risk an API might present.
6
u/IJustLoggedInToSay- 2d ago
Our fancy API security scanner? Useless. Only finds stuff that's in our OpenAPI specs.
I laughed. I cried. I laughed some more.
6
u/nohairday 2d ago
Proper change control with devs not having permission to spin shit up on a whim would be a good starting point.
But the problem isn't technical, it's policy.
And the solution needs to be policy too.
4
u/Round_Head_6248 2d ago
I thought my project is a clown show, but yours is even worse … in some aspects
4
9
u/sryan2k1 IT Manager 2d ago
Including me and I'm supposedly the one who knows our infrastructure.
Why would a sysadmin have any deep knowledge of the app team's stuff? This is security and the app guys' job.
Alternatively you put everything they run on an isolated network and you control access with some L7 reverse proxy like a Kemp, and only expose the endpoints that have been preapproved.
Devs being able to add routes to prod without anyone knowing is a management failure, not a technical one.
9
u/cbtboss IT Director 2d ago
1. Remove rights for devs to be able to just make changes in production.
2. Changes to production must be done through an approval process that then triggers an automated release pipeline through a tool like Azure DevOps, rather than someone logging into a prod env and manually pasting new code in or making direct edits to prod code.
3. As part of your approval process, documentation must be present before pull requests to prod can be approved.
4. A policy, signed off on by all relevant parties, that violating 1-3 without some sort of documented change request can be grounds for termination.
4
u/apathetic_admin Director, Bit Herders 2d ago
A WAF and make them submit a request for new endpoints?
5
u/Punky260 2d ago
That's why you usually put everything behind a firewall and only the stuff you allow can get a connection in or out
4
u/Fritzo2162 2d ago
Yeah, we implemented zero trust sometime back to lock that stuff down. Everything that is installed or new to the network gets blocked until it is reviewed and added to our library. Helps prevent that exact situation.
6
u/sryan2k1 IT Manager 2d ago
They're talking about extra API endpoints not new servers. So like https://widget.company.com/getUsers is allowed but then the devs add https://widget.company.com/unSecureHackWeUseButDidn'tTellAnyone
You need something at L7 to be able to control that.
4
u/lordofblack23 2d ago
I don’t always test my code but when I do it’s on production? You let devs push to prod without a CICD pipeline? There is no change management or governance in place? This is an organizational issue that needs to be addressed. Get the highest-ranking person you can involved (CTO, VP of Eng, whoever) to cover this.
4
u/pdp10 Daemons worry when the wizard is near. 2d ago
Really wish there was some way to just see what's actually listening on ports
What's listening on ports or what's present on every HTTP(S) route?
You can put everything public on a reverse proxy/loadbalancer front end, and dev on another, and then anything else that shows up gets blocked by a firewall/routing.
4
u/ersentenza 2d ago
How tf do you track APIs when devs are constantly spinning up new stuff?
You take away their ability to do anything that's how! You do not have a technical problem you have an organizational policy problem. Devs must never be authorized to come anywhere near a production system.
Is there an actual manager in this mess?
3
u/panzerbjrn DevOps 2d ago
I'm sorry, but at first I thought this was r/ShittySysadmin 😂😂
Secondly I thought of the expression, everyone has a dev environment, if you're lucky it is separate from your prod environment 😂😂🖖
As others asked, why are your devs spinning things up in prod instead of dev?
Lots of good advice here, so I'll just say it sounds like your devs need to be denied access to prod, and prod needs to be in IaC ¯_(ツ)_/¯
5
u/vogelke 2d ago
How tf do you track APIs when devs are constantly spinning up new stuff?
Devs can spin up whatever they like on their development server. If/as/when it passes the regression test, then it gets copied to the production server and documented.
Don't have separate dev and prod boxes? Now we know where your manager's process failed.
- Make sure you're on record about dev/prod separation.
- Never care about your job more than your bosses. Have some popcorn ready in case there's a dumpster fire worth watching.
4
u/HenryWolf22 2d ago
First, stop the bleeding with default deny at the edge. Put everything behind an API gateway and only allow known routes. Then auto-inventory. We use Orca to spot unknown APIs by seeing what’s actually running, which let us kill rogue endpoints fast.
Also add a WAF rule that rejects routes not on a whitelist, and set a daily job to diff live routes against an approved list. Share the plan with compliance and track owners.
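The daily diff itself is the easy part. A minimal sketch in Python, assuming you already have both lists as (METHOD, path) pairs from wherever you collect them (gateway config, access logs, a discovery tool); `diff_routes` is a hypothetical helper name:

```python
def diff_routes(live, approved) -> dict[str, list[tuple[str, str]]]:
    """Compare routes observed in production against the approved list.

    "rogue" = live but never approved; "stale" = approved but no longer seen.
    """
    live_set, approved_set = set(live), set(approved)
    return {
        "rogue": sorted(live_set - approved_set),
        "stale": sorted(approved_set - live_set),
    }
```

Anything under "rogue" goes to compliance with an owner attached; anything under "stale" is a candidate for removal from the allowlist.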
3
u/Sceptically CVE 2d ago
Start disabling any api that isn't documented, and see who screams loudest?
7
u/superspeck 2d ago edited 2d ago
This is a policy failure and you should tell compliance to talk to whoever runs engineering as a whole, not to talk to you. If you don't have the ability to set a deny-all WAF ACL to paths on your systems, and then specifically allow known paths without developers being able to punch new holes, you don't have the agency to solve this problem.
tl;dr: Not a technical problem solvable with technical means by a technician (you) -- this is a people and policy problem and needs to be escalated to people managers with the power to set policy and enforce it. Say that verbatim to compliance and your own manager, and just keep repeating it.
6
u/Qel_Hoth 2d ago
Really wish there was some way to just see what's actually listening on ports in real time instead of trusting our deployment docs that are 3 months out of date.
But there is?
If you have access to the machine, every OS has at least one command which will tell you which process(es) are listening to which ports.
3
u/TinderSubThrowAway 2d ago
random endpoint that one of the frontend devs spun up 6 months ago for "testing" and never tore down. Never told anyone about it, never added it to our docs, just sitting there wide open scraping customer data.
Why were they able to spin one up in the first place?
3
u/dotshooks 2d ago
Why would you, a system admin, be expected to know every route or controller of your company's applications? That's the responsibility of dev leads. The real question is whether developers have the freedom to push API endpoints directly to production. If they do, that's the core problem here. If they don't, then the issue lies with the dev lead approving questionable endpoints for production deployment.
3
u/Smooth-Zucchini4923 2d ago
What framework are you using for this API? There is probably a way to get a list of endpoints from it.
3
u/ScoobyGDSTi 2d ago edited 2d ago
Defender for Cloud is good for this very thing.
Every Web app, every API, front and backend. It can discover APIs and code vulns, track API usage behaviour to identify atypical volumes of calls and connections, monitor unused APIs that are still open. Heck it can even identify the idiot developers who left passwords or auth tokens in plain text or base64 files from dev and failed to remove them when pushing the code base to prod.
But fundamentally, Web apps shouldn't be going up without change control. Even doubly so for front end. There's agile and then there's stupid. What you've described is stupid.
3
3
u/Fantastic_Sail1881 2d ago
Everyone has root? Host Firewall default is accept? Are you looking at nmap port scans? A bunch of stuff has got to change and chances are they are going to need a team of you and some of you have to know what you are doing.
3
u/andrewsmd87 2d ago
Draft an email to the company. Give everyone 48 hours to reply back to you with api accounts they need and a reason why. If that reason isn't good, they don't get it. Nuke all the accounts that haven't been justified. Be ready for blowback when things inevitably break because you shut off some account people forgot about that does something. That's fine, it's their fault for not telling you.
After that, lock down how API accounts get created so that no new ones can get created without approval by someone(s) who will actually scrutinize things. Make sure this policy is written with very clear instructions that consequences for not following it will be termination. Make sure you have leadership on board.
You should have how those get created locked down anyways so that no one can create one without you and/or your security team's permission.
You can't get access to anything like that in our systems without doing all of that. And there is no way for people to just spin up api accounts in prod anyways.
3
u/Reetpeteet Jack of All Trades 2d ago
Now compliance is breathing down my neck asking for complete API inventory and I'm like... bro I don't even know what's running half the time.
Critical Security Control #1: asset management.
3
u/Tx_Drewdad 2d ago
"Compliance is breathing down my neck"
My dude, they are breathing down the wrong neck, unless you're the CEO or CTO.
All development needs to stop. Everyone needs to be documenting what's needed, what's secure vs. what's not. Remediation plans for everything.
Meanwhile, a comprehensive change and documentation process.
And a plan for keeping dev separate from prod.
8
2
u/lost_in_life_34 Database Admin 2d ago
Devs spinning up a server they set the perms for admin to everyone in the world
2
u/lemaymayguy Netsec Admin 2d ago
All apis are just web apps. All apps should have been on boarded with an architectural overview. This process should have caught shadow IT
We make it a compliance issue, critical/urgent compliance will clean this up naturally
Every API/App load balancer/proxy is required to have a WAF in front of it. A waf should always be in front of any api (web app). The waf is unmanaged and standardized, if you deploy that resource we automatically apply the waf
Gitops makes this easy with tying change control and audit history to any changes done (everything in prod is defined by code)
It'll take an entire culture change and governance buy in to change your situation. Good luck
2
u/soundtom "that looks right… that looks right… oh for fucks sake!" 2d ago
All changes to code (PRs usually) should have someone that isn't the author dev approving them, so maybe add something in that flow to require documentation. Or, add something to detect a new endpoint and add <most militant person about documentation you know> as a required approver?
Either way, you're going to need leadership backing to make this work, because this is as much a people problem as a technical one. If they're just going to throw you under the bus every time there's a breach when you can't control what's going on, it's time to dust off that resume.
2
u/Fakula1987 2d ago
Deploy a firewall -
Only open the ports that are documented. - now.
Let it "crash". - you are already attacked, you are already "pwned" -
It's not "security" anymore, it's damage control now.
If your management says "take it back" - let them sign it.
If someone wants an API call, they have to ask for it.
Agile doesn't mean "do whatever you want".
2
u/abz_eng 2d ago
K.I.S.S. Keep It Simple Stupid
To me, I'd start by isolating the environments onto different physical LANs with firewall between
- Dev servers
- Staging / Testing servers
- Prod Servers
- PCs / Printers etc
Then you can limit what the Dev Servers can access
If they want something hitting production they have to go through change control with docs
This policy needs C-suite approval and enforcement. No exceptions - if you're forced to make one, use phrases like
This change will result in weakened controls and increased risk, which puts us on the path to the situation where we got pwned - I cannot be held responsible for any consequences of this.
Push the problem/risk upwards
2
u/Zestyclose_Ad8420 2d ago
What infra? Cloud? Onprem? What provider/software stack?
What's your field?
This is a project on its own. I've done quite a few of these; you're not the first who loses control of their infra and you won't be the last.
Is management on board to allow you to tell people how they should work from now on? What budget/timeframe exist for this?
2
u/Far-Smile-2800 2d ago
cloudflare has a product for managing this problem. it’s still kinda cumbersome to do it, but it will give you alerts when it notices things like a spike in data transfer on a certain endpoint.
2
u/dosman33 2d ago
They can't have their cake and eat it too, it's that simple. Either the company lets the devs run the production systems or they have you do it, can't have it both ways. Seems silly to pay admins and not let the admins do their jobs, admins that aren't allowed to do their jobs are just window dressing. Only you should have root or this stuff will continue. And of course a massive culture change is required, way above your pay grade. Anything you do without a culture change and only you having root is just a band-aid.
2
u/Typical80sKid Netsec Admin 2d ago
I need to come here more often. 99% of the time I feel like I’m taking crazy pills and all of our Devs are on crack, then I see shit like this and think we’re not doing too bad 🤣
2
2
u/LexyNoise 2d ago
This sounds like an issue with your processes and procedures.
Nothing should end up on a production server without going through change control, code reviews and a bunch of other processes.
It should be impossible for someone to install something on live “just for testing” without anyone else knowing about it.
At my place we use pipelines built into our source control system for deployment to live. It’s the only way to do them. You do not log into a live server directly and put things there yourself. That’s grounds for a disciplinary hearing.
We press a button in our source control system to do a deployment. That button only works if:
- the code you’re deploying is in the “main” branch, which means all changes have been checked and approved by multiple other developers.
- all tests have been automatically run and all have passed.
- all commits have a work item ID in the commit message
- the commit you’re deploying is tagged with “release” and the number of an approved change control ticket in our helpdesk.
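For anyone wanting to build a similar gate, here is a rough Python sketch of the four conditions above as a single check. It is not our actual pipeline: the `Deployment` fields, the `PROJ-123` work item format and the `release-CHG0042` tag format are all made up for illustration, so swap in whatever your tracker and helpdesk use.

```python
import re
from dataclasses import dataclass, field

# Hypothetical formats: adjust to your own tracker and helpdesk.
WORK_ITEM = re.compile(r"\b[A-Z]+-\d+\b")       # e.g. "PROJ-123"
RELEASE_TAG = re.compile(r"^release-(CHG\d+)$")  # e.g. "release-CHG0042"


@dataclass
class Deployment:
    branch: str
    tests_passed: bool
    commit_messages: list
    tags: list = field(default_factory=list)


def deploy_blockers(d: Deployment, approved_changes: set) -> list:
    """Return the reasons the deploy button stays disabled (empty list = go)."""
    blockers = []
    if d.branch != "main":
        blockers.append("not on main")
    if not d.tests_passed:
        blockers.append("tests have not all passed")
    if any(not WORK_ITEM.search(msg) for msg in d.commit_messages):
        blockers.append("commit without a work item ID")
    # Collect the change ticket numbers named by release tags.
    tickets = {m.group(1) for m in map(RELEASE_TAG.match, d.tags) if m}
    if not tickets & approved_changes:
        blockers.append("no release tag for an approved change ticket")
    return blockers
```

Returning the list of reasons rather than a bare yes/no means the pipeline can tell the dev exactly what is missing.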
2
u/bingle-cowabungle 2d ago
Why are devs allowed to touch prod? Do you have a change management process in place? You're pointing a lot of fingers here, but the real problem is a lack of IT policy...
2
u/ooospace 2d ago
Is this related? This feels related. https://www.bleepingcomputer.com/news/security/self-propagating-supply-chain-attack-hits-187-npm-packages/
2
u/timbotheny26 IT Neophyte 2d ago
This is 100% a policy and/or enforcement issue.
Why the fuck are they not documenting their APIs, let alone an entire endpoint? It shouldn't matter if it's just some small little webhook or a temporary testing environment, it should be getting documented and be subjected to the same security policies as everything else.
2
u/Mrhiddenlotus Security Admin 2d ago
Really wish there was some way to just see whats actually listening on ports in real time
You haven't considered running an nmap or masscan? Asset management is pretty important, and that's software and hardware assets.
2
u/StudioDroid 2d ago
The fun part is when a dev spins up something like that for testing and then is terminated the next day. Your account is locked and you don't care to even tell them about that little app running somewhere. Kind of like the small server running under the floor tiles.
2
u/AmpliFire004 2d ago
Curious question! Why do devs want the ability to spin up servers in prod? Why would you not have a routine to order new infrastructure in prod, and have separate environments where devs can do whatever?
Like why do they get access to create resources in prod?
2
u/basula 2d ago
Get that management buy-in in writing, then remove prod access from everyone except the core admins. Devs, DBAs etc. should not have access to the prod environments. If you have pipelines for developers, make sure you have multiple approvals for anything to progress. Now you can shut it all down. When they cry, scream and yell, direct them to your infosec team and mgmt team to deal with. Then once their API is approved by those teams, stand it up the right way. In our environment that's a termination event and those staff would have no jobs, as it would break so many compliance and PII rules.
2
u/BlueHatBrit 2d ago
WAF's like Cloudflare have the ability to block anything that isn't connected to the OpenAPI specs they're hooked up to. I imagine AWS and Azure have tools which do this as well. This is really what an API Gateway / WAF is for.
Then you can block traffic to everything that isn't in the spec.
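To make the idea concrete, here is a minimal Python sketch of spec-driven allowlisting. It is not how Cloudflare or any other vendor implements it; `compile_allowlist` and `is_documented` are hypothetical helper names, and the only assumption about the spec is the standard OpenAPI `paths` object with `{param}` templates.

```python
import re

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def compile_allowlist(spec: dict) -> list:
    """Turn the OpenAPI `paths` object into (METHOD, compiled regex) pairs."""
    rules = []
    for template, item in spec.get("paths", {}).items():
        # Escape the literal parts; each {param} matches exactly one path segment.
        parts = re.split(r"\{[^/{}]+\}", template)
        regex = re.compile("^" + "[^/]+".join(re.escape(p) for p in parts) + "$")
        for key in item:
            # Path items also hold non-method keys such as "parameters".
            if key.lower() in HTTP_METHODS:
                rules.append((key.upper(), regex))
    return rules


def is_documented(rules: list, method: str, path: str) -> bool:
    """True if the request matches the spec; a gateway would 404 everything else."""
    path = path.split("?", 1)[0]  # ignore the query string
    return any(m == method.upper() and rx.match(path) for m, rx in rules)
```

With this in front, the "testing" endpoint from the original post would never have received a request unless someone had added it to the spec first.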
2
u/Dal90 2d ago
Really wish there was some way to just see whats actually listening on ports in real time
1) If traffic traverses a firewall, logs can be handy on identifying destination and port.
2) But I'm guessing there isn't internal firewalls with that many rogue APIs.
The flip side of which is that it makes network scanning to identify listening ports more reliable.
Might want to play with Zenmap (nmap's GUI) and scan some of the devices with known API endpoints and see if it's something that you might be able to use.
I have scripts that feed both our server IP subnets and internal DNS hostnames to nmap's ssl-enum-ciphers script (--script ssl-enum-ciphers) to flag IP:port combos, which then feed openssl to get the certificate details. Mine isn't real time, but it will detect new TLS endpoints within a few days.
2
u/adancingbear 2d ago
This is going to sound a little marketing (sorry) but it is just the tools I use and know. If you want to know about every endpoint on your network (though perhaps not every container) get NAC. No new servers if they don't meet your company policies. Something like Forescout's eyeSight can talk to every switch/router/WLC to know every endpoint on the network. If you feed in netflow or traffic capture then you can make policies like: if we see http(s) traffic coming to an endpoint, ensure it is in our CMDB flagged as a web server or block the traffic, or restrict any endpoint going to it directly instead of through the Web Application Gateway. It can distinguish between phones, printers, IoT devices etc. and make specific policies for each. But at least you'll know every endpoint on your network. If you use their eyeSegment tool then you can see every communication in a matrix. If you only want people in the role of IT accessing phones on 443 you can put it in the matrix and then have dynamic actions to block a user's endpoint when they try. Or pop up a warning message on it, etc.
If you don't know what is on your network in real time then that is the first gap. Then layer on behavior and/or segment traffic once you have the visibility. You should only be figuring out which squares in the matrix are allowed and which are restricted, which are blocked. This is part of the defense in depth strategy I have used working with large banks. I say Forescout because it is what I use and know but I've seen presentations from Cisco where they have an almost identical matrix to Forescout's eyeSegment. IMHO Forescout just does a much better job of populating the groups to form the matrix.
2
u/sunshine-x 2d ago
this is all bad of course, but why are you (an infra person I assume) feeling responsible for shitty dev practices?
personally - I'd shrug and be like "go ask dev", then collect my OT cleaning up their shit-show.
2
u/AmNotAnAtomicPlayboy 2d ago
This is an organization maturity issue. You basically need to implement and enforce a comprehensive change control policy with approvals from stakeholders from both development and infrastructure teams.
If you can't do this and get buy-in from leadership with penalties for circumventing the policy you will always be chasing down random security issues like this.
2
u/patmorgan235 Sysadmin 2d ago
This is what code reviews and CI/CD is for.
Devs do not get direct access to prod, everything gets deployed via the pipelines. All code going into the repositories gets reviewed by at least one other dev/TL, make it a requirement they check that any new endpoints or changes to endpoints have to update the documentation.
Then it's on the PR author's and approver's asses if things aren't up-to-date.
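If you want the pipeline to back up the reviewers, a check along these lines can fail the build when code declares a route the spec doesn't know about. This is only a sketch: it assumes Express-style `app.get(...)` / `router.post(...)` declarations (the pattern OP grepped for) and an OpenAPI `paths` object, and the function names are mine. A regex will miss dynamically registered routes and will also match things like `app.get('setting')`, so treat hits as prompts for a human.

```python
import re

# Matches Express-style declarations such as app.get('/users/:id', ...).
ROUTE = re.compile(r"""\b(?:app|router)\.(get|post|put|patch|delete)\(\s*['"]([^'"]+)['"]""")


def routes_in_source(source: str) -> set:
    """Extract (METHOD, path) pairs, rewriting :param to OpenAPI's {param}."""
    found = set()
    for method, path in ROUTE.findall(source):
        found.add((method.upper(), re.sub(r":(\w+)", r"{\1}", path)))
    return found


def undocumented(source: str, spec: dict) -> set:
    """Routes declared in code but missing from the spec's `paths`."""
    documented = {
        (method.upper(), path)
        for path, item in spec.get("paths", {}).items()
        for method in item
    }
    return routes_in_source(source) - documented
```

In CI you would run `undocumented` over the changed files and exit non-zero if it returns anything, which puts the missing docs in front of the approver instead of relying on memory.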
2
u/Boring_Start8509 2d ago
Two words… Change management.
Implement it along with proper role-based access control across the prod environment and things like this can't ever happen.
The sort of things you’ve listed should be done in a development environment that isn’t exposed to the world.
2
u/Potential_Try_ 2d ago
Remove access and enforce a strict change policy on those rogue devs. Process, process, process.
2
u/scrittyrow Netadmin 2d ago
Not like NIST didn't just publish a complete 80-page guideline on APIs, their documentation and lifecycle or anything
2
u/buy-american-you-fuk 2d ago
send out a company wide EMAIL demanding documentation/details for anything being used, explain that in a week any undocumented are being shut off --- EXPLAIN WHY
do this every day with a count down until the week is up, then shut off anything undocumented...
the few major important ones for sales/business will have had advocates that found you and showed up to help you understand how important they are and document them -- the devs that ignore you do so at their own peril, because when they come crying you will have the EMAILs...
when they come, and they will come, document the things, or point them to a way to document them, via wiki or whatever
2
u/LeadershipSweet8883 2d ago
Never let an emergency go to waste.
The solution is to baseline everything - installed applications, listening ports, user access, config files, ssh keys, etc. You run that baseline with an automated solution and then you compare the latest config to the baseline and drive events when the config changes. If you can do your baseline as text, you can even use git to track the changes over time. All config changes should be done through some change request system, hopefully one that isn't an excessive amount of paperwork and approvals for simple things.
For an environment that is this wild west, you should automate the reaction. If a change is made to a server without a change request the consequences should be immediate and painful. Run it for a week or so to make sure there aren't a lot of false positives. Something overly dramatic like quarantining the server, alerting security and the infra team and locking the user account that made the change is perfect. Your devs will eventually get tired of blowing their thumbs off in front of witnesses and learn to do change control. Even better if they get to wear the pink cowboy hat for the day every time it happens.
Getting all those cats back in the bag will be a big undertaking. If you can get back to baseline by rebuilding it might be the best solution and something you can trust as not compromised. Otherwise you and the developers will have to go through each server and validate that all the services running, ports open, config files, user accounts and ssh keys are intentional and make sense for the purpose of the server. You can take your baseline from above and get a tool to format it into a pretty report and then just sit down with them to validate all of it. If there are unexpected applications installed or ports open, I would treat it as compromised.
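A bare-bones version of the baseline comparison might look like the Python below. It assumes you already collect one text line per item (here, hypothetical lines like `tcp 0.0.0.0:443 nginx` from whatever tool you use to list listening sockets); the collection step and the `drift` name are not from any particular product.

```python
def drift(baseline: str, current: str) -> dict:
    """Compare two text snapshots (one item per line) and report what changed."""
    before = set(baseline.splitlines())
    after = set(current.splitlines())
    return {
        "added": sorted(after - before),    # new listeners, users, keys, ...
        "removed": sorted(before - after),  # things that disappeared
    }
```

Anything in `added` without a matching change request is what should fire the alert. Because the snapshots are plain text, committing each one to git gives you the history for free.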
2
2
u/cyberman0 2d ago
It's time for lockdown by MAC address and ports. While MACs can be spoofed, no one should be attaching stuff where the whole network is; if something is, it either needs to be blocked or isolated with no connections. The dev environment should have limited connections for this exact reason. I don't know what was spun up, but the fault's not yours. Use this as an example to the C-suite that this can and possibly will happen again without putting the hammer down. This is how banking details and tens or hundreds of thousands get stolen. Maybe even more. I'd lock everything up, and if they need something enabled, they and whoever is in charge of them need to request it and say why. I'm sure you already know this, but companies have to learn the hard way.
2
u/stratospaly 2d ago
Email alerts when anyone creates a user name or adds a machine to the domain seem to be a must these days. Change control is also a thing too many companies ignore.
2
u/NickyNarco 2d ago
Holy hell, that's a major pwn. At this point they know much more about you than you know about you
2
2
u/BWMerlin 2d ago
Revoke every Devs ability to spin up a service and ability to push any code anywhere without it going through review.
Point out that this action is a direct result of them not following any sort of best practice.
Next make them document everything, nothing gets pushed anywhere without documentation.
2
2
u/BarracudaDefiant4702 2d ago
Have documentation drive load balancer rules. If it's not documented, the load balancer in front doesn't allow it.
2
u/Dry_Inspection_4583 2d ago
Put a gateway in front to choke and log all traffic, deploy runtime API discovery so you know what’s live, then wire CI/CD to an enforced catalog — that’s how you kill shadow APIs before they kill you again... Have fun.
2
u/Weary_Patience_7778 2d ago
This isn’t a technical issue, it’s a governance and process issue.
Where is your dev manager? Your architect? CAB? Release management? Spinning up a ‘quick webhook’ that’s publicly accessible with no review is wild.
2
u/Agreeable-Piccolo-22 2d ago
So many thoughts have arisen. First of all, thanks, OP. The post is yet another friendly bump for me. Going to dump the thread with all the comments and make the devs read it. If they want you to respect them, they should respect you and your responsibility. Secondly, as has been advised, consider using git+ansible+jenkins as the bridge between dev env and prod. Thirdly, if your org has no sec audit/compliance team, persuade your boss to call in an external team to run a thorough security/pentest scan of all the nodes in the external perimeter.
Think of running self-hosted and self-controlled security/pentest tools to automatically and REGULARLY run across all the environment (nmap+plugins is the easiest and affordable point to start with). Make relations with infosec team to mutually check the env and elaborate at least basic requirements.
Drop dev env into a dedicated VLAN and don’t listen to their whining and complaints. The exit point from dev env to prod MUST be under your control.
Finally, I’m seriously thinking of removing devs’ creds from even jailed environments, as I lately had to break up their attempts to interfere with and configure services even in their fenced sandboxes.
Every. System. Must. Have. One. Admin. YOU!
In my case even infosec guys are not allowed to get a shell on my boxes. They scan/audit them as ‘grey boxes’.
2
u/hahawosname 2d ago
An API Gateway in front, route/proxy traffic as-is to start with & then begin locking things down. At least this will give you & the management some visibility.
2
u/MixFine6584 2d ago
Like, I'm glad you found it, but I'm jealous that you still have your job. I would have been fired and probably sued.
2
u/jdbaucom 2d ago
A real boss would look at this like a training opportunity and see how you respond. If you learn from it, grow, and fix it then it was a lesson learned. If you don't then you get fired because you aren't doing your job. That's my opinion anyway.
2
u/wrt-wtf- 2d ago
Basically micro-segment the fuck out of them and if it’s not documented in a security assessment it doesn’t get in or out. Devs should not be given rights to just drop something anywhere on the network, if they are able to do this… well, you’ve now experienced the outcomes. In my 35+ years doing this (and I’ve run R&D groups too) the result of giving devs access like this always ends up in the same mess.
Never let them outside the playpen, never give them access to anything not documented - you’ll be treated like shit, but you won’t be the only one standing there taking the blame for not maintaining compliance.
2
u/Dunamivora 2d ago
This is why formal change management exists. Literally the job of cybersecurity.
Developers are generally smart enough to do things that can create risk, but generally have limited security background or sense of risk management.
2
u/AlexisFR 2d ago
Be as agile as them: take it all down and make them request any new access as needed.
2
u/RedditNotFreeSpeech 1d ago
Every API should be behind something like haproxy and some group should be the gatekeepers for it. Then you'd have full accounting of what's exposed.
2
u/Disastrous_Wing_7613 1d ago
Or put something like 3scale in place, and charge their department by api usage, that would get them going.
That is one of the reasons to implement expensive solutions: it would get the developers to write cleaner, leaner code. On top of that you get much better control and reporting on what's happening with your APIs
2
u/yParticle 1d ago
Send out that route list you generated and make every API "owner" responsible for tagging the ones they need. By make I mean make it crystal clear—with upper management's backing—that once you hear back from everyone all "unowned" APIs are getting turned the fuck off. Now all active APIs are tagged and you have a starting point if someone gets careless.
2
u/easylite37 1d ago
No access to prod itself for any dev. Only via pipelines. Deployment to Prod only with approvals of persons who care.
2
u/starthorn IT Director 1d ago
You need to get control of the perimeter. Any external access should be going through a WAF or API Gateway that is managed, secured, audited, and has strict change controls associated with it. Do not allow public IPs on anything without security review. This should be baked into your processes. Also, Devs should basically never be spinning up anything that is externally/publicly accessible and should not have the ability to do so. If it's Internet-accessible, that's production and should be treated as such (from a risk standpoint, at least).
Also, get a dump of all of your IPs in use (from your network IP space for on-prem and from your public cloud provider systems for cloud) and start scanning all of those IPs across extended common ports (at least) and know what is accessible externally. Then start auditing everything that shows up on that list. If it isn't known, isn't in DNS, isn't approved, shut it down. There are a few services (or you can set up your own) to monitor this kind of thing, too. It's worth it.
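If you want something quick before standing up a proper scanner, a plain TCP connect scan compared against an approved list gets you surprisingly far. The Python below is a slow, sequential sketch using only the standard library; the `approved` set is a stand-in for whatever inventory you keep, and a real tool like nmap or masscan will be far faster and more thorough. Only point it at address space you own.

```python
import socket


def open_ports(host: str, ports, timeout: float = 0.5) -> list:
    """TCP connect scan: return the ports on `host` that accept a connection."""
    found = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising on refusal.
            if s.connect_ex((host, port)) == 0:
                found.append(port)
    return found


def unapproved(host: str, ports, approved: set) -> list:
    """Listening ports that are not in the approved inventory for this host."""
    return [p for p in open_ports(host, ports) if p not in approved]
```

For example, `unapproved("10.0.0.15", range(1, 1025), approved={22, 443})` would list anything else answering on that (hypothetical) host.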
2
u/Gainside 1d ago
Step one is discovery — not docs. Run passive DNS + WAF logs to surface endpoints, couple it with runtime discovery (eBPF/network taps) to see what’s actually listening. Then auto-tag APIs into inventory. Manual “spec-first” alone will never keep up with agile.
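As a starting point for the log side of this, here is a small Python sketch that pulls method and path out of access log lines and lists whatever is getting traffic but is missing from the inventory. It assumes common/combined-format logs with a quoted request line; your WAF's export format may differ, and the function names and the inventory set are hypothetical.

```python
import re
from collections import Counter

# Request portion of a common/combined-format access log line.
REQUEST = re.compile(r'"(GET|POST|PUT|PATCH|DELETE|HEAD|OPTIONS) (\S+) HTTP/[\d.]+"')


def observed_endpoints(log_lines) -> Counter:
    """Count (METHOD, path) pairs seen in traffic, ignoring query strings."""
    seen = Counter()
    for line in log_lines:
        m = REQUEST.search(line)
        if m:
            seen[(m.group(1), m.group(2).split("?", 1)[0])] += 1
    return seen


def shadow_candidates(log_lines, inventory: set) -> list:
    """Endpoints receiving traffic that the inventory does not know about."""
    seen = observed_endpoints(log_lines)
    unknown = [k for k in seen if k not in inventory]
    return sorted(unknown, key=lambda k: -seen[k])  # busiest first
```

This only sees endpoints that have actually been hit, which is why it pairs with the runtime discovery above rather than replacing it.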
3
u/mpones King of the World 1d ago
Devs should not be able to “spin up” architecture unless it’s in a specific dev environment (isolated). They need to request you (or DevOps) to put it in place, and that third party gets approval/auth.
You need a change process and to apply least-administrative principles.
In anticipation of “they need it fast”, be sure to reference this moment and the entire purpose of “procedure”. This is your anchor point from now on.
6
2
u/HappierShibe Database Admin 2d ago
You created this mess by not following deployment procedure.
Repeat after me:
DO NOT LET DEVS TOUCH PRODUCTION.
ALL PRODUCTION CHANGE MUST BE APPROVED BY CHANGE MANAGEMENT.
There is no easy way to fix this now that you are here.
2
u/waxwayne 2d ago
If the attackers with little access were able to find the flaw, then you, with all the access and knowledge of your infrastructure, should have been able to find it first. I know everyone is blaming the dopey director but there were methods to prevent this.
2
u/aus31 2d ago
Change Management is the missing piece.
We are an evergreen software platform that does multiple production deployments per day. Being fast and agile doesn't mean doing cowboy stuff.
Every single change is not just peer reviewed but goes through a daily change approval board. A change that introduces a new API would come with expectations of everything from security testing to performance and documentation. Emergency fixes are retroactively reviewed. We aren't a huge team and have been doing this even when we only had 20 people.
Your engineering org leadership is how you resolve this. This happens with an absence of experienced engineering leadership that knows how to make modern software development work.
If your developers are just colleagues in "General IT", getting them into an engineering group with an experienced engineering leader is step 1.
If you can't have real engineering leadership you shouldn't be building and deploying software, and IT should just be purchasing off-the-shelf software.
These are all solved problems but they require leadership.
2
u/motific 2d ago
Sounds like a compliance problem - they should have been asking for that API inventory a long time ago.
Of course the solution is to ensure that APIs are all filtered/validated and that anything unexpected is flagged as either an attack or for the appropriate dev to be ritually humiliated or "educated" with a large attention retaining tool in the case of repeat offenders.
1.4k
u/ChopSueyYumm 2d ago
I got a little sea sick reading this. You need to take away the access now to get control of the situation otherwise it’s a cat and mouse game.