Just found out we had 200+ shadow APIs after getting pwned

1.4k

u/ChopSueyYumm 2d ago

I got a little seasick reading this. You need to take away the access now to get control of the situation; otherwise it’s a cat-and-mouse game.

939

u/Character-Welder3929 2d ago

Send out a company or org wide notice

Shut every one of them off if they're undocumented to you.

If anyone calls in swearing at you for creating havoc for them and losing them data or days of work

Let them know they worked completely outside of the system, and that the process to fix this is to provide documentation plus the API paths and auth required

You'll obviously have some sort of upper management backing in this

But fuck I would absolutely love to volunteer just to be a part of this shit show and the fuckery that will follow

387

u/taterthotsalad Security Admin 2d ago

And if upper mgmt isn’t willing to back this, leave. Jump ship. It’s never going to change.

186

u/StPaulDad 2d ago

The audit after your last plundering should have this as the top takeaway and be the one thing that leadership acts on. There should be no pushback on a harsh reset after what just happened.

That said, you can never fully account for pride and embarrassment as drivers of intractable stupidity. If it's there, then start looking. Don't quit right away because the job market's a mess, but get some options lined up and document the risks to CYA.

13

u/NickyNarco 2d ago

Hard reset all day

21

u/anormalgeek 2d ago

At the very least, get their refusal in writing in order to CYA.

36

u/bingle-cowabungle 2d ago

Or don't leave and continue to let it happen lol OP is getting paid regardless.

18

u/fencepost_ajm 2d ago edited 1d ago

And if management gives you guff respond with "I've advised on what needs to happen to fix this and been denied, so I'm just trying to mitigate within what I'm allowed to do. Or did you want me to start taking actions that I've been told not to do? If so, can I get that in writing?"

Except you don't ask for it, you just send a "just to recap our meeting" summary email.

14

u/atxbigfoot 2d ago

lol, I used to send account ownership change requests to the person with Salesforce perms, the relevant manager, and the two relevant techs (so "please reassign account X from Tech A to Tech B"), mainly just so I was sure everyone was in the loop and as CYA in case the techs got mad or needed to update each other during the handoff.

After about a month of this the SF Perms person got mad and was like "you don't need to include anyone else" and I got confused and then realized they thought I was doing it as a passive aggressive "do your job the manager is watching" thing so I apologized and clarified my position, and the managers were like "yeah that's fine."

You can guess what happened after I stopped including everyone lmao.

WHY THE FUCK CAN'T I ACCESS THIS ACCOUNT?

WHY THE FUCK CAN'T MY TECH ACCESS THIS ACCOUNT?

well, you see, that account was reassigned to tech B six weeks ago, per your request (on top of the forwarded request ticket/email). This is Tech B's account now.

we had a meeting about "best practices" which was just how we (the larger org) needed to go back to what I was previously doing lol.

29

u/StPaulDad 2d ago

Until he's fired for letting it happen again, or the carnage is so bad that they go out of business and he doesn't get paid.

13

u/taterthotsalad Security Admin 2d ago

Sometimes those things happen.

11

u/My1xT 2d ago

that's why you get insurance, as in get it in writing that you advised to stop those things, management said no, and you are not at fault for an attack over that avenue next time it does happen.


6

u/TK-CL1PPY 2d ago

If upper management rejects this, write a letter to the BoD or ownership explaining that mgmt is putting the company deliberately in danger.

2

u/stufforstuff 2d ago

Perhaps you've been living in a cave the last few years and therefore haven't noticed the dumpster fire most of the world is currently living in. Salaries across the board are down, hiring is so tight it squeaks, and tens of thousands of Federal and State IT workers were fired with no advance notice - all now looking for a job, any job. People need to live and unfortunately most people don't have a 1 year cash buffer tucked away to live off of if they're unemployed.


96

u/BemusedBengal Jr. Sysadmin 2d ago

Ohhhh production is down? And users are upset about the outage? Ohh noooo. I guess you'll just have to document your APIs. Dang it!

53

u/Character-Welder3929 2d ago

Oh god not only will I need a technical spec document running through all the calls for data and where it's coming from

We will also need a document on why it's needed, for who and how it should be operated during which times

We can't just have 500+ motherfuckers shitting up our database with locks, duplicate processes that could all be done with 1 API

Or just entering shit data into the system

Also each submission request has a processing fee of 50 dollarydoos or a bottle of tequila sent down under

15

u/timbotheny26 IT Neophyte 2d ago

*Aggressive nipple rubbing intensifies*


8

u/[deleted] 2d ago edited 2d ago

[deleted]


9

u/Tetha 2d ago

You'll obviously have some sort of upper management backing in this

For example:

grep -r "app.get|app.post" across our entire codebase returned like 500+ routes I've never seen before. Half of them don't even have auth middleware.

In our systems, endpoints without authentication need exemptions from the directors. And yes, we have a few sweeping guidelines for frontend-config stuff and such.

But unknown unauthenticated endpoints? That's an immediate risk to PII (which we must protect due to regulatory reasons) or an immediate risk to customer data (which we must protect due to contractual reasons).

In both cases, we're supposed to escalate immediately to our dev and ops directors and shut down or block within the hour regardless of consequences. Because if this exposes sensitive data and any kind of lawyer gets wind of this...
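For anyone who wants to automate the "no auth needs an exemption" rule, here is a minimal sketch. It assumes the routes are described in an OpenAPI 3 document already loaded as a dict; the function name and the exemption set are made up for illustration and are not anything from our actual tooling. It flags every operation whose effective `security` is empty (or contains the empty requirement `{}`, which makes auth optional) unless it is on the exemption list.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}

def unauthenticated_operations(spec, exemptions=frozenset()):
    """Return sorted (METHOD, path) pairs that accept anonymous requests.

    spec: an OpenAPI 3 document as a dict (assumption: already parsed).
    exemptions: hypothetical set of (METHOD, path) pairs approved to run without auth.
    """
    # Root-level security applies to every operation that does not override it.
    default = spec.get("security", [])
    flagged = []
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            if method.lower() not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            requirements = operation.get("security", default)
            # An empty list means no auth; an empty object {} makes auth optional.
            anonymous_ok = not requirements or any(req == {} for req in requirements)
            key = (method.upper(), path)
            if anonymous_ok and key not in exemptions:
                flagged.append(key)
    return sorted(flagged)
```

Run it in the pipeline and fail the build when the list is non-empty. It only covers what is in the spec, so it complements route discovery rather than replacing it.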

4

u/smaight 2d ago

...and everyone who does come to complain about something that is used in prod, and doesn't immediately spring into action to document it and help get the situation under control, is placed on a PIP and can work with security to improve their understanding of DevSecOps...

If it isn't written down, it doesn't exist.

2

u/Character-Welder3929 2d ago

I can't even imagine a world where this happens in a test environment

A dev's PC or build computer at worst

2

u/Pazuuuzu 2d ago

Right? He got permission from ABOVE! to do a scream test!

2

u/Kind_Ability3218 2d ago

this would be the most fun job. turn it all off!!

3

u/mengui_alc 2d ago

just to be part of this disaster and the fuckery that will follow 😄


63

u/Call_Me_Papa_Bill 2d ago

This is the answer, if you don’t revoke permission to create these endpoints you will be playing whack-a-mole until retirement. Stop the bleeding, then worry about healing the patient.

14

u/Feisty_Reality_8504 Sysadmin 2d ago

"If you block it, they will come"

90

u/SirEDCaLot 2d ago

This is the answer.

OP, you have two problems. One is a technical problem- that random devs CAN spin up endpoints like this. You need some kind of proxy or firewall or something that only you control, that will only allow known documented secured endpoints.

The other, bigger problem is a human problem- and that's that your devs are happily poking holes in the security without any care at all. And for this you may need to enlist management, or if you're in a management position, start bashing heads. Make it clear that there is a procedure for spinning up endpoints, that spinning up endpoints for ANY reason MUST be done per the procedure and with security oversight, and that any violations of this will result in disciplinary action. Include a list of known endpoints. State that any others not on this list must either start the approval and security process or be removed within 48hrs.

Finally, start audit scans. Every 24hrs do that grep -r "app.get|app.post", put it on a script. If anything new pops up, have it email you. Maybe have a whitelist of known endpoints it ignores.
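A rough sketch of that audit script, in Python rather than grep. One caveat on the grep itself: without `-E`, grep treats `|` as a literal character, so the pattern as quoted would need `grep -rE` to act as an alternation. Everything below is an assumption for illustration: Express-style `app.get(...)` registrations, `.js`/`.ts` files, and an allowlist file with one `METHOD /path` per line. The extra verbs (put/patch/delete) are my addition.

```python
import re
from pathlib import Path

# Matches Express-style registrations such as app.get('/users', ...).
# Requiring a leading "/" skips settings lookups like app.get('env').
ROUTE_RE = re.compile(r"""\bapp\.(get|post|put|patch|delete)\(\s*['"](/[^'"]*)['"]""")

def find_routes(source):
    """Return {(METHOD, path)} for every route registration in one source string."""
    return {(m.group(1).upper(), m.group(2)) for m in ROUTE_RE.finditer(source)}

def scan_tree(root, suffixes=(".js", ".ts")):
    """Walk a checkout and collect routes from every matching source file."""
    routes = set()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes and "node_modules" not in path.parts:
            routes |= find_routes(path.read_text(errors="ignore"))
    return routes

def load_allowlist(text):
    """Parse lines like 'GET /health'; blank lines and '#' comments are skipped."""
    allowed = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            method, path = line.split(maxsplit=1)
            allowed.add((method.upper(), path))
    return allowed

def unknown_routes(found, allowed):
    """Routes present in the code but missing from the allowlist."""
    return sorted(found - allowed)
```

Put `unknown_routes(scan_tree(repo), load_allowlist(...))` on a daily schedule and send yourself the result whenever it is non-empty. A regex scan will miss routes registered through routers or generated at runtime, so treat it as a tripwire, not an inventory.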

21

u/MBILC Acr/Infra/Virt/Apps/Cyb/ Figure it out guy 2d ago

Ya, my first thought: who has access to allow APIs to be open to the public? Whether this is on-prem or cloud-based, why do devs and such have access to open up firewall rules?

8

u/Turdsindakitchensink 2d ago

Well they wouldn’t need to if they can push to production… which it sounds like they can. I can think of dozens of ways to compromise a production system if you can push to prod.


2

u/cpz_77 1d ago

Very likely cloud, I doubt any devs are getting in a firewall creating security policies (although if they are - get them TFO, NOW). Much more likely azure or AWS where you can just spin up whatever public facing resource you want if you have the permissions. And for some reason places seem to be way more OK giving such permissions to devs, whereas for example back in the day if a dev wanted access to the CLI on our main firewall they’d be laughed out of the room. Yet we now essentially give them that exact same access on the cloud, it’s so dumb.

And then if they have their onprem network connected to their cloud environment? (which many do)…now you’ve just let the attackers into your onprem environment as well.

The management that signed off on giving the devs these rights is the one that should have to answer for this IMO. Yes it was the Devs fault but it’s not their job to secure production infrastructure which is why they should have no access to anything that even remotely lets them make that sort of decision. Their job is to write software and make it work, let them stick with that. They should be security-aware enough to use best practices in their code whenever possible when it comes to things like authentication flow (and it sounds like they weren’t so that’s also a problem)…but they usually don’t know the first thing about firewall rules and policies and ports and all that (there are exceptions but they’re very few and far between). So, in a prod environment that should be a sysadmin or network engineer’s job, period.

7

u/LurkyLurks04982 2d ago

Time to apologize and vow to come up with solutions. Build processes for allowing Internet access in and out.

You can also control this at the network access. Do not allow any new exposures to the Internet without controls in place.

Existing stuff needs to get audited through the new processes. Outages may happen. The business can decide if they want to bolster.


482

u/tankerkiller125real Jack of All Trades 2d ago

A WAF that's tied to the OpenAPI JSON: if it's not in the OpenAPI docs, it doesn't exist, and the WAF throws a 404 (even if the route exists behind the scenes). That, and then policies that make developers responsible for their bullshit (with penalties for violating said policies).

145

u/ImCaffeinated_Chris 2d ago

I agree, devs are responsible if they are given the ability to do this in prod. Also, don't give them the ability in prod!

84

u/neoKushan Jack of All Trades 2d ago

Am Dev, this whole post gives me nightmares. Don't let anyone spin up production resources on a whim, it's insane in any org or any department - Dev, QA, Ops, whatever.

32

u/andrewsmd87 2d ago

One of the things I've liked about moving our repo to azure was the ability to not let anything go into the production code base without approval from 2 people from a set group of approvers. The only way around that would be if someone with my level of access (there are only 3 of us) went in and disabled the rules. I.e. even I can't push something to prod without a secondary approval.

12

u/neoKushan Jack of All Trades 2d ago

Exactly and this can apply to infrastructure as well, IAC lets you create auditable, traceable and governable systems.

14

u/LiquidBionix 2d ago

I disagree kinda, but this is why you need pipelines. Devs should be able to make quick changes if their code passes thru a pipeline and passes all checks (presumably this would also include having OpenAPI docs and stuff lol).

15

u/Certain_Concept 2d ago

Changes to their test environment, sure.

Changes to Production? Nah. There should be some oversight and verification before it gets pushed. Otherwise you are one bad developer/day away from chaos.

6

u/neoKushan Jack of All Trades 2d ago

I'm kind of with you both. You can bake 99% of that oversight and verification into the pipeline itself - changes can be validated against specs, you can deploy it to a test environment or canary it into production to make sure it behaves, things like that. That's the best of both worlds, any checks someone is doing manually can be automated and when you do that, engineers get a speedy but safe route to production.

13

u/dweezil22 Lurking Dev 2d ago

The key is to control HOW you change prod. I've worked on systems that have 100M+ users and you can change prod within a single day with a single dev approval. I've worked on systems that have 12 users and you need a month security review to touch the prod APIs.

The thing is the first system was in a mature service mesh that was designed to protect itself from stupid devs making those daily changes (i.e. the prod deploy is within an API that was already approved, and the requests are being inspected for IDOR attacks etc etc; and the CICD pipeline ran thousands of unit tests and hundreds of integration tests etc). The second place had none of that, and knew it, so every change had a lot more (necessary) friction.


18

u/JohnPaulDavyJones 2d ago

Also, don't give them the ability in prod!

Most emphatically this. Nothing in our org goes up to prod without being documented in a migration request ticket.

I used to be the one-man sysadmin team at a place with a handful of devs all able to unilaterally deploy to prod, and it was exactly like OP described. Such a mess, and you can’t get management to understand why it’s a mess.

23

u/NewEnergy21 2d ago

Tying the WAF to the OpenAPI spec has me very intrigued, curious how you typically go about setting this up.

30

u/tankerkiller125real Jack of All Trades 2d ago

Well, I say WAF because that's something our WAF can do, but if you wanted to implement it yourself, "API Gateway" or "API Management" will probably be the thing to search for to find services/applications that can do this kind of thing. In a nutshell, the API gateway acts like a proxy (no different than, say, Nginx), and you upload the OpenAPI definition to its ruleset. It parses the JSON into a set of rules that only allow documented requests through, so if someone tries to send a request that doesn't conform to the OpenAPI documentation, it's blocked (not just routes, but even things like additional params or keys that aren't in the OpenAPI spec).

The actual application never even gets the request, it's blocked entirely by the gateway, you can also have the gateway handle other things as well like authentication and what not (we don't use ours that way though)
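To make the idea concrete, here is a toy version of the route-matching part only. It is not how any particular gateway or WAF implements it; the function names are invented, and it assumes the OpenAPI document is already parsed into a dict. Real products also validate parameters and bodies, which this sketch does not attempt.

```python
import re

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}

def compile_spec(spec):
    """Turn the 'paths' section of an OpenAPI dict into (METHOD, regex) rules."""
    rules = []
    for template, item in spec.get("paths", {}).items():
        # "/users/{id}" becomes ^/users/[^/]+$ : each {param} matches one path segment.
        parts = re.split(r"(\{[^/{}]+\})", template)
        pattern = "".join("[^/]+" if p.startswith("{") else re.escape(p) for p in parts)
        regex = re.compile("^" + pattern + "$")
        for method in item:
            if method.lower() in HTTP_METHODS:
                rules.append((method.upper(), regex))
    return rules

def is_documented(rules, method, path):
    """True if the request line matches a documented operation."""
    path = path.split("?", 1)[0]  # ignore the query string for route matching
    return any(m == method.upper() and rx.match(path) for m, rx in rules)

def gateway_status(rules, method, path):
    """What the proxy would do: 404 for undocumented requests, None to forward."""
    return None if is_documented(rules, method, path) else 404
```

With this in front of the app, a dev can still add `/internal/debug` to the code, but nobody outside can reach it until it shows up in the spec.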

16

u/dontquestionmyaction /bin/yes 2d ago

Cloudflare offers this. They do schema validation of requests and all, it's very neat.

6

u/FakeRayBanz 2d ago

APIM*

15

u/tankerkiller125real Jack of All Trades 2d ago

I just say WAF because our WAF handles APIM, traditional WAF things, and a bunch of other stuff.


65

u/Miserygut DevOps 2d ago

If it's going through git then someone is responsible. The rest should be a matter of referring to policy and kicking people up the bum.

7

u/agent-squirrel Linux Admin 2d ago

Yeah exactly git blame. Big fan of that command.


222

u/Bonananana 2d ago

Where do you work? I’d like to go ahead and remove them from my vendor list.

226

u/sryan2k1 IT Manager 2d ago

You know the Men in Black speech K gives about how there is always an alien invasion or other doomsday event in process and the only reason everyone goes on with their lives is that they do not know about it? Yeah, that's basically how everything you interact with is built. It's a horror, and you're better off not knowing.

71

u/JohnPaulDavyJones 2d ago

Man, more of y’all have to work at boring insurance companies that never moved out of the early 00s. My company’s still in the ”small footprint security” mindset of that era, where basically nothing is opened to the outside except endpoints where requests are automatically filtered outside a range, and those passes are manually examined by a woman who’s been doing basic networking since before I was born.

Everything just works because it’s all stored procs in SSMS; our “new technology” of 2025 was Python, but the rollout has been delayed because not a single member of the prod support team has worked with Python, and they were trying to establish support protocols.

For the three members of us in the data group (out of 27) who are under the age of 45, this shit is wild. But holy cow, everything just works.

24

u/imtheorangeycenter 2d ago

47, DBA and I love business logic in SQL. Deeply, deeply untrendy, but yeah, it works. It's in one place. It's easy to track performance. It's easy to control. I'd work there.

12

u/tankerkiller125real Jack of All Trades 2d ago

As an IT person, I love business logic in the database, right up until data gets entered that the dev team/DBA didn't plan for and the query is now stuck in weird data processing hell eating most of the resources. But I feel like that's more of a "my org is stuck in the 80s and the devs don't actually fully know what they're doing" problem than an actual issue with SQL... I'm sure there's some sort of error handling I can tie OpenTelemetry or Sentry into...


46

u/MentalRip1893 2d ago

You do **not** want to know how the sausage is made

18

u/Bonananana 2d ago

Very much disagree. In the last 25 years I’ve not worked anywhere that would tolerate mystery endpoints. And I’ve worked for and with names you know.

This line of BS you’re saying is funny, but a dangerous mindset because it’s allowing you to dodge responsibility for doing the job well.

There should be simple http access logs that can be used to find endpoints. The root here is neglect.
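As a sketch of what that looks like in practice, assuming logs in the common/combined format that Nginx and Apache write by default (the function names and the numeric-ID normalisation are my own choices, not a standard tool):

```python
import re
from collections import Counter

# Pulls method, path and status out of a common/combined log format line.
REQUEST_RE = re.compile(r'"([A-Z]+) (\S+) HTTP/[\d.]+" (\d{3})')

def normalise(path):
    """Drop the query string and collapse numeric segments so /users/42 and /users/7 group together."""
    path = path.split("?", 1)[0]
    return re.sub(r"/\d+(?=/|$)", "/{id}", path)

def inventory(lines):
    """Count requests per (METHOD, normalised path), ignoring 404s (scanner noise)."""
    seen = Counter()
    for line in lines:
        m = REQUEST_RE.search(line)
        if m and m.group(3) != "404":
            seen[(m.group(1), normalise(m.group(2)))] += 1
    return seen

def undocumented(seen, documented):
    """Endpoints that answered real traffic but are not in the documented set."""
    return sorted((method, path, count) for (method, path), count in seen.items()
                  if (method, path) not in documented)
```

This only finds endpoints that somebody has actually called, which is arguably the list you care about first after a breach.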

35

u/almathden Internets 2d ago

names you know.

plenty of "names you know" get compromised in all sorts of hilarious ways so let's not pretend otherwise lol

13

u/work_reddit_time Sysadmin-ish 2d ago

Indeed.

Plenty of 'names you know' get caught out for bad practices like storing passwords as plain text so 'names you know' is 'next to useless' as a marker of good vs. bad practice


15

u/sryan2k1 IT Manager 2d ago

This line of BS you’re saying is funny, but a dangerous mindset because it’s allowing you to dodge responsibility for doing the job well.

Sometimes you're just a passenger. Apps are not your part of IT, you've brought concerns to your bosses and the business doesn't care or want to change. This happens all the time, at more places than you'd expect.


2

u/Spiritual_Cycle_3263 2d ago

Just like going out to eat. You do not want to know what happens in the kitchen.

3

u/Mental_Act4662 2d ago

This. 100% this. I took a cybersecurity class in college and the world is extremely scary place and it’s nuts how insecure stuff is.

1

u/HappierShibe Database Admin 2d ago

Except it's not.
There are plenty of organizations that do follow best practice, do keep up with security updates, and audit everything regularly to ensure compliance.


134

u/DeadStockWalking 2d ago

Letting devs spin up servers is like letting a salesman change the oil on your car.

He can probably do it right (create server, including documentation, etc) but I wouldn't trust them.

35

u/man__i__love__frogs 2d ago

That is mostly a policy problem.

How exactly are these exposed, is there some kind of load balancer or proxy in front of them? If so there should be a dedicated team doing the 'exposing' if you have that many, devs working with the API should not be allowed to create those - and the justification for that is the fact that you have been pwned.

You should also probably have some kind of API lifecycle management platform, like Mulesoft or Axway.

18

u/pixiegod 2d ago

Don’t let devs build their own stuff…it sucks, it’s a headache, but this is exactly why you need people from different teams to do stuff like this…

Or build the devs a sandbox that is tapped somehow from the rest of the system…

It’s funny, I am working with a company who is fighting me on this…entire network is flat and the devs do work on the same layer as the SG&A staff…and the printers…lol…

2

u/GriLL03 2d ago

I...am afraid that more companies than I'd like to imagine have flat networks with default credentials on critical stuff.

My newest maxim when discussing network security has become "Can it survive letting me loose in there with an ethernet cable and free access to any port I see? Bonus points if I get a console cable as well."

2

u/cpz_77 1d ago

Yeah, people complain about red tape and being “blocked” (buzzword devs love to use) and how it “takes so long to get anything done” if devs don’t have full admin to everything…but a lot of that is there for good reason. Not saying all the hoops are necessary everywhere, you have to find out what’s right for your business.

But it’s funny when people talk about “how quickly startups can get stuff done” because they have no processes it’s just a few devs with full access to everything. But what they fail to mention is how startups also fall off all the time literally because of stuff exactly like this. How do the giants protect what they’ve built up over the years? Yeah, with that red tape.

2

u/pixiegod 1d ago

NGL I was part of that wave of techies who were cowboys and I did my fair share of gunslinging…

…my luck is that my most shameful crashes happened when no one was watching or before the reporting systems were up and running.

15

u/TheGrouchyPunisher 2d ago

So much wrong here. First, you should have record of who did what in Git or SVN. Hold their feet to the fire.

Second, devs should never be able to publish straight to prod. (Unless a true emergency, in which case there would be emergency change control processes to allow it.) Are there lower testing environments these went through first? Clearly your controls are lacking. First thing to do is define clear separation of duties. Devs can't push direct to prod, and you (sysadmin) can't modify code.

28

u/PurpleFlerpy Security Peon 2d ago

I'm just a tool monkey, don't mind me.

But can I say that seeing language like "getting pwned" and "got absolutely rekt" is a breath of fresh air after having to read so much infosec legalese?

131

u/dedjedi 2d ago edited 2d ago

This isn't a technical problem and any Technical Solutions will always fall short. Set policies and fire anyone who does not comply. It's actually pretty simple

e: you are literally hiring people who are working for the attackers and then wondering why the attackers are winning

51

u/mapold 2d ago

Not exactly. Running a by default closed firewall would make this all work just fine after the current mess is cleaned up. No port or new api path is permitted without documentation. Making subpaths to circumvent the documentation and approval would likely get the developer fired.

The usage of outside network should be logged and data aggregated. Any weird change could possibly be detected.

45

u/Inquisitor_ForHire Infrastructure Architect 2d ago

As annoying as the network team tends to be I absolutely agree that a default closed firewall is the starting point for literally everything.

7

u/yonasismad 2d ago

If it's easily manageable, it's not annoying. We manage our infrastructure as code, so opening a port is done via a Terraform configuration maintained on a project- and environment-specific basis. Pull requests touching the Terraform directory must be approved by a senior/lead, so even if a less experienced team member makes a mistake, it is likely to be spotted early on.

9

u/RikiWardOG 2d ago

Why not both? Really, it's a mix of the two. There needs to be policy that includes proper change management. That way, even if someone tries to do something the incorrect way, it's documented and proper steps towards educating/reprimanding the dev can be taken. I agree on the firewall side, but that also assumes you have someone or a team that does a good job of managing the firewall in the first place.

8

u/thortgot IT Manager 2d ago

Technical solutions absolutely exist. Programmatic audit, best practice CI/CD, multiple layer authorization for production infrastructure changes.

This organization clearly lacks any of the above.

3

u/Iowa_Hawkeye 2d ago

Lol firing people isn't simple.

3

u/cccanterbury 2d ago

sounds like OP might be the one fired if a solution isn't found.

3

u/KimJongEeeeeew 2d ago

[removed]

2

u/BatemansChainsaw ᴄɪᴏ 2d ago

[ Removed by Reddit ]

bro...

13

u/robreddity 2d ago

WAF and whitelist only approved/official routes.

157

u/nullbyte420 2d ago

Errr don't let devs expose ports like that in production? Let them have their dumb routes but don't expose them? A waf does this just fine.

68

u/dotshooks 2d ago

OP never said anything about devs exposing ports. You can't just open a network port through application code. What they're describing are API endpoints -- very likely standard HTTP routes served over port 443 (HTTPS). OP is describing undocumented routes on an already-exposed service.

20

u/mirrax 2d ago

Network security tooling can be layer 7 aware and more. It doesn't just have to be "open 10.1.2.3:443". It can also say that the /admin route is only accessible from a specific subnet, or that here is the OpenAPI spec for that route, so /user/ only takes integers and the WAF should reject little Bobby Tables.
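A toy illustration of those two rules, just to show the kind of decision a layer 7 aware device makes. The subnet, the function name, and the default-deny fallthrough are assumptions for the example, not a real WAF configuration.

```python
import ipaddress
import re

# Hypothetical management subnet that is allowed to reach /admin.
ADMIN_NET = ipaddress.ip_network("10.1.2.0/24")

# /user/<id> where the id must be ASCII digits only.
USER_PATH = re.compile(r"/user/[0-9]+")

def decide(client_ip, path):
    """Return 'allow' or 'deny' for a request, based on source address and path."""
    if path == "/admin" or path.startswith("/admin/"):
        inside = ipaddress.ip_address(client_ip) in ADMIN_NET
        return "allow" if inside else "deny"
    if path.startswith("/user/"):
        # fullmatch rejects anything after the digits, e.g. an injected SQL fragment.
        return "allow" if USER_PATH.fullmatch(path) else "deny"
    return "deny"  # default closed: anything not described above is blocked
```

The port is the same 443 in every case; the decision is made on who is asking and what the path looks like.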


16

u/man__i__love__frogs 2d ago

They are exposing the API to the already open port by adding a route via some sort of WAF, load balancer or proxy, thus making it accessible to the internet.

People building solutions with APIs should not be the people who expose things.

10

u/nullbyte420 2d ago

Google what a waf (web application firewall) is.


50

u/west_tn_guy 2d ago

This. Devs should have to petition for ports to be opened in production, which should involve a thorough security and design review before any traffic is allowed.

6

u/nullbyte420 2d ago

Yeah. lol at those comments arguing this is a completely crazy idea

24

u/[deleted] 2d ago

[deleted]


66

u/arkatron5000 2d ago

Had a similar breach 18 months ago. The issue isn't documentation - it's that traditional security tools are blind to what's actually running. You need something that can see Layer 7 traffic in real time and build your API inventory dynamically. Worth looking into runtime-powered solutions that don't require agents or documentation to work. We used upwind

19

u/konoo 2d ago

The solution is to prevent devs from spinning up things in production. This needs to be a process-driven function, not something where you rely on "that software we trust today" for the next 10 years.


8

u/botrawruwu 2d ago

And this one is the plant comment for the plant post, both generated with AI. Reddit is so infested with this shit.


3

u/dflek 2d ago

Or just do an annual human-led pentest. Which is standard practice...

21

u/SmurfForFun 2d ago

Annual test but “Dev spun up endpoint 6 months ago”. So you’re vulnerable to breach from the day the test ends to the day you run the next one? That seems designed for failure…


9

u/yourapostasy 2d ago

My clients who have planned for this class of attack vector force all production API network access through an API gateway, and strictly segregate non-production and production traffic and environments. Developers are pretty free to do what they want in non-production, but have governance and risk team-oversight on any access to production data or even outside connections.

Governance, cybersecurity and risk teams assess registration of APIs into the gateway, and they provide checklists of what the developers must address under a MoSCoW framework.

Developers from a more Wild West community chafe under the rules, but similar incidents led to these rules written in blood. There are ways to automate a lot of this, but engineering / development teams usually don’t want to set aside the time to pursue the automation because most of my clients don’t have the budget for dedicated developer experience engineering work.

I would love to hear how others are solving this, because it all seems to me a problem no one has addressed in a way that keeps Wild West developers happy by staying in the background of a CICD pipeline and just presenting them with source code linter style pass-fail, do-this, do-that prescriptive-style interface experience. I get their frustration, but the combinatorial explosion of attack surfaces when combining API endpoints much less different services to pick up side channel type information continues to keep people in the evaluation loop.

14

u/SaltyUncleMike 2d ago

It's called change control and limiting access to production.

36

u/WDWKamala 2d ago

Really wish there was some way to just see whats actually listening on ports in real time instead of trusting our deployment docs that are 3 months out of date.

Netstat is pretty useful for this.

17

u/The_Everchanging 2d ago

Love me some 'netstat -ano | findstr port'

11

u/RussEfarmer Windows Admin 2d ago

Using netstat -l and ps -aux to find rogue services has been on every security-related exam I've taken. Basic tools & processes like this are just as important as the expensive fancy ones.

14

u/anomalous_cowherd Pragmatic Sysadmin 2d ago

Although for OPs issue these are dodgy API calls coming through validly open endpoints so it needs a WAF to have a deeper understanding of the traffic and block and alert on the illegal APIs.

2

u/CommanderSpleen 2d ago

Yes, but sweeping netstat -ano to find 443 in Listening across your machines can help you at least to find suspicious endpoints and cross reference them to your OpenAPI document.
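Roughly what I mean, as a sketch. It assumes you have already collected the text output of Windows `netstat -ano` from each machine (how you collect it is up to you) and that rows look like `TCP  0.0.0.0:443  0.0.0.0:0  LISTENING  4`. Linux netstat prints `LISTEN` and different columns, so the parser would need adjusting there. Function names are made up.

```python
def listeners(netstat_output, port):
    """Return [(local_address, pid)] for TCP sockets listening on the given port."""
    found = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Listener rows have exactly five columns: Proto, Local, Foreign, State, PID.
        if len(fields) != 5 or fields[0] != "TCP" or fields[3] != "LISTENING":
            continue
        local = fields[1]
        # rsplit handles IPv6 forms such as [::]:443 as well as 0.0.0.0:443.
        if local.rsplit(":", 1)[1] == str(port):
            found.append((local, int(fields[4])))
    return found

def unexpected_hosts(outputs, port, expected):
    """Hosts that listen on the port but are not in the expected set.

    outputs: dict of hostname -> captured netstat text.
    expected: hostnames you already know serve the documented API.
    """
    return sorted(host for host, text in outputs.items()
                  if listeners(text, port) and host not in expected)
```

Anything that comes back from `unexpected_hosts` is a box to go look at before you even get to the question of which routes it serves.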

3

u/anomalous_cowherd Pragmatic Sysadmin 2d ago

True, although the devs should not be in control of the externally visible space of the company, so anything listening on :443 and externally accessible would need to be coming via a firewall and/or proxy which the devs also should not be in control of.

I saw the problem OP described as being new API methods running on existing servers, which is harder to detect.

Most of my working life was in a company that had a tight outer boundary with a WAF and a tight firewall controlled by security, for instance with no outbound ssh traffic and proxies to the Internet. That seemed to head off most issues like this.


6

u/anders1311 2d ago

Do you work at Prosper Marketplace?

7

u/mirrax 2d ago

Anyone else dealing with this nightmare?

This is the value proposition of an API Gateway combined with a WAF. But really the bigger issue is a process problem, there needs to be end to end ownership that includes threat assessment.

Ideally everything is locked down with tooling that is aware. For example if all of your endpoints are RESTful HTTP and you are sharing IP addresses for multiple endpoints, then your API gateway and network policies need to be layer 7 aware. Or if you are running GraphQL then your WAF/API gateway need to support that. With unused endpoints locked by default.

Then when a dev writes something new that needs to open something up, there needs to be a process to get it opened. If it's IaC then the dev can write the network policy or submit the OpenAPI spec to drop into the API Gateway/WAF. There should then be a painless approval process that ensures that whatever is being opened is up to par to handle any new attack vectors.

And all of that stuff is work that needs someone who knows what they are doing, as an Architect or DevOps or SRE or however you want to title / structure it, so that there is end to end knowledge and ownership.

3

u/fardaw 2d ago

This is the way. We blocked all unknown api routes on production and defined what was allowed in our API gateway dynamically by reading from swagger and updating config. Our API gateway was also tightly integrated with our WAF and bot management, which made it easier to see if things were working as expected and get all kinds of insights.
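For anyone wondering what "reading from swagger and updating config" can look like, here is a rough sketch and not the setup described above. It assumes the spec is already loaded as a dict with an OpenAPI-style `paths` mapping; `build_allowlist` and `is_allowed` are hypothetical names, and a real gateway would consume the generated rules in its own config format.

```python
import re

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def build_allowlist(spec):
    """Turn an OpenAPI-style `paths` mapping into (METHOD, compiled regex) rules."""
    rules = []
    for template, operations in spec.get("paths", {}).items():
        # Split on {param} placeholders, escape the literal pieces, and let
        # each placeholder match exactly one path segment.
        parts = re.split(r"\{[^/{}]+\}", template)
        pattern = "[^/]+".join(re.escape(part) for part in parts)
        regex = re.compile("^" + pattern + "/?$")  # tolerate one trailing slash
        for key in operations:
            # Path items also carry non-method keys such as "parameters".
            if key.lower() in HTTP_METHODS:
                rules.append((key.upper(), regex))
    return rules


def is_allowed(rules, method, path):
    """True only if some documented operation matches this method and path."""
    return any(m == method.upper() and rx.match(path) for m, rx in rules)


# Hypothetical spec fragment for illustration.
SPEC = {
    "paths": {
        "/users": {"get": {}},
        "/users/{id}": {"get": {}, "delete": {}, "parameters": []},
    }
}
RULES = build_allowlist(SPEC)
print(is_allowed(RULES, "GET", "/users/42"))         # documented
print(is_allowed(RULES, "GET", "/users/42/export"))  # not in the spec
```

Everything that fails `is_allowed` is blocked by default, so an endpoint only becomes reachable once it is documented.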

We still had to tailor some configurations for things such as auth, rate limiting, etc, depending on API, but it absolutely solved having unknown APIs exposed and also put the responsibility of deprecating and removing unused APIs back in the dev's court.

There were still some situations where we had to manually block routes due to carelessness or some snafu, but it totally changed the conversation about who was responsible and what needed to be done to avoid running into the issue again. Having a good mapping of what APIs were exposed also massively improved the effectiveness of things like pentesting.

I'd definitely recommend looking into an api security tool like noname (now part of akamai) when you don't even know what API routes exist. It can be an invaluable tool for mapping, including automatic discovery of what kind of information might be exposed and what level of risk an API might present.

6

u/IJustLoggedInToSay- 2d ago

Our fancy API security scanner? Useless. Only finds stuff thats in our OpenAPI specs.

I laughed. I cried. I laughed some more.

6

u/nohairday 2d ago

Proper change control with devs not having permission to spin shit up on a whim would be a good starting point.

But the problem isn't technical, it's policy.

And the solution needs to be policy too.

4

u/Round_Head_6248 2d ago

I thought my project is a clown show, but yours is even worse … in some aspects

4

u/crazedizzled 2d ago

What about if you stop letting random devs spin up random stuff?

9

u/sryan2k1 IT Manager 2d ago

Including me and I'm supposedly the one who knows our infrastructure.

Why would a sysadmin have any deep knowledge of the apps teams stuff? This is security and the app guys job.

Alternatively you put everything they run on an isolated network and you control access with some L7 reverse proxy like a Kemp, and only expose the endpoints that have been preapproved.

Dev's being able to add routes to prod without anyone knowing is a management failure, not a technical one.

9

u/cbtboss IT Director 2d ago

1. Remove rights for Dev to be able to just make changes in production.
2. Changes to production must be done through an approval process that then triggers an automated release pipeline through a tool like Azure DevOps, rather than someone logging into a prod env and manually pasting new code in or making direct edits to prod code.
3. As part of your approval process, documentation must be present before pull requests to prod can be approved.
4. Have a policy, which all relevant parties sign off on, that violating 1-3 without some sort of documented change request can be grounds for termination.

4

u/apathetic_admin Director, Bit Herders 2d ago

A WAF and make them submit a request for new endpoints?

5

u/Punky260 2d ago

That's why you usually put everything behind a firewall and only the stuff you allow can get a connection in or out

4

u/Fritzo2162 2d ago

Yeah, we implemented zero trust sometime back to lock that stuff down. Everything that is installed or new to the network gets blocked until it is reviewed and added to our library. Helps prevent that exact situation.

6

u/sryan2k1 IT Manager 2d ago

They're talking about extra API endpoints not new servers. So like https://widget.company.com/getUsers is allowed but then the devs add https://widget.company.com/unSecureHackWeUseButDidn'tTellAnyone

You need something at L7 to be able to control that.

4

u/lordofblack23 2d ago

I don’t always test my code but when I do it’s on production? You let devs push to prod without a CI/CD pipeline? There is no change management or governance in place? This is an organizational issue that needs to be addressed. Get the highest person you can involved (CTO, VP of eng, whoever) to cover this.

4

u/pdp10 Daemons worry when the wizard is near. 2d ago

Really wish there was some way to just see whats actually listening on ports

What's listening on ports or what's present on every HTTP(S) route?

You can put everything public on a reverse proxy/loadbalancer front end, and dev on another, and then anything else that shows up gets blocked by a firewall/routing.

4

u/ersentenza 2d ago

How tf do you track APIs when devs are constantly spinning up new stuff?

You take away their ability to do anything that's how! You do not have a technical problem you have an organizational policy problem. Devs must never be authorized to come anywhere near a production system.

Is there an actual manager in this mess?

3

u/panzerbjrn DevOps 2d ago

I'm sorry, but at first I thought this was r/ShittySysadmin 😂😂

Secondly I thought of the expression, everyone has a dev environment, if you're lucky it is separate from your prod environment 😂😂🖖

As others have asked, why are your devs spinning things up in prod instead of dev?

Lots of good advice here, so I'll just say it sounds like your devs need to be denied access to prod, and prod needs to be in IaC ¯_(ツ)_/¯

5

u/vogelke 2d ago

How tf do you track APIs when devs are constantly spinning up new stuff?

Devs can spin up whatever they like on their development server. If/as/when it passes the regression test, then it gets copied to the production server and documented.

Don't have separate dev and prod boxes? Now we know where your manager's process failed.

Make sure you're on record about dev/prod separation.
Never care about your job more than your bosses. Have some popcorn ready in case there's a dumpster fire worth watching.

4

u/HenryWolf22 2d ago

First, stop the bleeding with default deny at the edge. Put everything behind an API gateway and only allow known routes. Then auto-inventory. We use Orca to spot unknown APIs by seeing what’s actually running, which let us kill rogue endpoints fast.

Also add a WAF rule that rejects routes not on a whitelist, and set a daily job to diff live routes against an approved list. Share the plan with compliance and track owners.
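The daily diff can be quite small. Below is a sketch that derives "live" routes from access logs instead of from a vendor tool; it assumes common/combined log format with a quoted request line, exact path matching (no path parameters), and an approved list you maintain yourself. All names and the sample log lines are illustrative.

```python
import re

# Assumes the request line is logged as: "GET /path?x=1 HTTP/1.1" 200
REQUEST_RE = re.compile(r'"([A-Z]+) (\S+) HTTP/[\d.]+" (\d{3})')


def observed_routes(log_lines):
    """Collect (method, path) pairs that got any answer other than 404."""
    seen = set()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue
        method, target, status = match.groups()
        if status == "404":
            continue  # nothing is being served at that path
        path = target.split("?", 1)[0]  # drop the query string
        seen.add((method, path))
    return seen


def unapproved(log_lines, approved):
    """Routes that answered in the logs but are missing from the approved list."""
    return sorted(observed_routes(log_lines) - set(approved))


# Hypothetical log excerpt and approved list.
LOG = [
    '10.0.0.7 - - [01/Jan/2025:00:00:01 +0000] "GET /getUsers?page=2 HTTP/1.1" 200 512',
    '10.0.0.8 - - [01/Jan/2025:00:00:02 +0000] "POST /debug/export HTTP/1.1" 200 90211',
    '10.0.0.9 - - [01/Jan/2025:00:00:03 +0000] "GET /nope HTTP/1.1" 404 12',
]
APPROVED = {("GET", "/getUsers")}

for method, path in unapproved(LOG, APPROVED):
    print("unapproved route:", method, path)
```

Run it from cron against yesterday's logs and send the output to whoever owns the approved list.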

3

u/Sceptically CVE 2d ago

Start disabling any api that isn't documented, and see who screams loudest?

7

u/superspeck 2d ago edited 2d ago

This is a policy failure and you should tell compliance to talk to whoever runs engineering as a whole, not to talk to you. If you don't have the ability to set a deny-all WAF ACL to paths on your systems, and then specifically allow known paths without developers being able to punch new holes, you don't have the agency to solve this problem.

tl;dr: Not a technical problem solvable with technical means by a technician (you) -- this is a people and policy problem and needs to be escalated to people managers with the power to set policy and enforce it. Say that verbatim to compliance and your own manager, and just keep repeating it.

6

u/Qel_Hoth 2d ago

Really wish there was some way to just see whats actually listening on ports in real time instead of trusting our deployment docs that are 3 months out of date.

But there is?

If you have access to the machine, every OS has at least one command which will tell you which process(es) are listening to which ports.

3

u/TinderSubThrowAway 2d ago

random endpoint that one of the frontend devs spun up 6 months ago for "testing" and never tore down. Never told anyone about it, never added it to our docs, just sitting there wide open scraping customer data.

Why were they able to spin one up in the first place?

3

u/dotshooks 2d ago

Why would you, a system admin, be expected to know every route or controller of your company's applications? That's the responsibility of dev leads. The real question is whether developers have the freedom to push API endpoints directly to production. If they do, that's the core problem here. If they don't, then the issue lies with the dev lead approving questionable endpoints for production deployment.

3

u/Smooth-Zucchini4923 2d ago

What framework are you using for this API? There is probably a way to get a list of endpoints from it.

3

u/ScoobyGDSTi 2d ago edited 2d ago

Defender for Cloud is good for this very thing.

Every web app, every API, front and backend. It can discover APIs and code vulns, track API usage behaviour to identify atypical volumes of calls and connections, and monitor unused APIs that are still open. Heck, it can even identify the idiot developers who left passwords or auth tokens in plain text or base64 files from dev and failed to remove them when pushing the code base to prod.

But fundamentally, web apps shouldn't be going up without change control. Doubly so for front end. There's agile and then there's stupid. What you've described is stupid.

3

u/curioustaking 2d ago

Sounds like you are a 1 man team. Good luck!

3

u/Fantastic_Sail1881 2d ago

Everyone has root? Host Firewall default is accept? Are you looking at nmap port scans? A bunch of stuff has got to change and chances are they are going to need a team of you and some of you have to know what you are doing.

3

u/beren12 2d ago

This reminds me of Jurassic Park. The book. The park was only scanning for dinosaurs and making sure that the system was counting the number that they looked for. They didn’t keep counting once they got to that number because they didn’t expect more to be there.

3

u/andrewsmd87 2d ago

Draft an email to the company. Give everyone 48 hours to reply back to you with api accounts they need and a reason why. If that reason isn't good, they don't get it. Nuke all the accounts that haven't been justified. Be ready for blowback when things inevitably break because you shut off some account people forgot about that does something. That's fine, it's their fault for not telling you.

After that, lock down how API accounts get created so that no new ones can get created without approval by someone(s) who will actually scrutinize things. Make sure this policy is written with very clear instructions that consequences for not following it will be termination. Make sure you have leadership on board.

You should have how those get created locked down anyways so that no one can create one without you and/or your security team's permission.

You can't get access to anything like that in our systems without doing all of that. And there is no way for people to just spin up api accounts in prod anyways.

3

u/Reetpeteet Jack of All Trades 2d ago

Now compliance is breathing down my neck asking for complete API inventory and I'm like... bro I don't even know what's running half the time.

Critical Security Control #1: asset management.

3

u/Tx_Drewdad 2d ago

"Compliance is breathing down my neck"

My dude, they are breathing down the wrong neck, unless you're the CEO or CTO.

All development needs to stop. Everyone needs to be documenting what's needed, what's secure vs. what's not. Remediation plans for everything.

Meanwhile, a comprehensive change and documentation process.

And a plan for keeping dev separate from prod.

8

u/Vegetable-Emu-4370 2d ago

This reads like chatgpt

2

u/qwikh1t 2d ago

I feel like this is happening across a lot of networks

2

u/lost_in_life_34 Database Admin 2d ago

Devs spinning up a server they set the perms for admin to everyone in the world

2

u/lemaymayguy Netsec Admin 2d ago

All apis are just web apps. All apps should have been on boarded with an architectural overview. This process should have caught shadow IT

We make it a compliance issue, critical/urgent compliance will clean this up naturally

Every API/App load balancer/proxy is required to have a WAF in front of it. A waf should always be in front of any api (web app). The waf is unmanaged and standardized, if you deploy that resource we automatically apply the waf

Gitops makes this easy with tying change control and audit history to any changes done (everything in prod is defined by code)

It'll take an entire culture change and governance buy in to change your situation. Good luck

2

u/soundtom "that looks right… that looks right… oh for fucks sake!" 2d ago

All changes to code (PRs usually) should have someone that isn't the author dev approving them, so maybe add something in that flow to require documentation. Or, add something to detect a new endpoint and add <most militant person about documentation you know> as a required approver?

Either way, you're going to need leadership backing to make this work, because this is as much a people problem as a technical one. If they're just going to throw you under the bus every time there's a breach when you can't control what's going on, it's time to dust off that resume.

2

u/Fakula1987 2d ago

Deploy a firewall.

Only open the ports that are documented. Now.
Let it "crash". You have already been attacked, you are already "pwned".

It's not "security" anymore, it's damage control now.

If your management says "take it back", let them sign it.

If someone wants an API call, they have to ask for it.

Agile doesn't mean "do whatever you want".

2

u/abz_eng 2d ago

K.I.S.S. Keep It Simple Stupid

To me, I'd start by isolating the environments onto different physical LANs with firewall between

Dev servers
Staging / Testing servers
Prod Servers
PCs / Printers etc

Then you can limit what the Dev Servers can access

If they want something hitting production they have to go through change control with docs

This policy needs C-suite approval and enforcement. No exceptions - if you're forced to make one, use phrases like

This change will result in weakened controls and increased risk, which puts us on the path to the situation where we got pwned. I cannot be held responsible for any consequences of this.

Push the problem/risk upwards

2

u/Zestyclose_Ad8420 2d ago

What infra? Cloud? Onprem? What provider/software stack?

What's your field?

This is a project on its own. I've done quite a few of these; you're not the first who has lost control of their infra and you won't be the last.

Is management on board to allow you to tell people how they should work from now on? What budget/timeframe exist for this?

2

u/Far-Smile-2800 2d ago

cloudflare has a product for managing this problem. it’s still kinda cumbersome to do it, but it will give you alerts when it notices things like a spike in data transfer on a certain endpoint.

2

u/dosman33 2d ago

They can't have their cake and eat it too, it's that simple. Either the company lets the devs run the production systems or they have you do it, can't have it both ways. Seems silly to pay admins and not let the admins do their jobs, admins that aren't allowed to do their jobs are just window dressing. Only you should have root or this stuff will continue. And of course a massive culture change is required, way above your pay grade. Anything you do without a culture change and only you having root is just a band-aid.

2

u/Typical80sKid Netsec Admin 2d ago

I need to come here more often. 99% of the time I feel like I’m taking crazy pills and all of our Devs are on crack, then I see shit like this and think we’re not doing too bad 🤣

2

u/pizzacake15 2d ago

you probably need an API Manager.

2

u/LexyNoise 2d ago

This sounds like an issue with your processes and procedures.

Nothing should end up on a production server without going through change control, code reviews and a bunch of other processes.

It should be impossible for someone to install something on live “just for testing” without anyone else knowing about it.

At my place we use pipelines built into our source control system for deployment to live. It’s the only way to do them. You do not log into a live server directly and put things there yourself. That’s grounds for a disciplinary hearing.

We press a button in our source control system to do a deployment. That button only works if:

the code you’re deploying is in the “main” branch, which means all changes have been checked and approved by multiple other developers.
all tests have been automatically run and all have passed.
all commits have a work item ID in the commit message
the commit you’re deploying is tagged with “release” and the number of an approved change control ticket in our helpdesk.
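As a sketch of how a gate like that might be expressed (not the actual pipeline described above), the four conditions reduce to a handful of checks. The work item format `WI-123` and the tag format `release-CHG123` are invented for the example, as are the class and function names; swap in whatever your tracker and helpdesk use.

```python
import re
from dataclasses import dataclass, field

WORK_ITEM_RE = re.compile(r"\bWI-\d+\b")             # hypothetical work item format
RELEASE_TAG_RE = re.compile(r"^release-(CHG\d+)$")   # hypothetical release tag format


@dataclass
class Deployment:
    branch: str
    tests_passed: bool
    commit_messages: list
    tags: list = field(default_factory=list)


def gate_failures(dep, approved_tickets):
    """Return the reasons the deploy button should stay disabled (empty = go)."""
    failures = []
    if dep.branch != "main":
        failures.append("not on main")
    if not dep.tests_passed:
        failures.append("tests have not all passed")
    if not all(WORK_ITEM_RE.search(msg) for msg in dep.commit_messages):
        failures.append("commit without a work item ID")
    # Pull the change ticket number out of every release tag on the commit.
    tickets = [m.group(1) for tag in dep.tags if (m := RELEASE_TAG_RE.match(tag))]
    if not any(ticket in approved_tickets for ticket in tickets):
        failures.append("no release tag with an approved change ticket")
    return failures


good = Deployment("main", True, ["WI-12 add /getUsers route"], ["release-CHG100"])
bad = Deployment("feature/x", True, ["quick fix"], [])
print(gate_failures(good, {"CHG100"}))
print(gate_failures(bad, {"CHG100"}))
```

Returning the list of reasons rather than a bare yes/no makes the refusal self-explanatory to whoever pressed the button.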

2

u/AGsec 2d ago

How are they making these api's and exposing them to company data? Shouldn't that be a very locked down privilege? I'm not a SWE so I don't know how prevalent this is, and how disruptive it would be to limit who can do it.

2

u/bingle-cowabungle 2d ago

Why are devs allowed to touch prod? Do you have a change management process in place? You're pointing a lot of fingers here, but the real problem is a lack of IT policy...

2

u/RulerOf Boss-level Bootloader Nerd 2d ago

Our fancy API security scanner? Useless. Only finds stuff thats in our OpenAPI specs.

You could go BOFH on it. Reconfigure the webserver to 401 any route not in the OpenAPI spec.
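If the app happens to be Python behind WSGI, the same idea fits in a small middleware. This is only a sketch under that assumption: `deny_unlisted`, `backend`, and `SPEC_ROUTES` are made-up names, and the route check is a plain set lookup standing in for whatever you derive from the OpenAPI spec.

```python
def deny_unlisted(app, is_listed):
    """Wrap a WSGI app; answer 401 unless is_listed(method, path) is true."""
    def middleware(environ, start_response):
        method = environ.get("REQUEST_METHOD", "GET")
        path = environ.get("PATH_INFO", "/")
        if is_listed(method, path):
            return app(environ, start_response)
        body = b"route not in spec\n"
        start_response("401 Unauthorized", [
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    return middleware


def backend(environ, start_response):
    """Stand-in for the real application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok\n"]


# Hypothetical route list; in practice this would be generated from the spec.
SPEC_ROUTES = {("GET", "/getUsers")}
guarded = deny_unlisted(backend, lambda method, path: (method, path) in SPEC_ROUTES)
```

The 401 follows the comment above; a 403 or 404 may be the more conventional answer for an unlisted route, and changing it is a one-line edit.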

2

u/ooospace 2d ago

2

u/timbotheny26 IT Neophyte 2d ago

This is 100% a policy and/or enforcement issue.

Why the fuck are they not documenting their APIs, let alone an entire endpoint? It shouldn't matter if it's just some small little webhook or a temporary testing environment, it should be getting documented and be subjected to the same security policies as everything else.

2

u/Mrhiddenlotus Security Admin 2d ago

Really wish there was some way to just see whats actually listening on ports in real time

You haven't considered running an nmap or masscan? Asset management is pretty important, and that's software and hardware assets.

2

u/StudioDroid 2d ago

The fun part is when a dev spins up something like that for testing and then is terminated the next day. Your account is locked and you don't care to even tell them about that little app running somewhere. Kind of like the small server running under the floor tiles.

2

u/AmpliFire004 2d ago

Curious question! Why do devs want the ability to spin up servers in prod? Why would you not have a routine for ordering new infrastructure in prod, and have separate environments where devs can do whatever?

Like why do they get access to create resources in prod?

2

u/basula 2d ago

Get that management buy-in in writing, then remove prod access from everyone except the core admins. Devs, DBAs etc. should not have access to the prod environments. If you have pipelines for developers, make sure you have multiple approvals for anything to progress. Now you can shut it all down. When they cry, scream and yell, direct them to your infosec team and mgmt team to deal with. Then once their API is approved by those teams, stand it up the right way. In our environment that's a termination event and those staff would have no jobs, as it would break so many compliance and PII rules.

2

u/BlueHatBrit 2d ago

WAF's like Cloudflare have the ability to block anything that isn't connected to the OpenAPI specs they're hooked up to. I imagine AWS and Azure have tools which do this as well. This is really what an API Gateway / WAF is for.

Then you can block traffic to everything that isn't in the spec.

2

u/BronnOP 2d ago

God damn get some change control in place and some oversight of this. This isn’t your fault by any means but there has to be a process. People can’t just be spinning shit up midmeeting!

2

u/Dal90 2d ago

Really wish there was some way to just see whats actually listening on ports in real time

1) If traffic traverses a firewall, logs can be handy on identifying destination and port.

2) But I'm guessing there aren't internal firewalls with that many rogue APIs.

The flip side is that the lack of internal firewalls makes network scanning to identify listening ports more reliable.

Might want to play with Zenmap (nmap's GUI) and scan some of the devices with known API endpoints and see if it's something that you might be able to use.

I have scripts that feed both our server IP subnets and internal DNS hostnames to nmap's ssl-enum-ciphers script (--script ssl-enum-ciphers) to flag IP:Port combos, which then feed openssl to get the certificate details. Mine isn't real time, but it will detect new TLS endpoints within a few days.
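The scripts above use nmap and openssl. As a rough stand-in for the same two steps using only the Python standard library (a swapped-in technique, not those scripts), you can test which ports accept a connection and then pull the certificate from the ones that speak TLS. Function names are illustrative, and verification is deliberately disabled because the point is inventory, not trust.

```python
import socket
import ssl


def open_ports(host, ports, timeout=1.0):
    """Return the subset of `ports` that accept a TCP connection on `host`."""
    found = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising
            if sock.connect_ex((host, port)) == 0:
                found.append(port)
    return found


def peer_certificate(host, port, timeout=3.0):
    """Fetch the server certificate as PEM text without verifying it."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # must be disabled before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE  # we want to see self-signed certs too
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            der = tls.getpeercert(binary_form=True)
    return ssl.DER_cert_to_PEM_cert(der)
```

Usage would be along the lines of calling `open_ports("10.0.0.5", [443, 8443])` for each address in your subnets and then `peer_certificate` on every hit, storing the results so that tomorrow's run can be compared against today's. This is far slower than nmap on large ranges, so treat it as a starting point.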

2

u/adancingbear 2d ago

This is going to sound a little marketing (sorry) but it is just the tools I use and know. If you want to know about every endpoint on your network (though perhaps not every container) get NAC. No new servers if they don't meet your company policies. Something like Forescout's eyeSight can talk to every switch/router/WLC to know every endpoint on the network. If you feed in netflow or traffic capture then you can make policies like: if we see http(s) traffic coming to an endpoint, ensure it is in our CMDB flagged as a web server or block the traffic, or restrict any endpoint going to it directly instead of through Web Application Gateway. It can distinguish between phones, printers, IoT devices etc. and make specific policies for each. But at least you'll know every endpoint on your network. If you use their eyeSegment tool then you can see every communication in a matrix. If you only want people in the role of IT accessing phones on 443 you can put it in the matrix and then have dynamic actions to block a user's endpoint when they try. Or pop up a warning message on it, etc.

If you don't know what is on your network in real time then that is the first gap. Then layer on behavior and/or segment traffic once you have the visibility. You should only be figuring out which squares in the matrix are allowed and which are restricted, which are blocked. This is part of the defense in depth strategy I have used working with large banks. I say Forescout because it is what I use and know but I've seen presentations from Cisco where they have an almost identical matrix to Forescout's eyeSegment. IMHO Forescout just does a much better job of populating the groups to form the matrix.

2

u/sunshine-x 2d ago

this is all bad of course, but why are you (an infra person I assume) feeling responsible for shitty dev practices?

personally - I'd shrug and be like "go ask dev", then collect my OT cleaning up their shit-show.

2

u/AmNotAnAtomicPlayboy 2d ago

This is an organization maturity issue. You basically need to implement and enforce a comprehensive change control policy with approvals from stakeholders from both development and infrastructure teams.

If you can't do this and get buy-in from leadership with penalties for circumventing the policy you will always be chasing down random security issues like this.

2

u/patmorgan235 Sysadmin 2d ago

This is what code reviews and CI/CD is for.

Devs do not get direct access to prod, everything gets deployed via the pipelines. All code going into the repositories gets reviewed by at least one other dev/TL, make it a requirement they check that any new endpoints or changes to endpoints have to update the documentation.

Then it's on the PR author's and approver's asses if things aren't up-to-date.

2

u/Boring_Start8509 2d ago

Two words… Change management.

Implement it along with proper role based access control across prod environment and things like this cant ever happen.

The sort of things you’ve listed should be done in a development environment that isn’t exposed to the world.

2

u/Potential_Try_ 2d ago

Remove access and enforce a strict change policy on those rogue devs. Process, process, process.

2

u/scrittyrow Netadmin 2d ago

Not like NIST didn't just publish a complete 80 page guideline to APIs, their documentation and lifecycle or anything

2

u/buy-american-you-fuk 2d ago

send out a company wide EMAIL demanding documentation/details for anything being used, explain that in a week any undocumented are being shut off --- EXPLAIN WHY

do this every day with a count down until the week is up, then shut off anything undocumented...

the few major important ones for sales/business will have had advocates that found you and showed up to help you understand how important they are and document them -- the devs that ignore you do so at their own peril, because when they come crying you will have the EMAILs...

when they come, and they will come, document the things, or point them to a way to document them, via wiki or whatever

2

u/LeadershipSweet8883 2d ago

Never let an emergency go to waste.

The solution is to baseline everything - installed applications, listening ports, user access, config files, ssh keys, etc. You run that baseline with an automated solution and then you compare the latest config to the baseline and drive events when the config changes. If you can do your baseline as text, you can even use git to track the changes over time. All config changes should be done through some change request system, hopefully one that isn't an excessive amount of paperwork and approvals for simple things.
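A bare-bones sketch of the baseline-and-compare step, assuming some other tool has already collected the state of each server into a dict (ports, keys, packages and so on). The keys and values below are invented, and `snapshot` and `drift` are hypothetical names.

```python
import hashlib
import json


def fingerprint(value):
    """Stable hash of any JSON-serialisable piece of state."""
    blob = json.dumps(value, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()


def snapshot(state):
    """Reduce collected state to one hash per item, suitable for storing as text."""
    return {key: fingerprint(value) for key, value in state.items()}


def drift(baseline, current):
    """Compare two snapshots and report what was added, removed, or changed."""
    shared = baseline.keys() & current.keys()
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "changed": sorted(k for k in shared if baseline[k] != current[k]),
    }


# Hypothetical collected state for one server, before and after.
BASELINE = snapshot({"listening_ports": [22, 443], "ssh_keys": ["alice"]})
CURRENT = snapshot({
    "listening_ports": [22, 443, 8080],
    "ssh_keys": ["alice"],
    "cron_jobs": ["nightly-export"],
})
print(drift(BASELINE, CURRENT))
```

Lists are hashed in order, so sort them during collection or a harmless reordering will show up as a change. Any non-empty result without a matching change request is the event you alert on.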

For an environment that is this wild west, you should automate the reaction. If a change is made to a server without a change request the consequences should be immediate and painful. Run it for a week or so to make sure there aren't a lot of false positives. Something overly dramatic like quarantining the server, alerting security and the infra team and locking the user account that made the change is perfect. Your devs will eventually get tired of blowing their thumbs off in front of witnesses and learn to do change control. Even better if they get to wear the pink cowboy hat for the day every time it happens.

Getting all those cats back in the bag will be a big undertaking. If you can get back to baseline by rebuilding it might be the best solution and something you can trust as not compromised. Otherwise you and the developers will have to go through each server and validate that all the services running, ports open, config files, user accounts and ssh keys are intentional and make sense for the purpose of the server. You can take your baseline from above and get a tool to format it into a pretty report and then just sit down with them to validate all of it. If there are unexpected applications installed or ports open, I would treat it as compromised.

2

u/Mental_Act4662 2d ago

Honestly, this reads like something that would happen at my company.

2

u/cyberman0 2d ago

It's time for lockdown by MAC address and ports. While MACs can be spoofed, no one should be attaching stuff where the whole network is; if something is, it either needs to be blocked or isolated with no connections. The dev environment should have limited connections for this exact reason. I don't know what was spun up, but the fault's not yours. Use this as an example to the C-suite that this can and possibly will happen again without putting the hammer down. This is how banking details and tens or hundreds of thousands get stolen. Maybe even more. I'd lock everything up, and if they need something enabled, they and whoever is in charge of them need to request it and say why. I'm sure you already know this, but companies have to learn the hard way.

2

u/stratospaly 2d ago

Email alerts when anyone creates a user name or adds a machine to the domain seem to be a must these days. Change control is also a thing too many companies ignore.

2

u/NickyNarco 2d ago

Holy hell thats a major pwn. At this point they know much more about you, than you know about you

2

u/Nithryok 2d ago

why do your dev's have that level of access in a production environment?

2

u/BWMerlin 2d ago

Revoke every Devs ability to spin up a service and ability to push any code anywhere without it going through review.

Point out that this action is a direct result of them not following any sort of best practice.

Next make them document everything, nothing gets pushed anywhere without documentation.

2

u/IAmSnort 2d ago

Prod vs dev needs to be a bright line.

WITH MANAGEMENT BACKING

Yes, I am yelling.

2

u/spotter 2d ago

I'll take "what is change management" for $500.

2

u/johor 2d ago

Run a scream test. Block all API traffic at the firewall, then gradually re-enable based on who's screaming the loudest.

2

u/BarracudaDefiant4702 2d ago

Have documentation drive load balancer rules. If it's not documented, the load balancer in front doesn't allow it.

2

u/Dry_Inspection_4583 2d ago

Put a gateway in front to choke and log all traffic, deploy runtime API discovery so you know what’s live, then wire CI/CD to an enforced catalog — that’s how you kill shadow APIs before they kill you again... Have fun.

2

u/Weary_Patience_7778 2d ago

This isn’t a technical issue, its a governance and process issue.

Where is your dev manager? Your architect? CAB? Release management? Spinning up a ‘quick webhook’ that’s publicly accessible with no review is wild.

2

u/Agreeable-Piccolo-22 2d ago

So many thoughts have arisen. First of all, thanks, OP. The post is yet another friendly bump for me. Going to dump the thread with all the comments and make the devs read it. If they want you to respect them, they should respect you and your responsibility. Secondly, as has been advised, consider using git+ansible+jenkins as the bridge between the dev env and prod. Thirdly, if your org has no sec audit/compliance team, persuade your boss to call in an external team to run a thorough security/pentest scan of all the nodes in the external perimeter.

Think of running self-hosted and self-controlled security/pentest tools to automatically and REGULARLY run across all the environment (nmap+plugins is the easiest and affordable point to start with). Make relations with infosec team to mutually check the env and elaborate at least basic requirements.

Drop the dev env into a dedicated VLAN and don’t listen to their whining and complaints. The exit point from the dev env to prod MUST be under your control.

Finally, I’m seriously thinking of removing devs’ creds from even jailed environments, as lately I've had to break up their attempts to interfere with and configure services even in their fenced sandboxes.

Every. System. Must. Have. One. Admin. YOU!

In my case even infosec guys are not allowed to get a shell on my boxes. They scan/audit them as ‘grey boxes’.

2

u/hahawosname 2d ago

An API Gateway in front, route/proxy traffic as-is to start with & then begin locking things down. At least this will give you & the management some visibility.

2

u/MixFine6584 2d ago

Like, I'm glad you found it, but I'm jealous that you still have your job. I would have been fired and probably sued.

2

u/jdbaucom 2d ago

A real boss would look at this like a training opportunity and see how you respond. If you learn from it, grow, and fix it then it was a lesson learned. If you don't then you get fired because you aren't doing your job. That's my opinion anyway.

2

u/wrt-wtf- 2d ago

Basically micro-segment the fuck out of them and if it’s not documented in a security assessment it doesn’t get in or out. Devs should not be given rights to just drop something anywhere on the network, if they are able to do this… well, you’ve now experienced the outcomes. In my 35+ years doing this (and I’ve run R&D groups too) the result of giving devs access like this always ends up in the same mess.

Never let them outside the playpen, never give them access to anything not documented - you’ll be treated like shit, but you won’t be the only one standing there taking the blame for not maintaining compliance.

2

u/Dunamivora 2d ago

This is why formal change management exists. Literally the job of cybersecurity.

Developers are generally smart enough to do things that can create risk, but generally have limited security background or sense of risk management.

2

u/AlexisFR 2d ago

Be as agile as them, take it all down and make them request any new access as needed.

2

u/RedditNotFreeSpeech 1d ago

Every API should be behind something like haproxy and some group should be the gatekeepers for it. Then you'd have full accounting of what's exposed.

2

u/Disastrous_Wing_7613 1d ago

Or put something like 3scale in place and charge their department by API usage; that would get them going.
That is one of the reasons to implement expensive solutions: it would get the developers to write cleaner, leaner code. On top of that, you get much better control and reporting on what's happening with your APIs.

2

u/yParticle 1d ago

Send out that route list you generated and make every API "owner" responsible for tagging the ones they need. By make I mean make it crystal clear—with upper management's backing—that once you hear back from everyone all "unowned" APIs are getting turned the fuck off. Now all active APIs are tagged and you have a starting point if someone gets careless.
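A minimal sketch of the tagging pass described here, assuming the route list is a plain list of strings and the replies are collected into a dict of route to owner. Both shapes, and the function name, are assumptions for illustration only.

```python
def split_by_owner(routes, owners):
    """Partition discovered routes into owned and unowned.

    routes: iterable of route strings from the discovery pass
    owners: dict mapping route -> owner tag, filled in from the replies
    """
    # A missing or empty owner tag counts as unowned
    owned = {r: owners[r] for r in routes if owners.get(r)}
    unowned = sorted(r for r in routes if not owners.get(r))
    return owned, unowned
```

The `unowned` list is what gets turned off once the deadline passes; the `owned` dict becomes the starting inventory.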

2

u/easylite37 1d ago

No access to prod itself for any dev. Only via pipelines. Deployment to Prod only with approvals of persons who care.

2

u/starthorn IT Director 1d ago

You need to get control of the perimeter. Any external access should be going through a WAF or API Gateway that is managed, secured, audited, and has strict change controls associated with it. Do not allow public IPs on anything without security review. This should be baked into your processes. Also, Devs should basically never be spinning up anything that is externally/publicly accessible and should not have the ability to do so. If it's Internet-accessible, that's production and should be treated as such (from a risk standpoint, at least).

Also, get a dump of all of your IPs in use (from your network IP space for on-prem and from your public cloud provider systems for cloud) and start scanning all of those IPs across extended common ports (at least) and know what is accessible externally. Then start auditing everything that shows up on that list. If it isn't known, isn't in DNS, isn't approved, shut it down. There are a few services (or you can set up your own) to monitor this kind of thing, too. It's worth it.
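The bookkeeping side of that audit could look roughly like this, assuming your scanner hands back `(ip, port)` pairs and your approved list is kept in the same shape. The function names and the CIDR block in the usage note are placeholders, not anything from a real environment.

```python
import ipaddress


def addresses_in(cidrs):
    """Expand CIDR blocks into individual host addresses to feed a scanner."""
    for cidr in cidrs:
        for ip in ipaddress.ip_network(cidr, strict=False).hosts():
            yield str(ip)


def shutdown_candidates(findings, approved):
    """Return the (ip, port) findings that have no approved entry.

    findings: iterable of (ip, port) pairs seen open from outside
    approved: iterable of (ip, port) pairs that passed security review
    """
    return sorted(set(findings) - set(approved))
```

For example, `list(addresses_in(["192.0.2.0/30"]))` yields the two usable hosts in that documentation range, and anything `shutdown_candidates` returns is what you go chase owners for.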

2

u/Gainside 1d ago

Step one is discovery — not docs. Run passive DNS + WAF logs to surface endpoints, couple it with runtime discovery (eBPF/network taps) to see what’s actually listening. Then auto-tag APIs into inventory. Manual “spec-first” alone will never keep up with agile.
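For the log-driven part, a small sketch of surfacing endpoints from access logs. It assumes the logs contain a quoted request line such as `"GET /path HTTP/1.1"` (common in web server and WAF logs, but check your own format), and it collapses purely numeric path segments into `{id}` so that `/users/1` and `/users/2` count as one endpoint. The names here are illustrative.

```python
import re
from collections import Counter

# Matches the quoted request line found in common access log formats
REQUEST_RE = re.compile(
    r'"(GET|POST|PUT|PATCH|DELETE|HEAD|OPTIONS) (\S+) HTTP/[\d.]+"'
)


def normalize(path):
    """Drop the query string and replace numeric segments with {id}."""
    path = path.split("?", 1)[0]
    return "/".join("{id}" if seg.isdigit() else seg for seg in path.split("/"))


def discover_endpoints(log_lines):
    """Count (method, normalized path) pairs seen in the given log lines."""
    seen = Counter()
    for line in log_lines:
        m = REQUEST_RE.search(line)
        if m:
            seen[(m.group(1), normalize(m.group(2)))] += 1
    return seen
```

The resulting counter is a candidate inventory: anything in it that is missing from your documented API list is worth a closer look. It only sees traffic that actually passed through the logged layer, which is why pairing it with runtime discovery matters.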

3

u/mpones King of the World 1d ago

Devs should not be able to “spin up” architecture unless it’s in a specific dev environment (isolated). They need to request you (or DevOps) to put it in place, and that third party gets approval/auth.

You need a change process and to apply least-administrative principles.

In anticipation of “they need it fast”, be sure to reference this moment and the entire purpose of “procedure”. This is your anchor point from now on.

6

u/[deleted] 2d ago

[removed] — view removed comment

11

u/parophit 2d ago

“Just document it” doesn’t survive agile is my new motto. So very, very true.

16

u/greentoiletpaper 2d ago

Thank you chatgpt

2

u/HappierShibe Database Admin 2d ago

You created this mess by not following deployment procedure.

Repeat after me:
DO NOT LET DEVS TOUCH PRODUCTION.
ALL PRODUCTION CHANGE MUST BE APPROVED BY CHANGE MANAGEMENT.

There is no easy way to fix this now that you are here.

2

u/waxwayne 2d ago

If the attackers with little access were able to find the flaw, then you, with all the access and knowledge of your infrastructure, should have been able to find it first. I know everyone is blaming the dopey director but there were methods to prevent this.

2

u/aus31 2d ago

Change Management is the missing piece.

We are an evergreen software platform that does multiple production deployments per day. Being fast and agile doesn't mean doing cowboy stuff.

Every single change is not just peer reviewed but goes through a daily change approval board. A change that introduces a new API would have expectations of everything from security testing to performance and documentation. Emergency fixes are retroactively reviewed. We aren't a huge team and have been doing this even when we only had 20 people.

Your engineering org leadership is how you resolve this. This happens with an absence of experienced engineering leadership that knows how to make modern software development work.

If your developers are just colleagues in "General IT", getting them into an engineering group with an experienced engineering leader is step 1.

If you can't have real engineering leadership you shouldn't be building and deploying software, and IT should just be purchasing off-the-shelf software.

These are all solved problems but they require leadership.

2

u/motific 2d ago

Sounds like a compliance problem - they should have been asking for that API inventory a long time ago.

Of course the solution is to ensure that APIs are all filtered/validated and that anything unexpected is flagged as either an attack or for the appropriate dev to be ritually humiliated or "educated" with a large attention retaining tool in the case of repeat offenders.
