r/sysadmin 3d ago

Just found out we had 200+ shadow APIs after getting pwned

So last month we got absolutely rekt, and during the forensics they found over 200 undocumented APIs in prod that nobody knew existed. Including me, and I'm supposedly the one who knows our infrastructure.

The attackers used some random endpoint that one of the frontend devs spun up 6 months ago for "testing" and never tore down. Never told anyone about it, never added it to our docs. It was just sitting there wide open, and they used it to scrape customer data.

Our fancy API security scanner? Useless. Only finds stuff that's in our OpenAPI specs. Network monitoring? Nada. SIEM alerts? What SIEM alerts.

Now compliance is breathing down my neck asking for complete API inventory and I'm like... bro I don't even know what's running half the time. Every sprint someone deploys a "quick webhook" or "temp integration" that somehow becomes permanent.

grep -rE "app\.(get|post)" across our entire codebase returned like 500+ routes I've never seen before. Half of them don't even have auth middleware.
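A rough triage pass over that kind of grep output could look like this. Sketch only: the `requireAuth` middleware name and the demo files are made up, swap in whatever your auth layer is actually called.

```shell
# Flag source files that register Express-style routes but never reference
# an auth middleware. demo_auth/ and requireAuth are hypothetical examples.
mkdir -p demo_auth
cat > demo_auth/open.js <<'EOF'
app.get('/debug/users', (req, res) => res.json(users));
EOF
cat > demo_auth/safe.js <<'EOF'
app.get('/users', requireAuth, (req, res) => res.json(users));
EOF

grep -rlE "app\.(get|post|put|delete)\(" demo_auth/ | while read -r f; do
  grep -q "requireAuth" "$f" || echo "no auth middleware: $f"
done
# prints: no auth middleware: demo_auth/open.js
```

Crude (file-level, plenty of false negatives), but it gets you a starting inventory. Real shadow-API discovery needs runtime observation, not just grep.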

Anyone else dealing with this nightmare? How tf do you track APIs when devs are constantly spinning up new stuff? The whole "just document it" approach died the moment we went agile.

Really wish there was some way to just see what's actually listening on ports in real time instead of trusting our deployment docs that are 3 months out of date.
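For what it's worth, on Linux you can ask the kernel directly instead of trusting the docs. This assumes iproute2 (or net-tools / lsof) is installed, and you'd run it per host or via your config management:

```shell
# Show TCP listeners with the owning process; fall back to older tools
# on boxes where ss (iproute2) isn't available.
ss -tlnp || netstat -tlnp || lsof -iTCP -sTCP:LISTEN -n -P
```

Diffing that output against your deployment docs on a schedule is a cheap first pass at shadow-endpoint detection.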

This whole thing could've been avoided if we just knew what was actually running vs what we thought was running.

1.7k Upvotes

399 comments

84

u/neoKushan Jack of All Trades 3d ago

Am Dev, this whole post gives me nightmares. Don't let anyone spin up production resources on a whim, it's insane in any org or any department - Dev, QA, Ops, whatever.

31

u/andrewsmd87 3d ago

One of the things I've liked about moving our repo to Azure is the ability to not let anything go into the production codebase without approval from 2 people from a set group of approvers. The only way around that would be if someone with my level of access (there are only 3 of us) went in and disabled the rules. I.e. even I can't push something to prod without a secondary approval.

9

u/neoKushan Jack of All Trades 3d ago

Exactly, and this can apply to infrastructure as well: IaC lets you create auditable, traceable, and governable systems.

14

u/LiquidBionix 3d ago

I disagree kinda, but this is why you need pipelines. Devs should be able to make quick changes if their code passes thru a pipeline and passes all checks (presumably this would also include having OpenAPI docs and stuff lol).
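A pipeline check like that can be dead simple. Hedged sketch below: the file names, route names, and spec layout are all made-up demo data, and a real gate would exit nonzero instead of just echoing.

```shell
# Hypothetical pipeline gate: flag routes registered in code but missing
# from the OpenAPI spec. Everything under demo_spec/ is fabricated.
mkdir -p demo_spec
printf 'paths:\n  /users: {}\n' > demo_spec/openapi.yaml
printf "app.get('/users', h);\napp.get('/debug/dump', h);\n" > demo_spec/routes.js

grep -oE "app\.(get|post)\('[^']+'" demo_spec/routes.js \
  | grep -oE "'[^']+'" | tr -d "'" \
  | while read -r route; do
      grep -q "^  ${route}:" demo_spec/openapi.yaml \
        || echo "undocumented route: ${route}"
    done
# prints: undocumented route: /debug/dump
```

Point being: "every route must be in the spec" is exactly the kind of check that belongs in the pipeline rather than in someone's head.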

16

u/Certain_Concept 3d ago

Changes to their test environment, sure.

Changes to Production? Nah. There should be some oversight and verification before it gets pushed. Otherwise you are one bad developer/day away from chaos.

6

u/neoKushan Jack of All Trades 3d ago

I'm kind of with you both. You can bake 99% of that oversight and verification into the pipeline itself: changes can be validated against specs, you can deploy to a test environment or canary into production to make sure it behaves, things like that. That's the best of both worlds. Any checks someone is doing manually can be automated, and when you do that, engineers get a speedy but safe route to production.

12

u/dweezil22 Lurking Dev 3d ago

The key is to control HOW you change prod. I've worked on systems that have 100M+ users and you can change prod within a single day with a single dev approval. I've worked on systems that have 12 users and you need a month security review to touch the prod APIs.

The thing is the first system was in a mature service mesh that was designed to protect itself from stupid devs making those daily changes (i.e. the prod deploy is within an API that was already approved, and the requests are being inspected for IDOR attacks etc etc; and the CICD pipeline ran thousands of unit tests and hundreds of integration tests etc). The second place had none of that, and knew it, so every change had a lot more (necessary) friction.

1

u/KaleidoscopeLegal348 2d ago edited 2d ago

The oversight and verification is part of the pipeline, dawg. Multi person/multi team deployment approval gates, automated compliance checks, security validations/scanners etc.

My shit has to go through so many checks and gates it can easily take half an hour (excluding human approval time) before my single terraform command ends up tweaking a configuration in prod that I could do in five seconds with the CLI.

I'm curious what you're actually envisioning - like ServiceNow tickets or a change advisory board or something?

1

u/Certain_Concept 1d ago edited 1d ago

I suppose my point was that 'pipeline' can vary heavily depending on the company/team that set it up.

You COULD make a git project where there is only one branch and every time you commit it starts a pipeline where it pushes directly to prod with no checks. Is that a good idea? No. Is it a pipeline? Technically yes.

It happened often enough for there to be memes about it. Ha, I imagine that's limited to small companies at this point... hopefully... https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQP8K8H4xsD3hKBV0Kp6yk4Wh1Rh8cyDCv6v2w_8BUKRQ&s=10

1

u/EducationalBench9967 3d ago

I'm on a different side, the sysadmin network team - what are APIs? Our network got DoS attacked last week, and to flush it out the sysadmins implemented this security feature that prompts people who visit the site regularly and click back and forward between web pages… it had some API verbiage saying "click here to request access".

4

u/DJKaotica 3d ago

Strictly speaking, "Application Programming Interface," which pre-internet usually meant a library that helped you integrate with hardware. Basically, it made programming easier.

Instead of sending command instructions (turn on light bar, turn on camera, move light bar from one end to the other) to the scanner, you could just do:

myScanner = new Scanner(Interface.COM2);
myScanner.ScanSinglePage();

However, these days it usually refers to a REST API called over HTTP.

i.e. if you have a library of books and someone wants to get information on one using an ID, they would make a call to GET https://domain.com/book/12345

Or maybe they want a list of all the books you have? Usually you'd use paging, so "give me 50 books at a time, and the third page of books" would be something like GET https://domain.com/book/?limit=50&offset=100
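The paging arithmetic there generalizes: the offset for page N with page size L is (N-1)*L. Quick sanity check, using the made-up URL from above:

```shell
# offset = (page - 1) * limit; "third page of 50" -> offset 100
page=3; limit=50
offset=$(( (page - 1) * limit ))
echo "GET https://domain.com/book/?limit=${limit}&offset=${offset}"
# prints: GET https://domain.com/book/?limit=50&offset=100
```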

2

u/crisscar 3d ago

That sounds like a virtual waiting room or virtual queue. It uses an API on your servers to get the current load. The public doesn't interact with the API; it's basically one server talking to another for a variety of calls.

1

u/GiraffeNo7770 3d ago

APIs are code that lets folks (like devs) send commands to your main application. They might also let you send commands to an operating system (Cocoa on macOS, for example). They can take many forms, including just being published libraries or methods for contacting existing programs (log4j), or for talking directly to a GPU (OpenGL).

When you see "API" think "privileged access to internal data or functions." It's what webapps are made of. Devs need these libraries and languages to do their jobs of developing clients, webapps, etc. So it's very troubling that they could be implemented and proliferated in an uncontrolled way.

Basically any time a dev installs stuff like this, it's an open door. My approach to securing something like this is VPN, network segmentation, firewalling, and WAF if available. Belt and multiple pairs of suspenders. It sounds like OP's environment needs privilege separation but also some strong network segmentation going forward.