niwrad's comments | Hacker News

This is a good question -- I've wanted transparency from image models for a while. One workaround is to ask for a "green screen" and key out the background, but it doesn't always work very cleanly.
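For anyone wanting to automate the keying, something like this gets you most of the way (a rough sketch with numpy + Pillow; the filename and the 80.0 threshold are placeholders you'd tune per image, and it still struggles with green spill on edges):

    import numpy as np
    from PIL import Image

    img = Image.open("generated.png").convert("RGB")  # the model's green-screen output
    rgb = np.asarray(img).astype(np.float32)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]

    # "greenness" = how much the green channel dominates the other two
    greenness = g - np.maximum(r, b)
    alpha = np.clip(1.0 - greenness / 80.0, 0.0, 1.0)  # 80 is an arbitrary softness knob

    rgba = np.dstack([rgb, alpha * 255.0]).astype(np.uint8)
    Image.fromarray(rgba).save("keyed.png")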


> One workaround is to ask for a "green screen" and key out the background, but it doesn't always work very cleanly.

I recently tried that and the model (not nano pro) added the green background as a gradient.


I've also added text-to-image versions of the same test here: https://www.cinemodels.ai/benchmark?test=wink&type=text-to-v...

Prompt: "A white female in their 30s winks at the camera with her right eye.

She is standing in a quiet dark green forest, backlit by the sunset during golden hour."


That’s really cool — it makes me happy to hear from someone who appreciates the novelty and the feat of his experiment, pulled off purely through perceptual matching! The precision he achieved with such limited tools is honestly mind-blowing.

What are you developing a spectral CMS for? Is it for lighting, materials, or something imaging-related?


It really is mind-blowing!

The CMS I'm building is for creative imaging—I want to model some specific subtractive processes from the physical world digitally. In the process, I had to come up with a way to convert RGB to spectral power, and was surprised to find that while there's a standard Spectral -> XYZ function (the CMFs), there's no industry-standard inverse (XYZ -> Spectral). So I came up with a function that lets you specify three RGB peaks, along with their widths, and it solves for the peak levels that reproduce an XYZ equivalent to the input color. This is my idea of fun! haha
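The core of it boils down to a small linear solve. A stripped-down sketch in Python/numpy (with placeholder Gaussian peaks, and CMFs you'd supply from CIE data sampled on your wavelength grid; not the actual CMS code, just the idea):

    import numpy as np

    def gaussian(wl, center, width):
        # unit-height Gaussian "peak" on the wavelength grid
        return np.exp(-0.5 * ((wl - center) / width) ** 2)

    def solve_peak_heights(target_xyz, wl, cmf, centers, widths):
        # wl:  (N,) wavelength samples in nm
        # cmf: (N, 3) xbar/ybar/zbar color matching functions sampled on wl
        # Build the three peaks, integrate each against the CMFs, and solve the
        # resulting 3x3 system so the summed spectrum lands exactly on target_xyz.
        peaks = np.stack([gaussian(wl, c, w) for c, w in zip(centers, widths)], axis=1)
        dwl = wl[1] - wl[0]
        M = cmf.T @ peaks * dwl  # M[j, k] = integral of peak_k * cmf_j over wavelength
        heights = np.linalg.solve(M, np.asarray(target_xyz, dtype=float))
        return heights, peaks @ heights  # peak levels and the reconstructed spectrum

If a height comes out negative, that target color isn't physically reachable with those particular peak positions and widths, which is where being able to move the peaks around matters.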


Absolutely — these old papers are fascinating. I was also surprised by how much insight is packed into them, especially considering how non-trivial some of it is to unpack. My professor and I spent quite a bit of time reconstructing how the experiment worked from the limited figures.


Thank you!

Maxwell's "color wheel" experiment (https://cudl.lib.cam.ac.uk/view/PH-CAVENDISH-P-02000/1) is more commonly known, which preceded this experiment I wrote about. (Which by the way is also a very clever experiment which blends color by spinning a wheel with different ratios of primary colors).

Here's a link to Maxwell's original paper: https://royalsocietypublishing.org/doi/10.1098/rstl.1860.000...

You can see some diagrams of his original apparatus that I worked off of in the last two pages!


An audience-driven GenAI rom-com w/ Daily Episodes.

How We Met – https://how-we-met.c47.studio/

Each day, I create a new 30-second episode based on the plot direction voted on by the audience the day before.

I'm trying to see how far the latest Video GenAI can go with narrative content, especially episodics. I'm also curious what community-driven narratives look like!

For the past week, I've been tinkering mostly with Runway, Midjourney, and Suno for the video content. My co-creator vibe-coded the platform on Lovable.


Thank you for the encouraging words! I’m glad you enjoyed it.


I was genuinely confused when I saw how difficult it was to get these IC cards in July. I'm even more confused to see them stop sales altogether.

These cards are ubiquitous in Japan, so I don't understand how this can happen.

Can someone with some understanding of the semiconductor supply chain explain what's going on here?

The chips inside these cards surely aren't the high-end chips the AI crowd is scrambling for.


FeliCa chips were made by Panasonic, but Panasonic sold its semiconductor division to Nuvoton in 2020, so perhaps Nuvoton changed something. General news coverage tends to just say "semiconductor shortage," as if there were only one kind of "semiconductor."


I feel that Midjourney v5 really lets you explore different worlds.

One recent addition the guide missed is the permutation and repeat features [1]. They're quite helpful for power users who want to explore multiple styles quickly.
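If I remember the syntax right, permutations are just curly braces with comma-separated options in the prompt, so something like

    /imagine prompt a cinematic portrait, {35mm, 85mm} lens, {golden hour, blue hour} --v 5

fans out into four separate jobs, and --repeat just reruns the same prompt multiple times.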

Last week I tried putting together a short film using GPT-4 and Midjourney v5. I was stunned by the cinematic frames Midjourney v5 was able to create:

https://youtu.be/6O_tOuUcG9s

I (human) wrote the prompts for Midjourney, though.

[1] https://docs.midjourney.com/docs/permutations


Damn. It's no Harry Potter by Balenciaga, but it's surprisingly compelling given that most of it was generated (granted, with prompting) by AI tools. I notice you credited GPT-4, Midjourney, and Metavoice, but was the music AI generated as well?

I've gotta say, I've seen worse storytelling and cinematography come from actual, serious humans who were getting paid to do it. And the Balenciaga thing is obviously a joke, because that whole genre only works because the style of those videos is beyond parody and sails right over the uncanny valley. This is different. This is interesting. I like it.


Thanks for your kind comments!

I'm glad you pointed out the music. It was also AI-generated, with a tool called AIVA [1]. I'd never composed a piece of music before, and I was pretty surprised by what I could "create". I spent 30-60 minutes max creating the score.

Some parts of their product still feel janky, but as an overall concept, it's quite fascinating. One interaction I enjoyed: AIVA creates scores with separate tracks (layers), so I was able to edit tracks I didn't like (e.g., change a piano track to brass) or have AIVA completely regenerate certain sections of the score (e.g., redo the bridge, regenerate the chorus sections).

One difference from Midjourney is that there's no text-based prompting. Instead, you "prompt" through musical inspiration.

[1] https://www.aiva.ai/


Ah, cool!

So, basically, if I want it to compose Baroque music, I can give it Vivaldi, Bach, and maybe a little early classical, tell it to go to work, and end up with something that sounds like it came out in 1765?

I wonder what the limitations are on that whole "musical prompting" deal.

Your video was better in my opinion because it has a real story. All the Balenciaga videos out there are really just realistically rendered parodies with little or no emotion to them.


The Balenciaga piece is hilarious.


It is, really. There are a bunch of imitators out there now, but they're less funny to me, and I don't think it's just because they're generally less well done. Most of the ones I've seen are indeed less well done (the voices sound more "computer-y" and the rendering isn't as good), but I think the real reason is that the original is itself a parody, and a parody of a parody just gets more ridiculous without getting funnier.


Really nice video! The hand with two thumbs at 0:47 is a dead giveaway that it was created with Midjourney. It usually gets fingers very wrong! lol


nightmare fuel indeed


Impressive. Thanks for sharing!


Curious, did all your work necessitate subscribing to the $30 plan?


I ended up subscribing to the $60 plan, mainly to get access to Stealth Mode. I used about 4 hours of fast time during the project. That said, I could have made this with the $10 plan (3.3 hours) if I had to.


Why do you need stealth mode?


Wow, I had no idea about permutations in Midjourney, and it's an amazing feature! Thank you very much!

